
THE BIG BANG - The rapid progress of AI image + video generation (2020 → early 2026): what changed, who’s winning, and what’s next

AI image and video generation has gone from “cool but obviously fake” to “useful in real creative pipelines”—and in some cases, genuinely hard to distinguish from traditional production at a glance. The biggest shifts since ~2020 aren’t just “higher resolution.” They’re about consistency, controllability, speed, and workflow integration—plus the business reality: compute costs, licensing/IP risk, and distribution.

Below is a deep (and intentionally wide) tour of the space—major platforms, examples you can try, notable stats, company valuations / stock prices, and practical predictions for where this goes next.


1) The core breakthroughs that made modern generative media possible (image + video generation)

1.1 Diffusion → DiT → “world-model-ish” video

Early diffusion models made images explode in quality because they could learn a strong “denoise-to-image” prior. Video then lagged because time adds brutal complexity:

  • You need temporal coherence (objects don’t “melt” between frames).

  • You need identity persistence (the same character remains the same).

  • You need camera + physics consistency (lighting, shadows, motion rules).

  • You need editing/iteration controls (so creators can direct outcomes).

Many top video systems now lean on architectures like diffusion transformers (DiT) and increasingly talk about “world simulators / world models”—Runway explicitly frames Gen-3/Gen-4 as steps toward “General World Models.” OpenAI similarly frames Sora as learning to “understand and simulate the physical world in motion,” generating videos up to a minute long.
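To make the “denoise-to-image” prior concrete, here is a toy DDPM-style sampling loop in Python. Everything here (the `model` callable, the noise schedule, the tensor shape) is an illustrative placeholder, not any production system’s API; video models run essentially the same loop over latents with an extra time axis, which is exactly where the coherence problems above come from.

```python
import torch

def sample(model, steps=50, shape=(1, 3, 64, 64)):
    """Toy DDPM-style ancestral sampler: start from pure noise and
    iteratively denoise. `model(x, t)` predicts the noise residual."""
    x = torch.randn(shape)                      # pure Gaussian noise
    betas = torch.linspace(1e-4, 0.02, steps)   # toy linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    for t in reversed(range(steps)):
        eps = model(x, t)  # predicted noise at step t
        # Standard DDPM update: remove the predicted noise...
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            # ...then re-inject a smaller amount of noise for the next step.
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x
```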

1.2 The “control revolution”: from prompt-only to multi-handle direction

Modern tools add “handles” beyond text:

  • Image-to-video (animate a keyframe / storyboard)

  • Video-to-video (stylize or transform footage)

  • Masks / inpainting (edit only parts)

  • Reference inputs (character/object references)

  • Camera + shot instructions (lens, dolly, pan, rack focus)

  • Multi-shot / scene consistency (the big unlock for narrative work)

Runway Gen-4 emphasizes consistent characters/locations/objects “across scenes.” Google’s Veo line emphasizes cinematic realism and is being productized into Gemini/Vertex, including tooling like VideoFX/Whisk/Whisk Animate.
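In API terms, these handles usually surface as extra request fields alongside the text prompt. The composite payload below is purely hypothetical; every field name is invented for illustration and does not match any specific vendor’s schema:

```python
# Hypothetical request shape showing how control "handles" stack up.
request = {
    "prompt": "A detective walks through rain at night, neon reflections",
    "init_image": "keyframe_01.png",       # image-to-video: animate a storyboard frame
    "mask": "jacket_mask.png",             # inpainting: edit only the masked region
    "references": ["character_ref.png"],   # identity/object consistency references
    "camera": {"move": "slow dolly-in", "lens": "35mm", "focus": "rack to face"},
    "shots": 3,                            # multi-shot / scene consistency
    "duration_seconds": 8,
}
```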

1.3 Distribution is half the battle

A model’s raw quality matters—but who can ship it to millions matters just as much:

  • Adobe embeds generation inside Photoshop/After Effects workflows.

  • Google embeds Veo inside Gemini and the developer ecosystem (Vertex/Gemini API).

  • Runway and Pika go direct-to-creators with “social-first” creation UX.

  • Open-source models spread via ComfyUI, Automatic1111, and API hosts.

2) Market map: the “big buckets” of platforms (video and image generation)

Bucket A — Creator-first “AI video studios”

  • Runway (Gen-2 → Gen-3 Alpha → Gen-4)

  • Pika

  • Luma Dream Machine (video + 3D-ish roots)

  • Kling AI (Kuaishou; massive consumer video DNA)

  • Others (regional/vertical tools, mobile-first editors)

Bucket B — Frontier labs shipping flagship generators

  • OpenAI (Sora)

  • Google DeepMind (Veo family)

  • Meta (research + product features across apps; not a single “Veo-like” standalone that dominates, but huge distribution)

  • Stability AI (open-ish ecosystem; image leadership historically)

Bucket C — Image-first leaders + “the image economy”

  • Midjourney

  • Adobe Firefly

  • Stable Diffusion ecosystem

  • Flux (Black Forest Labs)

  • Ideogram (especially strong at text/logos/typography-style generations)

  • Many “wrapper” products (Canva, design suites, marketing tools)


3) The stats that matter (adoption + content volume)

3.1 AI images have already reached “internet-scale”

A widely cited estimate (Everypixel) suggests 15+ billion AI-generated images across major platforms (Stable Diffusion, Firefly, Midjourney, DALL·E 2) by mid-2023. That number is surely much higher by 2026, but even that earlier figure shows the inflection: generative media is not niche anymore—it’s a content substrate.

3.2 Adobe Firefly: billions of generations as a product KPI

Adobe stated that users had generated 6.5+ billion images with Firefly since launch as of April 2024 (MAX London comms); Adobe communications shortly after pushed the figure past 7 billion.

3.3 Video is behind images in volume—but catching up fast

Video generation is far more compute-expensive than image generation, so it tends to be monetized sooner (credits/subscriptions), and volumes are constrained by cost and runtime limits (e.g., 8-second vs. 60-second caps). Google’s Veo 2 rollout to Gemini Advanced centered on short clips (e.g., 8 seconds) and distribution via Gemini.
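Some back-of-envelope arithmetic shows why those runtime caps matter commercially. The per-second price below is an assumption for illustration only, not any provider’s quoted rate:

```python
# Assumed rate for illustration; real prices vary widely by model and tier.
PRICE_PER_SECOND = 0.35  # USD per generated second of video

def campaign_cost(variants: int, seconds_per_clip: int) -> float:
    """Cost of generating a batch of clips at a flat per-second rate."""
    return variants * seconds_per_clip * PRICE_PER_SECOND

print(campaign_cost(20, 8))   # twenty 8-second hooks -> 56.0 (USD)
print(campaign_cost(20, 60))  # the same count at 60s -> 420.0 (USD)
```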

4) Platform deep dives: video generation (with lots of examples)

4.1 OpenAI Sora (frontier “text-to-video”, world-simulation framing)

What it is

Sora is OpenAI’s text-to-video model, introduced publicly in Feb 2024 and positioned as generating videos up to a minute long while maintaining prompt adherence and visual quality.

What Sora represents (even if you never use it)

Sora’s impact was partly “product” and partly “signal”:

  • Signal to the market that minute-long coherence is feasible.

  • Signal that “world model” language is moving from research to product narrative.

Example prompts (practical + cinematic)

Use prompts like these to test a model’s physics, continuity, and cinematography:

  1. Character continuity + reflections

“A woman on a train at night; her reflection overlays neon signage outside. Slow push-in. Realistic skin texture, subtle motion blur, consistent facial features.”
  2. Complex motion + environment

“A golden retriever running across wet sand at sunrise; water droplets fly; camera tracks low and smooth; realistic paw prints; consistent lighting.”
  3. Multi-object interaction

“A chef flips vegetables in a wok; steam rises; flames flicker; the chef’s hands remain anatomically stable; close-up macro lens look.”
  4. Temporal logic

“A paper airplane folds itself from a blank sheet on a desk and launches out a window; continuous uncut shot; accurate shadows.”

Where Sora-level systems still struggle (industry-wide)

  • Hands, small text, brand logos (improving, still brittle)

  • Long-horizon story logic (characters remember what happened)

  • Precise edits (“change only the jacket color, keep everything else identical”)

  • Audio sync (some systems now add audio; see Veo 3 line notes)

Business notes: OpenAI valuation context

OpenAI reportedly hit ~$500B valuation after a major secondary share sale (Oct 2025). Reuters also reported OpenAI earmarked an employee stock grant pool based on that $500B valuation, alongside talk of potential higher valuations in preliminary discussions. (There is no public stock ticker for OpenAI as of early 2026.)


4.2 Google Veo (DeepMind): productizing video generation into Gemini + APIs

What it is

Google’s Veo is DeepMind’s flagship video model line, continuously updated and integrated into:

  • Gemini consumer subscriptions (Veo 2 rollout)

  • Vertex AI / Google Cloud for enterprise and developers

  • Google Labs tools (VideoFX / ImageFX / Whisk / Whisk Animate)

DeepMind’s public model page references Veo 3.1, including “Video, meet audio” messaging.

The “production detail” that matters: cost + formats + watermarking

Google’s Veo ecosystem has pushed on:

  • SynthID watermarking (to mark AI video) noted in Veo 2 rollout coverage

  • Developer-friendly options like aspect ratios: Google added vertical 9:16 support and discussed price changes in the Veo 3 API context (a minimal API sketch follows this list).
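For developer pipelines, Veo generation through the Gemini API follows a long-running-operation pattern: submit, poll, download. This sketch mirrors the publicly documented google-genai quickstart; model IDs and config fields change between releases, so treat both as placeholders to verify against current docs.

```python
import time
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

operation = client.models.generate_videos(
    model="veo-2.0-generate-001",  # placeholder model ID; check current docs
    prompt="A cyclist rides through fog, 35mm film look, handheld camera",
    config=types.GenerateVideosConfig(
        aspect_ratio="9:16",       # vertical format for short-form placements
        number_of_videos=1,
    ),
)

# Video generation is asynchronous: poll the operation until it finishes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("clip.mp4")
```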

Example prompts tailored for Veo-style systems

  1. Ad creative (short, punchy, product-friendly)

“A 7-second vertical video: a runner ties neon shoes; close-up; quick cuts; high-contrast lighting; end frame: empty space on right for text overlay.”
  2. Cinematic test (lens + motion)

“35mm film look, handheld camera, shallow depth of field. A cyclist rides through fog, headlights bloom, gentle rack focus from handlebars to face.”
  3. Physics stress test

“A glass of ice water on a wooden table; condensation forms; a hand slides the glass; ice shifts realistically; sunlight refracts.”

Business notes (public stock)

Alphabet is public. As of the latest market snapshot at the time of writing: GOOGL ~$328.57 with market cap ~$2.94T.

4.3 Runway (the “creator studio” that matured fastest)

Why Runway matters

Runway isn’t just a model; it’s a workflow product. It’s one of the clearest examples of “AI video generation” becoming a real tool for creators.

Runway’s research posts:

  • Gen-3 Alpha (June 2024), positioned as a next-gen multimodal foundation model with major improvements in fidelity, consistency, and motion.

  • Gen-4 positioned around consistent characters/locations/objects across scenes (a major unlock for narrative + campaigns).

Example prompts for Runway-style workflows

Runway users often get the best results when they specify the following (a small templating sketch follows this list):

  • Shot type + camera

  • Subject + action

  • Environment + lighting

  • Style references (avoiding named living artists where the terms of service prohibit it)

  • Duration intent (“8 seconds”, “loopable”, “single shot”)
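One low-tech way to enforce that structure is a plain prompt template. The helper below is purely illustrative (not a Runway feature or API); it just guarantees every prompt covers the same checklist:

```python
def shot_prompt(shot, subject, action, environment, lighting, style, duration):
    """Assemble a structured video prompt from discrete creative fields."""
    return (f"{shot}: {subject} {action}, {environment}, "
            f"{lighting}, {style}. {duration}.")

print(shot_prompt(
    shot="Slow-motion close-up",
    subject="a singer",
    action="surrounded by floating glitter",
    environment="on a concert stage",
    lighting="stage lighting with bokeh",
    style="85mm lens look, gentle camera sway",
    duration="8 seconds, single shot",
))
```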

Examples

  1. Music video shot

“Slow-motion close-up of glitter floating in air around a singer’s face, stage lighting, bokeh, 85mm lens look, gentle camera sway.”
  2. Brand spot

“Minimal studio background, a soda can rotates on a turntable, softbox reflections, product highlights clean and consistent, premium commercial look.”
  3. World continuity (Gen-4 style test)

“Same character in three shots: (1) enters a diner, (2) sits by window, (3) walks out into rain; maintain same outfit and face; cinematic lighting.”

Valuation / funding

Reuters reported Runway raised $308M in a round that valued it at over $3B (Runway declined to comment on valuation). (Private company: no stock ticker.)

4.4 Pika (fast iteration, social-first creation)

What Pika is known for

Pika positioned itself as a “social-first” creation tool—quickly generating stylized clips, meme formats, transformations, and short-form content.

Funding/valuation signals

Depending on the source and round framing, Pika’s valuation has been reported at around $470M after its Series B, with some outlets discussing higher figures.

Examples: the kinds of prompts Pika-style tools excel at

  1. Meme transformation

“Turn my photo into a claymation character, blinking and smiling, soft studio lighting, 6 seconds.”
  2. Stylized mini-scene

“A tiny robot watering a houseplant in a cozy apartment, pastel anime style, gentle camera pan.”
  3. Hook-first vertical

“A close-up of a face reacting in surprise; quick zoom; sparkles burst; text-safe area at top.”

4.5 Luma Dream Machine (and the “3D-aware” creative lineage)

Why Luma is different

Luma’s roots in 3D capture / neural rendering culture shaped expectations: users want models that “feel spatial,” with more believable camera motion and structure.

Big funding headline

Luma announced a $900M Series C (Nov 2025) to accelerate its roadmap, tied to massive compute infrastructure ambitions. (Valuation in public reporting varies and isn’t consistently disclosed in primary sources; treat third-party valuation chatter as directional unless confirmed.)

Example prompts for Luma-style “space and camera” tests

  1. Interior camera move

“A smooth dolly shot through a sunlit kitchen into a living room, realistic shadows, wide-angle lens, subtle dust in the air.”
  2. Outdoor parallax

“Walking through a market at dusk, lanterns glowing, shallow depth of field foreground, strong parallax as camera passes stalls.”

4.6 Kling AI (Kuaishou): “video-native company ships video-native AI”

Why Kling matters

Kuaishou is fundamentally a short-video ecosystem company. That shapes:

  • Data intuition (what people watch/share)

  • Product UX (mobile creation loops)

  • Distribution channels

Kuaishou’s own investor relations materials highlight Kling AI model iteration (e.g., Kling 2.0 rollout, “global users,” and performance framing).

Stock + market effect

Kuaishou is public (Hong Kong: 1024). For a concrete snapshot: Investing.com showed ~74.70 HKD on Jan 9, 2026. (Always re-check live quotes if you’re trading—HK prices move daily.)

Example prompts for Kling-style “short-video engine” tests

  1. Creator template

“Vertical 9:16, 6 seconds: a streetwear outfit reveal, quick cuts, cinematic lighting, smooth stabilization.”
  2. Stylized effect

“A city street turns into watercolor paint as the camera moves forward; buildings melt into brush strokes; seamless transition.”

5) Platform deep dives: image generation (and why it’s still the bigger market)

5.1 Adobe Firefly (enterprise-safe positioning + workflow dominance)

Why Firefly matters

Adobe sells not just generation—but where generation lives: Photoshop, Illustrator, After Effects, Express, etc. That is distribution and retention.

Adoption stat (from Adobe)

Adobe said people generated 6.5+ billion images since Firefly’s introduction (as of April 2024).

Practical examples you can use immediately

  1. Product photography cleanup

“Remove background, replace with clean white seamless, keep soft shadow under product, maintain label sharpness.”
  2. Lifestyle ad set

“A bright kitchen, morning sunlight, steaming mug on wooden table, clean minimal look, space for headline on left.”
  3. Brand-safe variations

“Generate 10 colorways of the same composition: teal, coral, monochrome, warm neutral; keep framing identical.”

Public stock snapshot

Adobe (ADBE) last snapshot here: ~$333.95.


5.2 Midjourney (quality king, high IP pressure, massive community)

Why Midjourney still wins mindshare

Midjourney consistently leads on:

  • Aesthetics “out of the box”

  • Style richness

  • Fast iteration for concept art / mood boards

Third-party research profiles have cited enormous community membership via its Discord server. But Midjourney also faces legal and IP pressure: Warner Bros. Discovery, for instance, filed a lawsuit accusing Midjourney of copyright infringement related to training and outputs (per coverage).

Example prompts that show Midjourney-style strengths

  1. Concept art

“A bioluminescent forest with giant mushrooms, mist, cinematic lighting, ultra-detailed, wide shot.”
  2. Fashion editorial

“Studio portrait, high-fashion styling, dramatic shadow, soft film grain, editorial composition.”
  3. Packaging ideation

“Minimalist coffee bag packaging, modern typography, matte finish mockup, product photo realism.”

Business notes

Midjourney is private and historically self-funded (no public ticker). Revenue/user figures often come from estimates/secondary analysis; treat them as approximate unless audited.

5.3 Stable Diffusion ecosystem (open tooling, infinite customization)

Why open ecosystems matter

Stable Diffusion-like ecosystems win on:

  • Fine-tuning / LoRAs

  • ControlNet / pose/depth controls

  • ComfyUI pipelines

  • Private on-device / on-prem workflows (IP concerns)

This “ecosystem moat” is hard for closed platforms to replicate because creators build reusable graphs/pipelines and share components.
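As a concrete taste of that control, here is a minimal sketch using Hugging Face’s diffusers library with a fixed seed and an optional LoRA. The base model ID and LoRA path are placeholders; the fixed-seed pattern is what makes character sheets and turnarounds reproducible:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder base model ID
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights("path/to/character_lora")  # placeholder character LoRA

# A fixed seed means the same prompt reproduces the same image, so
# variations (pose, expression) can be isolated deliberately.
generator = torch.Generator("cuda").manual_seed(1234)
image = pipe(
    "character turnaround, front view, neutral pose, studio lighting",
    num_inference_steps=30,
    generator=generator,
).images[0]
image.save("turnaround_front.png")
```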

Examples: what open pipelines can do exceptionally well

  1. Character sheet consistency: use a consistent LoRA + reference image + fixed-seed strategies to generate turnarounds and expression sheets.

  2. Product placement: inpaint a product into scenes with controlled lighting and perspective.

  3. Storyboards: generate consistent frames for a short sequence, then animate with an image-to-video model.

5.4 Flux (Black Forest Labs): the “new power lab” in images

Black Forest Labs is strongly associated with the Flux family of foundation models. It attracted major funding: TechCrunch reported a $300M Series B at a $3.25B valuation (Dec 2025). That’s a major “signal” that top-tier image generation is becoming infrastructure—not just a novelty.

(Private company: no stock ticker.)

5.5 Ideogram (text + typography: the underrated battleground)

Ideogram drew attention for strong text rendering (logos, poster text, design-like outputs) and raised an $80M Series A (widely reported).

Examples where Ideogram-type models shine

  1. Poster

“A concert poster with readable headline ‘NEON NIGHT’, subtext ‘Friday 10PM’, synthwave design, clean layout.”
  2. Logo exploration

“Minimal geometric logo for ‘Aurora Coffee’, readable wordmark, vector-like simplicity, black and white set.”

6) Company value, stock prices, and “who captures the money”

6.1 The public-market winners are mostly “picks and shovels”

Even if you believe the best generator wins, compute still dominates economics:

  • NVIDIA (NVDA) is a core beneficiary of generative workloads.

  • Microsoft (MSFT) benefits from cloud + OpenAI distribution.

  • Alphabet (GOOGL) benefits from Gemini/Vertex + consumer distribution.

  • Meta (META) benefits from distribution + ad products.

  • Adobe (ADBE) benefits from creative workflow lock-in.

Here are the latest snapshots as of this writing:

  • NVDA ~$184.86

  • MSFT ~$479.28

  • GOOGL ~$328.57

  • META ~$653.06

  • ADBE ~$333.95

And a fuller snapshot for one of the most “AI-exposed” infrastructure beneficiaries:

NVIDIA Corp (NVDA): $184.86 on January 9 (-$0.15, -0.08%); $184.98 after hours (+$0.12, +0.06%). Open 185.02, volume 131.3M, day range 183.69 to 186.97, 52-week range 86.62 to 212.19.

6.2 Private valuations: the new unicorn stack

  • OpenAI ~ $500B valuation (reported) 

  • Runway valued at over $3B in 2025 funding context (reported) 

  • Black Forest Labs $3.25B valuation (reported) 

  • Pika reported around $470M valuation after Series B (varies by source) 

  • Kuaishou (public) benefiting from Kling narrative + consumer AI video with price snapshot around 74.70 HKD (Jan 9, 2026) 

7) Practical “wide examples” of real use cases by industry

7.1 Marketing & ads

What’s changed: A/B creative testing is now limited more by brand policy than by production capacity.

  • Generate 20 variants of a product hero shot (backgrounds, moods); see the sketch after this list

  • Generate 10 micro-video hooks for TikTok/Reels

  • Localize assets with language/region variants (and keep layout consistent)
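A batch of creative variants is often just a Cartesian product over a few creative axes. This sketch (plain Python, no particular vendor) prints a numbered prompt list ready to feed any image API or batch queue:

```python
import itertools

BASE = "Hero shot of the product, centered, space for headline on left"
backgrounds = ["white seamless", "marble counter", "outdoor picnic table"]
moods = ["bright morning light", "moody dusk", "neutral studio"]

# Cartesian product of the creative axes -> one prompt per variant.
for i, (bg, mood) in enumerate(itertools.product(backgrounds, moods), 1):
    print(f"variant {i:02d}: {BASE}, {bg}, {mood}")
```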

Best-fit platforms:

  • Firefly for “workflow + enterprise comfort”

  • Runway/Kling/Pika for short-form performance creative

  • Veo for developer-driven pipelines and scale

7.2 Film previsualization

What’s changed: directors can prototype shots without a full crew.

  • Mood boards (Midjourney/Flux)

  • Storyboards (Stable ecosystem)

  • Animatics (Runway/Veo/Luma)

7.3 Game dev & worldbuilding

  • Environment concepts

  • Item sheets

  • NPC portrait sets

  • “Style bible” exploration

7.4 E-commerce

  • On-model fashion shots from a base mannequin photo

  • Background replacement for marketplaces

  • Seasonal variations without reshoots

8) The biggest remaining problems (and how platforms are tackling them)

8.1 Copyright, licensing, and training-data disputes

This is the #1 business constraint for many enterprises. Litigation pressure (e.g., major studios suing generators) is becoming part of the landscape. Expect more:

  • licensed training datasets,

  • provenance tooling,

  • “style similarity” guardrails,

  • watermarking and detection.

8.2 Truth, fraud, and provenance

As video realism rises, provenance becomes essential. Google’s SynthID watermarking in Veo 2 rollout is one example of mainstream watermarking attempts.

8.3 Consistency and editability

The market is converging on:

  • scene consistency (Runway Gen-4 positioning)

  • prompt adherence

  • reference-based generation

  • layered editing (like Photoshop, but for video)

9) Predictions (2026–2030): where this is headed

9.1 Video becomes “editable media,” not just “generated clips”

The winning tools will behave less like slot machines and more like editors:

  • timeline-aware generation

  • object-level layers (“select the car → change color”)

  • re-lighting and re-camera after generation

9.2 “World models” become the marketing label for long-horizon coherence

You’ll see more claims like:

  • “persistent worlds”

  • “story continuity”

  • “multi-shot narrative memory”

Runway and OpenAI already market toward this idea.

9.3 Audio + lip sync becomes standard

Veo’s model page explicitly pushes “Video, meet audio.” By 2030, silent video generation will feel as dated as still-image-only output did once people came to expect quick animation.

9.4 The money splits three ways

  1. Compute + cloud (NVDA/MSFT/GOOGL)

  2. Workflow incumbents (ADBE)

  3. Consumer distribution giants (META/ByteDance/Kuaishou-type ecosystems)

10) Scorecards (1–10) across major categories

Scoring categories

  • Quality (visual fidelity)

  • Consistency (identity + temporal stability)

  • Control (editability + references + constraints)

  • Speed (time-to-output)

  • Cost efficiency (value per dollar)

  • Workflow integration (real creator pipelines)

  • Enterprise readiness (governance, licensing posture)

  • Ecosystem (community, plugins, interoperability)

  • Innovation velocity (how fast it improves)

Note: Scores are based on publicly described capabilities, product direction, and adoption signals—not internal benchmarks. The goal is comparative usefulness, not a “universal truth.”
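If you want to collapse the nine categories into one comparable number, a weighted average is the simplest approach. The weights below are arbitrary examples (tune them to your own priorities); the Runway scores are the ones given later in this section:

```python
# Example weights (sum to 1.0); adjust to match what you care about.
WEIGHTS = {
    "quality": 0.20, "consistency": 0.15, "control": 0.15,
    "speed": 0.10, "cost": 0.10, "workflow": 0.10,
    "enterprise": 0.05, "ecosystem": 0.05, "velocity": 0.10,
}

def weighted_score(scores: dict) -> float:
    """Weighted average of a platform's category scores."""
    return sum(scores[k] * w for k, w in WEIGHTS.items())

runway = {"quality": 9, "consistency": 9, "control": 9, "speed": 8,
          "cost": 7, "workflow": 9, "enterprise": 7, "ecosystem": 8,
          "velocity": 9}
print(round(weighted_score(runway), 2))  # 8.55 with these example weights
```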

10.1 Video platforms

OpenAI Sora

  • Quality: 10

  • Consistency: 9

  • Control: 7

  • Speed: 6

  • Cost efficiency: 6

  • Workflow integration: 6

  • Enterprise readiness: 7

  • Ecosystem: 7

  • Innovation velocity: 10

Why: frontier output + world-sim framing; distribution and hands-on controls depend on product packaging.

Google Veo (Veo 2/3 line)

  • Quality: 9

  • Consistency: 8

  • Control: 8

  • Speed: 7

  • Cost efficiency: 7 (pricing and formats evolving)

  • Workflow integration: 8 (Gemini + Vertex)

  • Enterprise readiness: 9

  • Ecosystem: 8

  • Innovation velocity: 9

Runway (Gen-3/Gen-4)

  • Quality: 9

  • Consistency: 9 (explicit multi-scene positioning)

  • Control: 9

  • Speed: 8

  • Cost efficiency: 7

  • Workflow integration: 9

  • Enterprise readiness: 7

  • Ecosystem: 8

  • Innovation velocity: 9

Kling AI (Kuaishou)

  • Quality: 8

  • Consistency: 8

  • Control: 7

  • Speed: 9

  • Cost efficiency: 8

  • Workflow integration: 8 (consumer creation loop)

  • Enterprise readiness: 6

  • Ecosystem: 7

  • Innovation velocity: 8 

Pika

  • Quality: 7

  • Consistency: 7

  • Control: 7

  • Speed: 9

  • Cost efficiency: 8

  • Workflow integration: 7

  • Enterprise readiness: 6

  • Ecosystem: 7

  • Innovation velocity: 8 

Luma Dream Machine

  • Quality: 8

  • Consistency: 8

  • Control: 8

  • Speed: 7

  • Cost efficiency: 7

  • Workflow integration: 7

  • Enterprise readiness: 6

  • Ecosystem: 7

  • Innovation velocity: 8 

10.2 Image platforms

Adobe Firefly

  • Quality: 8

  • Consistency: 8

  • Control: 9

  • Speed: 9

  • Cost efficiency: 8

  • Workflow integration: 10

  • Enterprise readiness: 10

  • Ecosystem: 9

  • Innovation velocity: 8 

Midjourney

  • Quality: 10

  • Consistency: 8

  • Control: 7

  • Speed: 8

  • Cost efficiency: 8

  • Workflow integration: 6

  • Enterprise readiness: 5 (IP pressure is a factor in enterprise decisions)

  • Ecosystem: 9

  • Innovation velocity: 7

Stable Diffusion ecosystem

  • Quality: 8

  • Consistency: 8

  • Control: 10

  • Speed: 7

  • Cost efficiency: 9 (depending on hardware/self-hosting)

  • Workflow integration: 8

  • Enterprise readiness: 7 (varies by vendor/hosting/licensing)

  • Ecosystem: 10

  • Innovation velocity: 9

Flux (Black Forest Labs)

  • Quality: 9

  • Consistency: 8

  • Control: 8

  • Speed: 8

  • Cost efficiency: 7

  • Workflow integration: 7

  • Enterprise readiness: 7

  • Ecosystem: 7

  • Innovation velocity: 9 

Ideogram

  • Quality: 8

  • Consistency: 7

  • Control: 8

  • Speed: 8

  • Cost efficiency: 8

  • Workflow integration: 7

  • Enterprise readiness: 6

  • Ecosystem: 7

  • Innovation velocity: 8 

11) Quick “who should use what” cheat sheet

  • If you want the best “creative pipeline” video tool today: Runway

  • If you want frontier “world-sim” vibes / long coherent clips: Sora-class systems

  • If you want enterprise-scale + API integration: Veo via Vertex/Gemini ecosystem

  • If you want ad/marketing workflows: Firefly + Adobe suite

  • If you want “instant beautiful images” without tweaking: Midjourney

  • If you want maximum control and customization: Stable Diffusion ecosystem

  • If you want typography/posters/logos with readable text: Ideogram
