THE BIG BANG - The rapid progress of AI image + video generation (2020 → early 2026): what changed, who’s winning, and what’s next
- Lord Of The Wix
- Jan 12
- 12 min read
AI image and video generation has gone from “cool but obviously fake” to “useful in real creative pipelines”—and in some cases, genuinely hard to distinguish from traditional production at a glance. The biggest shifts since ~2020 aren’t just “higher resolution.” They’re about consistency, controllability, speed, and workflow integration—plus the business reality: compute costs, licensing/IP risk, and distribution.
Below is a deep (and intentionally wide) tour of the space—major platforms, examples you can try, notable stats, company valuations / stock prices, and practical predictions for where this goes next.

1) The core breakthroughs that made modern generative media possible (image + video generation)
1.1 Diffusion → DiT → “world-model-ish” video
Early diffusion models made images explode in quality because they could learn a strong “denoise-to-image” prior. Video then lagged because time adds brutal complexity:
You need temporal coherence (objects don’t “melt” between frames).
You need identity persistence (the same character remains the same).
You need camera + physics consistency (lighting, shadows, motion rules).
You need editing/iteration controls (so creators can direct outcomes).
Many top video systems now lean on architectures like diffusion transformers (DiT) and increasingly talk about “world simulators / world models”—Runway explicitly frames Gen-3/Gen-4 as steps toward “General World Models.” OpenAI similarly frames Sora as learning to “understand and simulate the physical world in motion,” generating videos up to a minute long.
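To make the “denoise-to-image prior” idea concrete, here is a schematic reverse-diffusion (DDPM-style) sampling loop. It is a minimal sketch under simplifying assumptions, not any vendor’s implementation: real systems replace the stand-in model with a large text-conditioned U-Net or DiT, use far more sophisticated samplers, and (for video) denoise spatiotemporal latents rather than single images.

```python
import torch

def sample(model, steps=50, shape=(1, 3, 64, 64), device="cpu"):
    """Schematic DDPM-style ancestral sampling: start from noise, denoise step by step."""
    betas = torch.linspace(1e-4, 0.02, steps, device=device)   # simple linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)                      # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps = model(x, t)                                       # network predicts the noise at step t
        a, ab = alphas[t], alpha_bars[t]
        x = (x - (1 - a) / torch.sqrt(1 - ab) * eps) / torch.sqrt(a)   # posterior mean (remove predicted noise)
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)  # re-inject scheduled noise for the next step
    return x                                                    # approximate sample from the learned prior
```

Video models extend this denoising prior across time, which is exactly where the coherence problems listed above (melting objects, drifting identities, broken physics) come from.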
1.2 The “control revolution”: from prompt-only to multi-handle direction
Modern tools add “handles” beyond text:
Image-to-video (animate a keyframe / storyboard)
Video-to-video (stylize or transform footage)
Masks / inpainting (edit only parts)
Reference inputs (character/object references)
Camera + shot instructions (lens, dolly, pan, rack focus)
Multi-shot / scene consistency (the big unlock for narrative work)
Runway Gen-4 emphasizes consistent characters/locations/objects “across scenes.” Google’s Veo line emphasizes cinematic realism and is being productized into Gemini/Vertex, including tooling like VideoFX/Whisk/Whisk Animate.
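To make those “handles” concrete, here is a hypothetical generation request written as a plain Python dict. None of the field names below come from any specific vendor; each product (Runway, Veo, Kling, etc.) exposes its own schema, so treat this purely as an illustration of how text, reference imagery, masks, and camera direction combine into one directed request.

```python
# Hypothetical multi-handle generation request (field names are illustrative only).
request = {
    "prompt": "A courier cycles through rain-soaked streets at night",
    "init_image": "storyboard_frame_01.png",        # image-to-video: animate a keyframe
    "reference_images": ["courier_character.png"],  # identity/object reference
    "mask": "jacket_mask.png",                      # restrict edits to one region (inpainting)
    "camera": {"move": "dolly_in", "lens_mm": 35},  # shot/camera instruction
    "duration_seconds": 8,
    "seed": 1234,                                   # repeatability across shots in a sequence
}
```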
1.3 Distribution is half the battle
A model’s raw quality matters—but who can ship it to millions matters just as much:
Adobe embeds generation inside Photoshop/After Effects workflows.
Google embeds Veo inside Gemini and the developer ecosystem (Vertex/Gemini API).
Runway and Pika go direct-to-creators with “social-first” creation UX.
Open-source models spread via ComfyUI, Automatic1111, and API hosts.
2) Market map: the “big buckets” of platforms (video and image generation)
Bucket A — Creator-first “AI video studios”
Runway (Gen-2 → Gen-3 Alpha → Gen-4)
Pika
Luma Dream Machine (video + 3D-ish roots)
Kling AI (Kuaishou; massive consumer video DNA)
Others (regional/vertical tools, mobile-first editors)
Bucket B — Frontier labs shipping flagship generators
OpenAI (Sora)
Google DeepMind (Veo family)
Meta (research + product features across apps; not a single “Veo-like” standalone that dominates, but huge distribution)
Stability AI (open-ish ecosystem; historically a leader in image generation)
Bucket C — Image-first leaders + “the image economy”
Midjourney
Adobe Firefly
Stable Diffusion ecosystem
Flux (Black Forest Labs)
Ideogram (especially strong at text/logos/typography-style generations)
Many “wrapper” products (Canva, design suites, marketing tools)

3) The stats that matter (adoption + content volume)
3.1 AI images have already reached “internet-scale”
A widely cited estimate (Everypixel) suggests 15+ billion AI-generated images across major platforms (Stable Diffusion, Firefly, Midjourney, DALL·E 2) by mid-2023. That number is almost certainly much higher by 2026, but even the earlier figure shows the inflection: generative media is not niche anymore—it’s a content substrate.
3.2 Adobe Firefly: billions of generations as a product KPI
Adobe reported that users had generated more than 6.5 billion images with Firefly since launch, and by its MAX London event (April 2024) the company was citing “over 7 billion” images created with Firefly.
3.3 Video is behind images in volume—but catching up fast
Video generation is much more compute-expensive, so it tends to monetize sooner (credits/subscriptions), and volumes are constrained by cost and clip-length limits (roughly 8-second clips in many consumer products versus Sora’s minute-long ceiling). Google’s Veo 2 rollout to Gemini Advanced centered on short clips (e.g., 8 seconds) and distribution via Gemini.
4) Platform deep dives: video generation (with lots of examples)
4.1 OpenAI Sora (frontier “text-to-video”, world-simulation framing)
What it is
Sora is OpenAI’s text-to-video model, introduced publicly in Feb 2024 and positioned as generating videos up to a minute long while maintaining prompt adherence and visual quality.
What Sora represents (even if you never use it)
Sora’s impact was partly “product” and partly “signal”:
Signal to the market that minute-long coherence is feasible.
Signal that “world model” language is moving from research to product narrative.
Example prompts (practical + cinematic)
Use prompts like these to test a model’s physics, continuity, and cinematography:
Character continuity + reflections
“A woman on a train at night; her reflection overlays neon signage outside. Slow push-in. Realistic skin texture, subtle motion blur, consistent facial features.”
Complex motion + environment
“A golden retriever running across wet sand at sunrise; water droplets fly; camera tracks low and smooth; realistic paw prints; consistent lighting.”
Multi-object interaction
“A chef flips vegetables in a wok; steam rises; flames flicker; the chef’s hands remain anatomically stable; close-up macro lens look.”
Temporal logic
“A paper airplane folds itself from a blank sheet on a desk and launches out a window; continuous uncut shot; accurate shadows.”
Where Sora-level systems still struggle (industry-wide)
Hands, small text, brand logos (improving, still brittle)
Long-horizon story logic (characters remember what happened)
Precise edits (“change only the jacket color, keep everything else identical”)
Audio sync (some systems now add audio; see Veo 3 line notes)
Business notes: OpenAI valuation context
OpenAI reportedly hit ~$500B valuation after a major secondary share sale (Oct 2025). Reuters also reported OpenAI earmarked an employee stock grant pool based on that $500B valuation, alongside talk of potential higher valuations in preliminary discussions. (There is no public stock ticker for OpenAI as of early 2026.)

4.2 Google Veo (DeepMind): productizing video generation into Gemini + APIs
What it is
Google’s Veo is DeepMind’s flagship video model line, continuously updated and integrated into:
Gemini consumer subscriptions (Veo 2 rollout)
Vertex AI / Google Cloud for enterprise and developers
Google Labs tools (VideoFX / ImageFX / Whisk / Whisk Animate)
DeepMind’s public model page references Veo 3.1, including “Video, meet audio” messaging.
The “production detail” that matters: cost + formats + watermarking
Google’s Veo ecosystem has pushed on:
SynthID watermarking (to mark AI-generated video), noted in Veo 2 rollout coverage
Developer-friendly options such as aspect-ratio control; Google added vertical 9:16 support and discussed pricing changes in the Veo 3 API context.
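For developer-driven pipelines, a Gemini/Vertex-style call looks roughly like the sketch below. This assumes the google-genai Python SDK’s long-running video-generation interface; the model ID, config fields, and polling details shown here are assumptions that shift between releases, so check the current docs before relying on any of them.

```python
import time
from google import genai
from google.genai import types

client = genai.Client()  # assumes an API key is set in the environment (e.g. GOOGLE_API_KEY)

# Kick off a video generation job; the model ID and config fields are illustrative.
operation = client.models.generate_videos(
    model="veo-2.0-generate-001",
    prompt="A cyclist rides through fog at dawn, 35mm film look, shallow depth of field",
    config=types.GenerateVideosConfig(
        aspect_ratio="9:16",    # vertical output, per the option discussed above
        number_of_videos=1,
    ),
)

# Video generation is asynchronous: poll the long-running operation until it completes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

print(operation.response)  # contains references to the generated video file(s)
```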
Example prompts tailored for Veo-style systems
Ad creative (short, punchy, product-friendly)
“A 7-second vertical video: a runner ties neon shoes; close-up; quick cuts; high-contrast lighting; end frame: empty space on right for text overlay.”
Cinematic test (lens + motion)
“35mm film look, handheld camera, shallow depth of field. A cyclist rides through fog, headlights bloom, gentle rack focus from handlebars to face.”
Physics stress test
“A glass of ice water on a wooden table; condensation forms; a hand slides the glass; ice shifts realistically; sunlight refracts.”
Business notes (public stock)
Alphabet is public. As of the most recent market snapshot at the time of writing: GOOGL ~$328.57, with a market cap of ~$2.94T.
4.3 Runway (the “creator studio” that matured fastest)
Why Runway matters
Runway isn’t just a model; it’s a workflow product. It’s one of the clearest examples of “AI video generation” becoming a real tool for creators.
Runway’s research posts:
Gen-3 Alpha (June 2024) positioned as next-gen multimodal foundation model, major improvements in fidelity/consistency/motion.
Gen-4 positioned around consistent characters/locations/objects across scenes (a major unlock for narrative + campaigns).
Example prompts for Runway-style workflows
Runway users often get the best results when they specify:
Shot type + camera
Subject + action
Environment + lighting
Style references (without naming living artists if the ToS prohibits it)
Duration intent (“8 seconds”, “loopable”, “single shot”)
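One lightweight way to keep those five fields explicit shot after shot is a small prompt template, sketched below. This is our own illustrative helper, not a Runway feature, and the field names are arbitrary.

```python
def build_shot_prompt(shot, subject, environment, style, duration="8 seconds"):
    # Assemble the five fields listed above into one consistent prompt string.
    return (
        f"{shot}. {subject}. {environment}. "
        f"Style: {style}. Duration intent: {duration}, single shot."
    )

prompt = build_shot_prompt(
    shot="Slow push-in, 50mm lens look, gentle handheld sway",
    subject="a barista pouring latte art at the counter",
    environment="sunlit cafe, warm morning light, soft haze",
    style="cinematic, shallow depth of field",
)
```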
Examples
Music video shot
“Slow-motion close-up of glitter floating in air around a singer’s face, stage lighting, bokeh, 85mm lens look, gentle camera sway.”
Brand spot
“Minimal studio background, a soda can rotates on a turntable, softbox reflections, product highlights clean and consistent, premium commercial look.”
World continuity (Gen-4 style test)
“Same character in three shots: (1) enters a diner, (2) sits by window, (3) walks out into rain; maintain same outfit and face; cinematic lighting.”
Valuation / funding
Reuters reported Runway raised $308M in a round that valued it at over $3B (Runway declined to comment on valuation). (Private company: no stock ticker.)
4.4 Pika (fast iteration, social-first creation)
What Pika is known for
Pika positioned itself as “social-first” creation—quickly generating stylized clips, meme formats, transformations, and short-form content.
Funding/valuation signals
Depending on the source and round framing, Pika has been reported at around a $470M valuation after its Series B, with higher figures discussed elsewhere (reports vary by outlet).
Examples: the kinds of prompts Pika-style tools excel at
Meme transformation
“Turn my photo into a claymation character, blinking and smiling, soft studio lighting, 6 seconds.”
Stylized mini-scene
“A tiny robot watering a houseplant in a cozy apartment, pastel anime style, gentle camera pan.”
Hook-first vertical
“A close-up of a face reacting in surprise; quick zoom; sparkles burst; text-safe area at top.”

4.5 Luma Dream Machine (and the “3D-aware” creative lineage)
Why Luma is different
Luma’s roots in 3D capture / neural rendering culture shaped expectations: users want models that “feel spatial,” with more believable camera motion and structure.
Big funding headline
Luma announced a $900M Series C (Nov 2025) to accelerate its roadmap, tied to massive compute infrastructure ambitions. (Valuation in public reporting varies and isn’t consistently disclosed in primary sources; treat third-party valuation chatter as directional unless confirmed.)
Example prompts for Luma-style “space and camera” tests
Interior camera move
“A smooth dolly shot through a sunlit kitchen into a living room, realistic shadows, wide-angle lens, subtle dust in the air.”
Outdoor parallax
“Walking through a market at dusk, lanterns glowing, shallow depth of field foreground, strong parallax as camera passes stalls.”
4.6 Kling AI (Kuaishou): “video-native company ships video-native AI”
Why Kling matters
Kuaishou is fundamentally a short-video ecosystem company. That shapes:
Data intuition (what people watch/share)
Product UX (mobile creation loops)
Distribution channels
Kuaishou’s own investor relations materials highlight Kling AI model iteration (e.g., Kling 2.0 rollout, “global users,” and performance framing).
Stock + market effect
Kuaishou is public (Hong Kong: 1024). For a concrete snapshot: Investing.com showed ~74.70 HKD on Jan 9, 2026. (Always re-check live quotes if you’re trading—HK prices move daily.)
Example prompts for Kling-style “short-video engine” tests
Creator template
“Vertical 9:16, 6 seconds: a streetwear outfit reveal, quick cuts, cinematic lighting, smooth stabilization.”
Stylized effect
“A city street turns into watercolor paint as the camera moves forward; buildings melt into brush strokes; seamless transition.”
5) Platform deep dives: image generation (and why it’s still the bigger market)
5.1 Adobe Firefly (enterprise-safe positioning + workflow dominance)
Why Firefly matters
Adobe sells not just generation—but where generation lives: Photoshop, Illustrator, After Effects, Express, etc. That is distribution and retention.
Adoption stat (from Adobe)
Adobe said people generated 6.5+ billion images since Firefly’s introduction (as of April 2024).
Practical examples you can use immediately
Product photography cleanup
“Remove background, replace with clean white seamless, keep soft shadow under product, maintain label sharpness.”
Lifestyle ad set
“A bright kitchen, morning sunlight, steaming mug on wooden table, clean minimal look, space for headline on left.”
Brand-safe variations
“Generate 10 colorways of the same composition: teal, coral, monochrome, warm neutral; keep framing identical.”
Public stock snapshot
Adobe (ADBE) latest snapshot at the time of writing: ~$333.95.

5.2 Midjourney (quality king, high IP pressure, massive community)
Why Midjourney still wins mindshare
Midjourney consistently leads on:
Aesthetics “out of the box”
Style richness
Fast iteration for concept art / mood boards
Third-party research profiles cite a Discord community numbering in the millions of members. But Midjourney also faces legal and IP pressure: for instance, Warner Bros. Discovery filed a lawsuit accusing Midjourney of copyright infringement related to training and outputs (per coverage).
Example prompts that show Midjourney-style strengths
Concept art
“A bioluminescent forest with giant mushrooms, mist, cinematic lighting, ultra-detailed, wide shot.”
Fashion editorial
“Studio portrait, high-fashion styling, dramatic shadow, soft film grain, editorial composition.”
Packaging ideation
“Minimalist coffee bag packaging, modern typography, matte finish mockup, product photo realism.”
Business notes
Midjourney is private and historically self-funded (no public ticker). Revenue/user figures often come from estimates/secondary analysis; treat them as approximate unless audited.
5.3 Stable Diffusion ecosystem (open tooling, infinite customization)
Why open ecosystems matter
Stable Diffusion-like ecosystems win on:
Fine-tuning / LoRAs
ControlNet / pose/depth controls
ComfyUI pipelines
Private on-device / on-prem workflows (IP concerns)
This “ecosystem moat” is hard for closed platforms to replicate because creators build reusable graphs/pipelines and share components.
Examples: what open pipelines can do exceptionally well
Character sheet consistency
Use a consistent LoRA + reference image + fixed-seed strategy to generate turnarounds and expression sheets (see the sketch after this list).
Product placement
Inpaint a product into scenes with controlled lighting and perspective.
Storyboards
Generate consistent frames for a short sequence, then animate with an image-to-video model.
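As a concrete example of the fixed-seed + LoRA approach, here is a minimal sketch using the Hugging Face diffusers library. The base checkpoint, LoRA path, and prompt wording are placeholders (assumptions), and production pipelines typically layer ControlNet/pose conditioning on top.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load a base checkpoint plus a character LoRA (path is a placeholder).
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/character_lora")  # hypothetical local LoRA

views = ["front view", "side profile", "three-quarter back view"]
for i, view in enumerate(views):
    # Re-using the same seed keeps composition and identity cues stable
    # while only the viewpoint phrase changes between frames.
    generator = torch.Generator("cuda").manual_seed(42)
    image = pipe(
        prompt=f"character turnaround sheet, {view}, neutral lighting",
        generator=generator,
        num_inference_steps=30,
        guidance_scale=7.0,
    ).images[0]
    image.save(f"turnaround_{i}.png")
```

The key point is that the seed, not just the prompt, is part of the recipe: change only the viewpoint phrase and the character’s look tends to stay far more stable across frames.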
5.4 Flux (Black Forest Labs): the “new power lab” in images
Black Forest Labs is strongly associated with the Flux family of foundation models. It attracted major funding: TechCrunch reported a $300M Series B at a $3.25B valuation (Dec 2025). That’s a major “signal” that top-tier image generation is becoming infrastructure—not just a novelty.
(Private company: no stock ticker.)
5.5 Ideogram (text + typography: the underrated battleground)
Ideogram drew attention for strong text rendering (logos, poster text, design-like outputs) and raised $80M Series A (reported widely).
Examples where Ideogram-type models shine
Poster
“A concert poster with readable headline ‘NEON NIGHT’, subtext ‘Friday 10PM’, synthwave design, clean layout.”
Logo exploration
“Minimal geometric logo for ‘Aurora Coffee’, readable wordmark, vector-like simplicity, black and white set.”

6) Company value, stock prices, and “who captures the money”
6.1 The public-market winners are mostly “picks and shovels”
Even if you believe the best generator wins, compute still dominates economics:
NVIDIA (NVDA) is a core beneficiary of generative workloads.
Microsoft (MSFT) benefits from cloud + OpenAI distribution.
Alphabet (GOOGL) benefits from Gemini/Vertex + consumer distribution.
Meta (META) benefits from distribution + ad products.
Adobe (ADBE) benefits from creative workflow lock-in.
Here are the latest snapshots at the time of writing:
NVDA ~$184.86
MSFT ~$479.28
GOOGL ~$328.57
META ~$653.06
ADBE ~$333.95
And a snapshot for one of the most “AI-exposed” infrastructure beneficiaries, NVIDIA Corp (NVDA): ~$184.86 at the January 9 close (day range $183.69–$186.97; 52-week range $86.62–$212.19; volume 131.3M).
6.2 Private valuations: the new unicorn stack
OpenAI ~ $500B valuation (reported)
Runway valued at over $3B in 2025 funding context (reported)
Black Forest Labs $3.25B valuation (reported)
Pika reported around $470M valuation after Series B (varies by source)
Kuaishou (public) benefiting from the Kling narrative + consumer AI video, with a price snapshot around 74.70 HKD (Jan 9, 2026)
7) Practical “wide examples” of real use cases by industry
7.1 Marketing & ads
What’s changed: A/B creative testing is now limited more by brand policy than by production capacity.
Generate 20 variants of a product hero shot (backgrounds, moods)
Generate 10 micro-video hooks for TikTok/Reels
Localize assets with language/region variants (and keep layout consistent)
Best-fit platforms:
Firefly for “workflow + enterprise comfort”
Runway/Kling/Pika for short-form performance creative
Veo for developer-driven pipelines and scale
7.2 Film previsualization
What’s changed: directors can prototype shots without a full crew.
Mood boards (Midjourney/Flux)
Storyboards (Stable ecosystem)
Animatics (Runway/Veo/Luma)
7.3 Game dev & worldbuilding
Environment concepts
Item sheets
NPC portrait sets
“Style bible” exploration
7.4 E-commerce
On-model fashion shots from a base mannequin photo
Background replacement for marketplaces
Seasonal variations without reshoots
8) The biggest remaining problems (and how platforms are tackling them)
8.1 Copyright, licensing, and training-data disputes
This is the #1 business constraint for many enterprises. Litigation pressure (e.g., major studios suing generators) is becoming part of the landscape. Expect more:
licensed training datasets,
provenance tooling,
“style similarity” guardrails,
watermarking and detection.
8.2 Truth, fraud, and provenance
As video realism rises, provenance becomes essential. Google’s SynthID watermarking in Veo 2 rollout is one example of mainstream watermarking attempts.
8.3 Consistency and editability
The market is converging on:
scene consistency (Runway Gen-4 positioning)
prompt adherence
reference-based generation
layered editing (like Photoshop, but for video)
9) Predictions (2026–2030): where this is headed
9.1 Video becomes “editable media,” not just “generated clips”
The winning tools will behave less like slot machines and more like editors:
timeline-aware generation
object-level layers (“select the car → change color”)
re-lighting and re-camera after generation
9.2 “World models” become the marketing label for long-horizon coherence
You’ll see more claims like:
“persistent worlds”
“story continuity”
“multi-shot narrative memory”
Runway and OpenAI already market toward this idea.
9.3 Audio + lip sync becomes standard
Veo’s model page explicitly pushes “Video, meet audio.” By 2030, silent video generation will feel as dated as static image-only output did once people came to expect quick animation.
9.4 The money splits three ways
Compute + cloud (NVDA/MSFT/GOOGL)
Workflow incumbents (ADBE)
Consumer distribution giants (META/ByteDance/Kuaishou-type ecosystems)
10) Scorecards (1–10) across major categories
Scoring categories
Quality (visual fidelity)
Consistency (identity + temporal stability)
Control (editability + references + constraints)
Speed (time-to-output)
Cost efficiency (value per dollar)
Workflow integration (real creator pipelines)
Enterprise readiness (governance, licensing posture)
Ecosystem (community, plugins, interoperability)
Innovation velocity (how fast it improves)
Note: Scores are based on publicly described capabilities, product direction, and adoption signals—not internal benchmarks. The goal is comparative usefulness, not a “universal truth.”
10.1 Video platforms
OpenAI Sora
Quality: 10
Consistency: 9
Control: 7
Speed: 6
Cost efficiency: 6
Workflow integration: 6
Enterprise readiness: 7
Ecosystem: 7
Innovation velocity: 10
Why: frontier output + world-sim framing; distribution and hands-on controls depend on product packaging.
Google Veo (Veo 2/3 line)
Quality: 9
Consistency: 8
Control: 8
Speed: 7
Cost efficiency: 7 (pricing and formats evolving)
Workflow integration: 8 (Gemini + Vertex)
Enterprise readiness: 9
Ecosystem: 8
Innovation velocity: 9
Runway (Gen-3/Gen-4)
Quality: 9
Consistency: 9 (explicit multi-scene positioning)
Control: 9
Speed: 8
Cost efficiency: 7
Workflow integration: 9
Enterprise readiness: 7
Ecosystem: 8
Innovation velocity: 9
Kling AI (Kuaishou)
Quality: 8
Consistency: 8
Control: 7
Speed: 9
Cost efficiency: 8
Workflow integration: 8 (consumer creation loop)
Enterprise readiness: 6
Ecosystem: 7
Innovation velocity: 8
Pika
Quality: 7
Consistency: 7
Control: 7
Speed: 9
Cost efficiency: 8
Workflow integration: 7
Enterprise readiness: 6
Ecosystem: 7
Innovation velocity: 8
Luma Dream Machine
Quality: 8
Consistency: 8
Control: 8
Speed: 7
Cost efficiency: 7
Workflow integration: 7
Enterprise readiness: 6
Ecosystem: 7
Innovation velocity: 8
10.2 Image platforms
Adobe Firefly
Quality: 8
Consistency: 8
Control: 9
Speed: 9
Cost efficiency: 8
Workflow integration: 10
Enterprise readiness: 10
Ecosystem: 9
Innovation velocity: 8
Midjourney
Quality: 10
Consistency: 8
Control: 7
Speed: 8
Cost efficiency: 8
Workflow integration: 6
Enterprise readiness: 5 (IP pressure is a factor in enterprise decisions)
Ecosystem: 9
Innovation velocity: 7
Stable Diffusion ecosystem
Quality: 8
Consistency: 8
Control: 10
Speed: 7
Cost efficiency: 9 (depending on hardware/self-hosting)
Workflow integration: 8
Enterprise readiness: 7 (varies by vendor/hosting/licensing)
Ecosystem: 10
Innovation velocity: 9
Flux (Black Forest Labs)
Quality: 9
Consistency: 8
Control: 8
Speed: 8
Cost efficiency: 7
Workflow integration: 7
Enterprise readiness: 7
Ecosystem: 7
Innovation velocity: 9
Ideogram
Quality: 8
Consistency: 7
Control: 8
Speed: 8
Cost efficiency: 8
Workflow integration: 7
Enterprise readiness: 6
Ecosystem: 7
Innovation velocity: 8
11) Quick “who should use what” cheat sheet
If you want the best “creative pipeline” video tool today: Runway
If you want frontier “world-sim” vibes / long coherent clips: Sora-class systems
If you want enterprise-scale + API integration: Veo via Vertex/Gemini ecosystem
If you want ad/marketing workflows: Firefly + Adobe suite
If you want “instant beautiful images” without tweaking: Midjourney
If you want maximum control and customization: Stable Diffusion ecosystem
If you want typography/posters/logos with readable text: Ideogram