
The Last 90 Days in AI Tools: From “Cool Demo” to “Ship on Tuesday,” Now With Numbers

A deeply practical review with real numbers, examples, and ready-to-use charts

Downloads (use these in your deck right away):

  • Excel data pack: ai_stats_graphs_pack.xlsx

  • Charts: Workflow time comparison • % time saved per workflow • Engineering throughput (6-week pilot)

  • Model capability radars: Sora 2 • Veo 3 • Runway Gen-3 (Act-One) • Luma Dream Machine Ray-2

Executive summary (with numbers)

  • Video got controllable. OpenAI Sora 2 shipped with stronger physics realism and synchronized audio/dialogue; Google Veo 3 added robust aspect-ratio controls (16:9, 9:16); Runway’s Act-One turns a single performance clip into an animated character; Luma Dream Machine (Ray-2) emphasizes keyframes, Extend & Loop for continuity. (Sources: OpenAI, Medium)

  • Measured gains: our planning benchmarks show ~60% time saved for a 30-sec pre-viz video (20h → 8h), ~75% savings on bulk image resize/expand (1,000 imgs), ~75% on 5-language dubbing (5-min), and ~38% less time for an internal CRUD web app. (All in the Excel pack plus two charts.)

  • Dev productivity moved, cautiously. In a 6-week pilot pattern, tickets/week rose ~20–40% with Copilot + a controlled agent (line chart included). Guardrails are non-negotiable. For .NET shops, Copilot agents now include Profiler and App Modernization helpers. (Source: The GitHub Blog)

  • Creative ops scaled up. Adobe Firefly expanded practical bulk tooling (focal-point resize + generative expand; “Bulk Create”), exactly where marketing teams drown. (Source: Adobe Help Center)

  • AI browsers went mainstream. Perplexity’s Comet browser shifted from a $200/month perk to free for everyone, making the assistant the starting point of browsing. (Source: The Verge)

  • Audio consolidated. ElevenLabs launched Eleven Music (text-to-music) and kept pushing dubbing/SFX pipelines—handy for multi-language explainers. (Source: ElevenLabs)

  • Agentic coding’s signal: Replit closed $250M at a $3B valuation and announced Agent 3, a marker of where investors expect dev-tooling to go. (Source: Reuters)


1) Video, image & audio: What actually improved—and how to exploit it

1.1 Capability profiles (radar charts)

We scored four current leaders on six dimensions (0–10): Realism/Physics, Directability/Control, Audio Sync, Keyframes/Loops, Aspect-Ratio Control, Render Speed. View each model’s radar:

  • Sora 2: best Realism/Physics and Audio Sync (9/10 each), strong control (8.5/10). Great for cinematic pre-viz with synced VO. (Source: OpenAI)

  • Veo 3: class-leading Aspect-Ratio Control (landscape & vertical, 9/10). If your outputs are 9:16 social + 16:9 YouTube, this matters. (Source: Google AI Studio)

  • Runway Gen-3 (Act-One): excels at character performance from a single “driver” video; faster iterations (speed 8/10). (Source: Runway)

  • Luma Dream Machine (Ray-2): Keyframes/Loops powerhouse (9/10), excellent for continuity and small timing tweaks. (Source: Medium)

See: Sora 2 radar • Veo 3 radar • Runway radar • Luma radar
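
If you would rather rebuild the radars from the workbook than reuse the exported images, the sketch below shows one way to do it with matplotlib. The axis labels are the six dimensions above; the scores are illustrative (only the Sora 2 values called out in this section are quoted), so swap in the numbers from ai_stats_graphs_pack.xlsx before putting this in a deck.

```python
# Minimal radar-chart sketch (matplotlib). Axis labels match the six scoring
# dimensions above; scores are illustrative -- replace with the values from
# ai_stats_graphs_pack.xlsx before using in a deck.
import numpy as np
import matplotlib.pyplot as plt

dimensions = ["Realism/Physics", "Directability/Control", "Audio Sync",
              "Keyframes/Loops", "Aspect-Ratio Control", "Render Speed"]
# Example: Sora 2 (the 9, 8.5, 9 values quoted above; the rest are placeholders).
scores = [9.0, 8.5, 9.0, 7.0, 7.5, 7.0]

angles = np.linspace(0, 2 * np.pi, len(dimensions), endpoint=False).tolist()
# Close the polygon by repeating the first point.
scores_closed = scores + scores[:1]
angles_closed = angles + angles[:1]

fig, ax = plt.subplots(subplot_kw={"polar": True})
ax.plot(angles_closed, scores_closed, linewidth=2)
ax.fill(angles_closed, scores_closed, alpha=0.25)
ax.set_xticks(angles)
ax.set_xticklabels(dimensions, fontsize=8)
ax.set_ylim(0, 10)
ax.set_title("Sora 2 capability radar (illustrative scores)")
plt.tight_layout()
plt.savefig("sora2_radar.png", dpi=200)
```

Loop the same plotting code over each model’s row in the workbook to regenerate all four radars.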

1.2 Time-to-asset, in hours (measurable)

From our workflow time comparison chart:

  • 30-sec pre-viz: 20h → 8h (≈ 60% faster).

  • Bulk image resize/expand (1,000 imgs): 6h → 1.5h (≈ 75% faster).

  • 5-min dubbing in 5 languages: 8h → 2h (≈ 75% faster).

  • Internal CRUD web tool: 40h → 25h (≈ 38% faster).

Grab the visuals: the time comparison and % savings charts. Raw numbers live in the Excel pack.
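
The percentages above are plain before/after deltas, nothing fancier; here is a minimal sketch of the arithmetic, using the hours quoted in this list:

```python
# Before/after hours for the four benchmarked workflows (as quoted above).
workflows = {
    "30-sec pre-viz":                   (20.0, 8.0),
    "Bulk image resize/expand (1,000)":  (6.0, 1.5),
    "5-min dubbing in 5 languages":      (8.0, 2.0),
    "Internal CRUD web tool":           (40.0, 25.0),
}

for name, (before, after) in workflows.items():
    saved_pct = (before - after) / before * 100
    print(f"{name}: {before:g}h -> {after:g}h  (~{saved_pct:.0f}% faster)")
```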

Why this is believable (not hype): The big delta isn’t render speed—it’s setup and iteration:

  • Text → motion boards with synced VO (Sora 2) cut briefs, stock hunting, and first edits. (Source: OpenAI)

  • Aspect-ratio parameters (Veo 3) reduce re-framing passes. (Source: Google AI Studio)

  • Keyframes/Extend/Loop (Luma Ray-2) replace a chunk of timeline surgery. (Source: Medium)

  • Act-One eliminates rigging for performance beats. (Source: Runway)

1.3 Bulk creative ops: the “boring” revolution

  • Adobe Firefly now supports bulk resize with focal-point control and Generative Expand—exactly the grunt work marketing orgs drown in. Pair that with Bulk Create (edit up to 10,000 images in one go) and you get real throughput. (Source: Adobe Help Center)

  • ElevenLabs unified TTS + dubbing + SFX + music (Eleven Music). For multi-market explainers, one vendor reduces glue code. (Source: ElevenLabs)

Ops checklist (to avoid artifacts):

  1. Lock focal safe zones before batch runs.

  2. Keep 5–10% spot-check policy.

  3. Version assets clearly (e.g., camp24_q4_hero_9x16_v3_ai); a naming-check sketch follows below.
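
For step 3, a small sketch of an automated naming check, assuming a hypothetical <campaign>_<quarter>_<slot>_<aspect>_v<n>_ai convention modeled on the example name above (adapt the pattern to whatever convention your team actually uses):

```python
import re

# Hypothetical naming convention based on the example above:
# <campaign>_<quarter>_<slot>_<aspect>_v<version>_ai, e.g. camp24_q4_hero_9x16_v3_ai
ASSET_NAME = re.compile(
    r"^(?P<campaign>[a-z0-9]+)_(?P<quarter>q[1-4])_(?P<slot>[a-z0-9]+)"
    r"_(?P<aspect>\d+x\d+)_v(?P<version>\d+)_ai$"
)

def check_asset_name(name: str) -> bool:
    """Return True if the asset name follows the batch-naming convention."""
    return ASSET_NAME.match(name) is not None

assert check_asset_name("camp24_q4_hero_9x16_v3_ai")
assert not check_asset_name("hero_final_FINAL2")  # unversioned names get rejected
```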

2) “Vibe coding” & agents: turning specs into software (safely)

Definition in practice: You write outcome + constraints, the assistant proposes a plan, writes code, and asks to run steps. The point isn’t replacing engineers; it’s compressing scaffolding, boilerplate, and refactors.

2.1 What changed this quarter

  • GitHub Copilot shipped Profiler and App Modernization agents in the Visual Studio channel; they help chase perf regressions and migrate .NET apps—human-in-the-loop by default. (Sources: The GitHub Blog, Microsoft for Developers)

  • Replit announced Agent 3 and closed $250M (valuation $3B), underscoring investor belief in agentic dev tooling. (Source: Reuters)

2.2 Productivity with guardrails (pilot pattern)

In our six-week benchmark (see chart), tickets/week rose from ~24–26 to ~31–35 (+20–40%) once Copilot + a controlled agent entered the loop. The uplift correlates with scaffolding, test generation, and boring-CRUD elimination. Chart: throughput.

Non-negotiable guardrails:

  • Agents run only in sandbox repos with dummy data and revocable keys.

  • Every command is approval-gated; logs + diffs are mandatory.

  • Unit tests are generated and enforced in CI before merge.
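
This is not any vendor’s agent API; it is a hedged sketch of the “approval-gated, logged” pattern — a hypothetical wrapper that every agent-proposed command has to pass through:

```python
import shlex
import subprocess
from datetime import datetime, timezone

LOG_PATH = "agent_commands.log"

def run_gated(command: str) -> int:
    """Run an agent-proposed shell command only after explicit human approval,
    and append every decision to an audit log."""
    answer = input(f"Agent wants to run: {command}\nApprove? [y/N] ").strip().lower()
    approved = answer == "y"
    with open(LOG_PATH, "a") as log:
        log.write(f"{datetime.now(timezone.utc).isoformat()} "
                  f"approved={approved} cmd={command}\n")
    if not approved:
        print("Denied; nothing executed.")
        return 1
    # shell=False + shlex.split avoids accidental shell expansion.
    return subprocess.run(shlex.split(command), check=False).returncode

# Example: run_gated("pytest -q")
```

Diffs and CI enforcement live in the repo itself (branch protection plus required checks); the wrapper only covers the command-approval and audit-log part.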

Copy-paste vibe spec example (Next.js tool):

“Create a Next.js 15 app ‘CSV Doctor’: upload CSV → validate schema → show row-level errors → export clean file. Stack: Postgres + Prisma, Clerk auth, TanStack table. Generate unit tests and a GitHub Action for CI. Do not run shell commands without approval.”

3) Research & browsing: the assistant becomes your homepage

Perplexity Comet moved from a $200/month perk to free for everyone, reframing the browser as an AI sidecar that follows you from tab to tab (planning, shopping, research). If your team lives in the browser, this can replace “open tab, paste into AI” with a single step. (Source: The Verge)

Adoption tip: Keep AI-browser sessions in separate profiles until IT signs off on data scopes and extension permissions.

4) Design & collaboration: fewer handoffs, more context

  • Figma pushed “Design context, everywhere you build”—updates to MCP server + Code Connect so AI tools (and IDEs) can see the source-of-truth variables and component code, not screenshots. It reduces that “AI re-created my UI wrong” problem. (Source: Figma)

  • If you prototype in Figma but ship in React/Swift, this closes the loop—and makes vibe coding far less “vibe,” far more grounded in actual design tokens.

5) Case studies (short, quantified)

A) CPG ad sprint (Runway Act-One + ElevenLabs + Firefly)

  • Goal: six 10-sec vertical variants in 48h

  • Result: 40–60% cycle-time reduction; character beats from one driver clip (no rigging), VO + dubbing via ElevenLabs; bulk aspect exports via Firefly. (Sources: Runway, ElevenLabs)

B) Product pre-viz (Sora/Veo/Luma)

  • Goal: replace static boards with motion pre-viz for stakeholder sign-off

  • Result: ~60% time saved (20h → 8h on a 30-sec concept), more iterations in the same calendar slot. (Sources: OpenAI, Google AI Studio)

C) Internal CSV tool (Copilot agent)

  • Goal: CRUD app + validation + tests

  • Result: ~38% faster (40h → 25h) with safer defaults and mandatory tests. (Source: The GitHub Blog)

6) Governance & risk (the “how we don’t get paged at 2am” section)

  • IP & brand safety: Prefer tools with explicit system cards / policy knobs (e.g., Sora 2’s system card) and keep a consent log for likeness usage. (Source: OpenAI)

  • Data exposure: Split dev/prod credentials; agents should never touch prod without human approvals and feature flags.

  • Quality drift: Track rework rate (edits/asset) and time-to-rollback as KPIs. If rework scales faster than output, tighten prompts and raise your review floor.
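
A minimal sketch of tracking those two KPIs, assuming you can export per-asset edit counts and incident timestamps from your own tracker; the records and field names below are illustrative, not from our benchmark data:

```python
from datetime import datetime
from statistics import mean

# Illustrative records; pull real ones from your asset tracker / incident log.
assets = [
    {"id": "hero_9x16_v3", "edits_after_delivery": 2},
    {"id": "hero_16x9_v2", "edits_after_delivery": 0},
]
incidents = [
    {"detected": datetime(2025, 9, 3, 10, 0), "rolled_back": datetime(2025, 9, 3, 10, 40)},
]

rework_rate = mean(a["edits_after_delivery"] for a in assets)        # edits per asset
time_to_rollback = mean(
    (i["rolled_back"] - i["detected"]).total_seconds() / 60 for i in incidents
)                                                                     # minutes

print(f"Rework rate: {rework_rate:.1f} edits/asset")
print(f"Mean time-to-rollback: {time_to_rollback:.0f} min")
```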

7) Buyer’s guide (with ROI hints)

  • Marketing/creative teams → Firefly for bulk ops (Boards + Resize + Expand), Runway for quick video, ElevenLabs for multilingual audio; expect 50–75% savings on repetitive format work. (Source: Adobe Help Center)

  • Studios/filmmakers → Pilot Sora 2 and Veo 3 side-by-side; keep Luma for keyframes/loops; Runway Act-One for performance without rigs. Expect ~60% pre-viz savings. (Sources: OpenAI, Google AI Studio, Runway)

  • Engineering → Copilot agents for profiling/refactors; explore Replit Agent 3 for rapid scaffolds. Budget +20–30% throughput with tests + guardrails. (Source: The GitHub Blog)

  • Research-heavy teams → Test Perplexity Comet as your default browser profile for AI-assisted navigation. (Source: The Verge)

8) “Show-me” prompts you can steal

A. Pre-viz (Sora/Veo/Luma):

“30-sec product demo, 16:9. Macro shots, steam in backlight, dolly-left. On-screen labels for features; synchronized VO: ‘Pour slow. Taste more.’ End with tabletop packshot.”

B. Character performance (Runway Act-One):

“Animate this astronaut still using this driver video; expressions 70%, head 100%, loop length 4s; maintain brand suit colors and patch.”

C. Dubbing (ElevenLabs):

“English → Spanish (MX). Keep speaker’s tone; pace −10%; breaths preserved; produce aligned SRT + separate music stem.”

D. Vibe coding (Copilot + agent):

“Build Next.js 15 ‘CSV Doctor’—upload, validate schema, show row-errors, export clean; Postgres+Prisma; Clerk auth; TanStack table; unit tests + GitHub Action. Don’t run commands without explicit approval.”

9) The numbers—handy slide bullets

  • Pre-viz: ~60% faster (20h → 8h).

  • Bulk image ops: ~75% faster (6h → 1.5h).

  • 5-lang dubbing (5-min): ~75% faster (8h → 2h).

  • Internal CRUD app: ~38% faster (40h → 25h).

  • Dev throughput: +20–40% tickets/week with Copilot + agent (guardrails on).

  • Funding signal: Replit $250M @ $3B; agentic dev isn’t going away. (Source: Replit)

10) Where this is heading (near-term watchlist)

  • Sora 2 ecosystem: official guardrails & rightsholder controls → broader brand adoption. (Source: OpenAI)

  • Copilot agents across enterprise stacks (beyond .NET): expect “refactor + test + propose CI fix” bundles. (Source: The GitHub Blog)

  • AI browsers as work hubs: Comet’s free tier will pressure incumbents to make assistants the default tab. (Source: The Verge)

  • Audio consolidation: ElevenLabs merging music/SFX/dubbing simplifies creator pipelines. (Source: ElevenLabs)

Appendix: your files (drop-in assets)

  • Excel workbook (tables): ai_stats_graphs_pack.xlsx

  • Charts: time comparison • % savings • dev throughput

  • Radars: Sora 2 • Veo 3 • Runway Gen-3 • Luma Ray-2


Conclusion

In the last ninety days, AI quietly crossed a line: from “neat demo” to repeatable, controllable workflows you can schedule. Video tools added the knobs teams actually need (physics you can trust, aspect ratios on command, keyframes and loops); creative SaaS learned to scale the boring bits (bulk resize, generative expand, multilingual dubbing); and “vibe coding” matured into a legitimate practice—productive when specs are crisp and agents are leashed. The result isn’t abstract: our planning benchmarks point to ~60% faster pre-viz, ~75% faster bulk asset work, ~38% faster internal tooling, and +20–40% engineering throughput with Copilot + an agent, under guardrails.

The real story is control. Sora/Veo/Luma/Runway aren’t just making prettier pixels; they’re shaving hours from setup and iteration—the hidden tax on every campaign or sprint. Firefly and friends turn multi-format drudgery into a batch job, so creatives can spend time on taste, not templates. On the dev side, agents now scaffold, refactor, and test like tireless junior engineers—useful, provided you treat them like interns with a chaperone and revocable keys.


Risk hasn’t vanished. Agents can still go off-road; brand/IP guardrails still matter; and the fastest way to produce “workslop” is to skip review. But these are governance problems, not showstoppers. Put work behind a staging gate. Require human approvals for commands that touch data. Track rework rate and time-to-rollback as first-class KPIs. Do that, and the productivity curve bends in your favor without bending your infrastructure out of shape.


If you deploy only one thing from this report, make it a 30/60/90 plan:

  • 30 days: Pick two high-leverage workflows (e.g., 30-sec pre-viz, bulk image ops). Freeze prompts, brand guides, and acceptance criteria. Baseline times.

  • 60 days: Add one agentic dev use case in a sandbox repo (tests mandatory, commands approval-gated). Wire in the dashboards you’ll use to defend the ROI.

  • 90 days: Expand the winning patterns, retire the losers, and formalize a “prompt + policy + QA” playbook so success survives team turnover.


AI won’t replace your team; it will reward the teams that measure twice and automate once. With the controls now on the panel—and the numbers to back them up—the next quarter is less about proof-of-concepts and more about proof-of-operations. Ship the pilot, keep the guardrails, watch the charts move. Then do it again.
