Signal or Noise
We run through the week's AI headlines and make the call: is this actual signal worth paying attention to, or just noise clogging the timeline?
Real talk. Unpopular opinions. No hype. Two builders cut through the AI hype cycle every week — and call it like it is.
Hosted by Oscar Gallo & Matt Wozniak
In machine learning, “human in the loop” means a human who provides oversight and feedback in an automated system. That’s the lens of this show: AI is powerful, but humans aren’t leaving the loop. Not yet. Maybe not ever.
This isn’t another “AI is going to change everything” podcast. It’s for builders, operators, and the AI-curious who are tired of breathless hype, doomerism, and surface-level news recaps with no original thought.
A real use case or product idea lands on the table. We debate whether it's worth building now or if the tech isn't there yet.
Take a complex AI concept and explain it the way you'd actually explain it to a non-technical stakeholder. No jargon allowed.
One tool, library, or workflow change we actually adopted this week. No sponsorship energy. Just what's in the trenches.
AI Engineer & Entrepreneur
Oscar lives at the intersection of engineering and business. He builds AI products, ships code, and runs companies — the daily grind of making AI work in the real world.
Builder, Operator, Relentless Executor
Matt is the operator's operator. He builds, he ships, he scales. His lens is execution: what works, what doesn't, and why most people are thinking about it wrong.
On May 12, 2026, Jackson Oaks, founder of Recursion AI and the self-hosted AI platform Courier, joins the show. Jackson has been running open-source models in production for SMBs since early 2025, back when most of the industry still called local inference a hobbyist toy. Today his company runs production workloads on Apple Mac hardware, processes 600,000 to 800,000 API calls a month per machine, and is profitable doing it.
His thesis: for eighty percent of businesses, eighty percent of use cases don't need Opus or GPT-5.5. They need a smaller open-source model, good systems, and predictable pricing. Frontier intelligence, or boring infrastructure? Per-token billing, or a flat rate on hardware you already own?
That's the cold open. Every fifth episode of Human In The Loop is a guest deep-dive — no Signal or Noise, no rotating segment, one topic, one expert. Oscar, Matt, and Jackson spend 58 minutes on what most AI bills are actually paying for, and close with three hot takes you won't hear anywhere else.
Where local models win, where frontier models still matter, and how to tell which category your use case is in.
Time bought back, margin protected on AI products, conversion lift on AI features — and which one is nearly impossible to measure honestly.
Why M-series is the most underrated AI hardware story of the decade. 1/10th the cost of NVIDIA, 30 to 40x more power efficient. Apple has been positioning for this since the M1 dropped — the same week ChatGPT did.
600,000 to 800,000 API calls a month on a single Mac Studio. Flat-rate cloud pricing starting at $100/mo for 45,000 to 100,000 calls. The math on why predictable beats per-token for SMBs.
A Fortune client paid Deloitte $400,000 for what turned out to be a GPT wrapper with an unoptimized RAG database. MIT says 95% of business AI pilots can't measure ROI. We have theories.
The infinite hallucination loop that ate 10,000 requests in a week. Building hallucination detection from scratch. The fine-tuned 14B model that outperformed GPT-4o on a real production task because the data was good and the task was narrow.
Idempotency in non-deterministic systems. Observability. Redundancy. Why your AI feature needs the manual OCR backup you were about to delete.
Why the next wave of AI businesses isn't “AI for everything” — it's $200/mo agents for pressure washers, plumbers, landscapers, attorneys, accountants. Specialize down, charge real money, capture real value.
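The flat-rate figures quoted above ($100/mo for 45,000 to 100,000 calls) can be turned into a per-call cost and a break-even per-token price. This is a rough sketch, not Jackson's actual pricing model: the function names and the 2,000 tokens-per-call figure are assumptions made up for illustration, and only the $100 and the call volumes come from the episode notes.

```python
# Figures from the episode notes: $100/month flat for 45,000 to 100,000 calls.
FLAT_MONTHLY_USD = 100.0
CALLS_LOW, CALLS_HIGH = 45_000, 100_000

# ASSUMPTION (not from the episode): average tokens consumed per call.
HYPOTHETICAL_TOKENS_PER_CALL = 2_000


def flat_cost_per_call(monthly_usd: float, calls: int) -> float:
    """Effective cost of one call under a flat monthly fee."""
    return monthly_usd / calls


def per_token_monthly_bill(calls: int, tokens_per_call: int,
                           usd_per_million_tokens: float) -> float:
    """Monthly bill under metered, per-token pricing."""
    return calls * tokens_per_call * usd_per_million_tokens / 1_000_000


def break_even_price_per_million(monthly_usd: float, calls: int,
                                 tokens_per_call: int) -> float:
    """Per-million-token price at which a metered bill equals the flat fee."""
    return monthly_usd * 1_000_000 / (calls * tokens_per_call)


if __name__ == "__main__":
    for calls in (CALLS_LOW, CALLS_HIGH):
        per_call = flat_cost_per_call(FLAT_MONTHLY_USD, calls)
        break_even = break_even_price_per_million(
            FLAT_MONTHLY_USD, calls, HYPOTHETICAL_TOKENS_PER_CALL)
        print(f"{calls:>7} calls: ${per_call:.4f}/call, "
              f"break-even ${break_even:.2f} per 1M tokens")
```

The flat fee is fixed whatever the volume, while the metered bill scales with both call count and tokens per call. That scaling is the unpredictability the episode argues SMBs should avoid.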
Matt: “One of Anthropic or OpenAI buys the other, or merges with a foreign lab, inside three years. Or open-source overtakes both and they shrink to a fraction of their current valuations. Their burn rate is not sustainable.”
Oscar: “Every home will eventually have a Mac Mini running a better Alexa, powered by open-source models, not closed-source ones. That future is closer than people think.”
Jackson: “Better systems and structure with smaller open-source models beat throwing frontier models at unstructured problems. People are compensating for a lack of system with a bigger model. We're doing the opposite: we're building better systems so we can use smaller models.”
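One common reading of “idempotency in non-deterministic systems” from this episode's topic list: a retried request should return the first stored result instead of asking the model again and getting a different answer. The sketch below shows that pattern with an in-memory store. It is not a description of how Courier works, and `IdempotentRunner`, `flaky_model`, and the payload fields are hypothetical names chosen for illustration.

```python
import hashlib
import json
import random


class IdempotentRunner:
    """Wraps a non-deterministic call so repeated requests reuse one result."""

    def __init__(self, model_call):
        self._model_call = model_call
        self._results = {}  # idempotency key -> first result seen

    @staticmethod
    def key_for(payload: dict) -> str:
        # Canonical JSON so key order in the payload does not change the key.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def run(self, payload: dict):
        key = self.key_for(payload)
        if key not in self._results:
            # Only the first request for this key reaches the model.
            self._results[key] = self._model_call(payload)
        return self._results[key]


def flaky_model(payload: dict) -> str:
    """Stand-in for a model call: returns a different value on every call."""
    return f"{payload['doc_id']}:{random.random()}"


if __name__ == "__main__":
    runner = IdempotentRunner(flaky_model)
    first = runner.run({"doc_id": "inv-1", "task": "extract"})
    retry = runner.run({"task": "extract", "doc_id": "inv-1"})
    print(first == retry)
```

A production version would need a durable store and an expiry policy; the dictionary here only illustrates the key-then-lookup shape.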
On April 24, 2026, an AI coding agent hit a credential mismatch in a staging environment. Nine seconds later, the entire production database of a company called PocketOS was gone. Backups too. The agent confessed: “I violated every principle I was given.” The model was Claude Opus 4.6.
Five days later, on April 29, Bloomberg and CNBC reported Anthropic was weighing offers at a $900 billion valuation. Roughly $50 billion in new capital. A round that would put the lab behind Claude above OpenAI as the most valuable AI company in the world. Two stories. Same week. Same lab. We call every one.
That's the cold open. From there, Oscar and Matt spend 46 minutes doing what this show does: filter the week's AI noise, debate three real business ideas, and close with two hot takes you won't hear anywhere else.
$50B raise on the table. Run-rate revenue at $30B. The Mythos withholding story we called in Episode 1 was the brand. The Google $40B in Episode 3 was the funding story. This is the receipt. Treat your model spend like an interest rate, not a software line item.
Nine seconds. Backups gone. The agent went looking for an API token, found one in an unrelated file, and used it. The token was scoped for any operation, including destructive ones. Three holes in the Swiss cheese. The agent walked through all of them.
Beijing killed Meta's $2B Singapore-routed AI deal on April 27. Meta acquired humanoid-robotics startup Assured Robot Intelligence on May 1. The thesis didn't change. The substrate did. Robotics is the next frontier-model fight.
Agents can now create accounts, register domains, start paid subscriptions, and deploy apps with no human in the loop. Default spend cap is $100 per provider per month. The PocketOS story is the demand signal for spend caps and audit logs.
The Hangzhou Intermediate People's Court upheld a ruling that Zhou, a QA supervisor verifying LLM outputs, couldn't be fired or pushed to a 40% pay cut just because AI took over his work. First major court ruling anywhere putting limits on AI-driven layoffs.
Per Gustaf Alströmer's RFS. Don't sell SaaS to insurance brokers, accountants, or compliance teams. Become the broker. File the taxes. Run the audit. Charge services prices for AI-margin work. The wedge is bigger than the entire SaaS market. Pushback: you stop being software and start being ops, with E&O liability and license requirements baked in.
Per Tom Blomfield's RFS. The blocker to AI automation isn't the models anymore. It's domain knowledge scattered across heads, email, Slack, tickets, and databases. Build the layer that pulls fragmented company knowledge, keeps it current, and turns it into an executable skills file for agents. Pushback: this is wikis 5.0, and Microsoft and Notion eat the category as a feature in 18 months.
Per Tyler Bosmeny's RFS. $500K interceptors don't work against $500 drones. Build the Cloudflare-shaped layer for swarm defense: distributed sensors, software-first interceptors, autonomy-stack attacks. The thesis is real. The dual-use moral question is not optional.
Oscar: “Anthropic at nine hundred billion dollars is the moment the labs stopped being labs. They're sovereign wealth funds with a research division attached. The Mythos withholding story we called in Episode 1 was the brand. The fundraise is the receipt. Price your model spend like an interest rate, not a software line item.”
Matt: “Every AI agent in production is one bad scope away from being the PocketOS story. The fix isn't better models. It's permissions, audit logs, and a human approval gate on every destructive operation. If your agent can drop a database, your agent will drop a database. The question isn't if. The question is on what week.”
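The fix Matt names (permissions, audit logs, and a human approval gate on every destructive operation) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the setup of any tool discussed on the show: `GatedExecutor`, `DESTRUCTIVE_VERBS`, and the `approve` callback are hypothetical, and a real gate would sit at the credential or database layer rather than in agent code.

```python
from dataclasses import dataclass, field
from typing import Callable

# ASSUMPTION: a hypothetical list of verbs treated as destructive.
DESTRUCTIVE_VERBS = {"drop", "delete", "truncate"}


@dataclass
class GatedExecutor:
    """Runs agent actions, asking a human before anything destructive."""

    approve: Callable[[str, str], bool]  # human decision: (verb, target) -> bool
    audit_log: list = field(default_factory=list)

    def execute(self, verb: str, target: str, action: Callable[[], object]):
        destructive = verb in DESTRUCTIVE_VERBS
        approved = (not destructive) or self.approve(verb, target)
        # Every attempt is logged, including the ones that get refused.
        self.audit_log.append({
            "verb": verb,
            "target": target,
            "destructive": destructive,
            "approved": approved,
        })
        if not approved:
            raise PermissionError(f"{verb} on {target} requires human approval")
        return action()


if __name__ == "__main__":
    executor = GatedExecutor(approve=lambda verb, target: False)
    print(executor.execute("read", "orders", lambda: "rows"))
    try:
        executor.execute("drop", "orders", lambda: "gone")
    except PermissionError as exc:
        print(f"blocked: {exc}")
```

The refusal is written to the audit log before the exception is raised, so a blocked attempt still leaves a record.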
On April 23, 2026, OpenAI shipped GPT-5.5, the model previously known as Spud. The same day, VentureBeat clocked it narrowly beating Anthropic's gated Claude Mythos Preview on Terminal-Bench 2.0. The day after, DeepSeek released V4, a 1.6 trillion parameter open-source model running natively on Huawei chips, at one-sixth the inference cost of the closed frontier. And Google announced it would invest up to forty billion dollars in Anthropic.
Two weeks ago we called GPT-5.5 noise. Now it's the lead story. The frontier is moving every six weeks, the cloud war is inside the labs, and the open-source ceiling jumped a tier on hardware your security team can't audit.
That's the cold open. From there, Oscar and Matt spend 49 minutes doing what this show does: filter the week's AI noise, review the tools they actually use, and close with two hot takes you won't hear anywhere else.
The first buyable frontier model to beat a gated frontier model on a public benchmark. Same week as the Google–Anthropic announcement. The two-tier story from Episode 2 just collided with the cadence story Fortune was writing all month.
1.6T total parameters, 1M context, MIT license, native on Huawei Ascend 950PR. The open-weights ceiling moved by a tier. Geopolitics moved into the model card.
Google has Gemini in-house and is still writing a forty billion dollar check. That tells you what they actually believe about the next 18 months.
Devin-in-Windsurf is the first time the agent and the IDE feel like one product, not two stitched together. Whether the price tag holds is a different question.
Video and slide generation, no cross-session memory. The price tag is the message.
Browser automation CLI built for agents. Persistent sessions, headless or real Chrome with your profile, annotated screenshots so a model can click “label 7” instead of guessing CSS selectors.
The auth and permissions layer for agent surfaces. Sub-50ms p95 authorization checks. AI Installer in the CLI sets up auth in under five minutes. Last week's Vercel breach was a permissions story with a price tag attached.
Define the agent once, swap the model in one line. The Gateway added GPT-5.5, DeepSeek V4, Kimi K2.6, and GPT Image 2 inside two weeks. The only honest way to evaluate this week's launches on real workloads.
Run claude remote-control on your laptop, approve file changes from your phone, walk to lunch. Work stays local. The walkaway-and-approve workflow is the productivity win of the quarter.
Oscar: “Open-source AI just won the cost war and lost the trust war in the same week. DeepSeek V4 is the best open weights anyone has ever shipped. It also runs on chips your security team can't audit.”
Matt: “OpenAI shipping GPT-5.5 two weeks after GPT-5.4 is the end of the model launch as a marketing event. From now on, the model layer is a software update, and the only thing that matters is what you ship on top of it.”
On April 16, 2026, Anthropic shipped Claude Opus 4.7 with a 1M token context window at standard API pricing, a 128k max output, and a new “xhigh” effort level tuned for coding and agentic work. In the same announcement, Anthropic said on record that 4.7 is “less risky than Mythos,” and that Mythos stays gated to roughly 50 enterprise partners under Project Glasswing. So the flagship you can buy is, by Anthropic's own words, their second best.
Radical honesty, or the sharpest upsell in enterprise AI since cloud credits?
That's the cold open. From there, Oscar and Matt spend 45 minutes doing what this show does: filtering the week's AI noise, breaking down the concepts decision-makers actually need, and closing with two hot takes you won't hear anywhere else.
1M context at standard pricing is the real story. A meaningful win for long-horizon agent workloads. And Anthropic just taught every enterprise buyer to ask, “what's the gated tier.”
Sandboxed agents as a primitive. Python first, TypeScript later. Good news for builders, and great news for OpenAI's runtime.
Google's native Gemini macOS app plus Perplexity's Personal Computer. The OS is replacing the browser as the AI UX primitive.
Only 10% of Americans are more excited than concerned about AI, versus 56% of experts. Your consumer product has a trust problem, not a capability problem.
A struggling sneaker company rebrands as NewBird AI, sells the footwear business, drops its environmental charter, and the stock jumps 582%. The “pivot to AI” is the new “pivot to crypto.” Price your term sheets accordingly.
Prompt engineering is writing a good email. Context engineering is designing the whole office the recipient works in. When every company can access the same foundation models, the wiring is the moat.
What an agent actually is, and why Gartner says 40% of enterprise agentic AI projects will be canceled by the end of 2027 because of cost, unclear value, and weak risk controls.
The top AI labs now ship a public tier and a gated tier. The best model is probably one you cannot buy. How to build strategy around that reality.
Oscar: “Context engineering isn't a new skill. It's ‘write good internal documentation’ rebranded so consultants can charge for it. The companies winning in 2026 had clean internal data in 2022.”
Matt: “Every executive asking ‘what's our agentic AI strategy’ in April 2026 is two years from being replaced by one who already shipped something.”
On April 7, 2026, Anthropic unveiled Claude Mythos Preview — a frontier model they describe as a “step change” in capability — and then refused to release it. Instead: Project Glasswing, a gated program for Amazon, Apple, Microsoft, Cisco, CrowdStrike, and a handful of others. The first major withheld frontier model in roughly seven years.
Responsible scaling finally biting? Or enterprise GTM dressed up as safety PR?
That's the cold open. From there, Oscar and Matt spend 45 minutes doing what this show does: filtering the week's AI noise, debating real business ideas, and closing with two hot takes you won't hear anywhere else.
Anthropic used Mythos to identify thousands of zero-days across every major OS and browser, then chose not to ship. If Mythos-wielding attackers find what small shops can't patch, the defender/attacker asymmetry just broke.
First model from Meta Superintelligence Labs. Llama-4 midsize quality at ~10x less compute, shipping into Facebook, Instagram, WhatsApp, and Ray-Ban Meta glasses. If you build consumer or wearable UX, unit economics just moved.
Polymarket gives GPT-5.5/6 a 78% chance of shipping by April 30. Predictions aren't news. The real story is OpenAI's $122B raise.
OpenAI, Anthropic, and Google coordinating against Chinese labs. IP defense or three competitors ganging up on a fourth because it's easier than competing on capability?
Horizontal labs are quietly going vertical. Life sciences. Cybersecurity. Legal next? Finance? Pick your domain before the labs pick it for you.
Via My First Million Ep 811. Productized 30-day AI-ops install for HVAC, dental, property management. Software margins, services delivery. But the real question: are you the next picks-and-shovels play, or just another consultant in a trench coat?
Per-claim pricing instead of per-seat. Eats into labor P&L, not software budget. But insurance means 18-month sales cycles. You die of starvation before your first logo unless you have an insider co-founder.
Clone the $200M ARR playbook for a regulated vertical — legal ops, clinical workflows, HR. Compliance moat is real. But is it also the ceiling on product velocity?
Oscar: “Anthropic withholding Mythos is the new moat. Safety-as-marketing is about to become the dominant frontier-lab playbook, because nobody wants to be the lab that shipped the model that broke the internet.”
Matt: “The ‘AI for small businesses’ gold rush is 90% consultants in trench coats. The real money isn't selling AI to SMBs; it's selling picks-and-shovels to the 10,000 consultants who are.”
New episodes every week. Reply with the story you want us on next.