Hey Everyone 👋,
John Lindquist here with the eighth issue of AI Dev Essentials! This past week, I've been diving deep into Claude's Max Mode within Cursor, and wow, what a ride. I've been pushing it hard for five straight days, and the first thing I need to tell you is: it gets expensive, fast. I'm already over $200 in charges, and that's just the beginning.
Here's what I've learned: if you give Claude a vague task without super specific constraints, it goes absolutely wild. It'll build CLI tools, make a crazy number of tool calls, and basically try to solve your problem from every angle imaginable. Watching this happen is both fascinating and terrifying because you can literally see the dollars adding up with each action. If you're not babysitting it constantly, and it thinks it understands what you want (even when it's completely wrong), it will chase that solution with relentless determination.
I learned this the hard way when I missed something simple, like an API key. Instead of just asking me about it, Claude assumed there was some massive environment bug and went on an expensive detective hunt. The same thing happened with a linter error: it misunderstood what was happening and started installing tools and trying fixes for a problem that didn't even exist, all because the linter failed to run initially.
Don't get me wrong, Claude's determination can be amazing when it's on the right track. But when it's not? It's like watching money burn while it confidently solves the wrong problem. I know you can fine-tune Cursor rules to help with this, but Claude rarely stops to ask for clarification. So here's my advice: if you're going to use Max Mode, keep your tasks small and crystal clear. Otherwise, you might end up with a hefty bill and a solution to a problem you never actually had.
Even with the release of Claude 4, I'm still more excited about what's coming next. OpenAI's o3 Pro and Gemini's 2.5 Pro DeepThink are what I'm really waiting for. In my experience, it's way more important to have really smart reasoning models create solid plans upfront. Then you can hand off the implementation to something like Claude 4. It doesn't need to be nearly as smart for that part. I've heard Google's 2.5 Pro DeepThink should drop in June (it's already with trusted testers), and o3 Pro was supposed to be here a couple weeks ago, so it feels like an "any day now" situation.
Beyond my Claude adventures, the AI world keeps spinning at breakneck speed. Google I/O dropped a ton of stuff to unpack, OpenAI keeps evolving, and the agent landscape is as wild as ever. Let's dive into the highlights!
New egghead.io Lesson This Week
Local AI Code Reviews with the CodeRabbit Extension in Cursor(egghead.io)
Learn how to use the CodeRabbit extension in Cursor for AI-assisted code reviews. This lesson covers initiating reviews, applying suggested changes (while managing them with Git), and using the "Fix with AI" feature to further refine solutions with more context, including running tests to verify fixes and iteratively working with the AI.
🚀 Google I/O Aftermath & The Expanding Gemini Universe
Google I/O and the subsequent announcements have significantly expanded the capabilities and reach of the Gemini ecosystem.
Gemini API Gets a Major Boost
Updates include an improved Gemini 2.5 Flash Preview (now the default in the Gemini App - Source(x.com)), advanced Text-to-Speech (TTS), native audio dialog capabilities, and new tools like URL Context for grounding responses and Thought Summaries for better debugging and insight into the model's reasoning. (Google AI Developers(developers.googleblog.com))
I'm still excited for the latest Gemini 2.5 Flash model to land in Cursor; I have yet to see it listed as one of the available models. I have high hopes that it will perform much better at tool calling than previous versions, and I love any model with a nice balance of speed and smarts. I've been using Gemini 2.5 Flash on my phone through the Gemini app and have been really impressed with its responsiveness, so I'm excited to bring it into my developer workflows.
Gemini 2.5 Pro "Deep Think"
Showcased tackling complex problems like "catch a mole" from Codeforces, using parallel thinking to consider multiple hypotheses. (Google DeepMind(x.com))
This is one of my most highly anticipated releases. Again, I love how Gemini is integrating Deep Research, Canvas, and everything else into a friendly UI. I've been turning to Gemini more and more instead of ChatGPT and Claude, and the fact that I can reference my Gmail, text messages, documents, and sheets from Gemini has been awesome. It'll be really interesting to pair all of that with a super smart DeepThink model.
Gemma 3n for On-Device AI
Google DeepMind introduced Gemma 3n, a multimodal model built for mobile on-device AI. It boasts a smaller memory footprint (nearly 3x RAM reduction), enabling more complex applications directly on your phone or for efficient cloud streaming. It can generate smart text from audio, images, video, and text, and is aimed at live, interactive apps and advanced audio commands. Now in early preview on Google AI Studio. (Google DeepMind Thread(ai.google.dev))
It'll be fascinating to see how far they can push these tiny models and what their capabilities will actually be, considering they'll never be as smart as the frontier models. They will, however, have access to a lot of real-time information on your device, and that's an interesting trade-off: the real-time media and context they can tap into versus the much slower, smarter thinking models. As these sorts of models end up on glasses, watches, and other devices, we're entering an interesting world where everything is analyzed through the lens of AIs like this.
Specialized Gemma Models
Google also unveiled a suite of specialized Gemma models:
- MedGemma: Variants (4B multimodal, 27B text-reasoning) for medical text and image applications, like analyzing medical images or clinical reasoning. (Google AI Developers(x.com))
- SignGemma: A sign language understanding model (coming later this year) focused on translating ASL into English text.
- DolphinGemma: Can generate new synthetic dolphin sounds, potentially aiding interspecies communication research.
NotebookLM Mobile App Enhancements
Following its I/O spotlight, the NotebookLM mobile app received fine-tuning with fixes to Chat, sources, visuals, and stability. (NotebookLM(x.com)) Its ability to share content directly to notebooks is a standout feature for gathering and processing information.
If you haven't tried it out yet, the NotebookLM mobile app is awesome. It's a great way to collect things on your phone. You can share links and articles and anything to it and then get nice summaries and chat with those resources. Highly recommended.
AI-Powered Developer & Design Tools
I love seeing all the ancillary tools coming out around AI, especially the Google Labs experiments they've been sharing. I don't think Google hypes them up enough.
Google AI Studio as a Cursor Alternative?: Paul Couvert demonstrated how Google AI Studio can be used to build apps with built-in AI capabilities directly in the browser, with easy sharing and deployment. (Paul Couvert Thread(aistudio.google.com))
With AI Studio building apps, I think we're going to see more and more web-based tools where you can build entire apps in the browser. AI Studio has been one of my favorite apps for many months now.
Stitch Beta: A new Google experiment where you can use AI to generate UI designs that can be easily copied into Figma. (TestingCatalog News(x.com), Stitch(stitch.withgoogle.com))
Stitch is still an early version (it's essentially an app Google acquired from someone else), but I love that they're competing with Vercel and others who are building design tools. Design is definitely one of my weakest skills.
ChromeDevTools + Gemini: Addy Osmani highlighted a new feature to annotate performance findings with Gemini, generating labels for events in performance traces. (Addy Osmani(x.com))
Seeing AI in Chrome DevTools annotating performance findings is a huge step forward for telemetry, tracing, and performance tooling. Having AI analyze your app while it's running, on top of understanding your codebase, is the next huge unlock for code-generation fixes and patches. AI understanding your code is one thing, and that's great, but AI understanding how your app is running is next level, because it's always more valuable to know what the app is actually doing than what the code is attempting to do.
Gemini App Upgrades
Now features Gemini Live with camera and screen sharing on Android & iOS (free), integrates Imagen 4 and Veo 3 for image/video generation, and sees major updates to Deep Research and Canvas. (News from Google(blog.google))
Project Astra & AI Glasses Prototype
Google demoed a prototype of AI Glasses with an in-lens display, running Android XR and showcasing features like a memory assistant. This points towards their vision for a "natural form factor for AI." (Alex Volkov(x.com))
Universal Assistant Vision
Logan Kilpatrick shared Google's vision for a universal assistant, representing the next evolution of the Gemini app. (Logan Kilpatrick(blog.google))
🏛️ Anthropic's New Frontier: Claude Opus 4 & Sonnet 4 Take Center Stage
Anthropic has made significant strides with its latest generation of models, often referred to by the community as "Claude 4" but officially designated as distinct models: Claude Opus 4 and Claude Sonnet 4. Released in May 2025, these represent Anthropic's most advanced offerings yet, pushing boundaries in reasoning, coding, and complex task handling.
Official Introduction and Capabilities
According to Anthropic's model overview, Claude Opus 4 (e.g., claude-opus-4-20250514) is positioned as their "most capable model," designed for "complex analysis, longer tasks with many steps, and higher-order math and coding tasks." (Anthropic Models Overview(docs.anthropic.com), Meet Claude(anthropic.com)) It's super important to note that Claude's knowledge cutoff is March 2025, only a couple of months ago. That means it knows about a lot of the latest technologies from Cloudflare and others who have released brand-new frameworks and tools. If that's what you're building with, Claude is going to be the best tool for the job, because it's much more aware of recent developments. You may notice this in your own projects: even when models are otherwise roughly equal, if you're working with much newer code, the model with the most recent knowledge cutoff will probably be the winner.
Claude Sonnet 4 (e.g., claude-sonnet-4-20250514) is described as a "high-performance model" and their "best combination of performance and speed for efficient, high-throughput tasks." For developers, Anthropic highlights that models like Claude Sonnet 4 lead on benchmarks like SWE-bench Verified (72.7%). They emphasize using Claude to "Write, test, and debug complex software," "Analyze codebases with expert-level reasoning via GitHub integration," and "Delegate dev tasks to Claude with computer use." (Anthropic Coding Solutions(anthropic.com))
Security, Ethics, and Vulnerabilities
ASL-3 Protections for Opus 4: Anthropic announced that Claude Opus 4 is being deployed with AI Safety Level 3 (ASL-3) measures as a precautionary step due to its advanced capabilities, particularly concerning CBRN (Chemical, Biological, Radiological, Nuclear)-related knowledge. This involves enhanced defenses against misuse and jailbreaking for these specific risks. (Anthropic ASL-3 News(anthropic.com))
GitHub MCP Vulnerability: A notable vulnerability was highlighted by Luca Beurer-Kellner where "Claude 4" (referring to the advanced Claude models) connected to GitHub's official MCP server could potentially leak private repository data if manipulated through prompt injection in a GitHub issue. This underscores the critical need for strict agent permissions and continuous monitoring in agentic systems. (Luca Beurer-Kellner Thread(x.com), Invariant Labs Blog(invariantlabs.ai))
This is relatively big news for anyone letting agents do work in the background: always remember that you're handing your keys over to these agents and giving them permission to do everything you can do. If you hand someone your GitHub keys, you have to expect that they could destroy, leak, or ruin anything you have permission to touch on GitHub.
Ongoing Ethical Discussions: The power of these advanced models continues to fuel discussions around jailbreaking and ethical behavior. Reports of models like Claude exhibiting unexpected behaviors in controlled (simulated) scenarios, such as resorting to "blackmail" to prevent being shut down, highlight the complexities of aligning advanced AI. (BBC News(bbc.com))
⚙️ OpenAI Updates & Ecosystem Moves
OpenAI continues to refine its models and expand platform capabilities.
Operator in ChatGPT Upgraded
The Operator feature in ChatGPT has been updated with OpenAI's latest reasoning model (an o3 variant), reportedly making it more persistent and accurate when interacting with the browser, leading to improved task success and clearer responses. Available as a research preview to ChatGPT Pro users. (OpenAI Thread(operator.chatgpt.com))
OpenAI Responses API Supports MCP
You can now connect OpenAI models to any remote MCP server using the Responses API with just a few lines of code. (OpenAI Developers(platform.openai.com))
It's huge news to see MCP becoming more of a standard across all of the major players. I know a lot of people still doubt it will end up as the final standard for AIs to talk to each other, but it's interesting to see the big players all agree on something. The more they agree, the more the MCP protocol and standards can evolve, and the more we can trust them to work.
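To make "a few lines of code" concrete, here's a minimal sketch of what the request looks like, based on the announcement. The server label and URL are hypothetical placeholders, and the actual SDK call is left commented out since it requires a real API key and a live MCP server:

```python
# Sketch: attaching a remote MCP server to an OpenAI Responses API request.
# The server_label and server_url below are hypothetical placeholders.
mcp_tool = {
    "type": "mcp",
    "server_label": "my_docs",                 # hypothetical label for the server
    "server_url": "https://example.com/mcp",   # hypothetical MCP endpoint
    "require_approval": "never",               # skip per-call approval prompts
}

request = {
    "model": "gpt-4.1",
    "tools": [mcp_tool],
    "input": "Use the MCP server's tools to summarize the project docs.",
}

# With the official SDK (needs OPENAI_API_KEY), the call would look like:
# from openai import OpenAI
# client = OpenAI()
# response = client.responses.create(**request)
# print(response.output_text)
```

The nice part is that the model discovers the server's tools on its own; you just point it at the endpoint rather than hand-writing function schemas for each tool.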
Structured Outputs Enhancements
Improvements include parallel function calling now working with strict mode (ensuring adherence to schema) and support for many more keywords (e.g., string lengths/formats via regex, number ranges, min/max array elements). (OpenAI Developers Thread(platform.openai.com))
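As a rough illustration of the newly supported keywords, here's what a strict-mode schema using string patterns, number ranges, and array size constraints might look like. The schema contents and the name `user_profile` are made up for the example:

```python
# Sketch of a JSON Schema exercising the newly supported keywords:
# string patterns, integer ranges, and min/max array elements.
schema = {
    "type": "object",
    "properties": {
        "username": {"type": "string", "pattern": "^[a-z0-9_]{3,16}$"},
        "age": {"type": "integer", "minimum": 0, "maximum": 130},
        "tags": {
            "type": "array",
            "items": {"type": "string"},
            "minItems": 1,   # at least one tag
            "maxItems": 5,   # at most five tags
        },
    },
    "required": ["username", "age", "tags"],
    "additionalProperties": False,
}

# Wrapped as a strict response format for a chat/responses request
# (the surrounding API call is omitted here):
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "user_profile", "strict": True, "schema": schema},
}
```

With strict mode on, the model's output is constrained to match the schema, so these keywords become guarantees rather than hints.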
Acquisition of "io" AI Device Startup
As summarized by Alvaro Cintas, OpenAI reportedly acquired Jony Ive's AI device startup "io" for $6.5 billion. The device is teased for a 2026 launch and is described as being "fully aware of its environment," not a phone or wearable. (OpenAI Announcement(openai.com))
🤖 The Agent Landscape: New Players, Comparisons & Vulnerabilities
The development and deployment of AI agents continue to accelerate, bringing both powerful capabilities and new challenges.
AI Agent Showdown (Codex, Jules, Devin)
Lee Robinson put OpenAI's Codex, Google's Jules, and Devin to the test with a Next.js pages-to-app router migration task on an old repo. Devin reportedly delivered a working PR. Lee also praised Claude Code (Sonnet 4) for its terminal UX. (Lee Robinson Thread(x.com))
My experience with each of these AI agents has been pretty much exactly what Lee Robinson describes, so I essentially second everything he says in this thread.
Flowith.ai Neo Agent Writes a Book
AI Breakfast shared an impressive feat where the Flowith.ai Neo agent wrote a 193-page, 57,800-word sequel to Herman Hesse's Siddhartha overnight, complete with research and character bios. (AI Breakfast(x.com))
🛠️ Developer Tools, Platforms & Voice AI
New tools and platform updates are making it easier to build with and alongside AI.
v0 AI Model Release & Codex CLI Integration
Vercel's v0 has released its own AI model (v0-1.0-md) specialized for web development knowledge. It features an OpenAI-compatible API, text/image inputs, 128K total context, and is priced at $3/1M input tokens & $15/1M output tokens. (v0 Thread(vercel.com)) Jared Palmer also showed how you can use these new v0 models with the OpenAI Codex CLI. (Jared Palmer(vercel.com))
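Because the API is OpenAI-compatible, in principle you can point the standard OpenAI SDK at it just by swapping the base URL. A minimal sketch, with the base URL assumed from the announcement (verify against Vercel's docs) and a placeholder key:

```python
# Sketch: using the OpenAI SDK against v0's OpenAI-compatible endpoint.
# The base_url is an assumption from the announcement; the key is a placeholder.
config = {
    "base_url": "https://api.v0.dev/v1",  # assumed v0 API endpoint
    "api_key": "YOUR_V0_API_KEY",          # placeholder
}

# The actual call (needs a real key) would look like:
# from openai import OpenAI
# client = OpenAI(**config)
# chat = client.chat.completions.create(
#     model="v0-1.0-md",
#     messages=[{"role": "user", "content": "Build a pricing page in Next.js"}],
# )
# print(chat.choices[0].message.content)
```

This compatibility is also what makes tricks like Jared Palmer's Codex CLI demo work: any tool that speaks the OpenAI API shape can be pointed at v0.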
Vercel AI Gateway (Alpha)
Built on the AI SDK 5 alpha, this gateway allows switching between ~100 AI models without managing API keys directly, handling auth, usage tracking, and more. (Vercel(vercel.com))
Kyutai Unmute.sh - Modular Voice AI
Kyutai unveiled Unmute.sh, a platform offering modular speech-to-text and text-to-speech components to voice-enable any text LLM. It features streaming, semantic Voice Activity Detection (VAD) for intelligent turn-taking, and voice cloning. They plan to open-source the components. (Kyutai Thread(x.com), Unmute.sh(unmute.sh))
Microsoft VS Code Enhancements
- New Postgres Plugin: Paul Copplestone highlighted a new Postgres plugin for VS Code with a schema visualizer, database explorer, query history, and intellisense. (Paul Copplestone(techcommunity.microsoft.com))
- MCP Auth Integration: Harald Kirschner noted VS Code's built-in auth flows now include MCP support. (Harald Kirschner(x.com))
Rork 1.0 - AI Mobile App Builder
A new platform, Rork, allows users to create mobile apps powered by Claude 4 (with promotional video by Veo 3 AI). It supports Figma import, Supabase backend, and one-click publishing via Expo. (Rork Thread(x.com))
Google Flow - AI Filmmaking
Google launched Flow, an AI filmmaking tool combining Veo, Imagen, and Gemini models, allowing creation of cinematic clips from natural language descriptions (available via Google AI Pro). (Google Blog(blog.google))
Shopify AI Store Builder
Shopify launched an AI Store Builder that generates e-commerce sites from prompts, including layouts, images, and text. (Shopify Blog(shopify.com))
Developer Joy with Convex
Matt Luo shared his positive experience coding with Convex. (Matt Luo(stack.convex.dev))
✨ Workshop Spotlight: Conquer the Complexity of Cursor ✨
Master practical AI development workflows in Cursor. This hands-on workshop covers Agents, Ask, Custom Modes, multi-file analysis, effective prompting, Cursor rules, and strategies for handling AI failures.
When: Thursday, June 05, 2025, 5:00 AM - 10:00 AM (PDT)
🇫🇷 Europe-Friendly Time! That's 1:00 PM in London, 2:00 PM in Paris & Berlin.
Where: Zoom (Live Q&A included)
Investment: $249
Early Bird Special! Get $50 off (20% discount) if you register by midnight TOMORROW, May 28th! Grab your ticket today!
Read More(egghead.io) | Register Now (buy.stripe.com)
(Team training also available)
⚡ Quick Links / Community Buzz
Cursor Pro Tip for Google Sheets
Fili shared a tip: publish a Google Sheet to the web and add the link to Cursor as documentation to help the AI access and understand table data. (Fili(x.com))
That's the scoop for this issue! It's an incredible time to be a developer in the AI space. The tools are evolving at lightning speed, and the potential for innovation feels limitless. Stay curious and keep building!
If you have any feedback or questions, hit reply! Always happy to chat about the latest in AI dev tools.
John Lindquist
egghead.io(egghead.io)