Pondhouse Data AI - Tips & Tutorials for Data & AI 16

Automatic AI Prompt Generation | Multi-Modal Agents with Phidata | Gemini 2.0: Cheaper & Faster - Google is back | OWASP AI Security | AI Agents vs Automation

Hey there,

Welcome to your bi-weekly Pondhouse Data newsletter, where we explore the latest in Data & AI! This week: automated prompt engineering with DSPy, building multi-modal agents with Phidata, a look at Google's speedy Gemini 2.0, and our take on the "AI agent" hype.

Enjoy!

Cheers, Andreas & Sascha

In today's edition:

Automating AI Prompt Generation: Step-by-step DSPy tutorial to save you time and improve results.

Building AI Agents: Get started with Phidata and create multi-modal agents that can see, hear, and act.

Top News: Google releases Gemini 2.0. Google is back with a state-of-the-art model at unmatched speed and the lowest cost for an LLM of this quality.

Opinion on the "AI Agent" Hype: Our take on whether agents are truly new or just a new name for automation.

OWASP LLM Security: Essential guides on 10 areas of LLM security, covering attack vectors and mitigation strategies.

Other news: Updates on OpenAI's ChatGPT and insights from Andrew Ng on agents and AI trends.

Find this Newsletter helpful?
Please forward it to your colleagues and friends - it helps us tremendously.

Tutorial of the week

DSPy: Build Better AI Systems with Automated Prompt Optimization

We've all been there: you spend hours crafting the "perfect" prompt, it works once, then falls apart. Prompt engineering can feel more like guesswork than actual engineering. This makes building reliable AI systems a real challenge.

The Solution: DSPy

DSPy is a framework that brings a systematic, programming-based approach to prompt optimization. It’s about building reliable AI systems instead of endless manual prompt tweaking.

In short, DSPy lets you define WHAT your LLM's result should look like - and then automatically creates the prompts for you.

Why It Matters:

  • More Reliable: DSPy helps prompts work consistently across different use cases.

  • More Efficient: It automates the optimization process, saving you time.

  • Repeatable Results: Your optimized prompts can be used again and again - and they are optimized for repeatability.

  • Better Performance: DSPy aims for better results compared to just winging it with prompts.

  • Adaptable: It works for different tasks, from chatbots to summarization.

How DSPy Works - 8 Steps:

Here's the gist of how DSPy changes the prompt engineering game:

  1. Define the Task: State the problem, the expected results, and the most suitable LLM, and write down some examples. Do this on paper - it's helpful for the next steps.

  2. Plan the Pipeline: Outline the necessary steps, like a sequence of human-like actions, using modules such as dspy.ChainOfThought.

  3. Test it Out: Run examples and see where simple prompts fail.

  4. Get Data: Create a dataset of input/output examples to train the optimizer - this is the most important step. Real-world examples are gold, so make sure you have good examples available, or create some.

  5. Set the Metric: Create a way to measure how good the results are, with simple, complex or even LLM-based metrics.

  6. Evaluate: Check the system's performance before any optimization.

  7. Optimize: Use a DSPy optimizer (like BootstrapFewShot) to improve your prompt.

  8. Refine: Go back and improve, based on results and new insights.
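
To make this concrete, here's a minimal sketch of what these steps can look like in code. It is illustrative only: the summarization signature, the toy metric, and the model choice are our own assumptions, not part of the tutorial.

import dspy
from dspy.teleprompt import BootstrapFewShot

# Illustrative model choice - any LM supported by DSPy works here
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Steps 1-2: declare WHAT you want (a signature) and pick a module for the pipeline
summarize = dspy.ChainOfThought("document -> summary")

# Step 4: a tiny, made-up trainset - real-world examples work far better
trainset = [
    dspy.Example(
        document="DSPy replaces hand-written prompts with declarative LM programs.",
        summary="DSPy lets you program LLM pipelines instead of hand-tuning prompts.",
    ).with_inputs("document"),
]

# Step 5: a deliberately simple metric - swap in whatever measures quality for your task
def concise_summary(example, prediction, trace=None):
    return 0 < len(prediction.summary.split()) <= 40

# Step 7: let the optimizer craft the prompt (few-shot demos) automatically
optimizer = BootstrapFewShot(metric=concise_summary)
optimized_summarize = optimizer.compile(summarize, trainset=trainset)

print(optimized_summarize(document="DSPy is a framework for programming language models.").summary)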

Below, you'll find a fully-fledged example of how to improve an AI application by dozens of percentage points by letting DSPy automatically craft the prompt:

Tool of the week

phidata: Framework to build multi-modal agents

This week, we're spotlighting phidata, a framework for building powerful multi-modal AI agents. If you're looking to go beyond simple chatbots and create sophisticated AI systems, phidata is worth checking out.

Phidata is an open-source framework that lets you build agents that can handle text, images, audio, and video. Think of it as a toolkit for creating AI teammates with:

  • Memory: Agents can remember past interactions.

  • Knowledge: Integrates with knowledge bases, like vector databases.

  • Tools: Connects to external tools (web search, financial data, etc.).

  • Reasoning: Some experimental support for step-by-step reasoning.

Why Use Phidata?

  • Simple Start: You can get an agent up and running in just a few lines of code. Example below.

  • Flexible & Powerful: Create complex agents using multiple tools, with structured output and even teams of collaborating agents.

  • Multi-Modal by Default: It naturally handles different data types, not just text.

  • Agentic RAG: Instead of always stuffing retrieved context into the prompt, the agent decides when to query its knowledge base ("Agentic RAG"), improving quality and saving tokens.

  • Nice UI: Phidata provides a built-in web UI (Agent Playground) to easily test and interact with your agents.

  • Built-in Debugging and Monitoring: Integrated debugging tools and session tracking, making it easier to refine and improve your agents.

Example: Web Search Agent

Here's a simple example of a web search agent using phidata:

from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo

# An agent backed by GPT-4o that can search the web via DuckDuckGo
web_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],  # give the agent a web search tool
    instructions=["Always include sources"],
    show_tool_calls=True,  # show which tools the agent calls in the output
    markdown=True,
)

# Stream the answer to the console as it is generated
web_agent.print_response("Tell me about OpenAI Sora?", stream=True)

Other features:

  • Multi-Agent Teams: Combine multiple agents to solve complex tasks together.

  • Structured Outputs: Get your agent's outputs as structured data in Pydantic models (see the sketch after this list).

  • Reasoning Agents: Agents that think step by step.

  • RAG Agents: Agents with knowledge retrieval that saves tokens.
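
To show what structured outputs look like in practice, here's a minimal sketch using phidata's response_model parameter. The schema and prompt are made up for illustration:

from pydantic import BaseModel, Field

from phi.agent import Agent
from phi.model.openai import OpenAIChat

# A made-up schema - the agent returns an instance of it instead of free text
class MovieIdea(BaseModel):
    title: str = Field(..., description="Catchy movie title")
    genre: str = Field(..., description="Primary genre")
    logline: str = Field(..., description="One-sentence pitch")

structured_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    description="You generate movie ideas.",
    response_model=MovieIdea,  # ask for a Pydantic model instead of plain text
)

run = structured_agent.run("A heist movie set in a data center")
print(run.content)  # a MovieIdea instance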

Top News

Google Releases Gemini 2.0 Flash - Faster, Smarter, and Ready for Agents

This week, Google delivered a major update with the release of Gemini 2.0, and we're particularly excited about it. This new family of models is designed for the "agentic era," and it's showing significant improvements in speed and efficiency.

One thing to note, though: the quality of the model itself - the ‘intelligence part’ - is not that much improved compared to models we already have, like Claude 3.5 Sonnet. The major innovation is providing state-of-the-art quality with high efficiency. What we said earlier - that AI will most likely hit a ceiling when it comes to ‘intelligence’, but that we'll see dramatic improvements in efficiency - seems to be coming true.

Why We're Excited:

  • Blazing Fast: Gemini 2.0 Flash, the first model in the 2.0 family, is a "workhorse" model designed for low latency. It's reported to be twice as fast as the previous 1.5 Pro model while outperforming it on key benchmarks.

  • Highly Efficient: Speed often means efficiency, and Gemini 2.0 is no exception. This suggests lower resource consumption and cost, which is good news for everyone.

  • Impressive Performance: Despite its speed, Gemini 2.0 delivers enhanced performance. It's designed for multimodality and complex tasks, suggesting a significant leap in capabilities.

  • Native Multimodality: Gemini 2.0 now handles multimodal output like images and steerable text-to-speech audio, in addition to its strong multimodal input capabilities - a truly multimodal model. The real-time video input capabilities in particular are mind-blowing and better than anything we had before.

  • Tool Use: It can natively call tools like Google Search, run code, and use third-party functions, opening the door for practical, action-oriented applications.

  • Affordability: While not stated explicitly, its efficiency suggests that using Gemini 2.0 will be cost-effective, making it a good choice for many applications.

Key Takeaways:

  • Agents are Here: Google is clearly pushing towards building agentic AI with Gemini 2.0's capabilities. (Read the full announcement below for more on that)

  • Practical Impact: Gemini 2.0 is already being integrated into Google products like Search, with more integrations coming soon.

  • Developer Focus: Google is making Gemini 2.0 available to developers via Google AI Studio and Vertex AI (see the quick-start sketch below).
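
If you want to try it yourself, here's a minimal quick-start sketch. It assumes the google-genai Python SDK and the experimental model ID Google used at launch; check the official docs for the current names:

from google import genai
from google.genai import types

# Assumes a Gemini API key from Google AI Studio
client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # experimental Gemini 2.0 Flash model ID at launch
    contents="Summarize what 'agentic AI' means in two sentences.",
    config=types.GenerateContentConfig(
        # Native tool use: let the model ground its answer with Google Search
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)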

Why This Matters:

Gemini 2.0 isn't just another model upgrade; in fact, the model's intelligence isn't that much of a leap. But the fact that it's a state-of-the-art model at this speed and cost point is amazing.

Furthermore, the Gemini 2.0 release also shows that Google is back in the AI game and that the whole AI community is moving away from ever-better models at any cost. We see more focus on the application layer (say, agents) and on the efficiency of the models.

Opinion: Beyond the Buzzword - Are "Agents" Just Automation in a Fancy Suit?

Lately, the term "AI agent" has been everywhere. It's the new hot thing, the buzzword du jour. And while I'm genuinely excited about the potential of agentic AI, I think we need to take a step back and cut through the hype.

Let's be honest: "agent" has become a marketing term as much as a technical one. It's slapped onto everything to make it sound cutting-edge and revolutionary. But when you strip away the jargon, what are we really talking about?

Agents: The Application Layer We've Been Waiting For

Here's my take: agents are, at their core, the application layer of AI models. For a long time, the focus has been on the models themselves – making them bigger, faster, more capable. And while model improvements are important, it feels like we're hitting a point of diminishing returns, or at least a temporary plateau in terms of raw model quality.

The exciting part is that we're finally shifting our attention to how we actually use these models. And that's where agents come in. They're about building applications on top of the foundation that these powerful models provide. Agents represent a move from "look how smart my model is" to "look what my AI can actually do for you."

The Automation Connection: Nothing New Under the Sun?

Now, here's where I might ruffle some feathers. Some people are acting like agents are this completely new, unprecedented thing. But let's be real: agents are, in many ways, just a new form of automation. We've been automating tasks with software for decades. Arguably not even a new form - just a form of automation.

Think about it: agents are designed to take actions, make decisions, and complete tasks with minimal human intervention. That's the same goal we've always had with automation. The difference is that AI, for example with models like Gemini 2.0, allows us to automate more complex tasks, handle more nuanced situations, and deal with unstructured data in ways that weren't possible before.

So, are agents just "automation" with a fresh coat of AI paint?

Yes and no. Yes, in the sense that the underlying goal is the same: using technology to make our lives easier and more efficient. No, in the sense that AI opens up entirely new possibilities for what we can automate and how we can automate it.

A Word of Caution: Don't Skip the Basics

Here's my final point, and it's an important one. If your company has been lagging on basic IT and data automation, jumping straight into the deep end of multi-agent AI systems might not be the wisest move. Actually, it's a stupid one.

There are likely tons of low-hanging fruit – processes that could be streamlined and automated with good old-fashioned software and data engineering. Don't let the AI hype distract you from these fundamental improvements. Build a solid automation foundation first, then explore how AI can take you to the next level. Don't start with AI if you haven't done your homework yet.

That being said, don't let this statement drive you into lethargy. Automation is a must to stay competitive. If you can't think of ways to automate "the boring stuff" but you can think of how to use "agents" - well, then use agents. I'm just saying that automation - with or without agents - offers tremendous potential. And it did so 10 years ago, too.

The Bottom Line:

Agents are a promising development, but they're not magic. They're a natural evolution of automation, powered by advancements in AI models. Let's focus on practical applications, cut through the marketing buzz, and build agentic systems that actually solve real-world problems. And let's not forget the importance of basic automation - it's often the best place to start. It's time to take the next step: building useful applications. Building useful agents.

Also in the news

OpenAI Releases ChatGPT Canvas and ChatGPT Projects - two of the biggest updates to ChatGPT since its launch

OpenAI released two major improvements to their highly successful ChatGPT chatbot application.

First, they released a new tool called “Canvas” - a way to collaborate more interactively with the AI model. Instead of just a chat window, you now get an editable rich-text and code editor, where you can highlight individual sentences, change parts of the code, or ask for comments on specific text and code passages. Read more about this here.

The second major enhancement is “ChatGPT Projects”. Projects lets you upload files and organize chats as part of a ‘project’ - any chat within a project can access all of its materials. While it may sound trivial at first, this is a major UX improvement, as you don't need to upload the same material again and again when starting a new chat. Read more about this feature here.

Andrew Ng Explores the Rise of AI Agents

Andrew Ng - one of the most renowned scientists in machine learning and AI (founder of Google Brain and former Chief Scientist at Baidu) - gave a captivating talk at the BUILD 2024 conference.

He talked about a multitude of highly interesting topics:

  • The AI technology stack

  • Why AI applications are now more important than AI models

  • What AI agents are and what an agentic AI workflow looks like

  • Challenges in the field

If you have a minute or two, give the video a go - it'll be worth it.

Tip of the week

OWASP Top 10 Security Areas for LLM Applications

The OWASP Top 10 for Large Language Model Applications started in 2023 as a community-driven effort to highlight and address security issues specific to AI applications. Since then, the technology has continued to spread across industries and applications, and so have the associated risks. As LLMs are embedded more deeply in everything from customer interactions to internal operations, developers and security professionals are discovering new vulnerabilities—and ways to counter them.

The latest edition, “OWASP Top 10 for LLM Applications 2025”, was just released - and it's a must-read for any developer or product manager building AI applications.

You can download the issue here - for free.

We hope you liked our newsletter - stay tuned for the next edition. If you need help with your AI tasks and implementations, let us know. We are happy to help!