Pondhouse Data AI - Edition 12
OpenAI’s Realtime API redefines live interactions | Crawling websites for LLMs made easy | Tips on generating structured JSON with LLMs | DeepSeek’s Janus challenges DALL-E

Hey there,
We’re excited to bring you the 12th edition of our Pondhouse AI newsletter — your source for tips and tricks around AI and LLMs. Whether you want to learn about AI concepts, use AI tools effectively, or see inspiring examples, we’ve got you covered.
Let’s get started!
Cheers, Andreas & Sascha
In today's edition:
News: OpenAI launches the Realtime API for low-latency, multimodal applications
Tutorial: FireCrawl – Crawling websites for LLMs with ease.
Also in the news: DeepSeek’s Janus model outperforms in multimodal tasks, and DuckDB adds LLM integration via prompt().
Tip of the week: Use OpenAI’s structured outputs with Pydantic models for valid JSON.
Paper of the week: Agent-as-a-Judge – A new way to evaluate agentic systems.
Find this Newsletter helpful?
Please forward it to your colleagues and friends - it helps us tremendously.
Top News
OpenAI Releases Realtime API

OpenAI has launched the Realtime API, helping developers to build low-latency, multimodal applications that support both text and audio as input and output. This API is particularly valuable for creating interactive experiences that demand immediate feedback, such as live translation, real-time conversations, and instant audio analysis.
The major innovation is that the Realtime API does not place a speech-to-text model like Whisper in front of GPT-4o; instead, the LLM itself handles the voice input directly. This drastically reduces latency.
The Realtime API stands out with several key features:
Speech-to-Speech: Direct audio input and output without a text intermediary, ensuring faster, more natural interactions.
Natural Voices: The API supports natural-sounding, steerable voices, which can adapt to specific tones, laugh, whisper, or change emotional expressions in real time.
Multimodal Outputs: Developers can use the API for both text-based processing and real-time audio, opening up possibilities for complex, interactive systems.
For those interested in seeing an example in action, the open-source TEN Agent project has integrated OpenAI’s Realtime API within its system. This showcases how the API can be effectively implemented in real-world agent systems for multimodal applications.
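To make the interaction model concrete, here is a minimal sketch of how a client configures a Realtime API session. The API is consumed over a WebSocket, and a client typically sends a "session.update" event after connecting; the field names below follow OpenAI's published reference, but treat the specific values as illustrative and check the current documentation before relying on them.

```python
import json

def build_session_update(voice: str = "alloy") -> str:
    """Build the JSON payload for a Realtime API session.update event."""
    event = {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],  # request both text and speech output
            "voice": voice,                   # one of the API's preset voices
            "instructions": "You are a friendly live translator.",
            "input_audio_format": "pcm16",    # raw 16-bit PCM audio in
            "output_audio_format": "pcm16",   # raw 16-bit PCM audio out
        },
    }
    return json.dumps(event)

payload = build_session_update()
print(payload)
```

The payload would then be sent over the open WebSocket connection; subsequent events stream audio chunks in and out over the same channel.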
Tutorials & Use Cases
FireCrawl – Crawling Websites for LLMs
Web crawling is a key process when gathering data for large language models (LLMs), especially when you need to create datasets from websites. In this tutorial, we’ll guide you through using FireCrawl, a web crawler specifically designed to make this process efficient and effective for LLM projects.
FireCrawl supports crawling both static and dynamic content (like those rendered with JavaScript), making it a robust tool for modern websites. It automates the transformation of web content into structured formats, such as Markdown or HTML, making it easy to feed directly into machine learning pipelines.
Step-by-Step Guide:
Sign Up and Generate API Key: Create a FireCrawl account and generate an API key from the dashboard.
Install FireCrawl: Install the FireCrawl Python client using the following command:
pip install firecrawl-py
Crawl a Single Page: Once you’ve set up your API key, here’s a simple code snippet to crawl a single URL without subpages:
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="your_api_key")

# Scrape a single URL
scrape_result = app.scrape_url('https://example.com', params={'formats': ['markdown', 'html']})
print(scrape_result)
This will return the website in both Markdown and HTML formats, making it ready for direct input into language models.
Crawl an Entire Website: If you need to crawl a website with multiple pages, use this code:
crawl_status = app.crawl_url(
    'https://example.com',
    params={
        'limit': 100,
        'scrapeOptions': {'formats': ['markdown']}
    },
    poll_interval=10
)
print(crawl_status)
This command will recursively crawl the site, polling every 10 seconds, and return the content in Markdown format.
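Once the crawl finishes, each page's content can be pulled out for an LLM pipeline. The sketch below assumes the result is a dict with a "data" list of pages, each carrying "markdown" text and a "metadata" dict with the source URL, which is the shape FireCrawl's docs describe; verify it against the SDK version you install. A mocked result stands in for the real API call so the snippet runs on its own.

```python
def extract_pages(crawl_status: dict) -> list[tuple[str, str]]:
    """Return (url, markdown) pairs for every crawled page."""
    pages = []
    for page in crawl_status.get("data", []):
        url = page.get("metadata", {}).get("sourceURL", "")
        markdown = page.get("markdown", "")
        if markdown:
            pages.append((url, markdown))
    return pages

# Mocked result standing in for app.crawl_url(...)
mock_status = {
    "status": "completed",
    "data": [
        {"markdown": "# Home", "metadata": {"sourceURL": "https://example.com"}},
        {"markdown": "# About", "metadata": {"sourceURL": "https://example.com/about"}},
    ],
}
print(extract_pages(mock_status))  # [('https://example.com', '# Home'), ('https://example.com/about', '# About')]
```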
Why FireCrawl is Ideal for LLM Projects:
Dynamic Content Handling: Works seamlessly with sites using JavaScript for rendering, making it suitable for modern web applications.
Automated Content Transformation: Converts crawled content into structured formats like Markdown or JSON, perfect for LLM ingestion.
Scalability: Whether you need data from a single page or an entire domain, FireCrawl adapts to your project's scale.
Curious about crawling websites asynchronously or integrating FireCrawl with LangChain? Dive deeper in the full blog post here.
Also in the news
DeepSeek’s Janus Model: A New Approach to Multimodal AI
Janus is a groundbreaking unified multimodal model designed for both visual understanding and generation. Unlike traditional models, Janus decouples visual encoding into separate pathways for each task, improving performance without complicating the architecture.
What Makes Janus Special:
Decoupled Visual Encoding: Separates the processes of understanding and generating images, allowing for better task-specific performance.
Unified Framework: Uses a single transformer for processing, maintaining simplicity and flexibility.
Next-Level Performance: Trained on 500B text tokens, Janus matches or exceeds task-specific models in both multimodal understanding and image generation.
Discover more about Janus here.
DuckDB Adds 'prompt()' Function: Bringing LLMs to SQL
DuckDB, the fast and efficient SQL engine, has introduced a new prompt() function that enables direct integration of large language models (LLMs) into SQL queries. This allows users to incorporate LLM-powered tasks like text summarization, classification, and entity extraction directly within SQL workflows.
Key Features:
Seamless Integration: Directly invoke LLMs from SQL without switching between different tools or environments.
Unstructured to Structured Data: Transform unstructured text into structured formats on the fly, ideal for tasks like summarizing product reviews or extracting data from free-form text.
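As a sketch of how prompt() slots into an existing workflow, the helper below composes such a query from Python. The function name and SELECT shape follow DuckDB's announcement, but the table and column names are made up for illustration, and the query is only composed here, not executed, since prompt() runs against a hosted LLM.

```python
def summarize_reviews_sql(table: str, text_col: str) -> str:
    """Compose a SQL query that asks the LLM to summarize each row's text."""
    return (
        f"SELECT id, prompt('Summarize this review in one sentence: ' || {text_col}) "
        f"AS summary FROM {table}"
    )

query = summarize_reviews_sql("reviews", "body")
print(query)
```

The same pattern extends to classification or entity extraction: swap the instruction inside the prompt() call while keeping the surrounding SQL untouched.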
A real-world use case of this functionality is discussed in our blog post, where we demonstrate how to use this feature to bring AI directly into your PostgreSQL database.
Explore more about this powerful new feature here.
Surya Model Release by DataLab
DataLab has unveiled Surya, a state-of-the-art model that tackles PDF layout recognition, OCR, and reading order detection. Its layout recognition capabilities make it a strong tool for anyone converting documents like scanned PDFs into structured, LLM-readable text.
Why It Matters:
Surya’s advanced layout recognition makes it an asset for data extraction from complex PDFs.
It’s ideal for preparing documents for downstream AI processes.
Discover more about Surya here.
Tip of the week
Using OpenAI’s Structured Output Feature with Pydantic Models
One of OpenAI's most valuable features is the ability to generate structured outputs, such as valid JSON, which is particularly useful when working with data-rich applications. By incorporating Pydantic models, you can enforce strict data validation and schemas, ensuring consistent and reliable outputs from your AI models.
Why Use Pydantic with OpenAI?
Pydantic allows you to define data models and constraints, which OpenAI can then follow when generating structured outputs. This helps in avoiding incorrect data types or missing fields.
Example: Ensuring Valid JSON Output
Here’s a simple example of how you can use Pydantic with OpenAI to generate valid JSON responses.
from pydantic import BaseModel
from openai import OpenAI
client = OpenAI()
class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed
This ensures that your output aligns with the schema you’ve defined using Pydantic, eliminating concerns about data integrity.
Benefits:
Data Validation: Helps ensure output matches expected formats.
Error Handling: Simplifies the process of handling invalid data generated by the model.
Structured Data: Ensures the AI output is usable directly in applications that require strict data formats.
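To see what the schema buys you without calling the API, the same CalendarEvent model can validate any candidate payload locally, so malformed model output is caught instead of silently flowing downstream. The model is repeated here so the snippet runs on its own; it assumes Pydantic v2 (model_validate, error_count).

```python
from pydantic import BaseModel, ValidationError

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

# A well-formed payload parses into a typed object
good = CalendarEvent.model_validate(
    {"name": "Science Fair", "date": "Friday", "participants": ["Alice", "Bob"]}
)
print(good.participants)  # ['Alice', 'Bob']

# A payload missing a required field is rejected with a clear error
try:
    CalendarEvent.model_validate({"name": "Science Fair", "date": "Friday"})
except ValidationError as e:
    print("rejected:", e.error_count(), "missing field(s)")
```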
For more information on OpenAI’s structured outputs and how to work with Pydantic models, check the full guide here.
Paper of the week
Agent-as-a-Judge Framework for Evaluating Agentic Systems

The paper introduces the Agent-as-a-Judge framework, a novel approach to evaluating agentic systems—systems capable of solving complex tasks in a step-by-step manner, much like a human would. Current evaluation techniques tend to focus solely on the final outcomes, ignoring the intermediate steps that are crucial for a thorough assessment. This paper extends the existing LLM-as-a-Judge concept, enabling more dynamic and real-time feedback throughout the problem-solving process.
The authors apply this framework to the task of code generation using the newly introduced DevAI benchmark, which includes 55 realistic tasks annotated with 365 hierarchical user requirements. By using Agent-as-a-Judge, they demonstrate how it outperforms the previous LLM-as-a-Judge method and even matches human-level evaluations.
Key Contributions:
Intermediate Feedback: Agent-as-a-Judge provides step-by-step evaluations, addressing the limitations of outcome-only evaluation methods.
New Benchmark: DevAI consists of 55 AI development tasks, allowing for a more comprehensive and real-world evaluation of agentic systems.
Improved Accuracy: The framework achieves closer alignment with human evaluators compared to previous methods.
Learn more about this breakthrough in agentic evaluation here.
We hope you liked our newsletter and that you stay tuned for the next edition. If you need help with your AI tasks and implementations, let us know. We are happy to help!