Pondhouse Data AI - Tips & Tutorials for Data & AI 26
Auto-generate MCP servers from existing APIs | AlphaEvolve Revolutionizes Algorithm Design | Real-time Video Generation with LTX-Video | GPT-image-1 Takes AI Image Gen to New Heights

Hey there,
Google's AlphaEvolve is discovering algorithms that boost efficiency across data centers and AI training, while their new Gemini 2.5 Pro I/O Edition is setting new benchmarks in code generation.
We also dive into OpenAI's GPT-image-1, which brings unprecedented text rendering and editing capabilities to AI image creation.
Plus, learn how to automate MCP server creation and generate impressive videos in real-time with LTX-Video.
As always, tools, news, and tips to help you build smarter with AI—let’s get into it.
Cheers, Andreas & Sascha
In today's edition:
📚 Tutorial of the Week: Automate MCP Server Creation from OpenAPI and FastAPI
🛠️ Tool Spotlight: OpenAI GPT-image-1 - The Next Evolution in AI Image Generation
📰 Top News: AlphaEvolve - Google's Gemini-Powered Agent Discovers and Optimizes Algorithms
💡 Tips: Create High-Quality AI Videos with LTX-Video
Let's get started!
Find this newsletter helpful?
Please forward it to your colleagues and friends - it helps us tremendously.
Tutorial of the week
Automatically wrap existing APIs in MCP servers - using FastMCP
MCP servers provide a standardized API optimized for LLMs, but creating a wrapper around your existing APIs can feel redundant. Our tutorial shows you how to automatically generate an MCP server from your existing OpenAPI specifications or FastAPI applications, eliminating duplicate work.
Generate an MCP Server from OpenAPI in Minutes
With FastMCP (version 2.0.0+), you can transform any API with an OpenAPI specification into an MCP server with just a few lines of code:
import httpx
from fastmcp import FastMCP
# Connect to your existing API
api_client = httpx.AsyncClient(base_url="https://api.my-api-url.com")
# Load your OpenAPI spec
spec = {...} # Your OpenAPI specification as a Python dict
# Create an MCP server from your OpenAPI spec
mcp = FastMCP.from_openapi(openapi_spec=spec, client=api_client)
if __name__ == "__main__":
    mcp.run()
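Side note: if your API already serves its specification over HTTP (FastAPI exposes it at /openapi.json by default), one way to fill in the spec placeholder is to fetch it from the running service - a small sketch, with the URL standing in for your own API:
import httpx
# Fetch the OpenAPI document from the running API; adjust the path to
# wherever your API publishes its spec (/openapi.json is FastAPI's default)
spec = httpx.get("https://api.my-api-url.com/openapi.json").json()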
FastMCP intelligently maps your API routes to appropriate MCP components:
GET endpoints without path parameters become resources
GET endpoints with path parameters become resource templates
POST, PUT, DELETE endpoints become tools
FastAPI Integration: Even Simpler
If you're using FastAPI, the process is even more straightforward:
from fastmcp import FastMCP
from fastapi import FastAPI
# Your existing FastAPI app
app = FastAPI()
@app.get("/items")
def list_items():
    return [{"id": 1, "name": "Item 1"}, {"id": 2, "name": "Item 2"}]
# Create and run your MCP server
mcp = FastMCP.from_fastapi(app=app)
mcp.run()
This integration runs directly on the ASGI transport with no additional overhead and supports all FastAPI features, including authentication.
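To sanity-check the generated server without starting a separate process, FastMCP also ships an in-memory client that can connect straight to the server object. A minimal sketch, assuming the Client API of FastMCP 2.x - verify the method names against the FastMCP docs:
import asyncio
from fastmcp import Client
async def inspect_server():
    # Connect to the MCP server in-process, no network transport required
    async with Client(mcp) as client:
        tools = await client.list_tools()
        resources = await client.list_resources()
        print("Tools:", [tool.name for tool in tools])
        print("Resources:", [resource.uri for resource in resources])
asyncio.run(inspect_server())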
Almost unbelievable, but that is all you need to create your own MCP server from existing APIs. For more information, as well as some background on why you'd need an MCP server in the first place, visit our latest article:
Tool of the week
Tool Spotlight: OpenAI GPT-image-1 - The Next Evolution in AI Image Generation
OpenAI has raised the bar for AI image generation with their latest model, GPT-image-1. Unlike its DALL-E predecessors, this is a natively multimodal language model that brings superior instruction following, text rendering, and real-world knowledge to image creation.
Key Capabilities
The Image API provides three distinct endpoints:
Generations: Create images from text prompts
Edits: Modify existing images or generate new ones using reference images
Inpainting: Replace specific parts of an image using transparent masks
What Sets GPT-image-1 Apart
Superior instruction following: More precisely follows detailed prompts
Better text rendering: Significantly improved text clarity in generated images
Advanced editing capabilities: More accurate and detailed image modifications
Real-world knowledge integration: Leverages world knowledge for more contextually relevant images
Implementation Example
Here's how to generate an image with GPT-image-1:
from openai import OpenAI
import base64
client = OpenAI()
prompt = "A children's book drawing of a veterinarian using a stethoscope to listen to the heartbeat of a baby otter."
result = client.images.generate(
    model="gpt-image-1",
    prompt=prompt
)
image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)
with open("otter.png", "wb") as f:
    f.write(image_bytes)
Advanced Features
The model offers substantial customization options, combined in the sketch after this list:
Size control: Square (1024×1024), portrait (1024×1536), or landscape (1536×1024)
Quality settings: Low, medium, or high (affecting token usage and cost)
Transparent backgrounds: Create images with transparency (PNG/WebP only)
Moderation control: Adjust content filtering strictness with "auto" or "low" settings
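Putting several of these options together in a single call - a brief sketch; the parameter names (size, quality, background, moderation) follow the current Images API documentation, so double-check them against OpenAI's docs before relying on them:
from openai import OpenAI
client = OpenAI()
# Portrait format, high quality, transparent background, relaxed moderation
result = client.images.generate(
    model="gpt-image-1",
    prompt="A flat sticker-style illustration of a smiling otter",
    size="1024x1536",
    quality="high",
    background="transparent",
    moderation="low",
)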
Practical Applications
GPT-image-1 shines for use cases requiring:
Complex visual compositions following specific instructions
Images with accurate text elements
Product mockups and marketing visuals
Custom illustrations with detailed requirements
Image editing with reference materials
Considerations
Be aware of potential limitations including:
Latency: Complex prompts may take up to 2 minutes
Cost structure: Based on token usage (input text + output image tokens)
API access: Requires organization verification in some cases
For developers looking to implement state-of-the-art image generation capabilities with more precision and real-world knowledge, GPT-image-1 offers a significant upgrade over previous image generation models.
Example: Combine multiple products into one image
In this example, we'll use 4 input images to generate a new image of a gift basket containing the items in the reference images.
import base64
from openai import OpenAI
client = OpenAI()
prompt = """
Generate a photorealistic image of a gift basket on a white background
labeled 'Relax & Unwind' with a ribbon and handwriting-like font,
containing all the items in the reference pictures.
"""
result = client.images.edit(
    model="gpt-image-1",
    image=[
        open("body-lotion.png", "rb"),
        open("bath-bomb.png", "rb"),
        open("incense-kit.png", "rb"),
        open("soap.png", "rb"),
    ],
    prompt=prompt
)
image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)
# Save the image to a file
with open("gift-basket.png", "wb") as f:
    f.write(image_bytes)

The 4 base images and the result
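The third endpoint from the list above, inpainting, follows the same pattern as the edit call. A hedged sketch, assuming the mask parameter of images.edit; the filenames are placeholders, and the mask must be the same size as the image with transparent pixels marking the region to replace:
import base64
from openai import OpenAI
client = OpenAI()
result = client.images.edit(
    model="gpt-image-1",
    image=open("living-room.png", "rb"),
    mask=open("mask.png", "rb"),  # transparent where the new content should go
    prompt="Add a potted monstera plant in the empty corner",
)
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("living-room-edited.png", "wb") as f:
    f.write(image_bytes)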
Top News of the week
Top News: AlphaEvolve - Google's Gemini-Powered Agent Discovers Groundbreaking Algorithms
Google has unveiled AlphaEvolve, a sophisticated AI system that combines their Gemini large language models with an evolutionary framework to discover and optimize algorithms across mathematics and computing.
How It Works
AlphaEvolve uses an ensemble approach with complementary LLMs:
Gemini Flash generates a broad range of ideas quickly
Gemini Pro provides deeper, more insightful suggestions
The system proposes solutions as code, then uses automated evaluators to verify and score them. The highest-performing solutions are evolved through further iterations, creating increasingly refined algorithms.
In simpler terms: AlphaEvolve is a “self-learning” system that improves through experimentation.
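To make that concrete, here is a deliberately simplified toy loop - not Google's implementation - illustrating the propose-evaluate-keep-the-best pattern, with the LLM proposal step stubbed out by random mutations:
import random
def propose_variants(parent: str, n: int = 4) -> list[str]:
    # Stand-in for the LLM step: in AlphaEvolve, Gemini models rewrite the
    # parent program; here we only tag the parent with a random mutation id
    return [f"{parent}-mut{random.randint(0, 999)}" for _ in range(n)]
def evaluate(candidate: str) -> float:
    # Stand-in for the automated evaluator that scores a candidate solution
    return random.random()
best, best_score = "initial-program", 0.0
for generation in range(10):
    for candidate in propose_variants(best):
        score = evaluate(candidate)
        if score > best_score:  # keep only improvements
            best, best_score = candidate, score
    print(f"generation {generation}: best score {best_score:.3f}")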

Real-World Impact
AlphaEvolve has already been deployed across Google's computing ecosystem with measurable results:
Data Center Optimization: Recovered 0.7% of Google's worldwide compute resources through improved scheduling in their Borg system
Hardware Acceleration: Improved circuit design for upcoming Tensor Processing Units
AI Training Efficiency: Sped up matrix multiplication in Gemini's architecture by 23%, reducing training time by 1%
GPU Instruction Optimization: Achieved a 32.5% speedup for the FlashAttention kernel in Transformer models

Diagram showing how AlphaEvolve helps Google deliver a more efficient digital ecosystem, from data center scheduling and hardware design to AI model training.
Mathematical Breakthroughs
Beyond practical applications, AlphaEvolve has tackled fundamental mathematical problems:
Discovered an improved algorithm for multiplying 4×4 complex-valued matrices using 48 scalar multiplications, advancing beyond Strassen's 1969 algorithm
Made progress on approximately 20% of the 50+ open mathematical problems it was tested on
Established a new lower bound for the kissing number problem in 11 dimensions with 593 spheres
Future Availability
Google plans to make AlphaEvolve available through an Early Access Program for academic users initially, with potential broader availability later. The company expects the system's capabilities to continue improving alongside advancements in large language models.
Disadvantages
While the results are close to groundbreaking - just consider that we got a new algorithm for a mathematical problem that has been open for decades - there is one notable limitation: the system works well for tightly defined problem spaces with clear evaluation criteria (so it can tell good solutions from bad ones). Whether such an approach is useful in more ambiguous areas - say, unclear customer requirements - remains to be seen.
Read Google's full announcement here:
Also in the news
Google's next W: Gemini 2.5 Pro I/O Edition Claims AI Coding Crown
Google DeepMind has released a new version of Gemini 2.5 Pro dubbed the "I/O Edition," which has dethroned Anthropic's Claude 3.7 Sonnet as the top-performing AI coding model. The updated model scored 1499.95 on the WebDev Arena Leaderboard, significantly outperforming Claude's 1377.10 score.
Available immediately to developers through Google AI Studio and Vertex AI without pricing changes ($1.25/$10 per million tokens in/out), this version shows remarkable improvements in generating functional web applications and interactive interfaces from single prompts.
Developers are already praising its capabilities, with Cognition's Silas Alberti noting it was "the first model to successfully complete a complex refactoring of a backend routing system," while Cursor CEO Michael Truell reported "a marked decrease in tool call failures." The upgrade comes strategically ahead of Google's annual I/O developer conference scheduled for May 20-21.
Microsoft's ADeLe: A New Framework for Predicting AI Model Performance
Microsoft researchers have developed ADeLe (annotated-demand-levels), a novel approach to AI model evaluation that goes beyond measuring accuracy to predict performance on unfamiliar tasks and explain why models succeed or fail.
The framework assesses AI models across 18 different cognitive and knowledge-based abilities, rating tasks from 0-5 based on how much they demand each capability. By comparing what a task requires with what a model can deliver, ADeLe creates comprehensive "ability profiles" that reveal each AI's specific strengths and weaknesses.
When tested across 16,000 examples from 63 tasks, the system achieved approximately 88% accuracy in predicting the performance of models like GPT-4o and LLaMA-3.1-405B. The research also uncovered significant limitations in current benchmarking approaches, finding that many popular AI tests either don't measure what they claim or only cover a limited range of difficulty levels.
This breakthrough could transform how AI systems are evaluated before deployment, enabling researchers and developers to anticipate potential failures and understand model capabilities with much better clarity.
Read their announcement:
Tip of the week
LTX-Video: Stunning hyper-realistic video generation model
Want to generate impressive AI videos without waiting hours for processing? This week, we're exploring LTX-Video, the first DiT-based video generation model capable of creating high-quality videos in real-time.
It produces 30 FPS videos at 1216×704 resolution faster than they take to watch, and it supports both text-to-video and image-to-video generation.
Make sure to check out their amazing example videos over at their GitHub page:

Quick Start Options
You can start generating videos immediately through these online interfaces:
LTX-Studio for image-to-video (two model options)
Fal.ai for both text-to-video and image-to-video
Replicate for both generation methods
Prompt Engineering for Better Results
The secret to getting great results lies in how you structure your prompts:
Start with the main action in a single, clear sentence
Add specific movement details - be explicit about gestures and motion
Describe appearances precisely rather than abstractly
Include environmental details and background elements
Specify camera angles and movements for more cinematic control
Keep descriptions literal and chronological - think like a cinematographer
For example, instead of "A beautiful sunset at the beach," try: "A wide-angle shot of golden sunlight reflecting on gentle waves at a sandy beach. The camera slowly pans right as the orange sun descends toward the horizon, casting long shadows across the rippled sand."
Local Installation
If you prefer running it locally:
git clone https://github.com/Lightricks/LTX-Video.git
cd LTX-Video
python -m venv env
source env/bin/activate
python -m pip install -e ".[inference-script]"
# Basic text-to-video generation
python inference.py --prompt "Your detailed prompt here" --height 704 --width 1216 --num_frames 121 --seed 42 --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
Pro Tip
For the absolute best results, use the ComfyUI integration by following the setup at ComfyUI-LTXVideo. This provides greater control over generation parameters and supports advanced workflows like video extension and multi-condition generation.
We hope you liked our newsletter - stay tuned for the next edition. If you need help with your AI tasks and implementations, let us know. We are happy to help!