Pondhouse Data AI - Tips & Tutorials for Data & AI #44

Real-Time Coding with OpenAI Codex-Spark | Figma Design Automation | GLM-5: Affordable High-Performance Coding Model | Agentic Document Extraction Tutorial

Hey there,

This week’s edition is packed with exciting advancements in AI and developer tools. We spotlight OpenAI’s GPT-5.3-Codex-Spark, a real-time coding model that’s reshaping interactive workflows, and explore Claude Code to Figma, which bridges the gap between code-driven UI and collaborative design. Dive into DeepLearning.AI’s Document AI course to unlock agentic document extraction, and discover how GLM-5 empowers advanced engineering and coding tasks with open-source flexibility. Plus, we cover new frameworks for AI task delegation, plugin marketplaces, and generative music with Google Lyria 3.

Enjoy the read!

Cheers, Andreas & Sascha

In today's edition:

📚 Tutorial of the Week: Agentic Document Extraction with DeepLearning.AI

🛠️ Tool Spotlight: Claude Code to Figma UI integration

📰 Top News: OpenAI launches GPT-5.3-Codex-Spark model

💡 Tip: Quickly deploy GLM-5 for coding tasks

Let's get started!

Tutorial of the week

Unlocking Agentic Document Extraction with AI

DeepLearning.AI’s new short course, “Document AI: From OCR to Agentic Doc Extraction,” offers a practical deep dive into advanced document processing techniques. If you’re tired of traditional OCR’s limitations, this course shows how AI agents can extract structured data from complex documents—transforming unstructured PDFs, forms, and charts into actionable information.

  • Covers the evolution of OCR, from early shape-based classifiers to modern deep learning models, highlighting why traditional methods often fail with tables, merged cells, and multi-column layouts.

  • Teaches you to build agentic document processing pipelines using LandingAI’s Agentic Document Extraction (ADE) framework, enabling reliable parsing and visual grounding of fields.

  • Includes hands-on labs for extracting text, tables, and charts, converting documents to Markdown and JSON, and integrating ADE into Retrieval-Augmented Generation (RAG) applications.

  • Guides you through deploying event-driven document processing pipelines on AWS, automating extraction and loading parsed data into knowledge bases.

  • Designed for AI builders, developers, and data scientists with basic Python familiarity who want to automate and scale document intelligence workflows.

If you work with financial invoices, medical records, or research papers, this course will equip you with the skills to unlock value from unstructured data. Dive in to learn cutting-edge techniques for intelligent document extraction.
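To make the pipeline idea concrete, here is a minimal sketch of the downstream step: feeding extraction output into a simple retrieval function for a RAG-style lookup. The chunk format and the naive keyword scorer below are illustrative assumptions, not the actual ADE output schema or a production retriever.

```python
# Hypothetical sketch: routing agentic-extraction output into a retrieval step.
# The Chunk shape below is an assumption for illustration, not the real ADE schema.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str        # Markdown text of the extracted field, table, or paragraph
    page: int        # page the chunk was visually grounded on
    chunk_type: str  # e.g. "text", "table", "chart"

def retrieve(chunks, query, top_k=2):
    """Rank chunks by naive keyword overlap with the query (toy retriever)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(c.text.lower().split())), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:top_k] if score > 0]

chunks = [
    Chunk("Invoice total: $1,240.00", page=1, chunk_type="text"),
    Chunk("| Item | Qty |\n| Widget | 3 |", page=2, chunk_type="table"),
    Chunk("Payment due within 30 days", page=1, chunk_type="text"),
]

hits = retrieve(chunks, "What is the invoice total?")
print(hits[0].text)  # -> Invoice total: $1,240.00
```

In a real deployment you would swap the keyword scorer for embedding similarity and keep the page grounding so answers can cite where in the document they came from.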

Tool of the week

Claude Code to Figma — Instantly convert production code UIs into editable Figma designs

Claude Code to Figma bridges the gap between code-driven UI development and collaborative design iteration. With this new integration, developers can capture live UIs built in Claude Code and import them directly into Figma as fully editable frames—eliminating the tedious process of manual redrawing and enabling seamless transitions between code and design.

  • Instantly converts rendered browser UIs (from production, staging, or localhost) into structured Figma layers, preserving layout, text, and multi-screen flows.

  • Empowers teams to annotate, iterate, and explore design alternatives collaboratively—without requiring context switches or code rewrites.

  • Supports capturing entire user flows in one session, maintaining sequence and context for more effective design reviews and ideation.

  • Enables roundtrip workflows: changes made in Figma can be synced back to code using the Figma MCP server, ensuring alignment between design and development.

  • Accelerates feedback cycles and decision-making by providing a shared, high-fidelity artifact for designers, engineers, and PMs.

Rapidly gaining traction in the design and developer communities, Claude Code to Figma is changing how teams move between code and design. Learn more in the official Figma blog announcement.

Top News of the week

OpenAI Unveils GPT-5.3-Codex-Spark: Real-Time Coding at Unprecedented Speed

OpenAI has launched GPT-5.3-Codex-Spark, its first real-time coding model, marking a significant leap in developer productivity. Designed for ultra-low latency, Codex-Spark delivers over 1000 tokens per second and features a 128k context window, enabling near-instant feedback and rapid iteration for coding tasks. This release is a research preview available to ChatGPT Pro users via the Codex app, CLI, and VS Code extension.

Technically, Codex-Spark is optimized for interactive workflows, making targeted edits and logic tweaks with minimal delay. Powered by Cerebras’ Wafer Scale Engine 3, it leverages a latency-first serving tier, streamlining the request-response pipeline and reducing overhead per client/server roundtrip by 80%. Codex-Spark’s persistent WebSocket connection and optimized Responses API also cut time-to-first-token by 50%, benefiting all OpenAI models. The model is text-only for now, with its own rate limits, and is being rolled out to select API partners for integration testing.
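The latency figures above (time-to-first-token, tokens per second) are easy to measure yourself against any streaming endpoint. Here is a small sketch with a mock token generator standing in for a streamed API response; it is not the real OpenAI client, just an illustration of how these two metrics are computed.

```python
# Illustrative sketch: measuring time-to-first-token (TTFT) and throughput
# from any token iterator. mock_stream is a stand-in for a streaming API
# response, not a real client call.
import time

def mock_stream(tokens, delay=0.001):
    """Yield tokens with a small artificial delay, like a streamed response."""
    for tok in tokens:
        time.sleep(delay)
        yield tok

def measure_stream(stream):
    """Consume a token stream; return (ttft_seconds, tokens_per_second, text)."""
    start = time.perf_counter()
    ttft = None
    pieces = []
    for tok in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # time until the first token
        pieces.append(tok)
    elapsed = time.perf_counter() - start
    return ttft, len(pieces) / elapsed, "".join(pieces)

ttft, tps, text = measure_stream(mock_stream(["def ", "add(a, b): ", "return a + b"]))
print(f"TTFT: {ttft * 1000:.1f} ms, throughput: {tps:.0f} tok/s")
```

Pointing the same measurement at a real streamed response is how you would verify claims like "over 1000 tokens per second" for your own workloads.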

The developer community is already exploring new interaction patterns and use cases enabled by fast inference, with OpenAI planning to expand access and introduce larger, multimodal models in the future. This release sets the stage for a new era of real-time collaboration and high-speed coding.

Also in the news

Google Lyria 3 Brings AI Music Creation to Gemini App

Google has launched Lyria 3, its most advanced generative music model, now integrated into the Gemini app. Users can create custom 30-second tracks with lyrics and AI-generated cover art simply by describing an idea or uploading a photo. All music is watermarked with SynthID for verification, and the feature is available in multiple languages. Lyria 3 aims to make music creation accessible and fun, with enhanced creative controls and responsible AI safeguards.

Anthropic Publishes Real-World Study of AI Agent Autonomy

Anthropic has released a comprehensive analysis of millions of Claude Code sessions and API tool calls, measuring how much autonomy users grant AI agents in practice. The study found that session durations nearly doubled, experienced users increasingly auto-approve actions, and agents are used in emerging domains like healthcare and finance. The research highlights the need for post-deployment monitoring and adaptive oversight as agent capabilities and risks evolve.

Google DeepMind Proposes Formal Framework for AI Task Delegation

A new paper from Google DeepMind introduces a structured framework for intelligent AI task delegation in multi-agent systems. The framework covers task decomposition, assignment, monitoring, trust calibration, permission handling, and verifiable completion. It aims to move beyond ad-hoc prompt-based delegation, improving accountability, reliability, and safety for agent-based workflows at scale. The approach is designed to support robust, adaptive coordination and prevent cascading failures.

DialogLab: Open-Source Platform for Simulating Human-AI Group Conversations

Google Research has released DialogLab, an open-source tool for authoring and simulating dynamic multi-party conversations between humans and AI agents. DialogLab lets users configure roles, personas, and conversation flows, blending scripted and improvisational dialogue. Studies show its human control mode offers greater realism and engagement than fully autonomous setups, making it valuable for researchers and developers building advanced conversational AI systems.

Cursor Launches Plugin Marketplace for End-to-End Development Workflows

Cursor has introduced a marketplace for plugins, enabling integration with external tools and curated partner solutions across the development lifecycle. Users can build custom workflows by connecting to services like AWS, Stripe, Figma, and Databricks, or create and share their own plugins. This expansion enhances productivity and flexibility for developers, supporting everything from infrastructure deployment to analytics and payments.

Tip of the week

Quickly Deploy GLM-5 for Advanced Engineering and Coding Tasks

Need a powerful open-source LLM for complex engineering, coding, or agentic workflows? GLM-5, with 744B parameters and a Mixture-of-Experts design, is now available for both cloud and local deployment—no vendor lock-in required.

  • Why GLM-5?

    • Excels at multi-step planning, coding, and business simulation tasks, outperforming most open-source models on benchmarks like SWE-bench and Vending Bench 2.

    • Supports long-context reasoning (up to 200K tokens) and document generation (Word, PDF, Excel).

  • How to get started:

    • Cloud: Try GLM-5 instantly via Z.ai (switch to "GLM-5" in model options) or integrate via API access.

    • Local: Download weights from Hugging Face or ModelScope. Supports vLLM, SGLang, and runs on both NVIDIA and non-NVIDIA chips.

    • Coding Agents: Update your agent config (e.g., set "GLM-5" in ~/.claude/settings.json for Claude Code) to leverage the new model.

  • Pro tip:

    • Use GLM-5’s Agent mode to automate document creation and multi-turn tasks—perfect for generating reports, proposals, or spreadsheets directly from prompts.
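For the agent-config step, the change to ~/.claude/settings.json might look roughly like this. This is a sketch under the assumption that Z.ai exposes an Anthropic-compatible endpoint; the exact field names, URL, and model identifier should be verified against the official GLM-5 setup docs before use.

```json
{
  "model": "GLM-5",
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "your-api-key-here"
  }
}
```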

For full setup instructions and benchmarks, check the official GLM-5 launch post.

We hope you enjoyed this edition and that you stay tuned for the next one. If you need help with your AI projects and implementations, let us know. We are happy to help!