Pondhouse Data AI - Tips & Tutorials for Data & AI #44

Real-Time Coding with OpenAI Codex-Spark | Figma Design Automation | GLM-5: Affordable High-Performance Coding Model | Agentic Document Extraction Tutorial

Hey there,

This week’s edition is packed with exciting advancements in AI and developer tools. We spotlight OpenAI’s GPT-5.3-Codex-Spark, a real-time coding model that’s reshaping interactive workflows, and explore Claude Code to Figma, which bridges the gap between code-driven UI and collaborative design. Dive into DeepLearning.AI’s Document AI course to unlock agentic document extraction, and discover how GLM-5 empowers advanced engineering and coding tasks with open-source flexibility. Plus, we cover new frameworks for AI task delegation, plugin marketplaces, and generative music with Google Lyria 3.

Enjoy the read!

Cheers, Andreas & Sascha

In today's edition:

📚 Tutorial of the Week: Agentic Document Extraction with DeepLearning.AI

🛠️ Tool Spotlight: Claude Code to Figma UI integration

📰 Top News: OpenAI launches GPT-5.3-Codex-Spark model

💡 Tip: Quickly deploy GLM-5 for coding tasks

Let's get started!

Tutorial of the week

Unlocking Agentic Document Extraction with AI

DeepLearning.AI’s new short course, “Document AI: From OCR to Agentic Doc Extraction,” offers a practical deep dive into advanced document processing techniques. If you’re tired of traditional OCR’s limitations, this course shows how AI agents can extract structured data from complex documents—transforming unstructured PDFs, forms, and charts into actionable information.

  • Covers the evolution of OCR, from early shape-based classifiers to modern deep learning models, highlighting why traditional methods often fail with tables, merged cells, and multi-column layouts.

  • Teaches you to build agentic document processing pipelines using LandingAI’s Agentic Document Extraction (ADE) framework, enabling reliable parsing and visual grounding of fields.

  • Includes hands-on labs for extracting text, tables, and charts, converting documents to Markdown and JSON, and integrating ADE into Retrieval-Augmented Generation (RAG) applications.

  • Guides you through deploying event-driven document processing pipelines on AWS, automating extraction and loading parsed data into knowledge bases.

  • Designed for AI builders, developers, and data scientists with basic Python familiarity who want to automate and scale document intelligence workflows.

If you work with financial invoices, medical records, or research papers, this course will equip you with the skills to unlock value from unstructured data. Dive in to learn cutting-edge techniques for intelligent document extraction.
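To make the pipeline idea concrete, here is a minimal sketch of the downstream step: feeding extraction output into a simple retrieval function for a RAG-style lookup. The chunk format and the naive keyword scorer below are illustrative assumptions, not the actual ADE output schema or a production retriever.

```python
# Hypothetical sketch: routing agentic-extraction output into a retrieval step.
# The Chunk shape below is an assumption for illustration, not the real ADE schema.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str        # Markdown text of the extracted field, table, or paragraph
    page: int        # page the chunk was visually grounded on
    chunk_type: str  # e.g. "text", "table", "chart"

def retrieve(chunks, query, top_k=2):
    """Rank chunks by naive keyword overlap with the query (toy retriever)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(c.text.lower().split())), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:top_k] if score > 0]

chunks = [
    Chunk("Invoice total: $1,240.00", page=1, chunk_type="text"),
    Chunk("| Item | Qty |\n| Widget | 3 |", page=2, chunk_type="table"),
    Chunk("Payment due within 30 days", page=1, chunk_type="text"),
]

hits = retrieve(chunks, "What is the invoice total?")
print(hits[0].text)  # -> Invoice total: $1,240.00
```

In a real deployment you would swap the keyword scorer for embedding similarity and keep the page grounding so answers can cite where in the document they came from.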

Tool of the week

Claude Code to Figma — Instantly convert production code UIs into editable Figma designs

Claude Code to Figma bridges the gap between code-driven UI development and collaborative design iteration. With this new integration, developers can capture live UIs built in Claude Code and import them directly into Figma as fully editable frames—eliminating the tedious process of manual redrawing and enabling seamless transitions between code and design.

  • Instantly converts rendered browser UIs (from production, staging, or localhost) into structured Figma layers, preserving layout, text, and multi-screen flows.

  • Empowers teams to annotate, iterate, and explore design alternatives collaboratively—without requiring context switches or code rewrites.

  • Supports capturing entire user flows in one session, maintaining sequence and context for more effective design reviews and ideation.

  • Enables roundtrip workflows: changes made in Figma can be synced back to code using the Figma MCP server, ensuring alignment between design and development.

  • Accelerates feedback cycles and decision-making by providing a shared, high-fidelity artifact for designers, engineers, and PMs.

Rapidly gaining traction in the design and developer communities, Claude Code to Figma is changing how teams move between code and design. Learn more in the official Figma blog announcement.

Top News of the week

OpenAI Unveils GPT-5.3-Codex-Spark: Real-Time Coding at Unprecedented Speed

OpenAI has launched GPT-5.3-Codex-Spark, its first real-time coding model, marking a significant leap in developer productivity. Designed for ultra-low latency, Codex-Spark delivers over 1000 tokens per second and features a 128k context window, enabling near-instant feedback and rapid iteration for coding tasks. This release is a research preview available to ChatGPT Pro users via the Codex app, CLI, and VS Code extension.

Technically, Codex-Spark is optimized for interactive workflows, making targeted edits and logic tweaks with minimal delay. Powered by Cerebras’ Wafer Scale Engine 3, it leverages a latency-first serving tier, streamlining the request-response pipeline and reducing overhead per client/server roundtrip by 80%. Codex-Spark’s persistent WebSocket connection and optimized Responses API also cut time-to-first-token by 50%, benefiting all OpenAI models. The model is text-only for now, with its own rate limits, and is being rolled out to select API partners for integration testing.
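The latency figures above (time-to-first-token, tokens per second) are easy to measure yourself against any streaming endpoint. Here is a small sketch with a mock token generator standing in for a streamed API response; it is not the real OpenAI client, just an illustration of how these two metrics are computed.

```python
# Illustrative sketch: measuring time-to-first-token (TTFT) and throughput
# from any token iterator. mock_stream is a stand-in for a streaming API
# response, not a real client call.
import time

def mock_stream(tokens, delay=0.001):
    """Yield tokens with a small artificial delay, like a streamed response."""
    for tok in tokens:
        time.sleep(delay)
        yield tok

def measure_stream(stream):
    """Consume a token stream; return (ttft_seconds, tokens_per_second, text)."""
    start = time.perf_counter()
    ttft = None
    pieces = []
    for tok in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # time until the first token
        pieces.append(tok)
    elapsed = time.perf_counter() - start
    return ttft, len(pieces) / elapsed, "".join(pieces)

ttft, tps, text = measure_stream(mock_stream(["def ", "add(a, b): ", "return a + b"]))
print(f"TTFT: {ttft * 1000:.1f} ms, throughput: {tps:.0f} tok/s")
```

Pointing the same measurement at a real streamed response is how you would verify claims like "over 1000 tokens per second" for your own workloads.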

The developer community is already exploring new interaction patterns and use cases enabled by fast inference, with OpenAI planning to expand access and introduce larger, multimodal models in the future. This release sets the stage for a new era of real-time collaboration and high-speed coding.

Also in the news

Google Lyria 3 Brings AI Music Creation to Gemini App

Google has launched Lyria 3, its most advanced generative music model, now integrated into the Gemini app. Users can create custom 30-second tracks with lyrics and AI-generated cover art simply by describing an idea or uploading a photo. All music is watermarked with SynthID for verification, and the feature is available in multiple languages. Lyria 3 aims to make music creation accessible and fun, with enhanced creative controls and responsible AI safeguards.

Anthropic Publishes Real-World Study of AI Agent Autonomy

Anthropic has released a comprehensive analysis of millions of Claude Code sessions and API tool calls, measuring how much autonomy users grant AI agents in practice. The study found that session durations nearly doubled, experienced users increasingly auto-approve actions, and agents are used in emerging domains like healthcare and finance. The research highlights the need for post-deployment monitoring and adaptive oversight as agent capabilities and risks evolve.

Google DeepMind Proposes Formal Framework for AI Task Delegation

A new paper from Google DeepMind introduces a structured framework for intelligent AI task delegation in multi-agent systems. The framework covers task decomposition, assignment, monitoring, trust calibration, permission handling, and verifiable completion. It aims to move beyond ad-hoc prompt-based delegation, improving accountability, reliability, and safety for agent-based workflows at scale. The approach is designed to support robust, adaptive coordination and prevent cascading failures.

DialogLab: Open-Source Platform for Simulating Human-AI Group Conversations

Google Research has released DialogLab, an open-source tool for authoring and simulating dynamic multi-party conversations between humans and AI agents. DialogLab lets users configure roles, personas, and conversation flows, blending scripted and improvisational dialogue. Studies show its human control mode offers greater realism and engagement than fully autonomous setups, making it valuable for researchers and developers building advanced conversational AI systems.

Cursor Launches Plugin Marketplace for End-to-End Development Workflows

Cursor has introduced a marketplace for plugins, enabling integration with external tools and curated partner solutions across the development lifecycle. Users can build custom workflows by connecting to services like AWS, Stripe, Figma, and Databricks, or create and share their own plugins. This expansion enhances productivity and flexibility for developers, supporting everything from infrastructure deployment to analytics and payments.

Tip of the week

Quickly Deploy GLM-5 for Advanced Engineering and Coding Tasks

Need a powerful open-source LLM for complex engineering, coding, or agentic workflows? GLM-5, with 744B parameters and a Mixture-of-Experts design, is now available for both cloud and local deployment—no vendor lock-in required.

  • Why GLM-5?

    • Excels at multi-step planning, coding, and business simulation tasks, outperforming most open-source models on benchmarks like SWE-bench and Vending Bench 2.

    • Supports long-context reasoning (up to 200K tokens) and document generation (Word, PDF, Excel).

  • How to get started:

    • Cloud: Try GLM-5 instantly via Z.ai (switch to "GLM-5" in model options) or integrate via API access.

    • Local: Download weights from Hugging Face or ModelScope. Supports vLLM, SGLang, and runs on both NVIDIA and non-NVIDIA chips.

    • Coding Agents: Update your agent config (e.g., set "GLM-5" in ~/.claude/settings.json for Claude Code) to leverage the new model.

  • Pro tip:

    • Use GLM-5’s Agent mode to automate document creation and multi-turn tasks—perfect for generating reports, proposals, or spreadsheets directly from prompts.
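For the agent-config step, the change to ~/.claude/settings.json might look roughly like this. This is a sketch under the assumption that Z.ai exposes an Anthropic-compatible endpoint; the exact field names, URL, and model identifier should be verified against the official GLM-5 setup docs before use.

```json
{
  "model": "GLM-5",
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "your-api-key-here"
  }
}
```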

For full setup instructions and benchmarks, check the official GLM-5 launch post.

We hope you enjoyed this edition and that you stay tuned for the next one. If you need help with your AI projects and implementations, let us know. We are happy to help!