Pondhouse Data AI - Tips & Tutorials for Data & AI 30

BigQuery via MCP | Open 1T Model Kimi K2 | Markdown from Anything | Context7 Live Docs

Hey there,

This week, we’re spotlighting real tools that make agents and AI assistants actually useful—from integrating BigQuery into Claude with just a few config lines, to turning any document into clean Markdown for better LLM inputs.

We also take a look at Kimi K2, a new 1T-parameter open model optimized for agent use, and Context7, a tool that keeps your LLMs grounded in the latest library docs.

Enjoy the read!

Cheers, Andreas & Sascha

In today's edition:

📚 Tutorial of the Week: Connect BigQuery to Claude with an MCP Server – full setup guide

🛠️ Tool Spotlight: MarkItDown – Convert anything to clean, LLM-ready Markdown

📰 Top News: Moonshot AI releases Kimi K2 – a 1T open model built for agent workflows

💡 Tips: Context7 – Reliable, up-to-date API docs inside your LLM prompts

Let's get started!

Find this Newsletter helpful?
Please forward it to your colleagues and friends - it helps us tremendously.

Tutorial of the week

MCP Server for BigQuery — Connect Your Data Warehouse to AI Agents

In this week’s tutorial, we guide you through setting up an MCP (Model Context Protocol) server in front of BigQuery, allowing AI agents like Claude to access and query your data warehouse seamlessly. Using Google’s MCP Toolbox for Databases, you can expose BigQuery datasets via secure, standardized tools—no custom APIs required.

🚀 What You’ll Learn:

  • How to install the MCP Toolbox (as a binary or via Docker) and set up BigQuery authentication

  • How to configure tools.yaml to expose SQL-based tools (e.g., get_article_information)

  • How to run the MCP server locally and connect it through the MCP Inspector

  • How to structure tools for dynamic querying: schema inspection and natural-language SQL generation
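
As a rough sketch, a tools.yaml exposing a single SQL tool might look like the following. The project, dataset, and column names are placeholders, and the exact schema should be checked against the Toolbox documentation:

```yaml
sources:
  my-bigquery:                # arbitrary source name, referenced by tools below
    kind: bigquery
    project: my-gcp-project   # placeholder GCP project ID

tools:
  get_article_information:
    kind: bigquery-sql
    source: my-bigquery
    description: Look up an article by its ID.
    parameters:
      - name: article_id
        type: string
        description: The ID of the article to fetch.
    statement: |
      SELECT * FROM `my_dataset.articles` WHERE id = @article_id
```

With the file in place, the server is started by pointing the Toolbox at it (something like ./toolbox --tools-file tools.yaml, with the flag name taken from the Toolbox docs) and can then be exercised through the MCP Inspector.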

💡 Why This Matters:

  • Standardized access to BigQuery via MCP lets AI models query warehouse data without custom integration plumbing

  • Secure and scalable with OAuth2/OpenID, connection pooling, and observability

  • Extensible structure—start with a single query tool, then expand into full SQL support

  • Prototype fast, with production-grade defaults from Google’s Toolbox

If you’ve got a data warehouse and want your LLMs to access it directly, this setup is a fast track to LLM-powered analytics.

Tool of the week

MarkItDown — Convert Anything to Markdown, Ready for LLMs

If you work with mixed document formats and need clean, LLM-ready text, MarkItDown is your go-to tool. This lightweight Python utility converts PDFs, Word docs, PowerPoints, Excel sheets, images, audio, HTML, YouTube captions, EPUBs, and more into structured Markdown that LLMs like Claude and GPT-4o can understand effortlessly.

🚀 What Makes It Shine

  • Multi-format support: Handles documents (PDF, DOCX, PPTX), audio transcription, images with OCR, HTML, ZIP, EPUB—even YouTube transcripts.

  • Markdown-first output preserves headings, lists, tables, links—ideal for downstream LLM use.

  • Built-in MCP server support lets you deploy it as an LLM-accessible service with one command via markitdown-mcp.

🧰 Quick Start

pip install 'markitdown[all]'
markitdown path/to/file.pdf -o file.md

Or run it as an MCP server for direct integration with Claude Desktop and other agents.

🔧 Latest Upgrades

  • Plugin architecture, so you can extend support for new file types

  • Stream-friendly, no temp files—converts in memory

  • Grouped dependencies, install only what you need (e.g., PDF, audio)

MarkItDown simplifies the “data ingestion” step for LLM apps with broad document support and LLM-ready output. Deploying it as a tool server via MCP turns any document corpus into an agent-accessible knowledge base.

Top News of the week

Meet Kimi K2 – A 1T-Parameter Open Agentic Model 

Moonshot AI has released Kimi K2, a 1-trillion-parameter mixture-of-experts language model optimized for agentic use: it excels at tool use, multi-step tasks, and autonomous problem-solving. With a 128K-token context window and a commercial-friendly modified MIT license, Kimi K2 exposes OpenAI-style API endpoints and integrates with inference engines like vLLM and TensorRT-LLM.
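
Because the endpoints follow the OpenAI chat-completions shape, existing client code mostly just needs a new base URL and model name. A hypothetical sketch that only builds the request body (endpoint and model identifier are placeholders; nothing is sent over the network):

```python
import json

# Placeholder endpoint, e.g. a self-hosted vLLM server serving Kimi K2
base_url = "https://example-kimi-host/v1/chat/completions"

payload = {
    "model": "kimi-k2",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Plan a three-step data cleanup."}
    ],
}
body = json.dumps(payload)  # this JSON would be POSTed to the endpoint above
```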

🧠 Key Performance Highlights:

  • Leading STEM benchmark results: 97% on MATH‑500, 65.8% on SWE-bench Verified, and solid tool-use scores (e.g., 70% on Tau2 and 76% on AceBench).

  • Agentic demos: performs multi-step workflows like salary analysis and trip planning with tool calls and interactive outputs across web, code, and charts.

Moonshot also introduced MuonClip, a variant of the Muon optimizer, which enabled a stable training run across 15.5 trillion tokens at the record 1T-parameter scale while preventing optimizer collapse.

Why It Matters:

  • Sets a new bar for open, agent-capable models, rivaling GPT‑4.1 in coding and reasoning.

  • Ultra-long contexts expand possibilities for document processing and multi-turn agent conversations.

  • Open and compatible: OpenAI- and Anthropic-style endpoints make migrating existing workflows easy, which suits both enterprise and research use.

Also in the news

OpenAI Introduces Deep Research API

OpenAI has launched the Deep Research API, enabling GPT to run long, multi-step research tasks with structured memory and modular planning. It’s designed for complex queries that require reasoning, synthesis, and tool use across large content sets.

Hugging Face Shares How They Built Their MCP Server

Hugging Face published a detailed breakdown of their experience building an MCP server for the Hub. They share decisions around transport protocols, streaming modes, and handling session state—along with challenges in client interaction and tool updates. It’s a practical guide for anyone designing agent-ready infrastructure.

Perplexity Launches Comet – an AI-Native Browser

Perplexity has introduced Comet, a browser with built-in AI agents that can search, summarize, interact with websites, and even shop for you. Users can give natural commands like “take control of my browser,” and the agent handles the task across tabs.

🔗 Learn more

OpenAI Launches ChatGPT Agent

OpenAI has released ChatGPT Agent, a multimodal assistant that can browse the web, execute code, interact with apps, and create documents on behalf of Pro, Plus, and Team users, all inside a secure virtual computer. It merges capabilities from Operator and Deep Research into a unified agentic workflow. Before taking any irreversible action, it asks for permission and lets users interrupt at any time.

Tip of the week

Context7 – Real-Time Library Docs for Your Agent Workflows

Context7 acts as an MCP server that brings the latest docs and working code samples directly into your prompts, ensuring LLMs don’t hallucinate outdated APIs. Whether you're working with React, Next.js, Upstash, or any npm library, Context7 fetches version-specific snippets straight from the source.

🚀 Why It Matters

  • LLMs often rely on stale training data or invent APIs; Context7 counters this by injecting fresh, version-accurate documentation into the prompt.

  • Ideal for fast-moving libraries—no more wasted time correcting broken code.

🔧 How to Use It

  • Install via npx -y @upstash/context7-mcp or use the one-click add in clients like Cursor, Windsurf, VS Code, Eclipse, Claude, etc.

  • In your prompt, simply write use context7 and the tool injects accurate docs on demand.
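
For clients configured via a JSON file (such as Claude Desktop), the registration typically looks like the snippet below; the "context7" key is an arbitrary server name:

```json
{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    }
  }
}
```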

🧰 Pro Tips

  • Use prompt rules (e.g., in Cursor or Windsurf) to auto-call Context7 whenever you're asking for code examples or API usage.

  • Context7 is MIT‑licensed, free for personal use, and supports most LLM clients that follow MCP.

Bottom line: For reliable code output and less debugging of hallucinated APIs, Context7 is a must-have tool in any LLM-powered developer workflow.

We hope you liked our newsletter and stay tuned for the next edition. If you need help with your AI tasks and implementations, let us know. We are happy to help!