Pondhouse Data AI - Tips & Tutorials for Data & AI 30

BigQuery via MCP | Open 1T Model Kimi K2 | Markdown from Anything | Context7 Live Docs

Hey there,

This week, we’re spotlighting real tools that make agents and AI assistants actually useful—from integrating BigQuery into Claude with just a few config lines, to turning any document into clean Markdown for better LLM inputs.

We also take a look at Kimi K2, a new 1T-parameter open model optimized for agent use, and Context7, a tool that keeps your LLMs grounded in the latest library docs.

Enjoy the read!

Cheers, Andreas & Sascha

In today's edition:

📚 Tutorial of the Week: Connect BigQuery to Claude with an MCP Server – full setup guide

🛠️ Tool Spotlight: MarkItDown – Convert anything to clean, LLM-ready Markdown

📰 Top News: Moonshot AI releases Kimi K2 – a 1T open model built for agent workflows

💡 Tips: Context7 – Reliable, up-to-date API docs inside your LLM prompts

Let's get started!

Find this Newsletter helpful?
Please forward it to your colleagues and friends - it helps us tremendously.

Tutorial of the week

MCP Server for BigQuery — Connect Your Data Warehouse to AI Agents

In this week’s tutorial, we guide you through setting up an MCP (Model Context Protocol) server in front of BigQuery, allowing AI agents like Claude to access and query your data warehouse seamlessly. Using Google’s MCP Toolbox for Databases, you can expose BigQuery datasets via secure, standardized tools—no custom APIs required.

🚀 What You’ll Learn:

  • How to install the MCP Toolbox (as a binary or via Docker) and set up BigQuery authentication

  • How to configure tools.yaml to expose SQL-based tools (e.g., get_article_information)

  • How to run the MCP server locally and connect it through the MCP Inspector

  • How to structure tools for dynamic querying: schema inspection and natural-language SQL generation
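
As a rough sketch, a tools.yaml exposing a single SQL tool might look like the following. The project, dataset, and column names are placeholders, and the exact schema should be checked against the Toolbox documentation:

```yaml
sources:
  my-bigquery:                # arbitrary source name, referenced by tools below
    kind: bigquery
    project: my-gcp-project   # placeholder GCP project ID

tools:
  get_article_information:
    kind: bigquery-sql
    source: my-bigquery
    description: Look up an article by its ID.
    parameters:
      - name: article_id
        type: string
        description: The ID of the article to fetch.
    statement: |
      SELECT * FROM `my_dataset.articles` WHERE id = @article_id
```

With the file in place, the server is started by pointing the Toolbox at it (something like ./toolbox --tools-file tools.yaml, with the flag name taken from the Toolbox docs) and can then be exercised through the MCP Inspector.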

💡 Why This Matters:

  • Standardized access to BigQuery via MCP lets AI models query warehouse data without custom integration plumbing

  • Secure and scalable with OAuth2/OpenID, connection pooling, and observability

  • Extensible structure—start with a single query tool, then expand into full SQL support

  • Prototype fast, with production-grade defaults from Google’s Toolbox

If you’ve got a data warehouse and want your LLMs to access it directly, this setup is a fast track to LLM-powered analytics.

Tool of the week

MarkItDown — Convert Anything to Markdown, Ready for LLMs

If you work with mixed document formats and need clean, LLM-ready text, MarkItDown is your go-to tool. This lightweight Python utility converts PDFs, Word docs, PowerPoints, Excel sheets, images, audio, HTML, YouTube captions, EPUBs, and more into structured Markdown that LLMs like Claude and GPT-4o can understand effortlessly.

🚀 What Makes It Shine

  • Multi-format support: Handles documents (PDF, DOCX, PPTX), audio transcription, images with OCR, HTML, ZIP, EPUB—even YouTube transcripts.

  • Markdown-first output preserves headings, lists, tables, links—ideal for downstream LLM use.

  • Built-in MCP server support lets you deploy it as an LLM-accessible service with one command via markitdown-mcp.

🧰 Quick Start

pip install 'markitdown[all]'
markitdown path/to/file.pdf -o file.md

Or run it as an MCP server for direct integration with Claude Desktop and other agents.

🔧 Latest Upgrades

  • Plugin architecture, so you can extend support for new file types

  • Stream-friendly, no temp files—converts in memory

  • Grouped dependencies, install only what you need (e.g., PDF, audio)

MarkItDown simplifies the “data ingestion” step for LLM apps with broad document support and LLM-ready output. Deploying it as a tool server via MCP turns any document corpus into an agent-accessible knowledge base.

Top News of the week

Meet Kimi K2 – A 1T-Parameter Open Agentic Model 

Moonshot AI has released Kimi K2, a 1-trillion-parameter mixture-of-experts language model optimized for agentic use: it excels at tool use, multi-step tasks, and autonomous problem-solving. With a 128K-token context window and a commercial-friendly modified MIT license, Kimi K2 exposes OpenAI-style API endpoints and integrates with inference engines like vLLM and TensorRT-LLM.
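
Because the endpoints follow the OpenAI chat-completions shape, existing client code mostly just needs a new base URL and model name. A hypothetical sketch that only builds the request body (endpoint and model identifier are placeholders; nothing is sent over the network):

```python
import json

# Placeholder endpoint, e.g. a self-hosted vLLM server serving Kimi K2
base_url = "https://example-kimi-host/v1/chat/completions"

payload = {
    "model": "kimi-k2",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Plan a three-step data cleanup."}
    ],
}
body = json.dumps(payload)  # this JSON would be POSTed to the endpoint above
```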

🧠 Key Performance Highlights:

  • Leading STEM benchmark results: 97% on MATH‑500, 65.8% on SWE-bench Verified, and solid tool-use scores (e.g., 70% on Tau2 and 76% on AceBench).

  • Agentic demos: performs multi-step workflows like salary analysis and trip planning with tool calls and interactive outputs across web, code, and charts.

Moonshot also introduced MuonClip, a variant of the Muon optimizer, which enabled a stable training run across 15.5 trillion tokens at the record 1T-parameter scale while preventing optimizer collapse.

Why It Matters:

  • Sets a new bar for open, agent-capable models, rivaling GPT‑4.1 in coding and reasoning.

  • Ultra-long contexts expand possibilities for document processing and multi-turn agent conversations.

  • Open and compatible: OpenAI- and Anthropic-style endpoints make migrating existing workflows easy, which suits both enterprise and research use.

Also in the news

OpenAI Introduces Deep Research API

OpenAI has launched the Deep Research API, enabling GPT to run long, multi-step research tasks with structured memory and modular planning. It’s designed for complex queries that require reasoning, synthesis, and tool use across large content sets.

Hugging Face Shares How They Built Their MCP Server

Hugging Face published a detailed breakdown of their experience building an MCP server for the Hub. They share decisions around transport protocols, streaming modes, and handling session state—along with challenges in client interaction and tool updates. It’s a practical guide for anyone designing agent-ready infrastructure.

Perplexity Launches Comet – an AI-Native Browser

Perplexity has introduced Comet, a browser with built-in AI agents that can search, summarize, interact with websites, and even shop for you. Users can give natural commands like “take control of my browser,” and the agent handles the task across tabs.

🔗 Learn more

OpenAI Launches ChatGPT Agent

OpenAI has released ChatGPT Agent, a multimodal assistant that can browse the web, execute code, interact with apps, and create documents on behalf of Pro, Plus, and Team users, all inside a secure virtual computer. It merges capabilities from Operator and Deep Research into a unified agentic workflow. Before taking any irreversible action, it asks for permission and lets users interrupt at any time.

Tip of the week

Context7 – Real-Time Library Docs for Your Agent Workflows

Context7 acts as an MCP server that brings the latest docs and working code samples directly into your prompts, ensuring LLMs don’t hallucinate outdated APIs. Whether you're working with React, Next.js, Upstash, or any npm library, Context7 fetches version-specific snippets straight from the source.

🚀 Why It Matters

  • LLMs often rely on stale training data or invent APIs; Context7 counters this by injecting fresh, version-accurate documentation into the prompt.

  • Ideal for fast-moving libraries—no more wasted time correcting broken code.

🔧 How to Use It

  • Install via npx -y @upstash/context7-mcp or use the one-click add in clients like Cursor, Windsurf, VS Code, Eclipse, Claude, etc.

  • In your prompt, simply write use context7 and the tool injects accurate docs on demand.
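
For clients configured via a JSON file (such as Claude Desktop), the registration typically looks like the snippet below; the "context7" key is an arbitrary server name:

```json
{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    }
  }
}
```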

🧰 Pro Tips

  • Use prompt rules (e.g., in Cursor or Windsurf) to auto-call Context7 whenever you're asking for code examples or API usage.

  • Context7 is MIT‑licensed, free for personal use, and supports most LLM clients that follow MCP.

Bottom line: For reliable code output and less debugging of hallucinated APIs, Context7 is a must-have tool in any LLM-powered developer workflow.

We hope you liked our newsletter and stay tuned for the next edition. If you need help with your AI tasks and implementations, let us know. We are happy to help!