Pondhouse Data AI - Tips & Tutorials for Data & AI #20

Real-World AI Usage and Use-Case Statistics | No-Framework, Easy-to-Use AI Agents | Local Vision AI Now Possible | Learn About the Best Coding Assistant

Hey there,

This week's newsletter focuses on practical AI implementations. Our main tutorial breaks down AI agents into simple, understandable components - no fancy frameworks needed. We're also looking at Anthropic's new Economic Index, which provides actual data on how companies are using AI today, and we'll show you how to run vision AI locally with SmolVLM's impressively small models.

Plus: Our take on finding realistic AI use cases that actually save time.

Enjoy the read!

Cheers, Andreas & Sascha

In today's edition:

📚 Tutorial: Build AI Agents Without Frameworks: A step-by-step guide to creating agents using plain Python

🛠️ Tool of the Week: Aider: Why it's currently the best AI coding assistant

📊 Top News: Anthropic's Economic Index: First real data on how companies actually use AI

💡 Also in the News: Microsoft's OmniParser V2 for GUI automation and xAI releases Grok 3

💪 Tip of the Week: SmolVLM: Run vision AI locally with just 256MB

Let's get started!

Find this Newsletter helpful?
Please forward it to your colleagues and friends - it helps us tremendously.

Tutorial of the week

Build AI Agents From Scratch – No Frameworks Needed!

This week, we're once again talking about AI Agents. No wonder - these systems are one of the major value propositions of AI. We’re showing you how to build one without relying on complex frameworks like LangChain or AutoGPT. Why go framework-free? Because understanding the fundamentals gives you ultimate control, easier debugging, and better performance.

What You'll Learn (and Why It Matters):

The linked blog post breaks down agent creation into its essential components. You can create an agent that can, for example, query PostgreSQL databases and search Wikipedia, all with clean, understandable Python code.

  • The Core Concept of AI Agents: It's simpler than you think! An agent is essentially a language model that can:

    • Understand available "tools" (like database queries or web searches).

    • Decide when to use those tools.

    • Remember past interactions (though this is optional, and often unnecessary!).

    • Make decisions based on all of this information.

  • Tools are Just Functions: The article explains "tools" by showing they're simply Python functions with clear descriptions, inputs, and outputs - see the sketch after this list.

  • The Agent Interaction Loop: The tutorial then shows and implements the main ‘secret sauce’ of agents - the agent loop (also sketched below):

    1. User input is received.

    2. The LLM decides if a tool is needed.

    3. If so, the tool is executed.

    4. The LLM formulates a response, potentially using the tool's output. Repeat until done.

  • Why Skip the Framework? While frameworks can speed up initial prototyping, the article highlights the significant benefits of a from-scratch approach:

    • Transparency: You see exactly what the LLM is being told, how decisions are made, and when tools are called.

    • Flexibility: Customize everything – prompts, tool structure, memory – without fighting the framework.

    • Debugging: Pinpoint errors quickly because you control the entire process; frameworks add unnecessary complexity.

    • Performance: Load only what you need, optimizing for your specific use case.
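
To make both ideas concrete - tools as plain functions and the agent loop - here's a minimal sketch in plain Python. This is our own illustration rather than the tutorial's exact code: it assumes the official OpenAI client with its function-calling API, and names like run_agent and TOOLS are ours.

import json
import urllib.parse
import urllib.request

from openai import OpenAI

client = OpenAI()

# A "tool" is just a function with a clear purpose, inputs, and outputs.
def search_wikipedia(query: str) -> str:
    """Return a short Wikipedia summary for the given search term."""
    url = ("https://en.wikipedia.org/api/rest_v1/page/summary/"
           + urllib.parse.quote(query))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp).get("extract", "No summary found.")

TOOLS = {"search_wikipedia": search_wikipedia}

# The same tool, described to the LLM as a JSON schema.
TOOL_SCHEMAS = [{
    "type": "function",
    "function": {
        "name": "search_wikipedia",
        "description": "Look up a topic on Wikipedia and return a summary.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def run_agent(user_input: str) -> str:
    messages = [{"role": "user", "content": user_input}]
    while True:  # the agent loop
        reply = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=TOOL_SCHEMAS,
        ).choices[0].message
        if not reply.tool_calls:  # no tool needed -> final answer
            return reply.content
        messages.append(reply)  # keep the tool request in the history
        for call in reply.tool_calls:  # execute each requested tool
            result = TOOLS[call.function.name](
                **json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": result})

print(run_agent("What is PostgreSQL?"))

The whole "agent" is that while loop: call the model, run whichever tool it requests, feed the result back into the conversation, and stop as soon as the model answers directly.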

Who Should Read the Full Tutorial:

  • Technical Managers: Gain a deep understanding of agent architecture to make informed decisions about AI projects.

  • CEOs: Understand AI agents and see how they can be built with clear, manageable code, not just magic black boxes.

  • Anyone building production-ready AI agents: This framework-free approach is ideal for systems where control, reliability, and performance are major concerns - and a great starting point for anyone wanting to learn how agents work.

Read the full tutorial in the link below.

Tool of the week

Aider - the best AI coding assistant

Creating code is probably one of the best-researched and most stable use cases for AI. While there are still some kinks to work out, AI models are generally very impressive when it comes to writing code.

According to our own tests, the best coding assistant at the moment is Aider. While its terminal-based interface might seem basic, Aider stands out due to its exceptional performance and reliability, particularly when paired with Claude 3.5 Sonnet and o1. It consistently outperforms other AI coding tools in real-world development tasks, scoring among the highest on the SWE Bench, a challenging software engineering benchmark.

Key Features:

  • Best-in-class Multi-file editing capabilities

  • Works with most popular programming languages

  • Compatible with leading LLMs (Claude 3.5 Sonnet, DeepSeek, GPT-4o)

  • Maintains context of your entire git repository

Why It's Special:

  1. Top Performance: Ranks among the highest on SWE Bench, solving real GitHub issues from major open-source projects

  2. Simple Setup: Quick installation via pip and straightforward configuration (see the quick start below)

  3. Practical Integration: Works alongside your favorite editor or IDE

  4. Voice Coding Support: Code with voice commands

  5. Extensive Compatibility: Supports various LLMs, including local models
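
Getting started takes only a couple of commands - here's a minimal quick-start sketch (flag and key names as we understand the Aider docs; double-check them against the current documentation):

pip install aider-chat
export ANTHROPIC_API_KEY=your-key-here   # or OPENAI_API_KEY for GPT-4o
cd /path/to/your/git/repo
aider --model sonnet                     # start chatting with your codebase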

Pro Tip: For best results, try it with Claude 3.5 Sonnet or GPT-4o. The tool truly shines when handling complex codebase modifications and multi-file changes.

🔍 Why we recommend it: Among the numerous AI coding assistants we've tested, Aider consistently delivers the most practical and reliable results. Its terminal-based approach might seem basic, but it's precisely this simplicity that makes it so effective for real-world development tasks.

Top News of the week

Anthropic's Economic Index Reveals Real-World AI Usage Patterns

Anthropic has released a highly interesting study based on millions of anonymized Claude.ai conversations, offering concrete data about AI usage in the economy rather than speculative forecasts.

Key Findings:

  • 36% of occupations use AI for at least a quarter of their tasks

  • Only 4% of jobs use AI for three-quarters or more of their tasks

  • AI usage splits between augmentation (57%) and automation (43%)

  • Highest adoption in mid-to-high wage technical roles

  • Software development and technical writing lead current usage

Most Interesting Insight: Rather than fully automating jobs, AI is being selectively adopted for specific tasks across different occupations. The data suggests we're heading toward job evolution rather than wholesale replacement.

Usage Distribution:

  1. Computer & Mathematical: 37.2%

  2. Arts & Media: 10.3%

  3. Education & Library: 9.3%

  4. Office & Administrative: 7.9%

  5. Life Sciences: 6.4%

Implications for Business Leaders: The data suggests focusing on strategic AI integration for specific high-value tasks rather than complete process automation. Mid-level technical roles appear to be the sweet spot for current AI implementation.

Why It Matters: This isn't speculation - it's real-world data. It shows that AI is currently a collaborative tool impacting specific tasks in mid-to-high-wage jobs, not replacing jobs wholesale. The Index will be updated regularly, providing ongoing insight into how these adoption patterns evolve.

Opinion: Stop Looking for AI Unicorns, Start Finding Minutes

The AI industry is caught between hyperbolic marketing ("25x productivity increases!") and sobering reality (Google reporting productivity decreases with AI). Meanwhile, businesses struggle with a simple yet profound question: "What can we actually do with AI?"

Here's my take: We're thinking too big.

Instead of seeking revolutionary transformations, ask two simple questions:

  1. What do your employees do every day?

  2. What do they hate doing?

The real power of AI lies in addressing these small, mundane tasks. Let me share some examples from our own experience:

  • Time Tracking: We have an agent that lets employees log time via MS Teams chat (and even voice input). Savings? 30 seconds per day, plus 15 minutes monthly on grammar checks. Tiny? Yes. Valuable? Absolutely. (We no longer need to log in to our expense-tracking app, complete 2FA, navigate to time tracking, and enter the client.)

  • Bookkeeping Assistant: Automated the monthly dance of expense exports, receipt checks, and bookkeeper summaries. Saves maybe 30 minutes a month - but, again, it removes useless, repetitive work.

  • Research Helper: A simple agent that monitors trusted websites for newsletter content. Saves maybe two minutes a day - but saves a lot of dread and procrastination.

  • Document Parsing and Handling: Everyone does it; few realize how much time they spend on it. Of all the ‘tiny’ use cases advertised here, this is by far the biggest - but arguably also the most complex. It's one of the few complex use cases that is genuinely ROI-positive today.

The Math: These "insignificant" time savings add up: ten two-minute tasks a day is about 20 minutes daily, or roughly 80 hours per employee per year. But the even bigger benefit? Mental bandwidth. When you remove these small frustrations, people have more capacity for high-value work. That's where the benefit of generative AI lies - at least as of 2025. This also aligns with the findings of the Anthropic Economic Index: AI (at least for now) is a tool - a very versatile one, but still just a tool. And like any tool, its value lies not in grand possibilities but in practical, everyday applications.

Start small. Start with minutes. The impact will grow from there.

Also in the news

Microsoft Releases OmniParser V2: AI Gets Better at Using Computers

Microsoft Research has launched OmniParser V2, an upgrade that allows LLMs to better interact with graphical user interfaces. The tool achieves a 60% reduction in latency compared to its predecessor and dramatically improves accuracy in detecting small interface elements.

Key points:

  • Increases GPT-4o's GUI interaction accuracy from 0.8% to 39.6%

  • Works with major LLMs including GPT-4o, Claude Sonnet, and DeepSeek

  • Available as a dockerized Windows system (OmniTool)

This could accelerate the creation of AI agents that effectively operate computer interfaces, potentially leading to more sophisticated automation tools. More details in their announcement blog post.

xAI Releases Grok 3 with Bold Claims

xAI has launched Grok 3, claiming it's "the world's smartest AI." The model introduces two new features: DeepSearch for information synthesis and Think for enhanced reasoning. While the model is initially available for free, access will later be restricted to X Premium+ and SuperGrok subscribers.

Editor's Note: While xAI's marketing claims are notably ambitious, independent testing hasn't validated their performance claims. Nevertheless, early user feedback suggests Grok 3 is a capable model, even if not quite the revolution advertised.

Chart: xAI's benchmark results seem impressive (source: xAI's own vendor benchmarks).

Tip of the week

Run Vision AI on Your Phone: SmolVLM Makes It Possible

Want to experiment with vision AI locally? The new SmolVLM models (256M and 500M) from Hugging Face make it possible to run sophisticated vision-language models on consumer devices, including phones. Here's why this matters and how to get started.

Why It's Important:

  • Ship AI with your app: Bundle a 400MB model like you would a database

  • Deep integration: AI becomes a native feature, not an external service

  • Simplified architecture: No complex cloud dependencies or API management

  • Cost predictability: One-time cost (model size) vs ongoing API expenses

  • Offline-first design: AI features work regardless of connectivity

Development Implications:

Because these models are so small while performing so well, developers benefit in many ways:

  • Treat AI like any other library or component

  • Focus on features rather than infrastructure

  • Simplify testing and deployment

  • Reduce dependency on external services

Quick Start Guide:

import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers.image_utils import load_image

# Run on GPU if available, otherwise fall back to CPU
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-500M-Instruct")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-500M-Instruct",
    torch_dtype=torch.bfloat16,
    _attn_implementation="flash_attention_2" if DEVICE == "cuda" else "eager",
).to(DEVICE)

# Load the image to describe - a local path or a URL both work
image = load_image("path/to/your/image.jpg")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Can you describe this image?"}
        ]
    },
]

# Build the chat prompt, run generation, and decode the answer
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(DEVICE)

generated_ids = model.generate(**inputs, max_new_tokens=500)
generated_texts = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True,
)
print(generated_texts[0])
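
If the 500M model is still too heavy for your device, the same code should work with the smaller checkpoint - swap the model id to "HuggingFaceTB/SmolVLM-256M-Instruct" in both from_pretrained calls.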

Pro Tip: For mobile devices, use the WebGPU version, which is optimized for mobile hardware and runs directly in your browser.

This approach helps normalize AI as just another tool in our software development toolkit - not a mysterious cloud service requiring special infrastructure and ongoing costs.

We hope you enjoyed our newsletter - stay tuned for the next edition. If you need help with your AI tasks and implementations, let us know. We're happy to help!