Pondhouse Data AI - Tips & Tutorials for Data & AI 22

Manus AI: Is This What AGI Looks Like? | Build Low-Code AI Agents with n8n | Privacy-Friendly Document Extraction | Add Memory to Your Agents

Hey there,

This week, we bring you practical tools for AI development: from building agents with n8n's low-code platform to examining China's impressive Manus AI system. We look at SmolDocling for document processing, investigate why multi-agent systems sometimes underperform, and share a straightforward technique for adding memory to your conversational agents.

Enjoy the read!

Cheers, Andreas & Sascha

In today's edition:

📚 Tutorial of the Week: Building AI Agents with n8n - A Low-Code Approach to AI Workflows

🛠️ Tool Spotlight: SmolDocling: A tiny OCR model for high-quality and privacy-friendly document data extraction

📰 Top News: Manus AI - The first glimpse at AGI?

💡 Tips: Add Memory to Your AI Agents for Better Conversations

Let's get started!

Find this Newsletter helpful?
Please forward it to your colleagues and friends - it helps us tremendously.

Tutorial of the week

Building AI Agents with n8n - A Low-Code Approach to AI Workflows

If you've worked with custom-built AI agents, you know the challenge of managing API integrations and maintaining complex code. This week, we explore a pragmatic alternative: using n8n as a low-code platform for AI agent development.

Our tutorial demonstrates how to leverage n8n's visual workflow system to build an agent that can:

  • Process user input through a chat interface

  • Make decisions on which tools to use for specific queries

  • Execute Wikipedia searches when factual information is needed

  • Perform dynamic SQL queries against a PostgreSQL database

  • Maintain conversation context through configurable memory systems

The implementation is quite simple: connect a Chat Trigger node to an AI Agent node (using OpenAI's GPT-4o), then extend its functionality through tool nodes. The workflow execution is transparent, showing exactly how the agent routes requests and processes responses at each step.

Particularly welcome is n8n's ability to handle the API complexity while still allowing for custom code when needed. For example, you can use expressions like {{ $fromAI('placeholder_name') }} to dynamically inject AI-generated SQL queries into database connections.
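As an illustration, such an expression might be entered directly in the query field of a PostgreSQL tool node. This is a hypothetical query - the table, column, and placeholder names are our own, and the description/type arguments follow n8n's documented `$fromAI()` signature:

```sql
SELECT name, email
FROM customers
WHERE customer_id = {{ $fromAI('customer_id', 'ID of the customer the user asked about', 'number') }};
```

At run time, the agent fills in `customer_id` from the conversation, so a single node can serve arbitrary lookups.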

We also address practical considerations like Docker deployment options, data persistence, and the limitations of using this approach in high-volume production environments versus prototyping scenarios.

This method won't replace custom-built agents for production-critical systems, but it offers a significant advantage for rapid development and testing of complex agent behaviors without writing hundreds of lines of API integration code.

Tool of the week

SmolDocling: High-quality document data extraction with an easy, privacy-preserving self-hosting option

Earlier this month, Mistral released their hosted OCR API. First impressions suggest it is one of the best OCR solutions currently available. However, our tests revealed two main shortcomings:

  • It's cloud-only, which can be problematic for privacy reasons

  • For table extraction, it sometimes opts to create images instead of extracting the tables, making it less than ideal for technical documentation

That’s where our tool of the week comes into play: SmolDocling-256M-preview. Developed by the Docling Team at IBM Research, this image-text-to-text model brings high-quality document processing capabilities in a significantly smaller package.

Top News of the week

Manus AI: The first glimpse at what “AGI” might look like?

A new AI agent named Manus has captured significant attention in the global AI community over the past week. Developed by Wuhan-based startup Butterfly Effect, Manus represents a fundamental shift in AI assistant architecture compared to systems like ChatGPT and Claude.

Manus AI sets new records on virtually all general AI assistant benchmarks

What Makes Manus Different?

Unlike traditional chatbots that rely on a single large language model, Manus employs a multi-agent approach, utilizing several AI models working in coordination - including Anthropic's Claude 3.5 Sonnet and fine-tuned versions of Alibaba's open-source Qwen. This architecture enables Manus to function as what the company calls "the world's first general AI agent" - designed to work autonomously across a wide range of tasks.

The system's defining feature is "Manus's Computer" - a window that allows users to observe the agent's work in real time and intervene when necessary. This creates a significantly more transparent and collaborative experience compared to black-box AI assistants.

MIT Technology Review's Assessment

In hands-on testing with three complex tasks - compiling journalist lists, apartment hunting, and candidate research - Manus demonstrated impressive capabilities but also revealed limitations:

Strengths:

  • Highly intuitive interface accessible to non-technical users

  • Effective at breaking down complex tasks into logical steps

  • Transparent workflow allowing real-time observation and intervention

  • Superior results to ChatGPT DeepResearch on certain tasks

  • Cost-effective at approximately $2 per task (one-tenth of DeepResearch)

  • Ability to learn from feedback and refine outputs

Limitations:

  • System instability and crashes during high service loads

  • Difficulty processing large text chunks

  • Occasional "cutting corners" to expedite tasks

  • Challenges with paywalled content and CAPTCHA systems

  • Incomplete research when faced with broad, open-ended tasks

The review suggests Manus is particularly well-suited for analytical tasks requiring extensive open web research but with defined parameters - essentially serving as a "highly intelligent and efficient intern."

A Glimpse of AGI Capabilities?

Some observers have suggested that Manus's architecture - with multiple specialized AI models working together to solve diverse problems - might represent an early conceptual step toward what Artificial General Intelligence could eventually look like.

The system's ability to decompose complex tasks, navigate information resources independently, and adapt to user feedback shows a more flexible form of problem-solving than single-model approaches. This coordination between specialized systems mirrors certain theoretical frameworks for how AGI might eventually be constructed.

That said, we very much caution against overstating the comparison. We still don't know what pathway to AGI might prove successful, or even what specific capabilities would define true AGI. Manus faces significant limitations in reasoning depth, robustness, and breadth of capabilities that highlight the substantial gap between today's systems and the hypothetical capabilities of true general intelligence. Still, Manus is a remarkable tool and one you might want to try: despite its limitations, it can help tremendously.

Also in the news

Microsoft Explores Paying Contributors to AI Training Data

Microsoft has launched research into technology that could identify and compensate individuals whose work significantly influences AI outputs. The project, described as "training-time provenance," would trace which human-created content was most essential to specific AI generations, potentially establishing a payment framework for creators. This approach contrasts with the industry's current direction, where major AI labs have typically avoided individual contributor payments while advocating for broader fair use protections.

AI Leaders Challenge AGI Hype with More Grounded Perspectives

A growing number of prominent AI researchers are publicly questioning optimistic predictions about Artificial General Intelligence (AGI). While Anthropic's Dario Amodei predicts human-level AI by 2026 and OpenAI's Sam Altman claims his company knows how to build "superintelligent" systems, others are bringing more measured viewpoints. Thomas Wolf (Hugging Face) calls such predictions "wishful thinking," arguing that breakthrough discoveries require asking entirely new questions—something current LLMs struggle with. Similarly, Yann LeCun (Meta) dismissed the idea that LLMs could achieve AGI as "nonsense," while Kenneth Stanley (Lila Sciences) highlights creativity as a fundamental capability missing from today's AI. These "AI realists" aren't opposing progress but focusing attention on concrete barriers between current technology and truly general intelligence.

Highly recommended watch: Yann LeCun (Head of AI at Meta) talking about realistic AI outcomes:

New Paper: Why Do AI Multi-Agent Systems Still Fail?

A highly interesting new study has identified why Multi-Agent Systems (MAS), where multiple LLM agents collaborate on tasks, aren't significantly outperforming single-agent approaches despite industry excitement. The researchers analyzed five popular MAS frameworks across 150 diverse tasks, identifying 14 distinct failure modes organized into three categories: specification failures, inter-agent misalignment, and verification problems.

Common issues include agents ignoring task specifications, failing to verify outputs, losing conversation history, and poor communication between agents. The research challenges assumptions that simply connecting multiple LLMs automatically leads to better performance. The findings suggest that effective multi-agent systems require more sophisticated orchestration, clearer role definitions, and structured verification mechanisms.
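As a toy illustration of what "structured verification" between agents might look like - our own sketch, not code from the paper - one agent's output can be routed through an explicit spec check before being handed to the next agent, with concrete failures fed back for a retry:

```python
# Toy sketch of structured inter-agent verification (our own illustration,
# not the paper's code): validate an agent's output against the task
# specification before forwarding it to the next agent.

def spec_check(output: str, required_keywords: list[str]) -> list[str]:
    """Return the spec requirements the output failed to address."""
    return [kw for kw in required_keywords if kw.lower() not in output.lower()]

def handoff(output: str, required_keywords: list[str], retry_agent):
    """Forward output only if it meets the spec; otherwise retry once."""
    missing = spec_check(output, required_keywords)
    if missing:
        # Feed the concrete failures back instead of silently forwarding
        output = retry_agent(f"Revise your answer; it is missing: {', '.join(missing)}")
    return output
```

The point is not the keyword check itself (a real system would use an LLM judge or schema validation) but making the verification step explicit rather than trusting the handoff.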

Highly recommended read:

Tip of the week

Add Memory to Your AI Agents for Better Conversations

This week's tip focuses on enhancing AI agents with memory capabilities - a key ingredient for creating more natural, context-aware conversational experiences.

Why Memory Matters for AI Agents

When building custom AI agents, one of the biggest limitations is their inability to remember previous interactions. Without memory, your agent starts each conversation fresh, forcing users to constantly repeat information and context. This creates a disjointed experience that feels unnatural and inefficient.

Implementing a Simple Memory System

The article below details a practical approach to adding memory to AI agents with these key components:

  1. Memory Configuration - Define parameters like maximum message count before summarization and connection details for persistent storage

  2. Conversation Storage - Save user inputs, agent responses, and tool calls in a database for future reference

  3. Context Summarization - Automatically create summaries of past exchanges to maintain manageable context size

  4. Context Retrieval - Fetch relevant conversation history and inject it into the agent's prompt before processing new requests
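The four steps above can be sketched in a few dozen lines of Python. This is a minimal illustration, not the article's implementation: it uses SQLite for storage, and the summarize() stub stands in for the LLM call a real agent would make:

```python
import sqlite3

MAX_MESSAGES = 6  # 1. Memory configuration: summarize once history exceeds this


def summarize(rows):
    # Stand-in summarizer: a real system would ask the LLM to condense these
    return f"({len(rows)} earlier messages condensed)"


class ConversationMemory:
    def __init__(self, db_path=":memory:", max_messages=MAX_MESSAGES):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages (role TEXT, content TEXT)"
        )
        self.summary = ""  # running summary of older exchanges
        self.max_messages = max_messages

    def add(self, role, content):
        # 2. Conversation storage: persist every exchange
        self.conn.execute("INSERT INTO messages VALUES (?, ?)", (role, content))
        self.conn.commit()
        self._maybe_summarize()

    def _maybe_summarize(self):
        # 3. Context summarization: compress older messages once the
        # history grows past the configured limit
        rows = self.conn.execute(
            "SELECT rowid, role, content FROM messages"
        ).fetchall()
        if len(rows) > self.max_messages:
            old = rows[:-self.max_messages]
            self.summary += " " + summarize(old)  # placeholder for an LLM call
            self.conn.executemany(
                "DELETE FROM messages WHERE rowid = ?", [(r[0],) for r in old]
            )
            self.conn.commit()

    def context(self):
        # 4. Context retrieval: summary plus recent turns, ready to
        # prepend to the agent's prompt
        rows = self.conn.execute("SELECT role, content FROM messages").fetchall()
        lines = [f"{role}: {content}" for role, content in rows]
        prefix = f"Summary of earlier conversation:{self.summary}\n" if self.summary else ""
        return prefix + "\n".join(lines)
```

Before each new request, context() returns the running summary plus the recent turns; swapping ":memory:" for a file path makes the history survive restarts.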

When to Use Memory

Memory is most valuable for:

  • Multi-turn conversations where context builds over time

  • Agents that need to recall user preferences or previous decisions

  • Applications where users might continue conversations after interruptions

However, for simple one-off queries or when privacy is paramount, you might choose a stateless design instead.

The technique we demonstrate can be implemented with any LLM and requires only basic database capabilities, making it accessible for most development scenarios.

Read the full implementation guide in the article below.

We hope you liked our newsletter and stay tuned for the next edition. If you need help with your AI tasks and implementations - let us know. We are happy to help!