Pondhouse Data AI - Tips & Tutorials for Data & AI #14

GPT-5 Hits Ceiling | 50% LLM Cost-Saving Tip | How-To: Company Doc Search | Google Adopts OpenAI Standard

Hey there,

Welcome to our Pondhouse Data newsletter about all things Data & AI.
This week, we focus on practical implementations: from building smart search across documentation systems to reducing API costs through batch inference. We've also included our analysis of recent industry developments that might affect your AI strategy.

And last but not least, we cover a new study - one of the first larger studies to measure AI's impact on the job market.

Let’s get started!

Cheers, Andreas & Sascha

In today's edition:

Headline Story: Are LLMs hitting a performance ceiling? And why this might be the trigger companies need to start automating their processes.

Tutorial: Build a Company-Wide Search System Using Airbyte and LangChain

Tool Spotlight: LibreChat - The Open-Source ChatGPT Alternative

Cost-Saving Tip: Cut Your AI Costs in Half with Batch Inference

📰 Industry Updates:

Google Gemini Adopts OpenAI's API Standard - A Major Step Towards Standardization in AI

First Research Shows AI's Real Impact on Jobs

Find this Newsletter helpful?
Please forward it to your colleagues and friends - it helps us tremendously.

Tutorial of the week

Building a Smart Search for Your Company Docs: A Practical Approach

If you've ever been frustrated searching through your company's documentation, here's a practical solution: turn your knowledge base into a searchable system that actually understands questions and provides relevant answers. While the approach is technically just ordinary RAG, the combination of tools is interesting: it lets you integrate dozens of different documentation systems using the same setup.

The Tools That Make It Work

This setup uses three main components:

  • Airbyte: Pulls your content from existing systems

  • pgvector: A PostgreSQL extension that handles the vector search

  • LangChain: Connects everything and manages the AI interaction

What Makes This Approach Different

The key advantage here is simplicity. Thanks to Airbyte's new Python library, you can now connect to your data sources with minimal setup – no complex infrastructure required. While our example uses Confluence, the same approach works just as well with Notion, SharePoint, or other documentation systems. With Airbyte providing literally hundreds of connectors, there is a good chance that you’ll be able to connect your knowledge base using the demonstrated approach!

How It Actually Works

  1. Pull in your documentation using Airbyte

  2. Process and index the content

  3. Store it efficiently in PostgreSQL using pgvector

  4. When someone asks a question:

    • Find relevant documents

    • Use AI to formulate a clear answer

    • Return results based on your actual documentation
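
To make this concrete, here's a minimal sketch of the pipeline in Python, using PyAirbyte, LangChain, and pgvector. Treat it as a sketch rather than copy-paste code: the Confluence config values, the "pages" stream, and the record fields ("body", "title") are assumptions for illustration - the full guide linked below has the exact setup.

import airbyte as ab  # PyAirbyte
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import PGVector

# 1. Pull documentation from Confluence via Airbyte
source = ab.get_source(
    "source-confluence",
    config={  # hypothetical config values - see the connector docs
        "domain_name": "your-org.atlassian.net",
        "email": "you@example.com",
        "api_token": "...",
    },
)
source.select_streams(["pages"])
result = source.read()

# 2. Process the records into LangChain documents
#    ("body" and "title" are assumed field names - check the stream schema)
docs = [
    Document(page_content=str(rec["body"]), metadata={"title": rec["title"]})
    for rec in result["pages"]
]

# 3. Embed and store in PostgreSQL via pgvector
store = PGVector.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    collection_name="company_docs",
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/docs",
)

# 4. At question time: fetch the most relevant documents,
#    then hand them to an LLM to formulate the answer
relevant_docs = store.similarity_search("How do I request VPN access?", k=4)

From there, you pass relevant_docs into your LLM prompt and have it answer strictly from the retrieved context - standard RAG.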

Why This Matters

This approach solves a straightforward problem: making existing documentation more accessible. Instead of building a new knowledge base or changing how teams work, it enhances what you already have. The setup is particularly useful for organizations that maintain extensive documentation but struggle with quick information retrieval.

Whether it's internal wikis, support tickets, or project documentation spread across different tools, Airbyte's extensive connector library lets you tap into these sources without writing custom integration code.

Read the full hands-on step-by-step guide in the link below:

Tool of the week

LibreChat: Open-source, privacy-friendly ChatGPT alternative

As companies increasingly rely on AI chatbots, many face two critical challenges with ChatGPT: high subscription costs ($20-60 per user monthly) and data privacy concerns, particularly regarding OpenAI's data usage policies for training their models.

LibreChat offers a practical solution as a self-hosted, open-source alternative. It mirrors ChatGPT's familiar interface while adding several business-critical features:

Key Features & Advantages:

  • 💰 Cost Benefits

    • Zero per-user subscription fees

    • Full control over AI provider selection and usage

    • Flexible scaling without licensing restrictions

  • 🔒 Privacy & Security

    • Self-hosted infrastructure

    • Complete data sovereignty

    • Multiple authentication options (LDAP, Azure AD, AWS Cognito)

    • No external data sharing for model training

  • 🔗 Integration & Flexibility

    • Support for multiple AI providers (OpenAI, Anthropic, Google, AWS)

    • OpenAI Assistants API integration

    • Custom endpoint support for specialized AI models

    • RAG (Retrieval Augmented Generation) capabilities for document analysis

  • 🛠️ Advanced Features

    • Multi-modal support (text, images, code)

    • Customizable prompt library with variables

    • Conversation forking for team collaboration

    • Built-in image generation via DALL-E 3

    • "Artifacts" feature for live code/diagram rendering

    • File handling and analysis capabilities

Current Limitation: While LibreChat excels in features and privacy, it currently lacks granular user permissions - though this is prioritized on their roadmap.

Perfect for: Organizations that need ChatGPT's capabilities but require data privacy, cost control, and infrastructure flexibility. Particularly valuable for enterprises in regulated industries or those handling sensitive information.

Top News

Leak suggests: Diminishing returns with GPT-5 - are LLMs hitting their ceiling?

What happened?

  • OpenAI's new model "Orion" shows signs of diminishing returns in traditional AI scaling

  • The model reportedly reached GPT-4-level performance at just 20% of training, but the remaining training yielded only modest further improvements

  • General improvements were smaller than previous generational leaps (GPT-3 to GPT-4)

This has prompted OpenAI to create a dedicated "foundations" team to explore new development approaches as high-quality training data becomes scarcer.

The company still plans to release Orion in early 2025, potentially under a new naming convention, marking a strategic pivot in their AI development roadmap.

⚡ Why it matters:
The AI industry may be hitting a ceiling with conventional scaling approaches as high-quality training data becomes scarce, forcing AI leaders to rethink development strategies. On the other hand, this might be good news for AI application practitioners, as it gives room to regroup, breathe, and apply the already outstanding models currently available.

📅 What's next:
Orion is scheduled for release in early 2025, pending safety testing completion.

Read the full leak here.

Opinion: Why LLMs hitting a ceiling might be a good thing

"These current AI models are just stepping stones. The real game-changing version is coming next year."

- Every other LinkedIn post

If I had a dollar for every time I've seen this sentiment flood my LinkedIn feed, we could fund another AI startup :-). But with OpenAI's recent announcement about diminishing returns in model improvements, it's time to burst this bubble of perpetual waiting. And honestly? This might be the best news we've had in AI this year.

For too long, we've been caught in a breathless cycle of anticipation, always waiting for the next big breakthrough, the next GPT version, the next AI miracle. This "wait and see" approach is costing businesses millions in missed opportunities. And please bear with me: I'm not suggesting you mindlessly follow the hype - quite the opposite.

Here's the truth: The current generation of AI models is already incredibly powerful. They're more than capable of transforming how we work - not by achieving artificial general intelligence or passing philosophical tests, but by tackling the unglamorous yet costly challenges that businesses face every day.

Think about it: Document parsing, spreadsheet automation, report generation, information retrieval - these aren't the stuff of sci-fi dreams, but they are the real day-to-day problems: executed manually, widely hated, yet still done as if it were the 80s. These are the tasks eating up countless human hours and corporate budgets. GPT-4 and its peers are already perfectly equipped to handle these challenges.

The "pause" in breakthrough improvements gives us all a chance to catch our breath and focus on implementation rather than anticipation. Instead of waiting for the next model that's marginally better at philosophical debates and finding the amount of ‘r’s in ‘strawberry’, businesses should be asking: "How can we use what's available now to automate our processes, reduce costs, and free our people from repetitive tasks?"

Let's use this period of technological consolidation wisely. The real AI revolution isn't about reaching new theoretical heights - it's about bringing practical, profitable automation to the mundane tasks that bog down our businesses every day.

The future of AI isn't just about building better models - it's about better using the powerful tools we already have. Stop waiting for the "next big thing" - it's already here, and it's waiting to be utilized, right now, not tomorrow.

Also in the news

Google Gemini Adopts OpenAI's API Standard

Google has announced that Gemini is now compatible with the OpenAI API package, marking a significant step toward standardization in the AI industry.

⚡ Why This Matters:

  • Code Portability: Developers can now switch between AI providers without rewriting their entire codebase

  • Simplified Integration: Teams already using OpenAI's format can easily add Gemini to their toolkit

  • Reduced Learning Curve: One API standard means less time spent learning different implementations

  • Future-Proofing: Projects become more resilient to provider changes or outages - making it easy to implement fallbacks if one provider goes down

With most major players now supporting the same API format, OpenAI's implementation is becoming the de facto standard for AI integration.
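
In practice, switching is little more than pointing the existing OpenAI client at Gemini's endpoint. A minimal sketch, based on the endpoint and model name from Google's announcement (double-check the current docs for exact values):

import os
from openai import OpenAI

# The standard OpenAI client, pointed at Gemini's OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],  # a Google AI Studio key, not an OpenAI key
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "Hello world!"}],
)
print(response.choices[0].message.content)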

Study Reveals AI's Early Impact on Job Market

A new study provides the first concrete evidence of AI's impact on employment, particularly in the freelance sector.

🔍 Key Findings:

  • 21% reduction in automation-prone freelance jobs since ChatGPT's release

  • Most affected areas: Writing, software development, and engineering

  • 17% decrease in graphic design jobs following AI image generators' release

  • Study analyzed 1.3M+ job posts over two years (2021-2023)

📊 Impact by Sector:

  1. Writing & Content Creation

  2. Software & Web Development

  3. Engineering

  4. Graphic Design & 3D Modeling

⚡ Silver Linings:

  • Workers with AI-complementary skills seeing increased demand

  • New roles emerging at the intersection of human-AI collaboration

  • Productivity gains reported:

    • 25% faster task completion in consulting

    • 40% quality improvement with GPT-4

    • 40% reduction in business document writing time

🎯 Future Implications:

  • Growing gap between high-skill and low-skill jobs

  • Shift toward AI-human collaborative roles

  • Need for reskilling and social safety net programs

💡 Key Takeaway:
While routine tasks are being automated, the focus is shifting to higher-value work combining human creativity with AI efficiency. This transformation suggests not just job displacement, but a fundamental reshaping of how work is performed.

Read the full article here.

Tip of the week

Cutting LLM costs in half by using batch APIs

LLM costs have already come down significantly over the last months and years. However, there is one cost-saving method we see adopted far too rarely: batch inference.

Almost every major LLM provider (OpenAI, Anthropic, Google - and many self-hosted inference servers) offers batch processing capabilities. Check your provider's documentation for "Batch API" or "Async Processing" options.

What is Batch Inference?
Instead of sending individual API requests one by one, batch inference - or processing - allows you to bundle multiple LLM requests together and process them asynchronously.

Benefits:

  • Typically a 50% cost reduction compared to regular API calls

  • Significantly higher rate limits

  • Guaranteed 24-hour turnaround time

  • Perfect for non-time-critical tasks

Ideal Use Cases:

  • Content classification

  • Document processing

  • Data analysis

  • Bulk content generation

How it works:

Below you'll find the steps required to use OpenAI's batch processing API. No matter which AI provider you use - they all work similarly.

  1. Create your batch file - it's simply a JSONL file of your request messages:

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
  2. Upload this file to the provider's API

from openai import OpenAI

client = OpenAI()

# Upload the JSONL file; purpose="batch" marks it for batch processing
batch_input_file = client.files.create(
  file=open("batchinput.jsonl", "rb"),
  purpose="batch"
)
  3. Create the batch job

batch_input_file_id = batch_input_file.id

# completion_window="24h" is the batch tier that comes with the discount
batch = client.batches.create(
    input_file_id=batch_input_file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
      "description": "nightly eval job"
    }
)
  4. Wait until the job is done

# Poll the batch status - it reads "completed" once all requests are processed
batch_status = client.batches.retrieve(batch.id)
print(batch_status.status)
  5. Get the results

# The finished batch references its results file via output_file_id
file_response = client.files.content(batch_status.output_file_id)
print(file_response.text)
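
Each line of the results file is a JSON object that carries the custom_id you set in the input file, so you can map every response back to its request. A small sketch, assuming the output format of OpenAI's Batch API:

import json

# Map each response back to the request that produced it via custom_id
results = {}
for line in file_response.text.splitlines():
    item = json.loads(line)
    content = item["response"]["body"]["choices"][0]["message"]["content"]
    results[item["custom_id"]] = content

print(results["request-1"])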

We hope you liked our newsletter - stay tuned for the next edition. If you need help with your AI tasks and implementations, let us know. We are happy to help!