Pondhouse Data AI - Tips & Tutorials for Data & AI 14
GPT-5 Hits Ceiling | 50% LLM Cost-Saving Tip | How-To: Company Doc Search | Google Adopts OpenAI Standard

Hey there,
Welcome to our Pondhouse Data newsletter about all Data & AI.
This week, we focus on practical implementations: from building smart search across documentation systems to reducing API costs through batch inference. We've also included our analysis of recent industry developments that might affect your AI strategy.
And last but not least, we cover one of the first larger studies measuring AI's real impact on the job market.
Let’s get started!
Cheers, Andreas & Sascha
In today's edition:
Headline Story: Are LLMs hitting a performance ceiling? And why this might be the trigger companies need to start automating their processes.
Tutorial: Build a Company-Wide Search System, using Airbyte and LangChain
Tool Spotlight: LibreChat - The Open-Source ChatGPT Alternative
Cost-Saving Tip: Cut Your AI Costs in Half with Batch Inference
📰 Industry Updates:
Google Gemini Adopts OpenAI's API Standard - A Major Step Towards Standardization in AI
First Research Shows AI's Real Impact on Jobs
Find this Newsletter helpful?
Please forward it to your colleagues and friends - it helps us tremendously.
Tutorial of the week
Building a Smart Search for Your Company Docs: A Practical Approach
If you've ever been frustrated searching through your company's documentation, here's a practical solution: turn your knowledge base into a searchable system that actually understands questions and provides relevant answers. While the approach itself is technically just ordinary RAG, the combination of tools is interesting: it lets you plug in dozens of different documentation systems using one and the same setup.
The Tools That Make It Work
This setup uses three main components:
Airbyte: Pulls your content from existing systems
PGVector: A PostgreSQL extension that handles the search
LangChain: Connects everything and manages the AI interaction
What Makes This Approach Different
The key advantage here is simplicity. Thanks to Airbyte's new Python library, you can now connect to your data sources with minimal setup – no complex infrastructure required. While our example uses Confluence, the same approach works just as well with Notion, SharePoint, or other documentation systems. With Airbyte providing literally hundreds of connectors, there is a good chance that you’ll be able to connect your knowledge base using the demonstrated approach!
How It Actually Works
Pull in your documentation using Airbyte
Process and index the content
Store it efficiently in PostgreSQL with pgvector
When someone asks a question:
Find relevant documents
Use AI to formulate a clear answer
Return results based on your actual documentation
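To make these steps concrete, here's a minimal sketch of the whole pipeline in Python. Treat it as illustrative only: the Confluence credentials, the stream and field names, and the Postgres connection string are placeholder assumptions you'd adapt to your own environment.

import airbyte as ab
from langchain_community.vectorstores import PGVector
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

# 1. Pull documentation from Confluence via Airbyte's Python library (PyAirbyte)
source = ab.get_source(
    "source-confluence",
    config={
        "email": "you@company.com",               # placeholder credentials
        "api_token": "YOUR_API_TOKEN",
        "domain_name": "yourcompany.atlassian.net",
    },
    install_if_missing=True,
)
source.select_streams(["pages"])
result = source.read()

# 2. Convert the synced records into LangChain documents
#    (field names depend on the connector's schema - adapt as needed)
docs = [
    Document(page_content=str(record["body"]), metadata={"title": record["title"]})
    for record in result["pages"]
]

# 3. Embed the content and store it in PostgreSQL with pgvector
store = PGVector.from_documents(
    docs,
    OpenAIEmbeddings(),
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/docs",
    collection_name="company_docs",
)

# 4. At question time: fetch the most relevant chunks for the user's query
relevant_docs = store.similarity_search("How do I request a new laptop?", k=4)

From here, you'd pass the retrieved chunks together with the user's question into an LLM prompt - that's the "formulate a clear answer" step from the list above.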
Why This Matters
This approach solves a straightforward problem: making existing documentation more accessible. Instead of building a new knowledge base or changing how teams work, it enhances what you already have. The setup is particularly useful for organizations that maintain extensive documentation but struggle with quick information retrieval.
Whether it's internal wikis, support tickets, or project documentation spread across different tools - Airbyte's extensive connector library lets you tap into these sources without writing custom integration code.
Read the full hands-on step-by-step guide in the link below:
Tool of the week
LibreChat: Open-source, privacy-friendly ChatGPT alternative

As companies increasingly rely on AI chatbots, many face two critical challenges with ChatGPT: high subscription costs ($20-60 per user monthly) and data privacy concerns, particularly regarding OpenAI's data usage policies for training their models.
LibreChat offers a practical solution as a self-hosted, open-source alternative. It mirrors ChatGPT's familiar interface while adding several business-critical features:
Key Features & Advantages:
💰 Cost Benefits
Zero per-user subscription fees
Full control over AI provider selection and usage
Flexible scaling without licensing restrictions
🔒 Privacy & Security
Self-hosted infrastructure
Complete data sovereignty
Multiple authentication options (LDAP, Azure AD, AWS Cognito)
No external data sharing for model training
🔗 Integration & Flexibility
Support for multiple AI providers (OpenAI, Anthropic, Google, AWS)
OpenAI Assistants API integration
Custom endpoint support for specialized AI models
RAG (Retrieval Augmented Generation) capabilities for document analysis
🛠️ Advanced Features
Multi-modal support (text, images, code)
Customizable prompt library with variables
Conversation forking for team collaboration
Built-in image generation via DALL-E 3
"Artifacts" feature for live code/diagram rendering
File handling and analysis capabilities
Current Limitation: While LibreChat excels in features and privacy, it currently lacks granular user permissions - though this is prioritized on their roadmap.
Perfect for: Organizations that need ChatGPT's capabilities but require data privacy, cost control, and infrastructure flexibility. Particularly valuable for enterprises in regulated industries or those handling sensitive information.
Top News
Leak suggests: Diminishing returns with GPT-5 - are LLMs hitting their ceiling?
What happened?
OpenAI's new model "Orion" shows signs of diminishing returns in traditional AI scaling
The model reportedly reached GPT-4-level performance at just 20% of its training, but the remaining gains proved far more modest than in previous generations
General improvements were smaller than previous generational leaps (GPT-3 to GPT-4)
This has prompted OpenAI to create a dedicated "foundations" team to explore new development approaches as high-quality training data becomes scarcer.
The company still plans to release Orion in early 2025, potentially under a new naming convention, marking a strategic pivot in their AI development roadmap.
⚡ Why it matters:
The AI industry may be hitting a ceiling with conventional scaling approaches as high-quality training data becomes scarce, forcing AI leaders to rethink development strategies. On the other hand, this might be good news for AI application practitioners, as it gives room to regroup, breathe, and apply the already outstanding models currently available.
📅 What's next:
Orion is scheduled for release in early 2025, pending safety testing completion.
Read the full leak here.
Opinion: Why LLMs hitting a ceiling might be a good thing
"These current AI models are just stepping stones. The real game-changing version is coming next year."
If I had a dollar for every time I've seen this sentiment flood my LinkedIn feed, we could fund another AI startup :-). But with OpenAI's recent announcement about diminishing returns in model improvements, it's time to burst this bubble of perpetual waiting. And honestly? This might be the best news we've had in AI this year.
For too long, we've been caught in a breathless cycle of anticipation, always waiting for the next big breakthrough, the next GPT version, the next AI miracle. This "wait and see" approach is costing businesses millions in missed opportunities. And please bear with me: I'm not suggesting we mindlessly follow the hype - quite the opposite.
Here's the truth: The current generation of AI models is already incredibly powerful. They're more than capable of transforming how we work - not by achieving artificial general intelligence or passing philosophical tests, but by tackling the unglamorous yet costly challenges that businesses face every day.
Think about it: Document parsing, spreadsheet automation, report generation, information retrieval - these aren't the stuff of sci-fi dreams, but they're the real day-to-day work problems: executed manually, universally disliked, yet still done as if it were the 80s. These are the tasks eating up countless human hours and corporate budgets. GPT-4 and its peers are already perfectly equipped to handle these challenges.
The "pause" in breakthrough improvements gives us all a chance to catch our breath and focus on implementation rather than anticipation. Instead of waiting for the next model that's marginally better at philosophical debates and finding the amount of ‘r’s in ‘strawberry’, businesses should be asking: "How can we use what's available now to automate our processes, reduce costs, and free our people from repetitive tasks?"
Let's use this period of technological consolidation wisely. The real AI revolution isn't about reaching new theoretical heights - it's about bringing practical, profitable automation to the mundane tasks that bog down our businesses every day.
The future of AI isn't just about building better models - it's about better using the powerful tools we already have. Stop waiting for the "next big thing" - it's already here, and it's waiting to be utilized, right now, not tomorrow.
Also in the news
Google Gemini Adopts OpenAI's API Standard
Google has announced that Gemini is now compatible with the OpenAI API package, marking a significant step toward standardization in the AI industry.
⚡ Why This Matters:
Code Portability: Developers can now switch between AI providers without rewriting their entire codebase
Simplified Integration: Teams already using OpenAI's format can easily add Gemini to their toolkit
Reduced Learning Curve: One API standard means less time spent learning different implementations
Future-Proofing: Projects become more resilient to provider changes or outages - easy-to-use fallbacks can kick in if one provider goes down
With most major players now supporting the same API format, OpenAI's implementation is becoming the de facto standard for AI integration.
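In practice, switching is mostly a matter of changing the base URL and API key in the official OpenAI SDK. A minimal sketch (endpoint and model name as announced by Google - check their docs for the current list of supported models):

from openai import OpenAI

# Point the standard OpenAI client at Google's OpenAI-compatible endpoint
client = OpenAI(
    api_key="YOUR_GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)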
Study Reveals AI's Early Impact on Job Market
A new study provides the first concrete evidence of AI's impact on employment, particularly in the freelance sector.
🔍 Key Findings:
21% reduction in automation-prone freelance jobs since ChatGPT's release
Most affected areas: Writing, software development, and engineering
17% decrease in graphic design jobs following AI image generators' release
Study analyzed 1.3M+ job posts over two years (2021-2023)
📊 Impact by Sector:
Writing & Content Creation
Software & Web Development
Engineering
Graphic Design & 3D Modeling
⚡ Silver Linings:
Workers with AI-complementary skills seeing increased demand
New roles emerging at the intersection of human-AI collaboration
Productivity gains reported:
25% faster task completion in consulting
40% quality improvement with GPT-4
40% reduction in business document writing time
🎯 Future Implications:
Growing gap between high-skill and low-skill jobs
Shift toward AI-human collaborative roles
Need for reskilling and social safety net programs
💡 Key Takeaway:
While routine tasks are being automated, the focus is shifting to higher-value work combining human creativity with AI efficiency. This transformation suggests not just job displacement, but a fundamental reshaping of how work is performed.
Read the full article here.
Tip of the week
Cutting LLM costs in half by using batch APIs
LLM costs have already come down significantly over the past months and years. However, there is one cost-saving method we see adopted far too rarely: batch inference.
Almost every major LLM provider (OpenAI, Anthropic, Google - also self-hosting servers) offers batch processing capabilities. Check your provider's documentation for "Batch API" or "Async Processing" options.
What is Batch Inference?
Instead of sending individual API requests one by one, batch inference - or processing - allows you to bundle multiple LLM requests together and process them asynchronously.
Benefits:
Typically a 50% cost reduction compared to regular API calls
Significantly higher rate limits
Guaranteed 24-hour turnaround time
Perfect for non-time-critical tasks
Ideal Use Cases:
Content classification
Document processing
Data analysis
Bulk content generation
How it works:
Below you’ll find the steps required to use the batch processing API of OpenAI. No matter which AI provider you use - they all work similarly.
Create your batch file. It’s simply a JSONL file of your request messages
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
Upload this file to the provider's API
from openai import OpenAI

client = OpenAI()

# Upload the JSONL batch file so the Batch API can reference it
batch_input_file = client.files.create(
    file=open("batchinput.jsonl", "rb"),
    purpose="batch",
)
Create the batch job
batch_input_file_id = batch_input_file.id

# Create the batch job - all requests complete within the 24-hour window
batch = client.batches.create(
    input_file_id=batch_input_file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"description": "nightly eval job"},
)
Wait until the job is done
# Poll until the batch status is "completed"
batch = client.batches.retrieve(batch.id)
Get the results
# The finished batch references an output file with one result per line
file_response = client.files.content(batch.output_file_id)
print(file_response.text)
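The output file is again JSONL - one result object per line, matched back to your requests via the custom_id field. A small sketch for parsing it (the exact response structure is described in OpenAI's Batch API documentation):

import json

for line in file_response.text.splitlines():
    result = json.loads(line)
    # Match each answer back to its original request via custom_id
    answer = result["response"]["body"]["choices"][0]["message"]["content"]
    print(result["custom_id"], "->", answer)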
We hope you liked our newsletter - stay tuned for the next edition. If you need help with your AI tasks and implementations, let us know. We are happy to help!