Pondhouse Data AI - Edition 9

Elon Musk purchases 100,000 H100 GPUs | Increasing LLM output quality with simple prompt tricks | Saving up to 80% on LLM costs using LLM routing | The best document-data extraction solution so far

Hey there,

We’re excited to bring you the 9th edition of our Pondhouse AI newsletter — your source for tips and tricks around AI and LLMs. Whether you want to learn about AI concepts, use AI tools effectively, or see inspiring examples, we’ve got you covered.

Let’s get started!

Cheers, Andreas & Sascha

In today's edition:

  • News: Finally: Anthropic offers an enterprise plan for its chatbot offering Claude.ai. And Elon Musk buys 100,000 H100 GPUs!

  • Tutorial: The best document-data extraction solution so far: GPT-4o to extract data from pdfs, docx, and more

  • Tip of the Week: Increasing LLM output quality with simple prompt tricks

  • Tool of the Week: RouteLLM: Saving up to 80% on LLM costs using LLM routing

Find this Newsletter helpful?
Please forward it to your colleagues and friends - it helps us tremendously.

Top News

Claude Enterprise - Anthropic's Business Chatbot with interesting new features

Anthropic's Claude 3.5 Sonnet LLM has been a favorite in the AI community for quite some time now. It provides output on par with or even better than the frontier models from OpenAI, Google, and other providers - at very competitive prices.

Claude 3.5 Sonnet was and is also available in Claude.ai, Anthropic's chatbot solution - basically ChatGPT, but from Anthropic, with Claude 3.5 Sonnet and some interesting features like Artifacts. The problem up until recently was that all data provided to the chatbot was used by Anthropic for model training - making it a no-go for corporations of any size.

This changed with Anthropic's announcement of Claude Enterprise: Claude, tailored specifically for enterprise customers. This move positions Claude Enterprise as a direct competitor to OpenAI's ChatGPT Enterprise, which has been on the market for about a year. Claude Enterprise is designed to offer more administrative controls and enhanced security, making it an attractive option for businesses looking to integrate AI into their operations.

Main enterprise features:

  • Single sign-on (SSO) and domain capture: Securely manage user access and centralize provisioning control.

  • Role-based access with fine-grained permissioning: Designate a primary owner for your workspace to enhance security and information management.

  • Audit logs: Trace system activities for security and compliance monitoring. Audit logs will be available in the coming weeks.

  • System for Cross-domain Identity Management (SCIM): Automate user provisioning and access controls. SCIM will also be available in the coming weeks.

Comparison to ChatGPT Enterprise

While there are many similarities between Claude and ChatGPT, Claude offers some standout features not available in ChatGPT.

  • Claude Enterprise offers a context window of 500k tokens - compared to ChatGPT's 128k

  • GitHub integration offers an intriguing way for engineers to enhance their productivity - especially considering that Claude 3.5 Sonnet is arguably the best coding model at the moment

  • Claude Artifacts are a superior way of creating usable, interactive LLM outputs. Artifacts are small, interactive web applications, written entirely by Claude. Example below:

[Image: Interactive LLM output created with Claude Artifacts]

For more information, read the full announcement here.

Tutorials & Use Cases

Extracting data, images and more from documents using GPT-4o

Extracting data from documents like PDFs, Word files, and Excel sheets can be a real headache. Traditional methods often struggle with complex layouts and varying formats. In this tutorial, we'll show you how to leverage the power of GPT-4o, an advanced language model with vision capabilities, to simplify this process and achieve impressive results.

The Strategy:

Our approach involves a seemingly unconventional but surprisingly effective method:

  1. Convert the document to an image (or series of images).

  2. Send the image to GPT-4o.

  3. Prompt GPT-4o to extract the text from the image.

By presenting the document visually, we enable GPT-4o to understand its layout and context, resulting in highly accurate text extraction.

Parsing PDF Files:

PDFs, despite their complexity, are surprisingly easy to parse with this method:

  1. Import packages and initialize the GPT-4o API client:

from openai import AsyncOpenAI, RateLimitError
from io import BytesIO
import asyncio
import base64
import pypdfium2 as pdfium
import backoff

client = AsyncOpenAI(api_key="your-api-key")
  2. Load the PDF and convert pages to base64-encoded images:

pdf_file = "mypdf.pdf"
pdf = pdfium.PdfDocument(pdf_file)

images = []
for i in range(len(pdf)):
    page = pdf[i]
    image = page.render(scale=4).to_pil()
    buffered = BytesIO()
    image.save(buffered, format="JPEG")
    img_byte = buffered.getvalue()
    img_base64 = base64.b64encode(img_byte).decode("utf-8")
    images.append(img_base64)
  3. Define a function to parse each page with GPT-4o:

@backoff.on_exception(backoff.expo, RateLimitError)
async def parse_page_with_gpt(base64_image: str) -> str:
    # ... (See full code in the original blog post, or the sketch below) ...
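
For reference, here's a minimal sketch of what such a function could look like - the exact prompt wording and model parameters are our assumptions, not the original blog post's code:

@backoff.on_exception(backoff.expo, RateLimitError)
async def parse_page_with_gpt(base64_image: str) -> str:
    # Send the page image to GPT-4o as a base64 data URI and ask for a transcription
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Extract all text from this page. Preserve the reading order and return only the extracted text.",
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                    },
                ],
            }
        ],
        temperature=0,  # deterministic output suits extraction tasks
    )
    return response.choices[0].message.content or ""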
  4. Send requests concurrently and combine the results:

text_of_pages = await asyncio.gather(*[parse_page_with_gpt(image) for image in images])
document_text = "\n".join(text_of_pages)

Parsing DOCX and XLSX Files:

For Word and Excel files, we'll first convert them to PDF using LibreOffice and then follow the PDF parsing steps:

  1. Install LibreOffice: (Instructions vary depending on your operating system)

  2. Define functions for LibreOffice path and conversion:

# ... (See full code for libreoffice() and convert_to_pdf() in the original blog post, or the sketch below) ...
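
As a rough, illustrative sketch (our own version that folds both helpers into one function, not the blog post's original code), the conversion can be done by shelling out to LibreOffice's headless mode:

import subprocess
import tempfile
from pathlib import Path

def convert_to_pdf(file_content: BytesIO, suffix: str = ".docx") -> bytes:
    """Convert an Office document to PDF bytes via LibreOffice's headless CLI."""
    with tempfile.TemporaryDirectory() as tmpdir:
        input_path = Path(tmpdir) / f"input{suffix}"
        input_path.write_bytes(file_content.getvalue())
        # Assumes 'soffice' is on your PATH; adjust the binary name/path for your OS
        subprocess.run(
            ["soffice", "--headless", "--convert-to", "pdf", "--outdir", tmpdir, str(input_path)],
            check=True,
        )
        return (Path(tmpdir) / "input.pdf").read_bytes()

The returned PDF bytes can then be passed directly to pypdfium2 via pdfium.PdfDocument(pdf).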
  3. Convert DOCX or XLSX to PDF:

docx_path = "mydoc.docx" # or "mydoc.xlsx"
with open(docx_path, "rb") as docx_file: 
    docx_content = BytesIO(docx_file.read())
pdf = convert_to_pdf(docx_content) 

# ... continue with PDF parsing as described above ...

For a complete tutorial, see the full step-by-step guide below!

Also in the news

Elon Musk’s xAI purchases 100,000 H100 GPUs

Yes, you read that right: 100,000 H100 GPUs - at between $20,000 and $30,000 per unit (street price).

These GPUs will be used in xAI’s new AI training cluster ‘Colossus’ - making it the most powerful training cluster in the world.

  • 100,000 NVIDIA H100 GPUs, surpassing systems used by Google and OpenAI

  • Plans to double capacity to 200,000 GPUs (including 50,000 H200s) in the coming months

  • Developed in partnership with NVIDIA, utilizing cutting-edge GPU technology

  • Potential to accelerate breakthroughs in various AI applications

While - from a technical standpoint - this is great news, it has also sparked a debate about the dangers of concentrating AI training power in the hands of individual tech giants.

Read Musk’s announcement here.

New Python Library finally makes Document Re-ranking easier

Answer.AI has released 'rerankers', a Python library that streamlines document re-ranking methods for information retrieval systems. Its unified interface allows researchers to experiment with various re-ranking techniques by changing just a single line of code.

Key features:

  • Supports multiple re-ranking models (MonoT5, FlashRank, BERT)

  • Minimal code changes required

  • Performance parity with original implementations

  • Compatible with modern Python versions and HuggingFace Transformers

The library has shown impressive results across datasets like MS MARCO, SciFact, and TREC-COVID, maintaining consistent performance with existing implementations. 'rerankers' promises to enhance the efficiency and accuracy of retrieval systems, paving the way for future advancements in information retrieval.
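
To give you an idea of how simple this is, here's a short sketch based on the library's README (the model choice and documents are illustrative):

from rerankers import Reranker

# Swapping re-ranking methods means changing only this one line,
# e.g. Reranker("flashrank") or Reranker("t5")
ranker = Reranker("cross-encoder")

results = ranker.rank(
    query="How do rerankers improve search?",
    docs=[
        "Rerankers refine an initial set of candidate documents.",
        "Bananas are rich in potassium.",
    ],
    doc_ids=[0, 1],
)
print(results)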

What are rerankers?

Rerankers are mechanisms used in information retrieval systems to improve search result accuracy by refining an initial set of candidate documents, and they're important because they enhance the relevance and quality of search results while balancing computational efficiency in large-scale applications like web search engines or specialized database queries.

Microsoft launches the first large-scale foundation model of our atmosphere

Microsoft Research has open-sourced Aurora, an innovative AI model for weather predictions:

  • This sophisticated 1.3 billion parameter system was trained on an extensive million-hour (!) weather dataset.

  • Aurora outperforms current leading forecasting systems, including the highly regarded GraphCast AI model.

  • Its versatility shines in predicting a wide range of atmospheric conditions, from surface temperatures to complex air pollution patterns.

  • With high-resolution capabilities (0.1°) and processing speeds up to 5,000 times faster than traditional methods, Aurora promises rapid, detailed forecasts.

The model's success stems from its advanced 3D Swin Transformer architecture and comprehensive training across diverse climate scenarios. Aurora's potential extends beyond daily forecasts, offering valuable insights for extreme weather preparation and climate change studies.

Aurora is yet another technological innovation built on the Transformer architecture. The significance of this invention can’t be overstated.

For more details, read the full article here.

Tip of the week

Increasing LLM output quality with two simple prompt tricks

Wait, wait … before you think this is stupid, let me explain: Researchers have indeed found that simply telling the LLM to repeat the question before answering it lets the model solve much harder problems, more consistently.

So, how does it work? Add a phrase like "Repeat the question before answering it." to your prompt. That’s all that’s needed. The LLM will comply, repeat what you asked, and then provide a more accurate answer than it would without this phrase.
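
Here's a minimal sketch of the trick in practice, assuming the OpenAI Python client (the model choice, example question, and prompt wording are our own):

from openai import OpenAI

client = OpenAI(api_key="your-api-key")

question = "A farmer has 17 sheep. All but 9 run away. How many are left?"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # The appended instruction is the entire trick
        {"role": "user", "content": f"{question}\n\nRepeat the question before answering it."}
    ],
)
print(response.choices[0].message.content)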

Potential explanations:

According to Rohan Paul, reasons for this behavior could be:

  • Repeating the question puts it into the model's own context, significantly increasing the likelihood of the model detecting any potential "gotchas."

  • One hypothesis is that it puts the model into more of a completion mode, as opposed to answering from a chat-instruct mode.

  • Another, albeit less likely, reason could be that the model might assume the user’s question contains mistakes (e.g., the user intended to ask about a Schrödinger cat instead of a dead cat). However, if the question is in the assistant’s part of the context, the model trusts it to be accurate.

Another trick, which we found by accident: Simply add “Ruminate about this” at the end of your prompt. This results in a more elaborate answer from the LLM - and, more often than not, also in more accurate results on reasoning questions.

While we can only speculate about the reasons here as well, hypothesis 2 - that the model is put into more of a completion mode, which works better for reasoning - seems quite plausible.

While these tips may sound esoteric at first, we encourage you to try them. We find them highly useful.

Read the full paper here.

Tool of the week

RouteLLM: How to Significantly Reduce LLM Costs with LLM routing

What is RouteLLM?

RouteLLM is a framework that intelligently routes your prompts to either a "strong" (e.g., GPT-4) or a "weak" (e.g., GPT-4o mini) LLM, based on the complexity of the task. This allows you to leverage the power of strong models only when necessary, saving money on simpler tasks that can be handled by more efficient, weaker models.
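
Here's a minimal sketch of what routing looks like in code, based on the RouteLLM README (the strong/weak model names are illustrative choices of ours):

import os
from routellm.controller import Controller

os.environ["OPENAI_API_KEY"] = "your-api-key"

# "mf" is the matrix factorization router; pick any strong/weak model pair
client = Controller(
    routers=["mf"],
    strong_model="gpt-4o",
    weak_model="gpt-4o-mini",
)

# The model string encodes the router ("mf") and its cost/quality threshold
response = client.chat.completions.create(
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)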

Key Features and Benefits:

  • Cost Optimization: RouteLLM can significantly reduce your LLM API costs, potentially by up to 80%, by strategically using less expensive models for simpler tasks.

  • Improved Performance: By routing simpler tasks to faster "weak" models, RouteLLM can improve the overall speed and responsiveness of your application.

  • Easy Integration: RouteLLM provides an OpenAI API-compatible interface, allowing for seamless integration with existing applications with minimal code changes.

  • Multiple Router Options: The framework offers various router implementations, including matrix factorization, weighted Elo ranking, and BERT-based classification, providing flexibility for different use cases.

  • Calibration for Customization: RouteLLM allows you to calibrate the routing thresholds based on your specific needs and the characteristics of your typical prompts.

  • Support for Multiple Providers: RouteLLM uses LiteLLM for connecting with LLMs, enabling you to route between models from different providers like OpenAI, Google Gemini, and even locally hosted models via Ollama.

Why Choose RouteLLM?

  • Reduce your LLM API costs without compromising on quality for the majority of tasks.

  • Improve the performance and scalability of your LLM-powered application.

  • Simplify your LLM integration and management.

  • Gain more control over the trade-off between cost and performance.

For a full tutorial on how to use RouteLLM, click the link below:

We hope you liked our newsletter - stay tuned for the next edition! If you need help with your AI tasks and implementations, let us know. We are happy to help.