Pondhouse Data AI - Edition 1
Use-Cases and Tips & Tricks from Data and AI

Hey there,
You've made it to the premiere edition of Pondhouse AI — the go-to resource for learning and experiencing artificial intelligence. Whether you’re looking to understand complex AI concepts, apply AI tools effectively, or explore inspiring use cases, we have you covered.
Let’s dive right in!
Cheers, Andreas & Sascha
In today's edition:
News: OpenAI released GPT-4o, a new multimodal LLM supporting text, image and audio - all in one model.
Tutorial: A comparison of different methods for using language models with proprietary data.
Tip of the week: Compressing conversation memory using LlamaIndex
Tool of the week: GPT-Researcher - an autonomous agent to create web research
Find this Newsletter helpful?
Please forward it to your colleagues and friends - it helps us tremendously.
Top News
OpenAI releases GPT-4o - a new multimodal LLM for text, image and audio
On Monday, May 13th, OpenAI released their latest frontier model, called GPT-4o (“o” stands for “omni”). GPT-4o will replace GPT-4-Turbo in both the ChatGPT Plus and free versions, and is available as part of OpenAI's API offering.
The most significant feature of GPT-4o is its capability to handle text, image and audio natively - making it the first of its kind. In contrast to traditional “multi-modal” approaches, which chain separate models together, GPT-4o appears to be a single mixture-of-experts model trained on text, image and audio. This is quite intriguing, as it opens up a new array of use cases that require interaction between these three domains.
Beyond that, some notable improvements are:
Very low audio reply latency of about 320ms
Text Generation model performance on par with GPT-4-Turbo
Note: Unconfirmed reports suggest that while GPT-4o performs well in general, it has lost some of its talent for complex coding and software development tasks. We will investigate and keep you updated.
Impressive rendering of text within generated images
Visual narrative consistency in image generation. Until now, image generation models struggled to grasp the visual narrative across a series of images - not anymore.
Better token compression: The model gets cheaper, as it needs fewer tokens per word.
Decreased price and latency. Last but not least, GPT-4o got significantly more efficient:
50% lower pricing. GPT-4o is 50% cheaper than GPT-4 Turbo, across both input tokens ($5 per 1 million tokens) and output tokens ($15 per 1 million tokens).
2x faster. GPT-4o generates responses twice as fast as GPT-4 Turbo.
5x higher rate limits. Over the coming weeks, GPT-4o will ramp to 5x those of GPT-4 Turbo—up to 10 million tokens per minute for developers with high usage.
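Since GPT-4o is already available through the API, here is a minimal sketch of a multimodal request using OpenAI's Python SDK - the image URL is a placeholder you'd replace with your own:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A single request mixing text and image input - GPT-4o handles both natively
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)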
Verdict: Without a doubt, GPT-4o is another highly impressive model. Especially the record-breaking performance of its image and audio capabilities is worth a mention. We are once again surprised by how fast OpenAI was able to deliver a best-in-class audio and image generation model.
On the other hand, the text generation capabilities very much plateaued - no significant improvements were made compared to GPT-4-Turbo.
While this latter part may sound a little disappointing, let's end with the incredible gain in efficiency: a 50% price cut and a 100% speed improvement!
Tutorials & Use Cases
Using AI Language Models to Improve Business Operations
Many companies today are looking to use AI language models to better understand and use the large amounts of data they have. By training these models on company-specific information, businesses can get more relevant and useful outputs that are tailored to their unique needs and goals.
Ways to Train Language Models on Company Data
Fine-Tuning Existing Models: Adapting a pre-trained LLM to specific datasets by further training it on new, domain-specific data (a minimal sketch follows this comparison).
Advantages:
Efficient use of resources due to leveraging existing models.
Enhanced performance on specialized tasks through tailored training.
Disadvantages:
Requires large amounts of specific data.
Risk of overfitting to the new data, which can degrade performance on other tasks.
Training New Models: Building an LLM from scratch, customized entirely to fit the specific needs of a business.
Advantages:
Total customization to company-specific language and tasks.
Complete control over the data and training process, enhancing security and relevance.
Disadvantages:
Highly resource-intensive in terms of time, data, and computational power.
Demands significant machine learning expertise and infrastructure.
Retrieval-Augmented Generation (RAG): A hybrid approach that combines traditional language generation with dynamic information retrieval from external data sources (a sketch follows at the end of this tutorial).
Advantages:
Generates responses informed by the latest, relevant external data.
Updates are efficient and cost-effective, requiring only changes to the data sources rather than full model retraining.
Disadvantages:
Complex integration of retrieval systems and language models.
Quality of output heavily depends on the quality of the external data.
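To make the fine-tuning route more concrete, here is a minimal sketch using OpenAI's fine-tuning API. The training file name is a placeholder; the file must contain chat-formatted JSONL examples of your domain-specific data:

from openai import OpenAI

client = OpenAI()

# Upload the domain-specific training data (hypothetical file name)
training_file = client.files.create(
    file=open("company_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tuning job on top of a pre-trained base model
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(f"Fine-tuning job started: {job.id}")

Once the job finishes, the resulting model is addressable by its own model ID, just like the base models.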
Why RAG is Optimal for Most Companies
Accuracy and Relevance: Augments pre-trained models with real-time data for highly accurate and relevant responses.
Customization: Easily adapts to different datasets, catering to specific industry needs without extensive retraining.
Efficiency: More cost-effective in updating knowledge bases as it involves modifying data sources rather than retraining models.
Reduced Biases: By accessing diverse data sources, it helps in generating more balanced and comprehensive insights.
Dynamic Adaptation: Capable of evolving with business needs through continuous updates to its external data sources.
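And here is the promised RAG sketch, using LlamaIndex (the same library featured in our tip of the week below). The document folder and query are placeholders:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Build a vector index over proprietary documents in a (hypothetical) local folder
documents = SimpleDirectoryReader("./company_docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# At query time, relevant chunks are retrieved and handed to the LLM as context
query_engine = index.as_query_engine()
print(query_engine.query("What does our refund policy say?"))

Note how updating the knowledge base only means re-indexing the changed documents - no model retraining required.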
Also in the news
ElevenLabs releases Preview of singing voice clones
ElevenLabs is set to introduce a groundbreaking artificial intelligence music generator that includes vocal capabilities. To promote the upcoming release, the company showcased a selection of impressive tracks on the social media platform X (formerly Twitter).
The field of AI-generated music has experienced rapid growth throughout the year, establishing itself as one of the fastest-advancing sectors within the synthetic content domain.
We must say - it sounds very good. Which leaves the question: Has our art over the centuries become so predictable that a “pattern recognition machine” like AI can easily replicate it?
Hear for yourself: It Started to Sing — ElevenLabs AI music - YouTube
Apple and OpenAI appear to close deal for bringing ChatGPT to iPhones
There are indications that Apple is on the verge of entering into a partnership with OpenAI to integrate ChatGPT into its iPhones.
A potential agreement with OpenAI would enable Apple to incorporate the most popular chatbot into its ecosystem - alongside the numerous new AI features set to be announced next month - potentially making Apple's platform a leader in “Copilot”-like assistance features.
So far, neither Apple nor representatives from OpenAI or Google have commented on the ongoing negotiations.
The AI market is still growing rapidly, with projections reaching US$826 billion by the end of the decade
While the idea of robots doing all our housework, as envisioned in the nineties, has not yet become a reality, AI has already transformed many aspects of our daily lives. And according to market predictions, it will only get bigger - much bigger.
The AI sector is a massive part of the global economy, with recent statistics predicting it will be worth US$184 billion in 2024.
Forecasts suggest that the AI market will grow at almost 29% annually and will be worth a staggering US$826 billion by 2030 - the end of the decade.
Areas that will be especially impacted:
Face ID technology on smartphones
Personalized experiences on social media and online gaming platforms
Spell check and grammar tools
Spam filters and anti-virus software
Digital voice assistants like Siri, Alexa, Google Home, and Cortana
Smart home devices, such as thermostats and fridges
Anthropic's Claude 3-powered AI assistant finally arrives in Europe
Claude 3 was released some months ago by Anthropic and, at the time, proved to be the best - or at least among the best - text generation models out there. It was a welcome contender to GPT-4, because - well - competition is good for us AI model consumers.
Anthropic also released a ChatGPT-like assistant tool, making the model easy to use. However, up until now we Europeans could not use it, as Anthropic allegedly had to fix data privacy issues first - a concern non-existent in the US.
Claude - the name of the new assistant - is available at https://claude.ai and is a direct competitor of ChatGPT. If you are looking for an alternative to ChatGPT, or if you are just starting out with AI assistants, you might give it a try.
Tip of the week
Iteratively condensing memory of your AI chatbot
Any chatbot application inevitably runs into the following limitation: the chat history either exceeds the LLM's context window - or at least drives the LLM bill up to unreasonable amounts.
Limiting the chat history to the most recent messages - as was done by ChatGPT for a long time - is not good enough, simply because you lose precious context from the beginning of your conversation.
Using the LlamaIndex ChatSummaryMemoryBuffer, we can iteratively condense the chat history without losing too much of the context.
import tiktoken
from llama_index.core.memory import ChatSummaryMemoryBuffer
from llama_index.llms.openai import OpenAI

model = "gpt-4-0125-preview"
# LLM used to summarize the oldest parts of the conversation
summarizer_llm = OpenAI(model=model, max_tokens=256)
tokenizer_fn = tiktoken.encoding_for_model(model).encode

memory = ChatSummaryMemoryBuffer.from_defaults(
    chat_history=chat_history,  # existing list of ChatMessage objects
    llm=summarizer_llm,
    token_limit=256,  # history beyond this token budget gets summarized
    tokenizer_fn=tokenizer_fn,
)
As we add new messages to the memory, the library automatically condenses and summarizes the older history for us:
memory.put(new_chat_history[0])  # append a new ChatMessage
This provides an easy and efficient way of managing conversation memory!
Tool of the week
GPT-Researcher
GPT Researcher is an autonomous agent designed for comprehensive online research on a variety of tasks.
The agent can produce detailed, factual and unbiased research reports, with customization options for focusing on relevant resources, outlines, and lessons. Inspired by the recent Plan-and-Solve and RAG papers, GPT Researcher addresses issues of speed, determinism and reliability, offering a more stable performance and increased speed through parallelized agent work, as opposed to synchronous operations.
Why GPT-Researcher?
Forming objective conclusions through manual research takes time - sometimes weeks - to find the right resources and information.
Current LLMs are trained on past and outdated information, with heavy risks of hallucinations, making them almost irrelevant for research tasks.
Services that enable web search (such as ChatGPT + Web Plugin) only consider a limited set of sources and content, which in some cases results in superficial and biased answers.
Using only a selection of web sources can create bias in determining the right conclusions for research tasks.
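If you want to try it from Python, here is a minimal sketch based on the project's pip package; it assumes OPENAI_API_KEY and TAVILY_API_KEY are set in your environment, and the query is a placeholder:

import asyncio

from gpt_researcher import GPTResearcher

async def main() -> None:
    # The agent plans the research, crawls sources in parallel and writes a report
    researcher = GPTResearcher(query="How will GPT-4o change enterprise AI adoption?", report_type="research_report")
    await researcher.conduct_research()
    report = await researcher.write_report()
    print(report)

asyncio.run(main())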
We hope you liked our newsletter - stay tuned for the next edition. If you need help with your AI projects and implementations, let us know. We are happy to help!