Pondhouse Data AI - Tips & Tutorials for Data & AI 18

DeepSeek R1 vs. OpenAI o1: Same Power, 30x Cheaper | Speed Up Azure OpenAI Model Requests | Automatic Prompt Optimization with PromptWizard | Automate Web Tasks with OpenAI Operator

Hey there,

This week, we’re diving into smarter ways to optimize your AI workflows with tools like Azure AI Content Filters and Microsoft’s PromptWizard. We’ve also got exciting updates on DeepSeek’s R1 model and new features from LangChain, OpenAI, and Anthropic. And if you’ve ever wanted to build your own LLM (in just 3 hours, no less!), our tip of the week is the perfect place to start!

Enjoy the read!

Cheers, Andreas & Sascha

In today's edition:

📚 Tutorial of the Week: Optimize Moderation Quality and Latency for Azure OpenAI workflows

🛠️ Tool Spotlight: Microsoft PromptWizard: Smarter Prompt Optimization Made Easy

💡 Tips: Learn to Build Your Own LLM in 3 Hours

📰 Top News: DeepSeek R1 - An Open-Source Model Rivaling AI Leaders

Let's get started!

Find this Newsletter helpful?
Please forward it to your colleagues and friends - it helps us tremendously.

Tutorial of the week

Optimize Azure AI Content Filters for Faster, Smarter Moderation

Azure OpenAI integrates robust content filtering mechanisms to ensure the responsible deployment of AI models. These filters assess both user inputs and model outputs for potentially harmful or inappropriate content, encompassing categories such as hate speech, sexual content, violence, and self-harm. Each category operates across four severity levels: Safe, Low, Medium, and High.

While these filters are essential for maintaining ethical AI interactions, they can sometimes be overly restrictive in professional settings. For instance, technical phrases like "killing a database process" in IT contexts may trigger the violence filter, leading to unexpected rejections. Additionally, the synchronous nature of these filters—where each request passes through multiple classification models before and after processing by the main language model—can result in increased latency.

Optimizing Content Filters for Your Application

To tailor Azure OpenAI's content filters to better suit your application's needs:

  1. Adjust Severity Levels: Configure the severity thresholds for each content category to align with your specific use case, ensuring necessary protections without unnecessary restrictions.

  2. Enable Asynchronous Processing: Switching filters to asynchronous mode allows them to run in parallel with the main language model, reducing latency and improving performance.

  3. Implement Logging and Monitoring: Set up comprehensive logging to monitor filter activations, helping identify and address any false positives or performance bottlenecks.
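For step 3, here is a minimal, stdlib-only sketch of what such logging could look like. The payload shape mirrors the error body Azure OpenAI returns on a content-filter rejection (error code `content_filter` with a per-category `content_filter_result`), but treat the exact field names as an assumption and verify them against the current API reference:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("content-filter")

def triggered_categories(error_body: dict) -> list[str]:
    """Return the content-filter categories that caused a rejection.

    Expects an error body shaped like Azure OpenAI's HTTP 400 response
    with error code "content_filter" (field names assumed, verify!).
    """
    inner = error_body.get("error", {}).get("innererror", {})
    results = inner.get("content_filter_result", {})
    hits = [cat for cat, verdict in results.items() if verdict.get("filtered")]
    for cat in hits:
        log.warning("Request blocked by %s filter (severity: %s)",
                    cat, results[cat].get("severity"))
    return hits

# Example payload shaped like a content-filter rejection:
sample = {
    "error": {
        "code": "content_filter",
        "innererror": {
            "code": "ResponsibleAIPolicyViolation",
            "content_filter_result": {
                "hate": {"filtered": False, "severity": "safe"},
                "violence": {"filtered": True, "severity": "medium"},
            },
        },
    }
}
print(triggered_categories(sample))  # ['violence']
```

Feeding every rejection through a helper like this gives you a running tally of which filters fire most often, which is exactly the signal you need to decide where severity thresholds are set too aggressively.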

The Impact of Optimization

By making these adjustments, you can transform the way your AI handles content moderation. Not only will your system operate more efficiently, but it will also maintain its ethical safeguards in a manner better suited to your use case.

For a detailed exploration of Azure OpenAI's content filters and practical guidance on optimizing them, refer to the full article:

Tool of the week

Prompt Optimization Made Smarter: Microsoft’s New “PromptWizard”

Prompt tuning is often a tedious and time-consuming process, involving trial and error, guesswork, and countless iterations to find the right phrasing. As the demand for efficient prompt optimization grows, we’ve seen the rise of several APE (Automatic Prompt Engineering) tools and frameworks like DSPy and others aiming to streamline this workflow. Now, Microsoft has introduced PromptWizard, a cutting-edge tool designed to automate away the inefficiencies of prompt tuning by providing a framework for self-evolving prompt optimization.

Overview flow chart of PromptWizard's optimization process

Key Features:

  • Feedback-Driven Refinement: PromptWizard enables Large Language Models (LLMs) to iteratively improve their prompts and examples by generating, critiquing, and refining them through feedback loops.

  • Diverse Example Generation: It creates robust and task-aware synthetic examples, optimizing both prompts and examples simultaneously to enhance performance across various use cases.

  • Self-Generated Chain of Thought (CoT): The framework employs CoT reasoning steps using a blend of positive, negative, and synthetic examples, improving the model's problem-solving abilities.

Getting Started:

To explore PromptWizard, visit the GitHub repository for installation instructions, documentation, and demos. Designed for research purposes and under active development, the tool provides a foundation for improving prompt optimization workflows. Thorough evaluations are recommended before deploying it in production environments.

By leveraging PromptWizard, developers and researchers can automate much of the manual effort involved in prompt tuning, allowing for more efficient and effective LLM interactions.
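To make the feedback-loop idea concrete without guessing at PromptWizard's actual API (see its repo for the real interface), here is a toy, stdlib-only sketch of feedback-driven refinement: generate candidate prompt edits, score them with a stand-in evaluator, and keep improvements. The keyword-based scorer is purely illustrative; a real setup would call an LLM for both the critique and the scoring:

```python
def score_prompt(prompt: str) -> float:
    """Stand-in evaluator: fraction of desired instructions present.
    In a real APE loop this would run the prompt against an eval set."""
    wanted = ["step by step", "cite your sources", "be concise"]
    p = prompt.lower()
    return sum(kw in p for kw in wanted) / len(wanted)

def critique_and_refine(prompt: str) -> list[str]:
    """Propose candidate rewrites. A real tool would ask an LLM to
    critique the prompt and generate targeted edits."""
    return [
        prompt + " Think step by step.",
        prompt + " Cite your sources.",
        prompt + " Be concise.",
    ]

def optimize(prompt: str, rounds: int = 3) -> str:
    """Greedy hill-climb: keep any candidate that scores better."""
    best, best_score = prompt, score_prompt(prompt)
    for _ in range(rounds):
        for cand in critique_and_refine(best):
            s = score_prompt(cand)
            if s > best_score:
                best, best_score = cand, s
    return best

print(optimize("Answer the question."))
```

The point of the sketch is the loop structure (generate, critique, select, repeat), which is the same skeleton PromptWizard automates at scale with LLM-generated critiques and synthetic examples.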

Top News of the week

Game-Changer: DeepSeek R1's Open-Source Model Challenges AI Leaders

Chinese AI startup DeepSeek has announced the release of its latest AI model, DeepSeek-R1, which matches the performance of leading models like OpenAI's o1 in tasks such as mathematics, coding, and general knowledge. Notably, DeepSeek-R1 is fully open-source under the MIT license, allowing free commercial and academic use.

Model Specifications and Performance

DeepSeek-R1 boasts an architecture with 671 billion parameters, with only 37 billion activated during any given operation, optimizing computational efficiency. Benchmark evaluations demonstrate its competitive edge:

  • AIME 2024: 79.8% pass rate (o1: 79.2%)

  • MATH-500: 97.3% accuracy (o1: 96.4%)

  • MMLU: 90.8% (o1: 91.8%)

These results underscore DeepSeek-R1's advanced reasoning capabilities, including self-verification and reflection.

Distilled Models

To make advanced AI more accessible, DeepSeek has distilled R1's reasoning capabilities into smaller models, ranging from 1.5 billion to 70 billion parameters. These distilled versions maintain impressive performance, with the 32B parameter model achieving a 72.6% pass rate on AIME 2024, outperforming other open-source models of similar size. Even the 14B model is on par with OpenAI's o1-mini, offering competitive reasoning capabilities at a fraction of the cost.

Pricing Comparison

DeepSeek's API pricing is notably more affordable than competitors:

  • Input Tokens: $0.14 per million (cache hit), $0.55 per million (cache miss)

  • Output Tokens: $2.19 per million

In contrast, OpenAI's o1 pricing is $15 per million input tokens (cache miss) and $60 per million output tokens, highlighting DeepSeek-R1's cost-effectiveness.
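To make the gap concrete, here is a quick back-of-the-envelope calculator using the cache-miss rates quoted above (hardcoded from this issue; check the providers' current price pages before relying on them):

```python
# Per-million-token rates (USD, cache miss), as quoted in this newsletter.
RATES = {
    "deepseek-r1": {"input": 0.55, "output": 2.19},
    "openai-o1":   {"input": 15.00, "output": 60.00},
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for a given token volume under a model's rates."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Example workload: 10M input + 2M output tokens per month.
for model in RATES:
    print(f"{model}: ${cost(model, 10_000_000, 2_000_000):,.2f}")
```

For that example workload the bill comes to roughly $9.88 on DeepSeek-R1 versus $270 on o1, which is where the "~30x cheaper" headline figure comes from.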

DeepSeek's commitment to open-source development and affordable pricing is expected to foster further innovation and collaboration within the global AI research community.

For more information, visit DeepSeek's official announcement: DeepSeek-R1 Release.

Also in the news

LangChain Introduces Ambient Agents

LangChain has introduced a new approach to AI interactions with the launch of “ambient agents”. Unlike traditional chat-based models that require user initiation, ambient agents operate in the background, responding to ambient signals and engaging users only when necessary. This design aims to reduce interaction overhead and allows multiple agents to function simultaneously, enhancing efficiency and scalability. A reference implementation, an email assistant, demonstrates key features of ambient agents, including human-in-the-loop interactions for notifications, inquiries, and action reviews. This approach seeks to build user trust and ensure responsible agent behavior. For full details, check out LangChain’s official announcement blog post.

“Operator”: OpenAI’s New AI Assistant for Web Tasks

OpenAI has launched 'Operator,' an AI agent capable of autonomously performing web-based tasks such as filling out forms, ordering groceries, and creating memes. Utilizing the Computer-Using Agent (CUA) model, which combines GPT-4o's vision capabilities with advanced reasoning, Operator interacts with web interfaces by interpreting screenshots and engaging with graphical user elements like buttons and text fields. Currently available as a research preview to Pro users in the U.S., Operator aims to streamline repetitive online activities, enhancing efficiency and productivity. OpenAI plans to expand access and integrate Operator into ChatGPT in the future. For more details, check out the official announcement.

Anthropic Enhances Claude with Detailed Citations Feature

Anthropic has introduced a new citations feature in Claude 3.5 Sonnet and 3.5 Haiku, enabling the AI to provide detailed references when answering questions about documents. This enhancement allows users to track and verify information sources within responses, promoting transparency and trust in AI-generated content. The citations feature is currently available in the Messages API, supporting plain text documents, custom content documents, and PDFs. Anthropic recommends using this built-in feature over prompt-based approaches, citing benefits such as cost savings, improved citation reliability, and better citation quality. For full details, check out the official documentation.

Tip of the week

Learn to Build Your Own LLM in 3 Hours

If you've ever wondered how large language models (LLMs) are created and want to take a deep dive into the process, this comprehensive 3-hour video tutorial is the perfect resource. Created by the author of the book Build an LLM from Scratch, this video provides a hands-on walkthrough of the key steps involved in building your own LLM from the ground up.

What You’ll Learn

This video covers the full pipeline of creating an LLM, offering both theory and practical implementation tips. Here are some highlights:

  1. Data Preparation:
    Learn how to collect, clean, and preprocess large datasets to ensure your model has a solid foundation for training.

  2. Model Architecture:
    Get an in-depth explanation of transformer-based architectures, including how they power modern LLMs.

  3. Training the Model:
    Explore the intricacies of training, including configuring hyperparameters, managing computational resources, and utilizing GPU/TPU acceleration.

  4. Fine-Tuning:
    Understand how to adapt pre-trained models for specific tasks like summarization, Q&A, or chat-based applications.

  5. Evaluation and Deployment:
    Discover how to evaluate the performance of your LLM and deploy it efficiently, ensuring scalability and usability.
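To give a flavor of steps 1, 3, and 5 before you commit three hours, here is a toy "language model" in pure Python: a bigram count table stands in for the transformer, and greedy next-token prediction stands in for generation. Everything here is illustrative and unrelated to the video's actual code:

```python
from collections import Counter, defaultdict

# Step 1 - data preparation: tokenize a tiny corpus.
corpus = "the cat sat on the mat . the dog sat on the rug ."
tokens = corpus.split()

# Step 3 - "training": a bigram model just counts which token follows
# which; real LLMs learn these statistics with transformers instead
# of a count table.
counts = defaultdict(Counter)
for cur, nxt in zip(tokens, tokens[1:]):
    counts[cur][nxt] += 1

def predict(word: str) -> str:
    """Most frequent next token after `word` (greedy decoding)."""
    return counts[word].most_common(1)[0][0]

# Step 5 - evaluation: greedily generate a few tokens from a seed.
out = ["the"]
for _ in range(4):
    out.append(predict(out[-1]))
print(" ".join(out))
```

The jump from this to a real LLM is exactly what the video covers: replacing the count table with a trained transformer, and greedy lookup with sampling over learned probabilities.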

Who Is This For?

This video is tailored for developers, AI enthusiasts, and data scientists who have a foundational understanding of machine learning concepts and are eager to expand their skills into building custom LLMs. While some programming knowledge (e.g., Python) is helpful, the tutorial is beginner-friendly and provides clear explanations for every step.

Why Watch This Video?

Whether you're working on research, building your own AI product, or simply curious about how the most advanced language models are made, this tutorial gives you a complete picture in an easily digestible format. It’s a rare opportunity to gain insights from an expert who has written extensively on the topic, and it’s free on YouTube.

Watch the full video here: Build Your Own LLM from Scratch

We hope you enjoyed our newsletter and stay tuned for the next edition. If you need help with your AI tasks and implementations, let us know. We are happy to help!