You can't optimize your prompt without optimizing your eval metric

The hardest part of prompt optimization is defining the eval metric, which can itself be a prompt (a so-called synthetic eval).

Learn

Eval Optimization

When you're using synthetic evals, you can use DSPy to optimize your evals, not just your main prompt.

Experience

Mike Taylor

Built a 50-person growth agency.
Premium subscription required.
Python experience recommended.
1. Scenario
SNOWSTORM HEADQUARTERS - AI EVALUATION WORKSHOP
Marnie brings you into a meeting about prompt optimization; she wants you to use DSPy.
Gustav Gieger
at GoolyBib

Alright, team, gather around! Today we're diving into eval optimization. We all know that optimizing prompts is crucial for improving our AI-generated jokes. And guess what? We've got a secret weapon called DSPy that's going to help us out.

So, here's the deal: our goal is to set up an evaluation framework to assess the quality of the jokes our AI generates. We're going to compile a test set of funny and not-so-funny jokes to train DSPy. And let me tell you, when we're done, we're going to need an accuracy of above 80%.

I know you're all eager to dive into this task, so let's get started!

This course is a work of fiction. Unless otherwise indicated, all the names, characters, businesses, data, places, events and incidents in this course are either the product of the author's imagination or used in a fictitious manner. Any resemblance to actual persons, living or dead, or actual events is purely coincidental.

2. Brief

Title: Optimizing Evaluations in Machine Learning Models: A Guide

Introduction:

In the world of machine learning, optimizing evaluations is a crucial step in improving the performance of models. The process involves setting up an evaluation framework to assess the effectiveness of prompts used in training the models. While writing prompts may seem straightforward, designing a reliable evaluation framework can be challenging, especially for complex tasks. In this blog post, we will explore a powerful approach to optimize evaluations and improve the generalizability of machine learning models.

Setting Up the Evaluation Framework:

To begin, it is essential to have a labeled test set to evaluate the performance of the model accurately. In the provided transcript, the speaker shares an approach they often use: collect a list of funny jokes and ask a chatbot to generate a list of not-funny jokes. This creates a dataset with examples of both funny and not-funny jokes for evaluation purposes. The jokes are then shuffled and split into training, testing, and development sets.
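A minimal sketch of that dataset-building step, assuming DSPy's Example API; the jokes, topics, and split ratios below are illustrative placeholders, not the course's actual data:

```python
# Sketch of building and splitting the labeled joke dataset (placeholder data).
import random
import dspy

funny_jokes = [
    ("Why did the scarecrow win an award? He was outstanding in his field.", "work"),
]
not_funny_jokes = [
    ("A man walked into a bar. It was an iron bar.", "work"),
]

examples = (
    [dspy.Example(joke=j, topic=t, funny="yes").with_inputs("joke", "topic")
     for j, t in funny_jokes]
    + [dspy.Example(joke=j, topic=t, funny="no").with_inputs("joke", "topic")
       for j, t in not_funny_jokes]
)

random.seed(42)
random.shuffle(examples)
n = len(examples)
train_set = examples[: int(0.6 * n)]             # used to bootstrap demonstrations
dev_set = examples[int(0.6 * n): int(0.8 * n)]   # used during optimization
test_set = examples[int(0.8 * n):]               # held out for the final accuracy check
```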

Using DSP Y for Optimization:

DSPy, a framework for building and optimizing language model pipelines, is employed to optimize the joke evaluations. The speaker defines a class called a Signature, which contains information about the joke, the topic, and a question to assess whether the joke is funny. The evaluation is based on the prompt, "Would this joke actually be funny to an adult attending a comedy show?" The model provides a rationale for its assessment of each joke, and a metric is created to evaluate its performance.
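The course's exact code isn't reproduced here, but a hedged sketch of such a Signature and metric could look like the following; the class, field, and model names (AssessJoke, funny_metric, the LM string) are assumptions for illustration:

```python
# Hedged sketch of a DSPy Signature and metric for the joke judge.
import dspy

# Any chat model DSPy supports works here; the model name is illustrative.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class AssessJoke(dspy.Signature):
    """Judge whether a joke would actually be funny to an adult."""
    joke = dspy.InputField(desc="the joke to evaluate")
    topic = dspy.InputField(desc="the topic the joke is about")
    funny = dspy.OutputField(desc="'yes' or 'no'")

# ChainOfThought makes the judge produce a rationale before its verdict.
judge = dspy.ChainOfThought(AssessJoke)

def funny_metric(example, prediction, trace=None):
    # Accuracy-style metric: does the judge's verdict match the human label?
    return prediction.funny.strip().lower() == example.funny.strip().lower()
```

With a metric like this, accuracy over a split is simply the share of examples where the judge agrees with the label, which is where the above-80% target from the scenario comes in.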

Training the Model:

To train the model, a bootstrapped few-shot with random search technique (DSPy's BootstrapFewShotWithRandomSearch optimizer) is used. This involves loading the training and testing datasets, along with specifying the evaluation metric and the previously defined prompt. The optimizer generates synthetic example jokes that pass the evaluation metric, and a certain number of labeled examples are included in the prompt. This process helps optimize the model's performance by finding the combination of demonstrations that yields the most accurate results.
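Continuing with the names from the earlier sketches (judge, funny_metric, train_set, dev_set), a minimal version of that compile step might look like this; the demo counts and number of candidate programs are illustrative, not the course's settings:

```python
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# Bootstrap demonstrations that pass the metric, then randomly search over
# combinations of them to find the best-performing prompt.
optimizer = BootstrapFewShotWithRandomSearch(
    metric=funny_metric,
    max_bootstrapped_demos=4,   # model-generated examples that passed the metric
    max_labeled_demos=4,        # labeled examples copied from the training set
    num_candidate_programs=8,   # random prompt/demo combinations to evaluate
)

compiled_judge = optimizer.compile(judge, trainset=train_set, valset=dev_set)
```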

Improving the Prompt:

In addition to adding examples to the prompt, the speaker introduces an intriguing option called MIPRO. This optimizer allows the model to generate new prompt instructions while optimizing the evaluations. By using a GPT Turbo model as the teacher, the approach optimizes both the prompts and the examples, resulting in an enhanced evaluation framework. Tuning the prompt proves to make a significant difference in the model's accuracy.
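The exact MIPRO call isn't shown in the course excerpt; a hedged sketch against a recent DSPy release (where the optimizer is exposed as MIPROv2 and the instruction-writing model is passed as prompt_model) might look like this, with the model name and settings purely illustrative:

```python
import dspy
from dspy.teleprompt import MIPROv2  # exposed as MIPRO in some older DSPy releases

# Use a stronger "teacher" model to propose candidate instructions while the
# judge program is optimized; model name is illustrative.
gpt4_turbo = dspy.LM("openai/gpt-4-turbo")

optimizer = MIPROv2(
    metric=funny_metric,
    prompt_model=gpt4_turbo,  # model that writes the new prompt instructions
    auto="light",             # preset controlling how much search to run
)

compiled_judge = optimizer.compile(judge, trainset=train_set)
```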

Benefits of Optimizing Evaluations:

Optimizing evaluations not only improves the accuracy of machine learning models but also expands the range of tasks that can be performed.

3. Tutorial

  Okay. Using DSPy is quite good for optimizing prompts, but typically you need to set up an evaluation framework first, and that's actually the hardest thing. Not writing the prompts, but designing the evaluation, particularly for tasks where it's not as straightforward as just calculating a score. You need to actually create some sort of synthetic evaluation, where the AI will tell you whether the joke is funny or not, because you can't sit there and judge every joke manually as a human.
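To check the above-80% accuracy target from the scenario, DSPy's Evaluate helper can score the compiled judge on the held-out split; a short sketch, continuing with the names from the earlier examples:

```python
from dspy.evaluate import Evaluate

# Score the compiled judge on the held-out test split; the 80% target comes
# from the scenario above. The exact return type varies slightly by DSPy version.
evaluate = Evaluate(devset=test_set, metric=funny_metric, display_progress=True)
score = evaluate(compiled_judge)
print(score)  # average metric over the test set, reported as a percentage
```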

Funeval.ipynb (download)
4. Exercises
5. Certificate
