Add normalizeScorers/normalize_scorers helpers #187

jdeal · 2024-04-12T16:09:27Z

This adds a "higher order" scorer that makes it easier to build other higher order scorers. It takes a list of scorers and returns a single (async) scorer that calls the other scorers and returns a list of Score objects. This way, when you're building a higher order scorer, you don't need to worry about the dependent scorers returning numbers or nulls or whatever. Just call the returned scorer, and you'll get back a list of Score objects.

Here's an example usage:

// Given a list of scorers, returns a new scorer that includes the assertion
// scorers.
export function withAssertionScorers<
  Input,
  Output,
  Expected,
  Metadata extends BaseMetadata = DefaultMetadataType,
>(scorers: EvalScorer<Input, Output, Expected, Metadata>[]) {
  // Wrap the scorers in a scorer that "normalizes" the scores.
  const NormalizedScorer = normalizeScorers(scorers);
  return async function AssertionScorer(
    args: EvalScorerArgs<Input, Output, Expected, Metadata>,
  ) {
    // We are guaranteed to get back a list of score objects.
    const scores = await NormalizedScorer(args);
    // Now we can add in dependent scores.
    return [...scores, AboveZeroScorer(scores), AcceptableScorer(scores)];
  };
}

This helper is a "higher order scorer" that takes other scorers and returns a scorer that is guaranteed to return an array of scores.
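For readers skimming the thread, here is a minimal sketch of the normalization idea. It is not the code in this PR: the `Score`, `ScorerResult`, and `Scorer` types are simplified stand-ins, and `normalizeScorersSketch` and the `scorer_<index>` fallback name are hypothetical names chosen for illustration.

```typescript
// Simplified stand-ins for the framework's types (assumptions, not the real definitions).
interface Score {
  name: string;
  score: number | null;
}
type ScorerResult = number | null | Score | Score[];
type Scorer<Args> = (args: Args) => ScorerResult | Promise<ScorerResult>;

// Hypothetical sketch: wrap a list of scorers so the combined scorer
// always resolves to a flat array of Score objects.
function normalizeScorersSketch<Args>(scorers: Scorer<Args>[]) {
  return async function NormalizedScorer(args: Args): Promise<Score[]> {
    const perScorer = await Promise.all(
      scorers.map(async (scorer, index): Promise<Score[]> => {
        // Fall back to a positional name for anonymous functions.
        const name = scorer.name || `scorer_${index}`;
        const result = await scorer(args);
        if (Array.isArray(result)) {
          return result; // already a list of Score objects
        }
        if (typeof result === "number" || result === null) {
          return [{ name, score: result }]; // bare number or null
        }
        return [result]; // a single Score object
      }),
    );
    // Flatten the per-scorer lists into one list.
    return ([] as Score[]).concat(...perScorer);
  };
}
```

With this shape, a scorer that returns `0.5` and another that returns `null` both come back as `Score` objects named after the function that produced them.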

ankrgyl · 2024-04-12T16:22:16Z

js/src/framework.ts

+      await Promise.all(
+        scorers.map(async (scorer, index) => {
+          const name = scorerName(scorer, index);
+          const oneOrMoreScores = await scorer(args);


One thing I've been thinking about quite a bit is that when you run scorers in a list, you lose some of the tracing niceties that we have in the top-level scores array (namely, one span per scoring function).

I wonder if we should do some of that here as well?

Specifically, calling traced and populating it with the scorer's name.

That would be nice. I wonder if there should be a callScorer (or similar) exported so you could call a scorer and have tracing (and whatever else) added in, even if you don't use normalizeScorers.

Yes that sounds like a good idea
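A rough sketch of what such a `callScorer` helper might look like. Everything here is an assumption for discussion: `SpanLike` and `TracedFn` are simplified stand-ins rather than the SDK's actual types, the tracing function is passed in explicitly to keep the example self-contained, and the logged event shape is made up.

```typescript
type ScoreValue = number | null;

// Minimal stand-in for a span; the real span type is richer.
interface SpanLike {
  log(event: Record<string, unknown>): void;
}

// Assumed shape of a tracing function: run the callback inside a named span.
type TracedFn = <R>(callback: (span: SpanLike) => R, args: { name: string }) => R;

// Hypothetical helper: run a single scorer inside its own span,
// named after the scorer function.
async function callScorer<Args>(
  traced: TracedFn,
  scorer: (args: Args) => ScoreValue | Promise<ScoreValue>,
  args: Args,
  fallbackName = "scorer",
): Promise<ScoreValue> {
  const name = scorer.name || fallbackName;
  return traced(
    async (span) => {
      const score = await scorer(args);
      // Record the result on the scorer's own span.
      span.log({ scores: { [name]: score } });
      return score;
    },
    { name },
  );
}
```

A higher order scorer could then call `callScorer` for each dependency and get one span per scoring function without going through `normalizeScorers`.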

Can traced be called by itself and have the same effect as rootSpan.traced you're using?

Yep! It uses async local storage in that case.
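To illustrate the mechanism (this is not the SDK's implementation), here is a small sketch of how a free-standing tracing function can find its parent span through Node's `AsyncLocalStorage`. `MiniSpan`, `tracedSketch`, and the `started` array are hypothetical names used only for this example.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface MiniSpan {
  name: string;
  parent: string | null;
}

// Holds the span that is "current" for the running async call chain.
const currentSpan = new AsyncLocalStorage<MiniSpan>();
// Spans in the order they were started, kept so the nesting can be inspected.
const started: MiniSpan[] = [];

// Free-standing tracing helper: it looks up the active span from async local
// storage instead of requiring the caller to pass rootSpan around.
function tracedSketch<R>(name: string, fn: () => R): R {
  const parent = currentSpan.getStore();
  const span: MiniSpan = { name, parent: parent ? parent.name : null };
  started.push(span);
  // Everything fn starts, including awaited work, sees `span` as current.
  return currentSpan.run(span, fn);
}
```

Calling `tracedSketch("ExactMatch", ...)` from anywhere inside a `tracedSketch("root", ...)` callback records `root` as the parent, even across `await` boundaries.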

Mostly just thinking out loud here, but...

I'm wondering if there isn't a bit of a hole in the idea of delegating to other scorers. There's more happening when the framework calls a scorer than just getting a score back. There's also error tracking that gets lost if you delegate to other scorers. It will still get tracked in the parent scorer, but it won't get attributed back to the correct scorer, since an exception loses that information.

I wonder about something like this instead:

async function MultiScorer({trackScores}) {
  // These will pass along the original args.
  const scores = await trackScores([Score1, Score2]);
  // Call more scorers and pass additional args.
  await trackScores([AboveZeroScorer, PassFailScorer], {
    // These scores will be null if there were errors, so AboveZeroScorer can
    // decide what to do with those.
    scores,
  });
  // All scores will be tracked above, no need to return them.
  return [];
}

Passing in a function to the scorer allows it to do everything the framework does, because it's the same code in the same context. It's hard to completely replicate this context by just importing helpers.
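To make the `trackScores` idea more concrete, here is a sketch of what the framework side could look like. None of this exists in the framework: `makeTrackScores`, `Tracked`, `ScorerArgs`, and the example scorers are hypothetical, and the `log` array stands in for whatever the framework would really do to record scores and errors.

```typescript
interface Tracked {
  name: string;
  score: number | null;
  error?: string;
}
interface ScorerArgs {
  output: string;
  expected: string;
  scores?: Tracked[];
}
type LeafScorer = (args: ScorerArgs) => number | Promise<number>;
type TrackScores = (
  scorers: LeafScorer[],
  extra?: Partial<ScorerArgs>,
) => Promise<Tracked[]>;

// Hypothetical framework side: build a trackScores bound to the original args.
// Each scorer runs in its own try/catch, so an error stays attributed to the
// scorer that threw it and its score is recorded as null.
function makeTrackScores(args: ScorerArgs, log: Tracked[]): TrackScores {
  return async (scorers, extra = {}) => {
    const results = await Promise.all(
      scorers.map(async (scorer, index): Promise<Tracked> => {
        const name = scorer.name || `scorer_${index}`;
        try {
          return { name, score: await scorer({ ...args, ...extra }) };
        } catch (e) {
          const error = e instanceof Error ? e.message : String(e);
          return { name, score: null, error };
        }
      }),
    );
    log.push(...results);
    return results;
  };
}

// Illustrative scorers for the demo below.
function ExactMatch({ output, expected }: ScorerArgs) {
  return output === expected ? 1 : 0;
}
function AlwaysFails(_args: ScorerArgs): number {
  throw new Error("scorer crashed");
}
function AboveZeroScorer({ scores = [] }: ScorerArgs) {
  return scores.every((s) => s.score !== null && s.score > 0) ? 1 : 0;
}

async function MultiScorerDemo({ trackScores }: { trackScores: TrackScores }) {
  const scores = await trackScores([ExactMatch, AlwaysFails]);
  await trackScores([AboveZeroScorer], { scores });
  return []; // everything was already tracked
}
```

In this sketch the failure in `AlwaysFails` is logged under its own name with a null score, and `AboveZeroScorer` still runs and can decide how to treat that null.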

Alternatively, the scores parameter for Eval could do this work by taking an object.

scores: [ { scorer: MultiScorer, scores: [Scorer1, Scorer2] } ]

In that case, MultiScorer would get the results of Scorer1 and Scorer2 passed to it rather than needing to call them.
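A sketch of how the framework might handle such an entry, assuming the dependent scorer returns a single number. `DependentEntry`, `runDependentEntry`, and the simplified `EvalArgs` type are hypothetical and only meant to show the flow of data.

```typescript
interface Score {
  name: string;
  score: number | null;
}
interface EvalArgs {
  output: string;
  expected: string;
}
type LeafScorer = (args: EvalArgs) => number | Promise<number>;
// The dependent scorer receives the leaf results instead of calling the leaves.
type DependentScorer = (
  args: EvalArgs & { scores: Score[] },
) => number | Promise<number>;
interface DependentEntry {
  scorer: DependentScorer;
  scores: LeafScorer[];
}

// Hypothetical framework side: run the leaf scorers first (a failure becomes
// a null score under that scorer's name), then hand the results to the
// dependent scorer.
async function runDependentEntry(
  entry: DependentEntry,
  args: EvalArgs,
): Promise<Score[]> {
  const scores = await Promise.all(
    entry.scores.map(async (leaf, index): Promise<Score> => {
      const name = leaf.name || `scorer_${index}`;
      try {
        return { name, score: await leaf(args) };
      } catch {
        return { name, score: null };
      }
    }),
  );
  const combined = await entry.scorer({ ...args, scores });
  const dependentName = entry.scorer.name || "dependent";
  return [...scores, { name: dependentName, score: combined }];
}
```

Because the framework calls `Scorer1` and `Scorer2` itself in this variant, it can trace them and attribute errors exactly as it does for top-level scorers.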

jdeal added 2 commits April 11, 2024 16:28

Add normalizeScorers

ff13e1f

This helper is a "higher order scorer" that takes other scorers and returns a scorer that is guaranteed to return an array of scores.

Add normalize_scorers for Python

5a43424

ankrgyl reviewed Apr 12, 2024
