Add normalizeScorers/normalize_scorers helpers #187

Open
wants to merge 2 commits into
base: main

@jdeal jdeal commented Apr 12, 2024

This adds a "higher order" scorer that makes it easier to build other higher order scorers. It takes a list of scorers and returns a single (async) scorer that calls the other scorers and returns a list of Score objects. This way, when you're building a higher order scorer, you don't need to worry about the dependent scorers returning numbers or nulls or whatever. Just call the returned scorer, and you'll get back a list of Score objects.

Here's an example usage:

// Given a list of scorers, returns a new scorer that includes the assertion
// scorers.
export function withAssertionScorers<
  Input,
  Output,
  Expected,
  Metadata extends BaseMetadata = DefaultMetadataType,
>(scorers: EvalScorer<Input, Output, Expected, Metadata>[]) {
  // Wrap the scorers in a scorer that "normalizes" the scores.
  const NormalizedScorer = normalizeScorers(scorers);
  return async function AssertionScorer(
    args: EvalScorerArgs<Input, Output, Expected, Metadata>,
  ) {
    // We are guaranteed to get back a list of score objects.
    const scores = await NormalizedScorer(args);
    // Now we can add in dependent scores.
    return [...scores, AboveZeroScorer(scores), AcceptableScorer(scores)];
  };
}
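
For readers without the diff open, here is a rough sketch of what such a normalizing wrapper could look like. The `Score`, `ScorerResult`, and `Scorer` types are simplified stand-ins assumed for illustration (the library's real types carry more fields), and the positional fallback name in `scorerName` is a guess, not the PR's actual behavior.

```typescript
// Simplified, hypothetical stand-ins for the library's types.
type Score = { name: string; score: number | null };
type ScorerResult = number | null | Score | Score[];
type Scorer<Args> = (args: Args) => ScorerResult | Promise<ScorerResult>;

// Use the function's own name, falling back to a positional name
// (assumed convention) when the scorer is anonymous.
function scorerName<Args>(scorer: Scorer<Args>, index: number): string {
  return scorer.name || `scorer_${index}`;
}

// Wrap a list of scorers in one async scorer that always returns Score[].
function normalizeScorers<Args>(scorers: Scorer<Args>[]) {
  return async function NormalizedScorer(args: Args): Promise<Score[]> {
    const perScorer = await Promise.all(
      scorers.map(async (scorer, index): Promise<Score[]> => {
        const name = scorerName(scorer, index);
        const oneOrMoreScores = await scorer(args);
        // Already a list of Score objects: pass it through.
        if (Array.isArray(oneOrMoreScores)) return oneOrMoreScores;
        // A bare number or null: wrap it in a Score named after the scorer.
        if (typeof oneOrMoreScores === "number" || oneOrMoreScores === null) {
          return [{ name, score: oneOrMoreScores }];
        }
        // A single Score object: wrap it in a list.
        return [oneOrMoreScores];
      }),
    );
    // Flatten the per-scorer lists while keeping scorer order.
    return ([] as Score[]).concat(...perScorer);
  };
}
```

Whatever shape each scorer returns, the caller only ever sees a flat `Score[]`, which is what lets `withAssertionScorers` above spread `scores` directly.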

jdeal added 2 commits April 11, 2024 16:28
This helper is a "higher order scorer" that takes other scorers and
returns a scorer that is guaranteed to return an array of scores.
await Promise.all(
  scorers.map(async (scorer, index) => {
    const name = scorerName(scorer, index);
    const oneOrMoreScores = await scorer(args);

Contributor

So one thing I've been thinking about quite a bit is that when you run scorers in a list, you lose some of the tracing niceties that we have in the top-level scores array (namely, one span per scoring function).

I wonder if we should do some of that here as well?

Specifically, calling traced and populating the span with the scorer's name.

Author

That would be nice. I wonder if there should be a callScorer (or similar) helper exported, so you could call a scorer and have tracing (and whatever else) added in even if you don't use normalizeScorers.

Contributor

Yes, that sounds like a good idea.
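
To make the idea concrete, here is a minimal sketch of what a `callScorer` helper might do. Everything in it is assumed for illustration: `fakeTraced` is a local stub that only mimics the general shape of a tracing helper, and `callScorer`, `StubSpan`, and `finishedSpans` are hypothetical names, not exports of the library.

```typescript
// Hypothetical stand-ins so the sketch runs on its own. `fakeTraced` only
// mimics the general shape of a tracing helper; it is not the library's code.
type StubSpan = { log: (fields: Record<string, unknown>) => void };
type FinishedSpan = { name: string; fields: Record<string, unknown> };
type ScoreValue = number | null;
type SimpleScorer<Args> = (args: Args) => ScoreValue | Promise<ScoreValue>;

const finishedSpans: FinishedSpan[] = [];

async function fakeTraced(
  fn: (span: StubSpan) => Promise<ScoreValue>,
  opts: { name: string },
): Promise<ScoreValue> {
  const fields: Record<string, unknown> = {};
  const span: StubSpan = {
    log: (f) => {
      Object.assign(fields, f);
    },
  };
  try {
    return await fn(span);
  } finally {
    // Record the span whether the scorer returned or threw.
    finishedSpans.push({ name: opts.name, fields });
  }
}

// Hypothetical helper: run one scorer inside a span named after it.
async function callScorer<Args>(
  scorer: SimpleScorer<Args>,
  args: Args,
  fallbackName = "scorer",
): Promise<ScoreValue> {
  const name = scorer.name || fallbackName;
  return fakeTraced(
    async (span) => {
      const score = await scorer(args);
      span.log({ score });
      return score;
    },
    { name },
  );
}
```

A higher order scorer could then call `callScorer(SomeScorer, args)` for each dependent scorer and get one span per scoring function without going through normalizeScorers.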

Author

Can traced be called by itself and have the same effect as the rootSpan.traced you're using?

Contributor

Yep! It uses async local storage in that case.
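
As a rough illustration of that mechanism (not the library's implementation), a free-standing tracing function can find its parent span through Node's `AsyncLocalStorage`. The names `tracedWithStore`, `StoredSpan`, `recordedSpans`, and `demo` are made up for this sketch.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical minimal span; a real tracing span records far more.
type StoredSpan = { name: string; parent: string | null };

const currentSpan = new AsyncLocalStorage<StoredSpan>();
const recordedSpans: StoredSpan[] = [];

// A free-standing tracing function: it looks up the active span from async
// local storage instead of being called as a method on a parent span.
async function tracedWithStore(
  name: string,
  fn: () => Promise<number>,
): Promise<number> {
  const parent = currentSpan.getStore();
  const span: StoredSpan = { name, parent: parent ? parent.name : null };
  recordedSpans.push(span);
  // Everything that runs inside `fn`, including awaited calls, sees `span`
  // as the current span.
  return currentSpan.run(span, fn);
}

// Usage: the inner call never receives the outer span explicitly.
async function demo(): Promise<void> {
  await tracedWithStore("eval", async () => {
    await tracedWithStore("Scorer1", async () => 1);
    return 0;
  });
}
```

The inner call picks up "eval" as its parent purely from the async context, which is why a bare call can behave like a method call on the enclosing span.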

Author

Mostly just thinking out loud here, but...

I'm wondering if there isn't a bit of a hole in the idea of delegating to other scorers. There's more happening when the framework calls a scorer than just getting a score back. There's also error tracking that gets lost if you delegate to other scorers. It will still get tracked in the parent scorer, but it won't get attributed back to the correct scorer, since an exception loses that information.

I wonder about something like this instead:

async function MultiScorer({trackScores}) {
  // These will pass along the original args.
  const scores = await trackScores([Scorer1, Scorer2]);
  // Call more scorers and pass additional args.
  await trackScores([AboveZeroScorer, PassFailScorer], {
    // These scores will be null if there were errors, so AboveZeroScorer can
    // decide what to do with those.
    scores,
  });
  // All scores will be tracked above, no need to return them.
  return [];
}

Passing in a function to the scorer allows it to do everything the framework does, because it's the same code in the same context. It's hard to completely replicate this context by just importing helpers.
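
A sketch of how the framework side of `trackScores` might work, under assumed names and types: `makeTrackScores`, `TrackedScore`, and `TrackArgs` are hypothetical, and the `log` array stands in for whatever the framework actually records.

```typescript
type TrackedScore = { name: string; score: number | null; error?: string };
type TrackArgs = { output: string; scores?: (number | null)[] };
type TrackableScorer = (
  args: TrackArgs,
) => number | null | Promise<number | null>;

// Hypothetical factory the framework would call once per eval case. The
// returned `trackScores` closes over the original args and the score log.
function makeTrackScores(baseArgs: TrackArgs, log: TrackedScore[]) {
  return async function trackScores(
    scorers: TrackableScorer[],
    extraArgs: Partial<TrackArgs> = {},
  ): Promise<(number | null)[]> {
    return Promise.all(
      scorers.map(async (scorer, index) => {
        const name = scorer.name || `scorer_${index}`;
        try {
          const score = await scorer({ ...baseArgs, ...extraArgs });
          log.push({ name, score });
          return score;
        } catch (e) {
          // The failure is attributed to the scorer that threw, and the
          // caller gets null instead of an exception.
          log.push({ name, score: null, error: String(e) });
          return null;
        }
      }),
    );
  };
}
```

Because each scorer is invoked inside the framework's own wrapper, an exception stays attached to the scorer that raised it rather than surfacing in the parent scorer.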

Alternatively, the scores parameter for Eval could do this work by taking an object.

scores: [
  {
    scorer: MultiScorer,
    scores: [Scorer1, Scorer2]
  }
]

In that case, MultiScorer would get the results of Scorer1 and Scorer2 passed to it rather than needing to call them.
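
A minimal sketch of how the framework could resolve such an entry, with all names (`runEntry`, `ScorerEntry`, `EvalArgs`, and the result shape) assumed for illustration:

```typescript
type DepScore = number | null;
type EvalArgs = { output: string };
type LeafScorer = (args: EvalArgs) => DepScore | Promise<DepScore>;
type ParentScorer = (
  args: EvalArgs & { scores: DepScore[] },
) => DepScore | Promise<DepScore>;
// An entry is either a plain scorer or the object form from the comment above.
type ScorerEntry = LeafScorer | { scorer: ParentScorer; scores: LeafScorer[] };

// Hypothetical framework step: run one entry and return scores keyed by
// scorer name.
async function runEntry(
  entry: ScorerEntry,
  args: EvalArgs,
): Promise<Record<string, DepScore>> {
  const out: Record<string, DepScore> = {};
  if (typeof entry === "function") {
    out[entry.name] = await entry(args);
    return out;
  }
  // Run the dependent scorers first; the framework owns these calls, so it
  // can trace and track each one individually.
  const scores: DepScore[] = [];
  for (const dep of entry.scores) {
    const score = await dep(args);
    out[dep.name] = score;
    scores.push(score);
  }
  // The parent receives the results rather than calling the scorers itself.
  out[entry.scorer.name] = await entry.scorer({ ...args, scores });
  return out;
}
```

This keeps every scorer call in the framework's hands, at the cost of a more structured `scores` parameter.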

2 participants