Add normalizeScorers/normalize_scorers helpers #187
base: main
This helper is a "higher order scorer" that takes other scorers and returns a scorer that is guaranteed to return an array of scores.
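A minimal sketch of what a helper like this could look like, assuming a scorer may return a number, `null`, a single `Score`, or an array of `Score` objects. The `Score` interface, `scorerName`, and `toScores` below are illustrative assumptions, not the library's actual types or internals; only the `Promise.all` / `scorerName(scorer, index)` shape comes from the diff.

```typescript
// Illustrative types only; the real library's Score/Scorer types may differ.
interface Score {
  name: string;
  score: number | null;
  metadata?: Record<string, unknown>;
}

type ScorerResult = number | null | Score | Score[];
type Scorer<Args> = (args: Args) => ScorerResult | Promise<ScorerResult>;

// Hypothetical naming rule: use the function's name, else a positional fallback.
function scorerName(scorer: { name: string }, index: number): string {
  return scorer.name || `scorer_${index}`;
}

// Hypothetical helper: coerce one scorer's result into an array of Score objects.
function toScores(result: ScorerResult, name: string): Score[] {
  if (result === null || typeof result === "number") {
    return [{ name, score: result }];
  }
  return Array.isArray(result) ? result : [result];
}

// Combine several scorers into one async scorer that always resolves to Score[].
function normalizeScorers<Args>(
  scorers: Scorer<Args>[],
): (args: Args) => Promise<Score[]> {
  return async (args: Args) => {
    const perScorer = await Promise.all(
      scorers.map(async (scorer, index) => {
        const name = scorerName(scorer, index);
        const oneOrMoreScores = await scorer(args);
        return toScores(oneOrMoreScores, name);
      }),
    );
    // Flatten, keeping results in the same order as the input scorers.
    return perScorer.flat();
  };
}
```

Anonymous scorers (such as an inline arrow function) get a positional name like `scorer_2` under this assumed naming rule.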
await Promise.all(
  scorers.map(async (scorer, index) => {
    const name = scorerName(scorer, index);
    const oneOrMoreScores = await scorer(args);
So one thing I've been thinking about quite a bit is that when you run scorers in a list, you lose some of the tracing niceties that we have in the top-level `scores` array (namely, one span per scoring function).
I wonder if we should do some of that here as well? Specifically, calling `traced` and populating it with the scorer's name.
That would be nice. I wonder if there should be a `callScorer` (or similar) exported, so you could call a scorer and have tracing (and anything else) added in, even if you don't use `normalizeScorers`.
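A sketch of the suggested `callScorer`, assuming a tracing function that opens one named span per call and records either the output or the error. `traced`, `recordedSpans`, and `callScorer` here are hypothetical stand-ins written for illustration; a real tracing SDK's `traced` has its own signature and span model.

```typescript
// Hypothetical stand-in for a tracing SDK; a real traced() has a different signature.
interface SpanRecord {
  name: string;
  output?: unknown;
  error?: string;
}

const recordedSpans: SpanRecord[] = [];

async function traced<T>(name: string, fn: () => Promise<T>): Promise<T> {
  const span: SpanRecord = { name };
  recordedSpans.push(span);
  try {
    const result = await fn();
    span.output = result;
    return result;
  } catch (err) {
    // Attribute the failure to this span before re-throwing it.
    span.error = err instanceof Error ? err.message : String(err);
    throw err;
  }
}

type ScorerResult = number | null | { name: string; score: number | null };
type Scorer<Args> = (args: Args) => ScorerResult | Promise<ScorerResult>;

// Hypothetical exported helper: run one scorer inside its own named span.
async function callScorer<Args>(
  scorer: Scorer<Args>,
  args: Args,
  index = 0,
): Promise<ScorerResult> {
  const name = scorer.name || `scorer_${index}`;
  return traced(name, async () => scorer(args));
}
```

Because the error is recorded on the child's span before being re-thrown, a failing scorer stays attributable even when a parent scorer is the one calling it.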
Yes, that sounds like a good idea.
Can `traced` be called by itself and have the same effect as the `rootSpan.traced` you're using?
Yep! It uses async local storage in that case.
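To illustrate how a free-standing `traced` can nest correctly without an explicit parent, here is a minimal sketch using Node's `AsyncLocalStorage` from `node:async_hooks`. The `Span` shape, `root`, and this `traced` are hypothetical and only model the idea; they are not the SDK's implementation.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical minimal span tree; not any real SDK's data model.
interface Span {
  name: string;
  children: Span[];
}

const currentSpan = new AsyncLocalStorage<Span>();
const root: Span = { name: "root", children: [] };

// A free-standing traced(): the parent comes from async context, not an argument.
async function traced<T>(name: string, fn: () => Promise<T>): Promise<T> {
  const parent = currentSpan.getStore() ?? root;
  const span: Span = { name, children: [] };
  parent.children.push(span);
  // Everything called inside fn sees `span` as the current span.
  return currentSpan.run(span, fn);
}
```

Calling `traced` inside another `traced` callback attaches the inner span to the outer one, which is the behavior `rootSpan.traced` gives explicitly.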
Mostly just thinking out loud here, but...
I'm wondering if there isn't a bit of a hole in the idea of delegating to other scorers. When the framework calls a scorer, more happens than just getting a score back. There's also error tracking, and that gets lost if you delegate to other scorers: an error will still be tracked in the parent scorer, but it won't be attributed to the scorer that actually threw, since an exception doesn't carry that information.
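One way to see the attribution problem: if a parent awaits its children with `Promise.all`, the first rejection is all it sees. Below is a hedged sketch that settles each child separately so a failure stays attached to the child that threw. `NamedScorer`, `ScoreOrError`, and `runChildren` are hypothetical names, and `Promise.allSettled` requires an ES2020 runtime.

```typescript
// Hypothetical shapes for illustration only.
interface ScoreOrError {
  name: string;
  score: number | null;
  error?: string;
}

type NamedScorer = { name: string; fn: (output: string) => Promise<number> };

// Settle every child independently: a rejection becomes a null score plus an
// error recorded under that child's name, instead of one anonymous exception.
async function runChildren(
  children: NamedScorer[],
  output: string,
): Promise<ScoreOrError[]> {
  const settled = await Promise.allSettled(children.map((c) => c.fn(output)));
  return settled.map((result, i) =>
    result.status === "fulfilled"
      ? { name: children[i].name, score: result.value }
      : { name: children[i].name, score: null, error: String(result.reason) },
  );
}
```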
I wonder about something like this instead:
async function MultiScorer({trackScores}) {
  // These will pass along the original args.
  const scores = await trackScores([Score1, Score2]);
  // Call more scorers and pass additional args.
  await trackScores([AboveZeroScorer, PassFailScorer], {
    // These scores will be null if there were errors, so AboveZeroScorer can
    // decide what to do with those.
    scores,
  });
  // All scores will be tracked above, no need to return them.
  return [];
}
Passing in a function to the scorer allows it to do everything the framework does, because it's the same code in the same context. It's hard to completely replicate this context by just importing helpers.
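A rough sketch of how the framework side of the proposed `trackScores` might work, assuming children are async functions returning a number or `null`. `runScorer`, `TrackScores`, `ChildScorer`, and the argument shapes are hypothetical; the point is that the framework builds the closure, so every child's score or error is logged under the child's own name.

```typescript
// Hypothetical shapes; the proposal in this thread doesn't pin these down.
interface Score {
  name: string;
  score: number | null;
  error?: string;
}

type BaseArgs = { input: string; output: string; expected?: string };
type ChildScorer = (args: BaseArgs & Record<string, unknown>) => Promise<number | null>;
type TrackScores = (
  scorers: ChildScorer[],
  extraArgs?: Record<string, unknown>,
) => Promise<(number | null)[]>;
type MultiScorerFn = (ctx: BaseArgs & { trackScores: TrackScores }) => Promise<Score[]>;

// Hypothetical framework entry point: hands the scorer a trackScores closure.
async function runScorer(scorer: MultiScorerFn, args: BaseArgs): Promise<Score[]> {
  const tracked: Score[] = [];
  const trackScores: TrackScores = (children, extraArgs = {}) =>
    Promise.all(
      children.map(async (child, i) => {
        const name = child.name || `scorer_${i}`;
        try {
          const score = await child({ ...args, ...extraArgs });
          tracked.push({ name, score });
          return score;
        } catch (err) {
          // The error is attributed to this child; the caller just sees null.
          tracked.push({ name, score: null, error: String(err) });
          return null;
        }
      }),
    );
  const returned = await scorer({ ...args, trackScores });
  return [...tracked, ...returned];
}
```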
Alternatively, the `scores` parameter for `Eval` could do this work by taking an object:
scores: [
  {
    scorer: MultiScorer,
    scores: [Scorer1, Scorer2]
  }
]
In that case, `MultiScorer` would get the results of `Scorer1` and `Scorer2` passed to it rather than needing to call them.
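A sketch of how `Eval` could resolve such an entry, assuming the object form `{ scorer, scores }` from the comment. `ScorerEntry`, `runEntries`, and the argument shape are hypothetical; this is not an existing `Eval` option. Dependencies run first, and their results are passed to the dependent scorer as `scores`.

```typescript
// Hypothetical config shape from the comment; not an existing Eval option.
interface Score {
  name: string;
  score: number | null;
}

type Args = { output: string; expected: string };
type LeafScorer = (args: Args) => Promise<Score>;
type DependentScorer = (args: Args & { scores: Score[] }) => Promise<Score>;
type ScorerEntry = LeafScorer | { scorer: DependentScorer; scores: LeafScorer[] };

// Plain entries run directly; object entries run their dependencies first
// and hand the results to the dependent scorer instead of letting it call them.
async function runEntries(entries: ScorerEntry[], args: Args): Promise<Score[]> {
  const results: Score[] = [];
  for (const entry of entries) {
    if (typeof entry === "function") {
      results.push(await entry(args));
    } else {
      const deps = await Promise.all(entry.scores.map((s) => s(args)));
      results.push(...deps);
      results.push(await entry.scorer({ ...args, scores: deps }));
    }
  }
  return results;
}
```

A side effect of this design is that the framework, not `MultiScorer`, invokes `Scorer1` and `Scorer2`, so their tracing and error handling work the same as for any top-level scorer.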
This adds a "higher order" scorer that makes it easier to build other higher order scorers. It takes a list of scorers and returns a single (async) scorer that calls the other scorers and returns a list of Score objects. This way, when you're building a higher order scorer, you don't need to worry about the dependent scorers returning numbers or nulls or whatever. Just call the returned scorer, and you'll get back a list of Score objects.
Here's an example usage: