-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(learning classifier) | make a learning classifier by itself #21
feat(learning classifier) | make a learning classifier by itself #21
Conversation
class_dict: Optional[Dict[str, str]] = field(default=None) | ||
class_enum: Optional[Enum] = field(default=None) | ||
prediction_class: Optional[Type[BaseModel]] = field(default=None) | ||
model: str | ||
zenbase_tracer: ZenbaseTracer | ||
lm_function: Optional[LMFunction] = field(default=None) | ||
training_set: List[DatasetItem] | ||
test_set: List[DatasetItem] | ||
validation_set: List[DatasetItem] | ||
shots: int = 5 | ||
samples: int = 10 | ||
best_evaluation: Optional[CandidateEvalResult] = field(default=None) | ||
base_evaluation: Optional[CandidateEvalResult] = field(default=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you able to do the following on Python 3.10?
- use
| None
instead ofOptional
- use
list
instead ofList
A generator for creating single-class classifier language model functions. | ||
""" | ||
|
||
instructor_client: Instructor | AsyncInstructor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to remove dependency on the Instructor client? Ideally a user submits their own LM function with an initial prompt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be cool to return the results in an OpenAI compatible kwargs form so the user can consume them however they want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we want to make sure we are getting the structured output.
Lgtm for now, though tbh I think there's a lot of work we can do next week to refine the elegance and simplicity of this. |
I think a good way to understand the goal of this is for us to be able to get integrated into e.g. https://www.askmarvin.ai/welcome/what_is_marvin/ |
- Implement `news_dataset` fixture to load the 20 Newsgroups dataset. - Create tests for `SingleClassClassifierLMFunctionGenerator`, including initialization and prediction verification. - Ensure balanced dataset creation for training, validation, and test sets in `SingleClassClassifier`.
…tionGenerator Implement exponential backoff with logging for the classifier function to improve resilience during retries.
I think it's quite straightforward at the moment; we just need to create a classifier and optimize it. Regarding the goal, it seems our code is already quite similar to the example you mentioned (e.g., https://www.askmarvin.ai/welcome/what_is_marvin/). The only way I see to simplify it further is to: Extract the model configurations and initialize them during the library import, then have the class retrieve them from the library. However, I prefer keeping the scope within the class instead of using a global approach. Here’s an example of the current implementation:
|
- Change type hints from `Optional` to union types for clarity. - Modify the `_create_evaluator` method to be static. - Enhance test assertions to validate the result object and its properties.
* Downgrade Faker to 24.2.0 and update lock files * Add single class classifier synthetic data generator This commit message succinctly describes the main addition in the diff, which is a new feature for generating synthetic data for single class classifiers. * Add instructor package and create synthetic data generator notebook
…earning-classifier-by-itself
No description provided.