Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(learning classifier) | make a learning classifier by itself #21

Conversation

ammirsm
Copy link
Contributor

@ammirsm ammirsm commented Jul 19, 2024

No description provided.

Comment on lines 36 to 48
class_dict: Optional[Dict[str, str]] = field(default=None)
class_enum: Optional[Enum] = field(default=None)
prediction_class: Optional[Type[BaseModel]] = field(default=None)
model: str
zenbase_tracer: ZenbaseTracer
lm_function: Optional[LMFunction] = field(default=None)
training_set: List[DatasetItem]
test_set: List[DatasetItem]
validation_set: List[DatasetItem]
shots: int = 5
samples: int = 10
best_evaluation: Optional[CandidateEvalResult] = field(default=None)
base_evaluation: Optional[CandidateEvalResult] = field(default=None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you able to do the following on Python 3.10?

  1. use | None instead of Optional
  2. use list instead of List

A generator for creating single-class classifier language model functions.
"""

instructor_client: Instructor | AsyncInstructor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to remove dependency on the Instructor client? Ideally a user submits their own LM function with an initial prompt.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be cool to return the results in an OpenAI compatible kwargs form so the user can consume them however they want

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want to make sure we are getting the structured output.

@CyrusNuevoDia
Copy link
Contributor

Lgtm for now, though tbh I think there's a lot of work we can do next week to refine the elegance and simplicity of this.

@CyrusNuevoDia
Copy link
Contributor

I think a good way to understand the goal of this is for us to be able to get integrated into e.g. https://www.askmarvin.ai/welcome/what_is_marvin/

Screenshot 2024-07-19 at 16 55 10

ammirsm added 2 commits July 21, 2024 14:09
- Implement `news_dataset` fixture to load the 20 Newsgroups dataset.
- Create tests for `SingleClassClassifierLMFunctionGenerator`, including initialization and prediction verification.
- Ensure balanced dataset creation for training, validation, and test sets in `SingleClassClassifier`.
…tionGenerator

Implement exponential backoff with logging for the classifier function to improve resilience during retries.
@ammirsm
Copy link
Contributor Author

ammirsm commented Jul 21, 2024

Lgtm for now, though tbh I think there's a lot of work we can do next week to refine the elegance and simplicity of this.

I think a good way to understand the goal of this is for us to be able to get integrated into e.g.
https://www.askmarvin.ai/welcome/what_is_marvin/

I think it's quite straightforward at the moment; we just need to create a classifier and optimize it.

Regarding the goal, it seems our code is already quite similar to the example you mentioned (e.g., https://www.askmarvin.ai/welcome/what_is_marvin/).

The only way I see to simplify it further is to:

Extract the model configurations and initialize them during the library import, then have the class retrieve them from the library. However, I prefer keeping the scope within the class instead of using a global approach.

Here’s an example of the current implementation:

classifier = SingleClassClassifier(
    # Model config --> This should be done in another part with Maven too.
    instructor_client=instructor_client,
    model="gpt-4o-mini",
    zenbase_tracer=zenbase_tracer,
    # Prompt definition --> Same as Maven.
    prompt=prompt_definition,
    class_dict=class_dict,
    # Optimization parameters.
    training_set=train_set,
    validation_set=validation_set,
    test_set=test_set,
)
best_fn, _, _ = classifier.perform()
output = best_fn(sample_input)

ammirsm added 2 commits July 21, 2024 14:28
- Change type hints from `Optional` to union types for clarity.
- Modify the `_create_evaluator` method to be static.
- Enhance test assertions to validate the result object and its properties.
@ammirsm ammirsm changed the title WIP feat(learning classifier) | make a learning classifier by itself feat(learning classifier) | make a learning classifier by itself Jul 21, 2024
@ammirsm ammirsm merged commit cfae458 into main Jul 25, 2024
3 checks passed
@ammirsm ammirsm deleted the amir/eng-32-featlearning-classifier-make-a-learning-classifier-by-itself branch July 28, 2024 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants