added text summarization notebook #317
merge main to the branch
Some minor suggestions and questions. Please also have Johnny review and approve before merging as he's working on tidying up this notebooks directory. Want to make sure this follows the planned style and structure instead of creating more stuff to clean up later.
Hey @Marjan-emd,
I'll probably move the file location as I reorganize the notebooks in this repo, but we can go ahead and get this merged for your blog. I'll make sure the link gets switched (there will be a bunch of broken links, so no worries here).
One request before we merge. Can we update this to the new SDK interface?
Here's what that would look like:
from gretel_client import Gretel
PROJECT = 'data-summarization'
DATASET_PATH = 'https://gretel-datasets.s3.us-west-2.amazonaws.com/Text-dataset/Samsum-text-summerization-sample-1000.csv'
gretel = Gretel(project_name=f"{PROJECT}-llama-2-7b", api_key="prompt", validate=True)
trained = gretel.submit_train(
    "natural-language",
    data_source=DATASET_PATH,
    params={"steps": 1000},
)
trained.report.display_in_notebook()
You can pass the DataFrame as the data_source if you prefer.
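For the DataFrame route, a minimal sketch might look like the following. It assumes pandas is installed; the helper names `load_training_frame` and `train_from_frame`, the in-memory sample rows, and the `text` column name are hypothetical and used only for illustration. The `submit_train` call mirrors the one shown above, with the DataFrame swapped in for the CSV URL.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the real CSV; the column name is an assumption.
SAMPLE_CSV = """text
"Amanda: I baked cookies. Jerry: Sure! Summary: Amanda baked cookies."
"Olivia: Who are you voting for? Oliver: Liberals. Summary: Both vote liberal."
"""


def load_training_frame(source) -> pd.DataFrame:
    """Read a CSV (path, URL, or file-like object) and drop fully empty rows."""
    df = pd.read_csv(source)
    return df.dropna(how="all").reset_index(drop=True)


def train_from_frame(df: pd.DataFrame, project_name: str):
    """Submit the DataFrame for training, mirroring the call shown above."""
    # Imported lazily so the loading helper works without gretel_client installed.
    from gretel_client import Gretel

    gretel = Gretel(project_name=project_name, api_key="prompt", validate=True)
    return gretel.submit_train(
        "natural-language",
        data_source=df,  # DataFrame instead of the CSV URL
        params={"steps": 1000},
    )


# Local check of the loading step only; no API call is made here.
df = load_training_frame(io.StringIO(SAMPLE_CSV))
print(df.shape)
```

To actually train, something like `train_from_frame(load_training_frame(DATASET_PATH), f"{PROJECT}-llama-2-7b")` should work in the notebook, where it will prompt for the API key.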
Maybe explicitly add the pretrained model parameter:
from gretel_client import Gretel
PROJECT = 'data-summarization'
DATASET_PATH = 'https://gretel-datasets.s3.us-west-2.amazonaws.com/Text-dataset/Samsum-text-summerization-sample-1000.csv'
LLM = "meta-llama/Llama-2-7b-chat-hf"
gretel = Gretel(project_name=f"{PROJECT}-llama-2-7b", api_key="prompt", validate=True)
trained = gretel.submit_train(
    "natural-language",
    data_source=DATASET_PATH,
    pretrained_model=LLM,
    params={"steps": 1000},
)
trained.report.display_in_notebook()
Thanks for reviewing this. I totally forgot about adding the new SDK interface!
Nice – thanks, @Marjan-emd!
One last thing before you merge: will you put this at the top of the first markdown cell?
All feedback comments were addressed.
Added this text generation notebook to the blueprints because the only one available (Taylor Swift Lyrics) does not create the text SQS and uses the old mpt-7b model.
This notebook will be referenced in the blog post showcasing text metrics and has been added to the blog folder, following John's suggestion here.