added text summarization notebook #317

Merged
merged 12 commits into from
Oct 26, 2023

Conversation

Marjan-emd

@Marjan-emd Marjan-emd commented Oct 20, 2023

Added this text generation notebook to the blueprints, since the only available one (Taylor Swift Lyrics) does not create the text SQS and uses the old mpt-7b model.

This notebook will be referenced in the blog post showcasing text metrics, and it is added to the blog folder following John's suggestion, here.

@Marjan-emd Marjan-emd requested a review from kboyd October 20, 2023 00:20

vercel bot commented Oct 20, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
gretel-blueprints ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 26, 2023 0:14am

merge main to the branch

@kboyd kboyd left a comment


Some minor suggestions and questions. Please also have Johnny review and approve before merging as he's working on tidying up this notebooks directory. Want to make sure this follows the planned style and structure instead of creating more stuff to clean up later.

docs/notebooks/content/Text-Summerization-gpt.ipynb (4 review threads, outdated and resolved)

@johnnygreco johnnygreco left a comment


Hey @Marjan-emd,

I'll probably move the file location as I reorganize the notebooks in this repo, but we can go ahead and get this merged for your blog. I'll make sure the link gets switched (there will be a bunch of broken links, so no worries here).

One request before we merge. Can we update this to the new SDK interface?

Here's what that would look like:

from gretel_client import Gretel

PROJECT = 'data-summarization'
DATASET_PATH = 'https://gretel-datasets.s3.us-west-2.amazonaws.com/Text-dataset/Samsum-text-summerization-sample-1000.csv'

gretel = Gretel(project_name=f"{PROJECT}-llama-2-7b", api_key="prompt", validate=True)

trained = gretel.submit_train(
    "natural-language",
    data_source=DATASET_PATH,
    params={"steps": 1000},
)

trained.report.display_in_notebook()

You can pass the DataFrame as the data_source if you prefer.
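
A minimal sketch of the DataFrame option, assuming pandas is installed and reusing the call from the snippet above. The helper names `load_training_frame` and `train_from_frame` are hypothetical, and so are the column names in the in-memory sample. The Gretel import sits inside the function so the loading step can be tried without credentials.

```python
import io

import pandas as pd

DATASET_PATH = (
    "https://gretel-datasets.s3.us-west-2.amazonaws.com/"
    "Text-dataset/Samsum-text-summerization-sample-1000.csv"
)


def load_training_frame(source):
    """Read a CSV (URL, path, or file-like object) and drop fully empty rows."""
    df = pd.read_csv(source)
    return df.dropna(how="all").reset_index(drop=True)


def train_from_frame(df, project_name="data-summarization-llama-2-7b"):
    """Submit a fine-tuning job with the DataFrame as the data source."""
    # Imported here so the loader above works without gretel_client installed.
    from gretel_client import Gretel

    gretel = Gretel(project_name=project_name, api_key="prompt", validate=True)
    return gretel.submit_train(
        "natural-language",
        data_source=df,
        params={"steps": 1000},
    )


# Tiny in-memory stand-in for the real CSV; the column names are made up.
sample_csv = io.StringIO(
    "dialogue,summary\nhello there,greeting\n,\nbye now,farewell\n"
)
sample_df = load_training_frame(sample_csv)
```

In the notebook this would be used as `trained = train_from_frame(load_training_frame(DATASET_PATH))`, followed by the same `trained.report.display_in_notebook()` call.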


johnnygreco commented Oct 25, 2023

Maybe explicitly add the pretrained model parameter:

from gretel_client import Gretel

PROJECT = 'data-summarization'
DATASET_PATH = 'https://gretel-datasets.s3.us-west-2.amazonaws.com/Text-dataset/Samsum-text-summerization-sample-1000.csv'
LLM = "meta-llama/Llama-2-7b-chat-hf"

gretel = Gretel(project_name=f"{PROJECT}-llama-2-7b", api_key="prompt", validate=True)

trained = gretel.submit_train(
    "natural-language",
    data_source=DATASET_PATH,
    pretrained_model=LLM,
    params={"steps": 1000},
 )

trained.report.display_in_notebook()

@Marjan-emd

Thanks for reviewing this. I totally forgot about switching to the new SDK interface!
I just changed the model to the regular Llama-2 instead of the chat version, since I ran my experiments on that one. The results should not change much either way.
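
For reference, the swap described here only touches the model identifier. A sketch, assuming the regular checkpoint is the Hugging Face id `meta-llama/Llama-2-7b-hf` (the comment does not spell it out); `build_train_kwargs` is a hypothetical helper that only assembles the keyword arguments for `submit_train`.

```python
DATASET_PATH = (
    "https://gretel-datasets.s3.us-west-2.amazonaws.com/"
    "Text-dataset/Samsum-text-summerization-sample-1000.csv"
)
CHAT_LLM = "meta-llama/Llama-2-7b-chat-hf"  # id suggested in the review
BASE_LLM = "meta-llama/Llama-2-7b-hf"  # assumed id of the regular model


def build_train_kwargs(data_source, pretrained_model=BASE_LLM, steps=1000):
    """Collect the keyword arguments passed to submit_train in this thread."""
    if steps <= 0:
        raise ValueError("steps must be a positive integer")
    return {
        "data_source": data_source,
        "pretrained_model": pretrained_model,
        "params": {"steps": steps},
    }


train_kwargs = build_train_kwargs(DATASET_PATH)
# With a Gretel instance available, the call would then be:
# trained = gretel.submit_train("natural-language", **train_kwargs)
```

Keeping the identifier in one constant makes it easy to rerun the notebook against the chat variant by passing `pretrained_model=CHAT_LLM`.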


@johnnygreco johnnygreco left a comment


Nice – thanks, @Marjan-emd!

One last thing before you merge: will you put this at the top of the first markdown cell?

Open In Colab
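
The badge itself did not survive in this page, so here is a sketch of how such a Markdown line is commonly built. The `gretelai` organization, the `main` branch, and the notebook path (taken from the review threads, and likely to change when the notebooks are reorganized) are assumptions; `colab_badge` is a hypothetical helper.

```python
from urllib.parse import quote

COLAB_BADGE_SVG = "https://colab.research.google.com/assets/colab-badge.svg"


def colab_badge(org, repo, branch, notebook_path):
    """Return Markdown for an "Open In Colab" badge linking to a GitHub notebook."""
    target = (
        "https://colab.research.google.com/github/"
        f"{org}/{repo}/blob/{branch}/{quote(notebook_path)}"
    )
    return f"[![Open In Colab]({COLAB_BADGE_SVG})]({target})"


badge_markdown = colab_badge(
    org="gretelai",  # assumption: organization is not named in this thread
    repo="gretel-blueprints",
    branch="main",
    notebook_path="docs/notebooks/content/Text-Summerization-gpt.ipynb",
)
print(badge_markdown)
```

The printed line can be pasted at the top of the first markdown cell.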


@Marjan-emd Marjan-emd left a comment


All feedback comments were addressed.

@Marjan-emd Marjan-emd merged commit 0f4f3c1 into main Oct 26, 2023
4 checks passed
@Marjan-emd Marjan-emd deleted the me/RDS-736 branch October 26, 2023 00:44
3 participants