add torchbench example: WIP #388

Merged: 7 commits, Jun 30, 2023

Conversation

rakataprime
Contributor

Work-in-progress PR for adding a TorchBench GPU benchmarking SDL.

@anilmurty
Contributor

The SDL needs to be updated to include the "vendor" key, as shown here: https://docs.akash.network/testnet/example-gpu-sdls/specific-gpu-vendor
Add:

          attributes:
            vendor:
              nvidia:
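
A quick way to catch a missing "vendor" key before deploying is to inspect the parsed SDL. The following is only a sketch: it assumes the SDL has already been loaded into nested dicts (for example by a YAML loader) and that GPU settings live under `profiles.compute.<name>.resources.gpu`. The helper name `gpu_profiles_missing_vendor`, the profile name `torchbench`, and the unit count are hypothetical and not taken from this PR.

```python
# Hypothetical helper, not part of the PR: flags compute profiles that
# request a GPU but do not name a vendor under gpu.attributes.vendor.

def gpu_profiles_missing_vendor(sdl: dict) -> list[str]:
    """Return names of compute profiles whose gpu block lacks attributes.vendor."""
    missing = []
    compute = sdl.get("profiles", {}).get("compute", {})
    for name, profile in compute.items():
        gpu = (profile.get("resources") or {}).get("gpu")
        if not gpu or not gpu.get("units"):
            continue  # this profile does not request a GPU
        vendor = (gpu.get("attributes") or {}).get("vendor")
        if not vendor:
            missing.append(name)
    return missing


# Illustrative fragments only; "torchbench" and units=1 are placeholders.
without_vendor = {
    "profiles": {"compute": {"torchbench": {"resources": {"gpu": {"units": 1}}}}}
}
with_vendor = {
    "profiles": {"compute": {"torchbench": {"resources": {
        # `nvidia:` with no value parses to None in YAML
        "gpu": {"units": 1, "attributes": {"vendor": {"nvidia": None}}}
    }}}}
}

print(gpu_profiles_missing_vendor(without_vendor))  # ['torchbench']
print(gpu_profiles_missing_vendor(with_vendor))     # []
```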

@rakataprime
Contributor Author

> SDL needs to be updated to include the "vendor" key as shown here https://docs.akash.network/testnet/example-gpu-sdls/specific-gpu-vendor
> add:
>
>           attributes:
>             vendor:
>               nvidia:

I have updated the SDL with the attributes. I think it would be best to prepackage a notebook for the benchmarks so that people just have to click "run all" to get the results. We could probably use a shell escape (`!`) in the first cell, like !run.sh or !python /workspace/benchmark/install.py models hf_bert hf_Bert_large resnet50 tacotron2 && pytest /workspace/benchmark/test_bench.py -k "(hf_bert or hf_bert_Large or resnet50 or tacotron2)" --ignore_machine_config

That would be the most minimal approach. You could also persist the benchmark results stored as JSON and try to make some pretty plots too, but if time is of the essence I think we could just add jupyter to the requirements.txt along with a minimal template notebook.
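
As an alternative to the `!` one-liner, the first notebook cell could build the same two commands in Python and run them in order. This is a rough sketch, not what the PR ships: the helper names `build_commands` and `run_benchmarks` are made up for illustration, the paths and model names are copied from the comment above (the casing may need adjusting to match the installed model names), and `sys.executable` stands in for `python`.

```python
import shlex
import subprocess
import sys

BENCH_DIR = "/workspace/benchmark"  # path as written in the comment above
MODELS = ["hf_bert", "hf_Bert_large", "resnet50", "tacotron2"]  # casing as written above


def build_commands(models, bench_dir=BENCH_DIR):
    """Return the install and pytest argv lists for the given models."""
    install = [sys.executable, f"{bench_dir}/install.py", "models", *models]
    selector = "(" + " or ".join(models) + ")"  # pytest -k expression
    bench = ["pytest", f"{bench_dir}/test_bench.py",
             "-k", selector, "--ignore_machine_config"]
    return [install, bench]


def run_benchmarks(models=MODELS, dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    check=True raises on a non-zero exit code, so pytest only runs if the
    install step succeeded, mirroring the `&&` in the shell version.
    """
    for argv in build_commands(models):
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Dry run by default: only prints the commands that would be executed.
run_benchmarks(dry_run=True)
```

Calling `run_benchmarks(dry_run=False)` inside the deployed container would actually install the models and run the benchmarks; keeping the dry run as the default makes the cell safe to execute anywhere.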

@anilmurty
Contributor

That would be great @rakataprime - would you like to add to this PR itself?

@anilmurty
Contributor

By the way, if you want to test deployments you can use one of these client options: https://docs.akash.network/testnet/gpu-testnet-client-instructions. We have a few GPU providers on the testnet now: https://akash.praetorapp.com/provider-status (select "testnet" in the "Network Selection" dropdown to see them).

@rakataprime
Contributor Author

> That would be great @rakataprime - would you like to add to this PR itself?

I could do it either in this PR or in a new one. Do you have requirements for what information you want included in that notebook other than the benchmarks, e.g. GitHub username, email, wallet address, etc.?

@anilmurty
Contributor

anilmurty commented Jun 21, 2023

We will already be collecting those details via a Typeform (right, @brewsterdrinkwater?), but it wouldn't hurt to ask for GitHub ID, Discord handle, and wallet address, I think.

@anilmurty
Contributor

In fact, I think it may help correlate things for awards.
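
If the notebook does end up collecting these identifiers, one simple way to make results easy to correlate is to store them in the same JSON record as the benchmark output. The sketch below is purely illustrative: the `Participant` class, the `build_submission` function, the record layout, and every value shown are placeholders, not a format agreed on in this thread.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Participant:
    # Fields suggested in the thread; the record layout itself is hypothetical.
    github_id: str
    discord_handle: str
    wallet_address: str


def build_submission(participant: Participant, benchmarks: dict) -> str:
    """Serialize participant identifiers together with benchmark results."""
    record = {"participant": asdict(participant), "benchmarks": benchmarks}
    return json.dumps(record, indent=2, sort_keys=True)


# Placeholder values only; no real identifiers or measurements.
who = Participant("example-user", "example-handle", "<wallet-address>")
print(build_submission(who, {"resnet50": "<results go here>"}))
```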

@anilmurty
Contributor

@rakataprime - not sure if you are waiting on a response here but we're ok either way re. collecting user info in the jupyter notebook

@rakataprime
Contributor Author

> @rakataprime - not sure if you are waiting on a response here but we're ok either way re. collecting user info in the jupyter notebook

I think if we test it and it works well enough we would be ready to merge.

@anilmurty
Contributor

Thanks @rakataprime - have you tried this on the testnet? There are 26 GPUs available there right now https://akash.praetorapp.com/provider-status?chainid=testnet-02

Contributor

@chainzero chainzero left a comment

The requested changes to the GPU profile and exposed port have been made. The SDL looks good and I have tested it successfully.

@chainzero chainzero merged commit e69560b into akash-network:master Jun 30, 2023
@anilmurty
Contributor

Thanks again @rakataprime and thanks @chainzero !

3 participants