Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPL tutorial - add GPU header and open in colab button #2736

Merged
merged 3 commits into from
Jul 4, 2022
Merged

GPL tutorial - add GPU header and open in colab button #2736

merged 3 commits into from
Jul 4, 2022

Conversation

vblagoje
Copy link
Member

Related Issue(s): #2632, tutorial update

Proposed changes:
-Adds Colab GPU setup header and "open in colab" button to GPL tutorial

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@vblagoje vblagoje requested a review from agnieszka-m June 27, 2022 08:49
@vblagoje
Copy link
Member Author

Any idea why this PR fails now @ZanSara @bogdankostic and it passed before (2-3 weeks ago)?

@julian-risch
Copy link
Member

The tutorial might just need a pip install datasets.

@vblagoje
Copy link
Member Author

The tutorial might just need a pip install datasets.

How would you add it @julian-risch I thought adding it in the setup cells of the notebook (the very top) is enough.

@julian-risch
Copy link
Member

@vblagoje for the tests of the tutorials, we need it here as well: https://github.com/deepset-ai/haystack/blob/master/.github/workflows/tutorials.yml and here: https://github.com/deepset-ai/haystack/blob/master/.github/workflows/tutorials_nightly.yml or add it here

all =
for the python script (not the jupyter notebook).
What fails is the execution of https://github.com/deepset-ai/haystack/blob/master/tutorials/Tutorial18_GPL.py if I am not mistaken.

@julian-risch
Copy link
Member

Btw, here you can see how the tutorial tests are executed: https://github.com/deepset-ai/haystack/blob/master/.github/utils/tutorials.sh

@ZanSara
Copy link
Contributor

ZanSara commented Jun 27, 2022

To add to @julian-risch answer, if GLP needs additional dependencies it should have its own dependency group. So we should add something like:

gpl = 
   datasets

in setup.cfg, and then add the gpl group to the all list. You can have a look how it was done for the audio group (b4fbb85 and 1641248). I can help you @vblagoje if you're unsure about something 🙂

@vblagoje
Copy link
Member Author

Thanks @julian-risch @ZanSara Tutorial 18 takes about 20-25 mins to run. Is that ok or should we add it to the no-run list?

@ZanSara
Copy link
Contributor

ZanSara commented Jun 27, 2022

I think 25 minutes is still manageable 👍 Nightly runs can take as long as 6 hours and the current one takes less than two, so we have plenty of room still.

The no-run list is for tutorials that can't possibly run on GH, for example the ones that that take several hours on CPU, or need too much ram.

@vblagoje
Copy link
Member Author

I think 25 minutes is still manageable 👍 Nightly runs can take as long as 6 hours and the current one takes less than two, so we have plenty of room still.

The no-run list is for tutorials that can't possibly run on GH, for example the ones that that take several hours on CPU, or need too much ram.

Oh, I thought we have a GPU for these nightly runs. Then there is no chance for this tutorial to finish. Training alone takes 15 min on GPU.

@vblagoje
Copy link
Member Author

Thanks a bunch @ZanSara @julian-risch for your assistance! I think it should be gtg now.

@julian-risch
Copy link
Member

LGTM! 👍 I tried it out on Google colab and it worked. I was just surprised that the installation command is slightly different than in our other tutorials. Here it is !pip install -q git+https://github.com/deepset/haystack.git, whereas all other tutorials use !pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]. So it seems that the colab extra is not needed anymore. We could remove that from all the other tutorials in a separate PR then.

@vblagoje
Copy link
Member Author

LGTM! 👍 I tried it out on Google colab and it worked. I was just surprised that the installation command is slightly different than in our other tutorials. Here it is !pip install -q git+https://github.com/deepset/haystack.git, whereas all other tutorials use !pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]. So it seems that the colab extra is not needed anymore. We could remove that from all the other tutorials in a separate PR then.

Thanks for trying it out @julian-risch I wanted to see the bare minimum of deps that would work and it seems that it stuck - the raw repo. Not sure tbh what is the best option here...

@ZanSara
Copy link
Contributor

ZanSara commented Jun 28, 2022

LGTM! +1 I tried it out on Google colab and it worked. I was just surprised that the installation command is slightly different than in our other tutorials. Here it is !pip install -q git+https://github.com/deepset/haystack.git, whereas all other tutorials use !pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]. So it seems that the colab extra is not needed anymore. We could remove that from all the other tutorials in a separate PR then.

I am very happy if we can remove the colab extra, but let's test it well before doing it. It also might be a good idea to keep it in haystack even if unused, because when Colab decides to upgrade or downgrade the version of grpcio again, we at least have an easy way to patch it quickly.

@ZanSara
Copy link
Contributor

ZanSara commented Jun 28, 2022

I think 25 minutes is still manageable +1 Nightly runs can take as long as 6 hours and the current one takes less than two, so we have plenty of room still.
The no-run list is for tutorials that can't possibly run on GH, for example the ones that that take several hours on CPU, or need too much ram.

Oh, I thought we have a GPU for these nightly runs. Then there is no chance for this tutorial to finish. Training alone takes 15 min on GPU.

Ok I thought 20-25 minutes was the time on CPU! Ok then it goes in the no-run list for now. We should spin up some self-hosted runners for these 4 tutorials anyway...

@vblagoje
Copy link
Member Author

@ZanSara Ok, in that case, I should undo the last commit of this PR which changes setup.cfg and add a commit of adding this tutorial to no-run list. WDYT?

@vblagoje vblagoje merged commit ffb7e4e into deepset-ai:master Jul 4, 2022
Krak91 pushed a commit to Krak91/haystack that referenced this pull request Jul 26, 2022
* GPL tutorial - add GPU header and open in colab button

* Add GPL tutorial to run exclusion list
@vblagoje vblagoje deleted the gpl_tutorial_header branch February 28, 2023 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants