Installed packages cloning issues #9

Closed
majdzr opened this issue Mar 3, 2020 · 4 comments

Comments


majdzr commented Mar 3, 2020

Hello,

I have the following situation where the cloning option installs the wrong version of a package (which eventually causes the experiment to fail, regardless of trains/trains-agent):

  • code is running from conda base venv

  • A requirements.txt file including torchvision as one of the packages (note, no version number). torchvision is just an example of a package.

  • A machine with torchvision (0.4.2) and Pillow (5.4.1) already installed. Note that Pillow is not listed in the requirements.txt but is a dependency of torchvision.

  • When I run this as a new task, everything runs smoothly. Under the installed packages, Trains logs torchvision (0.4.2) but not Pillow.

  • However, when I clone it, trains-agent installs torchvision==0.4.2+cu100 from scratch, which depends on Pillow. Since this is a new installation, it installs the latest Pillow 7.0 and ignores the 5.4.1 (which, again, appears in the pip list but not in the Trains installed packages).

Am I missing something? Isn't that the entire point of trains-agent? And of course, how to overcome this?

Thank you in advance!
Majd

Member

bmartinn commented Mar 3, 2020

Hi @majdzr ,

Are you using Pillow in your code? If you're not using it (and it's only used by torchvision), what you're seeing is the expected behavior: Trains logs the torchvision version, and Trains-Agent, when running the experiment, installs the same torchvision version along with all the dependencies that torchvision specifies. In this case torchvision did not require a specific Pillow version, so the latest Pillow version was installed.
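To illustrate why an unpinned dependency ends up at the newest release, here is a minimal sketch. It is not how pip or Trains-Agent actually resolve versions; `parse_version`, `pick_version`, and the list of available Pillow versions are hypothetical and only mimic the "no pin means latest" behavior described above.

```python
def parse_version(text):
    """Turn '5.4.1' into (5, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pick_version(available, pinned=None):
    """Return the pinned version if one is given, otherwise the newest one.

    Hypothetical helper: a stand-in for what a package installer does when a
    dependency carries no version constraint.
    """
    if pinned is not None:
        if pinned not in available:
            raise ValueError(f"version {pinned} is not available")
        return pinned
    return max(available, key=parse_version)


# Illustrative list of versions an index might offer (an assumption).
available_pillow = ["5.4.1", "6.2.2", "7.0.0"]

unpinned_choice = pick_version(available_pillow)               # no constraint
pinned_choice = pick_version(available_pillow, pinned="5.4.1")  # explicit pin

print(unpinned_choice, pinned_choice)
```

With no pin the sketch picks the newest version in the list, and with an explicit pin it returns exactly the requested one, which is the difference between the cloned run and the original machine.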

The reason Pillow 5.4.1 in your base venv was ignored is that Trains-Agent was designed to install dependencies in a clean environment in order to avoid any issues related to pre-installed packages, and match package requirements as closely as possible.

If the specific Pillow version is important for you, you can always edit the cloned experiment and add the Pillow==5.4.1 version to the requirements section.
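As a rough sketch of what that edit amounts to, the snippet below adds a pin to a requirements-style text. The edit itself is done by hand in the cloned experiment; `pin_requirement` is a hypothetical helper, not part of Trains, and it only understands plain `name==version` lines.

```python
def pin_requirement(requirements_text, name, version):
    """Return requirements text with `name` pinned to `version`.

    An existing line for `name` is replaced; otherwise a new line is appended.
    Simplification: only bare names and `name==version` lines are handled.
    """
    pinned_line = f"{name}=={version}"
    lines = [line.strip() for line in requirements_text.splitlines() if line.strip()]
    result = []
    replaced = False
    for line in lines:
        # Compare only the package name, ignoring any version specifier.
        current = line.split("==")[0].strip().lower()
        if current == name.lower():
            result.append(pinned_line)
            replaced = True
        else:
            result.append(line)
    if not replaced:
        result.append(pinned_line)
    return "\n".join(result)


logged = "torchvision==0.4.2"          # what was logged for the original run
edited = pin_requirement(logged, "Pillow", "5.4.1")
print(edited)
```

The resulting text lists both torchvision==0.4.2 and Pillow==5.4.1, so a clean installation no longer has to guess the Pillow version.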

Also, if your base venv is based on conda, make sure to configure Trains-Agent to run your experiments using conda as the package manager. In order to do this, replace pip with conda in line 42 in the trains.conf file used by Trains-Agent.

Author

majdzr commented Mar 3, 2020 via email

Member

bmartinn commented Mar 3, 2020

Well @majdzr , the way I see it, there are two options:

  1. You are using Pillow specifically in your code. In that case, trains should (and would) add the packages you are using to the "Installed Packages" section (if it did not, please open a bug report).
  2. Pillow is used by other packages, and not directly by your code. In that case, Pillow will not be part of the "Installed Packages" (consider that if you wanted it there, you would essentially be doing a pip freeze, which is overkill and, moreover, might quickly break when trying to set up the environment on a remote machine).

In your case, support for Pillow >= 7 is broken in that specific torchvision version, see issues: #1846, #1835, #1774, #1726, #1718, #1714, #1712

In order to overcome compatibility issues like that, we enable manual editing of the Installed-Packages, but these are exceptions and should not happen most of the time.

The last thing to remember is that after trains-agent executes the experiment, it will update the "Installed Packages" to all the packages that were installed on the clean virtual-environment (basically pip freeze on the newly created virtual environment). That way, once you have a working setup, it can be fully reproduced with all of the packages, not just the ones your code uses directly.

The reasoning behind it is that in development we have a variety of packages, and our environment usually contains a lot more than needed, so reproducing it is slow and fragile. But if we start with only the packages we use directly and let those install their requirements, we end up with a slim, stable environment that we can always reproduce.
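The difference between the two lists described above can be sketched as follows. This is only an illustration, not the detection logic Trains actually uses; `direct_requirements` and `full_freeze` are hypothetical names, and the `installed` mapping simply reuses the versions from this thread.

```python
def direct_requirements(imported_names, installed):
    """Pin only the packages the code imports directly (the slim list)."""
    return sorted(
        f"{name}=={installed[name]}" for name in imported_names if name in installed
    )


def full_freeze(installed):
    """Pin every installed package, similar in spirit to `pip freeze`."""
    return sorted(f"{name}=={version}" for name, version in installed.items())


# Versions taken from the thread; the environment is otherwise an assumption.
installed = {"torchvision": "0.4.2", "Pillow": "5.4.1"}

slim = direct_requirements({"torchvision"}, installed)  # logged before the first run
frozen = full_freeze(installed)                         # recorded after a clean run

print(slim)
print(frozen)
```

The slim list contains only torchvision, while the frozen list also carries Pillow, which is why the packages recorded after a successful agent run are enough to reproduce the environment in full.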

Makes sense?

Author

majdzr commented Mar 4, 2020

Hey @bmartinn,

Thanks again for the informative reply. It's much appreciated.
It makes a lot of sense. I'm actually doing what you have suggested: cloning, making sure it runs, and then using this cloned experiment as a template for other variants.
