Installed packages cloning issues #9

majdzr · 2020-03-03T14:43:26Z

Hello,

I have a situation where cloning an experiment installs the wrong version of a package (which eventually causes the experiment to fail, regardless of trains/trains-agent):

The code runs from the conda base venv.
A requirements.txt file includes torchvision as one of the packages (note, no version number). torchvision is just an example of a package.
The machine already has torchvision (0.4.2) and Pillow (5.4.1) installed. Note that Pillow is not listed in requirements.txt but is a dependency of torchvision.
When I run this as a new task, everything runs smoothly. Trains logs torchvision (0.4.2) under the installed packages, but not Pillow.
However, when I clone it, trains-agent installs torchvision==0.4.2+cu100 from scratch, which depends on Pillow. Because this is a fresh installation, it installs the latest Pillow (7.0) and ignores 5.4.1 (which, again, appears in pip list but not in the Trains installed packages).

Am I missing something? Isn't that the entire point of trains-agent? And how can I overcome this?

Thank you in advance!
Majd

bmartinn · 2020-03-03T16:55:19Z

Hi @majdzr ,

Are you using Pillow in your code? If you're not using it (and it's only used by torchvision), what you're seeing is the expected behavior - Trains logs the torchvision version, and Trains-Agent, when running the experiment, installs the same torchvision version and all its dependencies as specified by torchvision. In this case torchvision did not require a specific Pillow version, so the latest Pillow version was installed.
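To see what a package declares as its own dependencies, the standard library can read the installed metadata. This is only a minimal sketch for inspecting an environment, not something Trains or Trains-Agent runs; the helper name declared_requirements is hypothetical.

```python
from importlib import metadata


def declared_requirements(package: str) -> list:
    """Return the requirement strings a distribution declares for itself.

    Hypothetical helper for illustration. Returns an empty list when the
    package is not installed or declares no requirements.
    """
    try:
        # metadata.requires() returns None when nothing is declared.
        return list(metadata.requires(package) or [])
    except metadata.PackageNotFoundError:
        return []


if __name__ == "__main__":
    # Prints whatever the installed torchvision declares (nothing if absent).
    for requirement in declared_requirements("torchvision"):
        print(requirement)
```

A dependency listed without an exact version is resolved by the installer at install time, which is why a clean environment can end up with a newer release than the one on the development machine.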

The reason Pillow 5.4.1 in your base venv was ignored is that Trains-Agent was designed to install dependencies in a clean environment in order to avoid any issues related to pre-installed packages, and match package requirements as closely as possible.

If the specific Pillow version is important for you, you can always edit the cloned experiment and add the Pillow==5.4.1 version to the requirements section.
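As a rough illustration of what the edited requirements would look like, here is a small sketch that adds or replaces a pin in requirements-style text. The helper pin_requirement is hypothetical and not part of Trains; in practice the edit is done by hand on the cloned experiment.

```python
import re


def pin_requirement(requirements: str, name: str, version: str) -> str:
    """Return requirements text with `name` pinned to an exact `version`.

    Hypothetical helper: replaces an existing entry for `name` or appends
    a new one. Name matching is case-insensitive only; it does not
    normalize '-' versus '_'.
    """
    pinned = f"{name}=={version}"
    lines = []
    replaced = False
    for line in requirements.splitlines():
        stripped = line.strip()
        # Keep only the package name, dropping any version specifier.
        current = re.split(r"[=<>!~\[; ]", stripped, maxsplit=1)[0]
        is_entry = bool(stripped) and not stripped.startswith("#")
        if is_entry and current.lower() == name.lower():
            lines.append(pinned)
            replaced = True
        else:
            lines.append(line)
    if not replaced:
        lines.append(pinned)
    return "\n".join(lines)


if __name__ == "__main__":
    print(pin_requirement("torchvision==0.4.2", "Pillow", "5.4.1"))
```

With the pin in place, the installer no longer picks the latest Pillow when it resolves torchvision's dependencies.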

Also, if your base venv is based on conda, make sure to configure Trains-Agent to run your experiments using conda as the package manager. To do this, replace pip with conda on line 42 of the trains.conf file used by Trains-Agent, that is, set agent { package_manager { type: conda } }.

majdzr · 2020-03-03T19:40:48Z

Hey, thanks a lot for your reply. I understand. However, how can I avoid such a scenario? In this case, I was "lucky" as the newer version completely broke the experiment, but I can imagine a different scenario where the different version leads to subtly different behavior that goes undetected, which could eventually damage the reproducibility of the experiment.


bmartinn · 2020-03-03T20:22:37Z

Well @majdzr , the way I see it, there are two options:

1. You are using Pillow directly in your code. In that case trains should (and would) add the packages you are using to the "Installed Packages" section (if it did not, please open a bug report).
2. Pillow is used by another package, and not directly by your code. In that case, Pillow will not be part of the "Installed Packages" (consider that if you wanted it listed there, you would essentially be doing a pip freeze, which is overkill and, moreover, might quickly break when trying to set up the environment on a remote machine).
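The distinction between the two options comes down to which packages the code imports itself. The sketch below shows one way to list top-level imports with the standard ast module. It is an assumption-laden illustration, not Trains' actual detection logic, and the helper name direct_imports is hypothetical.

```python
import ast


def direct_imports(source: str) -> set:
    """Return the top-level module names imported by `source`.

    Hypothetical helper for illustration. Relative imports are skipped
    because they refer to the project itself, not to installed packages.
    """
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


if __name__ == "__main__":
    script = "import torchvision\nfrom PIL import Image\n"
    print(sorted(direct_imports(script)))
```

Note that an import name can differ from the distribution name: Pillow is imported as PIL, so a script that only imports torchvision never references Pillow directly.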

In your case, torchvision support for Pillow >= 7 is broken in the specific torchvision version, see issues: #1846, #1835, #1774, #1726, #1718, #1714, #1712

In order to overcome compatibility issues like that, we enable manual editing of the Installed-Packages, but these are exceptions and should not happen most of the time.

The last thing to remember is that after trains-agent executes the experiment, it will update the "Installed Packages" to list all the packages that were installed in the clean virtual environment (basically pip freeze on the newly created virtual environment). That way, once you have a working setup, it can be fully reproduced with all of the packages, not just the ones your code uses directly.
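For a sense of what such a full snapshot contains, the sketch below builds a pip-freeze-like list for the current environment from installed package metadata. It only approximates pip freeze and is not how trains-agent collects the list; the helper name freeze_snapshot is hypothetical.

```python
from importlib import metadata


def freeze_snapshot() -> list:
    """Return sorted 'name==version' pins for every installed distribution.

    Hypothetical helper approximating `pip freeze`; unlike pip, it does
    not handle editable installs or direct URL references.
    """
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(freeze_snapshot()))
```

Unlike the short list of directly used packages, this snapshot pins transitive dependencies such as Pillow too, which is what makes a known-good run repeatable.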

The reasoning behind this is that during development we have a variety of packages installed, and our environment usually contains a lot more than is needed, so reproducing it is slow and fragile. But if we start with only the packages we use directly and let those install their own requirements, we end up with a slim, stable environment that we can always reproduce.

Makes sense?

majdzr · 2020-03-04T08:10:41Z

Hey @bmartinn,

Thanks again for the informative reply. It's much appreciated.
It makes a lot of sense. I'm actually doing what you have suggested: cloning, making sure it runs, and then using the cloned experiment as a template for other variants.

majdzr closed this as completed Mar 4, 2020

bmartinn mentioned this issue Mar 6, 2020

Missing an installed package (scipy) #10

Open
