Convenience methods for listing installed force fields #477

Closed
jchodera opened this issue Dec 22, 2019 · 15 comments
Comments

@jchodera
Member

jchodera commented Dec 22, 2019

Is your feature request related to a problem? Please describe.
Currently, it's hard for users to perform operations like

  • find out which force fields are available (installed) in any plugin directories
  • find out which force field series (openforcefield, smirnoff99Frosst) are installed
  • find the latest version of a particular series (e.g. the latest smirnoff99Frosst is smirnoff99Frosst-1.1.0.offxml) to use the most up-to-date force field
  • find the location on the filesystem where a force field is actually installed

Describe the solution you'd like
It would be useful to have a convenience method for listing all installed force field files located in plugin directories, such as:

>>> from openforcefield.typing.engines.smirnoff import ForceField
>>> print(ForceField.available_force_fields)
['openforcefield-1.0.0.offxml', 'smirnoff99Frosst-1.1.0.offxml', ...]

We could go further and list all installed force field series (truncating the semantic versioning):

>>> print(ForceField.available_force_field_series)
['openforcefield', 'smirnoff99Frosst']

and allow us to retrieve the latest version of a specific series

>>> print(ForceField.get_latest_version('openforcefield'))
'/full/path/to/openforcefield-1.0.0.offxml'

or retrieve full paths

>>> print(ForceField.get_full_path('smirnoff99Frosst-1.1.0.offxml'))
'/full/path/to/smirnoff99Frosst-1.1.0.offxml'
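
As a rough illustration of the series and latest-version lookups requested above, here is a minimal sketch that works on a plain list of file names. It assumes every file follows a "<series>-<major>.<minor>.<patch>.offxml" naming convention; the helper names group_by_series and latest_in_series are hypothetical and are not part of the toolkit.

```python
import re
from collections import defaultdict

# Assumed naming convention: "<series>-<major>.<minor>.<patch>.offxml"
_NAME_RE = re.compile(r"^(?P<series>.+)-(?P<version>\d+\.\d+\.\d+)\.offxml$")


def group_by_series(file_names):
    """Map each series name to its file names, sorted oldest to newest."""
    series = defaultdict(list)
    for name in file_names:
        match = _NAME_RE.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed convention
        # Compare versions numerically so that 1.0.10 sorts after 1.0.9.
        version = tuple(int(part) for part in match["version"].split("."))
        series[match["series"]].append((version, name))
    return {key: [name for _, name in sorted(items)] for key, items in series.items()}


def latest_in_series(file_names, series_name):
    """Return the newest file name in a series, or None if none is installed."""
    versions = group_by_series(file_names).get(series_name)
    return versions[-1] if versions else None


installed = [
    "openforcefield-1.0.0.offxml",
    "smirnoff99Frosst-1.0.9.offxml",
    "smirnoff99Frosst-1.1.0.offxml",
    "smirnoff99Frosst-1.0.10.offxml",
]
print(sorted(group_by_series(installed)))
print(latest_in_series(installed, "smirnoff99Frosst"))
```

Pre-release suffixes are deliberately not handled here; files that do not match the pattern are simply skipped.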

Describe alternatives you've considered
In principle, a user could access some of the private methods and attributes in openforcefield.typing.engines.smirnoff, but the API for these is not guaranteed to be stable.

Additional context
I am working on incorporating support for the Open Force Field Initiative force fields into the OpenMM ForceField object in http://github.com/choderalab/openmm-forcefields

@jchodera
Member Author

This is made more important because currently, we build and install the https://github.com/openforcefield/smirnoff99Frosst and https://github.com/openforcefield/openforcefields repos as compressed eggs. Because of this, the .offxml files don't actually exist as files on the filesystem, making it difficult for other tools to use them.

I might just not know how to use pkg_resources to automatically manage unzipping and caching of force field files. Is there an easy way to do this?

@j-wags
Member

j-wags commented Feb 15, 2020

Extending this idea/spec:

  • The lexical sort on version numbers can be undefined if some part of the version is not a number (eg 1.1.0RC2, 1.2.3a, etc)
  • There will be an optional ForceField.available_force_fields(validate=False) kwarg, which attempts to load each FF (squelching warnings by lowering the logging level) and does not return FFs that can not be loaded.
  • The above will be powered by a new static method ForceField.validate(file_path, verbose=False, <ForceField __init__ kwargs>) which returns bool. If verbose is False, warnings are squelched.
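
To make the proposed behavior concrete, here is a minimal sketch of a validate-style helper. It is not the toolkit's implementation: the loader argument is a hypothetical stand-in for the ForceField constructor, and the sketch assumes that verbose=False means warnings are silenced while loading.

```python
import logging


def validate(file_path, loader, verbose=False, **loader_kwargs):
    """Return True if loader(file_path, **loader_kwargs) completes without raising.

    `loader` is a hypothetical stand-in for the ForceField constructor.
    """
    if not verbose:
        # Assumption: verbose=False silences WARNING-level (and lower) log records.
        logging.disable(logging.WARNING)
    try:
        loader(file_path, **loader_kwargs)
        return True
    except Exception:
        return False
    finally:
        if not verbose:
            logging.disable(logging.NOTSET)  # restore normal logging


def loadable(file_paths, loader):
    """Keep only the paths that the loader can actually load."""
    return [path for path in file_paths if validate(path, loader)]
```

A listing method could then pass its candidate paths through loadable before returning them.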

@jchodera
Member Author

The lexical sort on version numbers can be undefined if some part of the version is not a number (eg 1.1.0RC2, 1.2.3a, etc)

There is an unambiguous order specified by PEP 440. See the information on ordering. There are packages that handle sorting for you.
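
For illustration only, here is a simplified sort key in the spirit of the PEP 440 ordering: the numeric release is compared first, and pre-releases (a, b, rc) sort before the corresponding final release. This is a sketch, not a full PEP 440 implementation; it ignores epochs, post/dev releases, local versions, and zero-padding of releases with different lengths. The version_key name is hypothetical. In practice, the Version class in the packaging library implements the complete rules.

```python
import re

# Numeric release plus an optional pre-release tag, e.g. "1.1.0RC2" or "1.2.3a".
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:[-_.]?(a|b|rc)(\d*))?$", re.IGNORECASE)

# Pre-releases rank below the final release of the same version.
_RANK = {"a": 0, "b": 1, "rc": 2, "": 3}


def version_key(version):
    """Sort key: numeric release first, then pre-release rank and number."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    tag = (match.group(2) or "").lower()
    number = int(match.group(3) or 0)
    return (release, _RANK[tag], number)


versions = ["1.1.0", "1.10.0", "1.2.3a", "1.1.0RC2", "1.2.3"]
print(sorted(versions))                   # plain string sort
print(sorted(versions, key=version_key))  # release-aware sort
```

The plain string sort places 1.10.0 before 1.2.3 and 1.1.0 before its own release candidate, which is the ambiguity raised above; the keyed sort avoids both.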

What does ForceField.validate do in your proposal above?

@jchodera
Member Author

jchodera commented Mar 3, 2020

Now that openff-1.1.0 has been merged (and will soon be released), it would be super valuable if we could create an autodiscovery scheme!

@jchodera
Member Author

@j-wags @dotsdl @mattwthompson : Now that we are releasing more force fields via the openforcefields package, it would be great if there was a programmatic way to get a list of all installed SMIRNOFF (.offxml) force fields that could be loaded by ForceField in the plugin paths so that I don't have to keep manually adding force field names to openmmforcefields.

@mattwthompson
Member

I'd be happy to take this on

I'm no master of pkg_resources, but there may be hope for accessing uncompressed .offxml files?

@trevorgokey
Collaborator

@mattwthompson I wrote this a while back to search for a user-specified ff string to load. You may find parts of it helpful: https://github.com/MobleyLab/openff-spellbook/blob/master/offsb/op/openforcefield.py#L27-L36

        import os
        from pkg_resources import iter_entry_points

        # Each entry point in this group loads to a callable that returns a sequence of directories.
        for entry_point in iter_entry_points(group='openforcefield.smirnoff_forcefield_directory'):
            pth = entry_point.load()()[0]  # only the first directory is searched
            abspth = os.path.join(pth, filename)
            print("Searching", abspth)
            if os.path.exists(abspth):
                self.abs_path = abspth
                print("Found")
                break

I adapted it from an example lying around that @j-wags might have at his fingertips. I can't seem to find it.

@j-wags
Member

j-wags commented Apr 14, 2020

I'm still, embarrassingly, not that great at Python package config, and I treat the entry point access methods as olde incantations whenever I need them. Here's the ur-instance of entry point usage that I keep coming back to: https://github.com/openforcefield/openforcefield/blob/master/openforcefield/typing/engines/smirnoff/forcefield.py#L63-L85

@mattwthompson, I'd be happy if you tackled this Issue. I think we're all equally inexperienced with entry points.

@dotsdl
Member

dotsdl commented Apr 14, 2020

I could take this on. I've done similar things in the past for other packages with data files.

@jchodera
Member Author

Just a heads-up that I've cut openmmforcefields 0.7.3, which contains a hack to auto-discover which force fields are installed:
https://github.com/openmm/openmmforcefields/releases/tag/0.7.3

Once this issue is implemented and there are API methods to query what force fields are available and retrieve the corresponding file paths for available force fields, I can remove some ugly hacks that were needed to make this work.

But the good news is that the QCEngine based on this openmmforcefields should have access to all installed released force fields.

@jchodera
Member Author

Ah! I see that @mattwthompson added #643!

Is there a way to modify the behavior so that the most current force fields (maybe based on a date tag?) are listed first? That way, it's easier to pick the latest force field as a sensible default.

@jchodera
Member Author

Is there also a way to get the force field file associated with a given filename? I have this clunky method I'd like to replace since it uses a private openforcefield toolkit API:

    def _search_paths(self, filename):
        """Search registered openforcefield plugin directories

        Parameters
        ----------
        filename : str
            The filename to find the full path for

        Returns
        -------
        fullpath : str
            Full path to identified file, or None if no file found

        .. todo ::

           Replace this with an API call once this issue is addressed:
           https://github.com/openforcefield/openforcefield/issues/477

        """
        # TODO: Replace this method once there is a public API in the openforcefield toolkit for doing this

        import os
        from openforcefield.typing.engines.smirnoff.forcefield import _get_installed_offxml_dir_paths

        # Check whether this could be a file path
        if isinstance(filename, str):
            # Try first the simple path.
            searched_dirs_paths = ['']
            # Then try a relative file path w.r.t. an installed directory.
            searched_dirs_paths.extend(_get_installed_offxml_dir_paths())

            # Determine the actual path of the file.
            # TODO: What is desired toolkit behavior if two files with the desired name are available?
            for dir_path in searched_dirs_paths:
                file_path = os.path.join(dir_path, filename)
                if os.path.isfile(file_path):
                    return file_path
        # No file found
        return None

@jchodera
Member Author

One more request: A variant of the behavior where we can also only return the force field names (without .offxml or the _unconstrained variant suffixes) would be handy!

Here's what I originally had:

    @ClassProperty
    @classmethod
    def INSTALLED_FORCEFIELDS(cls):
        """Return a list of the offxml files shipped with the openforcefield package.
        Returns
        -------
        file_names : list of str
           The file names of available force fields
        .. todo ::
           Replace this with an API call once this issue is addressed:
           https://github.com/openforcefield/openforcefield/issues/477
        """
        # TODO: Replace this method once there is a public API in the openforcefield toolkit for doing this
        # TODO: Impose some sort of ordering by preference?

        import os
        from glob import glob
        from openforcefield.typing.engines.smirnoff.forcefield import _get_installed_offxml_dir_paths

        file_names = list()
        for dir_path in _get_installed_offxml_dir_paths():
            file_pattern = os.path.join(dir_path, '*.offxml')
            file_paths = glob(file_pattern)
            for file_path in file_paths:
                basename = os.path.basename(file_path)
                root, ext = os.path.splitext(basename)
                # Only add variants without '_unconstrained'
                if '_unconstrained' not in root:
                    file_names.append(root)
        return file_names

@mattwthompson
Member

I'd envision the following, which I think would satisfy your needs as laid out

Two new arguments to get_available_force_fields:

sort_by_date=False

This would attempt to parse the date tag in the force field, either by loading the file completely (a source of errors/warnings) or by doing some risky parsing of the top of the file. I'd advocate for this to be turned off by default to avoid warnings/errors associated with loading unsupported force fields (unless SMIRNOFF up-converters are implemented) and to avoid the ~1-200 ms load time for each file.

ignore_unconstrained=False

This would just look for _unconstrained in each file name and leave matching files out of the returned list, pretty much what you have. I'd prefer it turned off by default so that all data is available with default arguments; either way it should not impact performance.
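
A minimal sketch of the filtering described above, under stated assumptions: search_dirs is a hypothetical parameter standing in for the registered plugin directories, list_force_fields is a hypothetical name, and names_only covers the earlier request for names without the .offxml extension.

```python
import glob
import os


def list_force_fields(search_dirs, ignore_unconstrained=False, names_only=False):
    """List .offxml files found directly inside the given directories."""
    found = []
    for dir_path in search_dirs:
        # Sort within each directory so the result is deterministic.
        for file_path in sorted(glob.glob(os.path.join(dir_path, "*.offxml"))):
            file_name = os.path.basename(file_path)
            if ignore_unconstrained and "_unconstrained" in file_name:
                continue
            # Optionally strip the ".offxml" extension.
            found.append(os.path.splitext(file_name)[0] if names_only else file_name)
    return found
```

With the defaults, every .offxml file is returned, matching the preference for keeping all data available unless the caller opts out.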

And a new utility function along the lines of

def get_full_path_from_ff_name(ff_name):
    # pretty much copying your code
    return full_path

i.e. get_full_path_from_ff_name('openff-1.2.0') would return '/full/path/to/file/buried/in/the/weeds/of/packaging/openff-1.2.0'
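
One way the utility might look, as a sketch only: search_dirs is a hypothetical extra parameter standing in for the registered plugin directories (the proposal above takes just the name), and the sketch assumes that a name given without the .offxml extension should have it appended.

```python
import os


def get_full_path_from_ff_name(ff_name, search_dirs):
    """Return the full path of the first matching .offxml file, or None."""
    # Assumption: accept names with or without the ".offxml" extension.
    file_name = ff_name if ff_name.endswith(".offxml") else ff_name + ".offxml"
    for dir_path in search_dirs:
        candidate = os.path.join(dir_path, file_name)
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    return None  # no directory contains the file
```

If two directories contain a file with the same name, this sketch returns the one from the directory listed first; the desired toolkit behavior in that case is still an open question.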

@jchodera
Member Author

This all sounds good!

sort_by_date=False

The name of this argument doesn't indicate whether this sorting would yield the most recent first, or oldest first. Is there a better name, like sort="newest-first"?

I'd advocate for this to be turned off by default to avoid warnings/errors associated with loading unsupported force fields (unless SMIRNOFF up-converters are implemented) and to avoid the ~1-200 ms load time for each file.

Good point. I suppose an alternative would be to just lean on semantic versioning for ordering, but we wouldn't know what order to list force field series (smirnoff99Frosst, openff) in.

Perhaps a further alternative would be to have an API for returning available FF series, then returning the installed files in those series using semantic versioning to sort?
