Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update materials cloud archive URL's #13

Closed
mbercx opened this issue Oct 29, 2020 · 26 comments · Fixed by #104
Closed

Update materials cloud archive URL's #13

mbercx opened this issue Oct 29, 2020 · 26 comments · Fixed by #104

Comments

@mbercx
Copy link
Member

mbercx commented Oct 29, 2020

When trying to install the default SSSP family, I ran into connectivity issues:

$ aiida-pseudo install sssp
Info: downloading selected pseudo potentials archive...  [FAILED]
Critical: HTTPSConnectionPool(host='legacy-archive.materialscloud.org', port=443): Max retries exceeded with url: /file/2018.0001/v4/SSSP_1.1_PBE_efficiency.tar.gz (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fedc762c5f8>: Failed to establish a new connection: [Errno 110] Connection timed out',))

After touching base with @borellim, it turns out that the legacy archive had been shut down, since they didn't think anyone was still using it. It's back online now, but we should probably update the URL's so we are no longer relying on the legacy archive.

@borellim
Copy link
Member

borellim commented Oct 29, 2020

The corresponding page on the new Archive should be this (please double check): https://archive.materialscloud.org/record/2018.0001/v4
You should be able to copy the URLs for individual files by right-clicking on each one. Note that the files have been reorganized by @sphuber : see the notice on the page linked above.

@sphuber
Copy link
Contributor

sphuber commented Oct 29, 2020

Thanks @borellim. We had discussed this at the time that I made v4 of the archive entry. I did the reorganization and cleanup exactly for the purpose of this plugin. I made sure that all files had a consistent format and file naming scheme. This made it easy for this plugin to automatically build the URL based on say the desired, pseudo family protocol, functional and version. With the new archive version, the URL of each file gets a randomized ID with it, which made this approach impossible. This is why I kept the legacy version around and you said you were going to take a look into making this possible for the new version of the archive as well. Do you think this is still a possibility?

For example, the SSSP_1.1_PBE_efficiency.tar.gz file currently has the URL https://archive.materialscloud.org/record/file?record_id=23&file_id=d2ce4186-bf76-4e05-8b39-444b4da30273&filename=SSSP_1.1_PBE_efficiency.tar.gz. Would it be possible to get a URL without the random id file_id=d2ce4186-bf76-4e05-8b39-444b4da30273? Shouldn't the filename parameter be unique anyway?

@borellim
Copy link
Member

Thanks Sebastiaan for the summary of the situation. I will discuss it with Valeria and try to see if this feature can be added quickly. If not, I guess you always have the option of keeping a hardcoded table of UUIDs.
I'll get back to you.

@borellim
Copy link
Member

Hi @sphuber , @mbercx . Valeria and I agreed that this should be possible, but we can't start working on it for about a week, and then I guess it will take a few days but I can't say for certain. I'll leave it to you guys (and @giovannipizzi) to decide whether you want to move immediately to a hardcoded table of UUIDs, so that legacy-archive can be shut down, or to simply continue using legacy-archive: for me it's the same.

@sphuber
Copy link
Contributor

sphuber commented Oct 29, 2020

Thanks a lot for the response @borellim ! Well, if you think it is useful in any case to simplify the links for files of archive entries and you plan on working on this regardless of this issue for aiida-pseudo, and it would be possible to keep the legacy server running until that is implemented, then that would be the ideal solution for I guess. However, if this would be the only use-case for implementing this on the archive and/or running the legacy server until it is implemented takes an unreasonable amount of resources, then maybe it doesn't make sense. I cannot make that judgement though. Adding the hard-coded table is not that much work ultimately, it is just a far less elegant solution.

@giovannipizzi
Copy link
Member

Hi - we can keep the machine online for a few weeks, it's just that we start having too many and we need to start dropping the legacy ones.

I suggest to have a URL like this:
https://archive.materialscloud.org/record/file?record_id=23&filename=SSSP_1.1_PBE_efficiency.tar.gz
which redirects correctly.

Where maybe even the record_id is NOT 23 but instead the one of the form 2020.xx (in this case 2018.0001/v4), as this is the "known" public one, rather than 23.
And/or if possible have a record_doi=xxx as an alternative option.

However, as it's not clear how long a proper implementation can need to go into production, and also other websites (for other families) on which we don't have control could have a different naming strategy, I suggest that this library also is implemented in a way to have a table of URLs for cases like this.

@mbercx
Copy link
Member Author

mbercx commented May 11, 2021

Since the opening of this issue, there have been two new patch releases in the works (one still not released as of this moment): v1.1.1 and v1.1.2. Are we planning to provide support for separate patch releases as well, or do we stick with 1.0, 1.1, etc? Although I think having all releases available is cleaner, this will require quite a few support configurations over time, and the results should be the same for different patch releases, so installing e.g. 1.1.1 no longer makes sense now that 1.1.2 will be released.

EDIT: @asle85: is it tricky to set up/maintain these links? Just to have an idea if this causes a lot of overhead on your end. Thanks!

@elsapassaro
Copy link

@mbercx actually no, it's not tricky to maintain the links. I can add the files for the v1.1.2 below the ones for the other versions as done for the current version of the record on the Materials Cloud Archive. For completeness, I'll create an AiiDA export file also for v1.1.2.
If I understood correctly the number of files that we can add to an Archive record is not limited, so if in the future the list keeps growing there shouldn't be any issue, is that right @borellim ?

@borellim
Copy link
Member

borellim commented May 11, 2021

If I understood correctly the number of files that we can add to an Archive record is not limited, so if in the future the list keeps growing there shouldn't be any issue, is that right @borellim ?

I'm not aware of any limit, but maybe we should also ask @vgranata .
Let's keep in mind that we don't have a "donwload all" button, and having a very large number of files seems very user-unfriendly to me. I'll let you decide whether it's worth it in this particular case.

@mbercx
Copy link
Member Author

mbercx commented May 11, 2021

Let's keep in mind that we don't have a "donwload all" button, and having a very large number of files seems very user-unfriendly to me.

I do agree with this. I'm also rather in favour of just having one version per minor release, always making available the latest patch release. But that does raise the question on what happens if users try to install e.g. v1.1 with:

$ aiida-pseudo install sssp -v 1.1

Do we accept this command and install v 1.1.2, or raise and expect them to provide -v 1.1.2? Do we make the label SSSP/1.1.2/PBE/efficiency? This will also require maintenance for e.g. aiida-quantumespresso in the protocols.

@sphuber
Copy link
Contributor

sphuber commented May 11, 2021

I need some more information here first. What exactly changed in that 1.1.1 and 1.1.2 patch versions with respect to the previous? And what is (if there is any) the policy for when a new minor version is created as opposed to a patch?

@mbercx
Copy link
Member Author

mbercx commented May 11, 2021

The first was an additional & that was causing trouble, I think 🤔. @asle85 can explain further. 1.1.2 was necessary because the mesh size in the PP_HEADER did not match the one in PP_MESH. This is now fixed on the fly in the Quantum ESPRESSO code, but was still causing problems when running SIRIUS.

I'm not sure if a policy has already been set for SemVer, but I would say that:

  • Patch releases only fix bugs like the ones described above, that have no bearing at all on the configurations offered, or results obtained.
  • Minor releases do not change the results for existing pseudos in the family, but may increase the pseudo coverage (e.g. adding U) or the number of functionals supported.
  • Major releases involve changes that do affect the results obtained compared to previous versions. Obvious examples here are changing a pseudo or the recommended cutoffs.

@mbercx
Copy link
Member Author

mbercx commented May 12, 2021

Just pinging @csadorf that the QE app SsspInstallWidget will stop working when the legacy machine is shut down, and will have to update the base_url. Will open a PR when this happens.

@elsapassaro
Copy link

The first was an additional & that was causing trouble, I think thinking. @asle85 can explain further.

Indeed, there were unescaped & that were fixed on the fly in Quantum ESPRESSO. The only difference with v1.1 is that in v1.1.1 those & characters are escaped.

@sphuber
Copy link
Contributor

sphuber commented May 12, 2021

Ok thanks for the info. Then I guess that it doesn't really ever make sense for someone to be wanting to install an outdated patch version, because it would just include more breaking behavior. Only reason I could see is for people wanting to install the broken versions on purpose to test them. But then I think we can just force them to use the API instead of the CLI, which is always available.

If that makes sense, should we then maybe simply not add explicit patch versions in the CLI but simply always try to download the latest? So if people ask for v1.1 now, we install v1.1.2. Question there is whether we can write it such that this doesn't require a new release of aiida-pseudo when a new patch version becomes available on the MC. How are these patch versions released currently? Is the main v1.1 entry just updated and gets a new entry version? I think that is the case right, because I see v6 of the MCA entry online.

This seems to contain the archives for both 1.1 and 1.1.1. Given that what you have described as patch versions being really bug fixes, I am not sure if that is useful nor necessary. At least from the perspective of MCA, in v6 of the entry you could have simply overwritten the 1.1 archives with the fixed ones and not add them separately as 1.1.1 ones. The reasoning being that you could always go back to v5 to get the "original" v1.1. In this sense, there is a double versioning. Than again, this would remove some information for aiida-pseudo. Because now you could have two families of v1.1 installed in different profiles, thinking they are the same, but one could be really v1.1 and the other v1.1.1. Still, I am not sure if the current approach for the MCA is tractable since you will start to accumulate a lot of archives and data, that doesn't differ a whole lot.

@csadorf
Copy link

csadorf commented May 12, 2021

If that makes sense, should we then maybe simply not add explicit patch versions in the CLI but simply always try to download the latest? So if people ask for v1.1 now, we install v1.1.2

@sphuber If you use the standard Python requirement and versioning semantics this would already be implied. See here for an in-code example on how to test for this: https://github.com/aiidalab/aiidalab/blob/9fa4230e11773b92d587cc6133fe1f7dd7c505ca/aiidalab/app.py#L116

@sphuber
Copy link
Contributor

sphuber commented May 12, 2021

Thanks for the link @csadorf . I think the problem for us here is that we do not necessarily have an API to ask which versions of the SSSP are available. It is kind of hidden in the filenames that are present in the various versions of the Materials Cloud Archive entry. So we cannot really automate our command to dynamically check what patch version currently exists (and how to download it) for a certain minor version of the SSSP.

@elsapassaro
Copy link

If that makes sense, should we then maybe simply not add explicit patch versions in the CLI but simply always try to download the latest? So if people ask for v1.1 now, we install v1.1.2. Question there is whether we can write it such that this doesn't require a new release of aiida-pseudo when a new patch version becomes available on the MC. How are these patch versions released currently? Is the main v1.1 entry just updated and gets a new entry version? I think that is the case right, because I see v6 of the MCA entry online.

Actually no, the v1.1 has always been the same since version v3 of the MCA entry. The other versions were published because there were changes to other files (for instance, in v5 I migrated and fixed the AiiDA export file).
As the urls are built now, I'm afraid that the record_id in the url needs to be changed whenever there's a new version on the MCA. However, there can be a way to get the record_id of the latest version of the record, but I'm not sure if this would be simple. If this is possible then the approach to update v1.1 for all patch versions (without changing the name of the files) would work, and there would be no need to have a new release of aiida-pseudo every time. I think it should be possible but I'm not sure how complicated it is. Maybe @borellim or @vgranata can comment.

This seems to contain the archives for both 1.1 and 1.1.1. Given that what you have described as patch versions being really bug fixes, I am not sure if that is useful nor necessary. At least from the perspective of MCA, in v6 of the entry you could have simply overwritten the 1.1 archives with the fixed ones and not add them separately as 1.1.1 ones. The reasoning being that you could always go back to v5 to get the "original" v1.1. In this sense, there is a double versioning. Than again, this would remove some information for aiida-pseudo. Because now you could have two families of v1.1 installed in different profiles, thinking they are the same, but one could be really v1.1 and the other v1.1.1. Still, I am not sure if the current approach for the MCA is tractable since you will start to accumulate a lot of archives and data, that doesn't differ a whole lot.

I agree, and I would keep one list of files for v1.0 and only another one for v1.1 (so keeping only the latest patch release). Regarding the naming, I would prefer to have the full version v1.1.2, so it is clear to users that this differs from v1.1, and also in this case the user can go to the previous versions to get the "original" v1.1.
The downside is that this would force a new release of aiida-pseudo whenever we have a patch release of the files, since the name of the files would change. However, keep in mind that this downside effect will occur anyway if we are forced to update the record_id in the url.

@mbercx
Copy link
Member Author

mbercx commented May 12, 2021

The downside is that this would force a new release of aiida-pseudo whenever we have a patch release of the files, since the name of the files would change. However, keep in mind that this downside effect will occur anyway if we are forced to update the record_id in the url.

Are you forced to update the record id of the URL? If not, maybe you could add a file with a mapping of the minor version to the latest patch release. Then aiida-pseudo can first grab this file to see what the path to the latest patch release is for the requested minor version. In that way any patch updates only need changes on your end.

That said, it seems to me that releases of the SSSP are not that common, so maintenance on our end would be rather low. :)

@vgranata
Copy link

vgranata commented May 12, 2021

However, there can be a way to get the record_id of the latest version of the record, but I'm not sure if this would be simple.

Yes, this can be done

@mbercx
Copy link
Member Author

mbercx commented May 12, 2021

@vgranata Great! So we could just have a single URL that points to the current record_id which contains a (e.g. .yaml file) that maps the minor versions (1.0, 1.1) to the version of the latest patch version corresponding to that minor version (1.0.0, 1.1.2). If any change is made on your end, you would also change the .yaml mapping along with the filenames themselves.

In aiida-pseudo, then, we only support installation of minor versions (1.0, 1.1) as we do now, but we adapt the link to the files based on the patch version obtained from the .yaml file (with the message e.g. Checking for latest patch release...).

Just one question regarding "obtaining the latest record_id: Would this be done on the aiida-pseudo end? I.e. can we ping the Materials Cloud and figure our the latest version for the record_id we set now?

@giovannipizzi
Copy link
Member

Just to clarify (maybe I misunderstood, sorry in that case) - do you really need to ask the MC Archive for the most recent record_id? Can't you just map a given SSSP version to the correct record_id and filename? Once a new record (i.e. a new version) is published, the record_id and filename of that version will not change anymore.
New versions will get new record_ids; but anyway if there is a new version, the code should be aware of it, so I think you would need anyway to adapt manually the yaml with the mapping telling that the new version is now at a given URL (with a given record_id).

I mean, e.g. if you want a file specifically from v2 than you use e.g. this link, if you want files from v6 e.g. this and so on.

I agree that moving forward, we might not need to put all files of all versions in the most recent version (but probably now 1.0 and 1.1 both make sense so we want them to be visible in the most recent version)

@mbercx
Copy link
Member Author

mbercx commented May 13, 2021

Hi @giovannipizzi 👋! So the way I understood how we can make the publication of new patch releases work without having to update the aiida-pseudo is as follows:

  1. When the user asks to install a certain minor version (e.g. 1.1), ping the archive with the record id we set now, to see if there is there is a new published archive. Update the base_url in this case:

    base_url = f`https://archive.materialscloud.org/record/file?record_id={record_id}&filename=`
  2. Download the .yaml file stored in the record that maps the minor versions to the latest patch versions:

    1.0: 1.0
    1.1: 1.1.1 

    and check which is the latest patch version that corresponds to the minor version that the user has requested (e.g. 1.1.1 in this case).

  3. Format the full URL based on the base_url and the version obtained from step [2] (and the functional and protocol requested, of course):

    f'SSSP_{version}_{functional}_{protocol}'.json

That way patch releases can be entirely updated on the materials cloud, without any code changes needed in aiida-pseudo (I think?).

That said, I'd be happy to update aiida-pseudo whenever there is a new patch release of the pseudos. This doesn't happen often, and it's not a lot of work at all. Free commits, I say! ;)

@sphuber
Copy link
Contributor

sphuber commented May 13, 2021

That said, I'd be happy to update aiida-pseudo whenever there is a new patch release of the pseudos. This doesn't happen often, and it's not a lot of work at all. Free commits, I say! ;)

Just to note that the main objection here is not the added work on our side, it is rather that it would require a new release to get it into the hands of the users. And that would most likely require other plugins to update their dependencies given that now we are pre v1.0 and we recommend people pinning the minor version since we are changing a lot. Once we are stable this would be less of a problem, but otherwise we might have to start backporting and adding support releases, which all of a sudden make things complicated.

@mbercx
Copy link
Member Author

mbercx commented May 13, 2021

That said, I'd be happy to update aiida-pseudo whenever there is a new patch release of the pseudos. This doesn't happen often, and it's not a lot of work at all. Free commits, I say! ;)

Just to note that the main objection here is not the added work on our side, it is rather that it would require a new release to get it into the hands of the users. And that would most likely require other plugins to update their dependencies given that now we are pre v1.0 and we recommend people pinning the minor version since we are changing a lot. Once we are stable this would be less of a problem, but otherwise we might have to start backporting and adding support releases, which all of a sudden make things complicated.

Fair point! So I suppose my solution is worth it, then. If it works. 😏

@giovannipizzi
Copy link
Member

I see, thanks for the comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants