Add geolocation to object store config #16370

Open
pauldg opened this issue Jul 5, 2023 · 6 comments

@pauldg
Contributor

pauldg commented Jul 5, 2023

To allow for smarter job scheduling, we need a mechanism for tracking the geographical location of object stores and compute destinations, so that the job scheduling decision process (e.g. through TPV) can choose compute resources with “nearby” storage. This would be especially relevant in the context of #14073, #15875 and the EuroScienceGateway project.

@bgruening
Member

@pauldg let's first try out what information we need as an annotation. For example, we could put a simple mapping file next to the TPV config files that maps object store IDs to geolocations, IPs, etc.
Simple as:

object_store_italy_S3_01:
  latitude: 50.0689816
  longitude: 19.9070188
  other_stuff_that_we_find_useful: foobar

TPV can then read this small mapping file, and we can experiment with different implementations. Once we have a better feeling for what works and what does not, we could move it into Galaxy, e.g. like https://github.com/usegalaxy-eu/infrastructure-playbook/pull/656/files#diff-892ad16ab91d143f9c5b8360026aa9f8e7f4c833c25760e26bf29a5624e095c3R167

Do you think this is a good short-term solution? We can keep this issue open and come back later to update it with concrete examples.
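
To make the mapping-file idea a bit more concrete, here is a minimal sketch of how a scheduler could rank destinations by great-circle (haversine) distance to an object store. It is not TPV's actual implementation. The dictionaries stand in for the parsed mapping file, and `object_store_example_02`, `destination_a`, `destination_b`, `haversine_km` and `nearest_destination` are hypothetical names made up for illustration.

```python
import math

# Hypothetical stand-in for the parsed mapping file: object store ID -> coordinates.
OBJECT_STORES = {
    "object_store_italy_S3_01": {"latitude": 50.0689816, "longitude": 19.9070188},
    "object_store_example_02": {"latitude": 48.0, "longitude": 7.85},
}

# Hypothetical compute destinations, annotated the same way.
DESTINATIONS = {
    "destination_a": {"latitude": 50.06, "longitude": 19.94},
    "destination_b": {"latitude": 47.99, "longitude": 7.84},
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two {'latitude', 'longitude'} dicts."""
    lat1, lon1 = math.radians(a["latitude"]), math.radians(a["longitude"])
    lat2, lon2 = math.radians(b["latitude"]), math.radians(b["longitude"])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_destination(object_store_id, object_stores=OBJECT_STORES, destinations=DESTINATIONS):
    """Return the ID of the destination geographically closest to the object store."""
    store = object_stores[object_store_id]
    return min(destinations, key=lambda d: haversine_km(store, destinations[d]))
```

With the made-up coordinates above, `nearest_destination("object_store_italy_S3_01")` picks `destination_a`. Swapping `haversine_km` for another metric would leave the selection logic unchanged, which should make it easy to experiment with different annotations.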

@pauldg
Contributor Author

pauldg commented Jul 5, 2023

Yes, starting from the TPV side of things seems like a good approach as well.

@hexylena
Member

hexylena commented Jul 6, 2023

Is lat/lon more useful? I would've expected AS numbers or, better yet, a "path cost" to be more useful for describing the path between the central server and the data, and between the various compute nodes and data storage. E.g. you'd have a cost of 0 within the DC. You could probably populate that pretty easily by pinging every other node to generate those costs.
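
A rough sketch of the "path cost" idea, under some assumptions: costs are kept per (destination, object store) pair, a cost of 0 means "same DC", and unmeasured pairs count as infinitely expensive. All IDs and numbers below are invented. `tcp_connect_ms` is a hypothetical helper that times a TCP connect as an unprivileged stand-in for an ICMP ping; it is one possible way to populate the table, not something TPV or Galaxy provides.

```python
import socket
import time

# Hypothetical, hand-filled path costs (could be RTT in ms, hop count, ...).
PATH_COSTS = {
    ("destination_local", "object_store_a"): 0,    # same DC
    ("destination_remote", "object_store_a"): 35,
    ("destination_local", "object_store_b"): 35,
    ("destination_remote", "object_store_b"): 2,
}


def cheapest_destination(object_store_id, destinations, costs=PATH_COSTS):
    """Return the destination with the lowest known path cost to the object store.

    Pairs without a measured cost are treated as infinitely expensive.
    """
    return min(destinations, key=lambda d: costs.get((d, object_store_id), float("inf")))


def tcp_connect_ms(host, port=443, timeout=2.0):
    """Time a TCP connect to host:port in milliseconds (inf if unreachable).

    Assumption: the storage endpoint accepts TCP connections on `port`.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return float("inf")
```

For example, `cheapest_destination("object_store_a", ["destination_local", "destination_remote"])` selects `destination_local`, because its cost to that store is 0.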

@bgruening
Member

We do not know yet. AS numbers will be tested, as well as hop counts, etc. I guess we need to experiment with the file and try different things.

@hexylena
Member

hexylena commented Jul 6, 2023

TTL, yes, also good. Physical location should only matter at intercontinental distances; within a country or the EU, DC-to-DC connections should predominate, and their performance depends more on peering.

@pauldg
Contributor Author

pauldg commented Jul 10, 2023

Here's a small PoC using lon/lat, added to TPV's pytest suite: galaxyproject/total-perspective-vortex#108

@martenson martenson changed the title [Feature request] Add geolocation to object store config Add geolocation to object store config Jul 10, 2023