CLI: Get data #68

Open
VeckoTheGecko opened this issue Oct 1, 2024 · 6 comments

Comments

@VeckoTheGecko
Collaborator

VeckoTheGecko commented Oct 1, 2024

`virtualship fetch`: Downloads the data needed for the simulation as an explicit step, so that users can do it separately. If virtualship is run without this step, it will fetch the data itself.

@VeckoTheGecko
Collaborator Author

Also see if we can use one environment for everything:

Data can be downloaded with the `download_data.py` script. For now a different conda env is needed for downloading, see comments in the script.

@iuryt
Contributor

iuryt commented Nov 11, 2024

Is someone working on this? If not, I can try to help.
Should we add new variables to schedule.yaml, such as bbox for the area_of_interest?
The rest would basically be adapting scripts/download_data.py to fit here:

    def fetch(path):
        """Entrypoint for the tool."""
        raise NotImplementedError("Not implemented yet.")

Let me know your thoughts.

@VeckoTheGecko
Collaborator Author

VeckoTheGecko commented Nov 12, 2024

I'm happy to work on this as well

> Should we add new variables to schedule.yaml, such as bbox for the area_of_interest?

I think that would be good, but instead of putting it in the config, maybe we can have it as arguments, e.g. `virtualship fetch --bbox-min=lat,lon --bbox-max=lat,lon`? Then, if they don't provide them (i.e., they just run `virtualship fetch`), it can suggest a bounding box for the user based on their schedule plus some buffer. I think we can't go purely on the waypoints in the schedule, since I assume the students would be changing them on the fly throughout a class, and we wouldn't want them to have to redownload the data mid-exercise.
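
A minimal sketch of that fallback, assuming `argparse` for the flags and a schedule reduced to a list of `(lat, lon)` waypoints. The flag names follow the proposal above; `parse_latlon`, `suggest_bbox`, `resolve_bbox`, and the 1-degree default buffer are hypothetical and not part of virtualship.

```python
import argparse


def parse_latlon(text):
    """Parse a 'lat,lon' string into a (lat, lon) tuple of floats."""
    try:
        lat, lon = (float(part) for part in text.split(","))
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected 'lat,lon', got {text!r}") from None
    return lat, lon


def suggest_bbox(waypoints, buffer_deg=1.0):
    """Return ((min_lat, min_lon), (max_lat, max_lon)) around the waypoints plus a buffer.

    Assumption: the buffer size is illustrative, and longitudes are not
    wrapped at the antimeridian.
    """
    lats = [lat for lat, _ in waypoints]
    lons = [lon for _, lon in waypoints]
    bbox_min = (max(min(lats) - buffer_deg, -90.0), min(lons) - buffer_deg)
    bbox_max = (min(max(lats) + buffer_deg, 90.0), max(lons) + buffer_deg)
    return bbox_min, bbox_max


def resolve_bbox(argv, waypoints):
    """Use the bounding box from the command line, or suggest one from the schedule."""
    parser = argparse.ArgumentParser(prog="virtualship fetch")
    parser.add_argument("--bbox-min", type=parse_latlon, default=None)
    parser.add_argument("--bbox-max", type=parse_latlon, default=None)
    args = parser.parse_args(argv)
    if args.bbox_min is None or args.bbox_max is None:
        # No explicit box given: fall back to schedule waypoints plus buffer.
        return suggest_bbox(waypoints)
    return args.bbox_min, args.bbox_max
```

Calling `resolve_bbox([], waypoints)` takes the suggestion path, while `resolve_bbox(["--bbox-min=0,0", "--bbox-max=5,5"], waypoints)` returns the user's box unchanged. The `=` form keeps negative coordinates from being read as flags.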

@ammedd, in terms of data fetching, is the `download_data.py` script all that is needed? Was there any other data fetching as part of virtualship?

EDIT: Hmm, there is also the question of the time domain then. Perhaps it would be easiest to just add the spatial and temporal domain to the schedule config file.

@iuryt
Contributor

iuryt commented Nov 12, 2024

Yes, I lean more towards adding this to the config file. For example, if we have an experiment that relies only on drifters, the study area and time range will probably be considerably larger than the schedule.
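
As a rough sketch of what a config-based domain could look like once loaded, assuming the schedule file is read into a plain dict by a YAML loader: the `area_of_interest` key comes from the suggestion above, while the sub-keys, `SpaceTimeRegion`, and `region_from_config` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SpaceTimeRegion:
    """Hypothetical download domain: a bounding box plus a time range."""

    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float
    start_time: datetime
    end_time: datetime

    def __post_init__(self):
        # Reject inverted domains early, before any download is attempted.
        if self.min_lat > self.max_lat or self.min_lon > self.max_lon:
            raise ValueError("bounding box minimum exceeds maximum")
        if self.start_time > self.end_time:
            raise ValueError("start_time is after end_time")


def _as_datetime(value):
    """Accept either a datetime or an ISO 8601 string (loaders may return either)."""
    return value if isinstance(value, datetime) else datetime.fromisoformat(value)


def region_from_config(config):
    """Build a region from the dict loaded from the schedule file."""
    area = config["area_of_interest"]  # assumed key layout, not an existing schema
    return SpaceTimeRegion(
        min_lat=float(area["min_lat"]),
        max_lat=float(area["max_lat"]),
        min_lon=float(area["min_lon"]),
        max_lon=float(area["max_lon"]),
        start_time=_as_datetime(area["start_time"]),
        end_time=_as_datetime(area["end_time"]),
    )
```

Keeping the domain separate from the waypoints means a drifter-only experiment can declare a region and time range much larger than the ship's schedule.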

If you haven't started working on this, I can initiate a PR and we can collaborate from there. What do you think?

@VeckoTheGecko
Collaborator Author

@iuryt Sounds good! Happy for you to open a PR, and I can jump in at review.

@ammedd
Collaborator

ammedd commented Nov 13, 2024

The download_data script is all I used before. It used an area of interest based on the sampling stations, but indeed, it would be nice to extend this a bit. In the case of Argo/drifter deployment, it downloaded a separate dataset covering an area 3 degrees larger on each side, to allow for 3 additional weeks of data from the deployed instruments.
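
The Argo/drifter padding described above could be sketched as follows. The 3 degrees and 3 weeks come from the description; the function name, the tuple layout, and the choice to extend only the end of the time range are assumptions for illustration, not what `download_data.py` necessarily does.

```python
from datetime import datetime, timedelta


def expand_for_drifters(bbox, time_range, pad_deg=3.0, pad_time=timedelta(weeks=3)):
    """Pad a (min_lat, min_lon, max_lat, max_lon) box and extend the end of the time range."""
    min_lat, min_lon, max_lat, max_lon = bbox
    start, end = time_range
    padded = (
        max(min_lat - pad_deg, -90.0),  # clamp latitude to the valid range
        min_lon - pad_deg,              # longitude wrap-around is not handled here
        min(max_lat + pad_deg, 90.0),
        max_lon + pad_deg,
    )
    # Assumption: instruments keep drifting after deployment, so only the end moves.
    return padded, (start, end + pad_time)
```

For example, a station box of `(10.0, 20.0, 12.0, 25.0)` becomes `(7.0, 17.0, 15.0, 28.0)`, and the end of the time range moves 21 days later.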
