Add/Implement data source mirrors #26

observingClouds · 2020-12-23T00:08:58Z

Hi guys,
I'm just in the process of uploading a new version of the radiosonde dataset. This time, it is not a tar archive, but the level1 and level2 data can be directly accessed through the AERIS THREDDS server.

@leifdenby do you want to update your zarr files, or change to the AERIS THREDDS server (https://observations.ipsl.fr/thredds/catalog/EUREC4A/PRODUCTS/MERGED-MEASUREMENTS/RADIOSOUNDINGS/v3.0.0/level2/catalog.html), or, even better, add both sources for better availability in case a server is down?

I'll make an announcement in the data channel when the upload is final.
Cheers!

leifdenby · 2021-01-18T14:57:43Z

thanks @observingClouds! I was actually thinking that maybe I should remove my zarr-based mirrors from the main repository and we just use AERIS directly instead. What do you think? I'm happy to keep my zarr-based catalog available, but maybe I'll put that on a separate repository that we can link to from this main one? Maybe in mirrors.leifdenby_zarr or something like that? What do you think @d70-t?

observingClouds · 2021-01-18T16:11:39Z

Well, as long as you can keep the files up to date (and I don't expect to reprocess them soon) and/or make sure users can see which version they are using (DOI), it might actually be great to still have that resource in case AERIS is down. It would be great if one could have several possible sources in the catalog and intake switched between them (semi-)automatically, but I guess this is not yet implemented? You guys probably know more.

d70-t · 2021-01-19T01:17:50Z

I think references to Aeris should go into the catalog. However, having an active backup is also a very good idea. There is already some progress in intake/intake#557 on providing multiple locations for one dataset, but it is not done yet.

Having a mirror structure could be an addition, but I am not so sure we really want that. As a result, users would have to specify some form of path manually again, and most likely we'd end up with a couple of scripts being passed around which only access the "mirror" tree. This can become particularly problematic if the mirror is not complete, such that some datasets will effectively work only on the main tree while others will probably only work on the mirror tree...
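
For illustration only, the "several locations for one dataset" idea can be sketched outside of intake as a small helper that tries each location in turn. This is not the mechanism discussed in intake/intake#557; `open_first_available`, its parameters, and any URLs passed to it are hypothetical, and the sketch assumes that an unreachable server shows up as an ordinary exception raised by the opener.

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def open_first_available(
    locations: Iterable[str],
    opener: Callable[[str], T],
) -> Tuple[str, T]:
    """Try each location in order and return (location, dataset) for the first success.

    `opener` is any callable that takes a location string and returns a dataset.
    Both this function and its arguments are hypothetical, not part of intake.
    """
    errors = []
    for location in locations:
        try:
            return location, opener(location)
        except Exception as err:  # remember the failure and try the next source
            errors.append(f"{location}: {err!r}")
    # Every location failed: report all failures together.
    raise OSError("no location could be opened:\n" + "\n".join(errors))
```

Returning the location alongside the dataset lets the caller record which source was actually served, which matters if the mirror and the primary could be at different versions.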

leifdenby · 2021-03-15T17:04:49Z

So, in the meantime (before mirroring is available) we could just go ahead and replace the entry backed by my server with the data on AERIS? I think adding a data_mirrors for now might be quite nice to keep this "backup" available. Does that sound ok?

d70-t · 2021-03-15T17:18:43Z

Puh... I really find this one hard to decide.

- mirroring is absolutely something we should have. The OPeNDAP endpoint at Aeris had an uptime of 67% during the last two weeks.
- having more than one possible path to a dataset, of which sometimes one and sometimes the other works, kind of defeats the purpose of the catalog (which to my mind is saving the user from pasting in URLs or custom root folders or the like)
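
As a rough way to put a number on the first point: if outages of two sources were independent (an assumption; real outages may well be correlated), the chance that at least one source is reachable follows directly from the individual uptimes. A minimal sketch, where `combined_availability` is a hypothetical helper and the mirror uptime of 0.9 is a made-up placeholder; only the 67% comes from the observation above.

```python
from math import prod


def combined_availability(uptimes):
    """Probability that at least one source is up, assuming independent outages."""
    # All sources are down at once with probability prod(1 - u).
    return 1.0 - prod(1.0 - u for u in uptimes)


# 0.67 is the observed Aeris uptime; 0.9 is a hypothetical mirror uptime.
print(round(combined_availability([0.67, 0.9]), 3))  # about 0.967
```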

I have to 🤷 and hope that others have better arguments.

leifdenby · 2021-03-15T18:16:05Z

> having more than one possible path to a dataset of which sometimes one and sometimes the other works kind of defeats the purpose of the catalog (which to my mind is saving the user from pasting in urls or custom root folders or the like)

Ah yes, you're absolutely right. I hadn't thought of that. We could instead adopt a convention of adding {product}__mirror entries in the catalog? E.g. we'd have radiosondes/bco__mirror. It's not pretty, but at least it's "nearby" in the catalog tree, so should make it easier to find.
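
To make the convention concrete, here is a minimal sketch of how primary/mirror pairs could be discovered from a list of entry names. `mirror_pairs` and `MIRROR_SUFFIX` are hypothetical names and "meteor" is a placeholder entry; only the `__mirror` suffix and `bco` come from the proposal above. It assumes the entry names of a catalog level are available as plain strings.

```python
MIRROR_SUFFIX = "__mirror"  # naming convention proposed in this thread


def mirror_pairs(entry_names):
    """Map each primary entry name to its mirror entry, where a mirror exists."""
    names = set(entry_names)
    return {
        name: name + MIRROR_SUFFIX
        for name in sorted(names)
        if not name.endswith(MIRROR_SUFFIX) and name + MIRROR_SUFFIX in names
    }


# "meteor" has no mirror here, so only "bco" is paired.
print(mirror_pairs(["bco", "bco__mirror", "meteor"]))  # {'bco': 'bco__mirror'}
```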

d70-t · 2021-03-15T18:44:29Z

> We could instead adopt a convention of adding {product}__mirror entries in the catalog?

I don't know if this makes the situation better or worse... If we implemented this, then a user would need to access the data using something like:

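import eurec4a

def reliable_to_dask(cat, entry):
    """Load `entry` from `cat`, falling back to its `__mirror` sibling on failure."""
    try:
        return cat[entry].to_dask()
    except Exception:  # a bare `except:` would also swallow KeyboardInterrupt
        return cat[f"{entry}__mirror"].to_dask()

cat = eurec4a.get_intake_catalog()
### some more code
ds = reliable_to_dask(cat.ATR, "track")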

This has the potential of not creating a ton of hard-coded cat = cat.mirror lines, but it is also not entirely beautiful. And if instead people start to sprinkle around things like ds = cat.ATR.track__mirror18 or the like, this will become horrible.


d70-t mentioned this issue Jul 27, 2021

Update ATOMIC URLs #71

Merged

d70-t added a commit to d70-t/eurec4a-intake that referenced this issue Jul 27, 2021

updated radiosondes to v3

47807b2

This includes a change from denby.io to Aeris. see eurec4a#26 for some discussion about this

d70-t mentioned this issue Jul 27, 2021

updated radiosondes to v3 #72

Merged

observingClouds changed the title ~~New version of radiosonde dataset available~~ Add/Implement data source mirrors Dec 1, 2022
