CDL latency #504
Comments
I am happy to defer to your recollection of what the normal CDL release date is -- I don't have any recollections of their release dates. Based on @ags-tolson's comments next to the 426, he may have just been off by 30 days. Moving to |
If the data's released "in February" then you can't assume Feb 1, you have to assume Feb 28. That's Who knows if that's the right way to do it, but that's what I was thinking. |
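To make the Feb 1 versus Feb 28 distinction concrete, here is a small sketch that converts an assumed release date into a day count from Jan 1 of the data year. The helper name `latency_for_release` is hypothetical, and the 2018/2019 dates are only an illustration; a leap year in the span would shift the counts by one.

```python
from datetime import date

def latency_for_release(data_year: int, release: date) -> int:
    """Days from Jan 1 of `data_year` to an assumed release date.

    Hypothetical helper for illustration; not part of GIPS.
    """
    return (release - date(data_year, 1, 1)).days

# Optimistic assumption: "in February" means Feb 1 of the following year.
optimistic = latency_for_release(2018, date(2019, 2, 1))
# Conservative assumption: "in February" means Feb 28 of the following year.
conservative = latency_for_release(2018, date(2019, 2, 28))
print(optimistic, conservative)  # 396 423
```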
Right. That's a solid "I'm not going to ask for things that might not be there" latency number. I think Rob has turned more toward an "only stop me from asking for things that have zero chance of existing" number. Right, @bhbraswell? |
I'm just noting that with the current value of the latency parameter you are almost guaranteed to have some days or weeks when CDL data are available but GIPS will not retrieve them (right now, for example). And for some people, the period right after the data are made available might be the most important time to get them. I think I am partly responsible for the latency parameter but am not sure it is useful. Having a fetch fail because the data aren't ready yet is, to me, basically the same as having a fetch fail because the data were never collected. In any case this isn't a blocker for me because I changed the value to zero in my copy. Thanks for the feedback. |
Given your suggestion that it tends to be available in early February, and general agreement that you have a valid use case, I say it should be made a parameter configurable through the environment or settings. Your argument for 396 or less is solid. The only risk in going lower is annoying the data provider, and possibly getting banned. (Think Google 503s, or remember when PRISM blacklisted us because someone's cron job mirrored the same 2 years of data every weekend?) |
Thanks Ian. This is obviously not the biggest deal in the world; sorry for taking so much of your time on it. I think eventually some sort of override switch is probably the way to go. I know CDL is sort of a weird case, but in general I think a lot of users will be interested in the lowest latency possible, so either erring on the side of low latency parameter values or some sort of periodic review of the parameters might be useful. |
I agree. I think that the right position is to have default latency settings that are safeguards against likely problematic queries (CDL2018 before 2019-1-1), and environment/gips.settings configurations that will allow people to run (at their own risk) in a manner that might aggravate data providers. The only reason for defaulting to the safeguarded mode is for naive users and for automated settings -- i.e. a pipeline that retries a job until it succeeds, but it isn't going to succeed for 396 days. |
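One way to get a safeguarded default with an opt-in override is sketched below. It is not GIPS's actual configuration mechanism: the environment variable name `GIPS_CDL_LATENCY`, the function name `resolve_latency`, and the default of 396 days (the figure mentioned earlier in the thread) are all assumptions for illustration.

```python
import os

# Assumed default, taken from the "396 or less" figure in this thread.
DEFAULT_CDL_LATENCY_DAYS = 396

def resolve_latency(env=None, default=DEFAULT_CDL_LATENCY_DAYS):
    """Return the latency in days, preferring a user-supplied override.

    `GIPS_CDL_LATENCY` is a hypothetical variable name, not a real GIPS setting.
    """
    env = os.environ if env is None else env
    raw = env.get("GIPS_CDL_LATENCY")
    if raw is None:
        # No override: keep the safeguarded default.
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 0:
        raise ValueError("latency must be non-negative")
    return value
```

With no override set, a naive user or a retrying pipeline gets the safe default; a user who sets the variable to `0` takes on the risk of querying for data that may not exist yet.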
How about:
|
I think that 426 days might be too long. Unless I misunderstand how this variable is used, I assume we are comparing the current date with the requested date (always Jan 1 of the requested year). For example, I know that CDL for 2018 was just released, which was a little later than normal but still only about 410 days in.
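The comparison described above can be sketched as follows. This is a minimal illustration under the assumption stated in the issue (latency counted in days from Jan 1 of the requested year), not GIPS's actual code; the function name `is_fetchable` and its parameters are hypothetical.

```python
from datetime import date, timedelta

def is_fetchable(year: int, latency_days: int, today: date) -> bool:
    """Return True once `latency_days` have elapsed since Jan 1 of `year`.

    Hypothetical helper; assumes the requested date is always Jan 1.
    """
    earliest = date(year, 1, 1) + timedelta(days=latency_days)
    return today >= earliest

# A day roughly 410 days after the start of the 2018 data year.
release_day = date(2018, 1, 1) + timedelta(days=410)

print(is_fetchable(2018, 426, release_day))  # False: still blocked at 426
print(is_fetchable(2018, 396, release_day))  # True: allowed at 396
```

Under these assumptions, a latency of 426 would keep blocking the fetch for about 16 more days after a release at day 410.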