base64 the variable data. #27

Open
gregchalmers opened this issue Mar 13, 2022 · 3 comments
@gregchalmers

Would you guys be open to returning the "data" encoded as base64, as well as JSON numbers / floats?

e.g.:
"data":[4.418702,4.45638,3.9872267,3.8685367]

and as base64:
"data":"LVuNQE6djkDOFn9AKZV3QA=="

We support this feature in the forecast-api and it has two benefits:

1 - you don't lose float precision.
2 - your requests end up 30 to 40% smaller if you request a good amount of data.
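
A rough way to check the size claim on your own data is to serialise the same float32 array both ways and compare lengths. This is only a sketch: the random sample values are made up, and the ratio you see will depend on how many digits the JSON numbers carry.

```python
import base64
import math

import numpy as np

rng = np.random.default_rng(0)
values = rng.uniform(0.0, 10.0, size=1000).astype("<f4")  # made-up sample data

# JSON-number form: each float32 written as decimal text, comma separated.
json_text = "[" + ",".join(str(v) for v in values) + "]"

# base64 form: raw little-endian float32 bytes, wrapped in a JSON string.
b64_text = '"' + base64.b64encode(values.tobytes()).decode("ascii") + '"'

# base64 always costs 4 characters per 3 bytes, regardless of the values.
print("json numbers:", len(json_text), "chars")
print("base64      :", len(b64_text), "chars")
```

The base64 length is fixed by the byte count, while the JSON-number length grows with the number of digits written per value.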

We do it via the Accept HTTP header: application/vnd.metocean.base64+json vs application/json. This way a client or service can opt into supporting it.
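
On the client side, opting in would just mean setting that header. A minimal sketch using only the standard library; the URL and the `build_request` helper are hypothetical, and nothing is actually sent here.

```python
from urllib.request import Request

# Media type quoted in the comment above.
B64_JSON = "application/vnd.metocean.base64+json"


def build_request(url: str, want_base64: bool) -> Request:
    """Build a GET request that opts in to (or out of) base64-encoded data."""
    accept = B64_JSON if want_base64 else "application/json"
    return Request(url, headers={"Accept": accept})


# Hypothetical endpoint, for illustration only.
req = build_request("https://api.example.com/v1/point", want_base64=True)
print(req.get_header("Accept"))  # application/vnd.metocean.base64+json
```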

I've noticed netcdfs tend to be passed around internally between APIs, I think this change might remove the need to do this, and remove the HDF5 / NetCDF requirements for some internal clients i.e. you can stream into numpy arrays directly:

e.g.

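import numpy as np
import numpy.ma as ma
from base64 import b64decode

# var is one variable object from the parsed JSON response
buffer = b64decode(var["data"])

# use whichever dtype the server encoded with, e.g. one of:
data = ma.frombuffer(buffer, np.float64)
data = ma.frombuffer(buffer, np.float32)
data = ma.frombuffer(buffer, np.uint32)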
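
For anyone who wants to try the round trip without a server, here is a self-contained sketch. The sample values are made up, and it uses plain `np.frombuffer` rather than the masked-array variant. It also shows why the dtype matters: decoding with the wrong one raises no error, it just gives a different array.

```python
import base64

import numpy as np

original = np.array([4.5, -0.25, 3.0, 1e-3], dtype="<f4")  # made-up values

# "Server" side: raw little-endian bytes, then base64 text for the JSON body.
encoded = base64.b64encode(original.tobytes()).decode("ascii")

# "Client" side: base64 text back to bytes, then a read-only array view.
decoded = np.frombuffer(base64.b64decode(encoded), dtype="<f4")

assert decoded.dtype == original.dtype
assert np.array_equal(decoded, original)  # same values, no rounding through text

# Same 16 bytes read as float64: no exception, but only 2 (meaningless) values.
as_f8 = np.frombuffer(base64.b64decode(encoded), dtype="<f8")
print(decoded.size, as_f8.size)  # 4 2
```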
@gregchalmers
Author

I forgot to add we would need to describe the type to decode to in the response somehow, e.g.

"variables": {
  "freq": {
    "dtype": "<f64",
    "attributes": {},
    "dimensions": [
      "freq"
    ]
  },

or

"variables": {
  "freq": {
    "attributes": {
      "dtype": "<f64"
    },
    "dimensions": [
      "freq"
    ]
  },
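
Either way, a client would read that field before decoding. A sketch assuming the first layout (top-level "dtype"): note that NumPy type strings count bytes, not bits, so a little-endian 64-bit float is "<f8" and `np.dtype("<f64")` is rejected. The alias table and `decode_variable` below are hypothetical, just one way to bridge the two spellings.

```python
import base64

import numpy as np

# Hypothetical mapping from bit-width names to NumPy's byte-width type strings.
DTYPE_ALIASES = {"<f32": "<f4", "<f64": "<f8", "<u32": "<u4"}


def decode_variable(var: dict) -> np.ndarray:
    """Decode a variable's base64 "data" using its declared "dtype"."""
    name = var["dtype"]
    dtype = np.dtype(DTYPE_ALIASES.get(name, name))
    return np.frombuffer(base64.b64decode(var["data"]), dtype=dtype)


freq = np.array([0.05, 0.1, 0.2], dtype="<f8")  # made-up values
var = {
    "dtype": "<f64",
    "attributes": {},
    "dimensions": ["freq"],
    "data": base64.b64encode(freq.tobytes()).decode("ascii"),
}
decoded = decode_variable(var)
print(decoded.dtype, decoded.tolist())
```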

@aportagain
Member

Interesting idea. Sounds appealing if we can figure out a way to have it optional / backwards-compatible / negotiable... Haven't thought it all the way through, keen to discuss sometime as well as hear from others, but here's a few questions or comments for now:

> you don't lose float precision

I'm a big fan of that in principle :)

At the same time, I think in many cases we actually only need relatively low precision... many variables only need two or three digits before and one or two after the decimal point, right? Rounding to that is more effort to generate, since the decision has to be made on a per-variable basis, but it is completely transparent for the consumers, and we (...) should already have that information in the MDS anyway (each variable's "precision" attribute). It should not just make the uncompressed JSON smaller, but improve compression ratios too.
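
As a sketch of what I mean, assuming "precision" is the number of digits to keep after the decimal point (the helper name is made up):

```python
import json

import numpy as np


def to_json_numbers(values: np.ndarray, precision: int) -> str:
    """Round to `precision` decimals and serialise as compact JSON numbers."""
    rounded = np.round(values.astype(np.float64), precision)
    return json.dumps(rounded.tolist(), separators=(",", ":"))


hs = np.array([4.418702, 4.45638, 3.9872267, 3.8685367])  # values from the example above
print(to_json_numbers(hs, 2))  # [4.42,4.46,3.99,3.87]
```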

> your requests end up 30 to 40% smaller if you request a good amount of data

Is that comparing gzip'ed in both cases?
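
Something like the following would answer it for a given dataset. It is only a sketch: the random input is made up and is close to a worst case for gzip, so real (smoother) fields may behave quite differently, and I'm not claiming any particular result.

```python
import base64
import gzip
import json

import numpy as np


def payload_sizes(values: np.ndarray) -> dict:
    """Byte counts for JSON-number vs base64 encodings, raw and gzip'ed."""
    arr = values.astype("<f4")
    as_numbers = ("[" + ",".join(str(v) for v in arr) + "]").encode("ascii")
    b64 = base64.b64encode(arr.tobytes()).decode("ascii")
    as_base64 = json.dumps(b64).encode("ascii")
    return {
        "json": len(as_numbers),
        "json_gzip": len(gzip.compress(as_numbers)),
        "base64": len(as_base64),
        "base64_gzip": len(gzip.compress(as_base64)),
    }


rng = np.random.default_rng(0)
sizes = payload_sizes(rng.uniform(0.0, 10.0, size=10_000))  # made-up data
for name, size in sizes.items():
    print(f"{name:12s} {size}")
```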

> I've noticed netcdfs tend to be passed around internally between APIs, I think this change might remove the need to do this, and remove the HDF5 / NetCDF requirements for some internal clients

So the main advantage would be allowing (some) clients or services to get rid of those library requirements?

> you can stream into numpy arrays directly:

Do you mean "stream" as in actual piece-by-piece processing before the entire CF-JSON object has been transferred? That would require some additional restrictions, like having all the metadata first in both the top-level JSON object as well as the individual variables, right?

> we would need to describe the type to decode to in the response

Yup, makes sense. Would be cool if we can stay compatible with NCO-JSON with that, I guess one of their "levels of pedanticness" :)

@gregchalmers
Author

gregchalmers commented May 11, 2022

Yeah, to make it backwards-compatible I would use the Accept HTTP header: if the client supplies vnd.metocean.cf-b64+json in Accept, the server replies using base64; otherwise it keeps the existing JSON format.

For base64 binary / CF-JSON:

Client sends -> Accept: application/vnd.metocean.cf-b64+json
Server replies with -> Content-Type: application/vnd.metocean.cf-b64+json

For normal CF-JSON:

Client sends -> Accept: application/vnd.metocean.cf+json
Server replies with -> Content-Type: application/vnd.metocean.cf+json

OR

Client sends -> Accept: application/json
Server replies with -> Content-Type: application/json

Client could also use:
Accept: application/vnd.metocean.cf-b64+json;q=0.9, application/vnd.metocean.cf+json;q=0.8
meaning it prefers CF-B64. This option assumes the existing services handle weights in the Accept header.
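
If a service doesn't handle weights yet, the server-side choice could look roughly like this. It's a simplified sketch, not a full implementation of HTTP content negotiation: it ignores wildcards such as `*/*`, and the function names are made up.

```python
CF_B64 = "application/vnd.metocean.cf-b64+json"
CF_JSON = "application/vnd.metocean.cf+json"
PLAIN_JSON = "application/json"
SUPPORTED = (CF_B64, CF_JSON, PLAIN_JSON)


def parse_accept(header: str) -> list:
    """Return (media_type, q) pairs, highest weight first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full weight
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so equal weights keep their order in the header.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def choose_content_type(header, default=PLAIN_JSON):
    """Pick the best supported type, falling back to the existing JSON format."""
    for media_type, q in parse_accept(header or ""):
        if q > 0 and media_type in SUPPORTED:
            return media_type
    return default


header = f"{CF_B64};q=0.9, {CF_JSON};q=0.8"
print(choose_content_type(header))  # application/vnd.metocean.cf-b64+json
```

Clients that send no Accept header, or only types the server doesn't know, get the existing JSON format, so nothing changes for them.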
