Deserializing into spec.Swagger is almost 20x slower than deserializing into map[string]interface{} #315
Comments
/assign @alexzielenski |
I will try out Jordan's idea with your benchmark for spec.Swagger and post the results back here |
Can you please attach the api/openapi-spec/swagger.json file to the issue? Or link to it? |
protobuf has a nice JSON streaming API, albeit hidden in an internal package .... maybe we should benchmark & consider using (a copy of) that? |
Tried an experiment with Jordan's suggestion. Here is a comparison of my changes along with Antoine's benchmark: master...alexzielenski:kube-openapi:80e24238cea362c6977be5ab3b0bf025e5a6d9e9 On my machine, it takes 72ms to unmarshal JSON to spec.Swagger via
|
that seems way faster... how does that compare on your machine to the unmarshal times for:
|
on my same machine
423ms to |
Yeah, I get about the same thing
We're still 3x slower, but that's much better; we can probably live with that. |
FWIW, a more precise, less noisy way to present and compare benchmark results is: go install golang.org/x/perf/benchstat@latest |
I got nerd-sniped by @rsc into looking at this. The behavior of JSON unmarshaling in Swagger is even worse than O(n^2) since it There's an experimental JSON module that permits custom types to unmarshal themselves in a truly streaming manner. With the experimental JSON API, we can unmarshal
EDIT: I realized that I modified the wrong module. Here's the same thing applied to |
@justinsb regarding #315 (comment): The A major challenge with |
Posting with memory allocations since that's also super critical in k8s:
> benchstat bench_v1.txt bench_v2.txt
name old time/op new time/op delta
UnmarshalJSON/Swagger-8 538ms ± 8% 38ms ±25% -92.93% (p=0.000 n=10+10)
UnmarshalJSON/Interface-8 33.5ms ±13% 25.3ms ± 9% -24.37% (p=0.000 n=10+8)
name old alloc/op new alloc/op delta
UnmarshalJSON/Swagger-8 88.5MB ± 0% 21.7MB ± 0% -75.53% (p=0.000 n=9+10)
UnmarshalJSON/Interface-8 10.6MB ± 0% 10.0MB ± 0% -5.72% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
UnmarshalJSON/Swagger-8 1.25M ± 0% 0.11M ± 0% -91.49% (p=0.000 n=10+10)
UnmarshalJSON/Interface-8 164k ± 0% 140k ± 0% -14.38% (p=0.000 n=10+10) |
It'd be interesting to figure out how many allocations are needed at minimum to allocate the entire |
I think we can close this now? |
/close |
@apelisse: Closing this issue. In response to this:
Running the following benchmark on my laptop:
Yields the following results:
For readability purposes, that's 529ms and 29ms respectively.
For context, this is about spec.Swagger, the OpenAPI v2 definition which is mostly a clone of go-openapi.
After a short investigation, the problem seems fairly obvious: the arbitrary vendor extensions (as defined by OpenAPI) force the JSON to be deserialized multiple times, at many different levels within the object, causing the deserialization into spec.Swagger to reach O(n²) complexity (my maths is probably dubious).

Vendor extensions can appear at many different layers in the OpenAPI object, e.g. in:
spec.Swagger
spec.Header
spec.Paths
spec.Operations
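To make the repeated decoding concrete, here is a minimal sketch of the pattern described above. The Node type, the bytesDecoded counter and the decodeCounting helper are hypothetical and exist only for illustration; they are not the kube-openapi code. The sketch assumes a custom UnmarshalJSON that decodes its bytes once for the known fields and once more to collect the "x-" keys, so nested objects get decoded again at every level.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// bytesDecoded accumulates the size of the buffers decoded by the two
// whole-object passes in the custom unmarshaler below.
var bytesDecoded int

// Node is a hypothetical, simplified stand-in for types such as spec.Swagger
// or spec.Operation: a few known fields plus arbitrary "x-" extensions.
type Node struct {
	Name       string                 `json:"name"`
	Child      *Node                  `json:"child,omitempty"`
	Extensions map[string]interface{} `json:"-"`
}

// UnmarshalJSON only receives raw bytes, so it decodes them twice.
func (n *Node) UnmarshalJSON(data []byte) error {
	// Pass 1: known fields. The alias type has no methods, which avoids
	// infinite recursion, but Child is still a *Node and recurses into here.
	type plain Node
	bytesDecoded += len(data)
	if err := json.Unmarshal(data, (*plain)(n)); err != nil {
		return err
	}
	// Pass 2: the same bytes again, this time looking for "x-" keys.
	bytesDecoded += len(data)
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	for key, value := range raw {
		if !strings.HasPrefix(key, "x-") {
			continue
		}
		var v interface{}
		if err := json.Unmarshal(value, &v); err != nil {
			return err
		}
		if n.Extensions == nil {
			n.Extensions = map[string]interface{}{}
		}
		n.Extensions[key] = v
	}
	return nil
}

// decodeCounting unmarshals input and reports how many bytes the two passes
// decoded in total, across all nesting levels.
func decodeCounting(input string) (*Node, int, error) {
	bytesDecoded = 0
	n := &Node{}
	err := json.Unmarshal([]byte(input), n)
	return n, bytesDecoded, err
}

func main() {
	input := `{"name":"root","x-owner":"team-a","child":{"name":"leaf","x-level":2}}`
	n, decoded, err := decodeCounting(input)
	if err != nil {
		panic(err)
	}
	fmt.Println(n.Name, n.Child.Name, n.Extensions, n.Child.Extensions)
	fmt.Printf("input is %d bytes, custom unmarshalers decoded %d bytes\n", len(input), decoded)
}
```

Each extra nesting level adds two more passes over that level's subtree, which is where the super-linear growth comes from. The counter ignores the scanning that encoding/json performs on its own before calling UnmarshalJSON, so it understates the real work.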
The problem, or lack of good solutions, comes from the rigid API (UnmarshalJSON(data []byte) error) that forces the custom unmarshaler to receive a byte slice rather than an already decoded or intermediate format. Deserialization methods that use more flexible APIs, like the YAML v3 parser (UnmarshalYAML(value *yaml.Node) error), do not suffer from the same problem, as highlighted in #279 from @alexzielenski.

This bug, which was improperly understood until now, has had various consequences on the entire Kubernetes ecosystem for the last 5 years:
spec.Swagger was unacceptably slow for frequently invoked command-line tools, so kubectl decided to use gnostic/protobuf even though the gnostic type is grossly unusable.
[...] gnostic into spec.Swagger efficiently, but the Swagger to gnostic would also be needed, as well as an OpenAPI v3 version.

Much of this was noticed by customers, users and Kubernetes providers, as the evidence can show:
For now, the solution discussed with @liggitt is to create a new UnmarshalUnstructured(interface{}) error interface that could replace the slow UnmarshalJSON interface, maybe like the following:

And FromUnstructured would automatically call the UnmarshalUnstructured methods when available. One drawback is that it forces it to deserialize into a map[string]interface{} first and then copy, which is possibly slower than deserializing into the object directly.

A remark for the end: the exact same problem also applies to serialization/marshaling, though it is less critical.
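As a rough illustration of the UnmarshalUnstructured idea discussed above, here is a minimal sketch. Everything in it is an assumption made for illustration: the UnstructuredUnmarshaler interface name, the simplified Doc and Info types, and the decodeOnce helper (which only stands in for the role FromUnstructured would play) are hypothetical and are not the API that was eventually adopted. The point is that the bytes are parsed a single time into a generic tree, and each type then picks its fields and "x-" extensions out of the already decoded values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// UnstructuredUnmarshaler is a hypothetical interface modelled on the
// UnmarshalUnstructured(interface{}) error signature proposed in the issue.
type UnstructuredUnmarshaler interface {
	UnmarshalUnstructured(v interface{}) error
}

// Info and Doc are hypothetical, simplified stand-ins for the spec types.
type Info struct {
	Title      string
	Extensions map[string]interface{}
}

type Doc struct {
	Swagger    string
	Info       Info
	Extensions map[string]interface{}
}

// addExtension stores key/value in *ext, allocating the map on first use.
func addExtension(ext *map[string]interface{}, key string, value interface{}) {
	if *ext == nil {
		*ext = map[string]interface{}{}
	}
	(*ext)[key] = value
}

// UnmarshalUnstructured reads Info from an already decoded JSON object.
func (i *Info) UnmarshalUnstructured(v interface{}) error {
	obj, ok := v.(map[string]interface{})
	if !ok {
		return fmt.Errorf("info: expected object, got %T", v)
	}
	for key, value := range obj {
		switch {
		case key == "title":
			s, ok := value.(string)
			if !ok {
				return fmt.Errorf("info.title: expected string, got %T", value)
			}
			i.Title = s
		case strings.HasPrefix(key, "x-"):
			addExtension(&i.Extensions, key, value)
		}
	}
	return nil
}

// UnmarshalUnstructured reads Doc from an already decoded JSON object and
// passes the nested "info" value down without re-parsing any bytes.
func (d *Doc) UnmarshalUnstructured(v interface{}) error {
	obj, ok := v.(map[string]interface{})
	if !ok {
		return fmt.Errorf("doc: expected object, got %T", v)
	}
	for key, value := range obj {
		switch {
		case key == "swagger":
			s, ok := value.(string)
			if !ok {
				return fmt.Errorf("swagger: expected string, got %T", value)
			}
			d.Swagger = s
		case key == "info":
			if err := d.Info.UnmarshalUnstructured(value); err != nil {
				return err
			}
		case strings.HasPrefix(key, "x-"):
			addExtension(&d.Extensions, key, value)
		}
	}
	return nil
}

// decodeOnce parses the bytes a single time, then hands the decoded tree to
// the target type.
func decodeOnce(data []byte, into UnstructuredUnmarshaler) error {
	var tree interface{}
	if err := json.Unmarshal(data, &tree); err != nil {
		return err
	}
	return into.UnmarshalUnstructured(tree)
}

func main() {
	var d Doc
	data := []byte(`{"swagger":"2.0","x-top":true,"info":{"title":"API","x-owner":"sig"}}`)
	if err := decodeOnce(data, &d); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", d)
}
```

This sketch also shows the drawback mentioned above: the whole document is first materialized as map[string]interface{} values and then copied into the typed structs.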