Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

union: add option to decoding into map with namespace #494

Closed
wants to merge 1 commit into from

Conversation

rockwotj
Copy link
Contributor

This patch adds an option to allow decoding generic data into a map
type so that the native decoded output matches the avro spec for union
JSON encoding. This is useful when handling generic/dynamic avro data and
allows us to drop a dependency on linkedin/goavro.

Copy link
Member

@nrwiersma nrwiersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to understand the desired behaviour. It seems in most cases you are looking for the generic behaviour, but in specific cases you want it to return say a map[string]int which is not generic?

@rockwotj
Copy link
Contributor Author

I basically want it to be in this format: https://avro.apache.org/docs/1.11.1/specification/#json-encoding

Which means a schema like:

[null, "string", "int"]

Is decoded, then marshaled into JSON to be:

null
{"string": "foo"}
{"int": 42}

Where the current main branch only supports decoding this as:

null
"foo"
42

Which means that if you Unmarshal an avro type using any its not possible to get it into the official JSON encoding for avro using json.Marshal, even with post processing the native golang structure (because it's not clear when a map vs record is used for example).

@rockwotj
Copy link
Contributor Author

Let me remove the avro.Register stuff from the tests if that is confusing, we wouldn't do any of that, and it thats there mostly because of copy/paste of the tests

@nrwiersma
Copy link
Member

Which means that if you Unmarshal an avro type using any its not possible to get it into the official JSON encoding for avro using json.Marshal

This part of it I would consider a bug, except in the case that a union is directly decoded into an any. IMO you should not need the additional config. I was looking at a fix for it this morning, that does not require the config, but fixes decoding into an any if it is part of a larger map[string]any or []any.

@rockwotj
Copy link
Contributor Author

Well it should happen for "top level" types too (not part of a larger map[string]any or []any, otherwise it wouldn't match the spec.

FWIW an older version of the library actually did this right (we were on 2.22 and tried updating to 2.27 which broke our tests). It's possible to get the "non map" version of the union via post processing because we have an option to toggle between union encoding mode, our current logic is here:

https://github.com/redpanda-data/connect/blob/8638c319e8b305aade04ab81deb29f5bf09b0205/internal/impl/confluent/serde_hamba_avro.go#L112

This patch adds an option to allow decoding generic data into a `map`
type so that the native decoded output matches the avro spec for union
JSON encoding. This is useful when handling generic/dynamic avro data and
allows us to drop a dependency on linkedin/goavro.
@rockwotj
Copy link
Contributor Author

I updated the tests to drop the avro.Register call. I'm happy to just make this the default behavior too if you prefer that and drop the option (might be a breaking change for some people?).

@rockwotj
Copy link
Contributor Author

Also with respect to this being an option or not, it seems some people don't want the spec JSON encoding: #386

This is why I assumed an option would be desired.

@nrwiersma
Copy link
Member

Honestly, at some point I need to cut a new major to get behaviour to be defined at a point, but have not had the time for such a big push.

The old behaviour is still available in reader.ReadNext, but does not do any kind of schema composition. This could be an option in your case.

As for this PR, you are likely correct with adding the option, I had obviously forgotten the old issues. I think the last remaining issue is that a generic decode can return something like a map[string]int instead of a map[string]any which is a little weird behaviour wise. A way around this is to use newEfaceDecoder instead of the resolved decoder.

@rockwotj
Copy link
Contributor Author

The old behaviour is still available in reader.ReadNext, but does not do any kind of schema composition. This could be an option in your case.

nice! thank you for this, this helps us upgrade and I believe is faster too (less reflection): redpanda-data/connect#3154

As for this PR, you are likely correct with adding the option, I had obviously forgotten the old issues. I think the last remaining issue is that a generic decode can return something like a map[string]int instead of a map[string]any which is a little weird behaviour wise. A way around this is to use newEfaceDecoder instead of the resolved decoder.

I am happy to do this, however I've realized if you pass in a struct type to decode and supply this option, then there will be panics from trying to assign a map to a struct type... How would you feel about just closing this PR until you have time for the new major and getting the behavior here well defined and buttoned up? I'm happy with the reader workaround for now.

@nrwiersma
Copy link
Member

That is fine by me. Appreciate the effort.

@rockwotj rockwotj closed this Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants