Deserializing Avro from Google BigQuery (and Logical Type: Datetime) #69

Open
jpsgamboa opened this issue Dec 17, 2021 · 3 comments
Labels
learning How to use the applicaiton?

Comments

@jpsgamboa

jpsgamboa commented Dec 17, 2021

I'm having trouble deserializing data stored in BigQuery.

For context, BigQuery returns two elements:

  • AvroSchema: The schema in JSON
  • AvroRows: ByteString with multiple rows

I'm trying the following method:

var result = AvroConvert.DeserializeHeadless<MyEvent>(item.AvroRows.SerializedBinaryRows.ToByteArray(), readSession.AvroSchema.Schema);

But this only returns one instance of MyEvent. How should I parse a payload that should contain many rows?

Additionally, BigQuery returns datetime fields with the logical type "datetime", which AvroConvert rejects with:
System.Runtime.Serialization.SerializationException: 'Unknown LogicalType schema :'datetime'.'

Is there a way to extend AvroConvert to accept this logical type?
I'm not too familiar with Avro and unsure if this is too specific to Google's specification, or something that could be included in AvroConvert.

I'm not sure whether there is an issue here at all or whether I'm just misunderstanding Avro and its ecosystem. I opened an issue on Google's repo too, but I thought it could be relevant here as well!

@AdrianStrugala AdrianStrugala added the issue This doesn't seem right label Dec 17, 2021
@AdrianStrugala
Owner

Hello,

  1. A collection of MyEvents can be deserialized by using:
var result = AvroConvert.DeserializeHeadless<List<MyEvent>>(item.AvroRows.SerializedBinaryRows.ToByteArray(), readSession.AvroSchema.Schema);
  2. I will take a look at the datetime logical type. Could you attach a sample file and schema? It will help me a lot with debugging.

Thanks,
Adrian
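
For background on why a single-record read stops after the first MyEvent: in the Avro binary encoding, headless records carry no row count or delimiters, so several rows in one buffer are simply laid out back to back and a reader has to keep decoding until the bytes run out. The following is a minimal sketch of that loop using only the .NET standard library, not AvroConvert's implementation. The two-field record (a long followed by a string) and the HeadlessRows class are hypothetical stand-ins for a real schema.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class HeadlessRows
{
    // Avro long: variable-length, zigzag-encoded integer.
    public static long ReadLong(Stream s)
    {
        ulong raw = 0;
        int shift = 0;
        while (true)
        {
            int b = s.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            raw |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;   // high bit clear: last byte of the varint
            shift += 7;
        }
        return (long)(raw >> 1) ^ -(long)(raw & 1);   // undo zigzag
    }

    // Avro string: a long byte count followed by that many UTF-8 bytes.
    public static string ReadString(Stream s)
    {
        int length = checked((int)ReadLong(s));
        var buffer = new byte[length];
        int read = 0;
        while (read < length)
        {
            int n = s.Read(buffer, read, length - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        return Encoding.UTF8.GetString(buffer);
    }

    // Hypothetical schema: record { long id; string name; }, rows concatenated.
    public static List<(long Id, string Name)> ReadAll(byte[] rows)
    {
        var result = new List<(long Id, string Name)>();
        using var stream = new MemoryStream(rows);
        while (stream.Position < stream.Length)   // no count prefix: read to the end
        {
            long id = ReadLong(stream);
            string name = ReadString(stream);
            result.Add((id, name));
        }
        return result;
    }
}
```

Calling HeadlessRows.ReadAll on a buffer holding two encoded records returns two tuples; a reader that decodes exactly one record would stop after the first and leave the rest of the buffer unread.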

@jpsgamboa
Author

Hey Adrian,

Please find below a zip with two files:

  • AvroSchema.json: As returned by readSession.AvroSchema.Schema
  • AvroRows: Bytes returned by item.AvroRows.SerializedBinaryRows.ToByteArray()

AvroSample.zip

I also exported an Avro file directly from the Google Cloud console, which may also help:
bq-sample.zip

Thank you!

@AdrianStrugala
Owner

Hello,

I've spotted several issues with your files. First of all, there is no "datetime" logical type in the Avro documentation; a datetime is usually represented as the timestamp-millis logical type or simply as a string. But I wasn't able to deserialize your AvroRows anyway, because there are more issues in the AvroModel.
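
One possible workaround for the unknown logical type, offered as an untested sketch rather than anything confirmed in this thread: the Avro specification says readers should ignore logical types they do not recognize and fall back to the underlying type, so the annotation can be stripped from the schema JSON before it is handed to a deserializer. The SchemaCleaner class and its method name are hypothetical, and the code uses only System.Text.Json. Whether AvroConvert can then read the BigQuery rows is not verified here.

```csharp
using System;
using System.Linq;
using System.Text.Json.Nodes;

public static class SchemaCleaner
{
    // Returns the schema with every "logicalType": "datetime" annotation removed,
    // leaving only the underlying Avro type on those fields.
    public static string StripDatetimeLogicalType(string schemaJson)
    {
        JsonNode? root = JsonNode.Parse(schemaJson);
        Visit(root);
        return root!.ToJsonString();
    }

    private static void Visit(JsonNode? node)
    {
        switch (node)
        {
            case JsonObject obj:
                if (obj["logicalType"] is JsonValue v
                    && v.TryGetValue<string>(out var name)
                    && name == "datetime")
                {
                    obj.Remove("logicalType");
                }
                // Snapshot the children so the walk never enumerates a live collection.
                foreach (var child in obj.Select(p => p.Value).ToList())
                    Visit(child);
                break;

            case JsonArray arr:
                foreach (var item in arr.ToList())
                    Visit(item);
                break;
        }
    }
}
```

Usage would look like var cleaned = SchemaCleaner.StripDatetimeLogicalType(readSession.AvroSchema.Schema); with cleaned passed in place of the original schema string. Other logical types are left untouched.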

But don't worry, bq-sample works perfectly fine. Here is a short tutorial on how to deal with Avro data:

  1. Get the schema of the data:
var avroBytes = File.ReadAllBytes("bq-sample");
var schema = AvroConvert.GetSchema(avroBytes);

That gives you the schema in JSON format.

  2. Generate the C# model:
    You can use https://avroconvertonline.azurewebsites.net/ for convenience.

  3. Deserialize the data:

var avroBytes = File.ReadAllBytes("bq-sample");
var result = AvroConvert.Deserialize<List<Root>>(avroBytes);

This produces a list of 1000 Root items.
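
If the generated model ends up exposing a datetime column as a plain string (one of the representations mentioned above), the value can be converted after deserialization. This is a small sketch under that assumption: the BigQueryDatetime helper is hypothetical, and the ISO-8601-like text format is assumed rather than taken from the sample files.

```csharp
using System;
using System.Globalization;

public static class BigQueryDatetime
{
    // Parses a timezone-less datetime string. The result has Kind == Unspecified
    // because the text carries no offset.
    public static DateTime Parse(string text) =>
        DateTime.Parse(text, CultureInfo.InvariantCulture, DateTimeStyles.None);
}
```

For example, BigQueryDatetime.Parse(row.CreatedAt) could be applied to each item in the list, where CreatedAt is a placeholder property name.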

Hope it helps,
Adrian

@AdrianStrugala AdrianStrugala added learning How to use the applicaiton? and removed issue This doesn't seem right labels Dec 20, 2021
2 participants