MessageToJson outputs the wrong type for uint64 and int64 in Python #2954

mortada · 2017-04-07T21:52:29Z

Here's a very simple proto file:

$ cat test.proto
syntax = "proto3";

message Message {
    uint64 foo1 = 1;
    uint32 foo2 = 2;
    float  foo3 = 3;
    double foo4 = 4;
    bool   foo5 = 5;
    int64  foo6 = 6;
    int32  foo7 = 7;
}

and I created a protobuf message in Python as follows:

In [1]: import test_pb2
In [2]: message = test_pb2.Message()

In [3]: message.foo1 = 11

In [4]: message.foo2 = 22

In [5]: message.foo3 = 33

In [6]: message.foo4 = 44

In [7]: message.foo5 = True

In [8]: message.foo6 = 66

In [9]: message.foo7 = 77

In [11]: message
Out[11]:
foo1: 11
foo2: 22
foo3: 33
foo4: 44
foo5: true
foo6: 66
foo7: 77

In [12]: from google.protobuf.json_format import MessageToJson

In [13]: print(MessageToJson(message))
{
  "foo6": "66",
  "foo4": 44,
  "foo7": 77,
  "foo5": true,
  "foo2": 22,
  "foo1": "11",
  "foo3": 33
}

Note that foo6 and foo1 should not have quotes, as they are integers and not strings. The protobuf_to_dict module does not have this problem:

In [15]: from protobuf_to_dict import protobuf_to_dict

In [16]: protobuf_to_dict(message)
Out[16]:
{'foo1': 11,
 'foo2': 22,
 'foo3': 33.0,
 'foo4': 44.0,
 'foo5': True,
 'foo6': 66,
 'foo7': 77}

In [17]: import json

In [18]: json.dumps(protobuf_to_dict(message))
Out[18]: '{"foo6": 66, "foo4": 44.0, "foo7": 77, "foo5": true, "foo2": 22, "foo1": 11, "foo3": 33.0}'

My protoc version is 3.2.0 and python version is 3.5.2

$ protoc --version
libprotoc 3.2.0

$ python --version
Python 3.5.2 :: Anaconda custom (x86_64)

xfxyjwf · 2017-04-07T23:01:43Z

As per proto3 JSON spec, uint64/int64 fields should be printed as decimal strings. See:
https://developers.google.com/protocol-buffers/docs/proto3#json

The reason is that uint64/int64 is not part of JSON spec and many JSON libraries only support double precision. To prevent precision loss our proto3 JSON spec requires serializers to put int64/uint64 values in strings.
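
The precision loss described above can be reproduced with the standard library alone. This is a small sketch, not part of protobuf: `parse_int=float` is used here only to imitate a JSON parser that stores every number as a double, and the value 2**53 + 1 is chosen because it is the first integer a double cannot represent.

```python
import json

big = 2**53 + 1  # fits in int64/uint64, but not exactly in a double

# A parser that keeps numbers as doubles silently changes the value.
as_double = json.loads(json.dumps(big), parse_int=float)
lossy = int(as_double)

# Sending the value as a decimal string survives the same parser.
encoded = json.dumps({"foo6": str(big)})
exact = int(json.loads(encoded, parse_int=float)["foo6"])

print(big, lossy, exact)
```

The bare number comes back as 2**53, while the string-encoded value is recovered exactly.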

kirpit · 2017-11-19T23:56:03Z

I wanted to ask the same question about MessageToDict here before creating another issue.

I respect that MessageToJson converts int64 and uint64 into strings for some reason, although it is hard to understand why a long/integer value (with no precision involved?) should be translated into a string when the JSON spec places no limit on integers.

Regardless of this unexpected behavior, I am assuming from the name alone that MessageToDict has nothing to do with the JSON spec, and that all it is expected to do is take a message object and give back a Python dictionary without touching its data types. Unfortunately, its behavior is the same as MessageToJson, which makes these utility functions impossible to use for those of us who deal with int64 data types.

The question is, should MessageToDict return these values in Python dictionary without touching their data types?

xfxyjwf · 2017-11-20T08:47:41Z

Isn't MessageToDict part of the json_format package? It should honor the same proto3 JSON spec as MessageToJson.

WloHu · 2019-03-14T11:16:50Z

It's all nice that the json_format package follows the spec, but it's laughable that I can't do this:

m = test_pb2.Message.FromString(serialized_proto_bytes)
d = MessageToDict(m)
copy = test_pb2.Message(**d)

because of type inconsistency.
I know I'm using the wrong package for this, but if you went that far to produce a Python dict that is JSON-spec compatible, and you allow creating message instances out of unpacked nested dict structures, then why do we have to use another Python package to add the missing link for proper dict serialization?

proximous · 2019-10-14T21:06:33Z

I'll add my vote to @WloHu's point that the code fragment above should work (i.e. the process should be reversible), even if we have to specify some optional argument like:
d = MessageToDict(m, preserve_int64_as_int=True)
Is something like that open for consideration or is there an existing recommended solution to this?
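
For the round trip asked about above, one option that stays inside json_format is ParseDict, which accepts the string-encoded 64-bit values that MessageToDict emits. A minimal sketch, assuming the protobuf package is installed; descriptor_pb2.UninterpretedOption (which ships with protobuf and has a uint64 and an int64 field) is used only as a stand-in for test_pb2.Message:

```python
from google.protobuf import descriptor_pb2, json_format

# Stand-in message with a uint64 field and an int64 field.
m = descriptor_pb2.UninterpretedOption(
    positive_int_value=11,
    negative_int_value=-66,
)

# 64-bit integers come out as decimal strings, per the proto3 JSON mapping.
d = json_format.MessageToDict(m)

# ParseDict understands that mapping, so the strings become integers again.
restored = json_format.ParseDict(d, descriptor_pb2.UninterpretedOption())

print(d)
print(restored == m)
```

This does not make `Message(**d)` work, since the constructor still expects real integers, but it does make the dict form reversible without a second package.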

…SON (#5010)

* Fix an issue with the timestamps in the endpoint response

Signed-off-by: Chenran Li <[email protected]>

According to issue #4037, the returned JSON of the endpoints has `creation_timestamp` and `last_updated_timestamp` as strings, not numbers. It's different from what was documented in the [official doc](https://www.mlflow.org/docs/latest/rest-api.html#mlflowregisteredmodel).

The reason is we are calling Google's `MessageToJson` API to convert protobuf to json, which implicitly converts int64/fixed64/uint64 fields to strings. And they claimed it's a feature not a bug (see the [discussion](protocolbuffers/protobuf#2954 (comment))).

According to the bug reporter, this bug doesn't exist in Azure ML mlflow server (which is essentially our Databricks mlflow server). That's because we are using ScalaPB's `ToJson()` API for all the Databricks endpoints, and it doesn't convert int64 to string.

There is no way to let `MessageToJson` API not convert int64 to strings. Nor are there any other good Python proto-to-json libraries. So to fix this bug, we have to choose from:
* (what I'm doing in this PR) manually converting the int64/uint64/fixed64 fields back to numbers after calling `MessageToJson`
* (too risky so I chose not to do) writing our customized `MessageToJson` API
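
The first option in the commit message above (converting the 64-bit fields back to numbers after the fact) could look roughly like the following. This is an illustrative sketch, not the code from that PR: `INT64_FIELDS`, `restore_int64`, and the sample JSON text are hypothetical, and it assumes the JSON keeps the proto field names and that the set of 64-bit field names is known in advance.

```python
import json

# Hypothetical: names of fields declared as int64/uint64/fixed64 in the .proto.
INT64_FIELDS = {"creation_timestamp", "last_updated_timestamp"}


def restore_int64(obj):
    """json object_hook: turn decimal strings under known 64-bit fields into ints."""
    for name in obj.keys() & INT64_FIELDS:
        if isinstance(obj[name], str):
            obj[name] = int(obj[name])
    return obj


# Stand-in for the text MessageToJson would return.
json_text = (
    '{"registered_model": {"name": "m", '
    '"creation_timestamp": "1636000000000", '
    '"last_updated_timestamp": "1636000000123"}}'
)

fixed = json.loads(json_text, object_hook=restore_int64)
print(json.dumps(fixed))
```

Because `object_hook` is called for every decoded JSON object, nested messages are handled without an explicit recursive walk.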

sany2k8 · 2022-06-02T12:32:40Z

In my case MessageToDict turns units into a string because of its int64 type (google.type.Money). I processed it manually; is there any better solution to this issue? Actually, I don't want to use this workaround. I was looking for something like d = MessageToDict(m, preserve_int64_as_int=True).

data = {"roomNights":[{"night":{"year":2022,"month":6,"day":2},"price":{"currencyCode":"USD","units":"100","nanos":10},"tax":{"currencyCode":"USD","units":35,"nanos":10}}],"fees":[{"title":"State Tax","price":{"currencyCode":"USD","units":"8","nanos":10}},{"title":"Occupancy Tax","price":{"currencyCode":"USD","units":"2","nanos":10}},{"title":"Resort Fees","price":{"currencyCode":"USD","units":"25","nanos":10}}],"totalCost":{"currencyCode":"USD","units":"135","nanos":10},"totalFees":{"currencyCode":"USD","units":"35","nanos":10},"totalBeforeFees":{"currencyCode":"USD","units":"100","nanos":10}}

def cast_units_string_to_int(cost_data, key: str):
    """Recursively cast every value stored under `key` to int, in place."""
    if isinstance(cost_data, dict):
        for k, v in cost_data.items():
            if isinstance(v, (dict, list)):
                # Descend into nested messages and repeated fields.
                cast_units_string_to_int(v, key)
            elif k == key:
                cost_data[k] = int(v)
    elif isinstance(cost_data, list):
        for item in cost_data:
            cast_units_string_to_int(item, key)
    return cost_data

print(data)
updated_units = cast_units_string_to_int(data, "units")
print(updated_units)

Sschumac · 2023-05-01T16:14:48Z

Even if it is part of the json_format package, calling MessageToDict does not format anything to JSON. So why on earth keep this JSON convention as the default behavior?

xfxyjwf closed this as completed Apr 7, 2017

HughKu mentioned this issue Nov 30, 2019

python json_format.MessageToDict() instead of converting the uint64 field to int, it is converted to str #6933

Closed

This was referenced Sep 8, 2020

GRPC python client get_model_config triton-inference-server/server#1971

Closed

Document int64->string for as_json=true results in grpc client triton-inference-server/server#1997

Merged

bwanglzu mentioned this issue Jun 15, 2021

feat: unify the fields for sparse and dense array jina-ai/serve#2578

Merged

lichenran1234 mentioned this issue Nov 4, 2021

Fix a bug that converts int64 to string when converting Protobuf to JSON mlflow/mlflow#5010

Merged

Tingaev mentioned this issue Mar 17, 2023

MessageToDict BUG #12252

Closed

Sauci pushed a commit to Sauci/a2l-grpc that referenced this issue Jun 13, 2023

fix dump issue protocolbuffers/protobuf#2954

12dc699

Sauci pushed a commit to Sauci/a2l-grpc that referenced this issue Jun 13, 2023

fix dump issue protocolbuffers/protobuf#2954

6250c3a
