Performance #228
Comments
Just some observations from benchmarking some code dumping 1000 simple objects, resulting in ~3.4 MB of JSON:

```python
from dataclasses import dataclass
from dataclasses_json import dataclass_json

TESTSTR = "x" * 3200  # placeholder payload; the original test string isn't shown in the issue

@dataclass_json
@dataclass
class Test:
    id: int
    value: str
    second: str

testvalue = [Test(i, TESTSTR, TESTSTR[0:200]) for i in range(1000)]
testvalue2 = [dict(id=i, value=TESTSTR, second=TESTSTR[0:200]) for i in range(1000)]
```

I've created a number of methods to dump these lists of objects:

```python
import json
import ujson

def callDCJS():
    len(Test.schema().dumps(testvalue, many=True))

def callJS():
    len(ujson.dumps(testvalue2))

def callJS2():
    len(json.dumps(testvalue2))

def callDCJS2(schema=Test.schema()):
    len(schema.dumps(testvalue, many=True))

def callDCJS3(schema=Test.schema()):
    len(ujson.dumps(schema.dump(testvalue, many=True)))
```

As you can see, the callJS functions are the ones that dump the native list of Python dictionaries, while the DCJS ones use dataclasses_json. And the astounding numbers suggest to me that dataclasses_json (marshmallow? Not sure if it uses it under the hood, haven't looked at the code yet) has optimization prospects:
The first time is the minimum. As you can see, the best strategy you can currently use with dataclasses_json seems to be to use it to serialize to Python data structures, and then use the fastest JSON package you can find for your data. (ujson seems to be fast, beating the standard json module by a factor of 2, and the time differences between DCJS2/DCJS3 suggest that dataclasses_json uses the default json module.)
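For reference, a comparison like this can be reproduced with the standard library's timeit; this is only a sketch of one way to time the functions defined above, not the original benchmark code:

```python
import timeit

# Sketch of a benchmark harness for the functions defined above;
# the minimum of several repeats is the least noisy estimate.
for fn in (callDCJS, callJS, callJS2, callDCJS2, callDCJS3):
    best = min(timeit.repeat(fn, number=10, repeat=5))
    print(f"{fn.__name__}: {best:.3f}s for 10 dumps")
```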
Even when building the dict by accessing each element of the dataclass like ... Is there any timetable for improving performance?
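The exact hand-rolled comparison isn't shown above; a hypothetical version, reusing the earlier definitions, might look like this:

```python
# Hypothetical hand-rolled variant: read each field of the dataclass
# directly and serialize the resulting plain dicts with the stdlib json.
def callManualDict():
    len(json.dumps([{"id": t.id, "value": t.value, "second": t.second}
                    for t in testvalue]))
```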
Hi, it's interesting that you mention code generation. Edit: there is now a release on PyPI with to_dict and some configurable options.
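As a rough sketch of that route, pairing to_dict() with a fast encoder such as ujson (assuming the same testvalue list as above):

```python
# Sketch using the to_dict() method added by @dataclass_json,
# combined with ujson for the final encoding step.
def callToDict():
    len(ujson.dumps([t.to_dict() for t in testvalue]))
```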
Hi, in my use case it gives a speedup of ~5x.
@molaxx, have you implemented that performance fix for caching?
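The caching fix being asked about isn't shown in the thread; one way a memoized schema could look, with cached_schema as a hypothetical helper name, is:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_schema():
    # Build the marshmallow-backed schema once and reuse it,
    # instead of paying for Test.schema() on every call.
    return Test.schema()

def callCachedSchema():
    len(cached_schema().dumps(testvalue, many=True))
```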
Performance in general is on my radar as one of the things to tackle next as this library gains traction, and it is at the top of a 1.0 release checklist.
In general, after some thought, I don't think caching/memoization is the right way to tackle this. A few reasons why:
Instead, I think an approach involving code generation is the way to go -- similar to how the `dataclasses` core module itself is implemented. When you think about it, a schema is only generated once and known at "module-load time". In other languages we might call this "compile-time". We can see the code-generation approach utilized in codec schema libraries in other languages, be it `json` or even other data-interchange formats like `protobuf`.
Going this route, the schema now is loaded as just more code, so to speak, instead of living in memory.
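A minimal sketch of that idea, making no assumptions about the library's internals: generate a specialized to-dict function from the known fields once, the same exec-based trick `dataclasses` uses to build `__init__` (make_fast_to_dict and fast_to_dict are hypothetical names):

```python
import dataclasses

def make_fast_to_dict(cls):
    # Emit source code specialized to the fields of `cls`, then exec it once;
    # the generated function has no per-call reflection overhead.
    names = [f.name for f in dataclasses.fields(cls)]
    body = ", ".join(f"'{n}': obj.{n}" for n in names)
    src = f"def generated_to_dict(obj):\n    return {{{body}}}\n"
    namespace = {}
    exec(src, namespace)
    return namespace["generated_to_dict"]

fast_to_dict = make_fast_to_dict(Test)  # generated once, at "module-load time"
# json.dumps([fast_to_dict(t) for t in testvalue]) then skips the schema machinery
```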