
add caching, add tests #175

Merged (3 commits) Feb 23, 2020
Conversation

Germandrummer92 (Contributor) commented Feb 11, 2020

id should work for caching, as the three functions in question always receive a class as an argument, and a class always has the same identity. Tests are passing and I added some, too.
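The id-based approach works because `functools.lru_cache` keys on its arguments, and a class object hashes by identity, so the expensive per-class reflection runs only once per class. A minimal sketch of the idea (`cached_fields` and the call counter are illustrative, not the library's actual internals):

```python
from dataclasses import dataclass, fields
from functools import lru_cache

calls = {"n": 0}  # counts how often the function body actually runs


@lru_cache(maxsize=None)
def cached_fields(cls):
    # expensive per-class reflection, executed once per class object;
    # subsequent calls with the same class hit the cache
    calls["n"] += 1
    return tuple(f.name for f in fields(cls))


@dataclass
class Point:
    x: int
    y: int


for _ in range(1000):
    names = cached_fields(Point)  # cache hit after the first call

print(names, calls["n"])  # ('x', 'y') 1
```

Because every instance of a given dataclass shares the same class object, this turns per-instance reflection cost into a one-time cost per class.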

Load test with cache_size=1:

It took 39.030911922454834s to convert to dict
It took 30.857404708862305s to convert from dict

After introducing caching:

It took 41.651854038238525s to convert to dict
It took 16.02430510520935s to convert from dict

So about a 2x speedup when converting from dict.

I wrote a load test and profiled it to get these results. There is a big improvement, but as you can see in the graph, performance is still quite poor when encoding to a dict. I tried to improve the performance of as_dict as well but didn't find a good solution: a cache wouldn't really work there, because computing a consistent hash of the input would itself require something like json.dumps, which I think (I didn't try) would not improve performance. Find a screenshot of the call graph attached. If you have other ideas to fix it, that would be great!
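One concrete obstacle to caching per instance: although these dataclasses are frozen, the generated `__hash__` hashes a tuple of the field values, so any field holding a list or dict makes the whole instance unhashable, and it cannot even be used as an `lru_cache` key. A quick demonstration (`Nested` is an illustrative stand-in for the classes in the load test below):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Nested:
    items: List[int] = field(default_factory=list)


n = Nested(items=[1, 2, 3])
try:
    # frozen=True generates __hash__, but it hashes the field values,
    # and a list field is itself unhashable
    hash(n)
    print("hashable")
except TypeError:
    print("unhashable")
```

So instance-level caching would first need a content-based hash, which brings back the traversal cost the cache was meant to avoid.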

from random import randint
from typing import List, Dict, Any

from dataclasses_json import dataclass_json, DataClassJsonMixin
from time import time
from dataclasses import dataclass, field


@dataclass_json
@dataclass(frozen=True)
class AnotherNestedDataclass(DataClassJsonMixin):
    more_stuff: List[int] = field(default_factory=list)
    x: str = "a"
    y: str = "b"
    z: str = "c"
    dict_key: Dict[str, Any] = field(default_factory=dict)
    key: Any = "some_super_value"


@dataclass_json
@dataclass(frozen=True)
class NestedDataclass(DataClassJsonMixin):
    super_complicated: AnotherNestedDataclass = AnotherNestedDataclass()
    stuff: List[AnotherNestedDataclass] = field(default_factory=list)
    a: float = 0.0
    b: float = 0.0
    c: float = 0.0


@dataclass_json
@dataclass(frozen=True)
class BigDataclass(DataClassJsonMixin):
    nested: List[NestedDataclass] = field(default_factory=list)
    top_level_dict: Dict[str, Any] = field(default_factory=dict)


if __name__ == '__main__':
    to_parse: List[BigDataclass] = []
    for _ in range(1, 500):
        ints = [randint(1, 100)] * 50
        nested_dataclass = NestedDataclass(stuff=[AnotherNestedDataclass(more_stuff=ints)] * 50)
        to_parse.append(BigDataclass(nested=[nested_dataclass] * 50))

    start = time()
    dicts = [dc.to_dict() for dc in to_parse]
    end = time()
    print(f"It took {end - start}s to convert to dict")

    start = time()
    parsed = [BigDataclass.from_dict(d) for d in dicts]
    end = time()
    print(f"It took {end - start}s to convert from dict")

profile results after caching:

[call-graph screenshots attached in the original PR]

Germandrummer92 (Contributor, Author)

Edited the numbers above, as I messed up the load test before: converting from dict went from 30.857404708862305s to 16.02430510520935s.

RunOrVeith (Contributor)

You could try using https://deepdiff.readthedocs.io/en/latest/deephash.html to get content-based object hashes in to_dict, but I'm not sure about introducing the dependency, or whether it would actually improve the speed.
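DeepHash computes content-based hashes of arbitrary (including unhashable) objects, but the trade-off the thread worries about still applies: any content hash must traverse the whole object, which costs roughly as much as serializing it. A stdlib sketch of the same idea (`content_hash` is a hypothetical helper, not part of dataclasses_json or deepdiff):

```python
import hashlib
import json


def content_hash(obj):
    # stable, content-based cache key for unhashable containers;
    # note it serializes the whole object, so it costs about as much
    # as the to_dict work it would be trying to skip
    blob = json.dumps(obj, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


a = {"x": [1, 2], "y": {"z": 3}}
b = {"y": {"z": 3}, "x": [1, 2]}  # same content, different insertion order
print(content_hash(a) == content_hash(b))  # True
```

This would only pay off if the same object is converted many times, since the first conversion and the hash both walk the full structure.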

lidatong (Owner) left a comment

Hey @Germandrummer92, thanks for implementing this optimization. A 2x speedup is great. Performance is definitely on the radar as more people rely on this library. I'll need to deep-dive into it myself, but I have broad guesses at where the other bottlenecks might be.

Also just want to say thank you to both you and @RunOrVeith for your contributions.

@lidatong lidatong mentioned this pull request Feb 23, 2020
lidatong (Owner) commented Feb 23, 2020

I had to fix merge conflicts, so I moved off this fork into #182. Fixing the merge directly here.

@lidatong lidatong changed the base branch from master to feat/caching February 23, 2020 14:15
@lidatong lidatong merged commit 2208d4f into lidatong:feat/caching Feb 23, 2020
lidatong added a commit that referenced this pull request Mar 13, 2020