
High CPU usage after upgrade #445

Closed
sahu-sunil opened this issue Nov 16, 2023 · 9 comments · Fixed by #464
Labels
more-info-needed More information required.

@sahu-sunil
Contributor

  • cattrs version: 23.1.2
  • Python version: 3.9
  • Operating System: Linux

We use cattrs to validate test configuration in an asynchronous test framework; thousands of tests run every minute. We upgraded from 1.5.0 to 23.1.2, which changed how the cattrs backend works substantially. After the upgrade, CPU usage hits 100% within a couple of minutes. Profiling showed that the line cache (linecache) was the reason for the high CPU usage. Disabling the caching (along with detailed validation) via _cattrs_use_linecache=False helped a bit, but the CPU still reaches 100% after a couple of hours (3-4h).
We still see generate_unique_filename taking a lot of the time.
[profiler snapshot: generate_unique_filename near the top of the CPU profile]

Is there any other way to handle this in the latest cattrs version?

@Tinche
Member

Tinche commented Nov 16, 2023

Interesting. That's quite a large upgrade, though; a ton of things have changed. Are you using cattrs.Converter, cattrs.BaseConverter, or functions imported directly from cattrs (like cattrs.structure)?

You can likely resolve your issue by switching to cattrs.BaseConverter; that converter class does minimal code generation. It's somewhat slower at structuring and unstructuring, though. I'd be curious for more information!
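
For example (a minimal sketch; Point is an illustrative stand-in for your config classes, not from the original report): BaseConverter structures attrs classes through runtime dispatch rather than compiled per-class functions, so nothing should accumulate in linecache.

from attrs import define
from cattrs import BaseConverter

@define
class Point:  # illustrative class
    x: int
    y: int

converter = BaseConverter()
print(converter.structure({"x": 1, "y": 2}, Point))  # Point(x=1, y=2)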

@Tinche Tinche added the more-info-needed More information required. label Nov 16, 2023
@sahu-sunil
Contributor Author

I am using cattrs.Converter and then registering hooks like this:

from cattrs import Converter
from cattrs.gen import make_dict_structure_fn

converter = Converter(detailed_validation=False)
converter.register_structure_hook(
    ClassName,
    make_dict_structure_fn(
        ClassName,
        converter,
        _cattrs_use_linecache=False,
        _cattrs_detailed_validation=False,
    ),
)
converter.structure(....)

I will try BaseConverter for lighter initialisation, but AFAIK it will still go through the functions shown in the profiler snapshot above.

@Tinche
Member

Tinche commented Nov 19, 2023

BaseConverter won't help if you're using make_dict_structure_fn directly, since that's the source of the issue.

Well, the root issue is that you're probably recreating the same class a lot in your tests.

Let me give you some context. make_dict_structure_fn will compile a specialized function for structuring. It'll also store the source code of that function using Python's linecache module; this is necessary for good stack traces and for stepping through the function in a debugger.
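
To make that concrete, here's a minimal sketch (Item is an illustrative class; this assumes the 23.x defaults, where linecache storage is enabled): every call to make_dict_structure_fn compiles anew and adds another linecache entry.

import linecache

from attrs import define
from cattrs import Converter
from cattrs.gen import make_dict_structure_fn

@define
class Item:  # illustrative class
    x: int

c = Converter()
before = len(linecache.cache)
for _ in range(3):
    make_dict_structure_fn(Item, c)  # linecache storage is on by default
print(len(linecache.cache) - before)  # 3: one new entry per compilation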

But if you have, for example, a test that recreates the same class (or a converter) over and over again, you'll effectively create a leak, since all of this source code keeps getting stored, and generate_unique_filename is inefficient when dealing with a large number of classes with the same name (which is what you're seeing in the profile dump).

To me this indicates an inefficiency in your tests; any chance you could avoid generating so many converters, classes, and hooks? Even if I worked around this in cattrs, it's still kind of bad practice.
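
As a sketch of what that could look like (all names here are illustrative), build the converter once at module scope and reuse it across tests, so each hook is compiled exactly once:

from attrs import define
from cattrs import Converter

@define
class Config:  # illustrative class under test
    name: str
    retries: int

# Created once at import time; every test reuses it, so the structure hook
# for Config is compiled (and stored in linecache) only once.
CONVERTER = Converter(detailed_validation=False)

def test_structure_config():
    cfg = CONVERTER.structure({"name": "a", "retries": 3}, Config)
    assert cfg.retries == 3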

That said, _cattrs_use_linecache=False should fix the issue, so maybe it's a different class than the one you think (or a nested class). You can debug by setting a breakpoint and inspecting linecache.cache manually; it'll probably contain the same class name a ton of times:

import linecache

# linecache.cache maps each filename to a (size, mtime, lines, fullname)
# tuple; generated structuring functions show up here once per compilation.
print(linecache.cache)
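
A slightly more targeted check (hedged: the exact format of the generated filenames is a cattrs internal; duplicates get numeric suffixes) tallies the cache by base filename so a runaway class stands out:

import linecache
from collections import Counter

# Strip the trailing "-N" duplicate suffix to group entries per handler.
counts = Counter(fname.rsplit("-", 1)[0] for fname in linecache.cache)
print(counts.most_common(10))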

@sahu-sunil
Contributor Author

Yes, I tried to avoid recreating classes/converters in some places and it helped a bit, but there are still places where I can't get rid of them.

I was going through the make_dict_structure_fn function and wanted to clarify: should this call be inside the if condition?

# fname = generate_unique_filename(cl, "structure", reserve=_cattrs_use_linecache)
script = "\n".join(total_lines)
eval(compile(script, "", "exec"), globs)
if _cattrs_use_linecache:
    fname = generate_unique_filename(cl, "structure", reserve=_cattrs_use_linecache)
    linecache.cache[fname] = len(script), None, total_lines, fname

Do we need this call when linecache is disabled?

@Tinche
Member

Tinche commented Nov 22, 2023

Hm, you might be right. We can put that into the if block. If you want to put together a quick PR and the tests pass, I'll merge it in. (Make the PR against the v23.2 branch so we can release it quickly.)

It might not help, though. If you have classes that you can't control, they'll enter that if block regardless.

You're just recompiling the exact same handlers over and over again, right? So the source code will be identical every time. I'm thinking there's an optimization opportunity here: we could look for a matching script and reuse the same linecache entry for all identical handlers. Then the cache size would be constant instead of O(n).
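
A self-contained sketch of that idea (this is not the actual patch that became #464; the names are illustrative): key the cache on the generated source itself, so identical handlers share a single entry.

import linecache
from itertools import count

_counter = count()
_script_to_fname: dict[str, str] = {}  # illustrative registry: source -> filename

def cache_source(script: str, cls_name: str) -> str:
    # Reuse the existing linecache entry when this exact source was cached
    # before; the cache then grows per distinct handler, not per call.
    fname = _script_to_fname.get(script)
    if fname is None:
        fname = f"<generated structure {cls_name}-{next(_counter)}>"
        linecache.cache[fname] = (
            len(script),
            None,
            script.splitlines(True),
            fname,
        )
        _script_to_fname[script] = fname
    return fname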

@sahu-sunil
Contributor Author

Okay, I'm in the middle of something and won't be able to raise the PR today. Please see if you can; otherwise I will take a look later. Thanks!

@Tinche
Member

Tinche commented Nov 27, 2023

Try this branch: https://github.com/python-attrs/cattrs/tree/tin/linecache-opt

@Tinche Tinche linked a pull request Nov 27, 2023 that will close this issue
@sahu-sunil
Contributor Author

sahu-sunil commented Nov 30, 2023

Yes, it is working now. Thank you for the quick responses and optimisations!

@Tinche
Member

Tinche commented Nov 30, 2023

Cool, will release this soon.

@Tinche Tinche closed this as completed Nov 30, 2023