-
-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Significantly speed up parsing netloc in URL objects #1112
Conversation
This reverts commit 855f57b.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1112 +/- ##
==========================================
+ Coverage 95.06% 95.08% +0.02%
==========================================
Files 30 30
Lines 4497 4600 +103
Branches 396 410 +14
==========================================
+ Hits 4275 4374 +99
- Misses 196 200 +4
Partials 26 26
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Need to look at the aiohttp server case a bit more. In some cases sending an unencoded url beats the pre encoded case depending on what we access because of cache misses and reparse of the same data |
Home Assistant test case: 3 min up time (expect startup to resync more data) |
|
Will sleep on this one to see if there is any more coverage that would be good to add |
3 hours for comparison 2024-09-06 23:01:09.032 CRITICAL (SyncWorker_30) [homeassistant.components.profiler] Cache stats for lru_cache <function urlsplit at 0x7f23becb7ce0> at /usr/local/lib/python3.12/urllib/parse.py: CacheInfo(hits=57950, misses=2839, maxsize=128, currsize=128) |
13 hours 2024-09-07 07:01:46.909 CRITICAL (SyncWorker_32) [homeassistant.components.profiler] Cache stats for lru_cache <function URL._split_netloc at 0x7f4f1c6e6700> at /usr/local/lib/python3.12/site-packages/yarl/_url.py: CacheInfo(hits=157229, misses=129, maxsize=128, currsize=128) |
17 hours on another HA install So its turning out to be a really great use of an LRU |
What do these changes do?
Currently to make a URL object that is not already encoded we would call
urlsplit
to generate aSplitResult
, and than we would always call.hostname
,.port
,.username
, and.password
, which all have to reparse thenetloc
each time.To improve the performance
hostname
,port
,username
, andpassword
one time in a new function_split_netloc
which is a slimmed down version of whatSplitResult
does that only does what we need._split_netloc
inlru_cache
the same size as the oneurlsplit
uses since we tend to see the samenetloc
over and over again.URL
is created withencoded=False
, save the parsed values from thenetloc
andSplitResult
into the cache so they do not have to be calculated again the first time the properties are accessed as it avoids a second call intoSplitResult
's parsers of the same data.URL
is created withencoded=True
, parse thenetloc
as soon asraw_password
,raw_user
,raw_host
orexplict_port
are accessed and cache the result of the other values to avoid having to parse again when they are accessed since is almost guaranteed that we will access raw_password if we access raw_user and vise-versa. Same withexplict_port
andraw_host
.Even on the cache miss case this is much faster because we avoid so many reparses of the same data. For an
aiohttp
request with url passed as string we would previously have to parse thenetloc
8x (4x for the properties to build the netloc, and 4x for the reading the properties back fromyarl
)Are there changes in behavior for the user?
The additional LRU will use a tiny bit more memory, but is limited to 128 keys which will make it negligible
before
after
Related issue
#196 (only partially solves the issues mentioned there)