-
Notifications
You must be signed in to change notification settings - Fork 942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvements for people parsing email headers #505
Conversation
This is a substantial performance improvement for people who are parsing emails but do not need access to all the headers. In particular I am doing: Mail::Header.new(<str>)['From'].to_s on this example: https://gist.github.com/5901bbd810c08ed3d0b1
This corresponds to about a 10% time-saving when running the specs with bpot/ragel merged, and a bigger performance improvement for parsing workloads.
This has little effect on the specs, but on my header reading example it makes about a 10x performance difference, finally bringing it within one order of magnitude of the "fast hacky solution" at https://gist.github.com/5901bbd810c08ed3d0b1
Ok, I wouldn't normally add a new commit to the same branch, but I got a bit carried away! If we binary search of rspec on ruby-1.9.3:
|
Cool! Mail::FieldList#<< was the next thing I was going to look at. Binary search is probably what we want to do but we need to ensure that it preserves the order of headers that appear multiple times: 4c0962d |
@bpot This happens implicitly as I'm using See http://hg.python.org/cpython/file/2.7/Lib/bisect.py for the original code and http://docs.python.org/2/library/bisect.html for the docs. |
Ah cool! |
Mail::Multibyte is not defined in that case, so we just force everything to be a string.
Sweeet. |
Hey there, thanks for this, looks great :) When I run the specs based against master, I am getting a LOT of warnings:
Can you investigate and fix? If so, I'll get this merged in ASAP. |
@mikel should now be fixed. Thanks :). |
DONE :) Merged into Master |
The main change here is to defer as much header-parsing work as possible. This gives me a 2000x speedup for an (admittedly pathological, but real) example when combined with Pull #490
This email was sent around our company: https://gist.github.com/5901bbd810c08ed3d0b1, master currently takes
6.9s
to parse those headers!Pathological example:
(The "best case" would be as fast as my hacky
zippy_from
function, which is 0.0005s so still 7 times faster than I managed to make this. Most of the remaining time is spent in.to_crlf
and.split
, see profiler output, so I'm pretty content to leave it as-is)rspec on ruby-1.9.3:
(Other rubies had similar shapes, but different actual numbers)