Performance Issue: InternetAddressList.TryParse Slow with Large Recipient Lists #1106
Comments
I took a quick look at this last night and nothing stood out as obviously That said, I have not yet stepped through this in the debugger to see what is going on. Maybe it's the performance issue that I mentioned, or it could be something else that I just didn't spot in my quick glance through the code. That said, when I was looking through the code, it looked like the parser would fail to parse That backtracking could easily cause an I'll try to get to actually debugging this as soon as I can. Hopefully I'll be able to carve out some time during the Thanksgiving holiday. BTW, kudos for creating the graph. I love it :-)
First thing: the test results I presented in the issue were run on MimeKit 4.0.0. Apologies for that; it was my mistake. However, to investigate the case with an email address versus an empty one (<>), I think we should compare performance between these two situations:
Test code
[TestCase(1000, "")]
[TestCase(1000, "[email protected]")]
[TestCase(20000, "")]
[TestCase(20000, "[email protected]")]
[TestCase(40000, "")]
[TestCase(40000, "[email protected]")]
[TestCase(60000, "")]
[TestCase(60000, "[email protected]")]
public void ManyRecipientsTest(int participantsCount, string email)
{
    // Arrange: build a comma-separated list in which every entry is fully quoted
    StringBuilder recipientsStringBuilder = new();
    for (int i = 0; i < participantsCount; i++)
    {
        recipientsStringBuilder.Append($"\"Fake User{i + 1} <{email}>\"");
        if (i < participantsCount - 1)
        {
            recipientsStringBuilder.Append(", ");
        }
    }
    string recipients = recipientsStringBuilder.ToString();
    // Act: time only the parse call
    var stopwatch = Stopwatch.StartNew();
    bool result = InternetAddressList.TryParse(new ParserOptions { AllowAddressesWithoutDomain = true }, recipients, out InternetAddressList addressList);
    stopwatch.Stop();
    // Assert (expected value first, then actual)
    Assert.IsTrue(result);
    Assert.AreEqual(participantsCount, addressList.Count);
    Console.WriteLine($"{participantsCount};{stopwatch.Elapsed}");
}
Results:
So it doesn't look like an empty email (<>) causes a performance problem.
Out of curiosity, how do the results change if you convert the recipients string into a byte[] using Encoding.UTF8.GetBytes() before starting the stopwatch? The first thing that InternetAddressList.TryParse() does is convert the input string into a byte[] and then call the TryParse() overload that takes a byte[].
Okay, well, I'm stepping through this in the debugger and I see what is going on now... or at least part of what is going on. Each address is fully quoted: This means that See the Once that completes,
(which is technically inaccurate in this particular case, but it has the right idea). Then the parser rewinds back to the beginning of the buffer before proceeding to try again, this time parsing the first token as the This explains the O(N^2) behavior.
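The scan-then-rewind pattern described above can be illustrated with a toy model. This is a hedged sketch, not MimeKit's parser: `BacktrackingModel`, `BuildList`, and `CountSteps` are hypothetical names, and the "stop at the first comma" variant is only an assumption about one way to bound the speculative scan, not a description of the actual patch. The model simply counts how many characters each strategy visits.

```csharp
using System;
using System.Text;

// Toy model only: NOT MimeKit's parser. It counts characters visited by a
// "scan ahead, then rewind" strategy versus one bounded by the next comma.
public static class BacktrackingModel
{
    // Builds entries such as "Fake User1 <>" (each wrapped in quotes), separated by ", ".
    public static string BuildList(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0) sb.Append(", ");
            sb.Append('"').Append("Fake User").Append(i + 1).Append(" <>").Append('"');
        }
        return sb.ToString();
    }

    // Returns the index just past the quoted string that starts at 'index'.
    static int SkipQuoted(string text, int index, ref long steps)
    {
        index++; steps++; // opening quote
        while (index < text.Length && text[index] != '"') { index++; steps++; }
        if (index < text.Length) { index++; steps++; } // closing quote
        return index;
    }

    // stopAtComma = false: speculatively scan to the end of the input for every
    //   entry, then rewind and take only the first quoted string.
    // stopAtComma = true: end the speculative scan at the first comma outside quotes.
    public static long CountSteps(string text, bool stopAtComma, out int entries)
    {
        long steps = 0;
        entries = 0;
        int pos = 0;
        while (pos < text.Length)
        {
            // Speculative scan, standing in for "try to read a display-name".
            int scan = pos;
            while (scan < text.Length)
            {
                if (text[scan] == '"') { scan = SkipQuoted(text, scan, ref steps); continue; }
                if (stopAtComma && text[scan] == ',') break;
                scan++; steps++;
            }
            // No address followed, so rewind and consume just the first quoted string.
            pos = SkipQuoted(text, pos, ref steps);
            entries++;
            // Skip the ", " separator.
            while (pos < text.Length && (text[pos] == ',' || text[pos] == ' ')) { pos++; steps++; }
        }
        return steps;
    }
}
```

Calling `CountSteps(BuildList(n), false, out _)` for a doubled `n` roughly quadruples the count, while passing `true` roughly doubles it, which matches the quadratic versus linear growth discussed in this thread.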
Let's try:
[TestCase(1000, true)]
[TestCase(1000, false)]
[TestCase(20000, true)]
[TestCase(20000, false)]
[TestCase(40000, true)]
[TestCase(40000, false)]
[TestCase(60000, true)]
[TestCase(60000, false)]
public void ManyRecipientsAndEncodingTest(int recipientsCount, bool encodeRecipientsBeforeStopwatch)
{
    // Arrange
    StringBuilder recipientsStringBuilder = new();
    for (int i = 0; i < recipientsCount; i++)
    {
        recipientsStringBuilder.Append($"\"Fake User{i + 1} <[email protected]>\"");
        if (i < recipientsCount - 1)
        {
            recipientsStringBuilder.Append(", ");
        }
    }
    string recipients = recipientsStringBuilder.ToString();
    var encodedRecipients = Encoding.UTF8.GetBytes(recipients);
    var options = new ParserOptions { AllowAddressesWithoutDomain = true };
    InternetAddressList addressList;
    // Act: honor the flag so the two cases actually differ.
    // true  -> parse the pre-encoded byte[] (encoding excluded from the timing)
    // false -> parse the string (encoding happens inside TryParse and is timed)
    var stopwatch = Stopwatch.StartNew();
    bool result = encodeRecipientsBeforeStopwatch
        ? InternetAddressList.TryParse(options, encodedRecipients, out addressList)
        : InternetAddressList.TryParse(options, recipients, out addressList);
    stopwatch.Stop();
    // Assert (expected value first, then actual)
    Assert.IsTrue(result);
    Assert.AreEqual(recipientsCount, addressList.Count);
    Console.WriteLine($"{recipientsCount};{stopwatch.ElapsedMilliseconds / 1000f / 60f} (encode={encodeRecipientsBeforeStopwatch})"); // minutes
}
So there is almost no difference. To confirm, I ran this test:
[TestCase(20000)]
[TestCase(40000)]
[TestCase(60000)]
[TestCase(80000)]
[TestCase(100000)]
[TestCase(1000000)]
public void EncodingPerformanceTest(int recipientsCount)
{
    // Arrange
    StringBuilder recipientsStringBuilder = new();
    for (int i = 0; i < recipientsCount; i++)
    {
        recipientsStringBuilder.Append($"\"Fake User{i + 1} <[email protected]>\"");
        if (i < recipientsCount - 1)
        {
            recipientsStringBuilder.Append(", ");
        }
    }
    string recipients = recipientsStringBuilder.ToString();
    // Act: time only the UTF-8 conversion
    var stopwatch = Stopwatch.StartNew();
    var encodedRecipients = Encoding.UTF8.GetBytes(recipients);
    stopwatch.Stop();
    // Assert: the input is pure ASCII, so byte count equals character count
    Assert.AreEqual(recipients.Length, encodedRecipients.Length);
    Console.WriteLine($"{recipientsCount};{stopwatch.ElapsedMilliseconds}");
}
And it takes only 37 ms for 1M recipients, so the encoding step is insignificant.
Yeah, once I spotted the real issue, I knew the encoding wasn't going to make a difference. I should have told you to ignore my previous comment.
In the case described in issue #1106, there are thousands of "addresses" which are really only quoted-strings separated by commas. This patch prevents the parser from consuming the entire string as a single display-name, deciding that there is no address (because it reached the end of the string), and then falling back to parsing the first qstring as a local-part instead. That fallback made the parser O(N^2), so the larger the recipient list got, the worse the performance got. Fixes issue #1106.
Describe the problem
The InternetAddressList.TryParse method hangs for an extended time when processing a large amount of input data. For example, when parsing a list of 100,000 participants, the method takes more than 10 minutes to complete.
Here's how the method scales with the size of the input data:
Platform:
To Reproduce
Run this unit test:
Expected behavior
The method should handle large input data efficiently without significant delays.
Additional context
I am attaching the original test file that led me to this problem:
100k-participants.zip