Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash based lookup implementation #143

Merged
merged 3 commits into from
Aug 23, 2018
Merged

Hash based lookup implementation #143

merged 3 commits into from
Aug 23, 2018

Conversation

guliyevemil1
Copy link
Contributor

No description provided.

@guliyevemil1 guliyevemil1 force-pushed the master branch 2 times, most recently from 22d5185 to 53ad4a6 Compare August 14, 2018 09:49
@guliyevemil1
Copy link
Contributor Author

If you want me to add comments to explain the logic a bit, let me know. I basically took what you were doing with regexes and just did it via string manipulation and this made it much faster.

@guliyevemil1
Copy link
Contributor Author

Btw, there was a small bug that's kind of a gotcha in Go if you haven't seen it before. The logic for figuring out bestRule was actually buggy because the temporary variable rule is reused throughout the for loop and overwritten on each iteration. You can see the buggy behavior demonstrated here:

https://play.golang.org/p/72PdGvRy9Dh

@guliyevemil1
Copy link
Contributor Author

Hey, can I get a timeline of when you think you can merge this? I'd like to use your library but this is a blocker. Thanks!

@weppos
Copy link
Owner

weppos commented Aug 15, 2018

I need to allocate some time to look into it. I will try to find some time within this week.

@guliyevemil1
Copy link
Contributor Author

Hey man, I understand you're busy. I found your library from the official publicsuffix.org list and I liked your API better than x/net package. The performance was unacceptable for my use case so I decided to help you and improve it. It passes all your tests and the benchmarks are comparable or better than x/net (which I added in the first commit). Can you please spare some time soon to merge this? I would like to continue using your library instead of being forced to use my forked off version that I don't think I can maintain.

@weppos weppos self-assigned this Aug 23, 2018
@weppos
Copy link
Owner

weppos commented Aug 23, 2018

There is a subtle but important implication in this change, and I want to add a note here mostly as a reminder.

This change will assume that *.foo and foo won't exist in the same definition. This assumption may not leave forever, and may actually not even be correct.
https://groups.google.com/d/topic/publicsuffix-discuss/rY6yRK802Kg/discussion

For better or for worse, this is the same current limitation of
https://github.com/weppos/publicsuffix-ruby

Depending on the outcome of the discussion, the index key may have to be reshaped.
I actually also considered the idea of storing rules in a tree rather than a hash weppos/publicsuffix-ruby#134, that could be a potential solution as finding the proper rule would be walking a tree, and com/foo/* could be different than com/foo.

Copy link
Owner

@weppos weppos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @guliyevemil1. I approve these changes, and I'll merge them.

Before going ahead, can you clarify the intent of the change in the DefaultList.rules assignment?

@@ -39,7 +39,9 @@ func init() {
{ {{$r.Type}}, "{{$r.Value}}", {{$r.Length}}, {{$r.Private}} },
{{end}}
}
DefaultList.rules = r[:]
for i := range r {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason of this change? Is it because of #141?

if !rule.Match(name) {
continue
for {
rule, ok := l.rules[name]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a subtle but important implication in this change, and I want to add a note here mostly as a reminder.
This change will assume that *.foo and foo won't exist in the same definition.

Please see the main ticket for further information.

@@ -8626,5 +8626,7 @@ func init() {
{1, "now.sh", 2, true},
{1, "zone.id", 2, true},
}
DefaultList.rules = r[:]
for i := range r {
DefaultList.AddRule(&r[i])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason of this change? Is it because of #141?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, nevermind. I actually figured it out, we need to pass via AddRule because the internal rule is no longer a list.

My bad, sorry for the stupid question.

I need to determine the cost of this operation. It could be possible that it's more efficient to compile directly the r in the appropriate form and then assign.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't speak with absolute certainty, but my gut feeling is that this implementation eases the load on the garbage collector because nothing in this list should ever be garbage collected which means it's better to have all of them in a single giant array rather than allocating memory for each individually on the heap. I might be wrong and it might be worth benchmarking.

weppos added a commit that referenced this pull request Aug 23, 2018
Backported from GH-143 to see the impact before/after the merge.
See also GH-133.
@weppos weppos changed the title Added benchmarks, moved to a hash based implementation and improved performance to match that of golang.org/x/net library Hash based lookup implementation Aug 23, 2018
@weppos weppos merged commit b6e96f6 into weppos:master Aug 23, 2018
weppos added a commit that referenced this pull request Aug 23, 2018
This is an edge case resulting from #143. Once again, the match of wildcard rules is tricky as I also mentioned in #143 (comment)
@guliyevemil1
Copy link
Contributor Author

Thank you so much for taking the time to review the change and merging it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants