Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert url perf changes #1602

Merged
merged 3 commits into from
May 4, 2015
Merged

Revert url perf changes #1602

merged 3 commits into from
May 4, 2015

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented May 3, 2015

For discussion purposes at the moment, continuing from #1591, but this is the easiest path that I see to getting a less controversial 2.0.0 out.

@mikeal
Copy link
Contributor

mikeal commented May 3, 2015

+1

BTW, if someone wants to put this behind a flag they can do that in a minor release since it would only be considered a feature addition.

@jcrugzz
Copy link

jcrugzz commented May 3, 2015

I'm +1 here as well. I know there is more than a handful of possible places in my own modules or modules I maintain where this may result in a subtle breakage (including node-http-proxy possibly).

@bnoordhuis
Copy link
Member

LGTM. Let's not hold up the release.

@mscdex mscdex added the url Issues and PRs related to the legacy built-in url module. label May 3, 2015
@othiym23
Copy link
Contributor

othiym23 commented May 3, 2015

Wearing my npm maintainer's tiara, I'm +1 on this if the goal is to get [email protected] out any sooner than 10 days from now, which is the earliest date at which an npm build I'm going to feel is safe to use with this code will be available.

@chrisdickinson
Copy link
Contributor

+1. Let's get this in behind a flag (or as @mikeal suggests, published on npm as a -r preload module) in a minor after v2.0.0 so we can start paving the way for this to land.

@isaacs
Copy link
Contributor

isaacs commented May 3, 2015

+1. I apologize for not weighing in on #1561 initially.

I'd like to propose a rule that any commit message that mentions performance MUST include either specific data on the improvement of a pre-existing benchmark, or a new benchmark that shows improvement with the patch. If others here agree this is an idea worth pursuing, I'll send a PR for the contribution guide.

Without specific and reproducible benchmark results, it's impossible to even have a discussion about performance. Even if there are improved benchmarks, the benefit has to be weighed against the costs, including the costs of a lack of stability, and must be put in perspective of how big a performance difference it'll make to a real application. What's hurting me much more these days is child_process performance, not url parsing. Even before this change, we could parse 1,000,000 urls in just a few seconds.

If we do re-visit #1561, it should also be split into separate bits for the benefit of good atomic commit hygiene. Switching from object properties to accessors is a significant change. There are other changes to the parsing approach, renamed variables, and so on.

Just to try to put some level of data behind what the performance impact is here, I came up with this very simple benchmark:

# master today, with the "performance enhancements"
$ for i in 1 2 3 4 5; do node -p 'var url=require("url");var s=process.hrtime();for(var i=0;i<100000;i++)url.parse("http://a:[email protected]/asdf?q=1#hash");s=process.hrtime(s);100000/(s[0]+s[1]/1e9)'; done
156803.7345029713
152023.75643620887
150275.67862858463
151732.90939077525
150361.49225823936
# average = 152239.51424335586 url parses per second

# this patch, performance degraded
$ for i in 1 2 3 4 5; do node -p 'var url=require("url");var s=process.hrtime();for(var i=0;i<100000;i++)url.parse("http://a:[email protected]/asdf?q=1#hash");s=process.hrtime(s);100000/(s[0]+s[1]/1e9)'; done
159153.7304684274
156878.3701837095
156537.3031852965
145164.64711456324
148571.17474394385
# average = 153261.04513918812 url parses per second

There is no performance difference to be had. Even if there was, it's an operation that can already be performed in excess of 150k times per second, and no one has that many urls to parse.

Therefor, the only consideration should be whether or not the semantic changes on their own are worth the disruption. I think it's clear that they're not. It would be better if this module were taken to userland, and leave url.js as it was.

@chrisdickinson
Copy link
Contributor

The bench results were attached to the original PR, and didn't get ported over to the final PR (which was cut because github doesn't allow PRs to be retargeted). Sorry for the information loss (which is due to me switching the dev branch from v1.x to master)!

@Fishrock123
Copy link
Contributor

+1

I wouldn't mine it flagged or as a preload module, so long as that takes us basically no time to do.

This reverts commit 3fd7fc4.

It was agreed that this change contained too much potential ecosystem
breakage, particularly around the inability to `delete` properties off a
`Url` object. It may be re-introduced for a later release, along with
better work on ecosystem compatibility.

PR-URL: nodejs#1602
Reviewed-By: Mikeal Rogers <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Forrest L Norvell <[email protected]>
Reviewed-By: Chris Dickinson <[email protected]>
Reviewed-By: Isaac Z. Schlueter <[email protected]>
Reviewed-By: Jeremiah Senkpiel <[email protected]>
@rvagg rvagg force-pushed the revert-url-changes branch from 20a2810 to f34b105 Compare May 4, 2015 03:04
@rvagg
Copy link
Member Author

rvagg commented May 4, 2015

https://jenkins-iojs.nodesource.com/job/iojs+any-pr+multi/641/

will merge if we get a 👍 from CI

@rvagg rvagg merged commit f34b105 into nodejs:master May 4, 2015
@rvagg
Copy link
Member Author

rvagg commented May 4, 2015

merged f34b105

@rvagg rvagg deleted the revert-url-changes branch May 4, 2015 03:17
@rvagg
Copy link
Member Author

rvagg commented May 4, 2015

Really sorry about @petkaantonov, I was looking forward to getting this out there but I'm also glad that we're taking a more safe approach to compatibility with the ecosystem; hopefully we can find a path forward that gets this back in sooner than later. Thanks for the excellent work on this so far!

rvagg added a commit that referenced this pull request May 4, 2015
This reverts commit 6687721.

It was agreed that this change contained too much potential ecosystem
breakage, particularly around the inability to `delete` properties off a
`Url` object. It may be re-introduced for a later release, along with
better work on ecosystem compatibility.

PR-URL: #1602
Reviewed-By: Mikeal Rogers <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Forrest L Norvell <[email protected]>
Reviewed-By: Chris Dickinson <[email protected]>
Reviewed-By: Isaac Z. Schlueter <[email protected]>
Reviewed-By: Jeremiah Senkpiel <[email protected]>
rvagg added a commit that referenced this pull request May 4, 2015
This reverts commit dbdd81a.

It was agreed that this change contained too much potential ecosystem
breakage, particularly around the inability to `delete` properties off a
`Url` object. It may be re-introduced for a later release, along with
better work on ecosystem compatibility.

PR-URL: #1602
Reviewed-By: Mikeal Rogers <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Forrest L Norvell <[email protected]>
Reviewed-By: Chris Dickinson <[email protected]>
Reviewed-By: Isaac Z. Schlueter <[email protected]>
Reviewed-By: Jeremiah Senkpiel <[email protected]>
rvagg added a commit that referenced this pull request May 4, 2015
This reverts commit 3fd7fc4.

It was agreed that this change contained too much potential ecosystem
breakage, particularly around the inability to `delete` properties off a
`Url` object. It may be re-introduced for a later release, along with
better work on ecosystem compatibility.

PR-URL: #1602
Reviewed-By: Mikeal Rogers <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Forrest L Norvell <[email protected]>
Reviewed-By: Chris Dickinson <[email protected]>
Reviewed-By: Isaac Z. Schlueter <[email protected]>
Reviewed-By: Jeremiah Senkpiel <[email protected]>
@rvagg
Copy link
Member Author

rvagg commented May 4, 2015

forgot to put metadata in all 3 commits, done and force pushed as 0f39ef4, 0daed24 and 702997c

@rvagg rvagg mentioned this pull request May 4, 2015
@petkaantonov
Copy link
Contributor

@isaacs The performance improvement is up to 25x, but at least 10x as shown in many issue/pr threads linked in the real PR. The absolute speed is only 10k parses per second, so it is a major bottleneck for web servers since express parses urls per each request. It improves TechEmpower single query benchmark score by 30% (yes, that benchmark queries a database as well as renders templates, that's how slow url parsing is).

The reason for making everything an accessor is so that 'url' can be made work just like it works in browsers, not for performance. Currently it works nothing like in browsers, although it does do some horrible things just because browsers do it, appending ':' to protocols for instance. For performance you only need to make .href an accessor (and even this can be debated, please see the real PR).

This has been in userland for a long time (without everything being an accessor though) now but as far as I can tell from anecdotal evidence, many people aren't using io.js simply because there is nothing really substantial over node. 30% faster web servers would have surely been such a thing.

@YurySolovyov
Copy link

Maybe current parser could be re-written in several passes, eliminating bottlenecks one-by-one?
BTW, where current "slowness" comes form?

@spion
Copy link

spion commented May 4, 2015

@isaacs I don't know what you've been testing, but here is what I get:

% for i in 1 2 3 4 5; do node -p 'var url=require("url");var s=process.hrtime();for(var i=0;i<100000;i++)url.parse("http://a:[email protected]/asdf?q=1#hash");s=process.hrtime(s);100000/(s[0]+s[1]/1e9)'; done
144431.19737550025
156379.18875731213
158876.53319329955
158296.33656418743
153327.86955603285

# petkaanotnov's url extracted to a separate file called url2.js
5838 % for i in 1 2 3 4 5; do node -p 'var url=require("./url2");var s=process.hrtime();for(var i=0;i<100000;i++)url.parse("http://a:[email protected]/asdf?q=1#hash");s=process.hrtime(s);100000/(s[0]+s[1]/1e9)'; done
1874383.4332845176
1943702.0690358658
1811787.4930533804
1794612.6159974784
1942542.0913905445

That is a 12x performance gain. Can you double-check if you actually tested the real thing?

edit: thats not even enough iterations for the JIT warmup to be taken into account, the actual number for the faster url parser is about 2.3 million, making it a 15x improvement for that particular test case

@Fishrock123
Copy link
Contributor

To be clear here:

I'd like to see it land in the future, but we need to do some prep on the ecosystem so that important things like npm actually still work. :)

@silverwind
Copy link
Contributor

Please move the discussion to #643.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
url Issues and PRs related to the legacy built-in url module.
Projects
None yet
Development

Successfully merging this pull request may close these issues.