Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fast PRNG to generate _id for documents created outside methods #5161

Merged
merged 2 commits into from
Sep 25, 2015

Conversation

avital
Copy link
Contributor

@avital avital commented Sep 15, 2015

This change harmonizes server document ID generation regardless of whether
it happens inside of a method or not, by using Alea in both cases.

This cuts time of inserting small documents outside of methods
on the server by over 30%, and more importantly makes it easier to be
confident in benchmarking numbers.


BACKGROUND

When calling coll.insert() on the server within methods, we use the Alea PRNG
(which is fast, can be seeded, and not cryptographically secure) to generate
the _id field for the newly created document (unless an _id field was
explicitly passed).

The reason we use Alea is so that we can seed the PRNG from the client, as to
ensure consistently chosen IDs for methods that create multiple documents and
run on both client and server.

Prior to this change, when calling coll.insert() on the server not inside
methods, we'd use Node's cryptographically secure crypto.getRandomBytes()
which is slower (due to allocating buffers that need to cross from JS
into native code).

With this change, we always use Alea when generating a document ID.


CRYPTOGRAPHICALLY SECURE IDS STILL AVAILABLE

If an app wants to guarantee using a cryptographically secure PRNG
when generating IDs, just generate IDs yourself:
coll.insert({_id: Random.id(), ...}).

Random.id() still uses a CSPRNG (unless you're on IE, or
on the server and not enough entropy has been collected, which is
basically never the case).

f you want the faster Alea algorithm, use Random.fast.id()
(The Random.fast object has all the same methods as on Random)


BENCHMARK RESULTS

Here are the measured times for inserting 5000 documents, before
and after this change (on my machine):

Benchmark Before After
direct insert from meteor shell 2179ms 1520ms
a method called from a browser 1617ms 1570ms
a method called from the server 1491ms 1487ms
direct insert from the server 2272ms 1445ms

(The benchmark can be found here:
https://github.com/avital/mongo-timing/blob/f32ea073b7bd242ef9294bd916327238c5f03ba2/benchmark2.sh)

@avital
Copy link
Contributor Author

avital commented Sep 15, 2015

@glasser, would you have time to review this change?

@glasser
Copy link
Contributor

glasser commented Sep 15, 2015

Sure.

Initial impression: call it Random.insecure instead of Random.fast. There's a reason it's not the default, so we should give it a name that emphasizes why you should think carefully about using it instead of why you should be excited about using it.

// There was no scope passed in;
// the sequence won't actually be reproducible.
return Random;
// There was no scope passed in; the sequence won't actually be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above this method is now wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@glasser
Copy link
Contributor

glasser commented Sep 15, 2015

This requires a doc or history.md note to make clear that anyone relying on the behavior for security needs the explicit Random.id() now.

@avital
Copy link
Contributor Author

avital commented Sep 15, 2015

Thanks @glasser, I'll make those changes. Did you go through the entire change or should I expect potentially more comments?

@glasser
Copy link
Contributor

glasser commented Sep 15, 2015

I did a full first pass

This change harmonizes server document ID generation regardless of whether
it happens inside of a method or not, by using Alea in both cases.

This cuts time of inserting small documents outside of methods
on the server by over 30%, and more importantly makes it easier to be
confident in benchmarking numbers.

---

BACKGROUND

When calling `coll.insert()` on the server within methods, we use the Alea PRNG
(which is fast, can be seeded, and not cryptographically secure) to generate
the `_id` field for the newly created document (unless an `_id` field was
explicitly passed).

The reason we use Alea is so that we can seed the PRNG from the client, as to
ensure consistently chosen IDs for methods that create multiple documents and
run on both client and server.

Prior to this change, when calling `coll.insert()` on the server *not* inside
methods, we'd use Node's cryptographically secure `crypto.getRandomBytes()`
which is slower (due to allocating buffers that need to cross from JS
into native code).

With this change, we always use Alea when generating a document ID.

---

CRYPTOGRAPHICALLY SECURE IDS STILL AVAILABLE

If an app wants to guarantee using a cryptographically secure PRNG
when generating IDs, just generate IDs yourself:
`coll.insert({_id: Random.id(), ...})`.

`Random.id()` still uses a CSPRNG (unless you're on IE, or
on the server and not enough entropy has been collected, which is
basically never the case).

If you *want* the faster Alea algorithm, use `Random.fast.id()`
(The `Random.fast` object has all the same methods as on `Random`)

---

BENCHMARK RESULTS

Here are the measured times for inserting 5000 documents, before
and after this change (on my machine):

Benchmark                         | Before | After
--------------------------------- | ------ | ------
direct insert from `meteor shell` | 2179ms | 1520ms
a method called from a browser    | 1617ms | 1570ms
a method called from the server   | 1491ms | 1487ms
direct insert from the server     | 2272ms | 1445ms

(The benchmark can be found here:
https://github.com/avital/mongo-timing/blob/f32ea073b7bd242ef9294bd916327238c5f03ba2/benchmark2.sh)
@avital avital force-pushed the reliable-id-gen-performance branch from 87c5e4b to f21ba1d Compare September 20, 2015 20:32
@avital
Copy link
Contributor Author

avital commented Sep 20, 2015

and added History.md note. doesn't look like the docs need changing.

@glasser i think this is ready to be merged

@glasser
Copy link
Contributor

glasser commented Sep 21, 2015

:shipit:

@avital avital merged commit f21ba1d into devel Sep 25, 2015
@avital
Copy link
Contributor Author

avital commented Sep 25, 2015

Merged in d1ae8f2

@mitar
Copy link
Contributor

mitar commented Oct 9, 2015

How does one make so that secure IDs are generated also in the simulation mode?

Can secure IDs be enabled for the whole collection with some options to the collection constructor? (Like when we can choose between string and ObjectID IDs? Maybe there could also be "secure string" ID options there.)

@avital
Copy link
Contributor Author

avital commented Oct 9, 2015

The problem is that Meteor doesn't ship with an algorithm that's both
cryptographically secure and can be seeded (which is necessary for latency
compensated methods that generate multiple documents)

On Fri, Oct 9, 2015 at 3:48 AM, Mitar [email protected] wrote:

How does one make so that secure IDs are generated also in the simulation
mode?

Can secure IDs be enabled for the whole collection with some options to
the collection constructor? (Like when we can choose between string and
ObjectID IDs? Maybe there could also be "secure string" ID options there.)


Reply to this email directly or view it on GitHub
#5161 (comment).

@mitar
Copy link
Contributor

mitar commented Oct 9, 2015

Hm, I thought that this was how it was before?

Is there an extension point where community could then develop packages for this?

@glasser
Copy link
Contributor

glasser commented Oct 9, 2015

(it's less "Meteor doesn't ship with an algorithm that's both cryptographically secure and can be seeded" and more "being cryptographically secure literally means it can't be seeded (predictable)")

@mitar
Copy link
Contributor

mitar commented Oct 9, 2015

Oh, I get it. The text in history doc was a bit hard to understand. :-) OK. So this deals only with IDs outside of a method call, which are now made in the same way as inside a method call. So it does not change the behavior for method calls and situation when you want the same ID between client and server.

@avital
Copy link
Contributor Author

avital commented Oct 9, 2015

Yes Mitar that's right.

On Fri, Oct 9, 2015 at 2:25 PM, Mitar [email protected] wrote:

Oh, I get it. The text in history doc was a bit hard to understand. :-)
OK. So this deals only with IDs outside of a method call, which are now
made in the same way as inside a method call. So it does not change the
behavior for method calls and situation when you want the same ID between
client and server.


Reply to this email directly or view it on GitHub
#5161 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants