Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing user registration concurrency issues #2157

Closed
wants to merge 3 commits into from

Conversation

leblowl
Copy link

@leblowl leblowl commented Dec 8, 2023

Pull Request Checklist

  • I have linked this PR to related GitHub issue.
  • I have updated the CHANGELOG.md file with relevant changes (the file is located at the root of monorepo).

(Optional) Mobile checklist

Please ensure you completed the following checks if you did any changes to the mobile package:

  • I have run e2e tests for mobile
  • I have updated base screenshots for visual regression tests

Copy link
Contributor

@holmesworcester holmesworcester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two questions...

// false certificates and try to trick other users. To prevent
// that, peers verify that anything that is written to the
// certificate store is signed by the owner.
if (!this._permsData) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this here? Why would we trigger a "finished issuing certificates" event if we are not the owner?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, I think I'm mostly reorganizing existing code here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is misleading. AFAIK it's only used for resolving promise in csrReplicatedPromiseMap in storage.

@@ -424,6 +438,7 @@ export class ConnectionsManagerService extends EventEmitter implements OnModuleI
this.serverIoProvider.io.emit(SocketActionTypes.PEER_DISCONNECTED, payload)
})
await this.storageService.init(_peerId)
await this.registrationService.init(this.storageService)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a recommended way to inject dependencies.

Please add StorageModule in Registration Module:
imports: [StorageModule]
And in RegistartionService add:
private readonly storageService: StorageService

@Kacper-RF
Copy link
Contributor

Have you been able to reproduce this issue on your machine?
If so, do you have exact steps to do this? (For me it's very random)

Copy link
Contributor

@siepra siepra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain how exactly those changes solves the issue?

@@ -318,11 +318,25 @@ export class ConnectionsManagerService extends EventEmitter implements OnModuleI
await this.launchCommunity(payload)
this.logger(`Created and launched community ${payload.id}`)
this.serverIoProvider.io.emit(SocketActionTypes.NEW_COMMUNITY, { id: payload.id })

if (!payload.ownerCsr || !payload.permsData) {
throw new Error("Owner CSR and PermsData required to create community!")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would just show "abnormal backend termination" js error in application. Maybe it should be a return instead and emitError?

@Kacper-RF
Copy link
Contributor

And one very important thing - e2e tests should pass successfully on at least one system before merge.

@leblowl
Copy link
Author

leblowl commented Dec 11, 2023

Could you explain how exactly those changes solves the issue?

Hi Wiktor, I hope you're doing well my friend. I can. I wrote a little comment in the code, I can quote it here.

// Apparently, JS will run each function to completion. So assuming
// we do not have multiple threads, this function should run to
// completion before it is called again. Because we take the CSRs
// and then get the latest view of certificates, filter CSRs and
// then update certificates in a single function, we should never
// issue a CSR that is contained in the certificates list, at least
// in this function... as long as there is no other code that
// updates the certificates list. Something else to consider is that
// because this function is called asynchronously, there may be
// several invocations with different CSRs and they may run in an
// unexpected order. We could address that if it's an issue, but I
// think that might only affect the order of CSR registration.

I think it might be interested to discuss how race conditions can happen in Javascript. Do you have any thoughts?

@leblowl
Copy link
Author

leblowl commented Dec 11, 2023

I've learned a bit more about how JS works today and I think there is more to it. I don't think it will necessarily run registerUserCertificates all the way through synchronously. But in general, I think we can reason about the code we write and if we absolutely want something to run synchronously in a specific order, we can do that.

@leblowl
Copy link
Author

leblowl commented Dec 11, 2023

With the latest commit, I think it should fix things.

Here is the basic idea: If we want to be sure that we always write code in a specific order, then we can make sure that we do that. For example, if we always want the registration service and storage service to be ready to go before we register the owner certificate so that we are sure that we save it, then we can organize the code in that way. If we want to be sure that we always register one group of certificates at a time, save them to storage, and then register the next group, we can write the code in that way. We don't necessarily need tests to be able to look at the code and say yea there's a race condition. But in order to recognize where potential issue are, we need to understand how JS works at a deep enough level.

So with these changes, in the registerUserCertificates function, we read and write directly to the certificates store so that we are sure we get the most up-to-date view of certificates. And then we make sure that registerUserCertificates is run through to completion (including all await callbacks) before calling it again. There is a test, but it's a little wonky right now.

But in general, if you run registerUserCertificates sequentially (it runs all the way through and then run again), and within registerUserCertificates you are reading to and writing to the certificate store (and it's the only function doing that), then I think you can be pretty confident that it will not register duplicate usernames. Does that make sense?

@siepra
Copy link
Contributor

siepra commented Dec 12, 2023

How does it differ from the csrReplicatedPromiseMap that we already use in certificatesRequestsStore which holds execution of the issueCertificates method of registration.service until its iteration is finished?

@Kacper-RF
Copy link
Contributor

I checked the e2e multiple test locally from your branch and I think something is broken:
image

await this.issueCertificates(payload)
(payload: { csrs: string[]; certificates: string[]; id: string }) => {
this.registrationEvents.push({ csrs: payload.csrs, id: payload.id })
this.tryRegisterNextUserCertificates()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tryRegisterNextUserCertificates() is async function so it needs to be awaited. Otherwise it's a fire-and-forget, which is a very bad idea

I think something may be broken again with our eslint. Do you not get an error on this line? It should complain about missing await. If you did want to actually fire-and-forget you should write it as

void this.tryRegisterNextUserCertificates()

but this is asually a very bad idea and rarely the behaviour you want

Copy link
Author

@leblowl leblowl Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think

new Promise(something).then(another),

is equivalent to

test = async () => {
  await something
  another
}

and where the Promise body, something is entirely synchronous. I think it's only the promise callback, another, which is added to the event loop and is asynchronous. So if there is no line after an await, I don't think the await does anything. I don't think the async keyword actually makes the function async. It's still synchronous unless there is a promise callback or I/O callback or timer. I've been reading a bit more about how JS works internally and posted a lot of links in the Slack internal channel. I also wrote some interested tests over here: develop...race

I was confused myself and so I started diving into it a bit more. I don't have the whole picture and am likely still confused about many things, but this is my latest understanding

A situation like this I think is a little bit different:

test = async () => {
  await something
}

test2 = async () => {
  await test
  another
}

Where the solo await in test does matter. But I don't think EventEmitter uses await at all

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if there is no line after an await, I don't think the await does anything. I don't think the async keyword actually makes the function async

It's actually the opposite.

If you don't await this means that that the outer Promise will fire immediately. If you await the inner Promise the outer promise will wait for the inner one.
Even more importantly, if inner Promise throws a failure:
a) with await the outer Promise will be a failuer
b) without await the outer Promise will succeed and the failure from inner Promise will hit the generic error handler of the process

This second point is the main reason for always awaiting stuff. Without it, error handling is a nightmare.

@@ -18,32 +22,15 @@ export class RegistrationService extends EventEmitter implements OnModuleInit {
onModuleInit() {
this.on(
RegistrationEvents.REGISTER_USER_CERTIFICATE,
async (payload: { csrs: string[]; certificates: string[]; id: string }) => {
await this.issueCertificates(payload)
(payload: { csrs: string[]; certificates: string[]; id: string }) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should still be async function

// @ts-ignore
registrationService.storageService = { certificatesStore: store }

registrationService.onModuleInit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing await


// Apparently, JS will run each function to completion. So assuming
// we do not have multiple threads, this function should run to
// completion before it is called again. Because we take the CSRs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is inaccurate

JS will run the function synchronously until the first await statement.
It's a cool excercise to look up the actual implementation of this in C-node.
What happens underneath is that await is actually almost synonymous to yield. Async functions are really generators that "generate" Promises. Whenever Promise is returned, the handler applies to it's then() callback the logic that effectively calls next(resultOfPreviousPromise) on the generator.
If you think about it in these terms it becomes obvious what will and what will not get interrupted

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, this comment is outdated

if (!this.storageService) {
throw new Error("Storage Service must be initialized before the Registration Service")
}
console.log("Registering user certificates")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't we refactor everthing to use nest logger for logging? If so, please fix this

// that, peers verify that anything that is written to the
// certificate store is signed by the owner.
if (!this._permsData) {
if (payload.id) this.emit(RegistrationEvents.FINISHED_ISSUING_CERTIFICATES_FOR_ID, { id: payload.id })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to think eslint is completely off in this project. Is it not requesting that you wrap this in brackets?

@@ -53,15 +40,26 @@ export class RegistrationService extends EventEmitter implements OnModuleInit {
}
}

public async registerOwnerCertificate(payload: RegisterOwnerCertificatePayload): Promise<void> {
public async registerOwnerCertificate(payload: RegisterOwnerCertificatePayload): Promise<string | undefined> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string | undefined is annoying type to work with

For functions that return stuff conditionally it's generally recommended to go with string | null

hiddenService: identity.hiddenService,
certs: {
// Hacking, perhaps make certs.certificate optional
certificate: identity.userCertificate || '',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call it bikeshedding on my part but || is a logic operator, so there is problem with how it's typed
I highly recommend using ?? instead

@leblowl leblowl marked this pull request as draft December 12, 2023 17:08
@leblowl
Copy link
Author

leblowl commented Dec 12, 2023

@siepra

How does it differ from the csrReplicatedPromiseMap that we already use in certificatesRequestsStore which holds execution of the issueCertificates method of registration.service until its iteration is finished?

I'm not really sure, I'd love to walk through the code with you so we can have a better understanding.

Ah looking at it more, I think I can see:

this.store.events.on('replicated', async () => {
logger('Replicated CSRS')
this.csrReplicatedPromiseId++
const filteredCsrs = await this.getCsrs()
this.createCsrReplicatedPromise(this.csrReplicatedPromiseId)
// Lock replicated event until previous event is processed by registration service
if (this.csrReplicatedPromiseId > 1) {
const csrReplicatedPromiseMapId = this.csrReplicatedPromiseMap.get(this.csrReplicatedPromiseId - 1)
if (csrReplicatedPromiseMapId?.promise) {
await csrReplicatedPromiseMapId.promise
}
}

Unless there is something awaiting the actual 'replicated' event function itself, then I don't think it really matters what awaits you have in there, they only affect the order of operations inside the function. When a new 'replicated' event arrives it will just run the function again.

I don't think EventEmitter uses await or anything async. And here we can see on the OrbitDB side of things where the event is emitted:
https://github.com/orbitdb/orbit-db-store/blob/fbefa468d3cf423bbe6f4e351721bd524af0219d/src/Store.js#L123

I think we could fix this existing code a bit honestly and be done with that aspect. But I also think it might be worth it to refactor a bit like in this PR where we move the synchronization complexity closer to where it matters, which is in the registration service so that's it's self contained and no other service needs to be concerned about it.

Instead of doing the promise thing here, I've used a queue in this PR.

I still would love to walk through this stuff with you. I think it's fascinating! I love understanding how things work at deeper levels.

And again I could be totally wrong about all of this stuff. And I don't really care about being right or wrong, I'm just interested in getting closer to the truth and everytime I can uncover something new, I think that's wonderful.

@siepra
Copy link
Contributor

siepra commented Dec 13, 2023

@leblowl

I don't think it really matters what awaits you have in there, they only affect the order of operations inside the function

With this I disagree*, but:

I also think it might be worth it to refactor a bit like in this PR where we move the synchronization complexity closer to where it matters

Yes, I think I even like your refactor better as it seems more clean and self-explanatory but my point was to raise a question whether it affects the issue we're trying to fix here. I think it doesn't, because it just reinvents something that's been there already.

*you're right about 'replicated' callback event being executed simultaneously, but it awaits certain promise defined on a higher scope, before proceeding with the event emitter. In other words, it starts executing but is being held on a condition managed by the nodejs's event loop so it effectively won't do anything meaningful until the next tick.

@leblowl
Copy link
Author

leblowl commented Dec 13, 2023

@siepra I see, thanks for catching that!

@leblowl leblowl closed this Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants