Mediation issues with multiple agents attempting to connect concurrently #899

Closed
niall-shaw opened this issue Jun 22, 2022 · 3 comments · Fixed by #985

Comments

@niall-shaw
Contributor

We are encountering multiple issues (with no single identifiable cause) when up to 30 client agents attempt to initiate mediation with a single mediator agent simultaneously.
The mediator agent uses MySQL for storage, via the vdr-tools binary (a fork of indy-sdk).

Some of the errors that have been occurring include the following:

  • SIGSEGV (Segmentation fault)
  • Heap corruption
  • DEBUG: Request was aborted due to timeout. Not throwing error due to return routing on sent message

We believe one cause of this issue is that MySQL allows queries to execute with considerably more concurrency than SQLite. When a check like the one below runs in parallel, several callers see that the singleton record is missing before any of them has saved it, so each attempts to create it, resulting in failures.

if (!this._mediatorRoutingRecord) {
    this.agentConfig.logger.debug('Mediator routing record not loaded yet, retrieving from storage');
    let routingRecord = await this.mediatorRoutingRepository.findById(this.mediatorRoutingRepository.MEDIATOR_ROUTING_RECORD_ID);
    // If we don't have a routing record yet, create it
    if (!routingRecord) {
        this.agentConfig.logger.debug('Mediator routing record does not exist yet, creating routing keys and record');
        const { verkey } = await this.wallet.createDid();
        routingRecord = new repository_1.MediatorRoutingRecord({
            id: this.mediatorRoutingRepository.MEDIATOR_ROUTING_RECORD_ID,
            routingKeys: [verkey],
        });
        await this.mediatorRoutingRepository.save(routingRecord);
    }
    this._mediatorRoutingRecord = routingRecord;
}

This is one instance that I managed to identify. As a temporary fix, I delay every call to the function after the first by 20 ms, which gives the first save query time to finish executing; see below.

const thisQuery = ++this._totalWaitingQueries;
this.agentConfig.logger.debug('Retrieving mediator routing keys');
// Temporary workaround: every call after the first waits 20 ms so the first save can finish
if (thisQuery !== 1) {
    await new Promise((resolve) => setTimeout(resolve, 20));
}
// If the routing record is not loaded yet, retrieve it from storage

However, this temporary fix is not optimal, as we should not add an arbitrary delay to all further operations.
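
One possible alternative to the delay, offered only as a sketch and not as the change that eventually landed: let all concurrent callers share a single in-flight promise, so the find-or-create sequence runs once per process. The class name `RoutingRecordLoader`, the `createVerkey` callback, and the record ID string are hypothetical stand-ins for the mediator service, `wallet.createDid()`, and `MEDIATOR_ROUTING_RECORD_ID`; the repository is assumed to expose `findById` and `save` as in the snippet above.

```javascript
// Hypothetical sketch: not AFJ's actual implementation.
const ROUTING_RECORD_ID = 'MEDIATOR_ROUTING_RECORD' // assumed ID, for illustration only

class RoutingRecordLoader {
  constructor(repository, createVerkey) {
    this.repository = repository     // assumed to expose findById(id) and save(record)
    this.createVerkey = createVerkey // stand-in for wallet.createDid()
    this.recordPromise = undefined   // shared by every concurrent caller
  }

  getRoutingRecord() {
    // The check and the assignment happen synchronously, so only the
    // first caller starts the load; later callers reuse its promise.
    if (!this.recordPromise) {
      this.recordPromise = this.loadOrCreate().catch((error) => {
        // Reset on failure so that a later call can retry.
        this.recordPromise = undefined
        throw error
      })
    }
    return this.recordPromise
  }

  async loadOrCreate() {
    let record = await this.repository.findById(ROUTING_RECORD_ID)
    // If we don't have a routing record yet, create it
    if (!record) {
      const verkey = await this.createVerkey()
      record = { id: ROUTING_RECORD_ID, routingKeys: [verkey] }
      await this.repository.save(record)
    }
    return record
  }
}
```

This only coordinates callers inside one Node.js process. If several mediator processes share the same database, the creation would still need to be guarded at the storage level, for example by treating a duplicate-record error on save as "someone else created it" and re-reading.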

@TimoGlastra
Contributor

Thanks for opening this issue @niallshaw-absa! I think there's still lots to improve in running AFJ server side.

I'll think about some things we can do to improve this. Do you have any suggestions on how we can best solve this?

@niall-shaw
Contributor Author

niall-shaw commented Jun 22, 2022

Do you have any suggestions on how we can best solve this?

@TimoGlastra - nothing concrete, potentially a queue system for the queries, but that's just me spitballing
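
To make the queue idea a little more concrete, here is a rough sketch of a promise-chain queue that runs submitted tasks one at a time, in submission order. `SerialQueue` and `enqueue` are hypothetical names, and whether serializing storage queries this way would be acceptable for throughput is an open question.

```javascript
// Hypothetical sketch of a serial task queue; not part of AFJ.
class SerialQueue {
  constructor() {
    // Tail of the chain: resolves when the last queued task has settled.
    this.tail = Promise.resolve()
  }

  // `task` is a function returning a promise (for example, one storage query).
  enqueue(task) {
    const result = this.tail.then(() => task())
    // Swallow rejections on the chain itself so that one failed task
    // does not block the tasks queued after it. The caller still sees
    // the rejection through `result`.
    this.tail = result.catch(() => undefined)
    return result
  }
}
```

Each query would then be wrapped as `queue.enqueue(() => repository.save(record))`, so a second find-or-create cannot start until the first has finished.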

@niall-shaw
Contributor Author

Fixed in #985

@genaris genaris closed this as completed Aug 19, 2022
@genaris genaris linked a pull request Aug 19, 2022 that will close this issue