-
Notifications
You must be signed in to change notification settings - Fork 295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: memory leak in the broker #10567
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -91,4 +91,4 @@ | |
"engines": { | ||
"node": ">=18" | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -107,7 +107,7 @@ export class CachingBrokerFacade implements ServerCircuitProver { | |
// notify broker of cancelled job | ||
const abortFn = async () => { | ||
signal?.removeEventListener('abort', abortFn); | ||
await this.broker.removeAndCancelProvingJob(id); | ||
await this.broker.cancelProvingJob(id); | ||
}; | ||
|
||
signal?.addEventListener('abort', abortFn); | ||
|
@@ -147,6 +147,8 @@ export class CachingBrokerFacade implements ServerCircuitProver { | |
} | ||
} finally { | ||
signal?.removeEventListener('abort', abortFn); | ||
// we've saved the result in our cache. We can tell the broker to clear its state | ||
await this.broker.cleanUpProvingJobState(id); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If I read this correctly. Does this mean the orchestrator effectively tells the broker when to clear state. This feels like it's quite coupled. Would a better approach be for the broker to simply e.g. delete all state for epochs < N - 1 when it is asked to prove something for epoch N. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, you that's right. I think your suggestion makes a lot of sense! |
||
} | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -159,15 +159,20 @@ export class ProvingBroker implements ProvingJobProducer, ProvingJobConsumer { | |||
return promiseWithResolvers.promise; | ||||
} | ||||
|
||||
public async removeAndCancelProvingJob(id: ProvingJobId): Promise<void> { | ||||
this.logger.info(`Cancelling job id=${id}`); | ||||
await this.database.deleteProvingJobAndResult(id); | ||||
|
||||
public async cancelProvingJob(id: ProvingJobId): Promise<void> { | ||||
// notify listeners of the cancellation | ||||
if (!this.resultsCache.has(id)) { | ||||
this.promises.get(id)?.resolve({ status: 'rejected', reason: 'Aborted' }); | ||||
this.logger.info(`Cancelling job id=${id}`); | ||||
await this.reportProvingJobError(id, 'Aborted', false); | ||||
} | ||||
} | ||||
|
||||
public async cleanUpProvingJobState(id: ProvingJobId): Promise<void> { | ||||
if (!this.resultsCache.has(id)) { | ||||
throw new Error(`Can't cancel busy proving job: id=${id}`); | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. WDYT about not rethrowing and instead just logging the error here? Otherwise, if we go into the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good question! The cache gets reaped once the prover finishes the current epoch or starts working on the next one
|
||||
} | ||||
|
||||
await this.database.deleteProvingJobAndResult(id); | ||||
this.jobsCache.delete(id); | ||||
this.promises.delete(id); | ||||
this.resultsCache.delete(id); | ||||
|
@@ -254,8 +259,10 @@ export class ProvingBroker implements ProvingJobProducer, ProvingJobConsumer { | |||
return; | ||||
} | ||||
|
||||
this.logger.debug( | ||||
`Marking proving job id=${id} type=${ProvingRequestType[item.type]} totalAttempts=${retries + 1} as failed`, | ||||
this.logger.warn( | ||||
`Marking proving job as failed id=${id} type=${ProvingRequestType[item.type]} totalAttempts=${ | ||||
retries + 1 | ||||
} err=${err}`, | ||||
); | ||||
|
||||
await this.database.setProvingJobError(id, err); | ||||
|
@@ -281,6 +288,11 @@ export class ProvingBroker implements ProvingJobProducer, ProvingJobConsumer { | |||
return filter ? this.getProvingJob(filter) : Promise.resolve(undefined); | ||||
} | ||||
|
||||
if (this.resultsCache.has(id)) { | ||||
this.logger.warn(`Proving job id=${id} has already been completed`); | ||||
return filter ? this.getProvingJob(filter) : Promise.resolve(undefined); | ||||
} | ||||
|
||||
const metadata = this.inProgress.get(id); | ||||
const now = this.timeSource(); | ||||
if (!metadata) { | ||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remind me: who's expected to be persisting proving data? I wouldn't want to clean up the broker data if we were counting on it to persist work info across crashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 I misplaced my comment sorry about that #10567 (comment)