
feat(robot-server): limit the maximum number of analyses stored per protocol #14885

Merged

Conversation

sanni-t
Member

@sanni-t sanni-t commented Apr 12, 2024

Closes AUTH-317

Overview

Adds a max-analyses-per-protocol limit of 5 stored analyses. Before a new analysis is added, auto-deletes the oldest analysis (or analyses) if the number stored would otherwise exceed the max.
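
As an illustration of the pruning policy, here is a minimal sketch (not the PR's implementation; the constant name MAX_ANALYSES_TO_STORE comes from the diff below, the helper function is hypothetical):

```python
MAX_ANALYSES_TO_STORE = 5

def ids_to_delete(existing_ids_oldest_first: list) -> list:
    """Hypothetical helper: oldest analysis IDs to drop before storing one new analysis."""
    excess = len(existing_ids_oldest_first) - MAX_ANALYSES_TO_STORE + 1
    # Guard against a negative slice bound when fewer than the max are stored.
    return existing_ids_oldest_first[:excess] if excess > 0 else []

# With 5 analyses already stored, only the single oldest one is deleted...
assert ids_to_delete(["a1", "a2", "a3", "a4", "a5"]) == ["a1"]
# ...and with fewer than 5 stored, nothing needs to be deleted.
assert ids_to_delete(["a1", "a2"]) == []
```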

Test Plan

  • Tested on a dev server and on a robot that the number of stored analyses doesn't exceed 5 and that the oldest analysis gets deleted first.
  • Also tested that if a dev server already had more than 5 analyses in the DB, the first analysis added after updating to this branch deletes all excess analyses.

Changelog

  • Added auto-deletion of old analyses in CompletedAnalysisStore before a new analysis is added
  • Also added a way to remove the excess analyses from the memory cache

Review requests

Usual code review. I've added in-line comments for things that need extra attention.

Risk assessment

Low. Well tested and pretty isolated feature.

@sanni-t sanni-t requested a review from a team as a code owner April 12, 2024 05:22
@sanni-t sanni-t requested review from a team, jbleon95 and Elyorcv and removed request for a team and Elyorcv April 12, 2024 15:14
Comment on lines +365 to +367
analyses_to_delete = analyses_ids[
    : len(analyses_ids) - MAX_ANALYSES_TO_STORE + 1
]
Contributor

Purely stylistic, no opinion from me either way, but you can also write this as analyses_to_delete = analyses_ids[:-MAX_ANALYSES_TO_STORE] if you think that's clearer.

Comment on lines +369 to +375
for analysis_id in analyses_to_delete:
    self._memcache.remove(analysis_id)
    delete_statement = sqlalchemy.delete(analysis_table).where(
        analysis_table.c.id == analysis_id
    )
    with self._sql_engine.begin() as transaction:
        transaction.execute(delete_statement)
Contributor

Can we do all these deletions inside a single transaction?

And ideally, we'd actually go even further and have one transaction for deleting the old entries and inserting the new entry. That doesn't have to happen in this PR if it's a nontrivial refactor, but it is something that we should do, so can we think through what it would take? Like, does the AnalysisStore/CompletedAnalysisStore split get in the way of that?

Member Author

@sanni-t sanni-t Apr 12, 2024

I did try exactly this. Maybe I'm not yet familiar with the full range of sqlalchemy's capabilities, but the code was getting a bit too clunky for my liking, especially when trying to delete all the old entries in one transaction.

This was what I was going for:

delete_statement = analysis_table.delete()
insert_statement = analysis_table.insert().values(
    await completed_analysis_resource.to_sql_values()
)
for analysis_id in analyses_to_delete:
    self._memcache.remove(analysis_id)
    delete_statement = delete_statement.where(
        analysis_table.c.id == analysis_id
    )
# <add something to make sure we don't end up executing `analysis_table.delete()` >

with self._sql_engine.begin() as transaction:
    transaction.execute(delete_statement)
    transaction.execute(insert_statement)

self._memcache.insert(
    completed_analysis_resource.id, completed_analysis_resource
)

(needed to have a few more checks in there but that was basically the idea)
Do you know of a better way of composing statements with multiple conditions?

Member Author

Hmm... it just occurred to me that I could have tried this:

  insert_statement = analysis_table.insert().values(
      await completed_analysis_resource.to_sql_values()
  )
  delete_statements = []
  for analysis_id in analyses_to_delete:
      self._memcache.remove(analysis_id)
      delete_statements.append(analysis_table.delete().where(
          analysis_table.c.id == analysis_id
      ))
  with self._sql_engine.begin() as transaction:
      for delete_statement in delete_statements:
          transaction.execute(delete_statement)
      transaction.execute(insert_statement)
  self._memcache.insert(
      completed_analysis_resource.id, completed_analysis_resource
  )

Member Author

But is this^ any better than what is implemented?
It looks like there is a special engine configuration for executing multiple statements in a single transaction, which I don't think we have enabled.

Contributor

@SyntaxColoring SyntaxColoring Apr 12, 2024

  [...]
  for analysis_id in analyses_to_delete:
      self._memcache.remove(analysis_id)
      delete_statement = delete_statement.where(
         analysis_table.c.id == analysis_id
      )
  [...]

Do you know of a better way of composing statements with multiple conditions?

Yeah. There are a few ways to skin this cat, but I think the most direct translation of what you're trying to do is:

delete_statement = analysis_table.delete().where(analysis_table.c.id.in_(analyses_to_delete))

Which will generate a SQL statement like:

DELETE FROM analysis WHERE id IN ("abc", "def", "ghi")

Hmm.. it just occurred to me that I could have tried this-

[...]
with self._sql_engine.begin() as transaction:
   for delete_statement in delete_statements:
        transaction.execute(delete_statement)
[...]

Yep, that also works.

But is this^ any better than what is implemented?

Definitely. It's faster, safer in the face of concurrency that might be added in the future, and more semantically "correct."

It looks like there is a special engine configuration for executing multiple statements in a single transaction which I don't think we have enabled.

Hm, what do you mean? Executing multiple statements in a single transaction is a normal thing to do and we do it routinely. Where are you seeing that that needs special configuration?
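
For reference, here is a minimal sketch of the consolidated approach being discussed, reusing the names from the snippets above and assuming it runs inside the same async method; it is illustrative only, not the code that was merged in this PR:

```python
# One DELETE covering all stale rows plus the INSERT for the new analysis,
# both executed inside a single transaction.
delete_statement = sqlalchemy.delete(analysis_table).where(
    analysis_table.c.id.in_(analyses_to_delete)
)
insert_statement = analysis_table.insert().values(
    await completed_analysis_resource.to_sql_values()
)
with self._sql_engine.begin() as transaction:
    if analyses_to_delete:
        transaction.execute(delete_statement)
    transaction.execute(insert_statement)

# Keep the in-memory cache consistent with the database.
for analysis_id in analyses_to_delete:
    self._memcache.remove(analysis_id)
self._memcache.insert(
    completed_analysis_resource.id, completed_analysis_resource
)
```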

@SyntaxColoring
Contributor

TY!

Contributor

@jbleon95 jbleon95 left a comment

Nice! Looks good!

@sanni-t sanni-t merged commit f695e74 into edge Apr 12, 2024
7 checks passed
@sanni-t sanni-t deleted the AUTH-317-limit-the-number-of-analyses-to-be-stored-per-protocol branch April 12, 2024 20:21
sanni-t added a commit that referenced this pull request Apr 15, 2024
…es length bug (#14904)

Closes AUTH-347

# Overview

#14885 added the feature to limit the number of analyses we store in the DB. In [this](#14885 (comment)) comment, @SyntaxColoring pointed out that we should consolidate the DB transactions for better performance, so that's what this PR does.

Also fixes a bug where, if the existing number of analyses in the DB was 3 and we were to add another analysis, the formula for getting the analysis IDs to delete would evaluate to `analysis_ids[:-1]` and delete all analyses except the last one.
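
To illustrate the off-by-one (a minimal sketch, assuming MAX_ANALYSES_TO_STORE is 5 as in #14885):

```python
MAX_ANALYSES_TO_STORE = 5
analysis_ids = ["a1", "a2", "a3"]  # only 3 analyses stored so far

# The old formula produces a negative slice bound when fewer than
# MAX_ANALYSES_TO_STORE analyses exist: 3 - 5 + 1 == -1.
to_delete = analysis_ids[: len(analysis_ids) - MAX_ANALYSES_TO_STORE + 1]

# analysis_ids[:-1] drops everything except the last analysis,
# even though nothing should be deleted yet.
assert to_delete == ["a1", "a2"]
```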

# Test Plan

- Tested the cases mentioned in #14885 
- Tested the bug case

# Risk assessment

Low. Refactor + small bug fix
Carlos-fernandez pushed a commit that referenced this pull request May 20, 2024
…rotocol (#14885)

Carlos-fernandez pushed a commit that referenced this pull request May 20, 2024
…es length bug (#14904)
