Performance and Load Testing #1911
This seems like a good idea to me, but it would be good to get your opinion on this @l0b0.
Updated the acceptance criteria to remove 'data retrieval' (retrieval is done through the standard S3 API, so I don't think we need to test it), and added some useful numbers we can use. Probably worth discussing as a team, though. Also added a link to the NFRs for context.
Pros:
Cons:
Open questions:
My approach would be to set up a simple performance test (e.g. having a preloaded set of data on S3 that we can add to, and measuring the time it takes to create and add an additional x number of datasets), and expand from there as usage grows. That way we can justify the costs (whether AWS or development time) along the way if we need to expand the scope of our performance testing at a later stage. For the initial setup:
Doing this properly, and getting the most value out of the exercise, is going to cost (AWS and development time). That said, having something (even a simple test that gathers basic metrics once in a blue moon) is better than having nothing at all. We don't want to find out at the 11th hour from an end user complaining that it is too slow for purpose. Without a benchmark, we wouldn't know whether the system can scale (uploading one or two datasets from a developer's machine may not surface the underlying performance problems). My vote is to set up a simple test for now, and add additional metrics and measurements as the need arises.
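The simple test described above could be sketched as a small timing harness: preload the bucket, then time N successive dataset additions and summarise the latencies. This is a hedged sketch only; the `operation` callable is a hypothetical stand-in for whatever call actually adds a dataset to Geostore.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latencies(operation: Callable[[int], None], n: int) -> List[float]:
    """Invoke `operation` n times and return per-call latencies in seconds."""
    latencies = []
    for i in range(n):
        start = time.perf_counter()
        operation(i)  # e.g. create and add the i-th test dataset
        latencies.append(time.perf_counter() - start)
    return latencies


def summarise(latencies: List[float]) -> Dict[str, float]:
    """Basic metrics to track across runs: mean, 95th percentile, worst case."""
    ordered = sorted(latencies)
    p95_index = max(0, int(len(ordered) * 0.95) - 1)
    return {
        "mean": statistics.mean(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }
```

In practice `operation` would wrap the real dataset-creation call against a preloaded bucket, and the summary would be recorded per run so regressions show up as the catalog grows.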
User Story
So that I know Geostore can scale and handle vast amounts of data, as a user, I want to ensure that I can continue adding datasets to Geostore without hitting any performance bottleneck. When new datasets are added (and existing datasets retrieved), I want to get a response in a timely manner.
Currently we have no visibility into how Geostore performs under load. How long does it take Geostore to update its catalog when a large number of datasets already exists? Can it scale? Does it suffer from any performance bottlenecks (e.g. can PySTAC traverse the entire tree in S3 efficiently; can Lambda handle these requests without timing out)? Some metrics would help identify potential roadblocks that should be looked at early on.
Acceptance Criteria
Additional context
The non-functional requirements listed here provide a minimum baseline for testing, but we should probably set a higher threshold than that for our own testing. These are probably the two most relevant NFRs:
https://github.com/linz/geostore/blob/master/.github/ISSUE_TEMPLATE/user_story.md#definition-of-done
Tasks
Definition of Ready
Definition of Done
CODING guidelines
increase 10% every year
validated, imported and stored within 24 hours
maintenance windows < 4 hours and does not include operational support
< 12 hours
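The NFR figures quoted above could be encoded as explicit pass/fail thresholds, so a performance run can be judged automatically rather than eyeballed. A minimal sketch, assuming the 24-hour ingest and 10%-per-year growth figures quoted here; the function names are hypothetical.

```python
from datetime import timedelta

# NFR: datasets must be validated, imported and stored within 24 hours.
MAX_INGEST = timedelta(hours=24)


def meets_ingest_nfr(measured: timedelta) -> bool:
    """True if a measured end-to-end ingest time satisfies the 24-hour NFR."""
    return measured <= MAX_INGEST


def projected_volume(current: float, years: int) -> float:
    """Project data volume assuming the NFR's 10% year-on-year growth."""
    return current * 1.10 ** years
```

Checks like these could run after each benchmark, and the projection helps size the preloaded test data a few years ahead of current usage.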