At a glance

In order to have confidence in the application
as a developer
I want integration tests

Acceptance Criteria

We use DRY behavior-driven development wherever possible.

then...

Shepherd

Background
Ultimately, we want a variety of tests: unit tests for small portions of functionality (e.g. algorithmic operations, network/service interactions) as well as larger/integrative tests. This ticket is about integration tests, which will make sure that multiple components work together correctly.
Traditionally, we might have tests that run outside of the application. That is, we might run some code, and ask if it did the right thing. However, in production... we have nothing. We might catch errors on a micro level ("did this file save?"), but we don't know if the process as a whole is progressing.
Inspired by the architecture of nanopass compilers, we will integrate our testing and validation directly into the pipeline of the application itself. We will then run this both locally (when developing) as well as in production. This way, we cannot let our tests fall behind our application, because they are part of the application itself.
In a picture, our application might look like this:
sequenceDiagram
participant f as fetch
participant e as extract
participant p as pack
participant s as serve
participant v as validator
f ->> e: fetch content, queue for extraction
f ->> v: validate fetch output
e ->> p: extract, queue for packing
e ->> v: validate extract output
p ->> s: pack, queue for search
p ->> v: validate pack output
s ->> v: confirm search results
Or, in words:
After fetching a page, we should have output in S3: a JSON metadata document as well as a `.raw` file. We should be able to confirm the existence of the metadata file and its contents (e.g. is it valid JSON? Does it have the fields we expect, like `content-length` or `content-type`?), as well as the existence of the `.raw` file and its size (e.g. at the least, is it non-zero? Is it within 1% of the `content-length` reported in the metadata?).
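As a sketch of what that first check might look like (assuming boto3, a hypothetical `<base>.json` / `<base>.raw` key layout, and lowercase field names in the metadata, none of which are settled), the fetch validation could be roughly:

```python
import json

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

# Fields we expect the fetch step to record; adjust to the real metadata schema.
REQUIRED_FIELDS = {"content-length", "content-type"}


def validate_fetch(bucket: str, key_base: str) -> list[str]:
    """Return a list of problems with a fetched page's S3 artifacts (empty list = OK)."""
    problems = []

    # 1. The JSON metadata document must exist and be well-formed.
    try:
        body = s3.get_object(Bucket=bucket, Key=f"{key_base}.json")["Body"].read()
        metadata = json.loads(body)
    except ClientError:
        return [f"missing metadata object {key_base}.json"]
    except json.JSONDecodeError:
        return [f"metadata {key_base}.json is not valid JSON"]

    missing = REQUIRED_FIELDS - metadata.keys()
    if missing:
        problems.append(f"metadata missing fields: {sorted(missing)}")

    # 2. The .raw object must exist, be non-empty, and roughly match content-length.
    try:
        raw_size = s3.head_object(Bucket=bucket, Key=f"{key_base}.raw")["ContentLength"]
    except ClientError:
        return problems + [f"missing raw object {key_base}.raw"]

    if raw_size == 0:
        problems.append("raw object has zero size")
    expected = int(metadata.get("content-length", 0))
    if expected and abs(raw_size - expected) > 0.01 * expected:
        problems.append(
            f"raw size {raw_size} is more than 1% off the reported content-length {expected}"
        )

    return problems
```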
After extracting content, we should have a JSON document with a `content` field, and that field should be of non-zero length. Ultimately, we might have more robust checks.
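A minimal sketch of the extract check, assuming the extract artifact is a JSON blob we already have in hand:

```python
import json


def validate_extract(extract_json: bytes) -> list[str]:
    """Check that an extract artifact has a non-empty `content` field."""
    try:
        doc = json.loads(extract_json)
    except json.JSONDecodeError:
        return ["extract output is not valid JSON"]

    content = doc.get("content")
    if not isinstance(content, str) or len(content) == 0:
        return ["extract output has no non-empty `content` field"]

    # More robust checks (language detection, boilerplate ratio, ...) can slot in here later.
    return []
```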
Once we have packed content, there should be a metadata/manifest file about everything packed and an `sqlite` database in the `serve` bucket. We can confirm at this point that the DB is valid, and that all of the URLs in the manifest are present.
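Assuming we pull the manifest and database down locally first, and guessing at a `documents(url)` table (the real schema will differ), the pack check might look like:

```python
import json
import sqlite3


def validate_pack(manifest_path: str, db_path: str) -> list[str]:
    """Confirm the packed sqlite database is intact and covers every URL in the manifest."""
    problems = []

    with open(manifest_path) as f:
        manifest = json.load(f)  # assumed layout: {"urls": [...]}

    conn = sqlite3.connect(db_path)
    try:
        # sqlite's built-in check: returns a single row containing "ok" when the file is sound.
        (status,) = conn.execute("PRAGMA integrity_check").fetchone()
        if status != "ok":
            problems.append(f"sqlite integrity_check failed: {status}")

        # Hypothetical table/column names.
        packed = {row[0] for row in conn.execute("SELECT url FROM documents")}
    finally:
        conn.close()

    missing = set(manifest["urls"]) - packed
    if missing:
        problems.append(f"{len(missing)} manifest URLs are missing from the database")

    return problems
```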
Finally, we should be able to pick some words from the `extract` files and run a query against the live search component for the database. We expect at least one result. This confirms the search is working, serving content, and that the particular site was completely indexed/updated.
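A sketch of that end-to-end check, assuming an HTTP search endpoint that takes a `q` parameter and returns JSON with a `results` list (the real serve interface may differ):

```python
import random

import requests


def validate_search(extract_text: str, search_url: str) -> list[str]:
    """Pick a few words from extracted content and expect the live search to return hits."""
    words = [w for w in extract_text.split() if len(w) > 4]
    if not words:
        return ["no usable words in the extract to query with"]

    query = " ".join(random.sample(words, k=min(3, len(words))))
    resp = requests.get(search_url, params={"q": query}, timeout=10)
    if resp.status_code != 200:
        return [f"search returned HTTP {resp.status_code} for query {query!r}"]

    if len(resp.json().get("results", [])) < 1:
        return [f"search returned no results for query {query!r}"]

    return []
```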
The `validator` can become a living component of the system. It should not be an external process that is run only when we are testing; it should instead run always, as part of the production stack. In this way, every application action is validated constantly. If the `validator` ever fails/throws an error, we know something very, very bad has happened. (Cats and dogs, living together... mass hysteria!)
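Gluing the sketches above together, the always-on validator is then just a loop over the queue. The message shape and the `queue.receive()`/`queue.ack()` client below are placeholders for whatever the stack actually uses:

```python
import json
import logging

logger = logging.getLogger("validator")

# Hypothetical dispatch table; the validate_* functions are the sketches above,
# and the message field names are placeholders.
HANDLERS = {
    "validate_fetch":   lambda body: validate_fetch(body["bucket"], body["key"]),
    "validate_extract": lambda body: validate_extract(body["payload"]),
    "validate_pack":    lambda body: validate_pack(body["manifest"], body["db"]),
    "validate_serve":   lambda body: validate_search(body["sample_text"], body["search_url"]),
}


def run_validator(queue):
    """Long-running loop: every pipeline stage enqueues a validation message after it acts."""
    while True:
        message = queue.receive()  # assumed queue client with receive/ack
        body = json.loads(message.body)
        handler = HANDLERS.get(body["action"])
        if handler is None:
            logger.error("unknown validation action %r", body["action"])
        else:
            problems = handler(body)
            if problems:
                # A failure here means the pipeline itself is broken, not just one test case.
                logger.error("validation failed for %s: %s", body["action"], problems)
        queue.ack(message)
```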
It is, of course, possible to then run test scenarios against the stack. That is, in CI/CD, we can have a test site that provides "easy path" and "diabolical" test cases. Our system should, then, always pass all our test cases (e.g. sites to crawl). And, in production, we will be able to see when we venture off the map, and into the region where there be 🐉 ...
Security Considerations

Required per CM-4.

There are no security concerns; the `validator` runs within the stack, and communicates only with the queue. Having continuous validation of inputs/outputs is a security-enhancing feature.
Process checklist

- Has a clear story statement
- Can reasonably be done in a few days (otherwise, split this up!)
- If there's UI...
  - Screen reader - Listen to the experience with a screen reader extension; ensure the information is presented in order.
  - Keyboard navigation - Run through acceptance criteria with keyboard tabs; ensure it works.
  - Text scaling - Adjust the viewport to 1280 pixels wide and zoom to 200%; ensure everything renders as expected. Document 400% zoom issues with USWDS if appropriate.
The first step is complete: the app is included (locally), and it responds to a `validate_fetch`. However, rules are not implemented, nor are the other services.