diff --git a/teps/0xxx-tekton-results-db-index.md b/teps/0xxx-tekton-results-db-index.md
new file mode 100644
index 000000000..1bd96f1c4
--- /dev/null
+++ b/teps/0xxx-tekton-results-db-index.md
@@ -0,0 +1,249 @@
+---
+status: proposed
+title: Configurable Indexing on Tekton Results DB
+creation-date: '2024-09-23'
+last-updated: '2024-09-23'
+authors:
+- '@khrm'
+collaborators: []
+---
+
+# TEP-0xxx: Tekton Results: Configurable Indexing on Tekton Results DB
+
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+  - [Use Cases](#use-cases)
+  - [Requirements](#requirements)
+- [Proposal](#proposal)
+  - [Notes and Caveats](#notes-and-caveats)
+- [Design Details](#design-details)
+- [Design Evaluation](#design-evaluation)
+  - [Reusability](#reusability)
+  - [Simplicity](#simplicity)
+  - [Flexibility](#flexibility)
+  - [User Experience](#user-experience)
+  - [Performance](#performance)
+  - [Risks and Mitigations](#risks-and-mitigations)
+  - [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+- [Implementation Plan](#implementation-plan)
+  - [Test Plan](#test-plan)
+  - [Infrastructure Needed](#infrastructure-needed)
+  - [Upgrade and Migration Strategy](#upgrade-and-migration-strategy)
+  - [Implementation Pull Requests](#implementation-pull-requests)
+- [References](#references)
+
+## Summary
+Tekton Results stores PipelineRuns, TaskRuns, and Events as Records. Each Record is linked to its parent object (a PipelineRun or TaskRun) through a row in the `results` table.
+At present, no indexes are created for querying these Records and Results. This TEP proposes a standard way for Tekton Results to create such indexes.
+
+## Motivation
+Different platforms use different annotations and labels to filter records, so Tekton Results cannot ship a predefined set of indexes. The indexes should instead be configurable.
+As a single row in the `results` table relates to many rows in the `records` table, we should use the `results` table to make queries faster.
+
+A `results` table row contains annotations and record summary annotations to integrate with different platforms. A platform communicates which labels/annotations to store by setting JSON values on the annotation `results.tekton.dev/resultAnnotations` or `results.tekton.dev/recordSummaryAnnotations`. We store these JSON values as JSONB in the annotation or recordSummaryAnnotation column.
+
+For example, an integrator, say `workflow`, has `components` and `application`.
+
+### Goals
+- We should be able to list or get PipelineRuns faster by leveraging DB indexing.
+- Admins should be able to specify up front, via configuration, which indexes Results creates.
+
+### Non-Goals
+
+### Use Cases
+
+- A platform should be able to query faster or resolve bottlenecks by indexing fields in annotations/summary annotations.
+
+### Requirements
+
+- JSONB values in the `results` table should be indexed according to the configuration specified by the Tekton Results admin.
+
+## Proposal
+
+The events from PipelineRuns and TaskRuns should be archived, and end users should be able to access them via the API.
+
+### Notes and Caveats
+
+## Design Details
+Let's say a platform `workflow` creates the following labels on Runs:
+`workflow-foo-service/application`
+`workflow.bar-service.io/type`
+`workflow.foo-service/component`
+`workflow.bar-service/scenario`
+
+These labels and their values should also be passed as results annotations, so that the platform can communicate which values to store in the annotations/summary annotations columns.
+Ref: https://github.com/tektoncd/results/pull/426/files
+```
+apiVersion: tekton.dev/v1
+kind: PipelineRun
+metadata:
+  generateName: hello-run-
+  annotations:
+    results.tekton.dev/resultAnnotations: |-
+      {"workflow-foo-service/application": "scanner", "workflow.bar-service.io/type": "test", "workflow.foo-service/component": "scanner", "workflow.bar-service/scenario": "contract"}
+```
+
+Tekton Results can then index all four of these values.
+
+Another advantage of storing these fields: if a platform generates `PipelineRun`s and needs to display values from them, we don't need to filter on `PipelineRun`.
+We can also store a limited number of other relevant fields from the Run status or spec in the annotations column of the `results` table.
+
+Also, even without indexes, a query on the `results` table outperforms the same query on the `records` table, because of the one-to-many relation and because the `records` table has far more rows and more data per column.
+
+In certain production environments, we have observed more than half a million rows in `records` for only about fifty thousand rows in `results`, with one PipelineRun having 9 TaskRuns.
+A sample of this:
+```
+SELECT count(*) FROM "records" WHERE parent = 'scanner-build';
+-- 892455
+SELECT count(*) FROM "results" WHERE parent = 'scanner-build';
+-- 59101
+```
+Unless the UI needs to show TaskRuns, it should query only the `results` table.
+
+## Design Evaluation
+
+### Reusability
+
+### Simplicity
+
+### Flexibility
+
+### Conformance
+
+### User Experience
+
+### Performance
+
+### Risks and Mitigations
+
+### Drawbacks
+
+## Alternatives
+
+## Implementation Plan
+
+### Test Plan
+
+### Infrastructure Needed
+
+### Upgrade and Migration Strategy
+
+### Implementation Pull Requests
+
+## References