Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ADR for a query option to filter get_urls #88

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 24 additions & 0 deletions api/TimeAddressableMediaStore.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -1290,6 +1290,18 @@ paths:
schema:
default: false
type: boolean
- name: accept_get_urls
in: query
description: |
A comma separated list of labels of flow segment `get_urls` to include in the response.
Omitting `accept_get_urls` will result in all `get_urls` returned.
A 'default' label selects URLs with a 'default' or no label.
An empty `accept_get_urls` results in an empty or no `get_urls` in the response.
Without `get_urls`, the response from the service could be substantially faster if it is not required to
generate a large number of pre-signed URLs for example.
schema:
type: string
pattern: ^[^,]*(,[^,]*)*$
- $ref: '#/components/parameters/trait_resource_paged_key'
- $ref: '#/components/parameters/trait_paged_limit'
responses:
Expand Down Expand Up @@ -1380,6 +1392,18 @@ paths:
schema:
default: false
type: boolean
- name: accept_get_urls
in: query
description: |
A comma separated list of labels of flow segment `get_urls` to include in the response.
Omitting `accept_get_urls` will result in all `get_urls` returned.
A 'default' label selects URLs with a 'default' or no label.
An empty `accept_get_urls` results in an empty or no `get_urls` in the response.
Without `get_urls`, the response from the service could be substantially faster if it is not required to
generate a large number of pre-signed URLs for example.
schema:
type: string
pattern: ^[^,]*(,[^,]*)*$
- $ref: '#/components/parameters/trait_resource_paged_key'
- $ref: '#/components/parameters/trait_paged_limit'
responses:
Expand Down
2 changes: 2 additions & 0 deletions docs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,8 @@ For more information on how we use ADRs, see [here](./adr/README.md).
| [0018](./adr/0018-restrict-direct-source-modification.md) | Restrict direct Source modification |
| [0019](./adr/0019-consolidate-modified-updated-terms.md) | Rename `modified_by` properties in Source and Flow schemas to `updated_by` |
| [0020](./adr/0020-version-signalling.md) | Improving the signalling of the supported API version in implementations |
| [0021](./adr/0021-storage-label-format.md) | Guidance for use of `get_urls` labels on Flow Segments |
| [0023](./adr/0023-filter-segment-get-urls.md) | Add query option to filter Flow Segment `get_urls` by `label` |

\* Note: ADR 0004a was the unintended result of a number clash in the early development of TAMS which wasn't caught before publication

Expand Down
78 changes: 78 additions & 0 deletions docs/adr/0023-filter-segment-get-urls.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
---
status: "proposed"
---
# Query Option to Filter get_urls

## Context and Problem Statement

Each Flow Segment returned by the `/flows/<flow id>/segment` endpoint includes a `get_urls` property that may include (labelled) URLs for fetching media objects.
The calculation of pre-signed URLs for example can result in an not insignificant increased request time.
Clients do not need the `get_urls` if they don't intend to fetch the media objects.
This ADR proposes adding a query option to filter the URLs that need to be included by a TAMS.

It was found that calculating a pre-signed URL for fetching a media object takes around 0.35 milliseconds when using the Python [boto3](https://github.com/boto/boto3) library.
That means a request to get 5000 Flow Segments will include around 1.75 seconds overhead to calculate the pre-signed URLs.
This was observed in a request to a TAMS which took around 4 seconds to return Flow Segments with pre-signed URLs and only around 2 seconds to return the same Flow Segments without pre-signed URLs.

The [boto3](https://github.com/boto/boto3) method to generate pre-signed URLs could also be optimised to reduce the time taken.
It was found that the older deprecated [boto](https://github.com/boto/boto) library took around 0.14 milliseconds to calculate a pre-signed URL, a 60% speed reduction when compared to [boto3](https://github.com/boto/boto3).
There are likely even more optimisations that could be made.

The pre-signed URL calculation may use a highly optimised implementation that avoids the large time overhead.
This would still result in larger response sizes because the pre-signed URLs tend to be quite large as they include signatures, keys and tokens.

It is difficult to predict what the expected number of flow segments will be in practice.
If it is low in all cases then including or not including pre-signed URLs doesn't make much difference.
If it is unknown then having an option to avoid the pre-signed calculation overhead may be useful in some cases.

## Considered Options

* Option 1: Always return all `get_urls`
* Option 2: Add a query option to return none or all `get_urls`
* Option 3: Add a simple query option to filter `get_urls` based on the `label`
* Option 4: Add more complex query option(s) to filter `get_urls`

## Decision Outcome

Chosen option: "Option 3: Add a simple query option to filter `get_urls` based on the `label`", because this covers the requirement to disable calculation of pre-signed URLs and at the same time allows other URLs to be retained.

### Implementation

Implemented in [PR #88](https://github.com/bbc/tams/pull/88).

## Pros and Cons of the Options

### Option 1: Always return all `get_urls`

This is the current state.

* Bad, because TAMS is doing work to calculate pre-signed URLs even though clients may not need them

### Option 2: Add a query option to return none or all `get_urls`

This adds a boolean option named `include_get_urls` that if set to `false` results in no `get_urls` in the response and therefore no pre-signed URLs are calculated by TAMS.

* Good, because it allows clients to indicate that `get_urls` are not required
* Bad, because some TAMS implementations may provide different types of URLs and a client does not have the option to filter out the ones that cause increased request time whilst retaining ones that don't

### Option 3: Add a simple query option to filter `get_urls` based on the `label`

This extends [Option 2](#option-2-add-a-query-option-to-return-none-or-all-get_urls) to add a (comma-separated) list option named `accept_get_urls` that specifies the `labels` associated with URLS to include in the response.
Omitting `accept_get_urls` will result in all URLs in the response.
Setting `accept_get_urls` to an empty string will result in no URLs in the response.
If `accept_get_urls` includes a `default` label then URLs with label `default` or no label are included in the response.

* Good, because it allows clients to indicate that `get_urls` are not required
* Good, because clients can retain URLs that don't cause increased request times
* Neutral, because clients may want more complex filters based on what is available

### Option 4: Add more complex query option(s) to filter `get_urls`

This changes [Option 3](#option-3-add-a-simple-query-option-to-filter-get_urls-based-on-the-label) to allow more complex filtering based on what URLs are available.
E.g. include a URL for direct access to the object if available and otherwise a pre-signed URL.

* Good, because it allows clients to indicate that `get_urls` are not required
* Good, because clients can retain URLs that don't cause increased request times
* Neutral, because clients can use more complex filters based on what is available
* Neutral, because it isn't clear whether there is a requirement for more complex filters
* Neutral, because [0021](./0021-storage-label-format.md) thus far only proposes an App note for guidance on the use of `get_urls` labels on flow segments
Loading