
First pass at adding support for blob time-to-live. #1

Merged 7 commits into ismrmrd:main on Feb 8, 2022

Conversation

kristofferknudsen
Contributor

I've made a pass at supporting blob time-to-live. I've made a number of choices throughout, which would very much benefit from a second opinion.

I am treating ttl as a special tag. If it is set when a blob is created, the blob will be automatically deleted by the server once the specified duration has passed. Currently I only require that the duration parses with time.ParseDuration (which accepts values like 48h and 2h30m), but this may be too lax: the HackMD documentation specifies an interval in integer seconds. I opted for ParseDuration because it was at least as easy to write, while being nicer for the user.
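For illustration, a minimal sketch of how a ttl value could be parsed and turned into an expiration timestamp, assuming expiration is stored in Unix milliseconds like created_at (the helper name computeExpiresAt is hypothetical, not code from this PR):

import "time"

// Hypothetical sketch: parse a ttl tag value and derive an expiration
// timestamp in Unix milliseconds, mirroring how created_at is stored.
func computeExpiresAt(ttl string, now time.Time) (int64, error) {
	d, err := time.ParseDuration(ttl) // accepts values like "48h" or "2h30m"
	if err != nil {
		return 0, err
	}
	return now.Add(d).UnixMilli(), nil
}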

Regardless, ttl is treated as a special tag, the value is validated, and an expires_at attribute (mirroring created_at) is now part of the metadata in the database. It's currently stored as a nullable int64 - int64 to mirror created_at, and nullable because it is optional. I note that an index covering expiration might be in order at some point.
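As a rough sketch of the relevant model fields (field names other than the expires_at column itself are assumptions based on the description above):

import "database/sql"

// Sketch of the metadata model fields discussed here; only the expires_at
// column is new, and it is NULL when the blob was created without a ttl.
type blobMetadata struct {
	CreatedAt int64         // Unix milliseconds
	ExpiresAt sql.NullInt64 // Unix milliseconds, nullable because ttl is optional
}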

Garbage collection was already in place for staged metadata, and it handles expired blobs with only a minor change to the query in GetPageOfExpiredBlobMetadata.

When metadata is presented to the client, a metadata object with an expiration will have an expires attribute. I chose expires to mirror the HTTP header of the same name - most of the meta attributes are very header-like.

I've renamed a few things, primarily because they were dealing exclusively with staged metadata before, but now deal with metadata in general. The interface changes meant I had to regenerate the mocks. The tests are all running for me, but I don't claim deep insight into that particular operation.

api/create.go Outdated
@@ -150,8 +163,8 @@ func CombineTagValidators(validators ...TagValidator) TagValidator {
}
}

-func ValidateAndStoreOptionalSystemTag(tagName string, tagValues []string, field **string) error {
-	if err := systemTagValidator(tagName, tagValues); err != nil {
+func ValidateAndStoreOptionalSystemTag(tagName string, tagValues []string, field **string, validator func (string, []string) error) error {
Contributor

You can use the TagValidator type instead of spelling out the func signature.

Contributor Author

That does seem clearer. I'll sort it out.
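For reference, a sketch of what the change could look like, assuming TagValidator is declared as func(tagName string, tagValues []string) error (the store-the-value body below is illustrative, not the PR's actual code):

// Assumed declaration, matching the signatures used in this file:
type TagValidator func(tagName string, tagValues []string) error

// Sketch: accept the named TagValidator type instead of repeating the
// func signature in the parameter list.
func ValidateAndStoreOptionalSystemTag(tagName string, tagValues []string, field **string, validator TagValidator) error {
	if err := validator(tagName, tagValues); err != nil {
		return err
	}
	if len(tagValues) > 0 {
		value := tagValues[0]
		*field = &value
	}
	return nil
}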

core/types.go Outdated
@@ -4,6 +4,7 @@ package core

 import (
 	"context"
+	"database/sql"
Contributor

It seems unnatural to take a sql dependency here.
I think instead you could move ExpirationToTime to the database package, since that is the only place it is called.

Contributor Author

It went into types to mirror UnixTimeMsToTime, but having it in database makes sense. I'll move it over.
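A sketch of what ExpirationToTime might look like once it lives in the database package, mirroring UnixTimeMsToTime (the body is an assumption, not the committed code):

import (
	"database/sql"
	"time"
)

// Sketch: convert a nullable expires_at (Unix milliseconds) to *time.Time,
// returning nil when the blob has no expiration.
func ExpirationToTime(expiresAt sql.NullInt64) *time.Time {
	if !expiresAt.Valid {
		return nil
	}
	t := time.UnixMilli(expiresAt.Int64)
	return &t
}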

 rows, err := r.db.
 	Model(blobMetadata{}).
 	Select(`subject, id`).
-	Where(`staged = ? AND created_at < ?`, true, olderThan.UnixMilli()).
+	Where(`(staged = ? AND created_at < ?) OR (expires_at < ?)`, true, olderThan.UnixMilli(), olderThan.UnixMilli()).
Contributor

Some databases don't optimize some queries with ORs very well, and fall back to table scans. Have you checked the postgres query plan for this? Another approach is to do a UNION ALL with two queries.

Contributor Author

I had not checked, but I've had a look:

EXPLAIN ANALYZE SELECT subject, id FROM "blob_metadata" WHERE (staged = true AND created_at < 1637748781707) OR expires_at < 1637748781707 ORDER BY created_at ASC LIMIT 200;
QUERY PLAN                                                                                                           
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Limit  (cost=4.29..4.29 rows=1 width=42) (actual time=0.047..0.048 rows=0 loops=1)                                   
  ->  Sort  (cost=4.29..4.29 rows=1 width=42) (actual time=0.047..0.047 rows=0 loops=1)                              
        Sort Key: created_at                                                                                         
        Sort Method: quicksort  Memory: 25kB                                                                         
        ->  Seq Scan on blob_metadata  (cost=0.00..4.28 rows=1 width=42) (actual time=0.028..0.028 rows=0 loops=1)   
              Filter: ((staged AND (created_at < '1637748781707'::bigint)) OR (expires_at < '1637748781707'::bigint))
              Rows Removed by Filter: 152                                                                            
Planning Time: 0.075 ms                                                                                              
Execution Time: 0.060 ms                                                                                             
(9 rows)
EXPLAIN ANALYZE SELECT subject, id FROM "blob_metadata" WHERE staged = true AND created_at < 1637750129345 UNION ALL SELECT subject, id FROM "blob_metadata" WHERE expires_at < 1637750129345
QUERY PLAN                                                                                                                  
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Append  (cost=0.00..7.83 rows=2 width=34) (actual time=0.030..0.030 rows=0 loops=1)                                         
  ->  Seq Scan on blob_metadata  (cost=0.00..3.90 rows=1 width=34) (actual time=0.019..0.019 rows=0 loops=1)                
        Filter: (staged AND (created_at < '1637750129345'::bigint))                                                         
        Rows Removed by Filter: 152                                                                                         
  ->  Seq Scan on blob_metadata blob_metadata_1  (cost=0.00..3.90 rows=1 width=34) (actual time=0.011..0.011 rows=0 loops=1)
        Filter: (expires_at < '1637750129345'::bigint)                                                                      
        Rows Removed by Filter: 152                                                                                         
Planning Time: 0.075 ms                                                                                                     
Execution Time: 0.040 ms                                                                                                    
(9 rows)

It looks like the UNION query is indeed more efficient. I'm not sure it'll matter in any appreciable way - both queries are pretty light as far as I can tell - but I'm happy to go with the UNION ALL approach.

Contributor

The remaining problem is that you don't have any kind of index to make the query for expired records efficient. I have ~ 10 million rows in the blob_metadata table.

This query uses a filtered index:

EXPLAIN ANALYZE 
SELECT subject, Id 
FROM blob_metadata
WHERE staged = true and created_at < 1637786288230

QUERY PLAN
Index Scan using staged on blob_metadata  (cost=0.12..8.14 rows=1 width=35) (actual time=0.017..0.017 rows=0 loops=1)
  Index Cond: (created_at < '1637786288230'::bigint)
Planning Time: 0.144 ms
Execution Time: 0.040 ms

This one does not:

EXPLAIN ANALYZE
SELECT subject, Id
FROM blob_metadata
WHERE expires_at < 1637786288230

Gather  (cost=1000.00..194822.96 rows=1 width=35) (actual time=580.996..584.048 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on blob_metadata  (cost=0.00..193822.86 rows=1 width=35) (actual time=542.972..542.973 rows=0 loops=3)
        Filter: (expires_at < '1637786288230'::bigint)
        Rows Removed by Filter: 3489451
Planning Time: 0.098 ms
JIT:
  Functions: 12
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 2.476 ms, Inlining 0.000 ms, Optimization 2.073 ms, Emission 21.643 ms, Total 26.191 ms
Execution Time: 585.222 ms

But I can fix that with this index:

CREATE INDEX expires on blob_metadata 
(
    expires_at
)
WHERE expires_at IS NOT NULL

Now the same query is handled much more efficiently:

EXPLAIN ANALYZE 
SELECT subject, Id 
FROM blob_metadata
WHERE staged = true and created_at < 1637786288230

QUERY PLAN
Index Scan using expires on blob_metadata  (cost=0.12..8.14 rows=1 width=35) (actual time=0.007..0.028 rows=0 loops=1)
  Index Cond: (expires_at < '1637786288230'::bigint)
Planning Time: 0.125 ms
Execution Time: 0.126 ms

Now we can compare the two original queries (note that you don't need an ORDER BY).

EXPLAIN ANALYZE 
SELECT subject, Id 
FROM blob_metadata
WHERE staged = true AND created_at < 1637786288230 OR expires_at < 1637786288230

QUERY PLAN
Bitmap Heap Scan on blob_metadata  (cost=8.27..12.28 rows=1 width=35) (actual time=0.019..0.021 rows=0 loops=1)
  Recheck Cond: (((created_at < '1637786288230'::bigint) AND staged) OR (expires_at < '1637786288230'::bigint))
  ->  BitmapOr  (cost=8.27..8.27 rows=1 width=0) (actual time=0.016..0.017 rows=0 loops=1)
        ->  Bitmap Index Scan on staged  (cost=0.00..4.13 rows=1 width=0) (actual time=0.014..0.014 rows=0 loops=1)
              Index Cond: (created_at < '1637786288230'::bigint)
        ->  Bitmap Index Scan on expires  (cost=0.00..4.13 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=1)
              Index Cond: (expires_at < '1637786288230'::bigint)
Planning Time: 0.300 ms
Execution Time: 0.124 ms

And

EXPLAIN ANALYZE 
SELECT subject, Id 
FROM blob_metadata
WHERE staged = true AND created_at < 1637786288230
UNION ALL 
SELECT subject, Id 
FROM blob_metadata
WHERE expires_at < 1637786288230

QUERY PLAN
Append  (cost=0.12..16.31 rows=2 width=35) (actual time=0.689..0.690 rows=0 loops=1)
  ->  Index Scan using staged on blob_metadata  (cost=0.12..8.14 rows=1 width=35) (actual time=0.676..0.677 rows=0 loops=1)
        Index Cond: (created_at < '1637786288230'::bigint)
  ->  Index Scan using expires on blob_metadata blob_metadata_1  (cost=0.12..8.14 rows=1 width=35) (actual time=0.009..0.009 rows=0 loops=1)
        Index Cond: (expires_at < '1637786288230'::bigint)
Planning Time: 0.551 ms
Execution Time: 1.016 ms

Looks like the OR is more efficient in the end.

Contributor Author
@kristofferknudsen Nov 26, 2021

That's actually pretty damn interesting!

I've reverted to the OR query in the latest commit, and I've attempted to add an index:

	ExpiresAt   sql.NullInt64  `gorm:"index:expires,where:expires_at is not null"`

It looks fine in the list of indexes (CREATE INDEX expires ON public.blob_metadata USING btree (expires_at) WHERE (expires_at IS NOT NULL)), but it doesn't appear to affect the query:

EXPLAIN ANALYZE SELECT subject, id FROM "blob_metadata" WHERE (staged = true AND created_at < 1637922750201) OR expires_at < 1637922750201 LIMIT 200
QUERY PLAN                                                                                                     
───────────────────────────────────────────────────────────────────────────────────────────────────────────────
Limit  (cost=0.00..5.56 rows=1 width=34) (actual time=0.154..0.156 rows=0 loops=1)                             
  ->  Seq Scan on blob_metadata  (cost=0.00..5.56 rows=1 width=34) (actual time=0.152..0.153 rows=0 loops=1)   
        Filter: ((staged AND (created_at < '1637922750201'::bigint)) OR (expires_at < '1637922750201'::bigint))
        Rows Removed by Filter: 190                                                                            
Planning Time: 0.248 ms                                                                                        
Execution Time: 0.189 ms                                                                                       
(6 rows)

I'm still getting a sequential scan; the index doesn't seem to be involved. Do I just need more rows to make the index the faster option, or did I miss something?

Edit: I added some rows to the table, and the index is now being used.

@johnstairs
Contributor

I think that we should not be returning a blob after its expiration time - regardless of whether the GC has cleaned it up or not.

@kristofferknudsen
Contributor Author

I agree in the sense that we should never reveal the metadata for an expired blob - I'm okay with a grace period for the data itself. It would give very late searches a chance to use the results.

In this vein, I think we should respond 400 on a create with a negative ttl.

@johnstairs
Contributor

You will need to increment the schema version in order to signal that the database schema needs migration. I suggest something like this:

const (
	schemaVersionInitial        = 1
	schemaVersionAddExpiresAt   = 2
	schemaVersionLatest         = schemaVersionAddExpiresAt
	schemaVersionCompleteStatus = "complete"
)
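A hypothetical sketch of how the bumped version could gate the migration (the repository's actual migration mechanism may differ; the SQL mirrors the partial index discussed above):

// Hypothetical: run the expires_at migration only when upgrading from an
// older schema version, then record the new version.
if currentVersion < schemaVersionAddExpiresAt {
	if err := db.Exec(`ALTER TABLE blob_metadata ADD COLUMN expires_at bigint`).Error; err != nil {
		return err
	}
	if err := db.Exec(`CREATE INDEX expires ON blob_metadata (expires_at) WHERE expires_at IS NOT NULL`).Error; err != nil {
		return err
	}
	currentVersion = schemaVersionAddExpiresAt
}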

@johnstairs
Contributor

> I agree in the sense that we should never reveal the metadata for an expired blob - I'm okay with a grace period for the data itself. It would give very late searches a chance to use the results.

Expired blobs are still showing up in searches, in get latest, and in direct metadata reads.

@kristofferknudsen
Contributor Author

> I agree in the sense that we should never reveal the metadata for an expired blob - I'm okay with a grace period for the data itself. It would give very late searches a chance to use the results.
>
> Expired blobs are still showing up in searches, in get latest, and in direct metadata reads.

I've been working on getting them filtered out - let me commit what I have at the moment.

I've gone with filtering the search and blob queries based on expiration. Get- and SearchBlobMetadata now take an expiresAfter time argument (usually now), which adds a filter to the queries and some arguments to the database calls. It's not ideal, in the sense that expiration is now something a lot of code has to care about (at least a little).
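A minimal sketch of the kind of filter this adds, assuming gorm v2 and the expires_at column described earlier (the helper name withNotExpired is hypothetical; the committed queries may be structured differently):

import (
	"time"

	"gorm.io/gorm"
)

// Hypothetical helper: restrict a metadata query to blobs that have not
// expired as of expiresAfter; blobs without a ttl are always visible.
func withNotExpired(query *gorm.DB, expiresAfter time.Time) *gorm.DB {
	return query.Where(`expires_at IS NULL OR expires_at > ?`, expiresAfter.UnixMilli())
}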

I considered just deleting the blobs when they expired. It would keep the queries simpler, and generally be fine. There would be some timer-juggling in the background - I don't think it would be less code in the end - but the heart of the code base would not need to worry about expiration. We would, however, lose the ability to serve up data past its expiration. I don't think this is vital, but I like having a grace period where late clients can get at the results even though they have technically expired.

I decided that latest - even though it is a data endpoint - would not observe a grace period. I think it would surprise a lot of clients to find that the recent blob they created with a super-short ttl is returned by the latest endpoint for a long time after its expiry.

Kristoffer Langeland Knudsen added 2 commits November 26, 2021 10:02
Expired data can still be reached, if you already know where to find it. This gives late searches a grace period where they can still get the data.

Added some tests for expired blob visibility.

Regenerated mocks - the interface changed.
Reverted GC query to use OR instead of UNION ALL.

Added an index to expires_at.
@kristofferknudsen
Contributor Author

I have reverted the default paths for the sqlite database and the blobs to the _data folder in the current directory. I didn't intend to commit it here, but since it's in: I don't think we should be writing anything to the file system root by default. It's fine for the docker container, but it should be configured explicitly in the Dockerfile (which I have not done).

@johnstairs
Contributor

> I have reverted the default paths for the sqlite database and the blobs to the _data folder in the current directory. I didn't intend to commit it here, but since it's in: I don't think we should be writing anything to the file system root by default. It's fine for the docker container, but it should be configured explicitly in the Dockerfile (which I have not done).

I agree that's better. The reason I did this was to make the default configuration work properly in a container. But I see that was at the expense of the local go run use case. Setting default environment variables in the dockerfile is a better approach.

@johnstairs
Contributor

Apologies for the delay, @kristofferknudsen, I will get on this today.

@johnstairs
Contributor

If I (mistakenly) add a ttl parameter to a search:
GET http://localhost:3333/v1/blobs?subject=123&session=mysession&name=NoiseCovariance&ttl=2m
ttl is interpreted as a custom tag that could have been set, so I get an empty result set. I think this should be an error.
Also, on create, I wonder if the parameter should be _ttl to indicate that it is not a tag, similar to how we have _at and _ct for searches.

@johnstairs
Contributor

We should document this new capability on the README.md

Contributor
@johnstairs left a comment

Looks good overall. A couple of fit and finish items. Sorry for taking so long to review.

api/create.go Outdated

for _, t := range tagValues {
	if _, err := time.ParseDuration(t); err != nil {
		return err
Contributor

I suggest we add the parameter name to the error. Right now, we return this message:
"message": "time: invalid duration \"never\""

Contributor Author

No problem, good errors are worth their weight in gold.
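One way to include the parameter name when the duration fails to parse, roughly (the exact wording in the final commit may differ):

// Wrap the parse error so the response names the offending parameter.
if _, err := time.ParseDuration(t); err != nil {
	return fmt.Errorf("invalid %s value %q: %w", tagName, t, err)
}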

r.Get("/{combined-id}", handler.MakeBlobEndpoint(handler.BlobMetadataResponse))
r.Get("/{combined-id}/data", handler.MakeBlobEndpoint(handler.BlobDataResponse))
r.Get("/{combined-id}", handler.MakeBlobEndpoint(handler.BlobMetadataResponse, 0 * time.Second))
r.Get("/{combined-id}/data", handler.MakeBlobEndpoint(handler.BlobDataResponse, 30 * time.Minute))
Contributor

I suggest we add the expiration datetime to the response headers at the data endpoint

Contributor Author

Of course! I don't know why I forgot that.
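A sketch of surfacing the expiration on the data response, assuming the handler has an http.ResponseWriter and the blob's expiration as a *time.Time (the variable names are illustrative):

// Hypothetical: expose the expiration as a standard Expires header when set.
if expiresAt != nil {
	w.Header().Set("Expires", expiresAt.UTC().Format(http.TimeFormat))
}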

e2e_test.go Outdated
require.Nil(t, err)

almostEqual := func (a time.Time, b time.Time) bool {
	return math.Abs(a.Sub(b).Seconds()) < 5e-2
Contributor

I would be more tolerant here, since tests that are based on timing tend to fail randomly, and the precision here is much higher than is actually required for a 10m ttl.

Contributor Author

I've updated this to half a second, but even that seems a bit arbitrary. The fuzz is there to account for the time it takes the server to make the expiration record in the database and get back to the client. The test keeps an eye on the times to make sure that the actual expiration is what the client expects, but we can have several seconds of fuzz without any issues.
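Roughly what the loosened check looks like with a half-second tolerance (the committed version may differ slightly):

// Allow up to half a second of skew between the expected and stored expiration.
almostEqual := func(a, b time.Time) bool {
	return math.Abs(a.Sub(b).Seconds()) < 0.5
}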

e2e_test.go Outdated
@@ -96,6 +97,8 @@ func TestInvalidTags(t *testing.T) {
{"tag name that is too long", fmt.Sprintf("subject=sub&%s=abc", strings.Repeat("a", 65))},
{"Location", "subject=s&location=l"},
{"Last-Modified", "subject=s&lastModified=2021-10-18T16:56:15.693Z"},
{"ttl", "not-an-interval"},
Contributor

"ttl" is the (sub) test case name, "not-an-interval" is the query string. Which means that this test is failing because the entire query string is invalid, not because the ttl is not recognized.

api/create.go Outdated
func ValidateTimeToLive(tagName string, tagValues []string) error {

	for _, t := range tagValues {
		if _, err := time.ParseDuration(t); err != nil {
Contributor

You should validate that the ttl is not negative.
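A sketch of extending the validator to reject negative durations as well as unparseable ones (the error wording is illustrative):

for _, t := range tagValues {
	d, err := time.ParseDuration(t)
	if err != nil {
		return err
	}
	if d < 0 {
		return fmt.Errorf("%s must not be negative", tagName)
	}
}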

body := "An expired blob should not be displayed in search results."
subject := fmt.Sprint(time.Now().UnixNano())

for _, ttl := range []string{"0s", "10m", "10m"} {
Contributor

Was it deliberate to have 10m twice?

Contributor Author

The actual interval doesn't matter, but I use the same value twice so the test doesn't depend on the ordering of the search results.

@kristofferknudsen
Contributor Author

kristofferknudsen commented Feb 8, 2022

> If I (mistakenly) add a ttl parameter to a search: GET http://localhost:3333/v1/blobs?subject=123&session=mysession&name=NoiseCovariance&ttl=2m, ttl is interpreted as a custom tag that could have been set, so I get an empty result set. I think this should be an error. Also, on create, I wonder if the parameter should be _ttl to indicate that it is not a tag, similar to how we have _at and _ct for searches.

Absolutely sensible; I've swapped time-to-live to use _ttl. I'll have to update Python, Matlab and Gadgetron at some point, but it should be a one-character change.

I've made a pass on your comments; all very sensible. I've not updated the README yet; I hope to get around to that soon.

@johnstairs
Contributor

Ok let's get this in. Follow-up items to consider:

  • Maybe disallowing the ttl tag to avoid confusion
  • Error on searches specifying _ttl (and maybe ttl)
  • Updating the README.

@johnstairs johnstairs merged commit e0c6a63 into ismrmrd:main Feb 8, 2022