
First pass at adding support for blob time-to-live. #1

Merged 7 commits into ismrmrd:main on Feb 8, 2022

Conversation

kristofferknudsen
Contributor

I've made a pass at supporting blob time-to-live. I've made a number of choices throughout, which would very much benefit from a second opinion.

I am treating ttl as a special tag. If it is set when a blob is created, the blob will be automatically deleted by the server once the specified duration has passed. Currently I only require that the duration parses with time.ParseDuration (which accepts values like 48h and 2h30m), but this may be too lax: the HackMD documentation specifies an interval in integer seconds. I opted for ParseDuration because it was at least as easy to write, while being nicer for the user.
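For illustration, a minimal sketch of how a ttl value could be parsed and turned into an expiration timestamp, assuming expiration is stored in Unix milliseconds like created_at (the helper name computeExpiresAt is hypothetical, not code from this PR):

import "time"

// Hypothetical sketch: parse a ttl tag value and derive an expiration
// timestamp in Unix milliseconds, mirroring how created_at is stored.
func computeExpiresAt(ttl string, now time.Time) (int64, error) {
	d, err := time.ParseDuration(ttl) // accepts values like "48h" or "2h30m"
	if err != nil {
		return 0, err
	}
	return now.Add(d).UnixMilli(), nil
}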

Regardless, ttl is treated as a special tag, the value is validated, and an expires_at attribute (mirroring created_at) is now part of the metadata in the database. It's currently stored as a nullable int64 - int64 to mirror created_at, and nullable because it is optional. I note that an index covering expiration might be in order at some point.
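As a rough sketch of the relevant model fields (field names other than the expires_at column itself are assumptions based on the description above):

import "database/sql"

// Sketch of the metadata model fields discussed here; only the expires_at
// column is new, and it is NULL when the blob was created without a ttl.
type blobMetadata struct {
	CreatedAt int64         // Unix milliseconds
	ExpiresAt sql.NullInt64 // Unix milliseconds, nullable because ttl is optional
}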

Garbage collection was already in place for staged metadata, and it handles expired blobs with only a minor change to the query in GetPageOfExpiredBlobMetadata.

When metadata is presented to the client, a metadata object with an expiration will have an expires attribute. I chose expires to mirror the HTTP header of the same name - most of the meta attributes are very header-like.

I've renamed a few things, primarily because they were dealing exclusively with staged metadata before, but now deal with metadata in general. The interface changes meant I had to regenerate the mocks. The tests are all running for me, but I don't claim deep insight into that particular operation.

api/create.go Outdated
@@ -150,8 +163,8 @@ func CombineTagValidators(validators ...TagValidator) TagValidator {
}
}

-func ValidateAndStoreOptionalSystemTag(tagName string, tagValues []string, field **string) error {
-	if err := systemTagValidator(tagName, tagValues); err != nil {
+func ValidateAndStoreOptionalSystemTag(tagName string, tagValues []string, field **string, validator func (string, []string) error) error {
Contributor

You can use the TagValidator type instead of spelling out the func signature.

Contributor Author

That does seem clearer. I'll sort it out.
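For reference, a sketch of what the change could look like, assuming TagValidator is declared as func(tagName string, tagValues []string) error (the store-the-value body below is illustrative, not the PR's actual code):

// Assumed declaration, matching the signatures used in this file:
type TagValidator func(tagName string, tagValues []string) error

// Sketch: accept the named TagValidator type instead of repeating the
// func signature in the parameter list.
func ValidateAndStoreOptionalSystemTag(tagName string, tagValues []string, field **string, validator TagValidator) error {
	if err := validator(tagName, tagValues); err != nil {
		return err
	}
	if len(tagValues) > 0 {
		value := tagValues[0]
		*field = &value
	}
	return nil
}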

core/types.go Outdated
@@ -4,6 +4,7 @@ package core

 import (
 	"context"
+	"database/sql"
Contributor

It seems unnatural to take a sql dependency here.
I think instead you could move ExpirationToTime to the database package, since that is the only place it is called.

Contributor Author

It went into types to mirror UnixTimeMsToTime, but having it in database makes sense. I'll move it over.
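A sketch of what ExpirationToTime might look like once it lives in the database package, mirroring UnixTimeMsToTime (the body is an assumption, not the committed code):

import (
	"database/sql"
	"time"
)

// Sketch: convert a nullable expires_at (Unix milliseconds) to *time.Time,
// returning nil when the blob has no expiration.
func ExpirationToTime(expiresAt sql.NullInt64) *time.Time {
	if !expiresAt.Valid {
		return nil
	}
	t := time.UnixMilli(expiresAt.Int64)
	return &t
}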

 rows, err := r.db.
 	Model(blobMetadata{}).
 	Select(`subject, id`).
-	Where(`staged = ? AND created_at < ?`, true, olderThan.UnixMilli()).
+	Where(`(staged = ? AND created_at < ?) OR (expires_at < ?)`, true, olderThan.UnixMilli(), olderThan.UnixMilli()).
Contributor

Some databases don't optimize some queries with ORs very well, and fall back to table scans. Have you checked the postgres query plan for this? Another approach is to do a UNION ALL with two queries.

Contributor Author

I had not checked, but I've had a look:

EXPLAIN ANALYZE SELECT subject, id FROM "blob_metadata" WHERE (staged = true AND created_at < 1637748781707) OR expires_at < 1637748781707 ORDER BY created_at ASC LIMIT 200;
QUERY PLAN                                                                                                           
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Limit  (cost=4.29..4.29 rows=1 width=42) (actual time=0.047..0.048 rows=0 loops=1)                                   
  ->  Sort  (cost=4.29..4.29 rows=1 width=42) (actual time=0.047..0.047 rows=0 loops=1)                              
        Sort Key: created_at                                                                                         
        Sort Method: quicksort  Memory: 25kB                                                                         
        ->  Seq Scan on blob_metadata  (cost=0.00..4.28 rows=1 width=42) (actual time=0.028..0.028 rows=0 loops=1)   
              Filter: ((staged AND (created_at < '1637748781707'::bigint)) OR (expires_at < '1637748781707'::bigint))
              Rows Removed by Filter: 152                                                                            
Planning Time: 0.075 ms                                                                                              
Execution Time: 0.060 ms                                                                                             
(9 rows)
EXPLAIN ANALYZE SELECT subject, id FROM "blob_metadata" WHERE staged = true AND created_at < 1637750129345 UNION ALL SELECT subject, id FROM "blob_metadata" WHERE expires_at < 1637750129345
QUERY PLAN                                                                                                                  
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Append  (cost=0.00..7.83 rows=2 width=34) (actual time=0.030..0.030 rows=0 loops=1)                                         
  ->  Seq Scan on blob_metadata  (cost=0.00..3.90 rows=1 width=34) (actual time=0.019..0.019 rows=0 loops=1)                
        Filter: (staged AND (created_at < '1637750129345'::bigint))                                                         
        Rows Removed by Filter: 152                                                                                         
  ->  Seq Scan on blob_metadata blob_metadata_1  (cost=0.00..3.90 rows=1 width=34) (actual time=0.011..0.011 rows=0 loops=1)
        Filter: (expires_at < '1637750129345'::bigint)                                                                      
        Rows Removed by Filter: 152                                                                                         
Planning Time: 0.075 ms                                                                                                     
Execution Time: 0.040 ms                                                                                                    
(9 rows)

It looks like the UNION query is indeed more efficient. I'm not sure it'll matter in any appreciable way - both queries are pretty light as far as I can tell - but I'm happy to go with the UNION ALL approach.

Contributor

The remaining problem is that you don't have any kind of index to make the query for expired records efficient. I have ~ 10 million rows in the blob_metadata table.

This query uses a filtered index:

EXPLAIN ANALYZE 
SELECT subject, Id 
FROM blob_metadata
WHERE staged = true and created_at < 1637786288230

QUERY PLAN
Index Scan using staged on blob_metadata  (cost=0.12..8.14 rows=1 width=35) (actual time=0.017..0.017 rows=0 loops=1)
  Index Cond: (created_at < '1637786288230'::bigint)
Planning Time: 0.144 ms
Execution Time: 0.040 ms

This one does not:

EXPLAIN ANALYZE
SELECT subject, Id
FROM blob_metadata
WHERE expires_at < 1637786288230

Gather  (cost=1000.00..194822.96 rows=1 width=35) (actual time=580.996..584.048 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on blob_metadata  (cost=0.00..193822.86 rows=1 width=35) (actual time=542.972..542.973 rows=0 loops=3)
        Filter: (expires_at < '1637786288230'::bigint)
        Rows Removed by Filter: 3489451
Planning Time: 0.098 ms
JIT:
  Functions: 12
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 2.476 ms, Inlining 0.000 ms, Optimization 2.073 ms, Emission 21.643 ms, Total 26.191 ms
Execution Time: 585.222 ms

But I can fix that with this index:

CREATE INDEX expires on blob_metadata 
(
    expires_at
)
WHERE expires_at IS NOT NULL

Now the same query is handled much more efficiently:

EXPLAIN ANALYZE 
SELECT subject, Id 
FROM blob_metadata
WHERE staged = true and created_at < 1637786288230

QUERY PLAN
Index Scan using expires on blob_metadata  (cost=0.12..8.14 rows=1 width=35) (actual time=0.007..0.028 rows=0 loops=1)
  Index Cond: (expires_at < '1637786288230'::bigint)
Planning Time: 0.125 ms
Execution Time: 0.126 ms

Now we can compare the two original queries (note that you don't need an ORDER BY).

EXPLAIN ANALYZE 
SELECT subject, Id 
FROM blob_metadata
WHERE staged = true AND created_at < 1637786288230 OR expires_at < 1637786288230

QUERY PLAN
Bitmap Heap Scan on blob_metadata  (cost=8.27..12.28 rows=1 width=35) (actual time=0.019..0.021 rows=0 loops=1)
  Recheck Cond: (((created_at < '1637786288230'::bigint) AND staged) OR (expires_at < '1637786288230'::bigint))
  ->  BitmapOr  (cost=8.27..8.27 rows=1 width=0) (actual time=0.016..0.017 rows=0 loops=1)
        ->  Bitmap Index Scan on staged  (cost=0.00..4.13 rows=1 width=0) (actual time=0.014..0.014 rows=0 loops=1)
              Index Cond: (created_at < '1637786288230'::bigint)
        ->  Bitmap Index Scan on expires  (cost=0.00..4.13 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=1)
              Index Cond: (expires_at < '1637786288230'::bigint)
Planning Time: 0.300 ms
Execution Time: 0.124 ms

And

EXPLAIN ANALYZE 
SELECT subject, Id 
FROM blob_metadata
WHERE staged = true AND created_at < 1637786288230
UNION ALL 
SELECT subject, Id 
FROM blob_metadata
WHERE expires_at < 1637786288230

QUERY PLAN
Append  (cost=0.12..16.31 rows=2 width=35) (actual time=0.689..0.690 rows=0 loops=1)
  ->  Index Scan using staged on blob_metadata  (cost=0.12..8.14 rows=1 width=35) (actual time=0.676..0.677 rows=0 loops=1)
        Index Cond: (created_at < '1637786288230'::bigint)
  ->  Index Scan using expires on blob_metadata blob_metadata_1  (cost=0.12..8.14 rows=1 width=35) (actual time=0.009..0.009 rows=0 loops=1)
        Index Cond: (expires_at < '1637786288230'::bigint)
Planning Time: 0.551 ms
Execution Time: 1.016 ms

Looks like the OR is more efficient in the end.

Contributor Author
@kristofferknudsen Nov 26, 2021

That's actually pretty damn interesting!

I've reverted to the OR query in the latest commit, and I've attempted to add an index:

	ExpiresAt   sql.NullInt64  `gorm:"index:expires,where:expires_at is not null"`

It looks fine in the list of indexes (CREATE INDEX expires ON public.blob_metadata USING btree (expires_at) WHERE (expires_at IS NOT NULL)), but it doesn't appear to affect the query:

EXPLAIN ANALYZE SELECT subject, id FROM "blob_metadata" WHERE (staged = true AND created_at < 1637922750201) OR expires_at < 1637922750201 LIMIT 200
QUERY PLAN                                                                                                     
───────────────────────────────────────────────────────────────────────────────────────────────────────────────
Limit  (cost=0.00..5.56 rows=1 width=34) (actual time=0.154..0.156 rows=0 loops=1)                             
  ->  Seq Scan on blob_metadata  (cost=0.00..5.56 rows=1 width=34) (actual time=0.152..0.153 rows=0 loops=1)   
        Filter: ((staged AND (created_at < '1637922750201'::bigint)) OR (expires_at < '1637922750201'::bigint))
        Rows Removed by Filter: 190                                                                            
Planning Time: 0.248 ms                                                                                        
Execution Time: 0.189 ms                                                                                       
(6 rows)

I'm still getting a sequential scan; the index doesn't seem to be involved. Do I just need more rows to make the index the faster option, or did I miss something?

Edit: I added some rows to the table, and the index is now being used.

@johnstairs
Contributor

I think that we should not be returning a blob after its expiration time - regardless of whether the GC has cleaned it up or not.

@kristofferknudsen
Contributor Author

I agree in the sense that we should never reveal the metadata for an expired blob - I'm okay with a grace period for the data itself. It would give very late searches a chance to use the results.

In this vein, I think we should respond 400 on a create with a negative ttl.

@johnstairs
Contributor

You will need to increment the schema version in order to signal that the database schema needs migration. I suggest something like this:

const (
	schemaVersionInitial        = 1
	schemaVersionAddExpiresAt   = 2
	schemaVersionLatest         = schemaVersionAddExpiresAt
	schemaVersionCompleteStatus = "complete"
)
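A hypothetical sketch of how the bumped version could gate the migration (the repository's actual migration mechanism may differ; the SQL mirrors the partial index discussed above):

// Hypothetical: run the expires_at migration only when upgrading from an
// older schema version, then record the new version.
if currentVersion < schemaVersionAddExpiresAt {
	if err := db.Exec(`ALTER TABLE blob_metadata ADD COLUMN expires_at bigint`).Error; err != nil {
		return err
	}
	if err := db.Exec(`CREATE INDEX expires ON blob_metadata (expires_at) WHERE expires_at IS NOT NULL`).Error; err != nil {
		return err
	}
	currentVersion = schemaVersionAddExpiresAt
}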

@johnstairs
Contributor

> I agree in the sense that we should never reveal the metadata for an expired blob - I'm okay with a grace period for the data itself. It would give very late searches a chance to use the results.

Expired blobs are still showing up in searches, in get latest, and in direct metadata reads.

@kristofferknudsen
Contributor Author

> I agree in the sense that we should never reveal the metadata for an expired blob - I'm okay with a grace period for the data itself. It would give very late searches a chance to use the results.
>
> Expired blobs are still showing up in searches, in get latest, and in direct metadata reads.

I've been working on getting them filtered out - let me commit what I have at the moment.

I've gone with filtering the search and blob queries based on expiration. Get- and SearchBlobMetadata now take an expiresAfter time argument (usually now), which adds a filter to the queries and some arguments to the database calls. It's not ideal, in the sense that expiration is now something a lot of code has to care about (at least a little).
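A minimal sketch of the kind of filter this adds, assuming gorm v2 and the expires_at column described earlier (the helper name withNotExpired is hypothetical; the committed queries may be structured differently):

import (
	"time"

	"gorm.io/gorm"
)

// Hypothetical helper: restrict a metadata query to blobs that have not
// expired as of expiresAfter; blobs without a ttl are always visible.
func withNotExpired(query *gorm.DB, expiresAfter time.Time) *gorm.DB {
	return query.Where(`expires_at IS NULL OR expires_at > ?`, expiresAfter.UnixMilli())
}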

I considered just deleting the blobs when they expired. It would keep the queries simpler, and generally be fine. There would be some timer-juggling in the background - I don't think it would be less code in the end - but the heart of the code base would not need to worry about expiration. We would, however, lose the ability to serve up data past its expiration. I don't think this is vital, but I like having a grace period where late clients can get at the results even though they have technically expired.

I decided that latest - even though it is a data endpoint - would not observe a grace period. I think it would surprise a lot of clients to find that the recent blob they created with a super-short ttl is returned by the latest endpoint for a long time after its expiry.

Kristoffer Langeland Knudsen added 2 commits November 26, 2021 10:02
Expired data can still be reached, if you already know where to find it. This gives late searches a grace period where they can still get the data.

Added some tests for expired blob visibility.

Regenerated mocks - the interface changed.
Reverted GC query to use OR instead of UNION ALL.

Added an index to expires_at.
@kristofferknudsen
Contributor Author

I have reverted the default paths for the sqlite database and the blobs to the _data folder in the current directory. I didn't intend to commit it here, but since it's in: I don't think we should be writing anything to the file system root by default. It's fine for the docker container, but it should be configured explicitly in the Dockerfile (which I have not done).

@johnstairs
Contributor

> I have reverted the default paths for the sqlite database and the blobs to the _data folder in the current directory. I didn't intend to commit it here, but since it's in: I don't think we should be writing anything to the file system root by default. It's fine for the docker container, but it should be configured explicitly in the Dockerfile (which I have not done).

I agree that's better. The reason I did this was to make the default configuration work properly in a container. But I see that was at the expense of the local go run use case. Setting default environment variables in the dockerfile is a better approach.

@johnstairs
Contributor

Apologies for the delay, @kristofferknudsen, I will get on this today.

@johnstairs
Contributor

If I (mistakenly) add a ttl parameter to a search:
GET http://localhost:3333/v1/blobs?subject=123&session=mysession&name=NoiseCovariance&ttl=2m
ttl is interpreted as a custom tag that could have been set, so I get an empty result set. I think this should be an error.
Also, on create, I wonder if the parameter should be _ttl to indicate that it is not a tag, similar to how we have _at and _ct for searches.

@johnstairs
Contributor

We should document this new capability on the README.md

Contributor
@johnstairs left a comment

Looks good overall. A couple of fit and finish items. Sorry for taking so long to review.

api/create.go Outdated

for _, t := range tagValues {
	if _, err := time.ParseDuration(t); err != nil {
		return err
Contributor

I suggest we add the parameter name to the error. Right now, we return this message:
"message": "time: invalid duration \"never\""

Contributor Author

No problem, good errors are worth their weight in gold.
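One way to include the parameter name when the duration fails to parse, roughly (the exact wording in the final commit may differ):

// Wrap the parse error so the response names the offending parameter.
if _, err := time.ParseDuration(t); err != nil {
	return fmt.Errorf("invalid %s value %q: %w", tagName, t, err)
}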

r.Get("/{combined-id}", handler.MakeBlobEndpoint(handler.BlobMetadataResponse))
r.Get("/{combined-id}/data", handler.MakeBlobEndpoint(handler.BlobDataResponse))
r.Get("/{combined-id}", handler.MakeBlobEndpoint(handler.BlobMetadataResponse, 0 * time.Second))
r.Get("/{combined-id}/data", handler.MakeBlobEndpoint(handler.BlobDataResponse, 30 * time.Minute))
Contributor

I suggest we add the expiration datetime to the response headers at the data endpoint

Contributor Author

Of course! I don't know why I forgot that.
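A sketch of surfacing the expiration on the data response, assuming the handler has an http.ResponseWriter and the blob's expiration as a *time.Time (the variable names are illustrative):

// Hypothetical: expose the expiration as a standard Expires header when set.
if expiresAt != nil {
	w.Header().Set("Expires", expiresAt.UTC().Format(http.TimeFormat))
}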

e2e_test.go Outdated
require.Nil(t, err)

almostEqual := func (a time.Time, b time.Time) bool {
	return math.Abs(a.Sub(b).Seconds()) < 5e-2
Contributor

I would be more tolerant here, since tests that are based on timing tend to fail randomly, and the precision here is much higher than is actually required for a 10m ttl.

Contributor Author

I've updated this to half a second, but even that seems a bit arbitrary. The fuzz is there to account for the time it takes the server to make the expiration record in the database and get back to the client. The test keeps an eye on the times to make sure that the actual expiration is what the client expects, but we can have several seconds of fuzz without any issues.
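Roughly what the loosened check looks like with a half-second tolerance (the committed version may differ slightly):

// Allow up to half a second of skew between the expected and stored expiration.
almostEqual := func(a, b time.Time) bool {
	return math.Abs(a.Sub(b).Seconds()) < 0.5
}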

e2e_test.go Outdated
@@ -96,6 +97,8 @@ func TestInvalidTags(t *testing.T) {
{"tag name that is too long", fmt.Sprintf("subject=sub&%s=abc", strings.Repeat("a", 65))},
{"Location", "subject=s&location=l"},
{"Last-Modified", "subject=s&lastModified=2021-10-18T16:56:15.693Z"},
{"ttl", "not-an-interval"},
Contributor

"ttl" is the (sub) test case name, "not-an-interval" is the query string. Which means that this test is failing because the entire query string is invalid, not because the ttl is not recognized.

api/create.go Outdated
func ValidateTimeToLive(tagName string, tagValues []string) error {

	for _, t := range tagValues {
		if _, err := time.ParseDuration(t); err != nil {
Contributor

You should validate that the ttl is not negative.
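A sketch of extending the validator to reject negative durations as well as unparseable ones (the error wording is illustrative):

for _, t := range tagValues {
	d, err := time.ParseDuration(t)
	if err != nil {
		return err
	}
	if d < 0 {
		return fmt.Errorf("%s must not be negative", tagName)
	}
}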

body := "An expired blob should not be displayed in search results."
subject := fmt.Sprint(time.Now().UnixNano())

for _, ttl := range []string{"0s", "10m", "10m"} {
Contributor

Was it deliberate to have 10m twice?

Contributor Author

The actual interval doesn't matter, but I use the same value twice so the test doesn't depend on the ordering of the search results.

@kristofferknudsen
Contributor Author

kristofferknudsen commented Feb 8, 2022

> If I (mistakenly) add a ttl parameter to a search: GET http://localhost:3333/v1/blobs?subject=123&session=mysession&name=NoiseCovariance&ttl=2m, ttl is interpreted as a custom tag that could have been set, so I get an empty result set. I think this should be an error. Also, on create, I wonder if the parameter should be _ttl to indicate that it is not a tag, similar to how we have _at and _ct for searches.

Absolutely sensible; I've swapped time-to-live to use _ttl. I'll have to update Python, Matlab and Gadgetron at some point, but it should be a one-character change.

I've made a pass on your comments; all very sensible. I've not updated the README yet; I hope to get around to that soon.

@johnstairs
Contributor

Ok let's get this in. Follow-up items to consider:

  • Maybe disallowing the ttl tag to avoid confusion
  • Error on searches specifying _ttl (and maybe ttl)
  • Updating the README.

@johnstairs johnstairs merged commit e0c6a63 into ismrmrd:main Feb 8, 2022