feat: Add materialized views #3000

Merged
AndrewSisley merged 17 commits into sourcenetwork:develop from 2951-cached-views on Sep 16, 2024

AndrewSisley
Contributor

Relevant issue(s)

Resolves #2951

Description

Adds materialized views. Also makes materialized views the default (see Discord discussion).

The caching behaviour of views in tests is now selected via an environment variable, so (with a few specific exceptions) a test that involves a view exercises both the cacheless and materialized variants. In the CI this adds a new dimension to the matrix, although for now materialized views are only executed using the simple settings (in-mem store, Go client, etc.).
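
To make the selection mechanism concrete, here is a minimal sketch of how a test harness might pick the view variant from the environment. The variable name `EXAMPLE_VIEW_TYPE`, the `viewType` values, and the cacheless fallback are all assumptions made for illustration; the PR description does not state the real names or the default used by the test suite.

```go
package main

import (
	"fmt"
	"os"
)

// viewType names the caching behaviour a test run should exercise.
type viewType string

const (
	cachelessView    viewType = "cacheless"
	materializedView viewType = "materialized"
)

// viewTypeEnvName is a hypothetical variable name used only in this sketch.
const viewTypeEnvName = "EXAMPLE_VIEW_TYPE"

// parseViewType maps a raw environment value to a view type. Falling back to
// cacheless for unset or unrecognised values is an assumption of this sketch.
func parseViewType(raw string) viewType {
	if raw == string(materializedView) {
		return materializedView
	}
	return cachelessView
}

func main() {
	vt := parseViewType(os.Getenv(viewTypeEnvName))
	fmt.Println("running view tests as:", vt)
}
```

A CI matrix can then run the same test binary once per value of the variable, which is what adds the extra dimension mentioned above.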

#2999 has been mostly fixed in this PR, but not completely - this is why some logic in the lens node has changed.

AndrewSisley added the feature (New feature or request) and area/query (Related to the query component) labels Sep 13, 2024
AndrewSisley added this to the DefraDB v0.14 milestone Sep 13, 2024
AndrewSisley requested a review from a team September 13, 2024 17:08
AndrewSisley self-assigned this Sep 13, 2024

codecov bot commented Sep 13, 2024

Codecov Report

Attention: Patch coverage is 76.21053% with 113 lines in your changes missing coverage. Please review.

Project coverage is 79.41%. Comparing base (ea3a74f) to head (5bd2de4).
Report is 1 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/db/view.go | 68.60% | 21 Missing and 17 partials ⚠️ |
| internal/planner/view.go | 76.19% | 14 Missing and 6 partials ⚠️ |
| http/handler_store.go | 50.00% | 12 Missing and 5 partials ⚠️ |
| http/client.go | 41.18% | 5 Missing and 5 partials ⚠️ |
| cli/view_refresh.go | 80.95% | 4 Missing and 4 partials ⚠️ |
| internal/core/key.go | 74.07% | 7 Missing ⚠️ |
| internal/db/store.go | 50.00% | 3 Missing and 3 partials ⚠️ |
| internal/core/view_item.go | 96.49% | 1 Missing and 1 partial ⚠️ |
| internal/db/collection_define.go | 66.67% | 1 Missing and 1 partial ⚠️ |
| internal/request/graphql/schema/collection.go | 90.48% | 1 Missing and 1 partial ⚠️ |

... and 1 more
@@             Coverage Diff             @@
##           develop    #3000      +/-   ##
===========================================
- Coverage    79.49%   79.41%   -0.08%     
===========================================
  Files          329      331       +2     
  Lines        25225    25670     +445     
===========================================
+ Hits         20051    20384     +333     
- Misses        3756     3828      +72     
- Partials      1418     1458      +40     
Flag Coverage Δ
all-tests 79.41% <76.21%> (-0.08%) ⬇️

| Files with missing lines | Coverage Δ |
|---|---|
| cli/cli.go | 100.00% <100.00%> (ø) |
| client/collection_description.go | 83.05% <100.00%> (+0.29%) ⬆️ |
| client/db.go | 100.00% <ø> (ø) |
| internal/db/definition_validation.go | 95.52% <100.00%> (+0.11%) ⬆️ |
| internal/db/errors.go | 65.72% <100.00%> (+1.11%) ⬆️ |
| internal/db/lens.go | 72.15% <100.00%> (+0.72%) ⬆️ |
| internal/planner/lens.go | 73.49% <100.00%> (+4.66%) ⬆️ |
| internal/request/graphql/schema/manager.go | 98.44% <100.00%> (+0.01%) ⬆️ |
| internal/request/graphql/schema/types/types.go | 100.00% <100.00%> (ø) |
| http/http_client.go | 76.74% <0.00%> (ø) |

... and 10 more

... and 16 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

to the view will recieve items accessible to the user refreshing the view's permissions.

Example: refresh all views
defradb view refresh
Member

todo: defradb client view refresh

Contributor Author

@AndrewSisley AndrewSisley Sep 16, 2024

Cheers Keenan, nice spot :)

  • Fix CLI doc

AuthorView {
name
books {
/*
Member

question: did you forget to uncomment or remove this?

Contributor Author

@AndrewSisley AndrewSisley Sep 13, 2024

I did! Thanks Keenan

  • Uncomment test, and revert the second half of the test

Comment on lines 31 to 32
View is refreshed as the current user, meaning results returned for all subsequent query requests
to the view will recieve items accessible to the user refreshing the view's permissions.
Collaborator

praise: Thanks for documenting this.

thought: It makes me think though that this is most likely not the way we want to handle access long term. Access should be reflecting the rights of the user doing the request, not that of the user who refreshed the view. Curious to have your opinion on this.

Contributor Author

I like the current behaviour long term, and see very few alternatives (we could have the node-user refresh the view).

Worth remembering is that medium-long term view items will be shareable across P2P without needing to share the underlying docs. And we have Lens stuff at play too (not all view items are sourced from any document, Lens can create their own).

The view has its own set of permissions that should be respected, although this was not handled when ACP was introduced to Defra #2018.

IMO a view is a copy of the transformed data: a user has created a view and is sharing it with whoever they see fit, as a convenient alternative to printing the results out and mailing them to people.

Collaborator

Worth remembering is that medium-long term view items will be shareable across P2P without needing to share the underlying docs.

This is specifically why we can't have this behaviour long term. We can't allow some random user to access the contents of a view because the user that created the view had the rights to the documents.

Contributor Author

@AndrewSisley AndrewSisley Sep 16, 2024

The user that created the view would need to give that random user access to it.

There is nothing we (or anyone else, really) can do to stop users sharing data they already have read access to - that is not a technical problem (see print and mail example).

Contributor Author

Requiring view-viewers to also have read access to the underlying data would totally undermine what I see as the primary use case - exposing data aggregates without exposing the data.

Collaborator

The user that created the view would need to give that random user access to it.

Right. Hence why I'm saying that the description is not what we want long term.

There is nothing we (or anyone else, really) can do to stop users sharing data they already have read access to - that is not a technical problem (see print and mail example).

This tells me we have a misunderstanding. We can clarify things over a call if you'd like.

Contributor Author

@AndrewSisley AndrewSisley Sep 16, 2024

Discussed on Discord, Fred and I were chatting about different things. I'll tweak the wording.

  • Reword docs

// If it is true, they will be, if false, the data returned on query will be calculated
// at query-time from source.
//
// At the moment this can only be set to `false` if this collection sources it's data from
Collaborator

typo: its data

Contributor Author

@AndrewSisley AndrewSisley Sep 16, 2024

:) Cheers

  • Typo

client/db.go Outdated
@@ -196,6 +196,14 @@ type Store interface {
transform immutable.Option[model.Lens],
) ([]CollectionDefinition, error)

// RefreshViews refreshes the caches of all views matching the given options. If no options are set all views
Collaborator

suggestion: If no options are set,

Contributor Author

Leaving this one as is, I don't think the comma helps, and I think I'm usually guilty of over-using them

Collaborator

Just adding that without the comma it reads If no options are set-all-views...

Contributor Author

@AndrewSisley AndrewSisley Sep 16, 2024

Discussed on discord, adding comma

  • add comma

// RefreshViews refreshes the caches of all views matching the given options. If no options are set all views
// will be refreshed.
//
// The cached result is dependent on the ACP settings of the source data and the permissions of the user making
Collaborator

thought: This makes me think that there might be need for setting who is allowed to cache views on a given node. Otherwise this could be abused.

Contributor Author

Yes of course, we'll either want it covered by admin ACP or allow users to own views.

Member

Might be worth documenting this in: #2640

// CollectionRootID is the Root of the Collection that this item belongs to.
CollectionRootID uint32

// ItemID is the unique (to this CollectionRootID) of the View item.
Collaborator

typo: is the unique ID?

Contributor Author

@AndrewSisley AndrewSisley Sep 16, 2024

Cheers, took me a while to figure out what you meant on re-read :)

  • Add ID after closing of brackets

{ "op": "replace", "path": "/1/IsMaterialized", "value": false }
]
`,
ExpectedError: "non-materialized collections (only views) are not supported. Collection: User",
Collaborator

suggestion: This reads a little weird. I would suggest to move the parenthesis at the end non-materialized collections are not supported. (only views)

Contributor Author

@AndrewSisley AndrewSisley Sep 16, 2024

Will change, but I'll probably just remove the brackets.

  • tweak error message

Contributor

@islamaliev islamaliev left a comment

Just halfway through. Looks good so far.

if existingCol.IsMaterialized && !col.IsMaterialized {
// If the collection is being de-materialized - delete any cached values.
// Leaving them around will not break anything, but it would be a waste of
// storage space.
Contributor

praise: thanks for comments. Very helpful

return err
}

hasNext, err := source.Next()
Contributor

nitpick: I think hasValue better reflects the nature of the var

Contributor Author

@AndrewSisley AndrewSisley Sep 16, 2024

Agreed, although I think hasNext is used more frequently in the codebase.

  • Rename to hasValue

Switch will be extended shortly, and is nicer than lots of if-elses
Comment on lines 31 to 32
View is refreshed as the current user, meaning results returned for all subsequent query requests
to the view will receive items generated using the user refreshing the view's permissions.
Collaborator

thought: This is better but still not very clear. I assume a new user looking at this would still get confused.

suggestion: I said this at some point previously but I can't remember where exactly and John just reminded me of it: We should probably materialize views from the view point of the node and not that of a given user. Queries on the view can then be filtered through ACP for the user doing the request. This will alleviate the need to have multiple materialized versions of the view.

Contributor Author

This will alleviate the need to have multiple materialized versions of the view.

This is an alternative, and long term we probably want both anyway (e.g. here's my personal quick view, go and have a look).

I just picked the easiest, as there is time pressure on this PR and getting the node user is a bit more involved.

This is better but still not very clear. I assume a new user looking at this would still get confused

Suggestions appreciated, it is a moderately complex thing to state in a line or two.

Collaborator

Let's clear up the question of what user is used to generate the view and then we can improve the documentation. I'll bring it up during standup so we can get consensus.

Contributor Author

Let's clear up the question of what user is used to generate the view and then we can improve the documentation.

That sounds like unwanted scope creep forced into a time-sensitive and deliberately lean initial PR, but sure.

Contributor Author

Discussed in standup, user-identity is fine for now, long term we may want something else

Collaborator

Discussed in the standup. This implementation is working for the use case that triggered this PR and as such we can leave it as is.

As for the wording of the documentation, here is a possible alternative:

Suggested change
View is refreshed as the current user, meaning results returned for all subsequent query requests
to the view will receive items generated using the user refreshing the view's permissions.
View is refreshed as the current user, meaning the cached items will reflect that user's
permissions. Subsequent query requests to the view, regardless of user, will receive
items from that cache.
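
The behaviour this wording describes can be sketched with a toy model. This is not DefraDB code: `viewCache`, `Refresh`, `Query`, and the owner-based `canRead` check are hypothetical names invented for the example, and the real ACP rules are more involved.

```go
package main

import "fmt"

// doc is a toy source document with a simple owner-based read rule.
type doc struct {
	Name  string
	Owner string
}

// canRead is a hypothetical stand-in for an ACP permission check.
func canRead(user string, d doc) bool {
	return d.Owner == user
}

// viewCache is a toy materialized view: it stores the items produced by the
// most recent refresh.
type viewCache struct {
	items []string
}

// Refresh rebuilds the cache using only the documents visible to user.
func (v *viewCache) Refresh(user string, source []doc) {
	v.items = v.items[:0]
	for _, d := range source {
		if canRead(user, d) {
			v.items = append(v.items, d.Name)
		}
	}
}

// Query returns a copy of the cached items. The requesting user is
// deliberately ignored, mirroring the "regardless of user" wording.
func (v *viewCache) Query(user string) []string {
	return append([]string(nil), v.items...)
}

func main() {
	source := []doc{{"a", "alice"}, {"b", "bob"}}
	view := &viewCache{}
	view.Refresh("alice", source)
	// Both users see what alice could read at refresh time.
	fmt.Println(view.Query("alice"), view.Query("bob"))
}
```

In this model the cache starts empty, a refresh by alice stores only the items alice may read, and a later query by bob returns those same items.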

Contributor Author

@AndrewSisley AndrewSisley Sep 16, 2024

Will update, thanks Fred

  • Update doc

Member

@shahzadlone shahzadlone left a comment

LGTM

{ "op": "replace", "path": "/2/IsMaterialized", "value": true }
]
`,
ExpectedError: "materialized views do not support ACP. Collection: UserView",
Member

praise: Thanks for not forgetting this :)
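
For readers following the two validation errors quoted in the tests, here is a rough sketch of the rules they imply. The `collection` struct, its fields, and the `validate` function are hypothetical and only model the checks; the error text is adapted from the test expectations in this thread, and the final wording in the PR may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// collection is a toy stand-in; the fields are illustrative, not DefraDB's types.
type collection struct {
	Name           string
	IsView         bool
	IsMaterialized bool
	HasPolicy      bool // whether an ACP policy is attached
}

var (
	errNonMaterializedCollection = errors.New("non-materialized collections are not supported")
	errMaterializedViewWithACP   = errors.New("materialized views do not support ACP")
)

// validate applies the two rules discussed in the review: only views may be
// non-materialized, and a materialized view may not have ACP attached.
func validate(c collection) error {
	if !c.IsView && !c.IsMaterialized {
		return fmt.Errorf("%w. Collection: %s", errNonMaterializedCollection, c.Name)
	}
	if c.IsView && c.IsMaterialized && c.HasPolicy {
		return fmt.Errorf("%w. Collection: %s", errMaterializedViewWithACP, c.Name)
	}
	return nil
}

func main() {
	fmt.Println(validate(collection{Name: "User"}))
	fmt.Println(validate(collection{Name: "UserView", IsView: true, IsMaterialized: true, HasPolicy: true}))
	fmt.Println(validate(collection{Name: "UserView", IsView: true}))
}
```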

}
}`,
Results: map[string]any{
// Even though UserView was created after the document was created, the results are
Member

question: It says "Even though UserView was created after the document was created", but correct me if I am wrong: whether UserView is created after or before the document, the results will always be empty unless the views are refreshed manually.

Contributor Author

You are correct, I want the createDoc before the view though, as it is more likely to change/break in the future (e.g. auto-refreshing on View create) - this test will catch that as a behaviour change, whereas creating the doc after defining the view would not.

Collaborator

@fredcarle fredcarle left a comment

LGTM!

@@ -117,3 +107,142 @@ func (n *viewNode) Close() error {

return nil
}

func convertBetweenMaps(srcMap *core.DocumentMapping, dstMap *core.DocumentMapping, src core.Doc) core.Doc {
Contributor

suggestion: would be nice to have a comment for this function. From the name it's not clear why it's called "convert" although both source and destination have the same type.

Contributor Author

I mean, the first two params to this function are srcMap and dstMap, and the place it is called is heavily documented with the reason for calling it.

Comment on lines +208 to +212
if err != nil {
return err
}

return nil
Contributor

nitpick: return err

Contributor Author

For some reason I dislike that if it is not with the immediate result of another func call (e.g. return foo()), otherwise it looks like it is actually returning a non-nil error. Leaving as is.
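
For anyone weighing the two styles, this small sketch shows that they behave identically and differ only in readability. The `step`, `explicit`, and `compact` functions are hypothetical and exist only for this comparison.

```go
package main

import (
	"errors"
	"fmt"
)

// step is a hypothetical operation that may fail.
func step(fail bool) error {
	if fail {
		return errors.New("step failed")
	}
	return nil
}

// explicit mirrors the style kept in the PR: the success path visibly
// returns nil.
func explicit(fail bool) error {
	err := step(fail)
	if err != nil {
		return err
	}

	return nil
}

// compact is the form suggested in the nitpick: return err directly,
// whether or not it is nil.
func compact(fail bool) error {
	err := step(fail)
	return err
}

func main() {
	fmt.Println(explicit(false) == nil, compact(false) == nil)
	fmt.Println(explicit(true), compact(true))
}
```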

}

var err error
n.currentValue, err = core.UnmarshalViewItem(n.documentMapping, result.Value)
Contributor

question: is this view stuff really part of the core?

Contributor Author

I wanted it in encoding but that created a circular dependency, so it had to go here.

@@ -220,6 +227,22 @@ type CreateView struct {
ExpectedError string
}

type RefreshViews struct {
Contributor

suggestion: would be nice to have the struct documented

Contributor Author

sorted

Member

todo: The coverage file names will clash currently, as you forgot to add the new type after line 225

Just append after this block:

coverage\
            _${{ matrix.os }}\
            _${{ matrix.client-type }}\
            _${{ matrix.database-type }}\
            _${{ matrix.mutation-type }}\
            _${{ matrix.lens-type }}\
            _${{ matrix.acp-type }}\
            _${{ matrix.database-encryption }}\
            

Contributor Author

Cheers Shahzad, sorted.

Contributor

@islamaliev islamaliev left a comment

LGTM! Good job Andy!

@AndrewSisley AndrewSisley merged commit 4989901 into sourcenetwork:develop Sep 16, 2024
42 of 43 checks passed
@AndrewSisley AndrewSisley deleted the 2951-cached-views branch September 16, 2024 23:08
shahzadlone added a commit that referenced this pull request Sep 17, 2024
## Relevant issue(s)

Resolves #3014 

## Description

Failed run due to code-cov upload file's name clashing:

https://github.com/sourcenetwork/defradb/actions/runs/10893445627/job/30228379591

Happened in merge commit:
[#4989901](4989901)

Just needed to fix the typo introduced in [`15f244d`
(#3000)](15f244d)

Should be `matrix.view-type` not `matrix.matrix.view-type`, my bad I
missed it even while re-reviewing.
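
As a rough illustration of why a missing matrix dimension makes the upload names clash, consider this sketch. The `job` struct and `artifactName` helper are hypothetical; they only model how a file name built from matrix values stops being unique when one dimension contributes nothing to it.

```go
package main

import (
	"fmt"
	"strings"
)

// job holds a subset of matrix dimensions; the field set is illustrative.
type job struct {
	OS, ClientType, ViewType string
}

// artifactName joins the chosen dimensions into a coverage file name.
// includeViewType models whether the view-type dimension reaches the name.
func artifactName(j job, includeViewType bool) string {
	parts := []string{"coverage", j.OS, j.ClientType}
	if includeViewType {
		parts = append(parts, j.ViewType)
	}
	return strings.Join(parts, "_")
}

func main() {
	a := job{"ubuntu-latest", "go", "cacheless"}
	b := job{"ubuntu-latest", "go", "materialized"}

	// Without the view type, two distinct jobs produce the same name.
	fmt.Println(artifactName(a, false) == artifactName(b, false))
	// With it, the names are distinct again.
	fmt.Println(artifactName(a, true) == artifactName(b, true))
}
```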
Labels: area/query (Related to the query component), feature (New feature or request)
Linked issue: Support for materialized views
5 participants