-
Notifications
You must be signed in to change notification settings - Fork 129
Better support for scalar lists #404
Comments
Loving this. Just an idea: can the same things also be applied to 'normal relations' (pop, push, remove, set)? E.g. nested mutations |
Relations don't have a defined order, so |
Well, |
Regarding the insert algorithm. If you use a string instead of a number to specify the ordering, you have a lot more room to navigate without having to 'reorder'. Given 3 items with a,b,c, adding a new item at position 2 (between b and c) would give it order ba. Next, inserting at position 3 (between 'ba' and 'c') would give 'bb'. Now, inserting between 'ba' and 'bb' would give 'baa'. A total of 5 characters already gives you space to store 12 million ordered elements. A certain 'branch' of the table can become 'heavy' if a lot of elements are inserted close to eachother (e.g. you can end up with a lot of letters). You could always have a database maintenance task to 'smooth out' the ordering, basically doing a reordering that is 'flatter' or leaves more gaps. For example, you have 1000 items in the list. You need three letters to order them (262626), so you can skip each odd letter (a-c-e-....), so you don't have to reorder soon. The big benefit of this that you never have to reorder when you are inserting (causing a delay), because there's always room between any 2 records. A maintenance task can be scheduled. |
@kbrandwijk I like the idea of using a varchar for the position column. This essentially offloads the work of injecting space at the node creation time to the database. We will perform some benchmarks before implementing this feature to better understand performance impact on query and insert as well as the storage requirements. Supporting some of the same mutation functionality for ordinary relations is a great idea that we should explore! |
@sorenbs Another option might be using a double-linked list, with |
Would there be any additional functionality for unique scalar lists? For example, with this schema: |
What does
I'm not convinced that covering this use case with unique scalar lists is a good idea. |
Maybe something like |
If we do add support for explicitly declaring that all elements must be distinct, I like the idea of adding a separate directive. It might be instructive to see how MongoDB handles this https://docs.mongodb.com/manual/reference/operator/update-array/ They have an addToSet operator that only add an element if it is not present yet. It is still possible to add duplicates to the array using the normal push operation though. -- Regarding @isUnique - we can't easily support the unique guarantee on scalar lists. There is no way to enforce this on the database level, so even if we did add support, it would add a lot of overhead |
Unfortunately, there's no consensus yet on graphql/graphql-spec#190, otherwise introducing something like |
Yes I believe the idea of something like |
We should consider supporting the native JSON type in SQL databases for implementing this feature. MySQL has had JSON support since 5.7, but only added support for retrieving a slice of an array in 8.0.2: https://bugs.mysql.com/bug.php?id=79052 As MySQL 8 does not have a stable release yet, no major cloud provider support it yet. Another complicating with using native JSON support is that the implementation will be different between databases. Another issue is that we would be limited to a simple filter with
|
Another thing that could be added in more than the existing nested filter is something like "field_in_exclusive". For instance the "field_in : ["A", "B" ]" will use a OR to match the condition and in a lot of use cases we can need a AND. And I don't see any proper way to implement for myself yet ( so if you have any clue :) ) |
Hi guys, After talking with @marktani, I would like to know what is the status about length filters ? |
This is checked off in beta-3 of Primsa which has now been released:
Should we expect to be able to run filters within arrays in Primsa? Or not yet? (cc @marktani ) |
Strangely, it's also referenced into the post 1.0 release list... |
Hey @philcockfield and @johannpinson, thanks for bringing this up! This paves the way for adding scalar list filters into the Prisma API, without introducing a breaking change. We'll tackle this feature shortly, but there is no concrete timeline yet. |
Cool - thanks @marktani....as you know ( 😄 ) a lot clean design choice rest on being able to filter into lists/arrays. Really looking forward to having this!! |
Tuesday Nov 14, 2017 at 15:05 GMT
Originally opened as https://github.com/graphcool/prisma/issues/1275
The current data structure for scalar lists is not able to provide the advanced filters and mutations we want to support.
New Features
Given the schema
Filters
Example
Specification
A scalar list field generate the same 3 top-level filter arguments as normal relations:
The nested filter arguments are almost the same as for normal relations, except for scalar lists the value is contained directly in the list, instead of being in a node in the list. So we don't need a field prefix:
Mutations
Create
The create mutation will work as it currently does:
Update
The update mutation supports the following new list operations
push
The
push
operation adds one or more elements to the end of the listThe position argument can be used to specify the location of the new elements. 0 means the elements are pushed at the beginning, 1 means they are pushed behind the first element etc. If the position argument is greater than the length of the list, the new elements are pushed at the end of the list
before: [1,2,3,4,5,6,7,8,9,10]
after: [1,2,3,4,42,1337,5,6,7,8,9,10]
pop
Removes one or more elements from the beginning or end of the list. Pass a positive value to remove from the end of the list, pass a negative value to remove from the beginning of the list:
before: [1,2,3,4,5,6,7,8,9,10]
after: [1,2,3,4,5,6,7,8,9]
before: [1,2,3,4,5,6,7,8,9,10]
after: [1,2,3,4,5]
before: [1,2,3,4,5,6,7,8,9,10]
after: [6,7,8,9,10]
remove
Removes all elements from the list that match a filter
before: [1,2,3,4,5,6,7,8,9,10]
after: [5,6,7,8,9,10]
set
Removes existing elements and add new elements
before: [1,2,3,4,5,6,7,8,9,10]
after: [1,3,5]
Data Structure
Requirements
The default data structure for scalar lists represents a compromise of the following requirements:
get element 50.000 through 50.010
insert element at position 50.000
get nodes where list contain the element "Abba"
Table Structure
A scalar list is stored in a separate table with the structure:
id
column in the model table and hason cascade: delete
Algorithm
Inserting
The first element is inserted at position 1.000, second element at position 2.000 and so on.
When inserting an element at a position that is not at the beginning or end, the position is calculated like this:
pos(x) + floor((pos(x+1) - pos(x)) / 2)
.Given these positions:
[1000, 2000]
Inserting a new element at index 1 (between the two existing elements) would result in the following positions:
[1000, 1500, 2000]
This is a fast operation.
We can keep inserting elements "in between" like this; 1250, 1125, 1062, 1031, 1015, 1007, 1003, 1001
When we run out of space between two positions we have to increment all positions after the one where the new element will be inserted. This is an expensive operation:
[1000, 1001, 1003, ...]
=>[1000, 2001, 2003, ...]
=>[1000, 1500, 2001, 2003, ...]
The sql for this operation is of the form:
Querying
Getting a range of elements:
Filtering
Join the model table and the scalar list table and use the normal filter logic
Limitations
Insert performance
Inserting an element at beginning or end is generally fast.
Inserting an element in the middle of the list is fast as long as there is enough space. If we need to add space to the position column, this becomes a slow operation. For a list with 100.000 elements this operation takes on the order of 1 second. For 500.000 elements this operation takes 5-20 seconds.
Number of elements
As performing the space injection operation is expensive on large lists, we recommend keeping scalar lists below 50.000 elements.
The position is stored in a signed INT which requires 4 bytes and can store values in the -2147483648 to 2147483647 range. This enables us to store 2+ million values in the positive space and negative space.
If values are continuously inserted and deleted in a way that requires us to perform the space injection operation, it is possible to exhaust the available position space. The first implementation will not try to reclaim space, but this can be implemented in the future
The text was updated successfully, but these errors were encountered: