Key dataset blocks are all kinds of blocks except those that represent data nodes (AddData, ExecuteTransform).
Most of them, like SetDataSchema, Seed, and SetPollingSource, can only have one active version at any given moment in time.
Only AddPushSource may have several active versions, distinguishable by source name.
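A minimal sketch of this classification in Rust, assuming a hypothetical event-kind enum (the kind names mirror the events mentioned above, but the enum itself is illustrative, not the actual type in the codebase):

```rust
/// Hypothetical classification of metadata events; illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[allow(dead_code)]
enum MetadataEventKind {
    // Data nodes -- NOT key blocks:
    AddData,
    ExecuteTransform,
    // Key blocks with at most one active version at a time:
    Seed,
    SetDataSchema,
    SetPollingSource,
    // Key block that may have several active versions, one per source name:
    AddPushSource,
}

impl MetadataEventKind {
    /// Every kind except the data nodes is a key block.
    fn is_key_block(self) -> bool {
        !matches!(self, Self::AddData | Self::ExecuteTransform)
    }
}

fn main() {
    assert!(MetadataEventKind::SetPollingSource.is_key_block());
    assert!(!MetadataEventKind::AddData.is_key_block());
}
```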
Key dataset blocks are often requested in all parts of the system, including:
HTTP/GraphQL APIs (under transaction)
planning phases of long operations (under transaction)
execution phases of long operations (no transaction).
Implement a mechanism that allows satisfying most visitors with database-cached versions of key events.
When a transaction is open, this could mean a database query given the event type and a sequence number or hash representing the upper node boundary. Without a transaction, the blocks need to be pre-fetched at the planning phase in some way and propagated to the execution phases.
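One possible shape for such a mechanism, as a sketch (the trait and type names here are assumptions, not the actual API):

```rust
use std::collections::HashMap;

/// Hypothetical representation of a cached key block: its position in the
/// chain plus the decoded event (a plain string stands in for it here).
#[derive(Debug, Clone)]
pub struct KeyBlockRef {
    pub sequence_number: u64,
    pub block_hash: String,
    pub event_payload: String,
}

/// Answers "give me the latest key block of this kind at or below this
/// boundary". Under an open transaction this can be backed by a database
/// query; without one, by a snapshot pre-fetched during the planning phase.
pub trait KeyBlockResolver {
    fn resolve(&self, kind: &str, upper_boundary_seq: u64) -> Option<KeyBlockRef>;
}

/// Transaction-less variant: everything the execution phase may ask for was
/// pre-fetched at planning time and is carried along in this struct.
pub struct PrefetchedKeyBlocks {
    /// Blocks per kind, sorted by ascending sequence number.
    pub by_kind: HashMap<String, Vec<KeyBlockRef>>,
}

impl KeyBlockResolver for PrefetchedKeyBlocks {
    fn resolve(&self, kind: &str, upper_boundary_seq: u64) -> Option<KeyBlockRef> {
        self.by_kind
            .get(kind)?
            .iter()
            .rev()
            .find(|b| b.sequence_number <= upper_boundary_seq)
            .cloned()
    }
}
```

A database-backed implementation of the same trait would then serve the in-transaction case.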
A caching layer may be added for both cases (with and without a transaction), so that the same type of block is not re-queried multiple times within the same operation.
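Building on the `KeyBlockResolver` sketch above, the cache could be a thin per-operation memoization wrapper (again illustrative, not the actual implementation):

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Per-operation memoization layer: the first lookup of a given (kind,
/// boundary) pair hits the underlying source (database or chain); repeated
/// lookups within the same operation are served from memory.
pub struct KeyBlockCache<R> {
    inner: R,
    memo: RefCell<HashMap<(String, u64), Option<KeyBlockRef>>>,
}

impl<R: KeyBlockResolver> KeyBlockCache<R> {
    pub fn new(inner: R) -> Self {
        Self { inner, memo: RefCell::new(HashMap::new()) }
    }
}

impl<R: KeyBlockResolver> KeyBlockResolver for KeyBlockCache<R> {
    fn resolve(&self, kind: &str, upper_boundary_seq: u64) -> Option<KeyBlockRef> {
        self.memo
            .borrow_mut()
            .entry((kind.to_string(), upper_boundary_seq))
            .or_insert_with(|| self.inner.resolve(kind, upper_boundary_seq))
            .clone()
    }
}
```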
Although many scanning algorithms expect to see the "last data block", attempting to replicate its history in the database is senseless, as it would create a copy of the dataset metadata. It looks cheaper to access the chain directly, ensuring the scan starts from the safe version of HEAD that was known at the beginning of the operation.
Note that some visiting patterns inherently expect to iterate over the entire dataset, e.g., transform planning, sync, and querying. Here we can't really optimize access at all, but we must ensure that events committed by a parallel transaction do not confuse the scan.
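A minimal sketch of such a pinned scan, assuming a simple hash-linked chain (the storage callback is a stand-in for the real block store):

```rust
/// Capture HEAD once when the operation begins and walk parent links strictly
/// from that hash. Blocks committed by a parallel transaction only advance
/// the HEAD reference, so they can never appear mid-scan.
pub struct PinnedChainScan {
    pinned_head: String,
}

impl PinnedChainScan {
    /// Record the HEAD that was current at the start of the operation.
    pub fn begin(current_head: &str) -> Self {
        Self { pinned_head: current_head.to_string() }
    }

    /// Visit every block from the pinned HEAD back to the Seed block;
    /// `parent_of` abstracts whatever block storage the real chain uses.
    pub fn scan(
        &self,
        parent_of: impl Fn(&str) -> Option<String>,
        mut visit: impl FnMut(&str),
    ) {
        let mut cursor = Some(self.pinned_head.clone());
        while let Some(hash) = cursor {
            visit(&hash);
            cursor = parent_of(&hash);
        }
    }
}
```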
Key dataset blocks must be kept in sync transactionally (see the sketch after this list):
implement an indexing solution for the initial fill of existing datasets that are scanned this way for the first time
implement an update procedure that follows a change of the HEAD reference and updates key dataset blocks in the same transaction as the reference
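An illustrative outline of the second point: the ref update and the index update share one database transaction, so a reader can never observe a HEAD that the key-block index does not yet reflect (the `Transaction` type and its methods here are assumptions):

```rust
/// Stand-in for a database transaction; method bodies would issue the
/// corresponding SQL (INSERT into the key-block index, UPDATE of the ref).
pub struct Transaction;

impl Transaction {
    pub fn insert_key_blocks(&mut self, _dataset_id: &str, _blocks: &[String]) {
        // INSERT INTO dataset_key_blocks (...) VALUES (...);
    }
    pub fn update_head(&mut self, _dataset_id: &str, _new_head: &str) {
        // UPDATE dataset_refs SET head = ... WHERE dataset_id = ...;
    }
    pub fn commit(self) {
        // COMMIT;
    }
}

/// Called whenever a commit advances HEAD: index the new key blocks that
/// appeared between the old and new HEAD, then move the reference, all
/// within the same transaction.
pub fn advance_head(
    mut tx: Transaction,
    dataset_id: &str,
    new_head: &str,
    new_key_blocks: &[String],
) {
    tx.insert_key_blocks(dataset_id, new_key_blocks);
    tx.update_head(dataset_id, new_head);
    tx.commit();
}
```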
Depends on #978