-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(storage): add core enums, traits and functions for versioned storage #1172
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added few comments.
Let me know if you think I should add some comments in the code as well.
#[derive(Copy, Clone, Debug, Display, Eq, PartialEq, EnumString, AsRefStr, EnumIter)] | ||
#[strum(serialize_all = "snake_case")] | ||
pub enum ContentType { | ||
Beacon, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are mostly placeholders for now. Their purpose is to define the type of content that is stored in one store, and for store to differentiate between types (e.g. different column in a table, or completely different table name).
@ogenev Maybe Beacon should have multiple types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently store four types for beacon:
LightCllentBootstrap
(incontent_data
table, but we will need to purge any values older than the subjectivity period (a few months))LightClientUpdate
(inlc_update
table)LightClientOptimisticUpdate/LightClientFIialityUpdate
(those are stored in a cache, because we keep only the latest values).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for the LightCllentBootstrap
, there is no concept of distance (in deciding whether we store) or storage limit?
Meaning, we should store all fresh content and discard/purge all older content, right?
|
||
/// Creates the instance of the store. This shouldn't be used directly. Store should be | ||
/// created using `create_store` function. | ||
fn create( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that anybody can just call this (as comments says, create_store
function should be used). But I don't know how to enforce that.
I'm open for suggestions
trin-storage/src/versioned/utils.rs
Outdated
) -> Result<S, ContentStoreError> { | ||
let conn = config.sql_connection_pool.get()?; | ||
|
||
let old_version = lookup_store_version(content_type, &conn)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably have some logic to detect legacy version (e.g. Legacy*
).
It's not important until we have implementations.
Once this PR is merged, I will create an Issue to track this.
trin-storage/src/versioned/mod.rs
Outdated
pub use store::VersionedContentStore; | ||
pub use utils::create_store; | ||
|
||
#[derive(Copy, Clone, Debug, Display, Eq, PartialEq, EnumString, AsRefStr, EnumIter)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Are we actually using all of these derived traits? I would avoid adding a derived trait unless it's necessary
- I think a simple
impl ToString for ContentType
would accomplish all you need here. So far, in this codebase, we've done this rather than use thestrum
crate. Imo, the priority here is to maintain consistency within the codebase. Now, the question as to whetherstrum
is better thanimpl ToString
is a valid question. But that should be addressed separately, and if we decide to usestrum
, then it's worth combing the codebase and making all the other changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I removed derived that I'm not using.
- I found only one
impl ToString
and two usages ofstrum
. I assume you mean:fmt::Display
+FromStr
because we have many of those.
I personally prefer strum
because it's very nice and easy, and has many options and use cases. If there is no strong objection, I would go and replace all current usages with it.
If there is preference not to use strum
, I will change this one as well.
trin-storage/src/versioned/mod.rs
Outdated
#[strum(serialize_all = "snake_case")] | ||
pub enum StoreVersion { | ||
LegacyContentData, | ||
IdIndexed, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow the vision here, if you can help me understand. For each new "version" we would add an entry to the StoreVersion
enum? Is this better than just using numbers or semver(might be overkill)? If we just use these "strings" to identify each version...
- it is helpful to understand the changes included in the update (though, this can be accomplished with comments)
- it's not intuitive to understand the order of the upgrades (though, this can be accomplished with comments)
I'm not sure, it's just seems like using something likev0
,v1
..v100
is a bit more intuitive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that different content types can be stored and potentially have completely different scheme that are not compatible and transitions are not possible (e.g current content_data
and lc_updatetables, or MasterAccumulator as another
StoreVersion`). So they don't have to be just for the portal network types.
So versioning using numbers or v1,v2 ... wouldn't work (because it doesn't tell you for which type it is). but something like IdIndexedV1, IdIndexedV2, ... , LightClientUpdatesV1, LightClientUpdatesV2, ...
could.
I'm not set on the names and what they should be, but I would be happier if they would describe at least for what content type they are (and maybe how they are different from other versions).
Another ideas is that for each StoreVersion
, we should have exactly one implementation of the VersionedContentStore
. System should be flexible so that each implementation can do whatever it wants, even being completely in memory, or not using SQLite and using some other DB, or just use simple file (e.g. what MasterAccumulator is doing) or anything else.
In near future, we should have something like this:
pub enum ContentType {
Beacon,
History,
State,
LightClient, // maybe not the best name
}
pub enum StoreVersion {
IdIndexedLegacy, // current `content_data` table
IdIndexedV1, // upcoming new version
LightClientUpdatesV1, // current `lc_update` table
}
with following usages:
- HistoryStorage
IdIndexedLegacy
withContentType::History
(with transition toIdIndexedV0
when we think it's ready)
- BeaconStorage - two content stores
IdIndexedLegacy
withContentType::Beacon
(with transition toIdIndexedV0
when we think it's ready)LightClientUpdatesV1
withContentType::LightClient
.
- StateStorage
IdIndexedV0
withContentType::State
CREATE TABLE IF NOT EXISTS store_info ( | ||
content_type TEXT PRIMARY KEY, | ||
version TEXT NOT NULL | ||
)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth adding a timestamp here, so that we can identify when the db was last updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add it, but I'm not sure why that timestamp would be useful and how it will be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a great reason... Maybe just to check when the db was last migrated? But I'm not sure that's a strong reason, since the version to which it was last migrated is already available, and that's the important information. So feel free to disregard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left more detailed explanation about my idea on one comment in the trin-storage/src/versioned/mod.rs
file.
I'm happy to have a video call and discuss things in order to move faster with this.
trin-storage/src/versioned/mod.rs
Outdated
pub use store::VersionedContentStore; | ||
pub use utils::create_store; | ||
|
||
#[derive(Copy, Clone, Debug, Display, Eq, PartialEq, EnumString, AsRefStr, EnumIter)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I removed derived that I'm not using.
- I found only one
impl ToString
and two usages ofstrum
. I assume you mean:fmt::Display
+FromStr
because we have many of those.
I personally prefer strum
because it's very nice and easy, and has many options and use cases. If there is no strong objection, I would go and replace all current usages with it.
If there is preference not to use strum
, I will change this one as well.
CREATE TABLE IF NOT EXISTS store_info ( | ||
content_type TEXT PRIMARY KEY, | ||
version TEXT NOT NULL | ||
)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add it, but I'm not sure why that timestamp would be useful and how it will be used.
trin-storage/src/versioned/mod.rs
Outdated
#[strum(serialize_all = "snake_case")] | ||
pub enum StoreVersion { | ||
LegacyContentData, | ||
IdIndexed, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that different content types can be stored and potentially have completely different scheme that are not compatible and transitions are not possible (e.g current content_data
and lc_updatetables, or MasterAccumulator as another
StoreVersion`). So they don't have to be just for the portal network types.
So versioning using numbers or v1,v2 ... wouldn't work (because it doesn't tell you for which type it is). but something like IdIndexedV1, IdIndexedV2, ... , LightClientUpdatesV1, LightClientUpdatesV2, ...
could.
I'm not set on the names and what they should be, but I would be happier if they would describe at least for what content type they are (and maybe how they are different from other versions).
Another ideas is that for each StoreVersion
, we should have exactly one implementation of the VersionedContentStore
. System should be flexible so that each implementation can do whatever it wants, even being completely in memory, or not using SQLite and using some other DB, or just use simple file (e.g. what MasterAccumulator is doing) or anything else.
In near future, we should have something like this:
pub enum ContentType {
Beacon,
History,
State,
LightClient, // maybe not the best name
}
pub enum StoreVersion {
IdIndexedLegacy, // current `content_data` table
IdIndexedV1, // upcoming new version
LightClientUpdatesV1, // current `lc_update` table
}
with following usages:
- HistoryStorage
IdIndexedLegacy
withContentType::History
(with transition toIdIndexedV0
when we think it's ready)
- BeaconStorage - two content stores
IdIndexedLegacy
withContentType::Beacon
(with transition toIdIndexedV0
when we think it's ready)LightClientUpdatesV1
withContentType::LightClient
.
- StateStorage
IdIndexedV0
withContentType::State
I don't understand how this migration system would work in practice. Here is how I imagine this: use rusqlite::{params, Connection, Result};
struct Migration {
version: i64,
up: &'static str,
}
const MIGRATIONS: &[Migration] = &[
Migration {
version: 1,
up: "CREATE TABLE IF NOT EXISTS content_store (version INTEGER PRIMARY KEY)",
},
Migration {
version: 2,
up: "CREATE TABLE IF NOT EXISTS table_name1 (...)",
},
Migration {
version: 3,
up: "modify content_store table (...)",
},
Migration {
version: 4,
up: "CREATE TABLE IF NOT EXISTS table_name2(...)",
},
// Add more migrations here as needed.
];
pub fn migrate(conn: &Connection) -> Result<()> {
let tx = conn.transaction()?;
tx.execute(
"CREATE TABLE IF NOT EXISTS content_store (version INTEGER PRIMARY KEY)",
params![],
)?;
let current_version: i64 = tx
.query_row("SELECT version FROM content_store ORDER BY version DESC LIMIT 1", params![], |row| row.get(0))
.optional()?
.unwrap_or(0);
for migration in MIGRATIONS {
if migration.version > current_version {
tx.execute(migration.up, params![])?;
tx.execute(
"INSERT INTO content_store (version) VALUES (?1)",
params![migration.version],
)?;
}
}
tx.commit()
} Maybe we are trying to accomplish a similar thing here but I'm not sure if I understand it. |
I don't think we are trying to accomplish similar things. In my model, you have to manually write transition from The way that I envisioned is that migration from one version to another can be quite complicated and not always possible just via SQL. For example, to migrate current
This process can take a long time, but I don't think that's the issue as it would happen on start. The only issue is that if program is stopped mid transition (but I think we can deal with that if we make sure new table is empty when we start). Also, as I said in the other comment, Version doesn't mean that it's used only for one type (regular content key/value data). Storing LC_Updates can/should be one store Version as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[derive(Copy, Clone, Debug, Display, Eq, PartialEq, EnumString, AsRefStr, EnumIter)] | ||
#[strum(serialize_all = "snake_case")] | ||
pub enum ContentType { | ||
Beacon, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently store four types for beacon:
LightCllentBootstrap
(incontent_data
table, but we will need to purge any values older than the subjectivity period (a few months))LightClientUpdate
(inlc_update
table)LightClientOptimisticUpdate/LightClientFIialityUpdate
(those are stored in a cache, because we keep only the latest values).
match old_version { | ||
Some(old_version) => { | ||
// Migrate if version doesn't match | ||
if S::version() != old_version { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to migrate only from an older version to the new version, i.e.
if S::version() != old_version { | |
if S::version() > old_version { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The versions are not strictly ordered. In this context, the old_version
should be interpreted as "previous version".
If someone tries to migrate from newer to the older version, the migrate_from
function should fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Excited to see how this works out in state network-land
CREATE TABLE IF NOT EXISTS store_info ( | ||
content_type TEXT PRIMARY KEY, | ||
version TEXT NOT NULL | ||
)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a great reason... Maybe just to check when the db was last migrated? But I'm not sure that's a strong reason, since the version to which it was last migrated is already available, and that's the important information. So feel free to disregard
What was wrong?
Our storage system needs update. See #1157 for details.
How was it fixed?
Created core types for the new versioned storage system:
ContentType
- the type of content that is storedStoreVersion
- the version of the store usedVersionedContentStore
- the trait for the actual content storeVersionedContentStore
for eachStoreVersion
(ideally it should be exactly one, but we are fine missing some implementations until we migrate all existing implementations)MemoryContentStore
- in memory implementation of theVersionedContentStore
create_store
function - creates instance of theVersionedContentStore
and migrates from previous version (if needed and migration is implemented)It's worth noting that one network might want to use multiple
VersionedContentStore
(for example, this might be the case for beacon), in which case it will need some custom logic on top of it.New schema (different table schema and different prune logic) will be done in the followup PR.
To-Do