
Data retrieval failure in batch_info due to difference between the precision of the search key and the precision of the stored key #382

Open
kimhanbeom opened this issue Dec 16, 2024 · 3 comments

Comments

@kimhanbeom
Contributor

kimhanbeom commented Dec 16, 2024

Related issue: QA/QC communication repository's issue 115

  • Currently, when viewing rounds by cluster in the UI, the data does not display correctly.
    (screenshot attached in the original issue)
  • This issue is caused by the following:
    • To get the rounds for a cluster, the DB is queried in the following order (the roundsByCluster GraphQL API):
      • Get the model_id (i32) and the batch_ts list (Vec<NaiveDateTime>) from the column_description table in Postgres.
      • Get a Vec<ModelBatchInfo> from RocksDB's batch_info column family, using model_id and batch_ts as the keys.
    • The batch_ts column in the column_description table uses the timestamp without time zone type, which stores values with microsecond precision.
    • The keys used when inserting into the batch_info column family are the model_id and the id (the time of the event); the id is a timestamp with nanosecond precision.
    • As a result, looking up the batch_info column family with batch_ts as the key fails to find the value.
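The mismatch above can be illustrated with plain integer timestamps (a minimal sketch; the concrete key value here is made up, and the real RocksDB keys are byte-encoded model_id + timestamp pairs):

```rust
fn main() {
    // Event time as nanoseconds since the epoch, with sub-microsecond
    // detail — this is what the batch_info key was built from.
    let stored_key_ns: i64 = 1_734_307_200_123_456_789;

    // `timestamp without time zone` keeps only microseconds, so the value
    // read back from `column_description.batch_ts` has the last three
    // digits truncated away.
    let search_key_ns: i64 = (stored_key_ns / 1_000) * 1_000;

    // The two keys differ, so a point lookup in the batch_info column
    // family misses even though the row is there.
    assert_ne!(stored_key_ns, search_key_ns);
    println!("stored:   {stored_key_ns}");
    println!("searched: {search_key_ns}");
}
```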
@kimhanbeom kimhanbeom changed the title Data retrieval failure in batch_info due to mismatch between store key and search key types. Data retrieval failure in batch_info due to mismatch between store key and search key types Dec 16, 2024
@kimhanbeom kimhanbeom changed the title Data retrieval failure in batch_info due to mismatch between store key and search key types Data retrieval failure in batch_info due to difference between the precision of the search key and the precision of the stored key Dec 16, 2024
@kimhanbeom
Contributor Author

I believe this issue can be resolved by either adding a new column or changing the type of the existing column (batch_ts). If anyone has a better solution, please feel free to share it by commenting on this issue.

@sehkone
Contributor

sehkone commented Dec 17, 2024

I appreciate your investigation and suggestions. Based on what you mentioned, I think it would be better to change the current type to support nanoseconds.

@MW-Kim

MW-Kim commented Dec 18, 2024

@sehkone @syncpark @kimhanbeom
The batch_ts column in the column_description table is of type timestamp without time zone, so it only stores values with microsecond precision. Its type can be changed to bigint to store nanosecond values. This change requires migrating the existing data.
The migration is expected to involve the following steps:

  • Step 1: Add a new column

```sql
ALTER TABLE column_description
ADD COLUMN batch_ts_bigint bigint;
```

  • Step 2: Convert and transfer the data

```sql
UPDATE column_description
SET batch_ts_bigint = EXTRACT(EPOCH FROM batch_ts) * 1000000000;
```

  • Step 3: Drop the existing column and rename

```sql
ALTER TABLE column_description
DROP COLUMN batch_ts;

ALTER TABLE column_description
RENAME COLUMN batch_ts_bigint TO batch_ts;
```
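The Step 2 conversion can be sanity-checked in plain Rust (a sketch mirroring `EXTRACT(EPOCH FROM batch_ts) * 1000000000` with integer arithmetic; `to_nanos` is a hypothetical helper, not code from the repository):

```rust
// Convert epoch seconds plus a microsecond fraction — what a
// `timestamp without time zone` value carries — to epoch nanoseconds.
fn to_nanos(epoch_secs: i64, micros: i64) -> i64 {
    epoch_secs * 1_000_000_000 + micros * 1_000
}

fn main() {
    // 2024-12-16 00:00:00.123456 UTC as (epoch seconds, microseconds).
    let ns = to_nanos(1_734_307_200, 123_456);
    assert_eq!(ns, 1_734_307_200_123_456_000);

    // Migrated rows always end in three zero digits: existing data can
    // never regain sub-microsecond detail, so new inserts must write
    // full nanosecond values for batch_info lookups to match.
    assert_eq!(ns % 1_000, 0);
}
```

Note that an i64 in nanoseconds covers dates well past the year 2200, so bigint overflow is not a concern for these timestamps.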
