Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include table stats in DatabaseStats #398

Closed
casey opened this issue Oct 13, 2022 · 11 comments · Fixed by #723
Closed

Include table stats in DatabaseStats #398

casey opened this issue Oct 13, 2022 · 11 comments · Fixed by #723

Comments

@casey
Copy link
Contributor

casey commented Oct 13, 2022

I'm auditing space usage in the ord index, and having information about individual tables would be super useful. In particular, the total size of each table in bytes, and how many keys it holds.

@casey
Copy link
Contributor Author

casey commented Oct 14, 2022

Also, perhaps DatabaseStats could also check for space leaks, i.e. check that the number of pages reachable from current tables is the number of allocated pages, and include a statistic that had that information. The ordinals index is now a whopping 117G (last I checked it was 50G), so I'm curious if there's some kind of leak there.

@cberner
Copy link
Owner

cberner commented Oct 15, 2022

Would a separate method on Table to retrieve just that table's stats work for you? I'd rather not add it to DatabaseStats, because then it's size in memory would be unbounded.

Ya, a leak check makes sense. I've been planning to add a fsck() method that would also verify checksum integrity.

@casey
Copy link
Contributor Author

casey commented Oct 15, 2022

Would a separate method on Table to retrieve just that table's stats work for you? I'd rather not add it to DatabaseStats, because then it's size in memory would be unbounded.

Sort of! The issue is then, I don't have a way of getting information on all tables in the database. I know what tables I think are in the database, and I can open them one by one and get stats about them, but I also want to see if there are any tables in the database I don't know about, and get information on those.

ReadTransaction::list_tables exists, but it returns strings, which can't be used to open tables. So I could see if there were other tables in the database that I didn't have definitions for, but I couldn't get definitions for them to open them and get the stats.

Ya, a leak check makes sense. I've been planning to add a fsck() method that would also verify checksum integrity.

That would be rad. A fsck that reported any problems/leaks it encountered would be sweet.

@cberner
Copy link
Owner

cberner commented Oct 15, 2022

Hmm, ya I see. How would you end up with a table that you didn't know about? It's not exactly pretty, but to find out the type of a table you can just open it with two random types and you'll get an error message telling you the types:

"{} is of type Table<{}, {}> not Table<{}, {}>",

@casey
Copy link
Contributor Author

casey commented Oct 15, 2022

I think it could actually happen pretty easily. For example, we release a version which changes the name of a table, but forget to remove the table with the old name from the database.

One idea is to have a separate Database::table_stats function which either returns Vec<TableStats>, or a Iterator<Item = TableStats>.

@cberner
Copy link
Owner

cberner commented Oct 16, 2022

Hmm, ok lemme think this over. Perhaps I can change list_tables to return structs that have the name, and also allow you to fetch the stats.

Alternately, what if I change delete_table to take a &str instead of needing a TableDefinition? That way you could list all the tables, diff against the expected tables, and delete any unexpected ones without needing to know their type

@casey
Copy link
Contributor Author

casey commented Oct 17, 2022

Hmm, ok lemme think this over. Perhaps I can change list_tables to return structs that have the name, and also allow you to fetch the stats.

That sounds good.

Alternately, what if I change delete_table to take a &str instead of needing a TableDefinition? That way you could list all the tables, diff against the expected tables, and delete any unexpected ones without needing to know their type

That would be useful for cleaning up unexpected tables, but if we have unexpected tables, I'd also like to have stats (type, size, etc) to try figure out what they are and what they contain, instead of just blindly deleting them.

@cberner
Copy link
Owner

cberner commented Mar 27, 2023

@casey I'm finally getting around to this issue. Do you still need this API? I fixed the second issue (#542), and I could add an API like open_table_untyped() which takes a impl TableHandle. That would return an UntypedTable instead of a Table which would only let you call APIs (like .stats()) that don't need to know the K & V types.

@cberner
Copy link
Owner

cberner commented Apr 8, 2023

Documenting my proposed API, but am going to close this for now.

struct UntypedTable;

impl UntypedTable {
    pub fn stats(&self) -> Result<TableStats> {
      ... 
   }
}

impl ReadOnlyTransaction {
   pub fn open_table_untyped(&self, handle: impl TableHandle) -> Result<UntypedTable> {
      ...
   }
}

@cberner cberner closed this as completed Apr 8, 2023
@casey
Copy link
Contributor Author

casey commented Nov 21, 2023

Sorry for not following up on this! Yah, this would be great. I just updated ord index info command to return table info, and it's pretty heinous.

Also, the TableStats type isn't showing up in the docs. For example, here the TableStats type is showing up in grey. Kind of odd. It think maybe the type is pub but it isn't exported from the crate.

@cberner
Copy link
Owner

cberner commented Nov 24, 2023

Np! Can you give the linked PR a try?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants