Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add system command to show #25844

Closed
pauldix opened this issue Jan 16, 2025 · 7 comments
Closed

Add system command to show #25844

pauldix opened this issue Jan 16, 2025 · 7 comments
Assignees
Labels

Comments

@pauldix
Copy link
Member

pauldix commented Jan 16, 2025

We expose all information about the server and its state through system tables. The user can execute queries from the command line to get at this, but it would be a nice convenience to have a command to quickly show the most recent records in any system table.

Something like:

influxdb3 show system parquet_files --limit=10

Limit should default to 100 so it is optional. The help should display what are valid names for the system table we're querying. Under the covers the CLI should just execute the query for the user.

This would likely be good enough to close out #25792

@pauldix pauldix added the v3 label Jan 16, 2025
@waynr waynr self-assigned this Jan 24, 2025
@waynr
Copy link
Contributor

waynr commented Jan 24, 2025

Okay I've started looking into how system tables are implemented to figure out how best to validate system table name arguments (ie the parquet_files argument in your example) and I think I can use some of the code in the system tables module to perform that validation, ie

pub const SYSTEM_SCHEMA_NAME: &str = "system";
pub const TABLE_NAME_PREDICATE: &str = "table_name";
pub(crate) const QUERIES_TABLE_NAME: &str = "queries";
pub(crate) const LAST_CACHES_TABLE_NAME: &str = "last_caches";
pub(crate) const DISTINCT_CACHES_TABLE_NAME: &str = "distinct_caches";
pub(crate) const PARQUET_FILES_TABLE_NAME: &str = "parquet_files";
const PROCESSING_ENGINE_PLUGINS_TABLE_NAME: &str = "processing_engine_plugins";
const PROCESSING_ENGINE_TRIGGERS_TABLE_NAME: &str = "processing_engine_triggers";

But now I'm wondering -- should this subcommand be constructing an SQL query and reaching out to another node's FlightSQL endpoint or should it be constructing its own system tables internally based on available object store connection parameters? I think with the latter approach there would probably be a lot of missing information since my rough understanding is that a lot of system table info lives in-memory only (at least, that's how it was in IOx).

So the former approach is probably preferable, right? But in that case it's probably better not to worry about using the system table names in code to validate CLI table name arguments.

@pauldix
Copy link
Member Author

pauldix commented Jan 24, 2025

The CLI should just be constructing a SQL query and running it through the front door. Can be FlightSQL or the HTTP API, whatever's easiest. What system tables exist can be established by querying the DB through the front door using SHOW TABLES, but you probably don't need to do that. The set of system tables exist in the code (i.e. their schema) so you should be able to pull that in the build.

The thing we'd want the user to be able to see, either through the CLI help or through a quick command is what system tables are available to query (which can be established through a query)

@hiltontj
Copy link
Contributor

Currently, queries to a given system table go through the QueryExecutor::query_sql interface with queries like,

select * from system.parquet_files

The implementation of that is here in core.

For the CLI described in the issue, one way would be to compose the request to the /api/v3/query_sql endpoint, and that would just use the code mentioned above, or, a new HTTP endpoint could provide a different interface, e.g.,

GET /api/v3/system_tables/parquet_files?limit=10&table_name=foo[...]

the contents of which could be parsed in the API handler and used to construct a query and pass it in to the above interface.

I agree that an API to show available system tables would be nice, and we probably have the means to do that.

When a query comes in, the system table provider is constructed on the fly here:

let system_schema_provider = Arc::new(SystemSchemaProvider::AllSystemSchemaTables(
AllSystemSchemaTablesProvider::new(
Arc::clone(&db_schema),
Arc::clone(&self.query_log),
Arc::clone(&self.write_buffer),
Arc::clone(&self.sys_events_store),
),
));

We could have a new method on the QueryExecutor trait whose purpose is to show_system_tables that just constructs the provider to query its schema. There are similar methods dedicated, e.g., to show_databases, which is used directly by the GET /api/v3/configure/database API.

@pauldix
Copy link
Member Author

pauldix commented Jan 24, 2025

I'd hold off on creating APIs for these things and instead lean on using the query API and its results. I want to keep the HTTP API surface area as small as possible, which is why we have system tables for everything in the DB. The system command on the CLI is really just a convenience function around constructing SQL queries and displaying their results.

@waynr
Copy link
Contributor

waynr commented Jan 24, 2025

I'd hold off on creating APIs for these things and instead lean on using the query API and its results.

Yeah this is what I was leaning toward myself -- it should be more resilient to code changes like the addition or removal of system tables.

As far as the subcommand structure goes, I was thinking of something a little different than what's described in this issue:

influxdb3 system <command> [args]

Where <command> would have the following variants:

  • list - lists available system tables
  • get <system-table-name> [get-args] - performs the default (probably SELECT * FROM <system-table-name> LIMIT 100 query operation for the named system table
  • summary [summary-args] - perform a series of default queries across several system tables to display them all with one command
    • could introduce flags to generate different types of summaries for use in troubleshooting/support scenarios

@pauldix
Copy link
Member Author

pauldix commented Jan 24, 2025

Your suggested list looks good to me

@waynr
Copy link
Contributor

waynr commented Jan 29, 2025

Closing this issue after addressing in #25912

@waynr waynr closed this as completed Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants