-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: cancel db query on stop #15403
Changes from 13 commits
c6ec8e8
822e2f3
1f4df58
6d20937
cda7c0e
efb86e5
0f91637
54da5e4
0f49471
ec39477
4706b43
ef8436b
bfc8d60
e1fc4b3
e5223ec
78bb78d
517c792
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1304,6 +1304,30 @@ def get_column_spec( | |
) | ||
return None | ||
|
||
@classmethod | ||
def get_cancel_query_id(cls, cursor: Any, query: Query) -> Optional[str]: | ||
""" | ||
Select identifiers from the database engine that uniquely identifies the | ||
queries to cancel. The identifier is typically a session id, process id | ||
or similar. | ||
|
||
:param cursor: Cursor instance in which the query will be executed | ||
:param query: Query instance | ||
:return: Query identifier | ||
""" | ||
return None | ||
|
||
@classmethod | ||
def cancel_query(cls, cursor: Any, query: Query, cancel_query_id: str) -> None: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it possible to return a boolean indicating if the query was canceled? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. addressed by 517c792 |
||
""" | ||
Cancel query in the underlying database. | ||
|
||
:param cursor: New cursor instance to the db of the query | ||
:param query: Query instance | ||
:param cancel_query_id: Value returned by get_cancel_query_payload or set in | ||
other life-cycle methods of the query | ||
""" | ||
betodealmeida marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
||
# schema for adding a database by providing parameters instead of the | ||
# full SQLAlchemy URI | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,6 +25,7 @@ | |
|
||
from superset.db_engine_specs.postgres import PostgresBaseEngineSpec | ||
from superset.errors import SupersetErrorType | ||
from superset.models.sql_lab import Query | ||
from superset.utils import core as utils | ||
|
||
if TYPE_CHECKING: | ||
|
@@ -128,3 +129,28 @@ def mutate_db_for_connection_test(database: "Database") -> None: | |
engine_params["connect_args"] = connect_args | ||
extra["engine_params"] = engine_params | ||
database.extra = json.dumps(extra) | ||
|
||
@classmethod | ||
def get_cancel_query_id(cls, cursor: Any, query: Query) -> Optional[str]: | ||
""" | ||
Get Snowflake session ID that will be used to cancel all other running | ||
queries in the same session. | ||
|
||
:param cursor: Cursor instance in which the query will be executed | ||
:param query: Query instance | ||
:return: Snowflake Session ID | ||
""" | ||
cursor.execute("SELECT CURRENT_SESSION()") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @koszti does this work? The reason I ask is per the documentation, CURRENT_SESSION
yet the connection associated with said cursor doesn't not the same connection which instantiated the query (in all likelihood for async queries that connection resides on a Celery host). Thus I would suspect there are actually no queries running within the current session. The same is true for the other engines. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The query that you're referring to is running when the query initiated and not when the stop button pressed. When the stop button pressed, we get the saved session id from the It's tested on Postgres, MySQL and Snowflake engines with sync and async query mode. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Apologies. I misread the code logic. I originally thought the new cursor which is used to stop the query was being used to obtain the session identifier. |
||
row = cursor.fetchone() | ||
return row[0] | ||
|
||
@classmethod | ||
def cancel_query(cls, cursor: Any, query: Query, cancel_query_id: str) -> None: | ||
""" | ||
Cancel query in the underlying database. | ||
|
||
:param cursor: New cursor instance to the db of the query | ||
:param query: Query instance | ||
:param cancel_query_id: Snowflake Session ID | ||
""" | ||
cursor.execute(f"SELECT SYSTEM$CANCEL_ALL_QUERIES({cancel_query_id})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more question about this: can we make sure that we stop only queries when the DB is in sync mode? if the DB is in async mode the expectation I don't think we should ever cancel the query by navigating away from the browser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make it at least configurable? I’m happy to go for the extra mile if needed and add a new extra param to the db config if no objections. We really need to cancel all db queries both in sync and async modes.
This is because we’d like to go with async mode to utilise the celery workers but we still expect a lot of accidental cross joins, missing WHERE clauses and unwanted long running queries. In this case people will just close the browser and that’s making other people impossible running new queries. DBs like snowflake start queuing queries very quickly and unwanted long running queries easily could block using the entire db for long hours. Or DBAs have to kill unwanted queries manually.
Furthermore if it works only in sync mode then the stop button is basically doing nothing in async mode with non hive/presto databases.
Let me know your thoughts 🙇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, if there's a legit use case let's make it configurable. I like the idea of having a per-DB option, this way for Hive we could have it off (since queries can take hours), and for Postgres (say) we could have it on.
@yousoph, any thoughts here?
Oh, sorry if I wasn't clear, I meant stopping a query by navigating away. If we press the stop button we should still cancel async queries, regardless of the configuration flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now configurable at
Database -> Performance -> Cancel query on window unload event
and defaults to False to keep the current behaviour. Commit: e5223ec