-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql, cli: support basic auto complete for sql keywords #72925
Conversation
d83715b
to
1a7f792
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @petermattis and @RichardJCai)
pkg/sql/delegate/show_completions.go, line 23 at r1 (raw file):
return nil, err } tableNames, err := d.catalog.GetAllTableNames(d.ctx)
Looking up all of the table names before each completion attempt is quite expensive. Could this be done on-demand instead?
pkg/sql/logictest/testdata/logic_test/show_completions, line 16 at r1 (raw file):
query T show completions at offset 1 'select 1'
This seems a bit strange to me. I'd expect the completion at offset 1 to be SELECT
because the word at offset 1 is select.
pkg/sql/parser/show_completions.go, line 11 at r1 (raw file):
func init() { sort.Strings(sqlTokens)
This is mutating sqlToknames
because the declaration of sqlTokens
above is aliasing sqlToknames
. Also, I didn't think you want to use sqlToknames
. Rather, I think you want to use lexbase.KeywordNames
. The token names are not the same as the keyword names. For example, AS_LA
is in sqlToknames
, but that isn't actually a keyword. It is just the AS
keyword with a special bit of lexical lookahead to avoid parser ambiguities.
pkg/sql/parser/show_completions.go, line 14 at r1 (raw file):
} func RunShowCompletions(stmt string, offset int, tableNames []string) []string {
It feels odd for the implementation to be in the sql/parser
package. Was there something motivating put it here?
pkg/sql/parser/show_completions.go, line 72 at r1 (raw file):
// * getLastWordFromStmt("SELECT * FROM movr.rid") -> "rid" // * getLastWordFromStmt("SELECT * FROM t@prim") -> "prim" func getLastWordFromStmt(stmt string) string {
I suspect you should be lexing the string (using parser.Scanner
) rather than doing this ad-hoc splitting of it into words. That will handle cases like comments, and generate white space.
pkg/sql/sem/tree/show.go, line 899 at r1 (raw file):
ctx.WriteString(" ") ctx.WriteString("'") ctx.WriteString(s.Statement)
What if s.Statement
contains single quotes? I suspect there must be a function to format a string and do escape if necessary.
1a7f792
to
2329470
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @petermattis)
pkg/sql/delegate/show_completions.go, line 23 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Looking up all of the table names before each completion attempt is quite expensive. Could this be done on-demand instead?
Done.
pkg/sql/logictest/testdata/logic_test/show_completions, line 16 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This seems a bit strange to me. I'd expect the completion at offset 1 to be
SELECT
because the word at offset 1 is select.
Would you expect only SELECT
to be returned? Previously it just considered every SQL keyword that had the prefix "S".
After my latest update it's a little bit smarter and detects that there is no token before S
and returns the "common sql commands" starting with S.
SELECT, SET, SHOW and START
pressing tab to autocomplete here would return SELECT
but only because it happens to be alphabetically before the other 3. I suppose we'll want to add another heuristic here, maybe last used?
pkg/sql/sem/tree/show.go, line 899 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
What if
s.Statement
contains single quotes? I suspect there must be a function to format a string and do escape if necessary.
Fixed
pkg/sql/parser/show_completions.go, line 11 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This is mutating
sqlToknames
because the declaration ofsqlTokens
above is aliasingsqlToknames
. Also, I didn't think you want to usesqlToknames
. Rather, I think you want to uselexbase.KeywordNames
. The token names are not the same as the keyword names. For example,AS_LA
is insqlToknames
, but that isn't actually a keyword. It is just theAS
keyword with a special bit of lexical lookahead to avoid parser ambiguities.
Good call, lexbase.KeywordNames
seems rights
pkg/sql/parser/show_completions.go, line 14 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
It feels odd for the implementation to be in the
sql/parser
package. Was there something motivating put it here?
It probably shouldn't be there, took inspiration from show_syntax
. Moved this logic into delegate
pkg/sql/parser/show_completions.go, line 72 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I suspect you should be lexing the string (using
parser.Scanner
) rather than doing this ad-hoc splitting of it into words. That will handle cases like comments, and generate white space.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @petermattis, and @RichardJCai)
pkg/sql/delegate/show_completions_test.go, line 55 at r3 (raw file):
testCat.SetAllTableNames([]string{"abc", "test_table_1", "test_table_2"}) for _, tc := range tests { offset := tc.offset
You're never setting tc.offset
to a non-zero value in any of these tests. Perhaps remove that field? Or add more test cases?
pkg/sql/logictest/testdata/logic_test/show_completions, line 16 at r1 (raw file):
Previously, RichardJCai (Richard Cai) wrote…
Would you expect only
SELECT
to be returned? Previously it just considered every SQL keyword that had the prefix "S".
After my latest update it's a little bit smarter and detects that there is no token beforeS
and returns the "common sql commands" starting with S.
SELECT, SET, SHOW and START
pressing tab to autocomplete here would returnSELECT
but only because it happens to be alphabetically before the other 3. I suppose we'll want to add another heuristic here, maybe last used?
Yes, I'd expect only SELECT
to be returned. If the cursor is within a word, I'd expect the completions returned to be for the word underneath the cursor position, not for the prefix of the word before the cursor position. I realize this is a bit different than the behavior of tab-completion in shells.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so for keyword completion it's not a bad idea, even though I wonder what you think of this comment:
#45186 (comment)
And also what you think should happen if the user asks for completion after a space, when the keyword is not know already - usually they'd like to see what keywords are accepted after that position, which is what the help texts and token were meant to help with (would it be possible for showCompletions
to fall back to that if the user asks for completions after a space?)
For table names, I'm really not convinced.
- The mechanism as proposed here makes nonsensical suggestions for many uses of SQL FROM, especially those where the keyword "TABLE" is mandatory before the table identifier.
- It also misses on INSERT and UPDATE, which arguably are just as frequent as DELETE and SELECT, as well as many other places where table names are expected. Users are going to get confused about why they get table names in some cases but not others.
I wonder if we wouldn't get better leverage with this idea:
- create a weird-looking keyword or token, which is guaranteed to never be valid SQL, let's call it the ASK_TABLE_NAME token
- change the
simple_db_object_name
andcomplex_db_object_name
as follows:
simple_db_object_name:
db_object_name_component
{
aIdx := sqllex.(*lexer).NewAnnotation()
res, err := tree.NewUnresolvedObjectName(1, [3]string{$1}, aIdx)
if err != nil { return setErr(sqllex, err) }
$$.val = res
}
| ASK_TABLE_NAME
{
return setErr(sqllex, makeSpecialCompletionError("SIMPLE"))
}
...
complex_db_object_name:
db_object_name_component '.' unrestricted_name
{ ... }
| db_object_name_component '.' unrestricted_name '.' unrestricted_name
{ ... }
| db_object_name_component '.' ASK_TABLE_NAME
{
return setErr(sqllex, makeSpecialCompletionError("TWOPART", $1))
}
| db_object_name_component '.' unrestricted_name '.' ASK_TABLE_NAME
{
return setErr(sqllex, makeSpecialCompletionError("THREEPART", $1, $2))
}
Then in your showCompletions
function:
-
if the offset points to the end of an identifier (but not after a space), erase the identifier and replace it by the "ask table name" token
-
if the offset points to after a space, add the "ask table name" token
-
run the
parse
function, and then inspect the result:-
when a special completion error object is returned choose how to complete a table name:
- for SIMPLE completion errors, list tables in current db + DBs visible to the user followed by a dot
- for TWOPART errors, list tables in given schema in current db + schemas in given db + tables in given db in current schema
- for THREEPART, list tables in given db/schema
-
if another error is returned (not a special completion error), this means that there was no expectation of a table name in that position. Then:
- if the offset was pointing to the end of an identifer (not after a space), propose completions using a keyword
- if the offset was after a space, try again by adding the help token
??
-
Reviewed 7 of 13 files at r1, 8 of 8 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @petermattis and @RichardJCai)
-- commits, line 10 at r2:
For the following statements, the heuristic will provide confusing results:
COPY FROM PARENT
SCATTER FROM <expr>
RESTORE FROM <storage>
IMPORT FROM <format>
CREATE STATISTICS FROM <cols>
REVOKE FROM <users>
SHOW {RANGE|PARTITIONS} FROM {DATABASE|INDEX|TABLE} <name>
SHOW REGIONS FROM {CLUSTER|DATABASE|ALL}...
SHOW FINGERPRINTS FROM TABLE <name>
ROWS FROM (<texpr>)
a IS DISTINCT FROM b
EXTRACT (YEAR FROM v)
For the following statements, the heuristc doesnt help:
UPDATE <table>
INSERT INTO <table>
BACKUP <table>
SHOW CREATE <table>
CREATE TABLE ... LIKE <table>
I think all this needs to be explained in the release note if we continue with this approach.
Is there no way to recognize positions where table_name
is accepted, instead of using a keyword?
pkg/sql/opt_catalog.go, line 225 at r2 (raw file):
} func (oc *optCatalog) GetAllTableNames(ctx context.Context) ([]string, error) {
-
I recommend using a LIMIT clause here
-
at least use a bytes monitor to ensure the memory usage gets accounted
pkg/sql/delegate/show_completions.go, line 40 at r2 (raw file):
var query bytes.Buffer fmt.Fprintf( &query, "SELECT @1 AS %s FROM (VALUES ",
What's going on here? Why not fmt.Fprint(&query, "SELECT @1 AS completions FROM (VALUES ")
?
pkg/sql/delegate/show_completions.go, line 59 at r2 (raw file):
// copied from https://github.com/xo/usql/blob/master/drivers/completer/completer.go // need to add license if this actually goes i var commonSqlStartCommands = []string{
What's this? Can't we get that list from the sql grammar automatically instead?
Many of these are not supported in postgres anyway.
pkg/sql/delegate/show_completions.go, line 114 at r2 (raw file):
// If we're at the whitespace, we do not want to return completion // recommendations for "SELECT". if unicode.IsSpace(rune(stmt[offset-1])) {
That's not the right way to transform an offset to a string position. Generally, if you're expecting the statement to be a UTF-8 string, you cannot use string indexing with foo[...]
to access individual runes given an offset. Instead, you have to scan the string from the start, either with a go loop for i, c := range str
or iterate with while
and utf8.DecodeXXX
.
ditto everywhere below.
(This will need tests that demonstrate an offset positioned after a unicode multi-byte character.)
pkg/sql/parser/sql.y, line 5581 at r2 (raw file):
show_completions_stmt: SHOW COMPLETIONS AT OFFSET ICONST SCONST
plz add a "FOR" in between the offset and the statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking a lot about tab completion over the past few weeks. I see two main technical decisions to make: 1) where does tab completion occur (on the client or the server), and 2) what are the tab completion heuristics. Spelunking the https://github.com/xo/usql code I've discovered that it uses a series of ad-hoc heuristics to decide when to complete table names. This is essentially the same approach used by psql, though psql has many many (many!) more heuristics (encoded in a gigantic ~3000k line function). What Richard has done here is baby steps in that direction of ad-hoc heuristics. Not terribly principled, but also trodding a well worn path.
Separately, I've been exploring what we can do in generating heuristics from the grammar. Your idea of demarcating the names is interesting, though I'm not sure if that is even necessary. I think generating the heuristics from the grammar should be our ultimate goal, but I think it is somewhat tangential to this PR which, to my mind, is focused on getting the structure of doing tab completions on the server set up.
run the parse function, and then inspect the result:
Trying to parse the SQL in order to provide completions is problematic as the SQL may not be valid when we want to perform completions. On the other hand, if we want to provide context-sensitive completion of column names we have to know what tables and table aliases are visible which involves the semantic analysis in the optbuilder step. My intuition right now, though, is that we can generate completion heuristics from the grammar that are similar to what psql/usql perform.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @petermattis and @RichardJCai)
It may not be valid after the completion point but if it's valid enough before it, this algorithm would work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For table names, I'm really not convinced.
The mechanism as proposed here makes nonsensical suggestions for many uses of SQL FROM, especially those where the keyword "TABLE" is mandatory before the table identifier.
It also misses on INSERT and UPDATE, which arguably are just as frequent as DELETE and SELECT, as well as many other places where table names are expected. Users are going to get confused about why they get table names in some cases but not others.
I'm thinking we can just ignore adding tab complete for table names and stick just to SQL keywords for now. We can choose to add heuristics later once we have a better grasp on what we actually want to do there.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @petermattis and @RichardJCai)
Possibly. I'd love to figure out if that is true. Anecdotally, I sometimes write |
I don't know if it's strange. But if you discovered experimentally that the completion thing works with a valid prefix, you'd be quick to transition to a new style, using e.g. |
Haven't had the bandwidth to work on this recently but I'm planning to update this to support barebones completion for just SQL keywords. |
49212d4
to
befcb1d
Compare
befcb1d
to
fde4512
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this PR to handle just simple SQL keyword autocompletion for now. PTAL, also planning to demo this in the breather week meeting tomorrow to gauge interest.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @petermattis)
Previously, knz (kena) wrote…
For the following statements, the heuristic will provide confusing results:
COPY FROM PARENT
SCATTER FROM <expr>
RESTORE FROM <storage>
IMPORT FROM <format>
CREATE STATISTICS FROM <cols>
REVOKE FROM <users>
SHOW {RANGE|PARTITIONS} FROM {DATABASE|INDEX|TABLE} <name>
SHOW REGIONS FROM {CLUSTER|DATABASE|ALL}...
SHOW FINGERPRINTS FROM TABLE <name>
ROWS FROM (<texpr>)
a IS DISTINCT FROM b
EXTRACT (YEAR FROM v)
For the following statements, the heuristc doesnt help:
UPDATE <table>
INSERT INTO <table>
BACKUP <table>
SHOW CREATE <table>
CREATE TABLE ... LIKE <table>
I think all this needs to be explained in the release note if we continue with this approach.
Is there no way to recognize positions where
table_name
is accepted, instead of using a keyword?
Removed this heuristic in this PR
pkg/sql/delegate/show_completions.go, line 40 at r2 (raw file):
Previously, knz (kena) wrote…
What's going on here? Why not
fmt.Fprint(&query, "SELECT @1 AS completions FROM (VALUES ")
?
Done
pkg/sql/delegate/show_completions.go, line 59 at r2 (raw file):
Previously, knz (kena) wrote…
What's this? Can't we get that list from the sql grammar automatically instead?
Many of these are not supported in postgres anyway.
Removed
pkg/sql/delegate/show_completions.go, line 114 at r2 (raw file):
Previously, knz (kena) wrote…
That's not the right way to transform an offset to a string position. Generally, if you're expecting the statement to be a UTF-8 string, you cannot use string indexing with
foo[...]
to access individual runes given an offset. Instead, you have to scan the string from the start, either with a go loopfor i, c := range str
or iterate withwhile
andutf8.DecodeXXX
.ditto everywhere below.
(This will need tests that demonstrate an offset positioned after a unicode multi-byte character.)
Updated, I think converting the string to a []rune works in handling utf8 here, PTAL.
Also added some tests with UTF-8 strings.
pkg/sql/parser/sql.y, line 5581 at r2 (raw file):
Previously, knz (kena) wrote…
plz add a "FOR" in between the offset and the statement.
Done.
pkg/sql/logictest/testdata/logic_test/show_completions, line 16 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Yes, I'd expect only
SELECT
to be returned. If the cursor is within a word, I'd expect the completions returned to be for the word underneath the cursor position, not for the prefix of the word before the cursor position. I realize this is a bit different than the behavior of tab-completion in shells.
Updated, however if the word was SELEC
instead, would you expect the completion at offset 1 to be SELEC
as well - that's how it looks after my current update.
pkg/sql/delegate/show_completions_test.go, line 55 at r3 (raw file):
Previously, petermattis (Peter Mattis) wrote…
You're never setting
tc.offset
to a non-zero value in any of these tests. Perhaps remove that field? Or add more test cases?
Added some more tests that test offset
pkg/sql/opt_catalog.go, line 225 at r2 (raw file):
Previously, knz (kena) wrote…
I recommend using a LIMIT clause here
at least use a bytes monitor to ensure the memory usage gets accounted
N/a, removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm overall! i wonder if we should make SHOW COMPLETIONS AT
an observer statement. (this way there's no issue with using it if the CLI has an open transaction)
// ObserverStatement is a marker interface for statements which are allowed to
// run regardless of the current transaction state: statements other than
// rollback are generally rejected if the session is in a failed transaction
// state, but it's convenient to allow some statements (e.g. "show syntax; set
// tracing").
// Such statements are not expected to modify the database, the transaction or
// session state (other than special cases such as enabling/disabling tracing).
//
// These statements short-circuit the regular execution - they don't get planned
// (there are no corresponding planNodes). The connExecutor recognizes them and
// handles them.
and also maybe it should implement HiddenFromShowQueries
fde4512
to
097909a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks good call. Updated to implement ObserverStatement
and HiddenFromShowQueries
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @petermattis)
097909a
to
ccfbac8
Compare
Looking for another review on this for whoever's interested |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! feel free to merge if you add the TestParseDataDriven tests
} | ||
|
||
// Format implements the NodeFormatter interface. | ||
func (s ShowCompletions) Format(ctx *FmtCtx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use a few test cases in pkg/sql/parser/testdata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Release note (sql change): Support SHOW COMPLETIONS AT OFFSET <offset> FOR <stmt> syntax that returns a set of SQL keywords that can complete the keyword at <offset> in the given <stmt>. If the offset is in the middle of a word, then it returns the full word. For example SHOW COMPLETIONS AT OFFSET 1 FOR "SELECT" returns select.
ccfbac8
to
c1e1276
Compare
TFTR! bors r=rafiss |
bors r+ |
Seems like this disappeared from the bors queue again. bors r=rafiss |
---- | ||
SHOW COMPLETIONS AT OFFSET 10 FOR 'select * fro' -- normalized! | ||
SHOW COMPLETIONS AT OFFSET 10 FOR 'select * fro' -- fully parenthesized | ||
SHOW COMPLETIONS AT OFFSET 10 FOR 'select * fro' -- literals removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, i would have expected "literals removed" to remove the parts in single quotes. do you know why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I didn't update the formatter to remove literals, fixed.
bors r- |
Canceled. |
Release note (cli change): CLI now auto completes on tab by using `SHOW COMPLETIONS AT OFFSET`.
c1e1276
to
1e64f18
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
Reviewed 1 of 13 files at r1, 1 of 8 files at r2, 2 of 12 files at r8, 6 of 10 files at r9, 5 of 5 files at r10, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @petermattis, @rafiss, and @RichardJCai)
bors r=rafiss |
Build succeeded: |
sql: add SHOW COMPLETIONS AT offset FOR syntax
Release note (sql change): Support
SHOW COMPLETIONS AT OFFSET FOR syntax that
returns a set of SQL keywords that can complete the keyword at
in the given .
If the offset is in the middle of a word, then it returns the
full word.
For example SHOW COMPLETIONS AT OFFSET 1 FOR "SELECT" returns select.
cli: support autocomplete
Release note (cli change): CLI now auto completes on tab
by using
SHOW COMPLETIONS AT OFFSET
.