Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance by not copying Tokens as much #1558

Open
alamb opened this issue Nov 26, 2024 · 1 comment
Open

Improve performance by not copying Tokens as much #1558

alamb opened this issue Nov 26, 2024 · 1 comment

Comments

@alamb
Copy link
Contributor

alamb commented Nov 26, 2024

Part of #1557

While looking at the flamegraph posted to that ticket, one obvious source of improvement is to stop copying each token so much.

Functions like peek_token(), and expect_token() copy the Token (which often includes a string)

impl Parser {
    // returns an OWNED TokenWithLocation
    pub fn peek_token(&self) -> TokenWithLocation {
   ...
    }
}

For example
https://github.com/apache/datafusion-sqlparser-rs/blob/2e90e105a74bf9f50f2bad6c22992759ddb06880/src/parser/mod.rs#L3314-L3334

In the above code, the non_whitespace.cloned() call actually copies the token (and string) even when many callsites only need to check what it is and could get by with a &

Suggestions for improvements

The biggest suggestion is to stop copying each token unless it actually needed a clone. For example instead of this

impl Parser {
    // returns an OWNED TokenWithLocation
    pub fn peek_token(&self) -> TokenWithLocation {
   ...
    }
}

Make it like this

impl Parser {
    // returns an reference to TokenWithLocation
    // (the & means no copying!)
    pub fn peek_token_ref(&self) -> &TokenWithLocation {
   ...
    }
}

I think we could do this without massive breaking changes by adding new functions like peek_token_ref() that returned a reference to the token rather than a clone

Ideas

I played around with it a bit and here is what I came up with. I think a similar pattern could be applied to other places

impl Parser {
...
    /// Return the first non-whitespace token that has not yet been processed
    /// (or None if reached end-of-file) and mark it as processed. OK to call
    /// repeatedly after reaching EOF.
    pub fn next_token(&mut self) -> TokenWithLocation {
        self.next_token_ref().clone()
    }

    /// Return the first non-whitespace token that has not yet been processed
    /// (or None if reached end-of-file) and mark it as processed. OK to call
    /// repeatedly after reaching EOF.
    pub fn next_token_ref(&mut self) -> &TokenWithLocation {
        loop {
            self.index += 1;
            // skip whitespace
            if let Some(TokenWithLocation {
                token: Token::Whitespace(_),
                location: _,
            }) = self.tokens.get(self.index - 1) {
                continue
            }
            break;
        }
        if (self.index - 1) < self.tokens.len() {
            &self.tokens[self.index - 1]
        }
        else {
            eof_token()
        }
    }
...
}

static EOF_TOKEN: OnceLock<TokenWithLocation> = OnceLock::new();
fn eof_token() -> &'static TokenWithLocation {
    EOF_TOKEN.get_or_init(|| {
        TokenWithLocation::wrap(Token::EOF)
    })
}
@alamb
Copy link
Contributor Author

alamb commented Nov 26, 2024

I started hacking on this but ran out of time today

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant