Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for parametrized queries #113

Closed
foobuzz opened this issue Mar 29, 2018 · 6 comments · Fixed by #201
Closed

Add support for parametrized queries #113

foobuzz opened this issue Mar 29, 2018 · 6 comments · Fixed by #201
Assignees

Comments

@foobuzz
Copy link

foobuzz commented Mar 29, 2018

Hello,

Props for this wonderful tool!

My understanding is that the end product for the user is always the SQL string as returned by calling str on a Query object. Such string can then be passed to whatever database interface the user is using. However, SQL interfaces also accept so-called parametrized queries, where the input isn't a single SQL string, but rather a string containing placeholders alongside a tuple containing values to replace the placeholders with. Parameterized queries are very important as they're the only working solution against SQL injection.

The interface could be improved such that the Query object would have two properties statement and values corresponding respectively to the parametrized statement and the values for this statement. The behavior of str would still be relevant as it's always useful to know what a query looks like even when parametrized, although its output should be documented as insecure with unstrusted data.

@foobuzz foobuzz changed the title Add support for parametrized query Add support for parametrized queries Mar 29, 2018
@twheys
Copy link
Contributor

twheys commented Apr 1, 2018

Thanks for raising this ticket. Can definitely do something like that in the near future.

@twheys twheys self-assigned this Apr 1, 2018
@butla
Copy link

butla commented Aug 9, 2018

@foobuzz Can you point to some articles about parametrization being the only working anti-injection solution? I started using pypika recently and I was wondering about the threat of injection. I might not be a skilled attacker, but my simplistic test leads me to think that I'm not in danger right now:

>>> t = Table('the_table')
>>> param_taken_from_the_outside_world = 'John\'; DROP TABLE the_table'
>>> param_taken_from_the_outside_world
"John'; DROP TABLE the_table"
>>> q = Query.from_(t).select('*').where(t.name == param_taken_from_the_outside_world)
>>> q
SELECT * FROM "the_table" WHERE "name"='John''; DROP TABLE the_table'

@twheys Did somebody succesfully inject SQL into pypika params?

@foobuzz
Copy link
Author

foobuzz commented Aug 14, 2018

@butla Some databases like MySQL or Postgre still accept \' to escape the single quote. Therefore we can perform an injection with the following:

>>> unsafe = "John\\'; DROP TABLE the_table --"
>>> print(unsafe)
John\'; DROP TABLE the_table --
>>> q = Query.from_(t).select('*').where(t.name == unsafe)
>>> q
SELECT * FROM "the_table" WHERE "name"='John\''; DROP TABLE the_table --'

The term 'John\'' will be understood by vulnerable databases as a literal for John'. Note that the final quote is ignored because of a comment terminating the statement.

Additionally some database may use pre-defined conversions from Unicode to ASCII, e.g. converting to ', which wouldn't be spotted by pypika.

The problem here is that pypika is just the tool that produces the query, and the robustness of a query against SQL injection is highly dependent on how the query will be handled by the database, which pypika is totally decoupled from. Parametrized queries are an interface exposed by databases themselves, therefore ensuring their security. It's just that they expect not only a string but also a tuple (of values) as input, making pypika not compatible with such interface.

@butla
Copy link

butla commented Aug 20, 2018

@foobuzz I know about parametrized queries and I know why they should be used. Still, I'd like to know what exactly is the risk when using Pypika. And the example you've provided fails to drop the table for me.

Using psycopg2 to run the query, using Postgres from citusdata/citus:7.0.3 Docker image.

@foobuzz
Copy link
Author

foobuzz commented Aug 20, 2018

@butla My example comes from https://security.stackexchange.com/a/108490 which claims that it works against a MySQL database.

This was referenced Oct 22, 2018
@grigi
Copy link
Contributor

grigi commented Jan 7, 2019

I'm looking at adding parametrised query support.
My planned implementation is something along the lines of:

Instead of calling query.get_sql() (or str(query)) instead add a query.get_parametrised_sql() which would return Tuple[str, List], and leaves PyPika to return the objects in the right order, so we can just do an execute(sql, *params)

The reason is that the way PyPika is used (in the monoid pattern) order of parameters is only determined at evaluation time, and is dependant on the dialect.
I'd rather not duplicate that logic, prone to forward compatibility issues etc...
We hand the objects to pypika, and it hands them back to us in the right order with the equivalent sql.

Is there anything I'm overlooking @twheys ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants