Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize filenames in MySQLHook #33328

Merged
merged 10 commits into from
Jan 3, 2024

Conversation

PApostol
Copy link
Contributor

This PR ensures filenames in MySQLHook are sanitized, so no arbitrary SQL execution could happen with e.g. filename = "myfile; SELECT * FROM DUAL". Closes #33283.

@eladkal
Copy link
Contributor

eladkal commented Aug 11, 2023

Can you please amend the commit message to a meaningful one? We generate release notes based on commit message thus it's important

@eladkal eladkal requested a review from potiuk August 11, 2023 20:38
@PApostol PApostol force-pushed the sanitize-MySQLHook-filename branch from 99867c1 to 72343e1 Compare August 11, 2023 20:42
@PApostol
Copy link
Contributor Author

Can you please amend the commit message to a meaningful one? We generate release notes based on commit message thus it's important

Done, sorry, I must have replaced the main commit message accidentally.

Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the linked issue, I see:

fillename = " file; SELECT * FROM DUAL" 

Does your change remove the leading space? If not, could you add a strip on re.sub result?

@PApostol
Copy link
Contributor Author

In the linked issue, I see:

fillename = " file; SELECT * FROM DUAL" 

Does your change remove the leading space? If not, could you add a strip on re.sub result?

The change doesn't remove leading spaces. I presumed the issue was with SQL commands following any ; - not sure if leading spaces cause any problems here. Happy to include a strip() though!

@potiuk
Copy link
Member

potiuk commented Aug 12, 2023

In the linked issue, I see:

fillename = " file; SELECT * FROM DUAL" 

Does your change remove the leading space? If not, could you add a strip on re.sub result?

The change doesn't remove leading spaces. I presumed the issue was with SQL commands following any ; - not sure if leading spaces cause any problems here. Happy to include a strip() though!

I think it's a little bit more complex though, sheer presence of ";" is not enough, it will be " ';" that will trigger it.

I am thinking that better solution will be to actually use prepared statement in this case. Not sure if that works for this command (but there is no reason it should not):

https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursorprepared.html

This is the ultimate way how you can avoid sql injection type of problems like this one.

@potiuk
Copy link
Member

potiuk commented Aug 12, 2023

Then you do not have sanitize file name at all. It's effectively mitigated by the driver parsing the statement first and adding filename later.

@PApostol
Copy link
Contributor Author

Regarding PREPARE statements, would changing

cur = conn.cursor()
cur.execute(
    f"""
    LOAD DATA LOCAL INFILE '{tmp_file}'
    INTO TABLE {table}
    """
)

into

cur = conn.cursor(prepared=True)
cur.execute(
    """
    LOAD DATA LOCAL INFILE '%s'
    INTO TABLE %s
    """,
    (tmp_file, table)
)

do the trick?

@potiuk
Copy link
Member

potiuk commented Aug 13, 2023

Very much so.

@PApostol
Copy link
Contributor Author

It looks like the above syntax doesn't sit well with MySQL 5.7!

@potiuk
Copy link
Member

potiuk commented Aug 14, 2023

Shall we wait :)?

https://endoflife.date/mysql

Ends in 2 months and 2 weeks (31 Oct 2023)

Actually I would be qute fine to make it MySQL 8 only feature in anticipation of us dropping support in provider. Users might still use older provoders if they want to keep 5.7.

@potiuk
Copy link
Member

potiuk commented Aug 14, 2023

One of the ways to do it is to make if (mysql 8 -> prepare, if not, don't) for now and then in 2 months we will drop it completely.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Sep 29, 2023
@PApostol
Copy link
Contributor Author

I wonder is this PR can still be useful in 1 month when MySQL 5.7 reaches EOL?

@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Sep 30, 2023
@potiuk
Copy link
Member

potiuk commented Oct 8, 2023

yes

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Nov 23, 2023
@PApostol
Copy link
Contributor Author

Perhaps this can be merged now?

@potiuk potiuk force-pushed the sanitize-MySQLHook-filename branch from 572e59c to af7693b Compare November 27, 2023 01:41
@potiuk
Copy link
Member

potiuk commented Nov 27, 2023

Rebased it. Generatlly when we have old PRs that (for whatever reason) has not been built for some time, rebasing is firt thing to do.

@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Nov 28, 2023
@potiuk
Copy link
Member

potiuk commented Nov 29, 2023

There was an error that looked suspiciously related (MySQL test failed but also could be flake) I re-run it - if it happens again I think you will need to fix it @PApostol

@potiuk
Copy link
Member

potiuk commented Nov 29, 2023

Yep. it looks like:

Perhaps this can be merged now?

So ... no :) until it is fixed :)

@PApostol
Copy link
Contributor Author

PApostol commented Jan 3, 2024

Hello, I have updated the PR, tests seem to pass now. Perhaps it can be reviewed?

@potiuk potiuk merged commit 3598a52 into apache:main Jan 3, 2024
@potiuk
Copy link
Member

potiuk commented Jan 3, 2024

Cool. Thanks for fixing it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sanitize filename of the MySQLHook bulk_load parameter
4 participants