-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Workaround for Pandas.DataFrame.to_csv bug #28755
Conversation
64dec4b
to
ac5034e
Compare
superset/utils/csv.py
Outdated
@@ -49,10 +49,6 @@ def escape_value(value: str) -> str: | |||
is_negative_number = negative_number_re.match(value) is not None | |||
|
|||
if needs_escaping and not is_negative_number: | |||
# Escape pipe to be extra safe as this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now handled via the escapechar
character in the pandas.DataFrame.to_csv
method.
0bc246b
to
61123b1
Compare
csv_str = "\n".join([",".join(row) for row in csv_rows]) | ||
|
||
df = pd.read_csv(io.StringIO(csv_str)) | ||
df = pd.DataFrame( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed to create a Pandas DataFrame
using data (as opposed to reading from a CSV file) in order to invoke the,
_csv.Error: need to escape, but no escapechar set
error. This resulted in different output and thus the tests needed to be updated.
df = pd.DataFrame( | ||
data={ | ||
"value": [ | ||
"a", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really not sure why previously there were two columns of data. One should be suffice.
["'=func()"], | ||
["-10"], | ||
[r"'=cmd\\|' /C calc'!A0"], | ||
['"\'""""=b"'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous comment,
# pandas seems to be removing the leading ""
is no longer valid when creating the Pandas DataFrame
from data (as opposed to a CSV file). The extra "
are present because the "
is the quote variable.
["col_a"], | ||
["'=func()"], | ||
["-10"], | ||
[r"'=cmd\\|' /C calc'!A0"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's now an extra \
. This is due to how the Pandas DataFrame
is composed as opposed to to inclusion of an escape character.
61123b1
to
22a28d5
Compare
22a28d5
to
ff488fd
Compare
csv_str = "\n".join([",".join(row) for row in csv_rows]) | ||
|
||
df = pd.read_csv(io.StringIO(csv_str)) | ||
df = pd.DataFrame( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note these tests are unit as opposed to integration tests—no database dependency—and thus I opted to move the file.
(cherry picked from commit 6b016da)
SUMMARY
Per pandas-dev/pandas#47871, in Python 3.10+ (which is now the minimal supported Python version in Superset) there's a Pandas bug where
DataFrame.to_csv
unnecessarily requires one to specify anescapechar
even though—per here—special values have already been escaped.BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
Updated unit tests.
Also tested locally, i.e., I was able to successfully export a CSV file containing escaped special characters in Python 3.9 yet it throws in Python 3.10 with the following error,
unless an escape character is specified.
ADDITIONAL INFORMATION