-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
config: allow choosing separator character and encoding for csv writing #3441
Conversation
superset/views/core.py
Outdated
@@ -2126,13 +2126,15 @@ def csv(self, client_id): | |||
columns = [c['name'] for c in obj['columns']] | |||
df = pd.DataFrame.from_records(obj['data'], columns=columns) | |||
logging.info("Using pandas to convert to CSV") | |||
csv = df.to_csv(index=False, encoding='utf-8') | |||
csv = df.to_csv(index=False, encoding=str(config.get('CSV_ENCODING')), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why calling str() on the options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To re-cast the string, otherwise I get TypeError: "delimiter" must be string, not unicode
with python 2.x
If you have a better solution, I can change that (or maybe add comment to clarify inside the code ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fine then
superset/config.py
Outdated
@@ -172,6 +172,9 @@ | |||
ENABLE_CORS = False | |||
CORS_OPTIONS = {} | |||
|
|||
# CSV Options | |||
CSV_SEP = ',' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since CSV has a ton of options and pandas does not support dialects i think a CSV_EXPORT dict would be more future proof:
CSV_EXPORT = {
'SEPARATOR': ',',
'ENCODING': 'utf-8',
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, this looks better. I will implement it this way.
Just waiting until you tell what you prefer regarding the previous comment about str()
and I update my PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I was thinking I can do something more generic to be able to replace any option of the to_csv method. I'm gonna change this and resubmit.
Coverage increased (+0.009%) to 69.127% when pulling 12c8ffbf86d7b3749376accb8ed89df532c60ecf on JulieRossi:master into 3dfdde1 on apache:master. |
Are csv exports covered by tests? If not would be nice to add them :) |
So both cases are already tested. |
Coverage increased (+0.005%) to 69.123% when pulling 2af618e27bdcc24863def4f1c896bc49c6d30e71 on JulieRossi:master into 3dfdde1 on apache:master. |
superset/views/core.py
Outdated
@@ -2126,13 +2126,15 @@ def csv(self, client_id): | |||
columns = [c['name'] for c in obj['columns']] | |||
df = pd.DataFrame.from_records(obj['data'], columns=columns) | |||
logging.info("Using pandas to convert to CSV") | |||
csv = df.to_csv(index=False, encoding='utf-8') | |||
csv = df.to_csv(index=False, encoding=str(config.get('CSV_EXPORT')['ENCODING']), | |||
sep=str(config.get('CSV_EXPORT')['SEPARATOR'])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why casting to str
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To re-cast the string, otherwise I get TypeError: "delimiter" must be string, not unicode with python 2.x
If you have a better solution, I can change that (or maybe add comment to clarify inside the code ?)
superset/views/core.py
Outdated
else: | ||
logging.info("Running a query to turn into CSV") | ||
sql = query.select_sql or query.executed_sql | ||
df = query.database.get_df(sql, query.schema) | ||
# TODO(bkyryliuk): add compression=gzip for big files. | ||
csv = df.to_csv(index=False, encoding='utf-8') | ||
csv = df.to_csv(index=False, encoding=str(config.get('CSV_EXPORT')['ENCODING']), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why casting to str
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To re-cast the string, otherwise I get TypeError: "delimiter" must be string, not unicode with python 2.x
If you have a better solution, I can change that (or maybe add comment to clarify inside the code ?)
superset/viz.py
Outdated
@@ -306,7 +306,8 @@ def data(self): | |||
def get_csv(self): | |||
df = self.get_df() | |||
include_index = not isinstance(df.index, pd.RangeIndex) | |||
return df.to_csv(index=include_index, encoding="utf-8") | |||
return df.to_csv(index=include_index, encoding=str(config.get('CSV_EXPORT')['ENCODING']), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why casting to str
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To re-cast the string, otherwise I get TypeError: "delimiter" must be string, not unicode with python 2.x
If you have a better solution, I can change that (or maybe add comment to clarify inside the code ?)
Is there a use case to not use |
Regarding the non utf-8 encoding and the delimiter, we have the same problem kylozw describes in #1519. I also commented (but probably not visible enough since it was just a reply to a former comment) that I was thinking I can do something more generic to be able to replace any option of the to_csv method. I'm gonna change this and resubmit. |
This PR adds two global variables in config.py,
CSV_SEP
andCSV_ENCODING
allowing to configure at application level, which separator character and which encoding should be used to write a csv file.Fixes #1519