Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config: allow choosing separator character and encoding for csv writing #3441

Merged
merged 1 commit into from
Sep 15, 2017
Merged

Conversation

JulieRossi
Copy link
Contributor

This PR adds two global variables in config.py, CSV_SEP and CSV_ENCODING allowing to configure at application level, which separator character and which encoding should be used to write a csv file.

Fixes #1519

@@ -2126,13 +2126,15 @@ def csv(self, client_id):
columns = [c['name'] for c in obj['columns']]
df = pd.DataFrame.from_records(obj['data'], columns=columns)
logging.info("Using pandas to convert to CSV")
csv = df.to_csv(index=False, encoding='utf-8')
csv = df.to_csv(index=False, encoding=str(config.get('CSV_ENCODING')),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why calling str() on the options?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To re-cast the string, otherwise I get TypeError: "delimiter" must be string, not unicode with python 2.x
If you have a better solution, I can change that (or maybe add comment to clarify inside the code ?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fine then

@@ -172,6 +172,9 @@
ENABLE_CORS = False
CORS_OPTIONS = {}

# CSV Options
CSV_SEP = ','
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since CSV has a ton of options and pandas does not support dialects i think a CSV_EXPORT dict would be more future proof:

CSV_EXPORT = {
    'SEPARATOR': ',',
    'ENCODING': 'utf-8',
}

Copy link
Contributor Author

@JulieRossi JulieRossi Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this looks better. I will implement it this way.

Just waiting until you tell what you prefer regarding the previous comment about str() and I update my PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I was thinking I can do something more generic to be able to replace any option of the to_csv method. I'm gonna change this and resubmit.

@coveralls
Copy link

coveralls commented Sep 11, 2017

Coverage Status

Coverage increased (+0.009%) to 69.127% when pulling 12c8ffbf86d7b3749376accb8ed89df532c60ecf on JulieRossi:master into 3dfdde1 on apache:master.

@xrmx
Copy link
Contributor

xrmx commented Sep 11, 2017

Are csv exports covered by tests? If not would be nice to add them :)

@coveralls
Copy link

coveralls commented Sep 11, 2017

Coverage Status

Coverage increased (+0.005%) to 69.123% when pulling 2af618e27bdcc24863def4f1c896bc49c6d30e71 on JulieRossi:master into 3dfdde1 on apache:master.

@@ -2126,13 +2126,15 @@ def csv(self, client_id):
columns = [c['name'] for c in obj['columns']]
df = pd.DataFrame.from_records(obj['data'], columns=columns)
logging.info("Using pandas to convert to CSV")
csv = df.to_csv(index=False, encoding='utf-8')
csv = df.to_csv(index=False, encoding=str(config.get('CSV_EXPORT')['ENCODING']),
sep=str(config.get('CSV_EXPORT')['SEPARATOR']))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why casting to str?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To re-cast the string, otherwise I get TypeError: "delimiter" must be string, not unicode with python 2.x
If you have a better solution, I can change that (or maybe add comment to clarify inside the code ?)

else:
logging.info("Running a query to turn into CSV")
sql = query.select_sql or query.executed_sql
df = query.database.get_df(sql, query.schema)
# TODO(bkyryliuk): add compression=gzip for big files.
csv = df.to_csv(index=False, encoding='utf-8')
csv = df.to_csv(index=False, encoding=str(config.get('CSV_EXPORT')['ENCODING']),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why casting to str?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To re-cast the string, otherwise I get TypeError: "delimiter" must be string, not unicode with python 2.x
If you have a better solution, I can change that (or maybe add comment to clarify inside the code ?)

superset/viz.py Outdated
@@ -306,7 +306,8 @@ def data(self):
def get_csv(self):
df = self.get_df()
include_index = not isinstance(df.index, pd.RangeIndex)
return df.to_csv(index=include_index, encoding="utf-8")
return df.to_csv(index=include_index, encoding=str(config.get('CSV_EXPORT')['ENCODING']),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why casting to str?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To re-cast the string, otherwise I get TypeError: "delimiter" must be string, not unicode with python 2.x
If you have a better solution, I can change that (or maybe add comment to clarify inside the code ?)

@mistercrunch
Copy link
Member

Is there a use case to not use utf-8? Changing the delimiter globally seems like a bad idea as the C in CSV stands for comma...

@JulieRossi
Copy link
Contributor Author

JulieRossi commented Sep 12, 2017

Regarding the non utf-8 encoding and the delimiter, we have the same problem kylozw describes in #1519.
Our users are used to Excel so when they export csv the want to be able to easily open it with Excel (they are not advanced users and changing Excel default encoding and delimiter is beyond their knowledge).
In the french version, Excel expects latin1 encoding and ; as separator (also , as decimal separator instead of .). It seems that others have the same kind of issues (given #1519) so I thought issuing a PR was a better idea than just changing this locally. It would also be more convenient for us.

I also commented (but probably not visible enough since it was just a reply to a former comment) that I was thinking I can do something more generic to be able to replace any option of the to_csv method. I'm gonna change this and resubmit.

@coveralls
Copy link

coveralls commented Sep 12, 2017

Coverage Status

Coverage increased (+0.005%) to 69.124% when pulling dfed319 on JulieRossi:master into 3c0e85e on apache:master.

@coveralls
Copy link

coveralls commented Sep 12, 2017

Coverage Status

Coverage increased (+0.005%) to 69.124% when pulling dfed319 on JulieRossi:master into 3c0e85e on apache:master.

2 similar comments
@coveralls
Copy link

Coverage Status

Coverage increased (+0.005%) to 69.124% when pulling dfed319 on JulieRossi:master into 3c0e85e on apache:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.005%) to 69.124% when pulling dfed319 on JulieRossi:master into 3c0e85e on apache:master.

@mistercrunch mistercrunch merged commit b90d8e3 into apache:master Sep 15, 2017
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.20.0 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.20.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Choose encoding when export csv
4 participants