-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: remove single quotes in index names when printing #60251
BUG: remove single quotes in index names when printing #60251
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Whenever making changes to behavior, please add tests.
…hedataninja1786/pandas into bugfix--pprint-embedded-quotes
|
pandas/io/formats/printing.py
Outdated
def as_escaped_string( | ||
thing: Any, escape_chars: EscapeChars | None = escape_chars | ||
) -> str: | ||
translate = {"\t": r"\t", "\n": r"\n", "\r": r"\r"} | ||
translate = {"\t": r"\t", "\n": r"\n", "\r": r"\r", "'": ""} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this will make the output here:
df = pd.DataFrame({"'a": [1], "b": [2]}).set_index(["'a", "b"])
print(df.index.names)
be ['a', 'b']
. That is not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the desired behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The index names must be accurate, the first being 'a
. I would think ['\'a', 'b']
.
(["'test\n'", "b"], ['test\n', 'b']) # Remove quotes, preserve newline | ||
] | ||
|
||
for input_names, expected_names in test_cases: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use pytest.mark.parametrize
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Will need to pass pre-commit.
https://pandas.pydata.org/docs/dev/development/contributing_codebase.html#pre-commit
Also needs a line in the whatsnew for v3.0.0.
(["'test\n'", "b"], ["\'test\n\'", "b"]) # Escape and preserve newline | ||
]) | ||
def test_formatted_index_names(input_names, expected_names): | ||
# Create DataFrame with specified index names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this comment - it repeats the code verbatim and so is not needed.
In addition, can you start the test with a comment referencing the issue:
# GH#60190
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are there now?
…hedataninja1786/pandas into bugfix--pprint-embedded-quotes
doc/source/whatsnew/v3.0.0.rst
Outdated
@@ -663,9 +663,9 @@ Interval | |||
|
|||
Indexing | |||
^^^^^^^^ | |||
- Bug in :func:`pprint_thing` with ``quote_strings=True`` failing for strings with embedded single quotes (:issue:`60190`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pprint_thing
is an internal function, the whatsnew is for users of pandas. We need to report on the impact users will notice. Something like Bug in printing :class:`Index` names would not escape single quotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes have been made
doc/source/whatsnew/v3.0.0.rst
Outdated
@@ -663,9 +663,9 @@ Interval | |||
|
|||
Indexing | |||
^^^^^^^^ | |||
- Bug in :class:`Index` printed names do not escape single quotes (:issue:`60190`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think :class:`Index` printed names
is clear, is there an objection to what I suggested in my previous comment? In any case, perhaps this is better.
Bug in printing :attr:`Index.names` and :attr:`MultiIndex.levels` would not escape single quotes (:issue:`60190`)
I did not realize that this can impact levels previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No there was not, in any case I will be directly including your current suggestion since it also include levels :) I'll make another PR.
df = pd.DataFrame({name: [1, 2, 3] for name in input_names}).set_index(input_names) | ||
formatted_names = df.index.names | ||
|
||
assert formatted_names == expected_names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test passes on the main branch, so is not testing your changes. You can do str(df.index.names)
and compare this to the expected value, e.g. ['\'a', 'b']
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When comparing to str(df.index.names)
, you want the expected to be a single string, not a list of strings. But indeed, you need to escape the backslash: ['\\'a', 'b']
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify my previous comment, you cannot compare single entries because with a backslash (this PR) and without a backslash (main) evaluate as equal. However if you convert the entire index.names
to a string, they do not.
input_names = ["'a", "b"]
df = pd.DataFrame({name: [1, 2, 3] for name in input_names}).set_index(input_names)
print(str(df.index.names) == "['\\'a', 'b']") # main
# False
print(str(df.index.names) == "['\\'a', 'b']") # PR
# True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I think I'm with you :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks @thedataninja1786! |
np! |
pprint_thing(..., quote_strings=True)
fails for strings with embedded single-quotes #60190__str__
inFrozenList
(frompandas/core/indexes/frozen.py
) to escape single quotes and updatedpprint_thing
inpandas/io/formats/printing.py
to replace single quotes with an empty string.