[BUG] capitalize does not work correctly for the character ʼn #8644

Closed
firestarman opened this issue Jul 2, 2021 · 0 comments · Fixed by #8647
Labels: bug (Something isn't working)

firestarman (Contributor) commented Jul 2, 2021

Describe the bug
The upper case of the character ʼn (U+0149) should be ʼN, but `capitalize` returns only ʼ; the N is dropped.

>>> import cudf as cu
>>> s = cu.Series(['\u0149s2', '\u02bc\u004eS2'])
>>> s
0     ʼns2
1    ʼNS2
dtype: object
>>> s.str.capitalize()
0     ʼs2
1    ʼns2
dtype: object

Expected behavior
pandas `str.capitalize` works as expected:

>>> import pandas as pd
>>> s = pd.Series(['\u0149s2', '\u02bc\u004eS2'])
>>> s.str.capitalize()
0    ʼNs2
1    ʼns2
dtype: object
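To see where the N goes, the case mappings can be inspected with Python's standard library, since Python's `str` methods apply the full Unicode case mappings. The helper `capitalize_first_code_point_only` is hypothetical: it only models the assumption that the conversion keeps one code point per input character, chosen because it reproduces both cudf outputs above. It is not cudf's implementation.

```python
import unicodedata

# U+0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE has no single-code-point
# upper/title case form; the full mapping expands it into two code points:
# U+02BC MODIFIER LETTER APOSTROPHE followed by U+004E 'N'.
ch = "\u0149"
for mapped in (ch.upper(), ch.title()):
    print(len(mapped), [unicodedata.name(c) for c in mapped])


def capitalize_first_code_point_only(s):
    """Hypothetical model of the reported bug (not cudf code): keep only the
    first code point of the first character's title-case expansion."""
    if not s:
        return s
    return s[0].title()[0] + s[1:].lower()


for text in ["\u0149s2", "\u02bc\u004eS2"]:
    # Left: truncated result (same as the cudf output above).
    # Right: Python's str.capitalize (same as the pandas output above).
    print(repr(capitalize_first_code_point_only(text)), repr(text.capitalize()))
```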

A similar bug, #3132, has been fixed, but that fix seems to cover only `upper` and `lower`.

firestarman added the bug (Something isn't working) and Needs Triage (Need team to review and classify) labels Jul 2, 2021
davidwendt self-assigned this Jul 2, 2021
rapids-bot closed this as completed in #8647 Jul 6, 2021
rapids-bot pushed a commit that referenced this issue Jul 6, 2021
Closes #8644 

The multi-character case-conversion support that was added for the strings `to_upper` and `to_lower` functions is now reused by the `capitalize` and `title` functions. For example, the upper-case equivalent of the single character `ʼn` is actually two distinct characters, `ʼN` (apostrophe and capital N). This differs from converting a single multi-byte character into another single multi-byte character with a different byte length: here one character is converted into two characters.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)
  - Mark Harris (https://github.com/harrism)

URL: #8647
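The two cases the commit message distinguishes can be checked with a small Python sketch (standard library only). `utf8_len` and `capitalize_full` are hypothetical helpers that illustrate the expected semantics on the CPU; they are not the libcudf code changed in #8647. `capitalize_full` is also simplified: it ignores context-sensitive rules such as the Greek final sigma.

```python
def utf8_len(s):
    """Number of bytes `s` occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def capitalize_full(s):
    """Simplified capitalize using full case mappings: title-case the first
    character (which may expand into several code points) and lower-case
    the rest. Context-sensitive rules (e.g. Greek final sigma) are ignored."""
    if not s:
        return s
    return s[0].title() + s[1:].lower()


# One character -> one character with a different byte length:
#   U+017F LATIN SMALL LETTER LONG S (2 bytes) upper-cases to 'S' (1 byte).
# One character -> two characters:
#   U+0149 (2 bytes) upper-cases to U+02BC + 'N' (2 + 1 = 3 bytes).
for ch in ("\u017f", "\u0149"):
    up = ch.upper()
    print(f"{ch!r} -> {up!r}: chars {len(ch)} -> {len(up)}, "
          f"bytes {utf8_len(ch)} -> {utf8_len(up)}")

# Any output sizing (e.g. computing each result row's byte size before
# writing it) has to account for the expansion.
for text in ["\u0149s2", "\u02bc\u004eS2"]:
    out = capitalize_full(text)
    print(f"{text!r} -> {out!r}: bytes {utf8_len(text)} -> {utf8_len(out)}")
```

With the expansion handled, the first row grows from 4 to 5 bytes, while the second row keeps its length.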
bdice removed the Needs Triage (Need team to review and classify) label Mar 4, 2024