Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🚧 Ignore escaped characters #1422

Closed
wants to merge 4 commits into from

Conversation

akaszynski
Copy link

This take a stab at ignoring escaped characters. It's a basic (and probably inefficient) implementation, so if there's a better way to get this done, please let me know.

Also, the escape dictionary should be modifiable by the user.

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a real code base this PR takes my run from 1.097 sec to 1.508 sec. If I change your function to:

    for key, rep in find_dict.items():
        text = text.replace(key, rep)
    return text

it only increases to 1.161 sec. So maybe regex is less than ideal here.

Can you check on PyVista and see if you observe the same?

@akaszynski
Copy link
Author

Same problem here. I've noticed that regex is almost always slower than the built-in replace.

@akaszynski
Copy link
Author

We're now using replace. As for making the substitution dictionary customizable, would you prefer to have a file containing the key/value pairs, or a string that they read in?

@larsoner
Copy link
Member

larsoner commented Feb 8, 2020

As for making the substitution dictionary customizable, would you prefer to have a file containing the key/value pairs, or a string that they read in?

I find putting characters like \n in the command line to be annoying (usually takes me multiple tries to get it right), so I would lean toward a file

@akaszynski
Copy link
Author

Agreed, I was encountering that issue as well when testing it out. I'm thinking simple key value pairs in a csv, where the pairs are separated by line breaks.

"\n", " "
"\'", "'"

@larsoner
Copy link
Member

larsoner commented Feb 8, 2020

Might make more sense to match the dictionary format

thing->replacement

@akaszynski
Copy link
Author

akaszynski commented Feb 8, 2020

As we’re going to have to use spaces, are quotes permitted in your dictionary format?

"\n"->" "
"\'"->"'"

@larsoner
Copy link
Member

I would just make it that anything to the left of -> is the substring to look for, and anything to the right is the replacement. So:

\n-> 
\'->

@akaszynski
Copy link
Author

Should be good to review now.

codespell_lib/_codespell.py Outdated Show resolved Hide resolved
@larsoner
Copy link
Member

Looks good. One last thing: we probably want to provide and use a default file, just like we do for the dictionary. Maybe it should just contain \n-> to start.

@akaszynski
Copy link
Author

Looks good. One last thing: we probably want to provide and use a default file, just like we do for the dictionary. Maybe it should just contain \n-> to start.

Perhaps \'->' as well.

@larsoner
Copy link
Member

Fine with me

@@ -232,6 +232,10 @@ def parse_options(args):
help='Comma separated list of words to be ignored '
'by codespell. Words are case sensitive based on '
'how they are written in the dictionary file')
parser.add_argument('-P', '--sub-pairs', type=str, metavar='FILE',
help='Custom substitution text file that contains '
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unclear from the help detail what this is for, is it to "fix up" the dictionary to deal with it matching escape sequences? To actually do sed type runs on my codebase or something else?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this linked to #233 ?

@larsoner
Copy link
Member

larsoner commented Apr 5, 2020

Also in the related PR #174 it was noted that the write-changes flag was problematic when dealing with escapes, if we come back to this we should make sure that write-changes works even with these substitutions

@akaszynski akaszynski closed this May 27, 2021
@bl-ue
Copy link
Contributor

bl-ue commented May 27, 2021

Why the close? I see this is quite old 🤔

@akaszynski
Copy link
Author

Why the close? I see this is quite old thinking

Just cleaning up old PRs. I'd like to work on this, but this project hasn't been updated in quite some time (last release 6 months ago).

@peternewman
Copy link
Collaborator

Hi @akaszynski ,

Despite the lack of releases, coding is still happening (although mostly in the dictionary), also see #1923:
v2.0.0...master

You didn't seem to have respond to the review comments myself and @larsoner had left if you were expecting it to have been merged.

@akaszynski
Copy link
Author

You didn't seem to have respond to the review comments myself and @larsoner had left if you were expecting it to have been merged.

True, there's still work to be done. I'll work on this.

@bl-ue bl-ue marked this pull request as draft June 13, 2021 13:44
@akaszynski akaszynski closed this Jun 16, 2022
rgetz added a commit to rgetz/libiio that referenced this pull request Jun 5, 2023
Today, codespell looks at "\tRead" which is "tab" followed by "Read" as
"tRead", and flags this as a spelling mistake. See:
codespell-project/codespell#1422

which was closed without merging.

Once that is resolved upstream - we can undo this.

Signed-off-by: Robin Getz <[email protected]>
rgetz added a commit to rgetz/libiio that referenced this pull request Jun 6, 2023
Today, codespell looks at "\tRead" which is "tab" followed by "Read" as
"tRead", and flags this as a spelling mistake. See:
codespell-project/codespell#1422

which was closed without merging.

Once that is resolved upstream - we can undo this.

Signed-off-by: Robin Getz <[email protected]>
pcercuei pushed a commit to analogdevicesinc/libiio that referenced this pull request Jun 9, 2023
Today, codespell looks at "\tRead" which is "tab" followed by "Read" as
"tRead", and flags this as a spelling mistake. See:
codespell-project/codespell#1422

which was closed without merging.

Once that is resolved upstream - we can undo this.

Signed-off-by: Robin Getz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants