Allow projects and contexts to contain non-Latin characters #31

Merged
merged 2 commits into from
May 31, 2021

Conversation

dehero
Contributor

@dehero dehero commented May 26, 2021

Hello.

The \b token at the end of the search regex prevented projects and contexts with Cyrillic or other non-Latin names from being found and highlighted. Removing it solves the problem, although the regex can now consume more characters than originally intended. I think the trade-off is worth it.
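For readers unfamiliar with the cause: in JavaScript regular expressions, \b is defined in terms of \w, which covers only [A-Za-z0-9_], so there is never a word boundary after a Cyrillic letter. The sketch below illustrates this with hypothetical patterns; they are assumptions made for the demonstration and are not copied from todotxt-mode.

```typescript
// Hypothetical project patterns (assumed for illustration, not the extension's own).
const withBoundary = /(?:^|\s)\+\S+\b/;   // requires a word boundary at the end
const withoutBoundary = /(?:^|\s)\+\S+/;  // trailing \b removed

// Returns the first matched token without the leading whitespace, or null.
function firstMatch(pattern: RegExp, line: string): string | null {
  const m = pattern.exec(line);
  return m ? m[0].trim() : null;
}

const latin = "Call mom +family";
const cyrillic = "Позвонить маме +семья";

console.log(firstMatch(withBoundary, latin));       // "+family"
console.log(firstMatch(withBoundary, cyrillic));    // null: no \b after "я"
console.log(firstMatch(withoutBoundary, cyrillic)); // "+семья"
```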

Before:
image

After:
image

@davraamides
Owner

Thanks, @dehero. Forgive my ignorance with Cyrillic languages! Before I accept your pull request, can you test tags too (e.g. tag:value)? Those regex patterns also begin and end with the \b token, and I'd like to fix them at the same time if needed. As I'm looking at my code, I can't remember why I think I needed the word boundary in the pattern but there may be some edge cases I need to consider.
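A tag pattern wrapped in \b on both sides would be affected in the same way, because a line with no ASCII word characters contains no word boundary at all. A minimal sketch, assuming a simplified tag pattern (the names and the pattern itself are hypothetical, not the extension's actual code):

```typescript
// Hypothetical tag patterns (assumed for illustration).
const tagWithBoundaries = /\b\S+:\S+\b/; // \b at both ends
const tagWithoutBoundaries = /\S+:\S+/;  // both \b tokens removed

// Returns the first matched tag, or null when nothing matches.
function firstTag(pattern: RegExp, line: string): string | null {
  const m = pattern.exec(line);
  return m ? m[0] : null;
}

const latinLine = "Pay rent due:2021-06-01";
const cyrillicLine = "Оплатить аренду срок:завтра";

console.log(firstTag(tagWithBoundaries, latinLine));       // "due:2021-06-01"
console.log(firstTag(tagWithBoundaries, cyrillicLine));    // null
console.log(firstTag(tagWithoutBoundaries, cyrillicLine)); // "срок:завтра"
```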

@dehero
Contributor Author

dehero commented May 27, 2021

I have now fixed tags too.

Before:
image

After:
image

> Forgive my ignorance with Cyrillic languages!

It would be fairer to say that Cyrillic languages were initially ignored by the developers of regular expressions.

> I'm looking at my code, I can't remember why I think I needed the word boundary in the pattern but there may be some edge cases I need to consider.

Regarding the boundary tokens: removing them allows not only non-Latin letters but also any other non-whitespace characters, so these become valid:

+pro,ject, // project
@c*(n)tex: // context
\:$        // tag
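The difference is easiest to see on trailing punctuation. In the sketch below (again with hypothetical patterns, assumed for illustration), the \b version backtracks off a trailing comma while the loosened version keeps it. That may be one reason the boundary was there originally, though this is only a guess.

```typescript
// Hypothetical project patterns (assumed for illustration).
const strictProject = /(?:^|\s)\+\S+\b/; // trailing \b trims trailing punctuation
const looseProject = /(?:^|\s)\+\S+/;    // takes every non-whitespace character

// Returns the first matched token without the leading whitespace, or null.
function matchProject(pattern: RegExp, line: string): string | null {
  const m = pattern.exec(line);
  return m ? m[0].trim() : null;
}

console.log(matchProject(strictProject, "Email boss +work, then lunch")); // "+work"
console.log(matchProject(looseProject, "Email boss +work, then lunch"));  // "+work,"
console.log(matchProject(strictProject, "+pro,ject,"));                   // "+pro,ject"
console.log(matchProject(looseProject, "+pro,ject,"));                    // "+pro,ject,"
```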

Although the todo.txt format has a specification of sorts, I cannot find any details in it on which symbols are allowed or disallowed. Each editor or highlighter acts on its own.

I generally think that todotxt-mode token parsing needs more refinement for non-letter symbols. But for now we are just fixing a more significant issue: non-Latin letters should obviously be allowed.
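One possible direction for such a refinement, shown only as a sketch and not as what this pull request does, is to list the allowed characters explicitly with Unicode property escapes. The allowed set below (letters and digits in any script, plus "_" and "-") is an assumption, since the todo.txt format does not define one.

```typescript
// Assumption: a project name is made of letters, digits, "_" or "-" in any script.
// Built with the RegExp constructor; \p{L} and \p{N} require the "u" flag.
const unicodeProject = new RegExp("(?:^|\\s)\\+[\\p{L}\\p{N}_-]+", "u");

// Returns the first matched project without the leading whitespace, or null.
function findProject(line: string): string | null {
  const m = unicodeProject.exec(line);
  return m ? m[0].trim() : null;
}

console.log(findProject("Купить хлеб +покупки, потом домой")); // "+покупки"
console.log(findProject("Email boss +work, then lunch"));       // "+work"
console.log(findProject("+pro,ject,"));                         // "+pro"
```

Note that this version stops at the first character outside the allowed set, so "+pro,ject," yields "+pro" rather than being rejected outright. Whether that is the desired behavior is a separate design decision.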

@davraamides davraamides merged commit e6fea87 into davraamides:master May 31, 2021
davraamides added a commit that referenced this pull request May 31, 2021