Improve capitalization and other prose formatters #1216

Merged
merged 12 commits into from
Jul 22, 2023

nriley
Collaborator

@nriley nriley commented Jun 17, 2023

  • Use Python's .title() to better handle hyphenated words.
  • Make CAPITALIZE_ALL_WORDS (should really be named something like TITLE_CASE_ALL_WORDS) a prose formatter and extend it to handle punctuation.
  • Don't change case of words that already include capital letters.
  • Add tests.

@nriley nriley marked this pull request as ready for review June 17, 2023 17:57
@brief
Contributor

brief commented Jun 18, 2023

I like this!

The challenge with title() is possessives and contractions:

they're bill's friends from the UK

Becomes:

They'Re Bill'S Friends From The UK
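The behavior above can be reproduced with a minimal sketch. The helper names `title_each_word` and `capitalize_each_word` are hypothetical and are not taken from the PR; the sketch only assumes the PR's stated rule that words already containing capital letters are left alone.

```python
def has_capital(word: str) -> bool:
    """Return True if the word already contains an uppercase letter."""
    return any(ch.isupper() for ch in word)


def title_each_word(text: str) -> str:
    """Apply str.title() per word; it treats the apostrophe as a word boundary."""
    return " ".join(w if has_capital(w) else w.title() for w in text.split(" "))


def capitalize_each_word(text: str) -> str:
    """Apply str.capitalize() per word; only the first character is uppercased."""
    return " ".join(w if has_capital(w) else w.capitalize() for w in text.split(" "))


sentence = "they're bill's friends from the UK"
print(title_each_word(sentence))       # They'Re Bill'S Friends From The UK
print(capitalize_each_word(sentence))  # They're Bill's Friends From The UK
```

`str.capitalize()` avoids the problem because it only uppercases the first character of the string and lowercases the rest.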

Looks like this also removes title from formatters_words, which means formatted title <text> no longer works in dictation mode. Is there another way to call it in that mode as a prose formatter? This also means we can't stack formatters (e.g. title string <text>).

@brief
Contributor

brief commented Jun 18, 2023

A few other things I've been mulling over for my own repo:

  • In most major style guides, the last word of a title should be capitalized, even if it is typically lowercased. So There is Nothing to Be Afraid of would be There is Nothing to Be Afraid Of. Worth doing here? Or is it more likely to confuse?
  • The community small words list includes is and up, which really shouldn't be on the list. It's missing en, if, per, and via, which typically are. Plus v and vs from a legal context (though I expect most people would only use those if they're overriding versus in words_to_replace.csv).
  • I like what this PR does with hyphenated words. Can we be even smarter? man-in-the-middle should be capitalized as Man-in-the-Middle but stand-by as Stand-By.
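A rough sketch of how these three suggestions might fit together follows. Everything in it is an assumption for illustration: the `SMALL_WORDS` set is a hypothetical list (not the community one), and the rule "always capitalize the first and last component, lowercase small words in between" is just one possible way to get both `Man-in-the-Middle` and `Stand-By`.

```python
# Hypothetical small-words list for illustration; not the community list.
SMALL_WORDS = {
    "a", "an", "and", "as", "at", "but", "by", "en", "for", "if", "in",
    "nor", "of", "on", "or", "per", "the", "to", "v", "via", "vs",
}


def _cap_word(word: str) -> str:
    """Capitalize a word unless it already contains a capital letter."""
    if any(ch.isupper() for ch in word):
        return word
    return word.capitalize()


def _format_parts(parts, format_inner):
    """Leave small words alone unless they are the first or last part."""
    last = len(parts) - 1
    return [
        part if (0 < i < last and part in SMALL_WORDS) else format_inner(part)
        for i, part in enumerate(parts)
    ]


def _cap_compound(word: str) -> str:
    """Apply the same first/last rule to the components of a hyphenated word."""
    return "-".join(_format_parts(word.split("-"), _cap_word))


def title_case(text: str) -> str:
    """Title-case a phrase, capitalizing the first and last word regardless."""
    return " ".join(_format_parts(text.split(), _cap_compound))


print(title_case("there is nothing to be afraid of"))  # There Is Nothing to Be Afraid Of
print(title_case("man-in-the-middle"))                 # Man-in-the-Middle
print(title_case("stand-by"))                          # Stand-By
```

Note that `is` is capitalized here because it is deliberately left out of the hypothetical small-words list.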

nriley added 4 commits June 18, 2023 11:41
Use .capitalize() rather than .title() for capitalization.

Split hyphenated words and apply the same logic to their components.
…as in each word in the title case formatter.
@nriley
Collaborator Author

nriley commented Jun 18, 2023

@brief Thanks for the feedback.

I think I have fixed everything except for the prose formatter not being applicable in dictation mode. I also noticed the same problem when applying a formatter from Cursorless or using "format help": both completely ignore the prose formatters.

Can anybody think of a reason why prose formatters should not be available everywhere (code) formatters are?

@nriley nriley marked this pull request as draft June 18, 2023 17:37
nriley added a commit to nriley/talon_community that referenced this pull request Jun 18, 2023
@brief
Contributor

brief commented Jun 18, 2023

@nriley up is still included in the list of small words. Looks great otherwise.

Assuming prose formatters should be available everywhere, which I agree with, capitalize() introduces an edge case with quoted strings (or any word that's prefixed by a non-alpha character).

'hello world' and "hello world"

Become:

'hello World' and "hello World"

Instead of:

'Hello World' and "Hello World"

Only other character that might matter in prose is a left parenthesis (e.g. (Hello World))?
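The edge case, and one possible way around it, can be sketched as follows. The helper name `capitalize_first_letter` is hypothetical and this is not necessarily how the PR handles it; it simply skips leading non-alphabetic characters before capitalizing.

```python
def capitalize_first_letter(word: str) -> str:
    """Capitalize the first alphabetic character, skipping leading punctuation."""
    for i, ch in enumerate(word):
        if ch.isalpha():
            return word[:i] + word[i:].capitalize()
    return word  # no letters at all, e.g. "--"


def capitalize_words(text: str) -> str:
    return " ".join(capitalize_first_letter(w) for w in text.split(" "))


# str.capitalize() only touches the first character, which here is the quote.
print("'hello".capitalize())              # 'hello
print(capitalize_words("'hello world'"))  # 'Hello World'
print(capitalize_words('"hello world"'))  # "Hello World"
print(capitalize_words("(hello world)"))  # (Hello World)
```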

@nriley
Collaborator Author

nriley commented Jun 18, 2023

Updated; it should handle punctuation at the beginning of words.

There are a bunch of nuances with prose formatters I still need to sort out. For example, should you be able to combine code and prose formatters (e.g., "kebab title")?

@brief
Contributor

brief commented Jun 19, 2023

Thanks. This looks good to me.

Re: combining code and prose formatters, I'm not entirely sure how they differ. Why are there two different types?

@nriley
Collaborator Author

nriley commented Jun 19, 2023

Thanks! There are three main differences between prose and code formatters:

  1. When dictating with a prose formatter, punctuation and symbols work just as they do in dictation mode, including automatic spacing and capitalization. For example, if I am in command mode and say:

    • "phrase this is comma a test" → "this is comma a test" [no formatter with user.text]
    • "all down this is comma a test" → "this is,a test" [code formatter]
    • "kebab this is comma a test" → "this-is,a-test" [code formatter]
    • "say this is comma a test" → "this is, a test" [prose formatter]
  2. You can compose code formatters (order is significant).

    • "kebab hammer this is a test" → "This-Is-A-Test" [2 code formatters]
    • "hammer kebab this is a test" → "This-is-a-test" [2 code formatters]
  3. You can use code formatters in other places (e.g., reformatting with <user.formatters> that)
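
Why order matters in point 2 can be illustrated with a simplified model. This is a sketch and not the repository's actual implementation: the `FORMATTERS` table, the `(index, word)` signature, and the choice to apply the spoken formatters to each word in reverse order are all assumptions made so that the two examples above come out as described.

```python
from typing import Callable, Dict, List

# Hypothetical per-word formatter: (index of word, word) -> formatted word.
Formatter = Callable[[int, str], str]


def kebab(i: int, word: str) -> str:
    """Prefix every word except the first with a hyphen."""
    return word if i == 0 else "-" + word


def hammer(i: int, word: str) -> str:
    """Capitalize each word."""
    return word.capitalize()


FORMATTERS: Dict[str, Formatter] = {"kebab": kebab, "hammer": hammer}


def format_words(words: List[str], names: List[str]) -> str:
    """Apply the named formatters to each word, last-spoken formatter first."""
    out = []
    for i, word in enumerate(words):
        for name in reversed(names):
            word = FORMATTERS[name](i, word)
        out.append(word)
    return "".join(out)  # both formatters here join words without spaces


words = "this is a test".split()
print(format_words(words, ["kebab", "hammer"]))  # This-Is-A-Test
print(format_words(words, ["hammer", "kebab"]))  # This-is-a-test
```

In the second call, "hammer" sees `-is` rather than `is`, and `str.capitalize()` leaves the word unchanged because its first character is the hyphen.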

Most of the changes I made to "title" will be of limited use if it remains a code formatter, unless you're using it for reformatting of text you might have inserted with a prose formatter.

My main concern would be if people are composing "title" with other code formatters, e.g. using "kebab title this is a test", which in stock knausj returns "This-is-a-Test", but won't work with this PR as it is currently.

If that is not a major concern, what I think will work is replacing the definition of user.format_text so it only contains code formatters then expanding user.formatters to include prose formatters.

Another intermediate option might be allowing composition of code with prose formatters but only if you start with a code formatter.

@brief
Contributor

brief commented Jun 19, 2023

Wow, thank you for the detailed explanation!

Curious to know what the feedback will be on composing "title" with other code formatters. No concerns here, but I'm the wrong person to ask. I only ever use it in the context of prose and prefer "hammer" when composing since I typically want to cap the start of every word.

nriley added 3 commits June 24, 2023 17:34
- You can now reformat with prose formatters.
- Reformatting with (only) a prose formatter doesn't unformat first.
- Prose formatters appear in "format help", marked as such.
@nriley
Collaborator Author

nriley commented Jun 24, 2023

OK, I think I've got a reasonable solution here. You can compose prose and code formatters, but only when reformatting. Feedback appreciated!

@nriley nriley marked this pull request as ready for review June 24, 2023 21:41
@nriley nriley changed the title from "Improve capitalization formatters" to "Improve capitalization and other prose formatters" Jun 24, 2023
@brief
Contributor

brief commented Jun 25, 2023

Looks good on my end.

@nriley nriley requested a review from AndreasArvidsson July 22, 2023 15:27
@nriley nriley merged commit bb91bef into talonhub:main Jul 22, 2023
@nriley nriley deleted the capitalize-formatters branch July 22, 2023 21:05
blyons333 pushed a commit to blyons333/knausj_talon that referenced this pull request Aug 11, 2023
- Use Python's `.title()` to better handle hyphenated words.
- Make `CAPITALIZE_ALL_WORDS` (should really be named something like
`TITLE_CASE_ALL_WORDS`) a prose formatter and extend it to handle
punctuation.
- Don't change case of words that already include capital letters.
- Add tests.
3 participants