use unicode_script crate to check chars are Latin script inside is_english_lingual #504

Merged: elijah-potter merged 4 commits into Automattic:master from the unicode-script branch on Jan 28, 2025


hippietrail
Contributor

I noticed that is_english_lingual() doesn't check that words are in the Latin script, so it matches text in any writing system other than Chinese, Japanese, and Korean.

Unicode character properties such as Script are finer-grained than Unicode blocks, and when new extension blocks are added the code doesn't have to change. I left the existing logic in place, but some of it may now be redundant.
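To make the difference concrete, here is a minimal, dependency-free sketch. It is not Harper's actual code: the function names are hypothetical, the "before" predicate is only a rough model of the behaviour described above, and the hand-written Latin ranges are a stand-in for the script lookup that the unicode_script crate provides. Hard-coded ranges like these are exactly what the crate spares you from maintaining.

```rust
/// Illustrative CJK ranges only (assumption, not Harper's real list).
fn is_cjk_approx(c: char) -> bool {
    matches!(
        c as u32,
        0x3040..=0x30FF       // Hiragana and Katakana
            | 0x4E00..=0x9FFF // CJK Unified Ideographs
            | 0xAC00..=0xD7AF // Hangul Syllables
    )
}

/// Stand-in for a real Script == Latin lookup. Covers only Basic Latin,
/// Latin-1 letters, and Latin Extended-A/B; a script property table would
/// cover every Latin block without edits.
fn is_latin_approx(c: char) -> bool {
    matches!(
        c as u32,
        0x0041..=0x005A | 0x0061..=0x007A | 0x00C0..=0x00D6 | 0x00D8..=0x00F6 | 0x00F8..=0x024F
    )
}

/// Rough model of the behaviour before the change: any letter that is not CJK.
fn is_lingual_without_script_check(c: char) -> bool {
    c.is_alphabetic() && !is_cjk_approx(c)
}

/// Rough model of the behaviour after the change: the letter must be Latin.
fn is_lingual_with_script_check(c: char) -> bool {
    c.is_alphabetic() && is_latin_approx(c)
}

fn main() {
    // Latin, Latin with diacritic, Greek, Georgian, Thai, CJK.
    for c in ['a', 'é', 'α', 'ქ', 'ท', '語'] {
        println!(
            "{}: without script check = {}, with script check = {}",
            c,
            is_lingual_without_script_check(c),
            is_lingual_with_script_check(c)
        );
    }
}
```

In this sketch the Greek, Georgian, and Thai letters pass the first predicate but fail the second, which is the gap this change is meant to close.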

Note that Typst, which I don't know much about but which seems to have some link to Harper, also uses the unicode_script crate.

@hippietrail
Contributor Author

The build errors are something to do with "manifest keys"??

@elijah-potter
Collaborator

> The build errors are something to do with "manifest keys"??

When you add a dependency, Cargo needs to add an entry to the Cargo.lock lockfile. Make sure that file is staged in git before you commit changes to dependencies.

Also, would you mind making sure your commits match the practices laid out in our documentation, particularly the use of Conventional Commits?

@elijah-potter
Collaborator

Hey @hippietrail, I feel pretty confident merging this. Could you give me a couple examples that replicate the issue so I can add them as test cases?

@hippietrail
Contributor Author

> Hey @hippietrail, I feel pretty confident merging this. Could you give me a couple examples that replicate the issue so I can add them as test cases?

These say "This is in Greek/Georgian/Thai" in those languages:

Αυτό είναι στα ελληνικά.
ეს ქართულად.
นี่มันภาษาไทย

This is English with misstakes.

Without the change, it looks like this:
[image]

With the change, it looks like this:
[image]

@elijah-potter elijah-potter merged commit 228ea7f into Automattic:master Jan 28, 2025
17 checks passed
@hippietrail hippietrail deleted the unicode-script branch January 28, 2025 17:11
@hippietrail
Contributor Author

I should note two things:

  • This does not handle the rare case of words that contain letters from multiple scripts. I believe those will always get split at script-change boundaries. My guess is that in English this will come up from time to time with Greek letters.
  • This means all extended Latin letters, such as those used in Polish and Icelandic, obsolete letters, etc., can be included in English words. But the spellchecker will flag them, so that's not a problem.
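As an illustration of the first point, the following sketch splits a word into runs wherever the script class changes. It is a hypothetical model, not Harper's lexer: `ScriptClass`, `classify`, and `split_at_script_changes` are names invented here, and the two-way Latin/Other classification with hand-written ranges is a simplifying assumption in place of a real Unicode script lookup.

```rust
/// Simplified two-way classification (assumption): Latin versus everything else.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ScriptClass {
    Latin,
    Other,
}

/// Hand-written Latin ranges standing in for a real script lookup.
fn classify(c: char) -> ScriptClass {
    let is_latin = matches!(
        c as u32,
        0x0041..=0x005A | 0x0061..=0x007A | 0x00C0..=0x00D6 | 0x00D8..=0x00F6 | 0x00F8..=0x024F
    );
    if is_latin {
        ScriptClass::Latin
    } else {
        ScriptClass::Other
    }
}

/// Splits `word` into consecutive runs of characters sharing a script class.
/// Because all non-Latin scripts share one class here, adjacent Greek and
/// Cyrillic letters would not be separated from each other.
fn split_at_script_changes(word: &str) -> Vec<String> {
    let mut runs: Vec<String> = Vec::new();
    let mut previous: Option<ScriptClass> = None;
    for c in word.chars() {
        let current = classify(c);
        if previous == Some(current) {
            // Same class as the previous character: extend the current run.
            if let Some(last) = runs.last_mut() {
                last.push(c);
            }
        } else {
            // Class changed (or first character): start a new run.
            runs.push(c.to_string());
        }
        previous = Some(current);
    }
    runs
}

fn main() {
    // A Greek letter followed by Latin letters is split into two runs.
    println!("{:?}", split_at_script_changes("βcarotene"));
    // An extended Latin letter stays inside the same run as its neighbours.
    println!("{:?}", split_at_script_changes("na\u{00EF}ve"));
}
```

Under these assumptions a Greek-prefixed word is split at the script boundary, while a word containing an extended Latin letter stays whole and is left for the spellchecker to judge, matching the second point.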

tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Jan 29, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [Automattic/harper/harper-ls](https://github.com/Automattic/harper) | patch | `v0.18.0` -> `v0.18.1` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>Automattic/harper (Automattic/harper/harper-ls)</summary>

### [`v0.18.1`](https://github.com/Automattic/harper/releases/tag/v0.18.1)

[Compare Source](Automattic/harper@v0.18.0...v0.18.1)

#### What's Changed

-   build(deps): bump [@sveltepress/theme-default](https://github.com/sveltepress/theme-default) from 5.0.5 to 5.0.7 in /packages by [@dependabot](https://github.com/dependabot) in Automattic/harper#519
-   build(deps-dev): bump eslint-config-prettier from 9.1.0 to 10.0.1 in /packages by [@dependabot](https://github.com/dependabot) in Automattic/harper#518
-   build(deps-dev): bump vite-plugin-top-level-await from 1.4.1 to 1.4.4 in /packages by [@dependabot](https://github.com/dependabot) in Automattic/harper#516
-   build(deps-dev): bump esbuild from 0.20.2 to 0.24.2 in /packages by [@dependabot](https://github.com/dependabot) in Automattic/harper#517
-   build(deps-dev): bump flowbite from 1.8.1 to 3.0.0 in /packages by [@dependabot](https://github.com/dependabot) in Automattic/harper#515
-   technical terms and popular names/websites by [@MohamedAbdeen21](https://github.com/MohamedAbdeen21) in Automattic/harper#522
-   fix(core): `AnA` linter did not recognize capital articles by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#521
-   use unicode_script crate to check chars are Latin script inside `is_english_lingual` by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#504
-   sort and add to list of holidays by [@hippietrail](https://github.com/hippietrail) in Automattic/harper#509
-   fix(core): `RepeatedWords` now detects repeated `and` tokens by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#520
-   `harper-core` Documentation Updates by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#513
-   feat: improve workflow for harper.js by [@Asuka109](https://github.com/Asuka109) in Automattic/harper#526
-   Add more languages by [@elijah-potter](https://github.com/elijah-potter) in Automattic/harper#495

#### New Contributors

-   [@MohamedAbdeen21](https://github.com/MohamedAbdeen21) made their first contribution in Automattic/harper#522
-   [@Asuka109](https://github.com/Asuka109) made their first contribution in Automattic/harper#526

**Full Changelog**: Automattic/harper@v0.18.0...v0.18.1

</details>
