Refactor is_wc_cat_id_start math symbol list #50222

Closed
wants to merge 1 commit into from

@savq savq commented Jun 20, 2023

  • Check ∂, ∃, ∄, ∅, ∆, ∇ and ∎, ∏, ∐, ∑ in two ranges, instead of comparing wc to each one individually.
  • Check ∟ with the rest of the angle symbols.
  • Check ∞ at the beginning with the other symbols that are not in any range.
  • Check integral symbols in a separate range.
  • Move angle symbols inside the big check (they're in the range 0x2140:0x2a1c)
  • Move some symbols to match lexicographical order.

This commit does NOT add any new characters.
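
To illustrate the first item, here is a minimal sketch of the two styles of check for just the ten symbols named above. The function names are hypothetical and this is not the code from the patch. It only shows that two range comparisons accept the same code points as ten individual comparisons.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: one comparison per symbol. */
bool is_math_sym_individual(uint32_t wc)
{
    return wc == 0x2202 || wc == 0x2203 || wc == 0x2204 || /* ∂ ∃ ∄ */
           wc == 0x2205 || wc == 0x2206 || wc == 0x2207 || /* ∅ ∆ ∇ */
           wc == 0x220e || wc == 0x220f ||                 /* ∎ ∏ */
           wc == 0x2210 || wc == 0x2211;                   /* ∐ ∑ */
}

/* Hypothetical helper: the same ten symbols as two contiguous ranges. */
bool is_math_sym_ranges(uint32_t wc)
{
    return (wc >= 0x2202 && wc <= 0x2207) || /* ∂ ∃ ∄ ∅ ∆ ∇ */
           (wc >= 0x220e && wc <= 0x2211);   /* ∎ ∏ ∐ ∑ */
}

/* Count the code points in [lo, hi] on which the two helpers disagree. */
unsigned count_mismatches(uint32_t lo, uint32_t hi)
{
    unsigned n = 0;
    for (uint32_t wc = lo; wc <= hi; wc++) {
        if (is_math_sym_individual(wc) != is_math_sym_ranges(wc))
            n++;
    }
    return n;
}
```

Comparing the two helpers over the whole 0x2140:0x2a1c span is a cheap way to confirm that a refactor of this kind does not change which characters are accepted.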


Motivation:

  • When new characters are added, they are usually checked individually instead of being folded into character ranges (for example, "parse unicode forall and exists as identifiers" #42314).
  • Keeping the logic simple here helps 3rd party parsers. A discussion in tree-sitter-julia is what prompted this PR. Lezer-julia also maintains a JavaScript version of these functions.
  • It makes it easier to see "gaps" in the valid characters. For example, ∁ (U+2201 COMPLEMENT) or the integrals in the range 0x2a17:0x2a1a could probably be added as characters (I'm not advocating that more characters get added, I'm just pointing it out).

- Check `∂, ∃, ∄, ∅, ∆, ∇` and `∎, ∏, ∐, ∑` in two ranges,
  instead of comparing `wc` to each one individually.
- Check ∟ with the rest of the angle symbols.
- Check ∞ at the beginning with the other symbols that are not in any range.
- Check integral symbols in a separate range.
- Move angle symbols inside the big check (they're in the range 0x2140 : 0x2a1c)
- Move some symbols to match lexicographical order.

This commit does NOT add any new characters.
@savq savq closed this Sep 23, 2023