Add support for digraphs #2852

Open
wants to merge 6 commits into
base: master

Conversation

Lindenk

@Lindenk Lindenk commented Jun 21, 2022

Closes #1438

The backend is implemented as a trie and finds suggestions using a simple breadth-first search starting at the node given by the user's input. I chose Ctrl-K as the default keybind: vim uses Ctrl-k, but in helix that is currently bound to kill_to_line_end. This implementation sets up initial, usable support for digraphs and can later be improved with:

  • Better suggestions (currently Prompt doesn't allow for extra text in suggestions, so that would need to exist first)
  • Fuzzy find on both input sequences and descriptions
  • A sane, agreed-upon default set of digraphs (currently only user-configured symbols are supported)
  • Configurable auto-input once a digraph is matched and there are no more options the input could represent
  • replace, append, and insert variants for binding to other keys in other modes as needed
output.mp4
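
The trie lookup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the patch: `DigraphTrie`, `DigraphNode`, `insert`, and `suggestions` are hypothetical names chosen for the example.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical trie node; the names are illustrative, not taken from the patch.
#[derive(Default)]
struct DigraphNode {
    children: BTreeMap<char, DigraphNode>,
    /// Symbols to insert if an input sequence ends at this node.
    output: Option<String>,
}

#[derive(Default)]
struct DigraphTrie {
    root: DigraphNode,
}

impl DigraphTrie {
    /// Register `sequence` (what the user types) as producing `symbols`.
    fn insert(&mut self, sequence: &str, symbols: &str) {
        let mut node = &mut self.root;
        for ch in sequence.chars() {
            node = node.children.entry(ch).or_default();
        }
        node.output = Some(symbols.to_string());
    }

    /// Walk down to the node for `prefix`, then breadth-first search its
    /// subtree so that shorter sequences are suggested first.
    fn suggestions(&self, prefix: &str) -> Vec<(String, String)> {
        let mut node = &self.root;
        for ch in prefix.chars() {
            match node.children.get(&ch) {
                Some(child) => node = child,
                None => return Vec::new(),
            }
        }
        let mut found = Vec::new();
        let mut queue = VecDeque::new();
        queue.push_back((prefix.to_string(), node));
        while let Some((sequence, current)) = queue.pop_front() {
            if let Some(symbols) = &current.output {
                found.push((sequence.clone(), symbols.clone()));
            }
            for (ch, child) in &current.children {
                let mut next = sequence.clone();
                next.push(*ch);
                queue.push_back((next, child));
            }
        }
        found
    }
}
```

With `ae`, `alpha`, and `beta` registered, asking for suggestions on the prefix `a` yields `ae` before `alpha`, because the search visits the subtree level by level.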

@tmke8

tmke8 commented Jun 27, 2022

Is digraph the right name for this feature? I guess the name comes from using, for example, ctrl-K and "ae" to get the symbol æ, which is sort of a digraph. But if you use this to write Hiragana, I'd say it's more of a general character input method.

@Lindenk
Author

Lindenk commented Jun 27, 2022

I just copied nvim on this one, and they also use it for Hiragana. There are also plugins like better digraphs which turn it into a general character lookup.

I would assume most vim users would look for "digraph" as the name for this feature even though it's not a very accurate description.

If it should be changed, what name should we go with?

@pickfire
Contributor

pickfire commented Jun 27, 2022

What is your use case? If it's just Japanese characters, a better way is to use Japanese or use an IME to input those kanji.

Also, for something like shrug, abbr in vim is probably a better choice compared to this.

@Lindenk
Author

Lindenk commented Jun 27, 2022

The goal is to provide a customizable input tool similar to vim's digraphs, not specifically for Japanese characters. The video is an example based on digraphs available in vim, not an exhaustive list (I didn't feel a multi-hour video going through every option was necessary). Common use cases are usually mathematics or computer science related symbols such as TH -> þ or the custom example given by the author of #1438: *| -> λ.

I would argue abbr is not the same feature as this. Automatic implicit text replacement can be unwanted in many situations, while this character input command is explicit. For example, I wouldn't want ¯\_(ツ)_/¯ to appear every time I type shrug in a sentence.

The reason I structured it as a general input tool and not exactly like vim's digraph is for flexibility and customizability. It can be used to implement exactly the same behavior as vim's digraph (with a few of the features I bullet pointed above) while also allowing improvements such as fuzzy find, symbol names, and replacement of 1 or more characters.

@EpocSquadron
Contributor

We get occasional requests for this for one of the programming languages that uses mathematical symbols instead of ASCII approximations.

I wouldn't be opposed, but perhaps this is a subset of snippet support (#395)? Could we merge this now and grow it to support snippets later when we get marks, or else would it be better to wait for an all at once implementation and ensure this is in there as a special case?

@pickfire
Contributor

pickfire commented Jul 1, 2022

But it seemed weird given that it can accept more than 2 characters, digraph is supposed to accept two characters.

@Lindenk
Author

Lindenk commented Jul 1, 2022

I can rename it to something else, maybe just unicode input? I figured most people looking for this feature would search for digraph, though.

@EpocSquadron
Contributor

> But it seemed weird given that it can accept more than 2 characters, digraph is supposed to accept two characters.

That's why I say it seems like a good base for snippets, especially if it grows the ability to add custom shortcut triggers. Add on top of that marks and we have snippets.

@kirawi kirawi added A-helix-term Area: Helix term improvements S-waiting-on-review Status: Awaiting review from a maintainer. labels Sep 13, 2022
@kirawi kirawi self-requested a review January 17, 2023 15:23
@velllu

velllu commented Jun 7, 2023

Any interest in finishing this? It's an essential feature for non-English speakers.

omentic added a commit to omentic/helix that referenced this pull request Jul 16, 2023
@omentic
Contributor

omentic commented Jul 16, 2023

I really like this patch. It's easy and fast to use, and also adds support for the handful of Unicode-heavy programming languages - particularly proof assistants.

I might suggest supporting generic Unicode input via hex literals as a fallback and reading from a different file than config.toml. It might also be good to bundle default digraphs with Helix: I've been going through and making a personal file, and it's turned out to look just like every other list of commonly used Unicode characters out there.
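
A hex-literal fallback of the kind suggested here could look roughly like the sketch below. It is only an illustration: `char_from_hex` and `resolve` are hypothetical helpers, and the `U+` prefix handling is an assumption, not something the patch defines. Only standard library calls are used.

```rust
use std::collections::HashMap;

/// Parse a hexadecimal code point such as "3bb" or "U+03BB" into a character.
/// Returns None for non-hex input and for values that are not valid scalar
/// values (surrogates, or anything above U+10FFFF).
fn char_from_hex(input: &str) -> Option<char> {
    let digits = input
        .strip_prefix("U+")
        .or_else(|| input.strip_prefix("u+"))
        .unwrap_or(input);
    let code_point = u32::from_str_radix(digits, 16).ok()?;
    char::from_u32(code_point)
}

/// Look up a configured digraph first and fall back to a hex literal.
/// (Hypothetical helper: the configured table is modelled as a plain map.)
fn resolve(digraphs: &HashMap<String, String>, input: &str) -> Option<String> {
    digraphs
        .get(input)
        .cloned()
        .or_else(|| char_from_hex(input).map(String::from))
}
```

The order of the two lookups matters: a sequence such as `ae` is also a valid hex literal (U+00AE), so the configured table has to be consulted first if user-defined digraphs are meant to win.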

@omentic
Contributor

omentic commented Jul 17, 2023

I put together a list of some digraphs for personal use: mostly mathematics and linguistics focused. For anyone running this patch, feel free to use this as a starting point and tweak as desired.

[editor.digraphs]
## Lowercase Greek
alpha = "α"
beta = "β"
gamma = "γ"
delta = "δ"
epsilon = "ε"
zeta = "ζ"
eta = "η"
theta = "θ"
iota = "ι"
kappa = "κ"
lambda = "λ"
mu = "μ"
nu = "ν"
xi = "ξ"
omicron = "ο"
pi = "π"
rho = "ρ"
sigma = "σ"
tau = "τ"
upsilon = "υ"
phi = "φ"
chi = "χ"
psi = "ψ"
omega = "ω"

## Alternate Greek
varbeta = "ϐ"
vargamma = "ɣ"
varepsilon = "ϵ"
vartheta = "ϑ"
varkappa = "ϰ"
varpi = "ϖ"
varrho = "ϱ"
varsigma = "ς"
varphi = "ɸ"

## Uppercase Greek
Alpha = "Α"
Beta = "Β"
Gamma = "Γ"
Delta = "Δ"
Epsilon = "Ε"
Zeta = "Ζ"
Eta = "Η"
Theta = "Θ"
Iota = "Ι"
Kappa = "Κ"
Lambda = "Λ"
Mu = "Μ"
Nu = "Ν"
Xi = "Ξ"
Omicron = "Ο"
Pi = "Π"
Rho = "Ρ"
Sigma = "Σ"
Tau = "Τ"
Upsilon = "Υ"
Phi = "Φ"
Chi = "Χ"
Psi = "Ψ"
Omega = "Ω"

## Double-struck / Blackboard bold
AA = "𝔸"
BB = "𝔹"
CC = "ℂ"
DD = "𝔻"
EE = "𝔼"
FF = "𝔽"
GG = "𝔾"
HH = "ℍ"
II = "𝕀"
JJ = "𝕁"
KK = "𝕂"
LL = "𝕃"
MM = "𝕄"
NN = "ℕ"
OO = "𝕆"
PP = "ℙ"
QQ = "ℚ"
RR = "ℝ"
SS = "𝕊"
TT = "𝕋"
UU = "𝕌"
VV = "𝕍"
WW = "𝕎"
XX = "𝕏"
YY = "𝕐"
ZZ = "ℤ"

## Small caps
sa = "ᴀ"
sb = "ʙ"
sc = "ᴄ"
sd = "ᴅ"
se = "ᴇ"
sf = "ꜰ"
sg = "ɢ"
sh = "ʜ"
si = "ɪ"
sj = "ᴊ"
sk = "ᴋ"
sl = "ʟ"
sm = "ᴍ"
sn = "ɴ"
so = "ᴏ"
sp = "ᴘ"
sq = "ꞯ"
sr = "ʀ"
ss = "ꜱ"
st = "ᴛ"
su = "ᴜ"
sv = "ᴠ"
sw = "ᴡ"
sx = "x"
sy = "ʏ"
sz = "ᴢ"

## Hebrew letters
alef = "א"
bet = "ב"
gimel = "ג"
shin = "ש"

## Extra letters
ell = ""
angstrom = ""
degree = "°"
celcius = ""
fahrenheit = ""
kelvin = ""
Re = ""
Im = ""
section = "§"
refmark = ""

## Mathematics
forall = ""
exists = ""
notexists = ""
therefore = ""
because = ""
sum = ""
product = ""
coproduct = ""
qed = ""
top = ""
bot = ""
tee = ""
yields = ""
inf = ""
wreath = ""
compose = ""
convolve = ""
multimap = ""
pm = "±"
mp = ""
plus = "+"
minus = "-"
times = "×"
div = "÷"
divides = ""
notdivides = ""
parallel = ""
perp = ""
notparallel = ""
ident = ""
notident = ""
sident = ""
prop = ""
join = ""
smash = ""

## Calculus
diff = ""
nabla = ""
laplace = ""
int = ""
iint = ""
iiint = ""
iiiint = ""
sumint = ""
closedint = ""
surfint = ""
volint = ""

## Logic
not = "¬"
and = ""
or = ""
xor = ""
in = ""
notin = ""
ni = ""
notni = ""
sub = ""
sube = ""
notsub = ""
notsube = ""
sup = ""
supe = ""
notsup = ""
notsupe = ""
union = ""
sect = ""
without = ""
emptyset = ""
null = ""
to = ""
gets = ""
implies = ""
implied = ""
iff = ""
models = ""

## Relations
ratio = ""
eq = "="
gt = ">"
lt = "<"
geq = ""
leq = ""
prec = ""
succ = ""

## Punctuation
amp = "&"
pma = ""
pil = ""
lip = ""
# at = "@"
# hash = "#"
# colon = ":"
# comma = ","
# period = "."
# semicolon = ";"
# slash = "/"
# backslash = "\\"
# exclamation = "!"
bullet = ""
ast = ""
kleene = ""
dagger = ""
ddagger = ""
interrobang = ""

## Ligatures
ae = "æ"
AE = "Æ"
oe = "œ"
varoe = "ɶ"
OE = "Œ"
lezh = "ɮ"
dezh = "d͡ʒ"

## Linguistics
ash = "æ"
Ash = "Æ"
ethel = "œ"
Ethel = "Œ"
emg = "ɱ"
Emg = ""
eng = "ŋ"
Eng = "Ŋ"
esh = "ʃ"
Esh = "Ʃ"
eth = "ð"
Eth = "Ð"
ezh = "ʒ"
Ezh = "Ʒ"
schwa = "ə"
tap = "ɾ"
vtap = ""
stop = "ʔ"
ramhorns = "ɤ"
bullseye = "ʘ"
tm = "ɯ"
TM = "Ɯ"
ty = "ʎ"
tr = "ɹ"
tsr = "ʁ"
bl = "ɬ"
nlh = "ɲ"
nrh = "ɳ"
vh = "ʋ"
bh = "ɓ"
BH = "Ɓ"
dh = "ɗ"
DH = "Ɗ"
gh = "ɠ"
GH = "Ɠ"
rt = "ʈ"
rd = "ɖ"
# ɽɟʂʐçʝħʕɦɻɰɭ
# ǀǃǂǁʄ

## Old English
thorn = "þ"
Thorn = "Þ"
wynn = "ƿ"
Wynn = "Ƿ"

## Assorted
spade = ""
heart = ""
club = ""
diamond = ""
maltese = ""
bitcoin = ""
dollar = "$"
euro = ""
franc = ""
lira = ""
peso = ""
pound = "£"
ruble = ""
rupee = ""
won = ""
yen = "¥"

@omentic
Contributor

omentic commented Aug 6, 2023

After working with this for a while, I've found it exceedingly helpful. I would suggest having space function the same as enter for ease of use. These bindings work well for me:

[keys.insert]
"\\" = "insert_digraph"

[editor.digraphs]
"\\" = "\\"
...

@kirawi
Member

kirawi commented Sep 5, 2023

Sorry for taking a while to review. I'm confused about why you opted for a trie instead of a vector. I would probably implement this feature as a Vec<DigraphEntry> and implement completion through helix_core::fuzzy::fuzzy_match, like how it's done for

fn filename_impl<F>(
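
To make the vector-based alternative concrete, here is a rough sketch. It deliberately does not call `helix_core::fuzzy::fuzzy_match`, whose exact signature is not shown in this thread. A simple in-order subsequence check stands in for the fuzzy matcher, and `DigraphEntry`'s fields, `is_subsequence`, and `matching_entries` are assumptions made for the example.

```rust
/// Hypothetical entry type; the field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct DigraphEntry {
    sequence: String,
    symbols: String,
}

/// Stand-in for a real fuzzy matcher: true if every character of `pattern`
/// appears in `text` in the same order (not necessarily adjacent).
fn is_subsequence(pattern: &str, text: &str) -> bool {
    let mut remaining = text.chars();
    pattern.chars().all(|p| remaining.any(|t| t == p))
}

/// Filter a flat list of entries by the typed pattern, shortest sequence first.
fn matching_entries<'a>(entries: &'a [DigraphEntry], pattern: &str) -> Vec<&'a DigraphEntry> {
    let mut matches: Vec<&DigraphEntry> = entries
        .iter()
        .filter(|entry| is_subsequence(pattern, &entry.sequence))
        .collect();
    // A real matcher would rank by score; length is a crude substitute here.
    matches.sort_by_key(|entry| entry.sequence.chars().count());
    matches
}
```

Compared with a trie, this scans every entry on each keystroke, which is likely acceptable for a few hundred digraphs and lets non-prefix input such as `la` match `alpha` as well as `lambda`.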

@Lindenk
Author

Lindenk commented Sep 12, 2023

I don't think I knew there was already a fuzzy match implementation (or whether there was one a year ago). It should be pretty easy to swap over to using it instead of a hand-built trie if that's preferable.

@kirawi
Member

kirawi commented Sep 12, 2023

So I brought this up on Matrix, and @pascalkuthe said,

> In particular I think this should be handled by the same infrastructure that will handle custom snippets (and abbreviations). I think just having an abbreviation with a special flag that makes it a digraph, so that it would not show up automatically but only once you press a certain key, would be enough (it would just use fuzzy filtering/the normal completion window but autoconfirm once there is only a single match).
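
The "autoconfirm once there is only a single match" behaviour can be illustrated with a small sketch. It uses plain prefix matching instead of the fuzzy filtering mentioned in the quote, and the `Lookup` enum and `lookup` function are hypothetical names, not part of Helix.

```rust
/// Possible outcomes after each keystroke (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Lookup<'a> {
    /// Nothing matches what has been typed so far.
    NoMatch,
    /// Exactly one candidate is left, so it can be inserted immediately.
    Confirm(&'a str),
    /// Several candidates remain; keep the completion window open.
    Pending(usize),
}

/// `digraphs` is a list of (sequence, symbols) pairs; `typed` is the input so far.
fn lookup<'a>(digraphs: &'a [(&'a str, &'a str)], typed: &str) -> Lookup<'a> {
    let mut candidates = digraphs.iter().filter(|(seq, _)| seq.starts_with(typed));
    match (candidates.next(), candidates.next()) {
        (None, _) => Lookup::NoMatch,
        (Some((_, symbols)), None) => Lookup::Confirm(*symbols),
        (Some(_), Some(_)) => Lookup::Pending(2 + candidates.count()),
    }
}
```

Note that a sequence which is itself a prefix of another (for example `a` alongside `ae`) never auto-confirms under this rule and would still need an explicit confirmation key.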

omentic added a commit to omentic/helix-ext that referenced this pull request Nov 1, 2023
omentic pushed a commit to omentic/helix-ext that referenced this pull request May 1, 2024
omentic pushed a commit to omentic/helix-ext that referenced this pull request May 1, 2024
omentic pushed a commit to omentic/helix-ext that referenced this pull request May 1, 2024
omentic pushed a commit to omentic/helix-ext that referenced this pull request May 1, 2024
omentic pushed a commit to omentic/helix-ext that referenced this pull request May 1, 2024
@omentic
Contributor

omentic commented Oct 25, 2024

Seems this should be closed in favour of #9801?

Labels
A-helix-term Area: Helix term improvements S-waiting-on-review Status: Awaiting review from a maintainer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Digraphs and Unicode input tools
7 participants