Is your feature request related to a problem? Please describe.
Many PDF files contain ligature substitutions, so "fi" or "ff" becomes a single character instead of two separate, consecutive ones.
When you use ripgrep-all to search for a string containing "fi" or "ff", the substituted occurrences are not matched.
Describe the solution you'd like
Break down common ligatures into their component letters in `rga-preproc`.
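A minimal sketch of what such a preprocessing step could do, assuming it only needs to handle the common Latin ligatures in Unicode's Alphabetic Presentation Forms block (U+FB00 to U+FB06). The function name `expand_ligatures` is hypothetical and is not part of rga's code; where it would hook into `rga-preproc` is not shown.

```rust
/// Hypothetical helper (not rga's actual API): replace common Latin
/// typographic ligatures with the letters they stand for, leaving all
/// other characters unchanged.
fn expand_ligatures(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '\u{FB00}' => out.push_str("ff"),  // ﬀ
            '\u{FB01}' => out.push_str("fi"),  // ﬁ
            '\u{FB02}' => out.push_str("fl"),  // ﬂ
            '\u{FB03}' => out.push_str("ffi"), // ﬃ
            '\u{FB04}' => out.push_str("ffl"), // ﬄ
            '\u{FB06}' => out.push_str("st"),  // ﬆ
            other => out.push(other),
        }
    }
    out
}

fn main() {
    // "de" + U+FB01 + "nition", as text extraction from a PDF might yield it
    let extracted = "de\u{FB01}nition";
    // A plain substring search for "fi" fails on the ligature form...
    assert!(!extracted.contains("fi"));
    // ...but succeeds once the ligature is expanded.
    let expanded = expand_ligatures(extracted);
    assert_eq!(expanded, "definition");
    println!("{}", expanded);
}
```

A broader alternative would be Unicode NFKC normalization, which also decomposes these ligatures but rewrites many other compatibility characters (for example full-width letters), so it changes more of the extracted text than a targeted table like this one.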
Describe alternatives you've considered
Identify letter sequences in the search pattern that could appear as ligatures and replace each one with `(contracted)|(original)`.
For example, `rga definition` should actually search for `de((fi)|X)nition`, where X is the "fi" ligature (U+FB01).
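A rough sketch of this alternative, assuming the user's pattern is a plain literal string (not already a regex). The function `ligature_aware_pattern` is hypothetical. It scans the literal, wraps each ligature-forming sequence in a non-capturing alternation such as `(?:fi|ﬁ)` (equivalent for matching to the `((fi)|X)` form above), and escapes regex metacharacters elsewhere. It matches greedily, longest sequence first, so mixed forms such as "ﬀ" followed by "i" for "ffi" are not covered.

```rust
/// Hypothetical table: letter sequences and their single-character
/// ligatures, longest first so "ffi" is tried before "ff" and "fi".
const LIGATURES: &[(&str, char)] = &[
    ("ffi", '\u{FB03}'),
    ("ffl", '\u{FB04}'),
    ("ff", '\u{FB00}'),
    ("fi", '\u{FB01}'),
    ("fl", '\u{FB02}'),
];

/// Escape characters that are special in regex syntax.
fn escape_char(c: char, out: &mut String) {
    if "\\.+*?()|[]{}^$".contains(c) {
        out.push('\\');
    }
    out.push(c);
}

/// Turn a literal search string into a regex that also matches the
/// ligature forms listed in LIGATURES.
fn ligature_aware_pattern(literal: &str) -> String {
    let mut out = String::new();
    let mut rest = literal;
    'outer: while !rest.is_empty() {
        for &(seq, lig) in LIGATURES {
            if let Some(tail) = rest.strip_prefix(seq) {
                // `seq` contains only letters, so it needs no escaping.
                out.push_str(&format!("(?:{}|{})", seq, lig));
                rest = tail;
                continue 'outer;
            }
        }
        // No ligature sequence starts here: copy one character, escaped.
        let c = rest.chars().next().unwrap();
        escape_char(c, &mut out);
        rest = &rest[c.len_utf8()..];
    }
    out
}

fn main() {
    let pattern = ligature_aware_pattern("definition");
    assert_eq!(pattern, "de(?:fi|\u{FB01})nition");
    println!("{}", pattern);
}
```

Rewriting the pattern avoids touching the extracted text, but each new ligature or mixed combination makes the generated regex longer, which is one argument for normalizing in `rga-preproc` instead.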
Additional context
Background: Wikipedia, "Ligatures in computer typesetting"
The results are 27429, 5986, and 21451.