normalize search string to NFC before comparison (#1272)
Normalize the search string to NFC, since all data in LF is normalized to NFC on disk. This allows exact-match and ignore-diacritics queries to work regardless of the query's normalization form or language, e.g. Korean.

A note about this fix:
- All data is normalized to NFC in the database on write.  It's been this way for years.
- @longrunningprocess's addition in #1243 normalized the query to NFD for the purpose of removing diacritics from the data and query. This is a fine approach.
- Given the second point above, this PR could have normalized all data to NFD for comparison under all circumstances. However, I chose to stick with NFC since that is the form the underlying data is stored in. Either way works.
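
To illustrate why the normalization form of the query matters, here is a minimal sketch. The sample string and variable names are assumptions for illustration and are not taken from the LF code base. It compares a Korean string stored in NFC against the same text typed in NFD:

```typescript
// Assumed sample data: "한국어" stored as three precomposed syllables (NFC),
// which is the form the commit message says the database uses.
const storedEntry = '\uD55C\uAD6D\uC5B4';

// The same text in NFD: each syllable is split into conjoining jamo.
const decomposedQuery = storedEntry.normalize('NFD');

// A plain substring test fails because the code point sequences differ.
const naiveMatch = storedEntry.includes(decomposedQuery);

// Normalizing the query to NFC first makes the comparison succeed.
const normalizedMatch = storedEntry.includes(decomposedQuery.normalize('NFC'));

console.log({ naiveMatch, normalizedMatch });
```

Both strings render identically on screen, which is why the mismatch is easy to miss without an explicit normalization step.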

Fixes #1244
megahirt authored Jan 11, 2022
1 parent bffa380 commit 453a09a
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions src/angular-app/bellows/core/offline/editor-data.service.ts
@@ -457,13 +457,13 @@ export class EditorDataService {
// https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Unicode_Property_Escapes
// https://unicode.org/reports/tr44/#Diacritic
// https://stackoverflow.com/a/37511463/10818013

-return input.normalize('NFD').replace(/\p{Diacritic}/gu, '')
+// Convert to NFD in order to remove any code points with the property 'Diacritic', then convert back to NFC for comparison
+return input.normalize('NFD').replace(/\p{Diacritic}/gu, '').normalize('NFC');
}

private entryMeetsFilterCriteria(config: any, entry: LexEntry): boolean {
if (this.entryListModifiers.filterText() !== '') {
-const rawQuery = this.entryListModifiers.filterText()
+const rawQuery = this.entryListModifiers.filterText().normalize('NFC');
const normalizedQuery = this.entryListModifiers.matchDiacritic ? rawQuery : this.removeDiacritics(rawQuery);
const regexSafeQuery = this.escapeRegex(normalizedQuery);
const queryRegex = new RegExp(this.entryListModifiers.wholeWord ? `\\b${regexSafeQuery}\\b` : regexSafeQuery, 'i');
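
The hunk above can be read as a small pipeline: normalize the query to NFC, optionally strip diacritics, then build a regex. The following is a standalone sketch of that pipeline, not the actual service code. `matchesFilter` and the body of `escapeRegex` are hypothetical stand-ins (the real `escapeRegex` is not shown in this diff), and the whole-word option is omitted for brevity.

```typescript
// Strip diacritics: decompose to NFD, drop code points with the Unicode
// 'Diacritic' property, then recompose to NFC so the result is comparable
// with NFC-stored data.
function removeDiacritics(input: string): string {
  return input.normalize('NFD').replace(/\p{Diacritic}/gu, '').normalize('NFC');
}

// Hypothetical helper: escape regex metacharacters in user input.
function escapeRegex(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical standalone filter. Assumes fieldValue is already NFC, as the
// commit message says stored data is.
function matchesFilter(fieldValue: string, filterText: string, matchDiacritic: boolean): boolean {
  const rawQuery = filterText.normalize('NFC');
  const normalizedQuery = matchDiacritic ? rawQuery : removeDiacritics(rawQuery);
  const target = matchDiacritic ? fieldValue : removeDiacritics(fieldValue);
  return new RegExp(escapeRegex(normalizedQuery), 'i').test(target);
}

// Usage: a decomposed Korean query matches NFC data in both modes.
const koreanNfc = '\uD55C\uAD6D\uC5B4'; // "한국어"
console.log(matchesFilter(koreanNfc, koreanNfc.normalize('NFD'), true));
console.log(matchesFilter(koreanNfc, koreanNfc.normalize('NFD'), false));
```

Converting back to NFC inside `removeDiacritics` keeps both code paths in the same normalization form, so the caller does not need to know which path produced the string.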