fix detection of suffix/prefix changes for name-changes #4421
@MoKob could you please tell me the status of this PR? I left some comments about UTF-8, but if this is a US-ASCII-only solution, just ignore them.
}
}
// the best position marks the end of the string
return lhs.substr(best_pos - best, best);
Should it be UTF-8 friendly? Cyrillic 'а' (#xD0 #xB0) and 'б' (#xD0 #xB1) share the common prefix byte xD0, but the returned prefix is not a valid UTF-8 sequence.
http://coliru.stacked-crooked.com/a/1bdfc10033256f67
ab
аб�
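One way to avoid cutting a character in half is to trim the byte-wise match back to the nearest code point boundary. The following is only a minimal sketch, assuming both inputs are valid UTF-8; `isContinuationByte` and `utf8CommonPrefix` are hypothetical names and not part of this PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: UTF-8 continuation bytes have the bit pattern 10xxxxxx.
inline bool isContinuationByte(const char c)
{
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Hypothetical helper: byte-wise common prefix, trimmed so that it never
// ends in the middle of a multi-byte character (assumes valid UTF-8 input).
inline std::string utf8CommonPrefix(const std::string &lhs, const std::string &rhs)
{
    const std::size_t limit = std::min(lhs.size(), rhs.size());
    std::size_t len = 0;
    while (len < limit && lhs[len] == rhs[len])
        ++len;

    // If the byte right after the prefix is a continuation byte in either
    // string, the prefix stops inside a code point: back up to its lead byte.
    while (len > 0 && ((len < lhs.size() && isContinuationByte(lhs[len])) ||
                       (len < rhs.size() && isContinuationByte(rhs[len]))))
        --len;

    return lhs.substr(0, len);
}
```

With this, 'а' and 'б' would yield an empty prefix instead of the lone byte xD0. The same boundary check could be mirrored for suffixes.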
I don't think this is an issue, as the substring still needs to match our pre-set database of suffixes/prefixes. Unless incomplete suffixes are stored within that set, we cannot match against what is left.
return "";

// array for dynamic programming
std::vector<std::vector<std::uint32_t>> dp(lhs.size(),
Two one-dimensional vectors (the previous and the current row) are enough here; the full 2D table is not needed.
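As a rough illustration of that suggestion, a longest-common-substring search can keep only two rows of the table. This is a sketch, not the code from this PR: the function name `longestCommonSubstring` is an assumption, and only `lhs`, `best`, `best_pos`, and the final `substr` call mirror the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the function under review.
inline std::string longestCommonSubstring(const std::string &lhs, const std::string &rhs)
{
    if (lhs.empty() || rhs.empty())
        return "";

    // previous[j]: length of the common substring ending at lhs[i-1] and rhs[j]
    // current[j]:  length of the common substring ending at lhs[i] and rhs[j]
    std::vector<std::uint32_t> previous(rhs.size(), 0);
    std::vector<std::uint32_t> current(rhs.size(), 0);

    std::uint32_t best = 0;   // length of the best match so far
    std::size_t best_pos = 0; // one past the end of the best match in lhs

    for (std::size_t i = 0; i < lhs.size(); ++i)
    {
        for (std::size_t j = 0; j < rhs.size(); ++j)
        {
            if (lhs[i] == rhs[j])
            {
                current[j] = (j == 0) ? 1 : previous[j - 1] + 1;
                if (current[j] > best)
                {
                    best = current[j];
                    best_pos = i + 1;
                }
            }
            else
            {
                current[j] = 0;
            }
        }
        // the current row becomes the previous row for the next character
        std::swap(previous, current);
    }

    // the best position marks the end of the string
    return lhs.substr(best_pos - best, best);
}
```

Memory drops from O(|lhs| * |rhs|) to O(|rhs|), while the running time stays the same.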

// trim spaces, transform to lower
const auto trim = [](auto str) {
boost::to_lower(str);
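There is a UTF-8 problem here: boost::to_lower processes the string one char at a time, so multi-byte UTF-8 characters are not lowercased. See http://coliru.stacked-crooked.com/a/7da4a8be589f8c36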
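If an ASCII-only comparison is acceptable, one option is to make that limitation explicit and independent of the locale. The sketch below is an assumption about what such a helper could look like (`asciiToLower` is a hypothetical name): it lowercases only 'A' to 'Z' and passes every other byte through, so UTF-8 sequences stay intact but non-ASCII letters keep their case.

```cpp
#include <string>

// Hypothetical helper: lowercase ASCII letters only and leave all other
// bytes (including every byte of a multi-byte UTF-8 character) unchanged.
inline std::string asciiToLower(std::string str)
{
    for (char &c : str)
    {
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c - 'A' + 'a');
    }
    return str;
}
```

Proper case folding for names such as "Ärger" or "Улица" would need a Unicode-aware library, which is outside the scope of this sketch.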
const auto first_prefix_and_suffixes = getPrefixAndSuffix(first);
const auto second_prefix_and_suffixes = getPrefixAndSuffix(second);
const auto checkTable = [&](const std::string &str) {
// workaround for cucumber tests:
🙀
Well... it's a bad side effect of having all the different roads whose names are just two letters and include N/E/S/W in their lettering :(. In the end, this check prevents matching suffixes that are just one of two letters. This could be relevant for 1N / 1S, if that should exist anywhere. In general, it is more helpful in the cucumber tests, where we would otherwise need to ensure that no name is actually just two letters (which is mostly the case right now).
Issue
Resolves #4420
Tasklist