Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix detection of suffix/prefix changes for name-changes #4421

Merged
merged 3 commits into from
Oct 11, 2017

Conversation

MoKob
Copy link

@MoKob MoKob commented Aug 17, 2017

Issue

Resolves #4420

Tasklist

  • add regression / cucumber cases (see docs/testing.md)
  • review
  • adjust for comments

@MoKob MoKob force-pushed the fix/suffix-detection branch from fc98077 to 8cdfd2a Compare August 18, 2017 12:51
@MoKob MoKob added this to the 5.12.0 milestone Aug 18, 2017
@MoKob MoKob force-pushed the fix/suffix-detection branch 3 times, most recently from 3e53c12 to 2a52bed Compare August 18, 2017 14:36
@TheMarex TheMarex modified the milestones: 5.13.0, 5.12.0 Aug 31, 2017
@MoKob MoKob requested a review from oxidase October 2, 2017 11:46
Copy link
Contributor

@oxidase oxidase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MoKob please could you tell what the status of the PR, i left some comments about utf-8, but if it is a US-ASCII-only solution then just ignore comments

}
}
// the best position marks the end of the string
return lhs.substr(best_pos - best, best);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be UTF-8 friendly? Cyrillic 'а' (#xD0 #xB0) and 'б' (#xD0 #xB0) have common prefix xD0 but the returned prefix has no encoding.

http://coliru.stacked-crooked.com/a/1bdfc10033256f67

ab
аб�

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is an issue, as the substring still needs to match our pre-set database of suffixes/prefixes. Unless incomplete suffixes are stored within that set, we cannot match against what is left.

return "";

// array for dynamic programming
std::vector<std::vector<std::uint32_t>> dp(lhs.size(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only two single vectors can be used.


// trim spaces, transform to lower
const auto trim = [](auto str) {
boost::to_lower(str);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto first_prefix_and_suffixes = getPrefixAndSuffix(first);
const auto second_prefix_and_suffixes = getPrefixAndSuffix(second);
const auto checkTable = [&](const std::string &str) {
// workaround for cucumber tests:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙀

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well... it's a bad side-effect of having all the different roads that are just two letters and include N/E/S/W in their lettering :(. In the end, this check prevents matching suffixes that are just 1 of two letters. This could be relevant in 1N 1S, if that should exist anywhere. In general, it is more helpful on cucumber tests, where we would need to ensure that no name is actually just two letters (which is mostly the case right now)

@MoKob MoKob force-pushed the fix/suffix-detection branch from b2a7c9c to 93281d5 Compare October 11, 2017 11:11
@MoKob MoKob merged commit fd52c80 into master Oct 11, 2017
@MoKob MoKob deleted the fix/suffix-detection branch October 11, 2017 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants