Improve taxonomy parser to handle commas that are part of entry names #210

Open
Tracked by #5505
stephanegigandet opened this issue Feb 8, 2016 · 3 comments
Labels
🥗 Ingredients 🧴 Open Beauty Facts Our cosmetic analysis project https://world.openbeautyfacts.org P4 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies

Comments

@stephanegigandet
Contributor

For example, in Open Beauty Facts we have chemicals whose names contain commas, like:

5-Bromo-5-Nitro-1,3 Dioxane

Currently commas are considered separators:

  1. in the taxonomy definition (synonyms are separated by commas and optionally spaces)
  2. in the values entered in fields (e.g. categories)
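
To make the problem concrete, here is a minimal Python sketch of a comma-based split. It is not the project's actual parser: the function name `split_synonyms` and the synonym "Bronidox" are assumptions chosen for illustration, and the only behavior taken from the description above is "split on a comma followed by optional spaces".

```python
import re


def split_synonyms(line: str) -> list[str]:
    """Split a taxonomy line on commas followed by optional spaces.

    Hypothetical helper: it only mirrors the separator rule described
    in this issue, not the real taxonomy parser.
    """
    return [part for part in re.split(r",\s*", line.strip()) if part]


# The comma inside "1,3" is treated like any other separator,
# so the chemical name is cut in two.
print(split_synonyms("5-Bromo-5-Nitro-1,3 Dioxane, Bronidox"))
# → ['5-Bromo-5-Nitro-1', '3 Dioxane', 'Bronidox']
```

The name ends up as two bogus entries, `5-Bromo-5-Nitro-1` and `3 Dioxane`, which is the behavior this issue is about.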
@stephanegigandet
Contributor Author

Proposed solution (currently live on http://world.openbeautyfacts.org): do not treat commas as separators if they are between 2 digits.

Replace such commas with a lower comma (‚):
$line =~ s/(\d),(\d)/$1‚$2/g;

Need to check if this covers all the cases of names that contain commas, and if there are cases where commas between 2 digits should indeed be treated as separators.
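
As a starting point for that check, here is a hedged Python sketch that ports the substitution and runs it on a few inputs. Several things are assumptions, not taken from the live code: the function names, the code point U+201A for the lower comma, the step that restores the comma after splitting, and the lookaround variant, which is only one possible alternative. The third input is contrived.

```python
import re

LOW_COMMA = "\u201a"  # assumption: the "lower comma" glyph is U+201A


def protect_pairwise(line: str) -> str:
    """Python port of the Perl substitution s/(\\d),(\\d)/$1‚$2/g."""
    return re.sub(r"(\d),(\d)", r"\1" + LOW_COMMA + r"\2", line)


def protect_lookaround(line: str) -> str:
    """Alternative sketch: lookarounds do not consume the digits."""
    return re.sub(r"(?<=\d),(?=\d)", LOW_COMMA, line)


def split_entries(line: str, protect) -> list[str]:
    """Protect digit-comma-digit, split on commas, then restore commas."""
    parts = re.split(r",\s*", protect(line))
    return [p.replace(LOW_COMMA, ",") for p in parts if p]


# Case 1: the name from this issue is kept whole by both variants.
print(split_entries("5-Bromo-5-Nitro-1,3 Dioxane, Bronidox", protect_pairwise))
# → ['5-Bromo-5-Nitro-1,3 Dioxane', 'Bronidox']

# Case 2: chained locants. The pairwise pattern consumes the "2",
# so the second comma is left alone and the name is still split.
print(split_entries("1,2,4-Trihydroxybenzene", protect_pairwise))
# → ['1,2', '4-Trihydroxybenzene']
print(split_entries("1,2,4-Trihydroxybenzene", protect_lookaround))
# → ['1,2,4-Trihydroxybenzene']

# Case 3 (contrived): two entries where the comma between digits
# really is a separator. Both variants wrongly merge them.
print(split_entries("Formula 1,2 in 1 shampoo", protect_lookaround))
# → ['Formula 1,2 in 1 shampoo']
```

Case 2 suggests the pairwise pattern misses names with three or more comma-separated digits, and case 3 shows the kind of input where the digit rule would be wrong in either form.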

@teolemon teolemon added the 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies label Feb 18, 2016
@VaiTon
Member

VaiTon commented Oct 25, 2019

Any news on this, @stephanegigandet?

@stephanegigandet
Contributor Author

I didn't have time to work on it. It is something we'll probably need to do eventually, but currently it is not high in terms of priority.

Projects
Status: To discuss and validate
Development

No branches or pull requests

4 participants