As this article correctly puts it:
Don’t even compare human text if you can
As such, you shouldn't rely on this package as your only data point.
Stations can sometimes be located very close to each other under similar names; conversely, a single location can be known under multiple names.
Still, we should be able to make an educated guess whether two locations are the same. For example, we should detect that
Eindhoven Centraal and Eindhoven Strijp-S are not the same, but that Eindhoven Centraal and Eindhoven are the same.
- Digest station name:
  - To city, country, etc.
  - Logic to split multilanguage labels
  - Harmonize (clean up duplicate spaces, replace `—` with `-`, etc.)
  - Proposal: `parse(name: string, coordinates: { lat: number, lon: number }, language?: string[]) => { country?: string, appendix?: string, type?: string, harmonized: string }`
- Slugify in different formats
- Name comparison
  - Proposal: `compare({ name: string, coordinates: { lat: number, lon: number }, language?: string[] }, { name: string, coordinates: { lat: number, lon: number }, language?: string[] }) => { score: number, harmonizedEqual: boolean, digest: [any, any] }`
- Proposal:
Use `compare({ name: ... }, { name: ... })`
- A city name should match the same city name with a central-station suffix appended: `Eindhoven` == `Eindhoven Centraal`
- Should recognize abbreviations: `St. Pölten` == `Sankt Pölten`
- Should recognize diacritics: `Woergl` == `Wörgl` and `Worgl` == `Wörgl`
- Ignore dashes (when possible?): `'s-Hertogenbosch` == `'s Hertogenbosch`
- Compare different separators: `Bascharage-Sanem` == `Bascharage/Sanem`
- Ignore country names: `Athens (Greece)` == `Athens`
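The rules above could be sketched roughly as follows. This is a minimal illustration, not this package's implementation: `harmonize`, `comparisonKeys`, `harmonizedEqual`, the suffix list, and the `st.` to `sankt` expansion are all assumptions made for the example. Each name is reduced to two comparison keys (diacritics stripped, and umlauts transliterated as `oe`/`ae`/`ue`), so that both `Worgl` and `Woergl` can match `Wörgl`.

```typescript
// Hypothetical helpers for illustration only; not the package's actual API.

// Assumed list of "central station" suffixes; a real list would be longer.
const CENTRAL_SUFFIXES = ["centraal", "central", "centrale", "hauptbahnhof", "hbf"];

// Assumption: "St." is expanded to "Sankt" (it could also mean "Saint").
function expandAbbreviation(token: string): string {
  return token === "st." || token === "st" ? "sankt" : token;
}

// NFD splits "ö" into "o" plus a combining mark, which is then removed.
function stripDiacritics(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// German-style transliteration: "ö" becomes "oe", and so on.
function transliterateUmlauts(text: string): string {
  return text
    .replace(/ä/g, "ae")
    .replace(/ö/g, "oe")
    .replace(/ü/g, "ue")
    .replace(/ß/g, "ss");
}

function harmonize(name: string): string {
  let tokens = name
    .normalize("NFC")
    .toLowerCase()
    .replace(/\([^)]*\)/g, " ")           // drop "(Greece)"-style additions
    .replace(/[\u2010-\u2015\/-]/g, " ")  // treat dashes and slashes as spaces
    .replace(/\u2019/g, "'")              // typographic apostrophe to plain
    .split(/\s+/)                         // also collapses duplicate spaces
    .filter((token) => token.length > 0)
    .map(expandAbbreviation);
  // Drop a trailing central-station suffix, but never the whole name.
  if (tokens.length > 1 && CENTRAL_SUFFIXES.indexOf(tokens[tokens.length - 1]) !== -1) {
    tokens = tokens.slice(0, -1);
  }
  return tokens.join(" ");
}

// Two keys per name: plain diacritic stripping and umlaut transliteration.
function comparisonKeys(name: string): string[] {
  const harmonized = harmonize(name);
  return [stripDiacritics(harmonized), stripDiacritics(transliterateUmlauts(harmonized))];
}

// Names are considered equal when they share at least one comparison key.
function harmonizedEqual(a: string, b: string): boolean {
  const keysB = comparisonKeys(b);
  return comparisonKeys(a).some((key) => keysB.indexOf(key) !== -1);
}

console.log(harmonizedEqual("Eindhoven", "Eindhoven Centraal")); // true
console.log(harmonizedEqual("Woergl", "Wörgl"));                 // true
console.log(harmonizedEqual("Eindhoven", "Eindhoven Strijp-S")); // false
```

A purely string-based check like this cannot know that `Köln` and `Cologne` refer to the same place, so it reports them as different.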
Use `compareWithAlias({ name: ... }, { name: ... })`
To compare aliases, Wikidata's database is used; rate limits may apply.
- This is useful for comparing exonyms: `Köln` != `Cologne`
- Or for aliases: `Den Bosch` != `'s-Hertogenbosch`
- A substation should not match the city name: `Eindhoven` != `Eindhoven Strijp-S`
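One way to picture alias-based comparison, including a cache to stay within rate limits, is sketched below. Everything here is an assumption for illustration: `AliasLookup`, `withCache`, `sameByAlias`, and `simplify` are hypothetical names, and the lookup is an in-memory stub standing in for a real Wikidata query.

```typescript
// Hypothetical types and names for illustration; not the package's API.
type AliasLookup = (name: string) => Promise<string[]>;

// Simplified normalisation for this sketch: lowercase, dashes to spaces,
// collapse whitespace.
function simplify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[-\u2010-\u2015]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Wraps a lookup so each name is fetched at most once, which reduces the
// number of requests when a remote service applies rate limits.
function withCache(lookup: AliasLookup): AliasLookup {
  const cache = new Map<string, Promise<string[]>>();
  return (name: string) => {
    const key = simplify(name);
    const cached = cache.get(key);
    if (cached !== undefined) {
      return cached;
    }
    const pending = lookup(name);
    cache.set(key, pending);
    return pending;
  };
}

// True when the names match directly or share at least one alias.
async function sameByAlias(a: string, b: string, lookup: AliasLookup): Promise<boolean> {
  if (simplify(a) === simplify(b)) {
    return true; // no lookup needed
  }
  const [aliasesA, aliasesB] = await Promise.all([lookup(a), lookup(b)]);
  const namesA = [a, ...aliasesA].map(simplify);
  const namesB = [b, ...aliasesB].map(simplify);
  return namesA.some((name) => namesB.indexOf(name) !== -1);
}

// In-memory stand-in for a remote alias source, using the examples above.
const ALIAS_GROUPS: string[][] = [
  ["Köln", "Cologne"],
  ["'s-Hertogenbosch", "Den Bosch"],
];

const stubLookup: AliasLookup = async (name: string) => {
  const key = simplify(name);
  const group = ALIAS_GROUPS.find((g) => g.map(simplify).indexOf(key) !== -1);
  return group !== undefined ? group : [];
};
```

With a lookup like `withCache(stubLookup)`, `sameByAlias("Köln", "Cologne", lookup)` resolves to `true`, while `Eindhoven` and `Eindhoven Strijp-S` share no alias and remain different. Note that this simple cache also keeps rejected promises, so a real implementation would likely evict failed lookups.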