Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfixes: locales and OMP series numbers #116

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

nongenti
Copy link
Contributor

Bugs

  1. CSL needs to choose the right language xml file a locale in format "xy-ZW". So it only uses a fallback "en-US" since we change the locales. In this solution I first look, if "xy-XY" exists and else I'm looking in an array with entries build "xy" => "xy-ZW". If tried to map all possible languages form https://github.com/citation-style-language/locales. Only for zh-CN and zh-TW I had no idea, how to decide which one is to use.

  2. Series number in OMP was mapped to "volume", but should be mapped to "collection-number". See here: https://docs.citationstyles.org/en/stable/specification.html#appendix-iv-variables

@asmecher
Copy link
Member

@defstat, could you review this? Thanks!

@nongenti
Copy link
Contributor Author

I fixed an other bug. If you tried to cite a chapter and there is no chapter author, then you will get an error.

Copy link
Contributor

@defstat defstat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any objections on that changes

@nongenti
Copy link
Contributor Author

Should I do anything here or was this only fallen into oblivion?


protected function mapLocale(string $locale): string
{
$locales = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes me nervous that we're inferring the region from the language code. For example, fa could be fa-IR or fa-AF (or possibly others). Hard-coding this mapping also means it'll need to be updated. Is there another way of doing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Only languge codes you can find here (https://github.com/citation-style-language/locales ) are supported by CSL. Without a mapping we get always default en_US. So we need a mapping. We could realize the mapping also in the plugin settings. Then users are free to choose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coincidentally, here is a fresh example of the difference between fa-IR and fa-AF!

@bozana and @ajnyga, do you have a recent recommendation for this from the work you've been doing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It just so happens that we're having an internal discussion about this at the moment -- more soon.)

Copy link
Member

@asmecher asmecher Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I'd suggest, e.g. to get the name of the CSL locale from an IETF language tag:

/**
 * Find the best match for a CSL locale, given the desired IETF language tag (and a fallback default).
 * @param $locale An IETF language tag.
 * @param $default A locale code to use for a default. This should already be sanitized.
 * @return string A language code that's available in the CSL library.
 */
function getCSLLocale(string $locale, string $default = 'en-US') : string
{
    $prefix = 'plugins/generic/citationStyleLanguage/lib/vendor/citation-style-language/locales/locales-';
    $suffix = '.xml';
    $preferences = [
        'es' => 'es-ES',
    ];

    // Determine the language and region we're looking for from $locale
    $localeParts = locale_parse($locale);
    $language = $localeParts['language'];
    $region = $localeParts['region'] ?? null;
    $localeAndRegion = $language . ($region ? "-{$region}" : '');

    // Get a list of available options from the filesystem.
    $availableLocaleFiles = glob("{$prefix}*{$suffix}");

    // 1. Look for an exact match and return it.
    if (in_array("{$prefix}{$locale}{$suffix}", $availableLocaleFiles)) return $locale;

    // 2. Look in the preference list for a preferred fallback.
    if ($preference = $preferences[$localeAndRegion] ?? false) return $preference;

    // 3. Find the first match by language.
    foreach ($availableLocaleFiles as $filename) {
        if (strpos($filename, "{$prefix}{$language}-") === 0) {
            return substr($filename, strlen($prefix), -strlen($suffix));
        }
    }

    // 4. Use the supplied default.
    return "{$prefix}{$default}{$suffix}";
}
  • It's safe to use user-supplied data for the $locale parameter
  • It uses a single glob() filesystem scan to find available locales, but this is acceptably fast, I think
  • You can provide $preferences inside the function (so that e.g. es-ES is preferred for es, rather than es-CA), but it's not necessary to maintain a list of all locales; providing zh will result successfully match zh-CN, for example.
  • It will handle more exotic locale variants e.g. uz@latin

I think this will be easier/better to maintain than an exhaustive list of locales, which will get forgotten. And it's about the same length :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that looks very well. What do you think, should we make the preferences configurable in the settings?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'm pretty sure that we can choose a set of preferences that will work well for 99% of users, and save ourselves the pain of making it configurable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can start with only considering those that have multiple variants in CSL. The other ones would be found by that logic Alec suggested above. So, for now, those would be:

de -> de-DE
en -> en-US
es -> es-ES
fr -> fr-FR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a note that instead using locale_parse we might need to use:

      $language = \Locale::getPrimaryLanguage($locale);
       $region = \Locale::getRegion($locale) ?? null;

And that $prefix = $this->getPluginPath() . '/lib/vendor/citation-style-language/locales/locales-';

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants