Number::spell() - spell numbers as words #43

Closed
tommarshall wants to merge 1 commit


tommarshall

Firstly, thanks for the project. It's a nice idea to package this functionality up together in a lib. Definitely much nicer than the ball of utility functions I normally use.

Would you be interested in adding this Number::toWords() function to the lib?

Summary:

If you're interested in adding this function to php-humanizer would you prefer to see toWords as a function of String like BinarySuffix and MetricSuffix?

Presumably this would also benefit from localisation, although I guess that could be problematic if other languages construct large numbers differently to English? I imagine the ordinalization could also face similar challenges. Again, if you're interested in adding this function I'm happy to add base support for the localisation, if you want it.

@norberttech
Member

Hello @tommarshall! Thanks for your contribution. I was thinking about this feature :) Localisation is definitely required. We can use a mechanism similar to the one we have for datetime difference, I mean translations stored in YAML files. Maybe the name of this function could be "spellNumber"? We can then put it into String::spellNumber($number) or Number::spell($number), what do you think?

@tommarshall
Author

Hi @norzechowicz, thanks for the prompt response!

Number::spell($number) would make the most sense to me, as it feels like a similar function to ordinalize, but I'd equally be happy to have it under String::spellNumber($number) if you felt that was more in keeping with the project?

I'll add the localisation support. How do you want to name the localization files in Resources/translations? spell-number.en.yml?

Thanks,
Tom

@norberttech
Member

Number::spell($number) makes more sense to me too.
As for the resource file name, I think number.en.yml would be enough.

@tommarshall
Author

I've refactored Number::toWords() to Number::spell() and added base support for the localisation.

Unfortunately I only know the one language, so the convert() function may need some modification for the localisation to read correctly in languages that construct large numbers differently from English, but this should hopefully provide a useful base implementation.

Anything you'd like me to change?

@tommarshall changed the title from "Number::toWords() - convert number to words" to "Number::spell() - spell numbers as words" on Oct 30, 2015
/**
* @var array
*/
private $map = array(
Member

I think this can be removed

Author

Good point. Done.

Member

Ah wait, I was talking about the $map field, but I didn't notice it's used to create the translation key :/ Sorry for that.

Author

Oh right. No problem. I thought you were favouring implicit scoping for the properties. I'll revert that commit.

Member

No need for that, just read my other comments; we're going to remove $map and the other fields as well.

@norberttech
Member

@tommarshall thanks, looks better now!

@orestes @lightglitch @dagaa @Forst @sarelvdwalt @ozmodiar @mattallty @cnkt @tbreuss @IgorDePaula @omissis - could you please take a look at this PR and tell us whether the current implementation is going to handle your native language?

@tommarshall
Author

@norzechowicz no problem. Thanks for your help, much cleaner now 👍

Would you like me to squash it down into a single commit?

@norberttech
Member

@tommarshall yes please :)

 - Convert '123' to 'one hundred and twenty-three'
 - Includes support for decimals and negatives
 - Includes base support for localisation
 - Includes tests
 - Credit to Karl Rixon for the original function (http://www.karlrixon.co.uk/writing/convert-numbers-to-words-with-php/)
 - Thanks to @norzechowicz for help and advice
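
The first bullet can be illustrated with a minimal sketch. This is not the code from the PR: the function name is hypothetical, it covers only 0 to 999, and the hyphen and conjunction strings are hard-coded here where the PR reads them from a translation file. It assumes PHP 7 or later.

```php
<?php
// Hypothetical sketch (not the PR's convert() function): spells 0..999 in English.
function spellEnglishBelowThousand(int $n): string
{
    if ($n < 0 || $n > 999) {
        throw new InvalidArgumentException('Only 0..999 is supported in this sketch.');
    }

    $hyphen = '-';          // joins tens and units: "twenty-three"
    $conjunction = ' and '; // joins hundreds and the rest: "one hundred and ..."
    $small = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
        'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen',
        'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen'];
    $tens = [2 => 'twenty', 3 => 'thirty', 4 => 'forty', 5 => 'fifty',
        6 => 'sixty', 7 => 'seventy', 8 => 'eighty', 9 => 'ninety'];

    if ($n < 20) {
        return $small[$n];
    }
    if ($n < 100) {
        $word = $tens[intdiv($n, 10)];
        return $n % 10 ? $word . $hyphen . $small[$n % 10] : $word;
    }

    $word = $small[intdiv($n, 100)] . ' hundred';
    $rest = $n % 100;
    return $rest ? $word . $conjunction . spellEnglishBelowThousand($rest) : $word;
}

echo spellEnglishBelowThousand(123), PHP_EOL; // one hundred and twenty-three
```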
@Forst
Contributor

Forst commented Oct 30, 2015

I'm not sure the current version will work: number words in Russian need singular/plural forms, just as in #23. If the currently used syntax supports "form1|form2|form3", then it's fine.

@norberttech
Member

@Forst it's possible, but @tommarshall would need to pass the number as a variable into the translation key.

For example $this->translate('number.100', ["%number%" => 100]);, is this what you need? Maybe you could prepare translation file for Russian, this would help us a lot.
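
To make the "form1|form2|form3" idea concrete, here is a rough sketch of how one of three Russian plural forms could be selected for a given count. The helper name is hypothetical and is not part of php-humanizer; a translator with built-in pluralization support would normally do this selection itself. It assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: picks one of three pipe-separated Russian plural forms.
function choosePluralForm(string $forms, int $count): string
{
    $parts = explode('|', $forms);
    $n = abs($count) % 100;
    $lastDigit = $n % 10;

    if ($n >= 11 && $n <= 14) {
        $index = 2; // 11-14 always take the third form
    } elseif ($lastDigit === 1) {
        $index = 0; // 1, 21, 31, ...
    } elseif ($lastDigit >= 2 && $lastDigit <= 4) {
        $index = 1; // 2-4, 22-24, ...
    } else {
        $index = 2; // 0, 5-9, ...
    }

    // Fall back to the first form if fewer than three forms were supplied.
    return $parts[$index] ?? $parts[0];
}

echo choosePluralForm('тысяча|тысячи|тысяч', 1), PHP_EOL;   // тысяча
echo choosePluralForm('тысяча|тысячи|тысяч', 234), PHP_EOL; // тысячи
echo choosePluralForm('тысяча|тысячи|тысяч', 11), PHP_EOL;  // тысяч
```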

@Forst
Contributor

Forst commented Oct 31, 2015

Right now I'm short on time, so I didn't make a proper PR.

If anyone wants to take over the Russian translation, please go ahead. @sam002

Here's what the translation file should look like with the current spelling code:

hyphen: " "
conjunction: " "
separator: ""
negative: "минус"
# decimal is a workaround for the current spelling code
decimal: "запятая"

number:
  0:                    "ноль"
  1:                    "один"
  2:                    "два"
  3:                    "три"
  4:                    "четыре"
  5:                    "пять"
  6:                    "шесть"
  7:                    "семь"
  8:                    "восемь"
  9:                    "девять"
  10:                   "десять"
  11:                   "одиннадцать"
  12:                   "двенадцать"
  13:                   "тринадцать"
  14:                   "четырнадцать"
  15:                   "пятнадцать"
  16:                   "шестнадцать"
  17:                   "семнадцать"
  18:                   "восемнадцать"
  19:                   "девятнадцать"
  20:                   "двадцать"
  30:                   "тридцать"
  40:                   "сорок"
  50:                   "пятьдесят"
  60:                   "шестьдесят"
  70:                   "семьдесят"
  80:                   "восемьдесят"
  90:                   "девяносто"
  100:                  "сто"
  200:                  "двести"
  300:                  "триста"
  400:                  "четыреста"
  500:                  "пятьсот"
  600:                  "шестьсот"
  700:                  "семьсот"
  800:                  "восемьсот"
  900:                  "девятьсот"
  1000:                 "тысяча|тысячи|тысяч"
  1000000:              "миллион|миллиона|миллионов"
  1000000000:           "миллиард|миллиарда|миллиардов"
  1000000000000:        "триллион|триллиона|триллионов"
  1000000000000000:     "квадриллион|квадриллиона|квадриллионов"
  1000000000000000000:  "квинтиллион|квинтиллиона|квинтиллионов"

With this, the number -1234567.89 should turn into минус один миллион двести тридцать четыре тысячи пятьсот шестьдесят семь запятая восемьдесят девять.

Note I had to add 200, 300, …, 900, since those do not obey any particular rules.
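
A small sketch of why the extra entries matter: with 200 to 900 in the map, any value below a thousand can be spelled with plain lookups, with no language-specific composition rule for hundreds. The function name is hypothetical and the map is copied from the "number" section above; this is not the PR's code. It assumes PHP 7 or later.

```php
<?php
// Word map copied from the "number" section of the translation file above.
$words = [
    1 => 'один', 2 => 'два', 3 => 'три', 4 => 'четыре', 5 => 'пять',
    6 => 'шесть', 7 => 'семь', 8 => 'восемь', 9 => 'девять', 10 => 'десять',
    11 => 'одиннадцать', 12 => 'двенадцать', 13 => 'тринадцать',
    14 => 'четырнадцать', 15 => 'пятнадцать', 16 => 'шестнадцать',
    17 => 'семнадцать', 18 => 'восемнадцать', 19 => 'девятнадцать',
    20 => 'двадцать', 30 => 'тридцать', 40 => 'сорок', 50 => 'пятьдесят',
    60 => 'шестьдесят', 70 => 'семьдесят', 80 => 'восемьдесят',
    90 => 'девяносто', 100 => 'сто', 200 => 'двести', 300 => 'триста',
    400 => 'четыреста', 500 => 'пятьсот', 600 => 'шестьсот',
    700 => 'семьсот', 800 => 'восемьсот', 900 => 'девятьсот',
];

// Hypothetical helper: spells 1..999 using lookups only.
// Returns an empty string for 0, which the caller has to handle.
function spellBelowThousand(int $n, array $words): string
{
    $parts = [];

    $hundreds = intdiv($n, 100) * 100;
    if ($hundreds > 0) {
        $parts[] = $words[$hundreds]; // direct lookup: 200 => "двести"
    }

    $rest = $n % 100;
    if ($rest > 0 && $rest <= 20) {
        $parts[] = $words[$rest];
    } elseif ($rest > 20) {
        $parts[] = $words[intdiv($rest, 10) * 10];
        if ($rest % 10 > 0) {
            $parts[] = $words[$rest % 10];
        }
    }

    return implode(' ', $parts);
}

echo spellBelowThousand(234, $words), PHP_EOL; // двести тридцать четыре
echo spellBelowThousand(567, $words), PHP_EOL; // пятьсот шестьдесят семь
```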

Also, the decimal spelling with the current code is not the way it is usually done in Russian. The normal way of pronouncing the number -1234567.89 would be минус один миллион двести тридцать четыре тысячи пятьсот шестьдесят семь целых восемьдесят девять сотых, where целых stands for whole and сотых for hundredth (89/100).

For the proper support of the syntax above, the following should be added in the localization file:

number:
  # The following MUST be used when the number is not an integer, for both whole and decimal parts:
  1_decimal:                    "одна"
  2_decimal:                    "две"
  # the rest are the same as in 'number'

number_decimal:
  # see below for how to use decimals below 0.001
  1:                    "целая|целых|целых"
  0.1:                  "десятая|десятых|десятых"
  0.01:                 "сотая|сотых|сотых"
  0.001:                "тысячная|тысячных|тысячных"
  0.000001:             "миллионная|миллионных|миллионных"
  0.000000001:          "миллиардная|миллиардных|миллиардных"
  0.000000000001:       "триллионная|триллионных|триллионных"
  0.000000000000001:    "квадриллионная|квадриллионных|квадриллионных"
  0.000000000000000001: "квинтиллионная|квинтиллионных|квинтиллионных"

  decimal_ten:          "десяти"
  decimal_hundred:      "сто"

  # 10^-3 becomes одна тысячная (%count% + 0.001)
  # 10^-4 becomes одна десятитысячная (%count% + decimal_ten + 0.001)
  # 10^-5 becomes одна стотысячная (%count% + decimal_hundred + 0.001)
  # 10^-6 becomes одна миллионная (%count% + 0.000001)
  # 10^-7 becomes одна десятимиллионная (%count% + decimal_ten + 0.000001)
  # 10^-8 becomes одна стомиллионная (%count% + decimal_hundred + 0.000001)

As an example, -1.23456789 becomes минус одна целая двадцать три миллиона четыреста пятьдесят шесть тысяч семьсот восемьдесят девять стомиллионных, but -1 is минус один, since it's an integer.

All these rules are pretty complicated, hope I made them at least somewhat clear.
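
The prefix rule in the comments above (decimal_ten and decimal_hundred combined with the 0.001, 0.000001, ... entries) could be sketched as follows. The function name and the stem tables are assumptions for illustration, derived from the words in the proposed file; the sketch returns the three plural forms for a given number of fractional digits. It assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: builds "form1|form2|form3" for the denominator word
// of a decimal fraction with the given number of fractional digits (1..18).
function decimalDenominatorForms(int $digits): string
{
    if ($digits < 1 || $digits > 18) {
        throw new InvalidArgumentException('Only 1..18 fractional digits are supported.');
    }

    $small = [1 => 'десят', 2 => 'сот'];       // 0.1 and 0.01
    $bases = [1 => 'тысячн', 2 => 'миллионн', 3 => 'миллиардн',
        4 => 'триллионн', 5 => 'квадриллионн', 6 => 'квинтиллионн'];
    $prefixes = ['', 'десяти', 'сто'];         // none, decimal_ten, decimal_hundred

    if ($digits <= 2) {
        $stem = $small[$digits];
    } else {
        // e.g. 8 digits: prefix index 8 % 3 = 2 ("сто"), base index 2 ("миллионн")
        $stem = $prefixes[$digits % 3] . $bases[intdiv($digits, 3)];
    }

    return $stem . 'ая|' . $stem . 'ых|' . $stem . 'ых';
}

echo decimalDenominatorForms(2), PHP_EOL; // сотая|сотых|сотых
echo decimalDenominatorForms(4), PHP_EOL; // десятитысячная|десятитысячных|десятитысячных
echo decimalDenominatorForms(8), PHP_EOL; // стомиллионная|стомиллионных|стомиллионных
```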

@lightglitch
Contributor

In Portuguese you are going to have the same issue as in Russian: 200, 300, …, 900 also need to be added. For the other cases I need to do more testing to confirm things.

@orestes

orestes commented Oct 31, 2015

All these variations seem to always affect the structure of the translation file. I think we could come up with a base translation strategy that works for Latin alphabets/rules/languages and uses that initial set of strings in the translation files. For other alphabets, I think implementing a specific class, using a common base class, would be a better solution. For example, having a HumanizerTranslator class, and extending it in a RussianHumanizerTranslator. This latter class could handle the corner cases for that language. Otherwise the translation file format is going to keep changing and affecting the rest of the languages.
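
A very rough sketch of that structure, using the class names from the comment. The method name, the word tables, and the choice of corner case (feminine forms of 1 and 2 before "тысяча") are assumptions for illustration only, not an agreed design. It assumes PHP 7 or later.

```php
<?php
// Hypothetical base class: default behaviour is a plain lookup.
class HumanizerTranslator
{
    /** @var array<int, string> */
    protected $units = [1 => 'one', 2 => 'two', 3 => 'three'];

    // Word for a small count that precedes a scale word (thousand, million, ...).
    public function unitBeforeScale(int $count, int $scale): string
    {
        return $this->units[$count];
    }
}

// Hypothetical language-specific subclass: overrides only the corner case.
class RussianHumanizerTranslator extends HumanizerTranslator
{
    protected $units = [1 => 'один', 2 => 'два', 3 => 'три'];

    public function unitBeforeScale(int $count, int $scale): string
    {
        // "тысяча" is feminine, so 1 and 2 take feminine forms before it.
        $feminine = [1 => 'одна', 2 => 'две'];
        if ($scale === 1000 && isset($feminine[$count])) {
            return $feminine[$count];
        }

        return parent::unitBeforeScale($count, $scale);
    }
}

$ru = new RussianHumanizerTranslator();
echo $ru->unitBeforeScale(2, 1000), PHP_EOL;    // две
echo $ru->unitBeforeScale(2, 1000000), PHP_EOL; // два
```

The translation file format stays the same for every language; only the subclass knows about the exception.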

@mostertb
Contributor

mostertb commented Nov 8, 2015

Hi All

Sorry that I'm late to the party here. I have an idea that might be able to achieve @tommarshall 's feature without too much complexity:

The ICU 56.1 library (http://www.icu-project.org/apiref/icu4c/classRuleBasedNumberFormat.html#details) already specifies a method for spelling out numbers using rulesets.

This is implemented, with localisation, in PHP in the NumberFormatter class (http://php.net/manual/en/class.numberformatter.php#intl.numberformatter-constants.unumberformatstyle). It can be used by specifying the NumberFormatter::SPELLOUT style constant.

English Example:

$formatter = new \NumberFormatter('en', \NumberFormatter::SPELLOUT);
echo $formatter->format(-1234567.89).PHP_EOL;
// minus one million two hundred thirty-four thousand five hundred sixty-seven point eight nine

Russian Example:

$formatter = new \NumberFormatter('ru', \NumberFormatter::SPELLOUT);
echo $formatter->format(-1234567.89).PHP_EOL;
// минус один миллион двести тридцать четыре тысячи пятьсот шестьдесят семь запятая восемь девять

This output differs from @Forst 's example of:
минус один миллион двести тридцать четыре тысячи пятьсот шестьдесят семь запятая восемьдесят девять
I don't speak Russian, but this may be because a different DEFAULT_RULESET needs to be specified: http://stackoverflow.com/questions/24282324/numberformatterspellout-spellout-ordinal-in-russian-and-italian
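
For anyone who wants to experiment with rulesets, a minimal sketch: it lists the rulesets the formatter exposes for a locale and, if one particular ruleset is present, selects it before formatting. The ruleset name used here is an assumption; which names exist depends on the locale and the installed ICU version, so the code checks the list first. It requires the intl extension.

```php
<?php
$rulesets = [];

if (class_exists('NumberFormatter')) {
    $formatter = new \NumberFormatter('ru', \NumberFormatter::SPELLOUT);

    // PUBLIC_RULESETS returns the available ruleset names as one
    // semicolon-separated string.
    $list = $formatter->getTextAttribute(\NumberFormatter::PUBLIC_RULESETS);
    $rulesets = array_filter(explode(';', (string) $list));

    foreach ($rulesets as $name) {
        echo $name, PHP_EOL;
    }

    // Assumed ruleset name; only used if the formatter actually lists it.
    $wanted = '%spellout-cardinal-feminine';
    if (in_array($wanted, $rulesets, true)) {
        $formatter->setTextAttribute(\NumberFormatter::DEFAULT_RULESET, $wanted);
        echo $formatter->format(2), PHP_EOL;
    }
} else {
    echo 'The intl extension is not available.', PHP_EOL;
}
```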

For completeness @orestes here is a Portuguese example

$formatter = new \NumberFormatter('pt', \NumberFormatter::SPELLOUT);
echo $formatter->format(-1234567.89).PHP_EOL;
// menos um milhão e duzentos e trinta e quatro mil e quinhentos e sessenta e sete vírgula oito nove

Lastly, I stumbled on this PEAR package that might help as a good reference:
http://pear.php.net/package/Numbers_Words

I hope this helps

@Forst
Contributor

Forst commented Nov 11, 2015

@mostertb The Russian example you gave is the simplest of all to produce in code, yet it sounds the least natural.

@tomasfejfar

JFYI \NumberFormatter::SPELLOUT does not work properly for Czech after the comma:

actual:   minus jeden milión dvě stě třicet čtyři tisíc pět set šedesát sedm čárka osm devět
expected: minus jeden milión dvě stě třicet čtyři tisíc pět set šedesát sedm celých osmdesát devět
                                                                             ^-------------------^
