Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(person): split prefix into gendered versions #1665

Merged
merged 14 commits into from
Jan 9, 2023

Conversation

matthewmayer
Copy link
Contributor

@matthewmayer matthewmayer commented Dec 13, 2022

Split from #1637

Because the default en locale does not have person/male_prefix.ts and person/female_prefix.ts files, when using faker.person.fullName() you sometimes get a mismatch between the apparent sex of the first name and the title. For example:

Mr. Cathy Klein or Mrs. Nicholas Lemke

This PR adds male_prefix.ts and female_prefix.ts to all locales that currently have a prefix.ts (I used online sources for languages i was not familiar with to check if each prefix was male or female or generic like "Dr"`)

It also clarifies the documentation to make it clear that some locales dont have prefixes and so faker.person.prefix() could return undefined.

I wrote a small script to spit out a Markdown table to be able to more easily review the prefixes for all locales.

const {faker} =require("@faker-js/faker");
let locales = Object.keys(faker.locales)
const renderPrefix = prefix => {
    if (prefix===undefined) {
       return "No file"
    } else if (prefix.length===0) {
        return "Empty array"
    } else {
        return prefix.join(", ")
    }
}
console.log(`| locale | prefix      | male_prefix | female_prefix |
| --- | --- | --- | --- |`)

for (let locale of locales) {
    let prefix = faker.locales[locale].person?.prefix
    let male_prefix = faker.locales[locale].person?.male_prefix
    let female_prefix = faker.locales[locale].person?.female_prefix
    console.log(`| ${locale} | ${renderPrefix(prefix)} | ${renderPrefix(male_prefix)} | ${renderPrefix(female_prefix)} |`)
}

as of 71bf044

locale prefix male_prefix female_prefix
af_ZA No file No file No file
ar سيد, سيدة, آنسة, دكتور, بروفيسور سيد, دكتور, بروفيسور سيدة, آنسة, دكتور, بروفيسور
az Empty array Empty array Empty array
cz Ing., Mgr., JUDr., MUDr. Ing., Mgr., JUDr., MUDr. Ing., Mgr., JUDr., MUDr.
de Herr, Frau, Dr., Prof. Dr. Herr, Dr., Prof. Dr. Frau, Dr., Prof. Dr.
de_AT Herr, Frau, Dr., Prof. Dr. Herr, Dr., Prof. Dr. Frau, Dr., Prof. Dr.
de_CH Herr, Frau, Dr., Prof. Dr. Herr, Dr., Prof. Dr. Frau, Dr., Prof. Dr.
dv އަމީރު, އަމީރާ, ބަނޑޭރި, ބޮޑު އަމީރު, އަމީރާ, ބަނޑޭރި, ބޮޑު އަމީރު, އަމީރާ, ބަނޑޭރި, ބޮޑު
el Κυρία, Κύριος, Δόκτορ Κύριος, Δόκτορ Κυρία, Δόκτορ
en Mr., Mrs., Ms., Miss, Dr. Mr., Dr. Mrs., Ms., Miss, Dr.
en_AU No file No file No file
en_AU_ocker No file No file No file
en_BORK No file No file No file
en_CA No file No file No file
en_GB No file No file No file
en_GH No file No file No file
en_IE No file No file No file
en_IN No file No file No file
en_NG No file No file No file
en_US No file No file No file
en_ZA No file No file No file
es Sr., Sra., Sta. Sr. Sra., Sta.
es_MX Sr., Sra., Sta. Sr. Sra., Sta.
fa آقای, خانم, دکتر آقای, دکتر خانم, دکتر
fi No file No file No file
fr M, Mme, Mlle, Dr, Prof M, Dr, Prof Mme, Mlle, Dr, Prof
fr_BE M, Mme, Mlle, Dr, Prof M, Dr, Prof Mme, Mlle, Dr, Prof
fr_CA No file No file No file
fr_CH M, Mme, Mlle, Dr, Prof M, Dr, Prof Mme, Mlle, Dr, Prof
ge ბ-ნი, ბატონი, ქ-ნი, ქალბატონი ბ-ნი, ბატონი ქ-ნი, ქალბატონი
he מר, גברת, עו"ד, 'פרופ, ד"ר מר, עו"ד, 'פרופ, ד"ר גברת, עו"ד, 'פרופ, ד"ר
hr g., gđa., gđa, dr. g., dr. gđa., gđa, dr.
hu Dr., Prof. Dr., Prof. Dr., Prof.
hy No file No file No file
id_ID Empty array Empty array Empty array
it Sig., Dott., Dr., Ing. Sig., Dott., Dr., Ing. Sig., Dott., Dr., Ing.
ja No file No file No file
ko No file No file No file
lv Prof., Dr., Biedrs Prof., Dr., Biedrs Prof., Dr., Biedrs
mk г-дин, д-р, м-р, г-ѓа, г-ца г-дин, д-р, м-р г-ѓа, г-ца, д-р, м-р
nb_NO Dr., Prof. Dr., Prof. Dr., Prof.
ne No file No file No file
nl Dhr., Mevr. Dr., Bsc, Msc, Prof. Dhr., Bsc, Msc, Prof. Dhr., Mevr. Dr., Bsc, Msc, Prof.
nl_BE Dr., Ir., Ing., Prof. Dr., Ir., Ing., Prof. Dr., Ir., Ing., Prof.
pl Pan, Pani Pan Pani
pt_BR Sr., Sra., Srta., Dr., Dra. Sr., Dr. Sra., Srta., Dra.
pt_PT Sr., Dr., Prof., Eng.º, Sra., Dra., Prof.ª, Eng.ª Sr., Dr., Prof., Eng.º Sra., Dra., Prof.ª, Eng.ª
ro Dl, Dna, Dra Dl, Dra Dna, Dra
ru Empty array Empty array Empty array
sk Ing., Mgr., JUDr., MUDr. Ing., Mgr., JUDr., MUDr. Ing., Mgr., JUDr., MUDr.
sv Dr., Prof., PhD. Dr., Prof., PhD. Dr., Prof., PhD.
tr Bay, Bayan, Dr., Prof. Dr. Bay, Dr., Prof. Dr. Bayan, Dr., Prof. Dr.
uk Пан, Пані Пан Пані
ur محترم., محترمہ., ڈاکٹر محترم., ڈاکٹر محترمہ., ڈاکٹر
vi No file No file No file
zh_CN Empty array Empty array Empty array
zh_TW Empty array Empty array Empty array
zu_ZA No file No file No file

@matthewmayer matthewmayer requested a review from a team as a code owner December 13, 2022 04:43
@codecov
Copy link

codecov bot commented Dec 13, 2022

Codecov Report

Merging #1665 (3cf96c7) into next (35f3869) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files
@@           Coverage Diff            @@
##             next    #1665    +/-   ##
========================================
  Coverage   99.63%   99.63%            
========================================
  Files        2262     2336    +74     
  Lines      240707   241081   +374     
  Branches     1095     1095            
========================================
+ Hits       239825   240199   +374     
  Misses        861      861            
  Partials       21       21            
Impacted Files Coverage Δ
src/locales/ar/person/female_prefix.ts 100.00% <100.00%> (ø)
src/locales/ar/person/index.ts 100.00% <100.00%> (ø)
src/locales/ar/person/male_prefix.ts 100.00% <100.00%> (ø)
src/locales/ar/person/prefix.ts 100.00% <100.00%> (ø)
src/locales/az/person/female_prefix.ts 100.00% <100.00%> (ø)
src/locales/az/person/index.ts 100.00% <100.00%> (ø)
src/locales/az/person/male_prefix.ts 100.00% <100.00%> (ø)
src/locales/az/person/prefix.ts 100.00% <100.00%> (ø)
src/locales/cz/person/female_prefix.ts 100.00% <100.00%> (ø)
src/locales/cz/person/index.ts 100.00% <100.00%> (ø)
... and 136 more

@ST-DDT ST-DDT added p: 1-normal Nothing urgent s: accepted Accepted feature / Confirmed bug c: locale Permutes locale definitions m: person Something is referring to the person module labels Dec 13, 2022
Copy link
Member

@ST-DDT ST-DDT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a source for all of these, so I can verify them?

src/locales/az/person/female_prefix.ts Show resolved Hide resolved
src/locales/de_AT/person/female_prefix.ts Outdated Show resolved Hide resolved
src/locales/de/person/female_prefix.ts Outdated Show resolved Hide resolved
src/locales/de/person/male_prefix.ts Outdated Show resolved Hide resolved
src/locales/de_CH/person/female_prefix.ts Outdated Show resolved Hide resolved
src/modules/person/index.ts Outdated Show resolved Hide resolved
@ST-DDT
Copy link
Member

ST-DDT commented Dec 13, 2022

These PRs are quite huge, and some reviewers are tired of reviewing these large PRs.
Could you please create a PR per locale in the future?
(per locale (e.g. en) + per category (e.g. person))
(Not applicable to PRs that only change the locale structure and only move files around)

@matthewmayer
Copy link
Contributor Author

Is reviewing 30 small PRs any less tiresome than one big PR though?

@Shinigami92
Copy link
Member

Shinigami92 commented Dec 13, 2022

Is reviewing 30 small PRs any less tiresome than one big PR though?

I think we somehow need to find a way where PRs are not getting larger than say max 50 files 🤔
30 small PRs might also be a bit extreme, but at the end they might be accepted much simpler


Edit: I'm a heavy GitHub WebUI reviewer and do not use tools like VSCode
and when the diff tab already lags my browser and heavily pressures my browser, I'm not enjoying reviewing it

@ST-DDT
Copy link
Member

ST-DDT commented Dec 13, 2022

We could "group" them by locale groups e.g. de + de_AT + de_CH + ...

@Shinigami92
Copy link
Member

We could "group" them by locale groups e.g. de + de_AT + de_CH + ...

This could make sense as these would also attract more locale aware persons that e.g. in this example can target german locale reviews

@matthewmayer
Copy link
Contributor Author

We could "group" them by locale groups e.g. de + de_AT + de_CH + ...

This could make sense as these would also attract more locale aware persons that e.g. in this example can target german locale reviews

This PR still touches 29 different locales, even if you group.

I feel for locale updates, it's not really possible to hand-review every change line-by-line.
Better to improve the testing around different locales to check for any weird regressions (e.g. adding more snapshot tests in other locales).

@ST-DDT ST-DDT changed the title chore(person): add female_prefix and male_prefix in more locales feat(person): add gendered prefix versions based on existing prefix.ts Dec 13, 2022
@ST-DDT
Copy link
Member

ST-DDT commented Dec 13, 2022

I feel for locale updates, it's not really possible to hand-review every change line-by-line.

I think we should at least do a best effort job to review them (e.g. prevent curse words).
If there is a way to automatically review the changes, then the method to do so should be described in the description.
(e.g. I reviewed this PR by checking that male_prefix ⊆ prefix and female_prefix ⊆ prefix and the 3 changes to prefix are plausible)

Automated Review Script
import { faker } from "@faker-js/faker";

Array.prototype.equals = function (array) {
  // if the other array is a falsy value, return
  if (!array) return false;
  // if the argument is the same array, we can be sure the contents are same as well
  if (array === this) return true;
  // compare lengths - can save a lot of time
  if (this.length != array.length) return false;

  for (var i = 0, l = this.length; i < l; i++) {
    // Check if we have nested arrays
    if (this[i] instanceof Array && array[i] instanceof Array) {
      // recurse into the nested arrays
      if (!this[i].equals(array[i])) return false;
    } else if (this[i] != array[i]) {
      // Warning - two different object instances will never be equal: {x:20} != {x:20}
      return false;
    }
  }
  return true;
};
// Hide method from for-in loops
Object.defineProperty(Array.prototype, "equals", { enumerable: false });

Object.entries(faker.locales).forEach(([locale, localeData]) => {
  const {
    prefix = [],
    male_prefix = [],
    female_prefix = [],
  } = localeData.person ?? {};
  console.log("------------------");

  if (male_prefix) {
    const gendered = Array.from(
      new Set([...male_prefix, ...female_prefix])
    ).sort();
    const ungendered = Array.from(new Set(prefix)).sort();
    if (gendered.equals(ungendered)) {
      console.log(locale, "Matches");
    } else {
      console.log(locale, "Does not match", gendered, ungendered);
    }
  }
});

Could you please also "fix" pt_PT's and mk's prefix.ts?

@matthewmayer
Copy link
Contributor Author

I added a script at the top of the PR which spits out all current prefixes into a Markdown table to make it easier to review.

@matthewmayer
Copy link
Contributor Author

Sources for the fact that certain locales do not use prefix titles:
ru - https://culturalatlas.sbs.com.au/russian-culture/russian-culture-naming
zh_* - https://en.wikipedia.org/wiki/Chinese_titles - only used as suffixes and not with the full name
az - https://www.azer.com/aiweb/categories/topics/azeri/az_learn/az_socio/az_socio/az_socio_72/72_socio_az.html - only suffixes
id_ID - https://en.wikipedia.org/wiki/Indonesian_names#General - bit unclear, i think the correct honorific depends on the relationship between speakers, and so you wouldnt normally include it in say a name on a website.

@ST-DDT
Copy link
Member

ST-DDT commented Dec 14, 2022

Thanks for the clarification.

ST-DDT
ST-DDT previously approved these changes Dec 14, 2022
Copy link
Member

@Shinigami92 Shinigami92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, would like to have these questions addressed
after that => looking good to me 👍 the majority of this PR is a benefit

@matthewmayer
Copy link
Contributor Author

To confirm, do you want all prefix.ts files to be changed to use the new mergeArrays function?

@Shinigami92 Shinigami92 added the needs rebase There is a merge conflict label Dec 23, 2022
Copy link
Member

@ST-DDT ST-DDT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please switch

 mergeArrays(male_prefix, female_prefix)

to

 mergeArrays(female_prefix, male_prefix)

to use alphabetical order for the merge?

@ST-DDT
Copy link
Member

ST-DDT commented Jan 2, 2023

There seems to be merge conflcts. Can you please address them?
I assume pnpm run generate:locales will do the trick.

ST-DDT
ST-DDT previously approved these changes Jan 2, 2023
Copy link
Member

@ST-DDT ST-DDT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution.

@ST-DDT ST-DDT removed the needs rebase There is a merge conflict label Jan 2, 2023
@ST-DDT ST-DDT requested review from Shinigami92 and a team January 2, 2023 13:27
Shinigami92
Shinigami92 previously approved these changes Jan 2, 2023
Copy link
Member

@Shinigami92 Shinigami92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wont like to hold this back, so I will approve for now

We might need a bug issue for #1665 (comment) and handle it somehow
Maybe in a general way

@ST-DDT
Copy link
Member

ST-DDT commented Jan 2, 2023

We might need a bug issue for #1665 (comment) and handle it somehow Maybe in a general way

Done: #893 (comment)

@ST-DDT
Copy link
Member

ST-DDT commented Jan 5, 2023

Looks like a merge conflict crept in. Could you fix it please?

@ST-DDT ST-DDT added the needs rebase There is a merge conflict label Jan 5, 2023
@matthewmayer matthewmayer dismissed stale reviews from Shinigami92 and ST-DDT via d86a64f January 6, 2023 13:19
@ST-DDT ST-DDT removed the needs rebase There is a merge conflict label Jan 6, 2023
@ST-DDT ST-DDT requested review from Shinigami92 and a team January 6, 2023 13:57
Copy link
Member

@xDivisionByZerox xDivisionByZerox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I reviewed for plausibility rather than checking every value on its own. For now, this looks good. If a person has more knowledge of a specific language, they can add/modify/revert in a later PR.

@ST-DDT ST-DDT changed the title feat(person): add gendered prefix versions based on existing prefix.ts feat(person): split prefix into gendered versions Jan 9, 2023
@ST-DDT ST-DDT merged commit 27dff93 into faker-js:next Jan 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: locale Permutes locale definitions m: person Something is referring to the person module p: 1-normal Nothing urgent s: accepted Accepted feature / Confirmed bug
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants