refactor(locale): sort person data #3269

Merged
merged 10 commits into from
Nov 30, 2024

Conversation

ST-DDT
Member

@ST-DDT ST-DDT commented Nov 16, 2024

Follow-up to #2265 (the actual follow-up will come after #3266)

Preparation for #3266


Sorts (and deduplicates) the entries in the person module definitions in all locales.

The first commit contains the script changes that were used to apply the changes.
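
For readers without the first commit at hand, here is a minimal sketch of what "sort and unique" could look like for a single list of entries. The helper name `sortAndUnique`, the sample data, and the comparison by code units are assumptions made for illustration; the script actually used in this PR may order and deduplicate differently.

```typescript
// Hypothetical helper, not the script from the PR's first commit.
function sortAndUnique(entries: readonly string[]): string[] {
  // A Set keeps only the first occurrence of each exact string.
  const unique = Array.from(new Set(entries));
  // Compare by UTF-16 code units so the result does not depend on the
  // runtime's locale settings (an assumption; a real script might prefer
  // localeCompare for language-aware ordering).
  return unique.sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
}

// Made-up sample data, not taken from any faker locale.
const sampleFirstNames = ['Zoe', 'Adam', 'Mia', 'Adam'];
console.log(sortAndUnique(sampleFirstNames).join(', ')); // Adam, Mia, Zoe
```

Note that nothing in this sketch shortens a list beyond dropping exact duplicates.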

@ST-DDT ST-DDT added the labels p: 1-normal (Nothing urgent); c: refactor (PR that affects the runtime behavior, but doesn't add new features or fixes bugs); c: locale (Permutes locale definitions); m: person (Something is referring to the person module) Nov 16, 2024
@ST-DDT ST-DDT added this to the vAnytime milestone Nov 16, 2024
@ST-DDT ST-DDT requested review from a team November 16, 2024 11:13
@ST-DDT ST-DDT self-assigned this Nov 16, 2024

netlify bot commented Nov 16, 2024

Deploy Preview for fakerjs ready!

Name Link
🔨 Latest commit d148920
🔍 Latest deploy log https://app.netlify.com/sites/fakerjs/deploys/674b6ffada4f34000840f6ba
😎 Deploy Preview https://deploy-preview-3269.fakerjs.dev

codecov bot commented Nov 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.96%. Comparing base (42083ac) to head (d148920).
Report is 2 commits behind head on next.

Additional details and impacted files
@@           Coverage Diff           @@
##             next    #3269   +/-   ##
=======================================
  Coverage   99.96%   99.96%           
=======================================
  Files        2806     2806           
  Lines      217140   217140           
  Branches      980      977    -3     
=======================================
+ Hits       217061   217067    +6     
+ Misses         79       73    -6     
Files with missing lines Coverage Δ
src/locales/af_ZA/person/first_name.ts 100.00% <100.00%> (ø)
src/locales/af_ZA/person/last_name.ts 100.00% <100.00%> (ø)
src/locales/ar/person/first_name.ts 100.00% <100.00%> (ø)
src/locales/ar/person/last_name.ts 100.00% <100.00%> (ø)
src/locales/ar/person/prefix.ts 100.00% <100.00%> (ø)
src/locales/az/person/first_name.ts 100.00% <100.00%> (ø)
src/locales/az/person/last_name.ts 100.00% <100.00%> (ø)
src/locales/cs_CZ/person/first_name.ts 100.00% <100.00%> (ø)
src/locales/cs_CZ/person/last_name.ts 100.00% <100.00%> (ø)
src/locales/cs_CZ/person/prefix.ts 100.00% <100.00%> (ø)
... and 113 more

... and 1 file with indirect coverage changes

@matthewmayer
Contributor

This is truncating as well as sorting.

@ST-DDT
Member Author

ST-DDT commented Nov 16, 2024

> This is truncating as well as sorting.

That was not intended. I'll revert it and redo it later.

@ST-DDT ST-DDT marked this pull request as draft November 16, 2024 14:04
@ST-DDT ST-DDT force-pushed the refactor/locale/person/sort branch from 318bac6 to 4847a6b Compare November 17, 2024 10:49
@ST-DDT ST-DDT changed the title from "refactor(locale): normalize person data" to "refactor(locale): sort person data" Nov 17, 2024
@ST-DDT ST-DDT marked this pull request as ready for review November 17, 2024 10:59
@ST-DDT
Member Author

ST-DDT commented Nov 17, 2024

Should be fixed now. The script I used to sort (and deduplicate) the entries is in the first commit.

@matthewmayer
Contributor

The only thing I'm concerned about is a theoretical situation where:

  1. There are generic definitions which are not just a merge of male and female
  2. In future we might want to attempt to split the generic definitions into male and female.
  3. The sort order of the generic definitions is useful (e.g. it's all the male names, then all the female names)
  4. And so we'd be losing useful information by sorting.

So I want to manually double check to make sure this is not the case for any locales in this PR.
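
Point 1 above could in principle be checked mechanically. The following is a rough sketch, assuming each definition can be reduced to three plain string lists; the function name `isPlainMerge` and the sample names are hypothetical and not part of faker.

```typescript
// Hypothetical helper: true if `generic` holds exactly the names found in
// `female` and `male` combined, ignoring order and duplicates.
function isPlainMerge(
  generic: readonly string[],
  female: readonly string[],
  male: readonly string[]
): boolean {
  const merged = new Set([...female, ...male]);
  const genericSet = new Set(generic);
  // Equal sizes plus "every generic name is in merged" means the sets match.
  if (genericSet.size !== merged.size) {
    return false;
  }
  return generic.every((name) => merged.has(name));
}

// Made-up sample data.
console.log(isPlainMerge(['Ann', 'Bob'], ['Ann'], ['Bob'])); // true
console.log(isPlainMerge(['Ann', 'Bob', 'Kim'], ['Ann'], ['Bob'])); // false
```

Generic lists for which such a check returns false would be the ones whose original order might still carry information worth a manual look.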

@ST-DDT
Member Author

ST-DDT commented Nov 17, 2024

> 1. There are generic definitions which are not just a merge of male and female

This PR doesn't change that.

> 2. In future we might want to attempt to split the generic definitions into male and female.

That's what the next PR will do, at least for the things that can be automated. We can still make manual adjustments later.

> 3. The sort order of the generic definitions is useful (eg it's all the male names then all the female names)

That feature will be obsolete after the above cleanup PR anyway. Also, according to our definitions update, values that can be assigned to a specific sex no longer belong in generic.

So we would only lose that info if generic is present and the specific entries are missing.
I'll check for that case.

@ST-DDT
Member Author

ST-DDT commented Nov 17, 2024

> So we would only lose that info if generic is present and the specific entries are missing.
> I'll check for that case.

I checked for generic.length > 0 && (female.length === 0 || male.length === 0).

(I should have filtered for already sorted lists as well)
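
As a sketch only, the condition above, together with the extra filter for already sorted lists, might be written as follows. The `PersonEntrySketch` shape and both function names are assumptions for illustration and are not faker's actual types.

```typescript
// Hypothetical shape of one person definition (e.g. first names).
interface PersonEntrySketch {
  generic?: readonly string[];
  female?: readonly string[];
  male?: readonly string[];
}

// True if the list is already in ascending code-unit order.
function isSorted(entries: readonly string[]): boolean {
  let previous: string | undefined;
  for (const entry of entries) {
    if (previous !== undefined && entry < previous) {
      return false;
    }
    previous = entry;
  }
  return true;
}

// Flags definitions where sorting `generic` could discard ordering info:
// generic has entries, at least one sex-specific list is empty,
// and generic is not sorted already.
function needsManualCheck(def: PersonEntrySketch): boolean {
  const generic = def.generic ?? [];
  const female = def.female ?? [];
  const male = def.male ?? [];
  const missingSexSpecific = female.length === 0 || male.length === 0;
  return generic.length > 0 && missingSexSpecific && !isSorted(generic);
}
```

A definition with only a sorted `generic` list is not flagged, because sorting it again changes nothing.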

The following locales are potentially affected: (Click to expand)

@ST-DDT
Member Author

ST-DDT commented Nov 17, 2024

All last name patterns are basically 1-2 elements and there is no gender aware part/sorting in there (yet).

@ST-DDT
Member Author

ST-DDT commented Nov 17, 2024

FirstName

En-au-ocker consists of two concatenated sets.
I will create a PR to split it into two.

Ka-Ge is a wild mix of female and male names.
They aren't sorted in any particular way, so there is no easy way to split them.
A native speaker will have to PR that later.
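
One way to spot a list made of concatenated sorted sets is to cut it wherever the order goes backwards. This is only a sketch with a hypothetical helper name and made-up data; it assumes each original set was itself sorted, which the thread does not confirm.

```typescript
// Hypothetical helper: splits a list into maximal ascending runs.
function splitIntoSortedRuns(entries: readonly string[]): string[][] {
  const runs: string[][] = [];
  let current: string[] = [];
  let previous: string | undefined;
  for (const entry of entries) {
    // A drop in order marks the start of a new run.
    if (previous !== undefined && entry < previous) {
      runs.push(current);
      current = [];
    }
    current.push(entry);
    previous = entry;
  }
  if (current.length > 0) {
    runs.push(current);
  }
  return runs;
}

// Made-up data: two sorted sets glued together give exactly two runs.
const runs = splitIntoSortedRuns(['Adam', 'Bob', 'Zed', 'Ann', 'Mia']);
console.log(runs.length); // 2
```

A randomly mixed list, like the one described for Ka-Ge, would produce many short runs instead, so this approach offers no usable split there.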

@ST-DDT
Member Author

ST-DDT commented Nov 17, 2024

The last names are not sex-aware in the relevant locales, so there is nothing for us to do there.

@ST-DDT
Member Author

ST-DDT commented Nov 17, 2024

@matthewmayer Does that solve the concerns you are having? Feel free to do your own analysis.

@ST-DDT
Member Author

ST-DDT commented Nov 17, 2024

Shinigami92
Shinigami92 previously approved these changes Nov 20, 2024
Member

@Shinigami92 Shinigami92 left a comment

"LGTM" - no bitcoin miner implementation found 🤣

@ST-DDT ST-DDT requested a review from a team November 20, 2024 17:29
Review thread on src/locales/es/person/first_name.ts (outdated, resolved)
@ST-DDT ST-DDT marked this pull request as ready for review November 27, 2024 23:12
@ST-DDT ST-DDT requested review from matthewmayer, xDivisionByZerox and a team November 27, 2024 23:12
@ST-DDT ST-DDT dismissed matthewmayer’s stale review November 30, 2024 20:04

Issue has been resolved

@ST-DDT ST-DDT enabled auto-merge November 30, 2024 20:05
@ST-DDT ST-DDT disabled auto-merge November 30, 2024 20:08
@ST-DDT ST-DDT merged commit 01e20e9 into next Nov 30, 2024
23 checks passed
@ST-DDT ST-DDT deleted the refactor/locale/person/sort branch November 30, 2024 20:08
Labels
c: locale (Permutes locale definitions); c: refactor (PR that affects the runtime behavior, but doesn't add new features or fixes bugs); m: person (Something is referring to the person module); p: 1-normal (Nothing urgent)
4 participants