
refactor(locale): filter and cleanup PersonEntryDefintions data #3266

Draft: wants to merge 15 commits into refactor/person/sex


@ST-DDT (Member) commented Nov 15, 2024

Second part of #3058

Extension of #3259


This PR cleans up the PersonEntryDefinitions locale data as follows:

  1. Generic values that also exist in exactly one of female or male are removed from generic. This fixes the cases where generic = merge(female, male).
  2. Female values that exist in generic are removed from female.
  3. Female values that also exist in male are added to generic and removed from female.
  4. Male values that exist in generic are removed from male.
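The four steps above can be sketched roughly as follows. This is an illustration only, not the PR's actual script: `EntryDefinition` and `cleanupEntryDefinition` are hypothetical names, entries are assumed to be plain strings compared by exact equality, and the input order is preserved.

```typescript
// Simplified, hypothetical shape of a person entry definition.
interface EntryDefinition {
  generic: string[];
  female: string[];
  male: string[];
}

function cleanupEntryDefinition(def: EntryDefinition): EntryDefinition {
  const femaleSet = new Set(def.female);
  const maleSet = new Set(def.male);

  // 1. Drop generic values found in exactly one of female/male;
  //    this undoes generic = merge(female, male).
  const generic = def.generic.filter((v) => femaleSet.has(v) === maleSet.has(v));
  const genericSet = new Set(generic);

  // 2. Drop female values that are already generic.
  let female = def.female.filter((v) => !genericSet.has(v));

  // 3. Values that are both female and male become generic.
  for (const v of female) {
    if (maleSet.has(v) && !genericSet.has(v)) {
      genericSet.add(v);
      generic.push(v);
    }
  }
  female = female.filter((v) => !maleSet.has(v));

  // 4. Drop male values that are (now) generic.
  const male = def.male.filter((v) => !genericSet.has(v));

  return { generic, female, male };
}

const result = cleanupEntryDefinition({
  generic: ['Anna', 'Ben', 'Alex', 'Sam'],
  female: ['Anna', 'Alex', 'Nikola'],
  male: ['Ben', 'Alex', 'Nikola'],
});
// generic: Alex, Sam, Nikola; female: Anna; male: Ben
console.log(result);
```

Note that the order matters: step 4 runs after step 3, so a value listed as both female and male ends up only in generic.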

I haven't run the script yet, because the resulting diff is large since the person data is not sorted.

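Summary (changes only; cells show entry counts as before ➡ after, a single number means unchanged, and ❌ means all entries were removed)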
locale entry female generic male
af_ZA first_name 107 219 ❌ 113
ar first_name 10 327 ❌ 331
ar prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
az first_name 73 108 ❌ 35
az last_name 10 20 ❌ 10
cs_CZ first_name 785 ➡ 783 1578 ➡ 2 795 ➡ 793
cs_CZ last_name 991 ➡ 980 1979 ➡ 11 999 ➡ 988
cs_CZ prefix 4 ❌ 4 4 ❌
da first_name 109 227 ❌ 118
da middle_name 30 ❌ 30 30 ❌
da prefix 1 2 ❌ 1
de first_name 583 ➡ 573 1145 ➡ 10 572 ➡ 562
de prefix 3 ➡ 1 4 ➡ 2 3 ➡ 1
de_AT first_name 573 1145 ❌ 572
de_AT prefix 3 ➡ 1 4 ➡ 2 3 ➡ 1
de_CH first_name 138 ➡ 137 316 ➡ 1 179 ➡ 178
de_CH prefix 3 ➡ 1 4 ➡ 2 3 ➡ 1
dv first_name 49 63 ❌ 14
dv last_name 248 ➡ 243 355 ➡ 5 112 ➡ 107
dv prefix 4 ❌ 4 4 ❌
el first_name 19 55 ❌ 36
el prefix 2 ➡ 1 3 ➡ 1 2 ➡ 1
en first_name 500 ➡ 473 3005 ➡ 2240 500 ➡ 473
en middle_name 210 ➡ 207 62 ➡ 40 98 ➡ 95
en prefix 4 ➡ 3 5 ➡ 1 2 ➡ 1
en_AU first_name 100 200 ❌ 100
en_GH first_name 132 ➡ 131 261 ➡ 1 130 ➡ 129
en_IN first_name 288 742 ❌ 454
en_NG first_name 31 98 ❌ 67
en_ZA first_name 291 ➡ 288 546 ➡ 11 250 ➡ 247
eo first_name 90 ➡ 89 179 ➡ 1 90 ➡ 89
eo prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
es prefix 2 3 ❌ 1
es_MX first_name 161 300 ❌ 139
es_MX prefix 2 3 ❌ 1
fa first_name 67 715 ➡ 639 73
fa prefix 2 ➡ 1 3 ➡ 1 2 ➡ 1
fi first_name 50 100 ❌ 50
fr first_name 451 ➡ 435 931 ➡ 16 496 ➡ 480
fr prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
fr_BE first_name 1338 ➡ 1293 2591 ➡ 45 1299 ➡ 1254
fr_BE prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
fr_CH first_name 451 ➡ 447 898 ➡ 4 451 ➡ 447
fr_CH prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
fr_SN first_name 79 ➡ 78 181 ➡ 1 103 ➡ 102
he first_name 336 ➡ 218 541 ➡ 118 323 ➡ 205
he prefix 4 ➡ 1 5 ➡ 3 4 ➡ 1
hr first_name 238 ➡ 234 405 ➡ 4 171 ➡ 167
hr prefix 3 ➡ 2 4 ➡ 1 2 ➡ 1
hu first_name 100 200 ❌ 100
hu prefix 2 ❌ 2 2 ❌
hy first_name 46 91 ❌ 45
id_ID first_name 263 ➡ 259 752 ➡ 4 493 ➡ 489
id_ID last_name 109 ➡ 108 257 ➡ 1 149 ➡ 148
it first_name 617 1700 ❌ 1083
it prefix 4 ❌ 4 4 ❌
ja first_name 145 ➡ 144 279 ➡ 1 135 ➡ 134
ka_GE prefix 2 4 ❌ 2
ko first_name 300 ➡ 239 540 ➡ 61 301 ➡ 240
lv first_name 105 196 ❌ 91
lv last_name 207 ➡ 195 401 ➡ 12 206 ➡ 194
lv prefix 3 ❌ 3 3 ❌
mk first_name 232 515 ❌ 283
mk last_name 495 ➡ 458 951 ➡ 37 493 ➡ 456
mk prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
nb_NO first_name 50 100 ❌ 50
nb_NO prefix 2 ❌ 2 2 ❌
ne first_name 18 55 ❌ 37
nl first_name 514 ➡ 499 49 ➡ 15 587 ➡ 572
nl prefix 7 ➡ 1 8 ➡ 6 7 ➡ 1
nl_BE first_name 99 199 ❌ 100
nl_BE prefix 4 ❌ 4 4 ❌
pl first_name 163 393 ❌ 230
pl prefix 1 2 ❌ 1
pt_BR first_name 81 169 ❌ 88
pt_BR prefix 3 5 ❌ 2
pt_PT first_name 93 188 ❌ 95
pt_PT prefix 8 16 ❌ 8
ro first_name 387 674 ❌ 287
ro prefix 2 ➡ 1 3 ➡ 1 2 ➡ 1
ro_MD first_name 256 ➡ 245 460 ➡ 11 215 ➡ 204
ro_MD prefix 2 ➡ 1 3 ➡ 1 2 ➡ 1
ru first_name 80 401 ❌ 321
ru last_name 250 500 ❌ 250
sk first_name 200 ➡ 199 391 ➡ 1 192 ➡ 191
sk last_name 251 508 ❌ 257
sk prefix 4 ❌ 4 4 ❌
sr_RS_latin first_name 200 400 ❌ 200
sv first_name 100 200 ❌ 100
sv prefix 3 ❌ 3 3 ❌
th first_name 687 ➡ 681 1159 ➡ 6 478 ➡ 472
th prefix 3 ➡ 1 4 ➡ 2 3 ➡ 1
tr first_name 404 ➡ 392 730 ➡ 679 735 ➡ 723
tr prefix 3 ➡ 1 4 ➡ 2 3 ➡ 1
uk first_name 192 387 ❌ 195
uk last_name 230 ➡ 58 297 ➡ 172 239 ➡ 67
uk prefix 1 2 ❌ 1
ur first_name 18 36 ❌ 18
ur prefix 2 ➡ 1 3 ➡ 1 2 ➡ 1
uz_UZ_latin first_name 133 360 ❌ 227
uz_UZ_latin last_name 209 ➡ 207 416 ➡ 2 209 ➡ 207
vi first_name 1298 ➡ 1264 2488 ➡ 34 1224 ➡ 1190
zh_CN first_name 85 164 ➡ 115 78
zh_TW first_name 41 113 ❌ 72
zu_ZA first_name 49 ➡ 48 98 ➡ 1 50 ➡ 49

@ST-DDT added the labels p: 1-normal, c: locale and m: person on Nov 15, 2024
@ST-DDT added this to the vAnytime milestone on Nov 15, 2024
@ST-DDT self-assigned this on Nov 15, 2024

codecov bot commented Nov 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.96%. Comparing base (fb732cc) to head (7db34bd).
Report is 13 commits behind head on refactor/person/sex.

Additional details and impacted files
@@                   Coverage Diff                   @@
##           refactor/person/sex    #3266      +/-   ##
=======================================================
- Coverage                99.97%   99.96%   -0.02%     
=======================================================
  Files                     2806     2806              
  Lines                   217095   183754   -33341     
  Branches                   976      974       -2     
=======================================================
- Hits                    217035   183683   -33352     
- Misses                      60       71      +11     
Files with missing lines Coverage Δ
src/locales/af_ZA/person/first_name.ts 100.00% <ø> (ø)
src/locales/ar/location/street_pattern.ts 100.00% <100.00%> (ø)
src/locales/ar/person/first_name.ts 100.00% <ø> (ø)
src/locales/ar/person/prefix.ts 100.00% <100.00%> (ø)
src/locales/az/person/first_name.ts 100.00% <ø> (ø)
src/locales/az/person/last_name.ts 100.00% <ø> (ø)
src/locales/cs_CZ/company/name_pattern.ts 100.00% <100.00%> (ø)
src/locales/cs_CZ/person/first_name.ts 100.00% <100.00%> (ø)
src/locales/cs_CZ/person/last_name.ts 100.00% <100.00%> (ø)
src/locales/cs_CZ/person/prefix.ts 100.00% <100.00%> (ø)
... and 163 more

... and 1 file with indirect coverage changes

@matthewmayer (Contributor)

Perhaps it would be possible to summarise the number of entries in each locale definition file before and after the changes, to get a feel for which methods and locales are actually impacted without having to review the giant diff.

e.g.
en.person.prefix.generic from 6 to 4 entries
...
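One way to produce such a summary is to diff the entry counts per locale, entry and sex, and print only the ones that changed. A sketch, assuming the counts have already been collected into nested objects; `Counts` and `summariseChanges` are hypothetical names, and the example numbers are the `en` prefix counts from the summary table above.

```typescript
// Hypothetical shape: counts[locale][entry][sex] = number of values.
type Counts = Record<string, Record<string, Record<string, number>>>;

const SEXES = ['female', 'generic', 'male'];

// Lists every locale/entry/sex whose number of values changed.
// Only locales and entries present in `before` are checked.
function summariseChanges(before: Counts, after: Counts): string[] {
  const lines: string[] = [];
  for (const locale of Object.keys(before).sort()) {
    for (const entry of Object.keys(before[locale]).sort()) {
      for (const sex of SEXES) {
        const from = before[locale][entry][sex] ?? 0;
        const to = after[locale]?.[entry]?.[sex] ?? 0;
        if (from !== to) {
          lines.push(`${locale}.person.${entry}.${sex} from ${from} to ${to} entries`);
        }
      }
    }
  }
  return lines;
}

const before: Counts = { en: { prefix: { female: 4, generic: 5, male: 2 } } };
const after: Counts = { en: { prefix: { female: 3, generic: 1, male: 1 } } };
console.log(summariseChanges(before, after).join('\n'));
// en.person.prefix.female from 4 to 3 entries
// en.person.prefix.generic from 5 to 1 entries
// en.person.prefix.male from 2 to 1 entries
```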

@ST-DDT (Member, Author) commented Nov 15, 2024

> which methods and locales are actually impacted, without having to review the giant diff

locale entry female generic male
af_ZA first_name 107 ➡ 107 219 ➡ 0 113 ➡ 113
af_ZA last_name 0 ➡ 0 162 ➡ 162 0 ➡ 0
af_ZA last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
ar first_name 10 ➡ 10 327 ➡ 0 331 ➡ 331
ar last_name 0 ➡ 0 76 ➡ 76 0 ➡ 0
ar last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
ar prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
az first_name 73 ➡ 73 108 ➡ 0 35 ➡ 35
az last_name 10 ➡ 10 20 ➡ 0 10 ➡ 10
az last_name_pattern 1 ➡ 1 0 ➡ 0 1 ➡ 1
cs_CZ first_name 785 ➡ 783 1578 ➡ 2 795 ➡ 793
cs_CZ last_name 991 ➡ 980 1979 ➡ 11 999 ➡ 988
cs_CZ last_name_pattern 1 ➡ 1 0 ➡ 0 1 ➡ 1
cs_CZ prefix 4 ➡ 0 4 ➡ 4 4 ➡ 0
da first_name 109 ➡ 109 227 ➡ 0 118 ➡ 118
da last_name 0 ➡ 0 106 ➡ 106 0 ➡ 0
da last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
da middle_name 30 ➡ 0 30 ➡ 30 30 ➡ 0
da prefix 1 ➡ 1 2 ➡ 0 1 ➡ 1
de first_name 583 ➡ 573 1145 ➡ 10 572 ➡ 562
de last_name 0 ➡ 0 1688 ➡ 1688 0 ➡ 0
de last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
de prefix 3 ➡ 1 4 ➡ 2 3 ➡ 1
de_AT first_name 573 ➡ 573 1145 ➡ 0 572 ➡ 572
de_AT last_name 0 ➡ 0 1688 ➡ 1688 0 ➡ 0
de_AT last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
de_AT prefix 3 ➡ 1 4 ➡ 2 3 ➡ 1
de_CH first_name 138 ➡ 137 316 ➡ 1 179 ➡ 178
de_CH last_name 0 ➡ 0 209 ➡ 209 0 ➡ 0
de_CH last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
de_CH prefix 3 ➡ 1 4 ➡ 2 3 ➡ 1
dv first_name 49 ➡ 49 63 ➡ 0 14 ➡ 14
dv last_name 248 ➡ 243 355 ➡ 5 112 ➡ 107
dv last_name_pattern 1 ➡ 1 0 ➡ 0 1 ➡ 1
dv prefix 4 ➡ 0 4 ➡ 4 4 ➡ 0
el first_name 19 ➡ 19 55 ➡ 0 36 ➡ 36
el last_name 0 ➡ 0 200 ➡ 200 0 ➡ 0
el last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
el prefix 2 ➡ 1 3 ➡ 1 2 ➡ 1
en first_name 500 ➡ 473 3005 ➡ 2240 500 ➡ 473
en last_name 0 ➡ 0 473 ➡ 473 0 ➡ 0
en last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
en middle_name 210 ➡ 207 62 ➡ 40 98 ➡ 95
en prefix 4 ➡ 3 5 ➡ 1 2 ➡ 1
en_AU first_name 100 ➡ 100 200 ➡ 0 100 ➡ 100
en_AU last_name 0 ➡ 0 286 ➡ 286 0 ➡ 0
en_AU last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
en_AU_ocker first_name 0 ➡ 0 104 ➡ 104 0 ➡ 0
en_AU_ocker last_name 0 ➡ 0 24 ➡ 24 0 ➡ 0
en_AU_ocker last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
en_BORK last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
en_CA last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
en_GB last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
en_GH first_name 132 ➡ 131 261 ➡ 1 130 ➡ 129
en_GH last_name 0 ➡ 0 120 ➡ 120 0 ➡ 0
en_GH last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
en_HK last_name 0 ➡ 0 97 ➡ 97 0 ➡ 0
en_HK last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
en_IE last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
en_IN first_name 288 ➡ 288 742 ➡ 0 454 ➡ 454
en_IN last_name 0 ➡ 0 92 ➡ 92 0 ➡ 0
en_IN last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
en_NG first_name 31 ➡ 31 98 ➡ 0 67 ➡ 67
en_NG last_name 0 ➡ 0 156 ➡ 156 0 ➡ 0
en_NG last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
en_US last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
en_ZA first_name 291 ➡ 288 546 ➡ 11 250 ➡ 247
en_ZA last_name 0 ➡ 0 237 ➡ 237 0 ➡ 0
en_ZA last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
eo first_name 90 ➡ 89 179 ➡ 1 90 ➡ 89
eo last_name 0 ➡ 0 100 ➡ 100 0 ➡ 0
eo last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
eo prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
es first_name 11 ➡ 11 213 ➡ 198 18 ➡ 18
es last_name 0 ➡ 0 625 ➡ 625 0 ➡ 0
es last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
es prefix 2 ➡ 2 3 ➡ 0 1 ➡ 1
es_MX first_name 161 ➡ 161 300 ➡ 0 139 ➡ 139
es_MX last_name 0 ➡ 0 687 ➡ 687 0 ➡ 0
es_MX last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
es_MX prefix 2 ➡ 2 3 ➡ 0 1 ➡ 1
fa first_name 67 ➡ 67 715 ➡ 639 73 ➡ 73
fa last_name 0 ➡ 0 144 ➡ 144 0 ➡ 0
fa last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
fa prefix 2 ➡ 1 3 ➡ 1 2 ➡ 1
fi first_name 50 ➡ 50 100 ➡ 0 50 ➡ 50
fi last_name 0 ➡ 0 50 ➡ 50 0 ➡ 0
fi last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
fr first_name 451 ➡ 435 931 ➡ 16 496 ➡ 480
fr last_name 0 ➡ 0 150 ➡ 150 0 ➡ 0
fr last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
fr prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
fr_BE first_name 1338 ➡ 1293 2591 ➡ 45 1299 ➡ 1254
fr_BE last_name 0 ➡ 0 615 ➡ 615 0 ➡ 0
fr_BE last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
fr_BE prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
fr_CA last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
fr_CH first_name 451 ➡ 447 898 ➡ 4 451 ➡ 447
fr_CH last_name 0 ➡ 0 199 ➡ 199 0 ➡ 0
fr_CH last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
fr_CH prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
fr_LU last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
fr_SN first_name 79 ➡ 78 181 ➡ 1 103 ➡ 102
fr_SN last_name 0 ➡ 0 148 ➡ 148 0 ➡ 0
fr_SN last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
he first_name 336 ➡ 218 541 ➡ 118 323 ➡ 205
he last_name 0 ➡ 0 738 ➡ 738 0 ➡ 0
he last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
he prefix 4 ➡ 1 5 ➡ 3 4 ➡ 1
hr first_name 238 ➡ 234 405 ➡ 4 171 ➡ 167
hr last_name 0 ➡ 0 1000 ➡ 1000 0 ➡ 0
hr last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
hr prefix 3 ➡ 2 4 ➡ 1 2 ➡ 1
hu first_name 100 ➡ 100 200 ➡ 0 100 ➡ 100
hu last_name 0 ➡ 0 100 ➡ 100 0 ➡ 0
hu last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
hu prefix 2 ➡ 0 2 ➡ 2 2 ➡ 0
hy first_name 46 ➡ 46 91 ➡ 0 45 ➡ 45
hy last_name 0 ➡ 0 47 ➡ 47 0 ➡ 0
hy last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
id_ID first_name 263 ➡ 259 752 ➡ 4 493 ➡ 489
id_ID last_name 109 ➡ 108 257 ➡ 1 149 ➡ 148
id_ID last_name_pattern 1 ➡ 1 0 ➡ 0 1 ➡ 1
it first_name 617 ➡ 617 1700 ➡ 6 1083 ➡ 1083
it last_name 0 ➡ 0 2170 ➡ 2170 0 ➡ 0
it last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
it prefix 4 ➡ 0 4 ➡ 4 4 ➡ 0
ja first_name 145 ➡ 144 279 ➡ 1 135 ➡ 134
ja last_name 0 ➡ 0 20 ➡ 20 0 ➡ 0
ja last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
ka_GE first_name 0 ➡ 0 494 ➡ 494 0 ➡ 0
ka_GE last_name 0 ➡ 0 169 ➡ 169 0 ➡ 0
ka_GE last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
ka_GE prefix 2 ➡ 2 4 ➡ 0 2 ➡ 2
ko first_name 300 ➡ 239 540 ➡ 61 301 ➡ 240
ko last_name 0 ➡ 0 112 ➡ 112 0 ➡ 0
ko last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
lv first_name 105 ➡ 105 196 ➡ 0 91 ➡ 91
lv last_name 207 ➡ 195 401 ➡ 12 206 ➡ 194
lv last_name_pattern 2 ➡ 2 0 ➡ 0 2 ➡ 2
lv prefix 3 ➡ 0 3 ➡ 3 3 ➡ 0
mk first_name 232 ➡ 232 515 ➡ 0 283 ➡ 283
mk last_name 495 ➡ 458 951 ➡ 37 493 ➡ 456
mk last_name_pattern 1 ➡ 1 0 ➡ 0 1 ➡ 1
mk prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
nb_NO first_name 50 ➡ 50 100 ➡ 0 50 ➡ 50
nb_NO last_name 0 ➡ 0 100 ➡ 100 0 ➡ 0
nb_NO last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
nb_NO prefix 2 ➡ 0 2 ➡ 2 2 ➡ 0
ne first_name 18 ➡ 18 55 ➡ 0 37 ➡ 37
ne last_name 0 ➡ 0 39 ➡ 39 0 ➡ 0
ne last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
nl first_name 514 ➡ 499 49 ➡ 15 587 ➡ 572
nl last_name 0 ➡ 0 131 ➡ 131 0 ➡ 0
nl last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
nl prefix 7 ➡ 1 8 ➡ 6 7 ➡ 1
nl_BE first_name 99 ➡ 99 199 ➡ 0 100 ➡ 100
nl_BE last_name 0 ➡ 0 32 ➡ 32 0 ➡ 0
nl_BE last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
nl_BE prefix 4 ➡ 0 4 ➡ 4 4 ➡ 0
pl first_name 163 ➡ 163 393 ➡ 0 230 ➡ 230
pl last_name 0 ➡ 0 712 ➡ 712 0 ➡ 0
pl last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
pl prefix 1 ➡ 1 2 ➡ 0 1 ➡ 1
pt_BR first_name 80 ➡ 80 169 ➡ 1 88 ➡ 88
pt_BR last_name 0 ➡ 0 21 ➡ 21 0 ➡ 0
pt_BR last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
pt_BR prefix 3 ➡ 3 5 ➡ 0 2 ➡ 2
pt_PT first_name 93 ➡ 93 188 ➡ 0 95 ➡ 95
pt_PT last_name 0 ➡ 0 121 ➡ 121 0 ➡ 0
pt_PT last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
pt_PT prefix 8 ➡ 8 16 ➡ 0 8 ➡ 8
ro first_name 387 ➡ 387 674 ➡ 0 287 ➡ 287
ro last_name 0 ➡ 0 300 ➡ 300 0 ➡ 0
ro last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
ro prefix 2 ➡ 1 3 ➡ 1 2 ➡ 1
ro_MD first_name 256 ➡ 245 460 ➡ 11 215 ➡ 204
ro_MD last_name 0 ➡ 0 299 ➡ 299 0 ➡ 0
ro_MD prefix 2 ➡ 1 3 ➡ 1 2 ➡ 1
ru first_name 80 ➡ 80 401 ➡ 0 321 ➡ 321
ru last_name 250 ➡ 250 500 ➡ 0 250 ➡ 250
ru last_name_pattern 1 ➡ 1 0 ➡ 0 1 ➡ 1
ru middle_name 79 ➡ 79 0 ➡ 0 132 ➡ 132
sk first_name 200 ➡ 199 391 ➡ 1 192 ➡ 191
sk last_name 251 ➡ 251 508 ➡ 0 257 ➡ 257
sk last_name_pattern 1 ➡ 1 0 ➡ 0 1 ➡ 1
sk prefix 4 ➡ 0 4 ➡ 4 4 ➡ 0
sr_RS_latin first_name 200 ➡ 200 400 ➡ 0 200 ➡ 200
sr_RS_latin last_name 0 ➡ 0 999 ➡ 999 0 ➡ 0
sv first_name 100 ➡ 100 200 ➡ 0 100 ➡ 100
sv last_name 0 ➡ 0 100 ➡ 100 0 ➡ 0
sv last_name_pattern 0 ➡ 0 2 ➡ 2 0 ➡ 0
sv prefix 3 ➡ 0 3 ➡ 3 3 ➡ 0
th first_name 687 ➡ 681 1159 ➡ 6 478 ➡ 472
th last_name 0 ➡ 0 111 ➡ 111 0 ➡ 0
th prefix 3 ➡ 1 4 ➡ 2 3 ➡ 1
tr first_name 404 ➡ 392 730 ➡ 679 735 ➡ 723
tr last_name 0 ➡ 0 198 ➡ 198 0 ➡ 0
tr last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
tr prefix 3 ➡ 1 4 ➡ 2 3 ➡ 1
uk first_name 192 ➡ 192 387 ➡ 0 195 ➡ 195
uk last_name 230 ➡ 58 297 ➡ 172 239 ➡ 67
uk last_name_pattern 1 ➡ 1 0 ➡ 0 1 ➡ 1
uk middle_name 116 ➡ 116 0 ➡ 0 116 ➡ 116
uk prefix 1 ➡ 1 2 ➡ 0 1 ➡ 1
ur first_name 18 ➡ 18 36 ➡ 0 18 ➡ 18
ur last_name 0 ➡ 0 20 ➡ 20 0 ➡ 0
ur last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
ur prefix 2 ➡ 1 3 ➡ 1 2 ➡ 1
uz_UZ_latin first_name 133 ➡ 133 360 ➡ 0 227 ➡ 227
uz_UZ_latin last_name 209 ➡ 207 416 ➡ 2 209 ➡ 207
uz_UZ_latin last_name_pattern 1 ➡ 1 0 ➡ 0 1 ➡ 1
vi first_name 1298 ➡ 1264 2488 ➡ 34 1224 ➡ 1190
vi last_name 0 ➡ 0 26 ➡ 26 0 ➡ 0
vi last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
yo_NG first_name 84 ➡ 84 61 ➡ 61 86 ➡ 86
yo_NG last_name 0 ➡ 0 98 ➡ 98 0 ➡ 0
yo_NG last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
zh_CN first_name 85 ➡ 85 164 ➡ 115 78 ➡ 78
zh_CN last_name 0 ➡ 0 1000 ➡ 1000 0 ➡ 0
zh_CN last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
zh_TW first_name 41 ➡ 41 113 ➡ 0 72 ➡ 72
zh_TW last_name 0 ➡ 0 100 ➡ 100 0 ➡ 0
zh_TW last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0
zu_ZA first_name 49 ➡ 48 98 ➡ 1 50 ➡ 49
zu_ZA last_name 0 ➡ 0 96 ➡ 96 0 ➡ 0
zu_ZA last_name_pattern 0 ➡ 0 1 ➡ 1 0 ➡ 0

@matthewmayer (Contributor)

One issue I can see is that in some cases you end up with, say, only one surviving entry in generic. Then that single name will be returned 20% of the time?
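To make the concern concrete: with a fixed 80/20 split, each generic value is returned with probability 0.2 / generic.length when a sex is requested, so a single surviving generic value is returned 20% of the time. A simplified sketch of that selection, not Faker's actual code; `EntryDefinition`, `pickFixedRatio`, `perGenericValueProbability` and the injectable `rng` are assumptions for illustration.

```typescript
// Simplified, hypothetical shape of a person entry definition.
interface EntryDefinition {
  generic: string[];
  female: string[];
  male: string[];
}

// Fixed 80/20 split: 20% of the time the value comes from the generic list.
function pickFixedRatio(
  def: EntryDefinition,
  sex: 'female' | 'male',
  rng: () => number = Math.random,
): string {
  const specific = def[sex];
  const useGeneric = def.generic.length > 0 && (specific.length === 0 || rng() < 0.2);
  const list = useGeneric ? def.generic : specific;
  if (list.length === 0) throw new Error('No values available');
  return list[Math.floor(rng() * list.length)];
}

// Chance that one particular generic value is returned for a requested sex
// (assuming the specific list is non-empty).
function perGenericValueProbability(def: EntryDefinition): number {
  return def.generic.length === 0 ? 0 : 0.2 / def.generic.length;
}

// One surviving generic value: it is returned 20% of the time.
console.log(perGenericValueProbability({ generic: ['X'], female: ['A', 'B'], male: ['C'] })); // 0.2
```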

@matthewmayer (Contributor)

Also, the en generic first names are not actually all generic.

@matthewmayer (Contributor) commented Nov 15, 2024

I sorted by entry first and removed the 0 ➡ 0 cells to make it easier to skim:

locale entry female generic male
af_ZA first_name 107 ➡ 107 219 ➡ 0 113 ➡ 113
ar first_name 10 ➡ 10 327 ➡ 0 331 ➡ 331
az first_name 73 ➡ 73 108 ➡ 0 35 ➡ 35
cs_CZ first_name 785 ➡ 783 1578 ➡ 2 795 ➡ 793
da first_name 109 ➡ 109 227 ➡ 0 118 ➡ 118
de first_name 583 ➡ 573 1145 ➡ 10 572 ➡ 562
de_AT first_name 573 ➡ 573 1145 ➡ 0 572 ➡ 572
de_CH first_name 138 ➡ 137 316 ➡ 1 179 ➡ 178
dv first_name 49 ➡ 49 63 ➡ 0 14 ➡ 14
el first_name 19 ➡ 19 55 ➡ 0 36 ➡ 36
en first_name 500 ➡ 473 3005 ➡ 2240 500 ➡ 473
en_AU first_name 100 ➡ 100 200 ➡ 0 100 ➡ 100
en_AU_ocker first_name 104 ➡ 104
en_GH first_name 132 ➡ 131 261 ➡ 1 130 ➡ 129
en_IN first_name 288 ➡ 288 742 ➡ 0 454 ➡ 454
en_NG first_name 31 ➡ 31 98 ➡ 0 67 ➡ 67
en_ZA first_name 291 ➡ 288 546 ➡ 11 250 ➡ 247
eo first_name 90 ➡ 89 179 ➡ 1 90 ➡ 89
es first_name 11 ➡ 11 213 ➡ 198 18 ➡ 18
es_MX first_name 161 ➡ 161 300 ➡ 0 139 ➡ 139
fa first_name 67 ➡ 67 715 ➡ 639 73 ➡ 73
fi first_name 50 ➡ 50 100 ➡ 0 50 ➡ 50
fr first_name 451 ➡ 435 931 ➡ 16 496 ➡ 480
fr_BE first_name 1338 ➡ 1293 2591 ➡ 45 1299 ➡ 1254
fr_CH first_name 451 ➡ 447 898 ➡ 4 451 ➡ 447
fr_SN first_name 79 ➡ 78 181 ➡ 1 103 ➡ 102
he first_name 336 ➡ 218 541 ➡ 118 323 ➡ 205
hr first_name 238 ➡ 234 405 ➡ 4 171 ➡ 167
hu first_name 100 ➡ 100 200 ➡ 0 100 ➡ 100
hy first_name 46 ➡ 46 91 ➡ 0 45 ➡ 45
id_ID first_name 263 ➡ 259 752 ➡ 4 493 ➡ 489
it first_name 617 ➡ 617 1700 ➡ 6 1083 ➡ 1083
ja first_name 145 ➡ 144 279 ➡ 1 135 ➡ 134
ka_GE first_name 494 ➡ 494
ko first_name 300 ➡ 239 540 ➡ 61 301 ➡ 240
lv first_name 105 ➡ 105 196 ➡ 0 91 ➡ 91
mk first_name 232 ➡ 232 515 ➡ 0 283 ➡ 283
nb_NO first_name 50 ➡ 50 100 ➡ 0 50 ➡ 50
ne first_name 18 ➡ 18 55 ➡ 0 37 ➡ 37
nl first_name 514 ➡ 499 49 ➡ 15 587 ➡ 572
nl_BE first_name 99 ➡ 99 199 ➡ 0 100 ➡ 100
pl first_name 163 ➡ 163 393 ➡ 0 230 ➡ 230
pt_BR first_name 80 ➡ 80 169 ➡ 1 88 ➡ 88
pt_PT first_name 93 ➡ 93 188 ➡ 0 95 ➡ 95
ro first_name 387 ➡ 387 674 ➡ 0 287 ➡ 287
ro_MD first_name 256 ➡ 245 460 ➡ 11 215 ➡ 204
ru first_name 80 ➡ 80 401 ➡ 0 321 ➡ 321
sk first_name 200 ➡ 199 391 ➡ 1 192 ➡ 191
sr_RS_latin first_name 200 ➡ 200 400 ➡ 0 200 ➡ 200
sv first_name 100 ➡ 100 200 ➡ 0 100 ➡ 100
th first_name 687 ➡ 681 1159 ➡ 6 478 ➡ 472
tr first_name 404 ➡ 392 730 ➡ 679 735 ➡ 723
uk first_name 192 ➡ 192 387 ➡ 0 195 ➡ 195
ur first_name 18 ➡ 18 36 ➡ 0 18 ➡ 18
uz_UZ_latin first_name 133 ➡ 133 360 ➡ 0 227 ➡ 227
vi first_name 1298 ➡ 1264 2488 ➡ 34 1224 ➡ 1190
yo_NG first_name 84 ➡ 84 61 ➡ 61 86 ➡ 86
zh_CN first_name 85 ➡ 85 164 ➡ 115 78 ➡ 78
zh_TW first_name 41 ➡ 41 113 ➡ 0 72 ➡ 72
zu_ZA first_name 49 ➡ 48 98 ➡ 1 50 ➡ 49
af_ZA last_name 162 ➡ 162
ar last_name 76 ➡ 76
az last_name 10 ➡ 10 20 ➡ 0 10 ➡ 10
cs_CZ last_name 991 ➡ 980 1979 ➡ 11 999 ➡ 988
da last_name 106 ➡ 106
de last_name 1688 ➡ 1688
de_AT last_name 1688 ➡ 1688
de_CH last_name 209 ➡ 209
dv last_name 248 ➡ 243 355 ➡ 5 112 ➡ 107
el last_name 200 ➡ 200
en last_name 473 ➡ 473
en_AU last_name 286 ➡ 286
en_AU_ocker last_name 24 ➡ 24
en_GH last_name 120 ➡ 120
en_HK last_name 97 ➡ 97
en_IN last_name 92 ➡ 92
en_NG last_name 156 ➡ 156
en_ZA last_name 237 ➡ 237
eo last_name 100 ➡ 100
es last_name 625 ➡ 625
es_MX last_name 687 ➡ 687
fa last_name 144 ➡ 144
fi last_name 50 ➡ 50
fr last_name 150 ➡ 150
fr_BE last_name 615 ➡ 615
fr_CH last_name 199 ➡ 199
fr_SN last_name 148 ➡ 148
he last_name 738 ➡ 738
hr last_name 1000 ➡ 1000
hu last_name 100 ➡ 100
hy last_name 47 ➡ 47
id_ID last_name 109 ➡ 108 257 ➡ 1 149 ➡ 148
it last_name 2170 ➡ 2170
ja last_name 20 ➡ 20
ka_GE last_name 169 ➡ 169
ko last_name 112 ➡ 112
lv last_name 207 ➡ 195 401 ➡ 12 206 ➡ 194
mk last_name 495 ➡ 458 951 ➡ 37 493 ➡ 456
nb_NO last_name 100 ➡ 100
ne last_name 39 ➡ 39
nl last_name 131 ➡ 131
nl_BE last_name 32 ➡ 32
pl last_name 712 ➡ 712
pt_BR last_name 21 ➡ 21
pt_PT last_name 121 ➡ 121
ro last_name 300 ➡ 300
ro_MD last_name 299 ➡ 299
ru last_name 250 ➡ 250 500 ➡ 0 250 ➡ 250
sk last_name 251 ➡ 251 508 ➡ 0 257 ➡ 257
sr_RS_latin last_name 999 ➡ 999
sv last_name 100 ➡ 100
th last_name 111 ➡ 111
tr last_name 198 ➡ 198
uk last_name 230 ➡ 58 297 ➡ 172 239 ➡ 67
ur last_name 20 ➡ 20
uz_UZ_latin last_name 209 ➡ 207 416 ➡ 2 209 ➡ 207
vi last_name 26 ➡ 26
yo_NG last_name 98 ➡ 98
zh_CN last_name 1000 ➡ 1000
zh_TW last_name 100 ➡ 100
zu_ZA last_name 96 ➡ 96
af_ZA last_name_pattern 1 ➡ 1
ar last_name_pattern 1 ➡ 1
az last_name_pattern 1 ➡ 1 1 ➡ 1
cs_CZ last_name_pattern 1 ➡ 1 1 ➡ 1
da last_name_pattern 2 ➡ 2
de last_name_pattern 1 ➡ 1
de_AT last_name_pattern 1 ➡ 1
de_CH last_name_pattern 1 ➡ 1
dv last_name_pattern 1 ➡ 1 1 ➡ 1
el last_name_pattern 1 ➡ 1
en last_name_pattern 2 ➡ 2
en_AU last_name_pattern 2 ➡ 2
en_AU_ocker last_name_pattern 2 ➡ 2
en_BORK last_name_pattern 2 ➡ 2
en_CA last_name_pattern 2 ➡ 2
en_GB last_name_pattern 2 ➡ 2
en_GH last_name_pattern 2 ➡ 2
en_HK last_name_pattern 1 ➡ 1
en_IE last_name_pattern 2 ➡ 2
en_IN last_name_pattern 2 ➡ 2
en_NG last_name_pattern 2 ➡ 2
en_US last_name_pattern 2 ➡ 2
en_ZA last_name_pattern 2 ➡ 2
eo last_name_pattern 2 ➡ 2
es last_name_pattern 1 ➡ 1
es_MX last_name_pattern 2 ➡ 2
fa last_name_pattern 1 ➡ 1
fi last_name_pattern 1 ➡ 1
fr last_name_pattern 1 ➡ 1
fr_BE last_name_pattern 1 ➡ 1
fr_CA last_name_pattern 1 ➡ 1
fr_CH last_name_pattern 1 ➡ 1
fr_LU last_name_pattern 1 ➡ 1
fr_SN last_name_pattern 1 ➡ 1
he last_name_pattern 1 ➡ 1
hr last_name_pattern 1 ➡ 1
hu last_name_pattern 1 ➡ 1
hy last_name_pattern 1 ➡ 1
id_ID last_name_pattern 1 ➡ 1 1 ➡ 1
it last_name_pattern 1 ➡ 1
ja last_name_pattern 1 ➡ 1
ka_GE last_name_pattern 1 ➡ 1
ko last_name_pattern 1 ➡ 1
lv last_name_pattern 2 ➡ 2 2 ➡ 2
mk last_name_pattern 1 ➡ 1 1 ➡ 1
nb_NO last_name_pattern 2 ➡ 2
ne last_name_pattern 1 ➡ 1
nl last_name_pattern 1 ➡ 1
nl_BE last_name_pattern 1 ➡ 1
pl last_name_pattern 1 ➡ 1
pt_BR last_name_pattern 1 ➡ 1
pt_PT last_name_pattern 1 ➡ 1
ro last_name_pattern 1 ➡ 1
ru last_name_pattern 1 ➡ 1 1 ➡ 1
sk last_name_pattern 1 ➡ 1 1 ➡ 1
sv last_name_pattern 2 ➡ 2
tr last_name_pattern 1 ➡ 1
uk last_name_pattern 1 ➡ 1 1 ➡ 1
ur last_name_pattern 1 ➡ 1
uz_UZ_latin last_name_pattern 1 ➡ 1 1 ➡ 1
vi last_name_pattern 1 ➡ 1
yo_NG last_name_pattern 1 ➡ 1
zh_CN last_name_pattern 1 ➡ 1
zh_TW last_name_pattern 1 ➡ 1
zu_ZA last_name_pattern 1 ➡ 1
da middle_name 30 ➡ 0 30 ➡ 30 30 ➡ 0
en middle_name 210 ➡ 207 62 ➡ 40 98 ➡ 95
ru middle_name 79 ➡ 79 132 ➡ 132
uk middle_name 116 ➡ 116 116 ➡ 116
ar prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
cs_CZ prefix 4 ➡ 0 4 ➡ 4 4 ➡ 0
da prefix 1 ➡ 1 2 ➡ 0 1 ➡ 1
de prefix 3 ➡ 1 4 ➡ 2 3 ➡ 1
de_AT prefix 3 ➡ 1 4 ➡ 2 3 ➡ 1
de_CH prefix 3 ➡ 1 4 ➡ 2 3 ➡ 1
dv prefix 4 ➡ 0 4 ➡ 4 4 ➡ 0
el prefix 2 ➡ 1 3 ➡ 1 2 ➡ 1
en prefix 4 ➡ 3 5 ➡ 1 2 ➡ 1
eo prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
es prefix 2 ➡ 2 3 ➡ 0 1 ➡ 1
es_MX prefix 2 ➡ 2 3 ➡ 0 1 ➡ 1
fa prefix 2 ➡ 1 3 ➡ 1 2 ➡ 1
fr prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
fr_BE prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
fr_CH prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
he prefix 4 ➡ 1 5 ➡ 3 4 ➡ 1
hr prefix 3 ➡ 2 4 ➡ 1 2 ➡ 1
hu prefix 2 ➡ 0 2 ➡ 2 2 ➡ 0
it prefix 4 ➡ 0 4 ➡ 4 4 ➡ 0
ka_GE prefix 2 ➡ 2 4 ➡ 0 2 ➡ 2
lv prefix 3 ➡ 0 3 ➡ 3 3 ➡ 0
mk prefix 4 ➡ 2 5 ➡ 2 3 ➡ 1
nb_NO prefix 2 ➡ 0 2 ➡ 2 2 ➡ 0
nl prefix 7 ➡ 1 8 ➡ 6 7 ➡ 1
nl_BE prefix 4 ➡ 0 4 ➡ 4 4 ➡ 0
pl prefix 1 ➡ 1 2 ➡ 0 1 ➡ 1
pt_BR prefix 3 ➡ 3 5 ➡ 0 2 ➡ 2
pt_PT prefix 8 ➡ 8 16 ➡ 0 8 ➡ 8
ro prefix 2 ➡ 1 3 ➡ 1 2 ➡ 1
ro_MD prefix 2 ➡ 1 3 ➡ 1 2 ➡ 1
sk prefix 4 ➡ 0 4 ➡ 4 4 ➡ 0
sv prefix 3 ➡ 0 3 ➡ 3 3 ➡ 0
th prefix 3 ➡ 1 4 ➡ 2 3 ➡ 1
tr prefix 3 ➡ 1 4 ➡ 2 3 ➡ 1
uk prefix 1 ➡ 1 2 ➡ 0 1 ➡ 1
ur prefix 2 ➡ 1 3 ➡ 1 2 ➡ 1
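The re-sorted view above can be reproduced with a small formatter: sort rows by entry and then by locale, and drop female/generic/male cells that are 0 ➡ 0. A sketch; `Row` and `formatRows` are hypothetical names. One side effect worth noting: once cells are dropped, a row such as `af_ZA last_name 162 ➡ 162` no longer says which column the remaining value belongs to.

```typescript
// Hypothetical row shape: [before, after] counts per sex.
interface Row {
  locale: string;
  entry: string;
  female: [number, number];
  generic: [number, number];
  male: [number, number];
}

function formatRows(rows: Row[]): string[] {
  return [...rows]
    // Sort by entry first, then by locale.
    .sort((a, b) => a.entry.localeCompare(b.entry) || a.locale.localeCompare(b.locale))
    .map((row) => {
      const cells = (['female', 'generic', 'male'] as const)
        .map((sex) => row[sex])
        // Hide cells that are empty both before and after ("0 ➡ 0").
        .filter(([from, to]) => from !== 0 || to !== 0)
        .map(([from, to]) => `${from} ➡ ${to}`);
      return [row.locale, row.entry, ...cells].join(' ');
    });
}

console.log(
  formatRows([
    { locale: 'de', entry: 'last_name', female: [0, 0], generic: [1688, 1688], male: [0, 0] },
    { locale: 'az', entry: 'first_name', female: [73, 73], generic: [108, 0], male: [35, 35] },
  ]).join('\n'),
);
```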

@matthewmayer (Contributor)

> One issue I can see is that in some cases you end up with, say, only one surviving entry in generic. Then that single name will be returned 20% of the time?

Previously:

> If female/male is requested: Then the method will mostly (80%) return female/male values with some (20%) generic values sprinkled in.

This suggests to me that, rather than a fixed 80% gendered / 20% generic split, a request for, say, female should pick randomly from the female definitions concatenated with the generic definitions, so that locales with only a few generic definitions don't keep picking the same few generic names.
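A minimal sketch of the suggested alternative, assuming a simplified definition shape; `EntryDefinition`, `pickConcatenated`, `perValueProbability` and the injectable `rng` are hypothetical. Every value in the combined pool has probability 1 / (specific + generic), so a single surviving generic value no longer gets a fixed 20% share. The example uses the ja first_name counts after cleanup (144 female, 1 generic).

```typescript
// Simplified, hypothetical shape of a person entry definition.
interface EntryDefinition {
  generic: string[];
  female: string[];
  male: string[];
}

// Uniform pick from the requested sex's values plus the generic values.
function pickConcatenated(
  def: EntryDefinition,
  sex: 'female' | 'male',
  rng: () => number = Math.random,
): string {
  const pool = [...def[sex], ...def.generic];
  if (pool.length === 0) throw new Error('No values available');
  return pool[Math.floor(rng() * pool.length)];
}

// Every value in the pool is equally likely.
function perValueProbability(def: EntryDefinition, sex: 'female' | 'male'): number {
  return 1 / (def[sex].length + def.generic.length);
}

// 144 female names and a single generic one: each has a 1/145 chance (about 0.7%).
const def: EntryDefinition = {
  generic: ['G'],
  female: Array.from({ length: 144 }, (_, i) => `F${i}`),
  male: [],
};
console.log(perValueProbability(def, 'female'));
```

The trade-off shows up with very short lists: with one male prefix ('Mr.') and one generic prefix ('Dr.'), the pool is split 50/50.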

@ST-DDT (Member, Author) commented Nov 15, 2024

> One issue I can see is that in some cases you end up with, say, only one surviving entry in generic. Then that single name will be returned 20% of the time?

Yes. We can add new generic entries to them later or change the distribution. Do you have a suggestion or preferred solution?

@ST-DDT (Member, Author) commented Nov 15, 2024

> This suggests to me that, rather than a fixed 80% gendered / 20% generic split, a request for, say, female should pick randomly from the female definitions concatenated with the generic definitions, so that locales with only a few generic definitions don't keep picking the same few generic names.

We also considered this, but it has one downside: if the (fe)male list is small compared to generic, you get odd distributions, e.g. 50% Mr. and 50% Dr.

We also considered weighting them:

  • binary.length vs generic.length
  • binary.length + x vs generic.length
  • binary.length * x vs generic.length
  • binary = 4 vs generic = 1 (current PR)
  • binary = 1 - genericPercentage vs generic = genericPercentage

In summary, we haven't found the perfect solution yet.
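For comparison, each weighting listed above can be written as the probability of drawing from the requested sex's (binary) list rather than generic. A sketch with b = binary.length and g = generic.length; `x` and `genericPercentage` are free parameters, and the example values (5 and 0.2) are arbitrary assumptions, not values from this discussion.

```typescript
// b = binary (female or male) list length, g = generic list length.
// Each scheme returns the probability of picking from the binary list.
type Scheme = (b: number, g: number) => number;

function schemes(x: number, genericPercentage: number): Record<string, Scheme> {
  return {
    'binary.length vs generic.length': (b, g) => b / (b + g),
    'binary.length + x vs generic.length': (b, g) => (b + x) / (b + x + g),
    'binary.length * x vs generic.length': (b, g) => (b * x) / (b * x + g),
    'binary = 4 vs generic = 1': () => 4 / 5,
    '1 - genericPercentage vs genericPercentage': () => 1 - genericPercentage,
  };
}

// Example: one male and one generic prefix (en after cleanup), x = 5, 20% generic.
for (const [name, weigh] of Object.entries(schemes(5, 0.2))) {
  console.log(`${name}: ${Math.round(weigh(1, 1) * 100)}% binary`);
}
```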

@ST-DDT (Member, Author) commented Nov 15, 2024

I lean towards proceeding with a non-optimal weight distribution and tweaking it in subsequent PRs.

@matthewmayer (Contributor)

What if the percentage of generic names could be set separately for each locale definition?

export default {
  generic: ['Dr.'],
  female: ['Mrs.', 'Ms.', 'Miss'],
  male: ['Mr.'],
  generic_probability: 0.1,
};

Then you could have, say, a 10% chance of getting a gender-neutral English prefix, but a 50% chance of getting a gender-neutral Chinese first_name.
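A minimal sketch of how a selector could honour such a per-definition value. `WeightedEntryDefinition`, `pickWithGenericProbability` and the 0.2 fallback (mirroring the current 20% generic share) are assumptions for illustration; `generic_probability` is only the field name proposed above, not an existing Faker option.

```typescript
// Hypothetical definition shape with the proposed per-definition field.
interface WeightedEntryDefinition {
  generic?: string[];
  female?: string[];
  male?: string[];
  generic_probability?: number;
}

// Assumed fallback when a definition does not set generic_probability.
const DEFAULT_GENERIC_PROBABILITY = 0.2;

function pickWithGenericProbability(
  def: WeightedEntryDefinition,
  sex: 'female' | 'male',
  rng: () => number = Math.random,
): string {
  const generic = def.generic ?? [];
  const specific = def[sex] ?? [];
  const p = def.generic_probability ?? DEFAULT_GENERIC_PROBABILITY;
  // Use generic if there are no specific values, or with probability p.
  const useGeneric = specific.length === 0 || (generic.length > 0 && rng() < p);
  const list = useGeneric ? generic : specific;
  if (list.length === 0) throw new Error('No values available');
  return list[Math.floor(rng() * list.length)];
}

const prefix: WeightedEntryDefinition = {
  generic: ['Dr.'],
  female: ['Mrs.', 'Ms.', 'Miss'],
  male: ['Mr.'],
  generic_probability: 0.1,
};
console.log(pickWithGenericProbability(prefix, 'male')); // 'Dr.' about 10% of the time
```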

@ST-DDT (Member, Author) commented Nov 16, 2024

> What if the percentage of generic names could be set separately for each locale definition?

Is finding the right percentage for each distribution a (merge or release) blocking issue for you, or is it something we can adjust in later PRs?

@matthewmayer (Contributor)

I'd say yes. Having 20 percent of all Japanese first names be the same name, because there is only one generic name, feels like a bug/regression.

@ST-DDT added the s: needs decision label on Nov 16, 2024

@matthewmayer (Contributor)

> Also, the en generic first names are not actually all generic.

I'm not sure what the best way to handle this is. We don't want to leave them all in generic, otherwise female names would be returned when you ask for male.

So we would have to go through and split the generic names into male and female. There are some (free and paid) APIs that might help with that, like https://genderize.io/

@ST-DDT (Member, Author) commented Nov 24, 2024

My plan for this, if you are fine with it, looks like this:

Please let me know what you think of this and what your suggestions are.

@matthewmayer (Contributor) commented Nov 24, 2024

In general that sounds fine. There's no great hurry for this and we are less likely to accidentally break things if we spread this over a few releases.

However, I think we should try to figure out what we will do with the problematic locales so we don't get stuck in the future. Even if we first truncate the en generic first names to 1000, that's a lot to go through by hand.

@ST-DDT (Member, Author) commented Nov 24, 2024

> However, I think we should try to figure out what we will do with the problematic locales so we don't get stuck in the future. Even if we first truncate the en generic first names to 1000, that's a lot to go through by hand.

IMO we can either check the existing list, which could be a lot of work, or search for a new list, whichever is easier for us.

@matthewmayer (Contributor)

Would we allow 1000 male and 1000 female names? Or 1000 total across all genders?

@ST-DDT (Member, Author) commented Nov 25, 2024

I think the current script limits it to up to 1000 each.

@ST-DDT (Member, Author) commented Dec 3, 2024

> I'd say yes. Having 20 percent of all Japanese first names be the same name, because there is only one generic name, feels like a bug/regression.

How about using a ratio of:

sqrt(specific) + 5 vs sqrt(generic)

Percentage of choosing specific

specific \ generic 1 5 10 50 100 500 1000
1 86% 73% 65% 46% 38% 21% 16%
5 88% 76% 70% 51% 42% 24% 19%
10 89% 78% 72% 54% 45% 27% 21%
50 92% 84% 79% 63% 55% 35% 28%
100 94% 87% 83% 68% 60% 40% 32%
500 96% 92% 90% 79% 73% 55% 46%
1000 97% 94% 92% 84% 79% 62% 54%

Percentage of choosing generic

specific \ generic 1 5 10 50 100 500 1000
1 14% 27% 35% 54% 63% 79% 84%
5 12% 24% 30% 49% 58% 76% 81%
10 11% 22% 28% 46% 55% 73% 79%
50 8% 16% 21% 37% 45% 65% 72%
100 6% 13% 17% 32% 40% 60% 68%
500 4% 8% 10% 21% 27% 45% 54%
1000 3% 6% 8% 16% 21% 38% 46%
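The proposed ratio corresponds to P(specific) = (sqrt(specific) + 5) / (sqrt(specific) + 5 + sqrt(generic)), with P(generic) = 1 - P(specific). The sketch below (hypothetical function name) recomputes the first table; spot checks such as 1 specific and 1 generic giving 6 / 7, about 86%, match the values above.

```typescript
// Proposed weighting: sqrt(specific) + 5 vs sqrt(generic).
function pSpecificSqrtPlus5(specific: number, generic: number): number {
  const ws = Math.sqrt(specific) + 5;
  return ws / (ws + Math.sqrt(generic));
}

// Rebuild the "percentage of choosing specific" table (rows: specific, columns: generic).
const sizes = [1, 5, 10, 50, 100, 500, 1000];
console.log(['specific \\ generic', ...sizes].join('\t'));
for (const s of sizes) {
  const row = sizes.map((g) => `${Math.round(pSpecificSqrtPlus5(s, g) * 100)}%`);
  console.log([s, ...row].join('\t'));
}
```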

'Živana',
'Žofie',
],
generic: ['Nikola', 'René'],
Review comment by @ST-DDT (Member, Author) on the snippet above:

Is it safe to move them to female and male respectively?

@ST-DDT (Member, Author) commented Dec 19, 2024

Team Decision

  • We will use sqrt(specific) * 3 vs sqrt(generic)
sqrt*3: specific \ generic 1 5 10 50 100 500 1000
1 75% 57% 49% 30% 23% 12% 9%
5 87% 75% 68% 49% 40% 23% 18%
10 90% 81% 75% 57% 49% 30% 23%
50 95% 90% 87% 75% 68% 49% 40%
100 97% 93% 90% 81% 75% 57% 49%
500 99% 97% 95% 90% 87% 75% 68%
1000 99% 98% 97% 93% 90% 81% 75%

We believe that these values represent the use case best, while leaning towards specific values when a specific sex has been requested.
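Read the same way, the team decision means P(specific) = 3 * sqrt(specific) / (3 * sqrt(specific) + sqrt(generic)). A minimal selection sketch under that reading; `EntryDefinition`, `pSpecific` and `pickWeighted` are hypothetical names, and the random source is injectable for testing. Empty lists fall through naturally: no specific values gives P(specific) = 0, no generic values gives 1.

```typescript
// Simplified, hypothetical shape of a person entry definition.
interface EntryDefinition {
  generic: string[];
  female: string[];
  male: string[];
}

// Weight 3 * sqrt(specific) against sqrt(generic).
function pSpecific(specific: number, generic: number): number {
  const ws = 3 * Math.sqrt(specific);
  const wg = Math.sqrt(generic);
  return ws + wg === 0 ? 0 : ws / (ws + wg);
}

function pickWeighted(
  def: EntryDefinition,
  sex: 'female' | 'male',
  rng: () => number = Math.random,
): string {
  const specific = def[sex];
  const useSpecific = rng() < pSpecific(specific.length, def.generic.length);
  const list = useSpecific ? specific : def.generic;
  if (list.length === 0) throw new Error('No values available');
  return list[Math.floor(rng() * list.length)];
}

console.log(pSpecific(25, 9)); // 15 / 18, about 0.833
```

For 25 female and 9 generic first names, `pSpecific(25, 9)` is 15 / 18, about 83.3%.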

@ST-DDT removed the s: needs decision label on Dec 19, 2024

@matthewmayer (Contributor) commented Dec 20, 2024

Just wanted to check that I understood this right.

So, for example, if there were 9 generic first names, 25 female first names and 36 male first names, and I request firstName("female"), I'll get a name from the female list versus the generic list in a ratio of

3*sqrt(25) : sqrt(9) = 15 : 3

i.e. I get a name from the female list 15/18 of the time, about 83.3 percent.
