refactor(locale): filter and cleanup PersonEntryDefintions data #3266

ST-DDT · 2024-11-15T12:53:24Z

Second part of #3058 (Clean up generic prefixes)

Extension of #3259 (refactor(person): refine usage of PersonEntryDefinitions)

This PR cleans up the PersonEntryDefinitions locale data.

Generic values are checked for whether they exist exclusively in either female or male (i.e. in exactly one of them); if so, they are removed from generic. This solves the issue where generic = merge(female, male).
Female values are checked for whether they are in generic; if so, they are removed from female.
Female values are checked for whether they are in male; if so, they are added to generic and removed from female.
Male values are checked for whether they are in generic; if so, they are removed from male.
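The four rules above can be written as a pure function over one definition. This is a minimal sketch, not the actual script from this PR. The `PersonEntryDefinition` shape below (three required string arrays) is an assumption made for illustration; real locale files may omit some of the arrays.

```typescript
// Assumed shape for illustration; the real faker type may differ
// (for example, some arrays may be optional).
interface PersonEntryDefinition {
  generic: string[];
  female: string[];
  male: string[];
}

// Applies the four cleanup rules in order and returns a new definition.
function cleanupDefinition(def: PersonEntryDefinition): PersonEntryDefinition {
  const femaleSet = new Set(def.female);
  const maleSet = new Set(def.male);

  // Rule 1: drop generic values that occur in exactly one of female/male.
  // Values found in both, or in neither, stay generic.
  const generic = def.generic.filter((v) => femaleSet.has(v) === maleSet.has(v));
  const genericSet = new Set(generic);

  // Rule 2: drop female values that are (still) generic.
  let female = def.female.filter((v) => !genericSet.has(v));

  // Rule 3: female values that also occur in male become generic ...
  for (const v of female) {
    if (maleSet.has(v) && !genericSet.has(v)) {
      genericSet.add(v);
      generic.push(v);
    }
  }
  // ... and are removed from female.
  female = female.filter((v) => !maleSet.has(v));

  // Rule 4: drop male values that are generic (including those added by rule 3).
  const male = def.male.filter((v) => !genericSet.has(v));

  return { generic, female, male };
}
```

Consider an illustrative input with the same counts as the en prefix row (4 female, 5 generic = female ∪ male, 2 male). Only the value present in both female and male stays generic, which leaves 3 female, 1 generic and 1 male entry. That matches the en prefix summary row.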

~~I haven't run the script yet, because there is a large diff, due to the person data not being sorted.~~

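Summary (changes only). Each cell is an entry count: "a ➡ b" means before ➡ after, "❌" means all entries were removed, and a bare number means unchanged.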

locale	entry	female	generic	male
af_ZA	first_name	107	219 ❌	113
ar	first_name	10	327 ❌	331
ar	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
az	first_name	73	108 ❌	35
az	last_name	10	20 ❌	10
cs_CZ	first_name	785 ➡ 783	1578 ➡ 2	795 ➡ 793
cs_CZ	last_name	991 ➡ 980	1979 ➡ 11	999 ➡ 988
cs_CZ	prefix	4 ❌	4	4 ❌
da	first_name	109	227 ❌	118
da	middle_name	30 ❌	30	30 ❌
da	prefix	1	2 ❌	1
de	first_name	583 ➡ 573	1145 ➡ 10	572 ➡ 562
de	prefix	3 ➡ 1	4 ➡ 2	3 ➡ 1
de_AT	first_name	573	1145 ❌	572
de_AT	prefix	3 ➡ 1	4 ➡ 2	3 ➡ 1
de_CH	first_name	138 ➡ 137	316 ➡ 1	179 ➡ 178
de_CH	prefix	3 ➡ 1	4 ➡ 2	3 ➡ 1
dv	first_name	49	63 ❌	14
dv	last_name	248 ➡ 243	355 ➡ 5	112 ➡ 107
dv	prefix	4 ❌	4	4 ❌
el	first_name	19	55 ❌	36
el	prefix	2 ➡ 1	3 ➡ 1	2 ➡ 1
en	first_name	500 ➡ 473	3005 ➡ 2240	500 ➡ 473
en	middle_name	210 ➡ 207	62 ➡ 40	98 ➡ 95
en	prefix	4 ➡ 3	5 ➡ 1	2 ➡ 1
en_AU	first_name	100	200 ❌	100
en_GH	first_name	132 ➡ 131	261 ➡ 1	130 ➡ 129
en_IN	first_name	288	742 ❌	454
en_NG	first_name	31	98 ❌	67
en_ZA	first_name	291 ➡ 288	546 ➡ 11	250 ➡ 247
eo	first_name	90 ➡ 89	179 ➡ 1	90 ➡ 89
eo	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
es	prefix	2	3 ❌	1
es_MX	first_name	161	300 ❌	139
es_MX	prefix	2	3 ❌	1
fa	first_name	67	715 ➡ 639	73
fa	prefix	2 ➡ 1	3 ➡ 1	2 ➡ 1
fi	first_name	50	100 ❌	50
fr	first_name	451 ➡ 435	931 ➡ 16	496 ➡ 480
fr	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
fr_BE	first_name	1338 ➡ 1293	2591 ➡ 45	1299 ➡ 1254
fr_BE	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
fr_CH	first_name	451 ➡ 447	898 ➡ 4	451 ➡ 447
fr_CH	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
fr_SN	first_name	79 ➡ 78	181 ➡ 1	103 ➡ 102
he	first_name	336 ➡ 218	541 ➡ 118	323 ➡ 205
he	prefix	4 ➡ 1	5 ➡ 3	4 ➡ 1
hr	first_name	238 ➡ 234	405 ➡ 4	171 ➡ 167
hr	prefix	3 ➡ 2	4 ➡ 1	2 ➡ 1
hu	first_name	100	200 ❌	100
hu	prefix	2 ❌	2	2 ❌
hy	first_name	46	91 ❌	45
id_ID	first_name	263 ➡ 259	752 ➡ 4	493 ➡ 489
id_ID	last_name	109 ➡ 108	257 ➡ 1	149 ➡ 148
it	first_name	617	1700 ❌	1083
it	prefix	4 ❌	4	4 ❌
ja	first_name	145 ➡ 144	279 ➡ 1	135 ➡ 134
ka_GE	prefix	2	4 ❌	2
ko	first_name	300 ➡ 239	540 ➡ 61	301 ➡ 240
lv	first_name	105	196 ❌	91
lv	last_name	207 ➡ 195	401 ➡ 12	206 ➡ 194
lv	prefix	3 ❌	3	3 ❌
mk	first_name	232	515 ❌	283
mk	last_name	495 ➡ 458	951 ➡ 37	493 ➡ 456
mk	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
nb_NO	first_name	50	100 ❌	50
nb_NO	prefix	2 ❌	2	2 ❌
ne	first_name	18	55 ❌	37
nl	first_name	514 ➡ 499	49 ➡ 15	587 ➡ 572
nl	prefix	7 ➡ 1	8 ➡ 6	7 ➡ 1
nl_BE	first_name	99	199 ❌	100
nl_BE	prefix	4 ❌	4	4 ❌
pl	first_name	163	393 ❌	230
pl	prefix	1	2 ❌	1
pt_BR	first_name	81	169 ❌	88
pt_BR	prefix	3	5 ❌	2
pt_PT	first_name	93	188 ❌	95
pt_PT	prefix	8	16 ❌	8
ro	first_name	387	674 ❌	287
ro	prefix	2 ➡ 1	3 ➡ 1	2 ➡ 1
ro_MD	first_name	256 ➡ 245	460 ➡ 11	215 ➡ 204
ro_MD	prefix	2 ➡ 1	3 ➡ 1	2 ➡ 1
ru	first_name	80	401 ❌	321
ru	last_name	250	500 ❌	250
sk	first_name	200 ➡ 199	391 ➡ 1	192 ➡ 191
sk	last_name	251	508 ❌	257
sk	prefix	4 ❌	4	4 ❌
sr_RS_latin	first_name	200	400 ❌	200
sv	first_name	100	200 ❌	100
sv	prefix	3 ❌	3	3 ❌
th	first_name	687 ➡ 681	1159 ➡ 6	478 ➡ 472
th	prefix	3 ➡ 1	4 ➡ 2	3 ➡ 1
tr	first_name	404 ➡ 392	730 ➡ 679	735 ➡ 723
tr	prefix	3 ➡ 1	4 ➡ 2	3 ➡ 1
uk	first_name	192	387 ❌	195
uk	last_name	230 ➡ 58	297 ➡ 172	239 ➡ 67
uk	prefix	1	2 ❌	1
ur	first_name	18	36 ❌	18
ur	prefix	2 ➡ 1	3 ➡ 1	2 ➡ 1
uz_UZ_latin	first_name	133	360 ❌	227
uz_UZ_latin	last_name	209 ➡ 207	416 ➡ 2	209 ➡ 207
vi	first_name	1298 ➡ 1264	2488 ➡ 34	1224 ➡ 1190
zh_CN	first_name	85	164 ➡ 115	78
zh_TW	first_name	41	113 ❌	72
zu_ZA	first_name	49 ➡ 48	98 ➡ 1	50 ➡ 49

codecov · 2024-11-15T12:56:27Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.96%. Comparing base (fb732cc) to head (7db34bd).
Report is 13 commits behind head on refactor/person/sex.

Additional details and impacted files

@@                   Coverage Diff                   @@
##           refactor/person/sex    #3266      +/-   ##
=======================================================
- Coverage                99.97%   99.96%   -0.02%     
=======================================================
  Files                     2806     2806              
  Lines                   217095   183754   -33341     
  Branches                   976      974       -2     
=======================================================
- Hits                    217035   183683   -33352     
- Misses                      60       71      +11

Files with missing lines	Coverage Δ
src/locales/af_ZA/person/first_name.ts	`100.00% <ø> (ø)`
src/locales/ar/location/street_pattern.ts	`100.00% <100.00%> (ø)`
src/locales/ar/person/first_name.ts	`100.00% <ø> (ø)`
src/locales/ar/person/prefix.ts	`100.00% <100.00%> (ø)`
src/locales/az/person/first_name.ts	`100.00% <ø> (ø)`
src/locales/az/person/last_name.ts	`100.00% <ø> (ø)`
src/locales/cs_CZ/company/name_pattern.ts	`100.00% <100.00%> (ø)`
src/locales/cs_CZ/person/first_name.ts	`100.00% <100.00%> (ø)`
src/locales/cs_CZ/person/last_name.ts	`100.00% <100.00%> (ø)`
src/locales/cs_CZ/person/prefix.ts	`100.00% <100.00%> (ø)`
... and 163 more

... and 1 file with indirect coverage changes

matthewmayer · 2024-11-15T13:07:08Z

Perhaps it would be possible to summarise the length of each locale definition file before and after the changes, to get a feel for what the actual impacted methods and locales are without having to review the giant diff.

e.g.
en.person.prefix.generic from 6 to 4 entries
...
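Each row of such a summary can be built from the entry counts before and after the cleanup. This is a hedged sketch. The helper name `statsRow` is hypothetical, and the optional-array shape is an assumption; this is not the stats logging used in this PR.

```typescript
// Assumed shape: any of the three arrays may be missing in a locale file.
interface PersonEntryDefinition {
  generic?: readonly string[];
  female?: readonly string[];
  male?: readonly string[];
}

type Column = keyof PersonEntryDefinition;

// Formats one tab-separated row: locale, entry, then "before ➡ after"
// entry counts for the female, generic and male arrays (missing = 0).
function statsRow(
  locale: string,
  entry: string,
  before: PersonEntryDefinition,
  after: PersonEntryDefinition,
): string {
  const cell = (key: Column): string =>
    `${before[key]?.length ?? 0} ➡ ${after[key]?.length ?? 0}`;
  return [locale, entry, cell('female'), cell('generic'), cell('male')].join('\t');
}
```

Inputs with the en prefix counts produce `en	prefix	4 ➡ 3	5 ➡ 1	2 ➡ 1`, the same layout as the tables in this thread.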

ST-DDT · 2024-11-15T13:28:58Z

> what the actual impacted methods and locales are without having to review the giant diff

locale	entry	female	generic	male
af_ZA	first_name	107 ➡ 107	219 ➡ 0	113 ➡ 113
af_ZA	last_name	0 ➡ 0	162 ➡ 162	0 ➡ 0
af_ZA	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
ar	first_name	10 ➡ 10	327 ➡ 0	331 ➡ 331
ar	last_name	0 ➡ 0	76 ➡ 76	0 ➡ 0
ar	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
ar	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
az	first_name	73 ➡ 73	108 ➡ 0	35 ➡ 35
az	last_name	10 ➡ 10	20 ➡ 0	10 ➡ 10
az	last_name_pattern	1 ➡ 1	0 ➡ 0	1 ➡ 1
cs_CZ	first_name	785 ➡ 783	1578 ➡ 2	795 ➡ 793
cs_CZ	last_name	991 ➡ 980	1979 ➡ 11	999 ➡ 988
cs_CZ	last_name_pattern	1 ➡ 1	0 ➡ 0	1 ➡ 1
cs_CZ	prefix	4 ➡ 0	4 ➡ 4	4 ➡ 0
da	first_name	109 ➡ 109	227 ➡ 0	118 ➡ 118
da	last_name	0 ➡ 0	106 ➡ 106	0 ➡ 0
da	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
da	middle_name	30 ➡ 0	30 ➡ 30	30 ➡ 0
da	prefix	1 ➡ 1	2 ➡ 0	1 ➡ 1
de	first_name	583 ➡ 573	1145 ➡ 10	572 ➡ 562
de	last_name	0 ➡ 0	1688 ➡ 1688	0 ➡ 0
de	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
de	prefix	3 ➡ 1	4 ➡ 2	3 ➡ 1
de_AT	first_name	573 ➡ 573	1145 ➡ 0	572 ➡ 572
de_AT	last_name	0 ➡ 0	1688 ➡ 1688	0 ➡ 0
de_AT	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
de_AT	prefix	3 ➡ 1	4 ➡ 2	3 ➡ 1
de_CH	first_name	138 ➡ 137	316 ➡ 1	179 ➡ 178
de_CH	last_name	0 ➡ 0	209 ➡ 209	0 ➡ 0
de_CH	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
de_CH	prefix	3 ➡ 1	4 ➡ 2	3 ➡ 1
dv	first_name	49 ➡ 49	63 ➡ 0	14 ➡ 14
dv	last_name	248 ➡ 243	355 ➡ 5	112 ➡ 107
dv	last_name_pattern	1 ➡ 1	0 ➡ 0	1 ➡ 1
dv	prefix	4 ➡ 0	4 ➡ 4	4 ➡ 0
el	first_name	19 ➡ 19	55 ➡ 0	36 ➡ 36
el	last_name	0 ➡ 0	200 ➡ 200	0 ➡ 0
el	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
el	prefix	2 ➡ 1	3 ➡ 1	2 ➡ 1
en	first_name	500 ➡ 473	3005 ➡ 2240	500 ➡ 473
en	last_name	0 ➡ 0	473 ➡ 473	0 ➡ 0
en	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
en	middle_name	210 ➡ 207	62 ➡ 40	98 ➡ 95
en	prefix	4 ➡ 3	5 ➡ 1	2 ➡ 1
en_AU	first_name	100 ➡ 100	200 ➡ 0	100 ➡ 100
en_AU	last_name	0 ➡ 0	286 ➡ 286	0 ➡ 0
en_AU	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
en_AU_ocker	first_name	0 ➡ 0	104 ➡ 104	0 ➡ 0
en_AU_ocker	last_name	0 ➡ 0	24 ➡ 24	0 ➡ 0
en_AU_ocker	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
en_BORK	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
en_CA	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
en_GB	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
en_GH	first_name	132 ➡ 131	261 ➡ 1	130 ➡ 129
en_GH	last_name	0 ➡ 0	120 ➡ 120	0 ➡ 0
en_GH	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
en_HK	last_name	0 ➡ 0	97 ➡ 97	0 ➡ 0
en_HK	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
en_IE	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
en_IN	first_name	288 ➡ 288	742 ➡ 0	454 ➡ 454
en_IN	last_name	0 ➡ 0	92 ➡ 92	0 ➡ 0
en_IN	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
en_NG	first_name	31 ➡ 31	98 ➡ 0	67 ➡ 67
en_NG	last_name	0 ➡ 0	156 ➡ 156	0 ➡ 0
en_NG	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
en_US	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
en_ZA	first_name	291 ➡ 288	546 ➡ 11	250 ➡ 247
en_ZA	last_name	0 ➡ 0	237 ➡ 237	0 ➡ 0
en_ZA	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
eo	first_name	90 ➡ 89	179 ➡ 1	90 ➡ 89
eo	last_name	0 ➡ 0	100 ➡ 100	0 ➡ 0
eo	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
eo	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
es	first_name	11 ➡ 11	213 ➡ 198	18 ➡ 18
es	last_name	0 ➡ 0	625 ➡ 625	0 ➡ 0
es	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
es	prefix	2 ➡ 2	3 ➡ 0	1 ➡ 1
es_MX	first_name	161 ➡ 161	300 ➡ 0	139 ➡ 139
es_MX	last_name	0 ➡ 0	687 ➡ 687	0 ➡ 0
es_MX	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
es_MX	prefix	2 ➡ 2	3 ➡ 0	1 ➡ 1
fa	first_name	67 ➡ 67	715 ➡ 639	73 ➡ 73
fa	last_name	0 ➡ 0	144 ➡ 144	0 ➡ 0
fa	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
fa	prefix	2 ➡ 1	3 ➡ 1	2 ➡ 1
fi	first_name	50 ➡ 50	100 ➡ 0	50 ➡ 50
fi	last_name	0 ➡ 0	50 ➡ 50	0 ➡ 0
fi	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
fr	first_name	451 ➡ 435	931 ➡ 16	496 ➡ 480
fr	last_name	0 ➡ 0	150 ➡ 150	0 ➡ 0
fr	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
fr	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
fr_BE	first_name	1338 ➡ 1293	2591 ➡ 45	1299 ➡ 1254
fr_BE	last_name	0 ➡ 0	615 ➡ 615	0 ➡ 0
fr_BE	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
fr_BE	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
fr_CA	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
fr_CH	first_name	451 ➡ 447	898 ➡ 4	451 ➡ 447
fr_CH	last_name	0 ➡ 0	199 ➡ 199	0 ➡ 0
fr_CH	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
fr_CH	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
fr_LU	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
fr_SN	first_name	79 ➡ 78	181 ➡ 1	103 ➡ 102
fr_SN	last_name	0 ➡ 0	148 ➡ 148	0 ➡ 0
fr_SN	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
he	first_name	336 ➡ 218	541 ➡ 118	323 ➡ 205
he	last_name	0 ➡ 0	738 ➡ 738	0 ➡ 0
he	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
he	prefix	4 ➡ 1	5 ➡ 3	4 ➡ 1
hr	first_name	238 ➡ 234	405 ➡ 4	171 ➡ 167
hr	last_name	0 ➡ 0	1000 ➡ 1000	0 ➡ 0
hr	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
hr	prefix	3 ➡ 2	4 ➡ 1	2 ➡ 1
hu	first_name	100 ➡ 100	200 ➡ 0	100 ➡ 100
hu	last_name	0 ➡ 0	100 ➡ 100	0 ➡ 0
hu	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
hu	prefix	2 ➡ 0	2 ➡ 2	2 ➡ 0
hy	first_name	46 ➡ 46	91 ➡ 0	45 ➡ 45
hy	last_name	0 ➡ 0	47 ➡ 47	0 ➡ 0
hy	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
id_ID	first_name	263 ➡ 259	752 ➡ 4	493 ➡ 489
id_ID	last_name	109 ➡ 108	257 ➡ 1	149 ➡ 148
id_ID	last_name_pattern	1 ➡ 1	0 ➡ 0	1 ➡ 1
it	first_name	617 ➡ 617	1700 ➡ 6	1083 ➡ 1083
it	last_name	0 ➡ 0	2170 ➡ 2170	0 ➡ 0
it	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
it	prefix	4 ➡ 0	4 ➡ 4	4 ➡ 0
ja	first_name	145 ➡ 144	279 ➡ 1	135 ➡ 134
ja	last_name	0 ➡ 0	20 ➡ 20	0 ➡ 0
ja	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
ka_GE	first_name	0 ➡ 0	494 ➡ 494	0 ➡ 0
ka_GE	last_name	0 ➡ 0	169 ➡ 169	0 ➡ 0
ka_GE	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
ka_GE	prefix	2 ➡ 2	4 ➡ 0	2 ➡ 2
ko	first_name	300 ➡ 239	540 ➡ 61	301 ➡ 240
ko	last_name	0 ➡ 0	112 ➡ 112	0 ➡ 0
ko	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
lv	first_name	105 ➡ 105	196 ➡ 0	91 ➡ 91
lv	last_name	207 ➡ 195	401 ➡ 12	206 ➡ 194
lv	last_name_pattern	2 ➡ 2	0 ➡ 0	2 ➡ 2
lv	prefix	3 ➡ 0	3 ➡ 3	3 ➡ 0
mk	first_name	232 ➡ 232	515 ➡ 0	283 ➡ 283
mk	last_name	495 ➡ 458	951 ➡ 37	493 ➡ 456
mk	last_name_pattern	1 ➡ 1	0 ➡ 0	1 ➡ 1
mk	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
nb_NO	first_name	50 ➡ 50	100 ➡ 0	50 ➡ 50
nb_NO	last_name	0 ➡ 0	100 ➡ 100	0 ➡ 0
nb_NO	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
nb_NO	prefix	2 ➡ 0	2 ➡ 2	2 ➡ 0
ne	first_name	18 ➡ 18	55 ➡ 0	37 ➡ 37
ne	last_name	0 ➡ 0	39 ➡ 39	0 ➡ 0
ne	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
nl	first_name	514 ➡ 499	49 ➡ 15	587 ➡ 572
nl	last_name	0 ➡ 0	131 ➡ 131	0 ➡ 0
nl	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
nl	prefix	7 ➡ 1	8 ➡ 6	7 ➡ 1
nl_BE	first_name	99 ➡ 99	199 ➡ 0	100 ➡ 100
nl_BE	last_name	0 ➡ 0	32 ➡ 32	0 ➡ 0
nl_BE	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
nl_BE	prefix	4 ➡ 0	4 ➡ 4	4 ➡ 0
pl	first_name	163 ➡ 163	393 ➡ 0	230 ➡ 230
pl	last_name	0 ➡ 0	712 ➡ 712	0 ➡ 0
pl	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
pl	prefix	1 ➡ 1	2 ➡ 0	1 ➡ 1
pt_BR	first_name	80 ➡ 80	169 ➡ 1	88 ➡ 88
pt_BR	last_name	0 ➡ 0	21 ➡ 21	0 ➡ 0
pt_BR	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
pt_BR	prefix	3 ➡ 3	5 ➡ 0	2 ➡ 2
pt_PT	first_name	93 ➡ 93	188 ➡ 0	95 ➡ 95
pt_PT	last_name	0 ➡ 0	121 ➡ 121	0 ➡ 0
pt_PT	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
pt_PT	prefix	8 ➡ 8	16 ➡ 0	8 ➡ 8
ro	first_name	387 ➡ 387	674 ➡ 0	287 ➡ 287
ro	last_name	0 ➡ 0	300 ➡ 300	0 ➡ 0
ro	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
ro	prefix	2 ➡ 1	3 ➡ 1	2 ➡ 1
ro_MD	first_name	256 ➡ 245	460 ➡ 11	215 ➡ 204
ro_MD	last_name	0 ➡ 0	299 ➡ 299	0 ➡ 0
ro_MD	prefix	2 ➡ 1	3 ➡ 1	2 ➡ 1
ru	first_name	80 ➡ 80	401 ➡ 0	321 ➡ 321
ru	last_name	250 ➡ 250	500 ➡ 0	250 ➡ 250
ru	last_name_pattern	1 ➡ 1	0 ➡ 0	1 ➡ 1
ru	middle_name	79 ➡ 79	0 ➡ 0	132 ➡ 132
sk	first_name	200 ➡ 199	391 ➡ 1	192 ➡ 191
sk	last_name	251 ➡ 251	508 ➡ 0	257 ➡ 257
sk	last_name_pattern	1 ➡ 1	0 ➡ 0	1 ➡ 1
sk	prefix	4 ➡ 0	4 ➡ 4	4 ➡ 0
sr_RS_latin	first_name	200 ➡ 200	400 ➡ 0	200 ➡ 200
sr_RS_latin	last_name	0 ➡ 0	999 ➡ 999	0 ➡ 0
sv	first_name	100 ➡ 100	200 ➡ 0	100 ➡ 100
sv	last_name	0 ➡ 0	100 ➡ 100	0 ➡ 0
sv	last_name_pattern	0 ➡ 0	2 ➡ 2	0 ➡ 0
sv	prefix	3 ➡ 0	3 ➡ 3	3 ➡ 0
th	first_name	687 ➡ 681	1159 ➡ 6	478 ➡ 472
th	last_name	0 ➡ 0	111 ➡ 111	0 ➡ 0
th	prefix	3 ➡ 1	4 ➡ 2	3 ➡ 1
tr	first_name	404 ➡ 392	730 ➡ 679	735 ➡ 723
tr	last_name	0 ➡ 0	198 ➡ 198	0 ➡ 0
tr	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
tr	prefix	3 ➡ 1	4 ➡ 2	3 ➡ 1
uk	first_name	192 ➡ 192	387 ➡ 0	195 ➡ 195
uk	last_name	230 ➡ 58	297 ➡ 172	239 ➡ 67
uk	last_name_pattern	1 ➡ 1	0 ➡ 0	1 ➡ 1
uk	middle_name	116 ➡ 116	0 ➡ 0	116 ➡ 116
uk	prefix	1 ➡ 1	2 ➡ 0	1 ➡ 1
ur	first_name	18 ➡ 18	36 ➡ 0	18 ➡ 18
ur	last_name	0 ➡ 0	20 ➡ 20	0 ➡ 0
ur	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
ur	prefix	2 ➡ 1	3 ➡ 1	2 ➡ 1
uz_UZ_latin	first_name	133 ➡ 133	360 ➡ 0	227 ➡ 227
uz_UZ_latin	last_name	209 ➡ 207	416 ➡ 2	209 ➡ 207
uz_UZ_latin	last_name_pattern	1 ➡ 1	0 ➡ 0	1 ➡ 1
vi	first_name	1298 ➡ 1264	2488 ➡ 34	1224 ➡ 1190
vi	last_name	0 ➡ 0	26 ➡ 26	0 ➡ 0
vi	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
yo_NG	first_name	84 ➡ 84	61 ➡ 61	86 ➡ 86
yo_NG	last_name	0 ➡ 0	98 ➡ 98	0 ➡ 0
yo_NG	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
zh_CN	first_name	85 ➡ 85	164 ➡ 115	78 ➡ 78
zh_CN	last_name	0 ➡ 0	1000 ➡ 1000	0 ➡ 0
zh_CN	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
zh_TW	first_name	41 ➡ 41	113 ➡ 0	72 ➡ 72
zh_TW	last_name	0 ➡ 0	100 ➡ 100	0 ➡ 0
zh_TW	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0
zu_ZA	first_name	49 ➡ 48	98 ➡ 1	50 ➡ 49
zu_ZA	last_name	0 ➡ 0	96 ➡ 96	0 ➡ 0
zu_ZA	last_name_pattern	0 ➡ 0	1 ➡ 1	0 ➡ 0

matthewmayer · 2024-11-15T13:55:36Z

One issue I can see is that in some cases you end up with only, say, 1 surviving entry in generic,

and then that single name will be returned 20% of the time?

matthewmayer · 2024-11-15T13:56:33Z

Also, the en generic first names are not actually all generic.

matthewmayer · 2024-11-15T14:11:16Z

I sorted by entry first and removed the 0 ➡ 0 entries to make it easier to skim:

locale	entry	female	generic	male
af_ZA	first_name	107 ➡ 107	219 ➡ 0	113 ➡ 113
ar	first_name	10 ➡ 10	327 ➡ 0	331 ➡ 331
az	first_name	73 ➡ 73	108 ➡ 0	35 ➡ 35
cs_CZ	first_name	785 ➡ 783	1578 ➡ 2	795 ➡ 793
da	first_name	109 ➡ 109	227 ➡ 0	118 ➡ 118
de	first_name	583 ➡ 573	1145 ➡ 10	572 ➡ 562
de_AT	first_name	573 ➡ 573	1145 ➡ 0	572 ➡ 572
de_CH	first_name	138 ➡ 137	316 ➡ 1	179 ➡ 178
dv	first_name	49 ➡ 49	63 ➡ 0	14 ➡ 14
el	first_name	19 ➡ 19	55 ➡ 0	36 ➡ 36
en	first_name	500 ➡ 473	3005 ➡ 2240	500 ➡ 473
en_AU	first_name	100 ➡ 100	200 ➡ 0	100 ➡ 100
en_AU_ocker	first_name		104 ➡ 104
en_GH	first_name	132 ➡ 131	261 ➡ 1	130 ➡ 129
en_IN	first_name	288 ➡ 288	742 ➡ 0	454 ➡ 454
en_NG	first_name	31 ➡ 31	98 ➡ 0	67 ➡ 67
en_ZA	first_name	291 ➡ 288	546 ➡ 11	250 ➡ 247
eo	first_name	90 ➡ 89	179 ➡ 1	90 ➡ 89
es	first_name	11 ➡ 11	213 ➡ 198	18 ➡ 18
es_MX	first_name	161 ➡ 161	300 ➡ 0	139 ➡ 139
fa	first_name	67 ➡ 67	715 ➡ 639	73 ➡ 73
fi	first_name	50 ➡ 50	100 ➡ 0	50 ➡ 50
fr	first_name	451 ➡ 435	931 ➡ 16	496 ➡ 480
fr_BE	first_name	1338 ➡ 1293	2591 ➡ 45	1299 ➡ 1254
fr_CH	first_name	451 ➡ 447	898 ➡ 4	451 ➡ 447
fr_SN	first_name	79 ➡ 78	181 ➡ 1	103 ➡ 102
he	first_name	336 ➡ 218	541 ➡ 118	323 ➡ 205
hr	first_name	238 ➡ 234	405 ➡ 4	171 ➡ 167
hu	first_name	100 ➡ 100	200 ➡ 0	100 ➡ 100
hy	first_name	46 ➡ 46	91 ➡ 0	45 ➡ 45
id_ID	first_name	263 ➡ 259	752 ➡ 4	493 ➡ 489
it	first_name	617 ➡ 617	1700 ➡ 6	1083 ➡ 1083
ja	first_name	145 ➡ 144	279 ➡ 1	135 ➡ 134
ka_GE	first_name		494 ➡ 494
ko	first_name	300 ➡ 239	540 ➡ 61	301 ➡ 240
lv	first_name	105 ➡ 105	196 ➡ 0	91 ➡ 91
mk	first_name	232 ➡ 232	515 ➡ 0	283 ➡ 283
nb_NO	first_name	50 ➡ 50	100 ➡ 0	50 ➡ 50
ne	first_name	18 ➡ 18	55 ➡ 0	37 ➡ 37
nl	first_name	514 ➡ 499	49 ➡ 15	587 ➡ 572
nl_BE	first_name	99 ➡ 99	199 ➡ 0	100 ➡ 100
pl	first_name	163 ➡ 163	393 ➡ 0	230 ➡ 230
pt_BR	first_name	80 ➡ 80	169 ➡ 1	88 ➡ 88
pt_PT	first_name	93 ➡ 93	188 ➡ 0	95 ➡ 95
ro	first_name	387 ➡ 387	674 ➡ 0	287 ➡ 287
ro_MD	first_name	256 ➡ 245	460 ➡ 11	215 ➡ 204
ru	first_name	80 ➡ 80	401 ➡ 0	321 ➡ 321
sk	first_name	200 ➡ 199	391 ➡ 1	192 ➡ 191
sr_RS_latin	first_name	200 ➡ 200	400 ➡ 0	200 ➡ 200
sv	first_name	100 ➡ 100	200 ➡ 0	100 ➡ 100
th	first_name	687 ➡ 681	1159 ➡ 6	478 ➡ 472
tr	first_name	404 ➡ 392	730 ➡ 679	735 ➡ 723
uk	first_name	192 ➡ 192	387 ➡ 0	195 ➡ 195
ur	first_name	18 ➡ 18	36 ➡ 0	18 ➡ 18
uz_UZ_latin	first_name	133 ➡ 133	360 ➡ 0	227 ➡ 227
vi	first_name	1298 ➡ 1264	2488 ➡ 34	1224 ➡ 1190
yo_NG	first_name	84 ➡ 84	61 ➡ 61	86 ➡ 86
zh_CN	first_name	85 ➡ 85	164 ➡ 115	78 ➡ 78
zh_TW	first_name	41 ➡ 41	113 ➡ 0	72 ➡ 72
zu_ZA	first_name	49 ➡ 48	98 ➡ 1	50 ➡ 49
af_ZA	last_name		162 ➡ 162
ar	last_name		76 ➡ 76
az	last_name	10 ➡ 10	20 ➡ 0	10 ➡ 10
cs_CZ	last_name	991 ➡ 980	1979 ➡ 11	999 ➡ 988
da	last_name		106 ➡ 106
de	last_name		1688 ➡ 1688
de_AT	last_name		1688 ➡ 1688
de_CH	last_name		209 ➡ 209
dv	last_name	248 ➡ 243	355 ➡ 5	112 ➡ 107
el	last_name		200 ➡ 200
en	last_name		473 ➡ 473
en_AU	last_name		286 ➡ 286
en_AU_ocker	last_name		24 ➡ 24
en_GH	last_name		120 ➡ 120
en_HK	last_name		97 ➡ 97
en_IN	last_name		92 ➡ 92
en_NG	last_name		156 ➡ 156
en_ZA	last_name		237 ➡ 237
eo	last_name		100 ➡ 100
es	last_name		625 ➡ 625
es_MX	last_name		687 ➡ 687
fa	last_name		144 ➡ 144
fi	last_name		50 ➡ 50
fr	last_name		150 ➡ 150
fr_BE	last_name		615 ➡ 615
fr_CH	last_name		199 ➡ 199
fr_SN	last_name		148 ➡ 148
he	last_name		738 ➡ 738
hr	last_name		1000 ➡ 1000
hu	last_name		100 ➡ 100
hy	last_name		47 ➡ 47
id_ID	last_name	109 ➡ 108	257 ➡ 1	149 ➡ 148
it	last_name		2170 ➡ 2170
ja	last_name		20 ➡ 20
ka_GE	last_name		169 ➡ 169
ko	last_name		112 ➡ 112
lv	last_name	207 ➡ 195	401 ➡ 12	206 ➡ 194
mk	last_name	495 ➡ 458	951 ➡ 37	493 ➡ 456
nb_NO	last_name		100 ➡ 100
ne	last_name		39 ➡ 39
nl	last_name		131 ➡ 131
nl_BE	last_name		32 ➡ 32
pl	last_name		712 ➡ 712
pt_BR	last_name		21 ➡ 21
pt_PT	last_name		121 ➡ 121
ro	last_name		300 ➡ 300
ro_MD	last_name		299 ➡ 299
ru	last_name	250 ➡ 250	500 ➡ 0	250 ➡ 250
sk	last_name	251 ➡ 251	508 ➡ 0	257 ➡ 257
sr_RS_latin	last_name		999 ➡ 999
sv	last_name		100 ➡ 100
th	last_name		111 ➡ 111
tr	last_name		198 ➡ 198
uk	last_name	230 ➡ 58	297 ➡ 172	239 ➡ 67
ur	last_name		20 ➡ 20
uz_UZ_latin	last_name	209 ➡ 207	416 ➡ 2	209 ➡ 207
vi	last_name		26 ➡ 26
yo_NG	last_name		98 ➡ 98
zh_CN	last_name		1000 ➡ 1000
zh_TW	last_name		100 ➡ 100
zu_ZA	last_name		96 ➡ 96
af_ZA	last_name_pattern		1 ➡ 1
ar	last_name_pattern		1 ➡ 1
az	last_name_pattern	1 ➡ 1		1 ➡ 1
cs_CZ	last_name_pattern	1 ➡ 1		1 ➡ 1
da	last_name_pattern		2 ➡ 2
de	last_name_pattern		1 ➡ 1
de_AT	last_name_pattern		1 ➡ 1
de_CH	last_name_pattern		1 ➡ 1
dv	last_name_pattern	1 ➡ 1		1 ➡ 1
el	last_name_pattern		1 ➡ 1
en	last_name_pattern		2 ➡ 2
en_AU	last_name_pattern		2 ➡ 2
en_AU_ocker	last_name_pattern		2 ➡ 2
en_BORK	last_name_pattern		2 ➡ 2
en_CA	last_name_pattern		2 ➡ 2
en_GB	last_name_pattern		2 ➡ 2
en_GH	last_name_pattern		2 ➡ 2
en_HK	last_name_pattern		1 ➡ 1
en_IE	last_name_pattern		2 ➡ 2
en_IN	last_name_pattern		2 ➡ 2
en_NG	last_name_pattern		2 ➡ 2
en_US	last_name_pattern		2 ➡ 2
en_ZA	last_name_pattern		2 ➡ 2
eo	last_name_pattern		2 ➡ 2
es	last_name_pattern		1 ➡ 1
es_MX	last_name_pattern		2 ➡ 2
fa	last_name_pattern		1 ➡ 1
fi	last_name_pattern		1 ➡ 1
fr	last_name_pattern		1 ➡ 1
fr_BE	last_name_pattern		1 ➡ 1
fr_CA	last_name_pattern		1 ➡ 1
fr_CH	last_name_pattern		1 ➡ 1
fr_LU	last_name_pattern		1 ➡ 1
fr_SN	last_name_pattern		1 ➡ 1
he	last_name_pattern		1 ➡ 1
hr	last_name_pattern		1 ➡ 1
hu	last_name_pattern		1 ➡ 1
hy	last_name_pattern		1 ➡ 1
id_ID	last_name_pattern	1 ➡ 1		1 ➡ 1
it	last_name_pattern		1 ➡ 1
ja	last_name_pattern		1 ➡ 1
ka_GE	last_name_pattern		1 ➡ 1
ko	last_name_pattern		1 ➡ 1
lv	last_name_pattern	2 ➡ 2		2 ➡ 2
mk	last_name_pattern	1 ➡ 1		1 ➡ 1
nb_NO	last_name_pattern		2 ➡ 2
ne	last_name_pattern		1 ➡ 1
nl	last_name_pattern		1 ➡ 1
nl_BE	last_name_pattern		1 ➡ 1
pl	last_name_pattern		1 ➡ 1
pt_BR	last_name_pattern		1 ➡ 1
pt_PT	last_name_pattern		1 ➡ 1
ro	last_name_pattern		1 ➡ 1
ru	last_name_pattern	1 ➡ 1		1 ➡ 1
sk	last_name_pattern	1 ➡ 1		1 ➡ 1
sv	last_name_pattern		2 ➡ 2
tr	last_name_pattern		1 ➡ 1
uk	last_name_pattern	1 ➡ 1		1 ➡ 1
ur	last_name_pattern		1 ➡ 1
uz_UZ_latin	last_name_pattern	1 ➡ 1		1 ➡ 1
vi	last_name_pattern		1 ➡ 1
yo_NG	last_name_pattern		1 ➡ 1
zh_CN	last_name_pattern		1 ➡ 1
zh_TW	last_name_pattern		1 ➡ 1
zu_ZA	last_name_pattern		1 ➡ 1
da	middle_name	30 ➡ 0	30 ➡ 30	30 ➡ 0
en	middle_name	210 ➡ 207	62 ➡ 40	98 ➡ 95
ru	middle_name	79 ➡ 79		132 ➡ 132
uk	middle_name	116 ➡ 116		116 ➡ 116
ar	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
cs_CZ	prefix	4 ➡ 0	4 ➡ 4	4 ➡ 0
da	prefix	1 ➡ 1	2 ➡ 0	1 ➡ 1
de	prefix	3 ➡ 1	4 ➡ 2	3 ➡ 1
de_AT	prefix	3 ➡ 1	4 ➡ 2	3 ➡ 1
de_CH	prefix	3 ➡ 1	4 ➡ 2	3 ➡ 1
dv	prefix	4 ➡ 0	4 ➡ 4	4 ➡ 0
el	prefix	2 ➡ 1	3 ➡ 1	2 ➡ 1
en	prefix	4 ➡ 3	5 ➡ 1	2 ➡ 1
eo	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
es	prefix	2 ➡ 2	3 ➡ 0	1 ➡ 1
es_MX	prefix	2 ➡ 2	3 ➡ 0	1 ➡ 1
fa	prefix	2 ➡ 1	3 ➡ 1	2 ➡ 1
fr	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
fr_BE	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
fr_CH	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
he	prefix	4 ➡ 1	5 ➡ 3	4 ➡ 1
hr	prefix	3 ➡ 2	4 ➡ 1	2 ➡ 1
hu	prefix	2 ➡ 0	2 ➡ 2	2 ➡ 0
it	prefix	4 ➡ 0	4 ➡ 4	4 ➡ 0
ka_GE	prefix	2 ➡ 2	4 ➡ 0	2 ➡ 2
lv	prefix	3 ➡ 0	3 ➡ 3	3 ➡ 0
mk	prefix	4 ➡ 2	5 ➡ 2	3 ➡ 1
nb_NO	prefix	2 ➡ 0	2 ➡ 2	2 ➡ 0
nl	prefix	7 ➡ 1	8 ➡ 6	7 ➡ 1
nl_BE	prefix	4 ➡ 0	4 ➡ 4	4 ➡ 0
pl	prefix	1 ➡ 1	2 ➡ 0	1 ➡ 1
pt_BR	prefix	3 ➡ 3	5 ➡ 0	2 ➡ 2
pt_PT	prefix	8 ➡ 8	16 ➡ 0	8 ➡ 8
ro	prefix	2 ➡ 1	3 ➡ 1	2 ➡ 1
ro_MD	prefix	2 ➡ 1	3 ➡ 1	2 ➡ 1
sk	prefix	4 ➡ 0	4 ➡ 4	4 ➡ 0
sv	prefix	3 ➡ 0	3 ➡ 3	3 ➡ 0
th	prefix	3 ➡ 1	4 ➡ 2	3 ➡ 1
tr	prefix	3 ➡ 1	4 ➡ 2	3 ➡ 1
uk	prefix	1 ➡ 1	2 ➡ 0	1 ➡ 1
ur	prefix	2 ➡ 1	3 ➡ 1	2 ➡ 1

matthewmayer · 2024-11-15T14:19:08Z

> One issue I can see is that in some cases you end up with only, say, 1 surviving entry in generic,
>
> and then that single name will be returned 20% of the time?

Previously:

> "If female/male is requested: Then the method will mostly (80%) return female/male values with some (20%) generic values sprinkled in."

This suggests to me that, rather than a fixed 80% gendered and 20% generic result, if, say, female is requested, it should pick randomly from the female definitions concatenated with the generic definitions, so that locales with only a small number of generic definitions don't keep picking the same small number of generic names.
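The sketch below implements both selection strategies, plus the probability that one particular generic value is returned. The 80%/20% split comes from the quoted description. The function names and the injectable `rng` parameter are hypothetical and are not faker's API. Take ja first_name after this PR, with 144 female entries and 1 generic entry. The fixed split returns that single generic name 20% of the time, whereas concatenation returns it with probability 1/145, about 0.7%.

```typescript
// Returns a float in [0, 1), like Math.random.
type Rng = () => number;

// Fixed split: 20% generic, 80% specific (falls back if one list is empty).
function pickFixedSplit(specific: readonly string[], generic: readonly string[], rng: Rng = Math.random): string {
  const useGeneric = generic.length > 0 && (specific.length === 0 || rng() < 0.2);
  const pool = useGeneric ? generic : specific;
  return pool[Math.floor(rng() * pool.length)];
}

// Suggested alternative: one uniform pick from specific + generic.
function pickConcatenated(specific: readonly string[], generic: readonly string[], rng: Rng = Math.random): string {
  const pool = [...specific, ...generic];
  return pool[Math.floor(rng() * pool.length)];
}

// Probability that one particular generic value is returned.
function genericValueProbability(specificCount: number, genericCount: number, strategy: 'fixed' | 'concat'): number {
  if (genericCount === 0) return 0;
  if (strategy === 'concat') return 1 / (specificCount + genericCount);
  return (specificCount === 0 ? 1 : 0.2) / genericCount;
}
```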

ST-DDT · 2024-11-15T14:22:07Z

> One issue I can see is that in some cases you end up with only, say, 1 surviving entry in generic,
>
> and then that single name will be returned 20% of the time?

Yes. We can add new generic entries to them later or change the distribution. Do you have a suggestion or preferred solution?

ST-DDT · 2024-11-15T14:32:55Z

> This suggests to me that, rather than a fixed 80% gendered and 20% generic result, if, say, female is requested, it should pick randomly from the female definitions concatenated with the generic definitions, so that locales with only a small number of generic definitions don't keep picking the same small number of generic names.

We also considered this.
There is one downside to this: if the (fe)male set is small compared to generic, you get odd distributions, e.g. 50% Mr. and 50% Dr.

We also considered weighting them:

binary.length vs generic.length
binary.length + x vs generic.length
binary.length * x vs generic.length
binary = 4 vs generic = 1 (current PR)
binary = 1 - genericPercentage vs generic = genericPercentage

In summary, we haven't found the perfect solution yet.
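The candidate weightings listed above can be compared by turning each one into a pair of weights and computing the share of specific results. This is an illustrative sketch with hypothetical helper names. `x` and `genericPercentage` are free parameters that the discussion leaves open. Plain length weighting reproduces the odd case mentioned above: 1 male prefix against 1 generic prefix gives a 50/50 split between Mr. and Dr.

```typescript
// [specific weight, generic weight]
type Weights = readonly [number, number];

// binary.length vs generic.length
const byLength = (s: number, g: number): Weights => [s, g];
// binary.length + x vs generic.length
const byLengthPlus = (s: number, g: number, x: number): Weights => [s + x, g];
// binary.length * x vs generic.length
const byLengthTimes = (s: number, g: number, x: number): Weights => [s * x, g];
// binary = 4 vs generic = 1 (current PR)
const fixedFourToOne = (): Weights => [4, 1];
// binary = 1 - genericPercentage vs generic = genericPercentage
const byGenericPercentage = (p: number): Weights => [1 - p, p];

// Fraction of results drawn from the specific (female/male) list.
// Callers would still need to special-case empty lists.
function specificShare([s, g]: Weights): number {
  return s + g === 0 ? 0 : s / (s + g);
}
```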

ST-DDT · 2024-11-15T14:53:49Z

I'm inclined to proceed with a non-optimal weight distribution and tweak it in subsequent PRs.

matthewmayer · 2024-11-16T00:34:37Z

What if the percentage of generic names was something that could be set for each locale definition separately?

```ts
export default {
  generic: ['Dr.'],
  female: ['Mrs.', 'Ms.', 'Miss'],
  male: ['Mr.'],
  generic_probability: 0.1,
};
```

Then you could have, say, a 10% chance of getting a gender-neutral English prefix but a 50% chance of getting a gender-neutral Chinese first_name.
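Here is a sketch of how a per-definition `generic_probability` could be consumed. It assumes the field is optional and falls back to the current 20% when absent. The field is only a proposal in this thread, not an existing faker option, and the function name and `rng` parameter are hypothetical.

```typescript
// Proposed (not existing) optional field from the comment above.
interface PersonEntryDefinitionWithProbability {
  generic?: readonly string[];
  female?: readonly string[];
  male?: readonly string[];
  generic_probability?: number;
}

const DEFAULT_GENERIC_PROBABILITY = 0.2; // current fixed split

function pickWithDefinitionProbability(
  def: PersonEntryDefinitionWithProbability,
  sex: 'female' | 'male',
  rng: () => number = Math.random,
): string | undefined {
  const specific = def[sex] ?? [];
  const generic = def.generic ?? [];
  const p = def.generic_probability ?? DEFAULT_GENERIC_PROBABILITY;
  // Use generic with probability p, or always if there is no specific entry.
  const useGeneric = generic.length > 0 && (specific.length === 0 || rng() < p);
  const pool = useGeneric ? generic : specific;
  return pool.length === 0 ? undefined : pool[Math.floor(rng() * pool.length)];
}
```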

ST-DDT · 2024-11-16T08:01:12Z

> What if the percentage of generic names was something that could be set for each locale definition separately?

Is finding the right percentage for each distribution a (merge or release) blocking issue for you or is that something we can adjust in later PRs?

matthewmayer · 2024-11-16T09:59:46Z

I'd say yes. Having 20 percent of all Japanese first names be the same name, because there is only one generic name, feels like a bug/regression.

matthewmayer · 2024-11-24T10:11:27Z

> Also, the en generic first names are not actually all generic.

I'm not sure what the best way to handle this is. We don't want to leave them all in generic; otherwise, female names would be returned when you asked for male.

So we would have to go through and split the generic names into male and female. There are some (free and paid) APIs that might be able to help with that, like https://genderize.io/

ST-DDT · 2024-11-24T14:53:46Z

My plan for this (if you are fine with it) looks like this:

Merge refactor(locale): sort person data #3269
Do an intermediate release v9.3 (maybe early December)
Merge refactor(person): refine usage of PersonEntryDefinitions #3259
Merge refactor(locale): filter and cleanup PersonEntryDefintions data #3266
Truncate excessive data to 1000
Identify and replace bad data (per locale group)
Do a release after completion v9.4 (maybe January)
(Check for remaining v9.x ToDos)
(Do a final release for v9.5 in February)
(Start working on v10 features in March)

Please let me know what you think of this and what your suggestions are.

matthewmayer · 2024-11-24T15:06:12Z

In general that sounds fine. There's no great hurry for this and we are less likely to accidentally break things if we spread this over a few releases.

However, I think we should try to figure out what we will do with the problematic locales so we don't get stuck in future. Even if we first truncate the en generic first names to 1000, that's a lot to go through by hand.

ST-DDT · 2024-11-24T16:14:47Z

> However, I think we should try to figure out what we will do with the problematic locales so we don't get stuck in future. Even if we first truncate the en generic first names to 1000, that's a lot to go through by hand.

IMO we can either check the existing list, which can be a lot of work, or search for a new list, whichever is easier for us.

matthewmayer · 2024-11-24T23:52:46Z

Would we allow 1000 male and 1000 female names? Or 1000 total across all genders?

ST-DDT · 2024-11-25T00:08:54Z

I think the current script limits it to up to 1000 each.
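If the limit applies to each array separately, as described, a truncation step could look like the sketch below. Keeping the first `max` entries is an assumption made for illustration; the actual script may choose which entries to keep differently. The helper name is hypothetical.

```typescript
interface PersonEntryDefinition {
  generic?: string[];
  female?: string[];
  male?: string[];
}

const MAX_ENTRIES = 1000; // per array, not per definition

// Caps every present array at `max` entries independently.
function truncateDefinition(def: PersonEntryDefinition, max: number = MAX_ENTRIES): PersonEntryDefinition {
  const result: PersonEntryDefinition = {};
  for (const key of ['generic', 'female', 'male'] as const) {
    const values = def[key];
    if (values !== undefined) {
      result[key] = values.slice(0, max);
    }
  }
  return result;
}
```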

ST-DDT · 2024-12-03T12:52:56Z

> I'd say yes. Having 20 percent of all Japanese first names be the same name, because there is only one generic name, feels like a bug/regression.

How about using a ratio of:

sqrt(specific) + 5 vs sqrt(generic)

Percentage of choosing specific

	1 generic	5	10	50	100	500	1000
1 specific	86%	73%	65%	46%	38%	21%	16%
5	88%	76%	70%	51%	42%	24%	19%
10	89%	78%	72%	54%	45%	27%	21%
50	92%	84%	79%	63%	55%	35%	28%
100	94%	87%	83%	68%	60%	40%	32%
500	96%	92%	90%	79%	73%	55%	46%
1000	97%	94%	92%	84%	79%	62%	54%

Percentage of choosing generic

	1 generic	5	10	50	100	500	1000
1 specific	14%	27%	35%	54%	63%	79%	84%
5	12%	24%	30%	49%	58%	76%	81%
10	11%	22%	28%	46%	55%	73%	79%
50	8%	16%	21%	37%	45%	65%	72%
100	6%	13%	17%	32%	40%	60%	68%
500	4%	8%	10%	21%	27%	45%	54%
1000	3%	6%	8%	16%	21%	38%	46%
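Both tables can be regenerated from the proposed weights. The sketch below assumes each percentage was rounded independently. That would explain why 1 specific vs 100 generic shows 38% in one table and 63% in the other, summing to 101%. The function names are hypothetical.

```typescript
const SIZES = [1, 5, 10, 50, 100, 500, 1000];

// Proposed weights: sqrt(specific) + 5 vs sqrt(generic).
function weights(specific: number, generic: number): [number, number] {
  return [Math.sqrt(specific) + 5, Math.sqrt(generic)];
}

function specificPercent(specific: number, generic: number): number {
  const [ws, wg] = weights(specific, generic);
  return Math.round((100 * ws) / (ws + wg));
}

function genericPercent(specific: number, generic: number): number {
  const [ws, wg] = weights(specific, generic);
  return Math.round((100 * wg) / (ws + wg));
}

// Rows: number of specific entries; columns: number of generic entries.
function table(percent: (s: number, g: number) => number): string {
  const header = ['', ...SIZES.map(String)].join('\t');
  const rows = SIZES.map((s) => [String(s), ...SIZES.map((g) => `${percent(s, g)}%`)].join('\t'));
  return [header, ...rows].join('\n');
}

console.log(table(specificPercent));
console.log(table(genericPercent));
```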

ST-DDT · 2024-12-03T12:22:47Z

src/locales/cs_CZ/person/first_name.ts

```diff
-    'Živana',
-    'Žofie',
-  ],
+  generic: ['Nikola', 'René'],
```

Is it safe to move them to female and male respectively?

ST-DDT · 2024-12-19T17:25:14Z

Team Decision

We will use sqrt(specific) * 3 vs sqrt(generic)

Sqrt*3	1 generic	5	10	50	100	500	1000
1 specific	75%	57%	49%	30%	23%	12%	9%
5	87%	75%	68%	49%	40%	23%	18%
10	90%	81%	75%	57%	49%	30%	23%
50	95%	90%	87%	75%	68%	49%	40%
100	97%	93%	90%	81%	75%	57%	49%
500	99%	97%	95%	90%	87%	75%	68%
1000	99%	98%	97%	93%	90%	81%	75%

We believe these values best represent the use case, while leaning towards specific values when a specific sex has been requested.
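Here is a minimal sketch of the decided weighting, for illustration only. `specificShare` and `pickWeighted` are hypothetical names, not faker's implementation, and sampling uniformly within the chosen list is an assumption.

```typescript
// Decided weights: 3 * sqrt(specific) vs sqrt(generic).
function specificShare(specific: number, generic: number): number {
  const ws = 3 * Math.sqrt(specific);
  const wg = Math.sqrt(generic);
  const total = ws + wg;
  return total === 0 ? 0 : ws / total;
}

// Chooses the specific or generic list with the weighted probability above,
// then picks uniformly within it. `rng` is injectable for testing.
function pickWeighted(
  specific: readonly string[],
  generic: readonly string[],
  rng: () => number = Math.random,
): string | undefined {
  const pool = rng() < specificShare(specific.length, generic.length) ? specific : generic;
  return pool.length === 0 ? undefined : pool[Math.floor(rng() * pool.length)];
}
```

Rounding `100 * specificShare(s, g)` reproduces the table cells, e.g. 75% for 1 vs 1 and 57% for 1 specific vs 5 generic. `specificShare(25, 9)` is 15/18 ≈ 0.833. The size of the other sex's list does not enter the calculation.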

matthewmayer · 2024-12-20T03:00:18Z

Just wanted to check I understood this right:

So, for example, if there were 9 generic first names, 25 female first names and 36 male first names, and I request firstName("female"), I'll get a name from the female list versus the generic list in a ratio of

3*sqrt(25) : sqrt(9)

15:3

i.e. I get a name from the female list 15/18 of the time, 83.3 percent.

temp(script): filter PersonEntryDefinitions

b5f6a65

ST-DDT added the p: 1-normal (Nothing urgent), c: locale (Permutes locale definitions) and m: person (Something is referring to the person module) labels on Nov 15, 2024

ST-DDT added this to the vAnytime milestone Nov 15, 2024

ST-DDT self-assigned this Nov 15, 2024

ST-DDT added 3 commits November 15, 2024 14:24

chore: log stats

9c874c1

Merge branch 'refactor/person/sex' into refactor/person/sex-localeData

3d4fd2f

chore: log stats

4f00862

chore: log stats

37f3c96

ST-DDT mentioned this pull request Nov 15, 2024

refactor(person): refine usage of PersonEntryDefinitions #3259

Open

ST-DDT added the s: needs decision (Needs team/maintainer decision) label on Nov 16, 2024

ST-DDT mentioned this pull request Nov 16, 2024

refactor(locale): sort person data #3269

Merged

Merge remote-tracking branch 'origin/refactor/person/sex' into refactor/person/sex-localeData

cd834b4

matthewmayer mentioned this pull request Nov 25, 2024

fix(locale): add Isadora to female names in pt_BR for consistency #3282

Merged

ST-DDT added 4 commits December 3, 2024 12:59

Merge branch 'refactor/person/sex' into refactor/person/sex-localeData

38c0a31

chore: improve stats logging

f87c9f9

chore: revert logging

0f0886c

chore: run generate:locales

f55095f

ST-DDT added 3 commits December 3, 2024 14:08

chore: update snapshots

538faf2

chore: restore pt person prefix comments

ab927a6

chore: fix sk company patterns

8753216

ST-DDT commented Dec 3, 2024

Merge branch 'refactor/person/sex' into refactor/person/sex-localeData

6e7289b

ST-DDT removed the s: needs decision (Needs team/maintainer decision) label on Dec 19, 2024

chore: fix locale patterns

7db34bd
