CLDR-18108 Give all languages a primary script: trivial cases #4210


conradarcturus
Contributor

This change adds "primary" scripts to many languages in language_script.tsv.

This won't change likely subtags; it future-proofs our data by recognizing a single primary script per language, avoiding cases where ambiguity served customers the wrong script.

I also added scripts for languages in country_language_population.tsv that were missing.
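
The "single primary script" invariant described above could be checked with something like the following. This is only a minimal sketch: the column names (`language`, `script`, `status`), the sample rows, the placeholder language codes `xx`/`yy`/`zz`, and the helper `primary_script_problems` are all assumptions for illustration, not the actual layout of language_script.tsv or anything in the CLDR tooling.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: the real language_script.tsv may use different
# columns, and xx/yy/zz are placeholder language codes, not real entries.
SAMPLE_TSV = (
    "language\tscript\tstatus\n"
    "xx\tCyrl\tprimary\n"
    "xx\tLatn\tsecondary\n"
    "yy\tGuru\tprimary\n"
    "yy\tArab\tprimary\n"
    "zz\tLatn\tsecondary\n"
)


def primary_script_problems(tsv_text):
    """Return {language: [primary scripts]} for every language that does
    not have exactly one script marked as primary."""
    primaries = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Touch the key so languages with no primary script are still listed.
        scripts = primaries[row["language"]]
        if row["status"] == "primary":
            scripts.append(row["script"])
    return {lang: s for lang, s in primaries.items() if len(s) != 1}


if __name__ == "__main__":
    for lang, scripts in primary_script_problems(SAMPLE_TSV).items():
        print(lang, scripts)
```

In this sample, `xx` passes, while `yy` (two primary scripts) and `zz` (none) would be reported.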

CLDR-18108

  • This PR completes the ticket.

There are many more tasks, especially more complicated ones, in the parent ticket https://unicode-org.atlassian.net/browse/CLDR-18102

ALLOW_MANY_COMMITS=true

@macchiati
Member

macchiati commented Nov 21, 2024 via email

@conradarcturus conradarcturus force-pushed the conradarcturus/CLDR-18108-add-primary-scripts branch from 3cb6b86 to 7f653f7 Compare November 21, 2024 21:33
@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • common/supplemental/supplementalData.xml is different
  • docs/site/development/updating-codes/update-language-script-info/language-script-description.md is different
  • tools/cldr-code/src/main/java/org/unicode/cldr/tool/ConvertLanguageData.java is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@macchiati
Member

I think we should put this on hold for now, and focus on https://unicode-org.atlassian.net/browse/CLDR-18087.

@conradarcturus conradarcturus force-pushed the conradarcturus/CLDR-18108-add-primary-scripts branch from 191c88d to fd262df Compare November 26, 2024 16:31
@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/language_script.tsv is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@conradarcturus conradarcturus force-pushed the conradarcturus/CLDR-18108-add-primary-scripts branch from fd262df to 7b4a73a Compare November 26, 2024 16:35
@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • common/supplemental/supplementalData.xml is different
  • tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/language_script.tsv is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Member

@srl295 srl295 left a comment


LGTM, but I see there's other discussion

@conradarcturus
Contributor Author

> I think we should put this on hold for now, and focus on https://unicode-org.atlassian.net/browse/CLDR-18087.

I can hold off on this change -- but I would rather close out easy tickets in parallel with working on the more complex ones. Deprecating <languageData> surfaces problems that are taking a while to fix, so I don't know if I can finish that before the new year.

This change adds "primary" scripts to many languages in language_script.tsv.

This won't change likely subtags; it future-proofs our data by recognizing a single primary script per language, avoiding cases where ambiguity served customers the wrong script.

I also added scripts for languages in country_language_population.tsv that were missing.
Updated the ConvertLanguageData script to avoid demoting historical scripts/historical languages.

Also removed multi-primary script notes from the description -- anticipating a re-design, handled by other tasks.
mvn spotless:apply
`und` -> `Latn` makes sense in many contexts, but `Zyyy` (Undetermined) may make sense as well. To avoid unanticipated side effects, let's remove this row and only add it back if we need it.
@conradarcturus conradarcturus force-pushed the conradarcturus/CLDR-18108-add-primary-scripts branch from 7b4a73a to 630e20a Compare December 2, 2024 18:18
@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • common/supplemental/supplementalData.xml is different
  • docs/site/development/updating-codes/update-language-script-info/language-script-description.md is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@conradarcturus
Contributor Author

As discussed in the Unicode CLDR Design meeting, let's move ahead with the straightforward updates -- and work on the long-term changes in parallel. Many of the long-term changes require larger refactors, so I think we should still merge this change so that data quality improves incrementally, rather than through large, hard-to-review changes.

srl295
srl295 previously requested changes Dec 4, 2024
Member

@srl295 srl295 left a comment


broken links otherwise OK

We should make sure this goes in the release notes under migration

conradarcturus and others added 2 commits December 4, 2024 08:37
Co-authored-by: Steven R. Loomis <[email protected]>
Co-authored-by: Steven R. Loomis <[email protected]>
…e-script-info/language-script-description.md

Co-authored-by: Mark Davis <[email protected]>
@conradarcturus conradarcturus dismissed srl295’s stale review December 10, 2024 06:53

Requested changes made

@conradarcturus conradarcturus merged commit c5fbc96 into unicode-org:main Dec 10, 2024
12 of 13 checks passed