Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLDR-17014 No code fallbacks for language paths #4254

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

btangmu
Copy link
Member

@btangmu btangmu commented Dec 21, 2024

-Use the extra-paths mechanism instead of code-fallback for language paths

CLDR-17014

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

-Use the extra-paths mechanism instead of code-fallback for language paths
@btangmu btangmu self-assigned this Dec 21, 2024
Copy link
Member

@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because these are all constant paths, they can be statically initiated once, then just added whenever needed.

I'd suggest a separate class, ExtraPaths, with one method .append to(Collection)

@btangmu
Copy link
Member Author

btangmu commented Dec 21, 2024

a separate class, ExtraPaths

Yes, that sounds good, in fact I was thinking to follow this up with a refactoring PR, to move the extra-path code (not just for languages, but for all extra paths) out of CLDRFile into its own top-level class. It's still not clear to me what part, if any, of extra-path code is locale-specific; the current methods aren't static. It does seem wasteful to re-run code for each locale (each CLDRFile) if it's not locale-specific. Stepping back, though, does the general solution seem right, to use extra paths instead of code fallback for language paths?

There are some test failures, like this, that I still need to study:

  TestCoverageLevel {
    ...
    TestCoverageCompleteness
Error:  (TestCoverageLevel.java:728)  Error: Comprehensive & no exception for path =>	//ldml/localeDisplayNames/languages/language[@type="fa_AF"]

@macchiati
Copy link
Member

Everything in code-fallback is locale independent and immutable. So that's why you can move it all to the new class, and fetch it with no local parameter. We might be able to just call that when building the root cldrfile.

The other stuff in get extra paths is locale dependent. That could also be moved to the new class, but needs a locale parameter when fetched.

@macchiati
Copy link
Member

Stepping back, though, does the general solution seem right, to use extra paths instead of code fallback for language paths?

Yes, with the changes I mention.

-Change es-419 to es (419) in localeDisplayName.txt so TestLocaleDisplay passes

-Add language path in testExtraPaths so it passes

-Remove XMLSource.CODE_FALLBACK_ID, GERMAN, from testGetPaths so it passes
@btangmu
Copy link
Member Author

btangmu commented Dec 22, 2024

The second commit fixes some but not all test failures.

The remaining test failures are TestCoverageCompleteness and testLSR, both in TestCoverageLevel. These need more investigation.

After the tests are all passing, I'll address refactoring/optimizing with a new class ExtraPaths.

-Create the set of language extra paths only once; related refactoring

-Fix another part of testGetPaths so it passes, similar to previous commit

-Comments
private void getLanguageExtraPaths(Set<String> toAddTo) {
Set<String> codes =
StandardCodes.make().getSurveyToolDisplayCodes(NameType.LANGUAGE.getNameName());
codes.remove(XMLSource.ROOT_ID);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is wrong, my mistake -- it actually modifies the set in StandardCodes! Need to make a copy of the set. We might want to prevent this kind of bug by making getSurveyToolDisplayCodes return an unmodifiable set, maybe using Set.copyOf.

We currently have this, which doesn't work: CLDRLanguageCodes = CldrUtility.protectCollection(CLDRLanguageCodes);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by my last commit -- we still should look into Set.copyOf as an alternative to protectCollection

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: In general, use ImmutableSet.copyOf instead of Set.copyOf. The latter changes the order in the set, which may be important (TreeSet or LinkedHashSets get messed up, and it doesn't hurt for HashSet).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we need to fix CldrUtility.protectCollection(CLDRLanguageCodes); — if it does't work then a lot of other things could go wrong.

-Do not modify SupplementalDataInfo.CLDRLanguageCodes, returned by sc.getGoodAvailableCodes!
@@ -1073,7 +1075,7 @@ public void testLSR() {

// Get root LSR codes

for (String path : root) {
for (String path : root.fullIterable()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without fullIterable, the loop skipped the extra paths

this may be a concern in general: how often do we loop through a CLDRFile the simple way, which skips extra paths, and what are the consequences and rationale for that?

@btangmu
Copy link
Member Author

btangmu commented Dec 24, 2024

The test failures all seem fixed now except for TestVettingDataDriven

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants