Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: Fix potential libcrypto lib loading issue for X86 mac runners #55

Merged
merged 2 commits into from
Feb 20, 2024

Conversation

advancedxy
Copy link
Contributor

Which issue does this PR close?

Closes #41

Rationale for this change

Another attempt to fix libcrypto loading issues.
Currently, CIs are still failing randomly with the following logging, for example:
WARNING: /Users/runner/hostedtoolcache/Java_Adopt_jdk/17.0.10-7/x64/Contents/Home/bin/java is loading libcrypto in an unsafe way

What changes are included in this PR?

install openssl for mac runners and setup corresponding DYLD_LIBRARY_PATH

How are these changes tested?

Existed CI workflow

@viirya
Copy link
Member

viirya commented Feb 20, 2024

Looks like it still fails:

WARNING: /Users/runner/hostedtoolcache/Java_Adopt_jdk/17.0.10-7/x64/Contents/Home/bin/java is loading libcrypto in an unsafe way

And, it is in x64 Mac, not aarch64.

@advancedxy
Copy link
Contributor Author

And, it is in x64 Mac, not aarch64.

Ah, yes. The failures observed for now are all on X86 macs.

Let me see if there's other ways to investigate this.

@advancedxy advancedxy marked this pull request as draft February 20, 2024 07:41
@advancedxy advancedxy force-pushed the fix_libcrypto_loading branch from 98658cc to f33fbbd Compare February 20, 2024 11:36
@advancedxy advancedxy force-pushed the fix_libcrypto_loading branch from f33fbbd to 526bd9c Compare February 20, 2024 11:37
@advancedxy
Copy link
Contributor Author

advancedxy commented Feb 20, 2024

Ah, yes. The failures observed for now are all on X86 macs.
Let me see if there's other ways to investigate this.

I believe I have found the root cause to x86 mac is failing loading libcrypto.dylib.

Solutions:

  1. install openssl and setup DYLD_LIBRARY_PATH with openssl's actual lib path (chosen by this PR)
  2. or passing property commons.crypto.cipher.classes=org.apache.commons.crypto.cipher.JceCipher to loading JCE cipher only for apache commons' crypto lib.

Detailed investigations

Q1: why mac runners are failing with loading libcrypto in an unsafe way?
A1: See this comment: cl-plus-ssl/cl-plus-ssl#114 (comment)

Q2: why libcrypto.dylib is loaded? Or who is loading it?
A2: It's loaded by Apache's commons-crypto lib, which is used by Spark to enable IO encryption. After enabling DYLD_PRINT_SEARCHING=true for mac runners, it shows it's loaded by commons-crypto's native lib
image
Log path: https://github.com/apache/arrow-datafusion-comet/actions/runs/7971374594/job/21760923967

Q3: why only X86 runners are failing?
A3: It's because commons-crypto only bundled with x86's native lib.
image
When running under Silicon Macs, there's no native lib and fails loading. However the native code is to provide OpenSslCipher only. Spark's IO encryption doesn't require OpenSSL. JceCipher is sufficient.

Q4: why is it flaky?
A4: Not sure. Maybe the dynamic libs are loaded in different order for each run. And the correct libcrypto.dylib has a chance to be loaded by the JVM.

@advancedxy advancedxy changed the title build: Fix potential libcrypto lib loading issue for mac runners build: Fix potential libcrypto lib loading issue for X86 mac runners Feb 20, 2024
@advancedxy advancedxy marked this pull request as ready for review February 20, 2024 12:18
@advancedxy advancedxy closed this Feb 20, 2024
@advancedxy advancedxy reopened this Feb 20, 2024
@advancedxy
Copy link
Contributor Author

Close and reopen to trigger another run to make sure it actually works.

cc @viirya @sunchao

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the detailed investigation @advancedxy !

@sunchao sunchao merged commit f7b88e9 into apache:main Feb 20, 2024
12 checks passed
@advancedxy advancedxy deleted the fix_libcrypto_loading branch February 21, 2024 01:19
@advancedxy
Copy link
Contributor Author

I'm still observing this issue when running my own PR, https://github.com/advancedxy/arrow-datafusion-comet/actions/runs/7982531503/job/21796212861 which is already rebased with origin/main and include this fix.

I'm not observing this issue in this repo's PRs though after this commit. Looks like that this PR also don't fix it completely, but it reduces the chances. It might be still possible to load the incorrect libcrypto by the JVM.

Let's see how frequent this issue happens. If it's still annoying, we can try to investigate further and try option 2.
Or we can disable X86 mac runners as last resort since it took much longer time than other runners and slows down the CI execution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants