Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VirtualFileSystem NPE on indirect import of from pyexpat import * when using openpyxl #431

Open
jord1e opened this issue Oct 16, 2024 · 9 comments
Assignees

Comments

@jord1e
Copy link

jord1e commented Oct 16, 2024

Hi all,

I was trying to use openpyxl to create some Excel sheets and got a HostException (which is an NPE):

     Traceback (most recent call last):
      File "Unnamed", line 1, in <module>
      File "/graalpy_vfs/venv/lib/python3.11/site-packages/openpyxl/__init__.py", line 7, in <module>
        from openpyxl.workbook import Workbook
      File "/graalpy_vfs/venv/lib/python3.11/site-packages/openpyxl/workbook/__init__.py", line 4, in <module>
        from .workbook import Workbook
      File "/graalpy_vfs/venv/lib/python3.11/site-packages/openpyxl/workbook/workbook.py", line 7, in <module>
        from openpyxl.worksheet.worksheet import Worksheet
      File "/graalpy_vfs/venv/lib/python3.11/site-packages/openpyxl/worksheet/worksheet.py", line 24, in <module>
        from openpyxl.cell import Cell, MergedCell
      File "/graalpy_vfs/venv/lib/python3.11/site-packages/openpyxl/cell/__init__.py", line 3, in <module>
        from .cell import Cell, WriteOnlyCell, MergedCell
      File "/graalpy_vfs/venv/lib/python3.11/site-packages/openpyxl/cell/cell.py", line 27, in <module>
        from openpyxl.styles.styleable import StyleableObject
      File "/graalpy_vfs/venv/lib/python3.11/site-packages/openpyxl/styles/styleable.py", line 13, in <module>
        from .builtins import styles
      File "/graalpy_vfs/venv/lib/python3.11/site-packages/openpyxl/styles/builtins.py", line 1346, in <module>
        ('Normal', NamedStyle.from_tree(fromstring(normal))),
      File "/graalpy_vfs/home/lib-python/3/xml/etree/ElementTree.py", line 1337, in XML
        parser = XMLParser(target=TreeBuilder())
      File "/graalpy_vfs/home/lib-python/3/xml/etree/ElementTree.py", line 1518, in __init__
        from xml.parsers import expat
      File "/graalpy_vfs/home/lib-python/3/xml/parsers/expat.py", line 4, in <module>
        from pyexpat import *
    SystemError: HostException: Cannot invoke "org.graalvm.python.embedding.utils.VirtualFileSystem$BaseEntry.getPlatformPath()" because "entry" is null

Reproduction available here (just run with ./gradlew test - on Linux, pyexpat is not implemented on Windows):
https://github.com/jord1e/poc-gradle-openpyxl-error

CI/CD run with error:
https://github.com/jord1e/poc-gradle-openpyxl-error/actions/runs/11363466654

Diff that creates the error
grafik

I am using the new GraalPy Gradle plugin (org.graalvm.python) and GraalPyResources.createContext() as described on
https://www.graalvm.org/latest/reference-manual/python/Embedding-Build-Tools/#virtual-filesystem

I've not had time to investigate it further

Maybe another side-question:
How to convert a bytes (which is an unsigned byte array in Python) directly to a byte[] in Java? At the moment I am just doing .as(int[].class) and manually looping through it :)

@msimacek
Copy link
Contributor

Thank you for the report. The test diff doesn't seem to matter, it seems it's always the second test that fails (if you swap the order, then works fails and fails succeeds). openpyxl imports pyexpat which is a native module. We cannot always reliably reset the global state of native modules, so only one context can use native modules directly. Later contexts currently fall back on LLVM emulation. It seems there is some bug in loading this fallback path, that's why it only happens in the second context. @tomasstupka could you please have a look? I could still reproduce this in master.

You can work around the issue if you use just one global context for all the tests.

How to convert a bytes (which is an unsigned byte array in Python) directly to a byte[] in Java? At the moment I am just doing .as(int[].class) and manually looping through it :)

In the current release, you can do .as(ByteSequence.class).toByteArray(), it should work with any object that implements the buffer API. In the next release you should be able to also do .as(byte[].class).

@jord1e
Copy link
Author

jord1e commented Oct 16, 2024

Edit: NVM saw your comment, will test it

So what I discovered is that it (importing from openpyxl) works the first time, but not after that

Shown in
https://github.com/jord1e/poc-gradle-openpyxl-error/actions/runs/11364945607
(commit jord1e/poc-gradle-openpyxl-error@dd62b8e)

I tried disabling the sources cache

GraalPyResources.contextBuilder()
        .allowExperimentalOptions(true)
        .option("python.WithCachedSources", "false")
        .build();

But this does not seem to fix the issue

Hopefully that helps in identifying the issue

@jord1e
Copy link
Author

jord1e commented Oct 17, 2024

Hi @msimacek

You can work around the issue if you use just one global context for all the tests.

Thanks, this worked, I used a lock to make it threadsafe in my environment

We cannot always reliably reset the global state of native modules, so only one context can use native modules directly

Does this mean it would be more "efficient" to just keep the global context in my case (low request volume) after the issue is fixed?
Because the native code should be faster than the LLVM-emulated code, right?

Later contexts currently fall back on LLVM emulation

Interesting, is there a reason PyExpat couldn't be emulated using LLVM on Windows in this case? There is a stub that errors on Windows

throw PRaiseNode.raiseUncached(this, PythonBuiltinClassType.NotImplementedError, toTruffleStringUncached("XML pyexpat parser is not implemented"));

In the current release, you can do .as(ByteSequence.class).toByteArray(), it should work with any object that implements the buffer API. In the next release you should be able to also do .as(byte[].class).

Thank you!

@timfel
Copy link
Member

timfel commented Oct 17, 2024

Interesting, is there a reason PyExpat couldn't be emulated using LLVM on Windows in this case? There is a stub that errors on Windows

Yes, we haven't set up building pyexpat in our Windows build (this line in our build file is guarded above by NOT WIN32). There's no reason not do other than the nuisance of setting up the library dependencies on Windows ;)

@jord1e
Copy link
Author

jord1e commented Oct 17, 2024

Interesting, is there a reason PyExpat couldn't be emulated using LLVM on Windows in this case? There is a stub that errors on Windows

Yes, we haven't set up building pyexpat in our Windows build (this line in our build file is guarded above by NOT WIN32). There's no reason not do other than the nuisance of setting up the library dependencies on Windows ;)

Good to know, not a blocker for me at all

The error message was just a bit confusing, I had to search the sources to find out that it is implemented on linux but not on windows - but the next person will most likely end up here ;)

@msimacek
Copy link
Contributor

Does this mean it would be more "efficient" to just keep the global context in my case (low request volume) after the issue is fixed?
Because the native code should be faster than the LLVM-emulated code, right?

It will be better to use one global context even after this is fixed. Whether it's faster depends on the exact scenario, usually it would be. But it would certainly use less memory.

@tomasstupka
Copy link
Member

tomasstupka commented Oct 17, 2024

org.graalvm.python.embedding.utils.VirtualFileSystem$BaseEntry.getPlatformPath()" because "entry" is null

this is fixed already, will be merged into master soon and ready for next release

for what it is worth - i was able to reproduce only when running with more contexts

@JPMoresmau
Copy link

I'm getting this error on the first use of a context, but I'm running my Java code in a Docker container. Is that problematic? Is there a specific thing I need to do to get that import to work under Docker? Or a workaround for this issue? Thanks!

@timfel
Copy link
Member

timfel commented Jan 6, 2025

@JPMoresmau the release with the fix is not out, yet. Unless you are running with a custom build, the issue remains. The next release is due out in February

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants