Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paleolimbot stream reading #2238

Merged
merged 24 commits into from
Oct 1, 2023
Merged

Paleolimbot stream reading #2238

merged 24 commits into from
Oct 1, 2023

Conversation

edzer
Copy link
Member

@edzer edzer commented Sep 30, 2023

@paleolimbot This leaves one test that breaks:

  > if ("GML" %in% st_drivers()$name) {
  +   gml = system.file("gml/fmi_test.gml", package = "sf")
  +   print(dim(st_read(gml, quiet = TRUE)))
  +   gml = system.file("gml/20170930_OB_530964_UKSH.xml.gz", package = "sf")
  +   print(dim(st_read(gml, layer = "Parcely", quiet = TRUE)))
  +   print(dim(st_read(gml, layer = "Parcely", int64_as_string=TRUE, quiet = TRUE)))
  + }
  [1] 22 11
  Error in st_as_sfc.WKB(x) : vapply(x, is.raw, TRUE) are not all TRUE
  Calls: print ... lapply -> FUN -> st_as_sfc -> st_as_sfc.WKB -> stopifnot

we can instrument this to use use_stream=FALSE but maybe this is of interest to look into.

@paleolimbot
Copy link
Contributor

I wonder if that example has a NULL feature (hence is.raw() returning a false). st_as_sfc.wk_wkb() handles that case (but generates invalid sfc for mixed geometry types). It also might be worth calling the CPL_xxx version of the WKB reader directly for speed.

I'm not sure I understand the changes in that last commit but trust you know the internals of this better than I do. You're welcome to continue with/merge this PR if you want...I was planning to add a test to the other PR but I can also add that as a follow-up.

@edzer
Copy link
Member Author

edzer commented Oct 1, 2023

Most changes are because the stream reader returns a geom column called wkb_geometry, and I didn't want this to be always renamed to geometry - some drivers have a different name (GPKG using geom) and others have named, or multiple named geometry columns (databases) where the name matters. All test changes involve shapefiles that assumed the geom column to be called geometry, I relaxed that. But the stream PR is good to go, from your side?

@paleolimbot
Copy link
Contributor

Got it! Yes, the PR is good to go. It does need some testing and possible performance improvements, but I think follow-ups will be better. I think the follow-ups will be better/easier when more people can try reading with use_stream = TRUE and report feedback!

@edzer edzer merged commit 55f4cc8 into main Oct 1, 2023
7 checks passed
@edzer
Copy link
Member Author

edzer commented Oct 1, 2023

Great! with R_SF_ST_READ_USE_STREAM=true R CMD check there are still errors in testthat, but most of it is the different geometry column name, and not converting to multi. We can still see what to do with that.

edzer added a commit that referenced this pull request Oct 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants