
Apache arrow 14.0.2 hotfix #1

Closed
wants to merge 73 commits into from


k-anshul

Changes from these PRs
apache#42003
apache#41638

rok and others added 30 commits October 11, 2023 16:55
…mensions, implemented using ExtensionType (apache#37166)

### Rationale for this change

For use cases where the underlying data type and the number of dimensions are the same across tensors, but the actual shapes differ, we want to add a `VariableShapeTensorType`.
See apache#24868 and huggingface/datasets#5272

### What changes are included in this PR?

This introduces the definition of the `arrow.variable_shape_tensor` extension type, its C++ implementation, and a Python wrapper.
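To illustrate the idea (not the Arrow API): every element of a variable-shape tensor column shares the value type and the number of dimensions, but carries its own shape next to its flattened data. The sketch below is plain C++ loosely modeled on that layout; `VariableShapeTensorElement` and `IsConsistent` are hypothetical names chosen for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one element of a variable-shape tensor column.
// All elements share the value type (float here) and the number of
// dimensions (NDim), but each element has its own shape.
template <std::size_t NDim>
struct VariableShapeTensorElement {
  std::vector<float> data;          // flattened (row-major) tensor values
  std::array<int32_t, NDim> shape;  // per-element shape, fixed length NDim
};

// Returns true if the number of flattened values equals the product of the
// element's shape, i.e. the element describes a well-formed tensor.
template <std::size_t NDim>
bool IsConsistent(const VariableShapeTensorElement<NDim>& t) {
  const int64_t expected =
      std::accumulate(t.shape.begin(), t.shape.end(), int64_t{1},
                      std::multiplies<int64_t>());
  return expected == static_cast<int64_t>(t.data.size());
}
```

For example, a 2x3 and a 2x2 matrix can live in the same column of `VariableShapeTensorElement<2>` values, whereas a fixed-shape tensor type would require every element to have the same shape.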

### Are these changes tested?

Yes.

### Are there any user-facing changes?

This introduces a new extension type to users.
* Closes: apache#24868

Lead-authored-by: Rok Mihevc <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
…pache#37901)

### Rationale for this change

Python 3.12 will be released in the next couple of weeks. We should add the wheels for pyarrow on our 14.0.0 release.

### What changes are included in this PR?

This PR adds jobs to build pyarrow wheels for Python 3.12.

### Are these changes tested?

They will be tested via archery tasks

### Are there any user-facing changes?

No, but users will be able to use pyarrow with Python 3.12.

* Closes: apache#37880

Authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
…conan (apache#38202)

### Rationale for this change
There is a conflict between the required Zlib version when using both thrift and GRPC.

### What changes are included in this PR?

Pin zlib when using thrift.

### Are these changes tested?

Via archery

### Are there any user-facing changes?

No
* Closes: apache#38201

Authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Jacob Wujciak-Jens <[email protected]>
### Rationale for this change

The NEWS file needs updating for 14.0.0.

### What changes are included in this PR?

The NEWS file is updated with commits since 13.0.0.

### Are these changes tested?

N/A

### Are there any user-facing changes?

No
* Closes: apache#38142

Lead-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Nic Crane <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
…eight default (small) on smaller screens (apache#38148)

### Rationale for this change

The Sphinx theme we use (PyData Sphinx Theme) had been pinned to an older version for a while; with apache#36591 we updated the code and now use version 0.14.0 for the dev docs.

This PR fixes bugs we have encountered since the PR updating the theme was merged.

### What changes are included in this PR?

- Use the default header size on smaller screens and keep the increased size on bigger screens.

* Closes: apache#38209

Authored-by: AlenkaF <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
…#38176)

### Rationale for this change

It's an internal bundled library. We should not install it as a part of Arrow.

### What changes are included in this PR?

Exclude all Azure SDK for C++ jobs from the default targets so that they, including the install jobs, aren't executed by default. The build jobs are still executed because they are required to build Arrow.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* Closes: apache#37510

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
…pache#38222)

### Rationale for this change

Module caches don't have write permission for the owner, so we can't remove them with `rm -rf`.

### What changes are included in this PR?

Run `go clean -modcache` after all builds.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* Closes: apache#38200

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
### Rationale for this change

The test-r-versions job is failing because not all of our dependencies support R 3.5. We follow the tidyverse support policy where possible, which means we only support R 3.6 and above. Thus, we can drop the test for R 3.5.

### What changes are included in this PR?

R 3.5 was removed from the test matrix for test-r-versions

### Are these changes tested?

Yes

### Are there any user-facing changes?

No
* Closes: apache#38226

Authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Jacob Wujciak-Jens <[email protected]>
…ncryption tests (apache#38244)

* Closes: apache#38243

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
apache#38229)

### Rationale for this change

The minimal nightly builds are failing on examples that won't run without the dataset feature.

### What changes are included in this PR?

- Added `examplesIf` where needed
- Redocumented

### Are these changes tested?

Yes, by all R CMD check jobs

### Are there any user-facing changes?

No
* Closes: apache#38228

Authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
…r@v2 (apache#38218)

### Rationale for this change

CI jobs that used `setup-r@v1` no longer run without error.

### What changes are included in this PR?

- Updated the rchk job to use `setup-r@v2`
- Updated the devdocs job to use `setup-r@v2`. To make this work, we needed to remove the Windows build because it was installing an old version of R. It seems that the job has been running an outdated setup, unusable for most users, for a very long time.

### Are these changes tested?

Will be covered by crossbow jobs submitted below.

### Are there any user-facing changes?

No.
* Closes: apache#38197

Lead-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
apache#38232)

### Rationale for this change

We have several nightly builds failing with errors building the manual as a result of unicode characters. The unicode characters aren't new, so I'm not sure why this happened now.

### What changes are included in this PR?

Install a distribution of latex that supports unicode characters (maybe)?

### Are these changes tested?

Yes

### Are there any user-facing changes?

No
* Closes: apache#38227

Lead-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Jacob Wujciak-Jens <[email protected]>
### Rationale for this change

The latest version of `r/R/install-arrow.R` was not working properly, since it relied on the `on_rosetta()` function, which is not defined elsewhere. This PR fixes how Rosetta is detected in the script.

With the current code, the following gives an error

````r
> source("https://raw.githubusercontent.com/apache/arrow/master/r/R/install-arrow.R") 
> install_arrow()
Error in on_rosetta() : could not find function "on_rosetta"
````

### What changes are included in this PR?

It only removes the call to the `on_rosetta()` function, which was not defined elsewhere, and reverts to using the `rosetta` object to identify whether Rosetta is present on a user's system.

### Are these changes tested?

Yes. It was tested with the current code and the proposed PR. The proposed PR works as expected.

### Are there any user-facing changes?

No.

* Closes: apache#37907

Lead-authored-by: Fernando Mayer <[email protected]>
Co-authored-by: Jonathan Keane <[email protected]>
Signed-off-by: Nic Crane <[email protected]>
### Rationale for this change

Several PRs over the last few months have updated the build system to be more friendly for developers. During this process it has also come to light that we haven't supported the Windows development setup documented here since R 4.1 (released in spring 2021). I had to remove Windows from the test-r-devdocs job because the approach used there was not compatible with the `setup-r@v2` action, and the job was failing with the `@v1` action.

### What changes are included in this PR?

- Updated the sections on using pre-built static libraries and bundled builds
- Removed the Windows section regarding the bundled build. This section would need rewriting to support the last two minor releases of R but in the meantime I think it is mostly confusing.

### Are these changes tested?

They are documentation changes. They are also slightly optimistic: we can fix problems with the developer setup incrementally between releases, but it's more difficult to update our documentation. This PR documents the intended behaviour after apache#38236.

### Are there any user-facing changes?

No.
* Closes: apache#37945

Lead-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Jacob Wujciak-Jens <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
…8195)

### Rationale for this change

Previously, GCS/S3 support needed to be explicitly enabled in source builds (when they are built without `NOT_CRAN`). As we want the macOS binaries to be fully featured, we should turn these features on when the dependencies exist.

### What changes are included in this PR?

This PR enables this behavior for macOS only; on Linux, setting `NOT_CRAN` or `LIBARROW_MINIMAL=false` is still required.

### Are these changes tested?

Crossbow and locally (thanks @ paleolimbot )
* Closes: apache#38043

Lead-authored-by: Jacob Wujciak-Jens <[email protected]>
Co-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Jonathan Keane <[email protected]>
Signed-off-by: Jacob Wujciak-Jens <[email protected]>
…rarily (apache#38238)

* Closes: apache#38239

Lead-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Raúl Cumplido <[email protected]>
Co-authored-by: Jacob Wujciak-Jens <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
…tring and Binary Types in Hash Join (apache#38147)

### Rationale for this change

We found that the wrong results in inner joins during hash join operations were caused by a problem with how large strings and binary types were handled. The `Slice` function was not calculating their sizes correctly.

To fix this, I changed the `Slice` function to calculate the sizes correctly, based on the type of data for large string and binary. 

* Issue raised: apache#37729 

### What changes are included in this PR?

* The `Slice` function has been updated to correctly calculate the offset for Large String and Large Binary types, and assertion statements have been added to improve maintainability.
* Unit tests (`TEST(KeyColumnArray, SliceBinaryTest)`) for the Slice function have been added.
* During random tests for Hash Join (`TEST(HashJoin, Random)`), modifications were made to allow the creation of Large String as key column values.
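The core of the bug is the stride used when advancing an offsets buffer: String/Binary use 32-bit offsets while LargeString/LargeBinary use 64-bit offsets, so slicing by a row offset must move `row_offset * sizeof(offset_type)` bytes. The sketch below is a hypothetical, standalone illustration of that arithmetic; it is not Arrow's actual `KeyColumnArray::Slice`, and `SliceOffsets`/`FirstOffsetOfSlice` are names invented for the example.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: advance a raw offsets buffer to the first offset of a
// slice. The stride must match the offset width: 4 bytes for String/Binary
// (int32_t offsets), 8 bytes for LargeString/LargeBinary (int64_t offsets).
template <typename OffsetType>
const uint8_t* SliceOffsets(const uint8_t* offsets, int64_t row_offset) {
  return offsets + row_offset * static_cast<int64_t>(sizeof(OffsetType));
}

// Reads the offset at the start of a slice. memcpy avoids unaligned or
// type-punned loads from the raw byte buffer.
template <typename OffsetType>
OffsetType FirstOffsetOfSlice(const uint8_t* offsets, int64_t row_offset) {
  OffsetType value;
  std::memcpy(&value, SliceOffsets<OffsetType>(offsets, row_offset),
              sizeof(value));
  return value;
}
```

With a 64-bit offsets buffer `{0, 3, 7, 10}`, slicing at row 2 must start at byte 16 and read `7`. Using a 4-byte stride instead lands at byte 8, which is row 1's offset (`3`), so the slice would silently start at the wrong position.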

### Are these changes tested?

Yes

### Are there any user-facing changes?

Acero might not have a large user base as it is an experimental feature, but I deemed the issue of incorrect join results as critical and have addressed the bug.

* Closes: apache#38074

Authored-by: Hyunseok Seo <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
…egin() where a char pointer is expected (apache#38265)

### Rationale for this change

The MSVC compiler doesn't seem to allow user code to assume `std::string_view::const_iterator` is `const char*`, so using only `re2::StringPiece` and preferring to call `.data()` instead of `.begin()` should make things more uniform across different compilers and STL implementations.

### What changes are included in this PR?

 - Using `re2::StringPiece` instead of `std::string_view` to interact with `re2`
 - Use `data()` instead of `begin()` where a `char*` is expected
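A small sketch of the pattern: when an API expects a raw `const char*` plus a length, passing `sv.data()` is portable, whereas passing `sv.begin()` only compiles where the iterator type happens to be `const char*`, which the standard does not guarantee. `CountChar` below is a hypothetical C-style function standing in for such an API; it is not part of `re2` or Arrow.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical C-style API taking a raw pointer and a length.
std::size_t CountChar(const char* ptr, std::size_t len, char c) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < len; ++i) {
    n += (ptr[i] == c);
  }
  return n;
}

std::size_t CountCharInView(std::string_view sv, char c) {
  // Portable: data() always returns const char*.
  // Not portable: CountChar(sv.begin(), ...) assumes the iterator type is
  // const char*, which is implementation-defined.
  // Always pass size() along with data(): a view is not null-terminated.
  return CountChar(sv.data(), sv.size(), c);
}
```

Passing the length explicitly matters because a `std::string_view` may point into the middle of a larger string; `data()` alone does not mark where the view ends.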

### Are these changes tested?

Yes, by existing tests.
* Closes: apache#38263

Authored-by: Felipe Oliveira Carvalho <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
### Rationale for this change

We need more disk space...

### What changes are included in this PR?

Remove more pre-installed files.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* Closes: apache#38206

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
…e#38225)

Bumps [golang.org/x/net](https://github.com/golang/net) from 0.15.0 to 0.17.0.
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/golang/net/commit/b225e7ca6dde1ef5a5ae5ce922861bda011cfabd"><code>b225e7c</code></a> http2: limit maximum handler goroutines to MaxConcurrentStreams</li>
<li><a href="https://github.com/golang/net/commit/88194ad8ab44a02ea952c169883c3f57db6cf9f4"><code>88194ad</code></a> go.mod: update golang.org/x dependencies</li>
<li><a href="https://github.com/golang/net/commit/2b60a61f1e4cf3a5ecded0bd7e77ea168289e6de"><code>2b60a61</code></a> quic: fix several bugs in flow control accounting</li>
<li><a href="https://github.com/golang/net/commit/73d82efb96cacc0c378bc150b56675fc191894b9"><code>73d82ef</code></a> quic: handle DATA_BLOCKED frames</li>
<li><a href="https://github.com/golang/net/commit/5d5a036a503f8accd748f7453c0162115187be13"><code>5d5a036</code></a> quic: handle streams moving from the data queue to the meta queue</li>
<li><a href="https://github.com/golang/net/commit/350aad2603e57013fafb1a9e2089a382fe67dc80"><code>350aad2</code></a> quic: correctly extend peer's flow control window after MAX_DATA</li>
<li><a href="https://github.com/golang/net/commit/21814e71db756f39b69fb1a3e06350fa555a79b1"><code>21814e7</code></a> quic: validate connection id transport parameters</li>
<li><a href="https://github.com/golang/net/commit/a600b3518eed7a9a4e24380b4b249cb986d9b64d"><code>a600b35</code></a> quic: avoid redundant MAX_DATA updates</li>
<li><a href="https://github.com/golang/net/commit/ea633599b58dc6a50d33c7f5438edfaa8bc313df"><code>ea63359</code></a> http2: check stream body is present on read timeout</li>
<li><a href="https://github.com/golang/net/commit/ddd8598e5694aa5e966e44573a53e895f6fa5eb2"><code>ddd8598</code></a> quic: version negotiation</li>
<li>Additional commits viewable in <a href="https://github.com/golang/net/compare/v0.15.0...v0.17.0">compare view</a></li>
</ul>
</details>
<br />


Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Matt Topol <[email protected]>
### Rationale for this change
Making sure the documentation that shows up on pkg.go.dev will show that the package is compatible with go1.19+

### What changes are included in this PR?
Slight patch/minor version updates of some dependencies, along with a documentation update in `doc.go`.
* Closes: apache#38285

Authored-by: Matt Topol <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
…nature (apache#38283)

### Rationale for this change

The type signature of `ReplaceString` should be identical when arrow is compiled with or without `ARROW_WITH_RE2`.

### What changes are included in this PR?

The right signature + delegating to the implementation that takes `re2::StringPiece`. The conversion should be a no-op when compiled and optimized.

### Are these changes tested?

By existing tests and CI checks.

* Closes: apache#38282

Authored-by: Felipe Oliveira Carvalho <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
### What changes are included in this PR?

Bump versions of Go for our nightly tests to match supported Go versions

### Are these changes tested?
Via archery

### Are there any user-facing changes?

No

Authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Jacob Wujciak-Jens <[email protected]>
…images (apache#38287)

### Rationale for this change
Fix CI failures for a job that is running out of disk space.

### What changes are included in this PR?

Using our free disk space script to add space for the ubuntu-r-only-r images.

### Are these changes tested?

On CI

### Are there any user-facing changes?
No
* Closes: apache#38286

Authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Jacob Wujciak-Jens <[email protected]>
### Rationale for this change

The test fails with the latest version of duckdb (0.9.1).

### What changes are included in this PR?

The test was changed so that it does not depend on non-deterministic behaviour. We sort all of the other expectations involving a group_by to avoid this problem; we hadn't changed this one yet because it didn't fail with any previous version of duckdb.

### Are these changes tested?

Yes

### Are there any user-facing changes?

No
* Closes: apache#38293

Authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
…rsions.json (apache#38241)

This PR corrects the version for the `version_match` to be equal to the version defined in versions.json. This way the text is correctly displayed in the version switcher button.
* Closes: apache#38240

Authored-by: AlenkaF <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
…pache#38302)

### Rationale for this change

test-r-rhub-ubuntu-gcc-release-latest doesn't have enough disk space.

### What changes are included in this PR?

Remove pre-installed files on Azure Pipelines too.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* Closes: apache#38295

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
### Rationale for this change

Verify JDK 21 in CI in time for the Arrow v14 release.

### What changes are included in this PR?

* Bump latest Java version from 20 -> 21 in CI

### Are these changes tested?

Yes, via CI.

### Are there any user-facing changes?

No.
* Closes: apache#36994

Authored-by: Dane Pitkin <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
…he#38303)

### Rationale for this change

`expr` was printing the number of matching characters, which showed up as noise in the log (we want to keep the log as clean as possible to avoid false-positive checks).
See apache#38236 (comment) for @ jonkeane's investigation.

### What changes are included in this PR?

Replace use of expr with test.

### Are these changes tested?
Crossbow

Lead-authored-by: Jacob Wujciak-Jens <[email protected]>
Co-authored-by: Jonathan Keane <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>
…the sidebar TOC (apache#38313)

* Closes: apache#38312

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
assignUser and others added 13 commits December 4, 2023 10:41
### Rationale for this change

Update news.md

### Are these changes tested?
no
* Closes: apache#38904

Authored-by: Jacob Wujciak-Jens <[email protected]>
Signed-off-by: Jacob Wujciak-Jens <[email protected]>
…sible (apache#38362)

### Rationale for this change

We have external test data repositories, apache/arrow-testing and apache/parquet-testing, which we use as submodules. apache/arrow may not use the latest revisions of these test data repositories, but our verification script always uses the latest ones. This may cause test failures.

### What changes are included in this PR?

Use local test data if they exist.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* Closes: apache#38345

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
### Rationale for this change

Ideally we would always test against the latest Homebrew, since that is what most users have. But keeping it up to date manually is difficult to maintain.

### What changes are included in this PR?

We don't update Homebrew manually. GitHub hosted GitHub Actions Runners update Homebrew periodically. We depend on it instead of manual `brew update`.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* Closes: apache#39003

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
…itly created sub-directories (apache#38845)

### Rationale for this change

See apache#38618 (comment) and below for the analysis. When deleting the dir contents, we use a GetFileInfo with recursive FileSelector to list all objects to delete, but when doing that the file paths for directories don't end in a trailing `/`, so for deleting explicitly created directories we need to add the `kSep` here as well to properly delete the object.
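A minimal sketch of the fix described above, assuming (as the analysis implies) that an explicitly created directory is stored as an object whose key ends in `/`, while the recursive listing reports the directory path without that trailing separator. `DirectoryMarkerKey` is a hypothetical helper for illustration, not the actual S3 filesystem code.

```cpp
#include <string>

// Separator used in object keys (mirrors the `kSep` mentioned above).
constexpr char kSep = '/';

// Hypothetical helper: given a directory path as reported by a recursive
// listing (no trailing separator), return the key of the object that
// represents the explicitly created directory, so it can be deleted.
std::string DirectoryMarkerKey(const std::string& path) {
  if (!path.empty() && path.back() == kSep) {
    return path;  // already a directory marker key
  }
  return path + kSep;
}
```

Without the appended separator, a delete request for `dir/subdir` targets a key that does not exist, so the directory marker object `dir/subdir/` is left behind.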

### Are these changes tested?

I tested them manually with an actual S3 bucket. The problem is that MinIO doesn't have the same problem, and so it's not actually tested with the test I added using our MinIO testing setup.

### Are there any user-facing changes?

Fixes the regression
* Closes: apache#38618

Lead-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
### Rationale for this change

The script was too quiet.

### What changes are included in this PR?

Fix regex and add some output: 
```
Rscript tools/update-checksums.R 14.0.0
[1] "Extracting libarrow binary paths from tasks.yml"
[1] "Downloading windows/arrow-14.0.0.zip.sha512"
[1] "Converting windows/arrow-14.0.0.zip to windows style line endings"
[1] "Downloading linux-openssl-1.0/arrow-14.0.0.zip.sha512"
[1] "Downloading linux-openssl-1.1/arrow-14.0.0.zip.sha512"
[1] "Downloading linux-openssl-3.0/arrow-14.0.0.zip.sha512"
[1] "Downloading darwin-arm64-openssl-1.1/arrow-14.0.0.zip.sha512"
[1] "Downloading darwin-arm64-openssl-3.0/arrow-14.0.0.zip.sha512"
[1] "Downloading darwin-x86_64-openssl-1.1/arrow-14.0.0.zip.sha512"
[1] "Downloading darwin-x86_64-openssl-3.0/arrow-14.0.0.zip.sha512"
[1] "Checksums updated successfully!"
```

### Are these changes tested?
locally 

### Are there any user-facing changes?
no
* Closes: apache#39041

Authored-by: Jacob Wujciak-Jens <[email protected]>
Signed-off-by: Jacob Wujciak-Jens <[email protected]>
…pache#39077)

### Rationale for this change

Running our test suite results in many spurious warnings being printed that make it difficult to spot actual warnings.

### What changes are included in this PR?

The data used for specific tests involving `summarise()` was updated to not trigger the warnings.

### Are these changes tested?

Yes

### Are there any user-facing changes?

No
* Closes: apache#39076

Authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
…rification job on AlmaLinux 8 (apache#39073)

### Rationale for this change

The verification task for Almalinux 8 was failing.

### What changes are included in this PR?

Add required python3.11-devel to the Docker image.

### Are these changes tested?

Yes via archery task.

### Are there any user-facing changes?

No

* Closes: apache#39072

Authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
…pache#39082)

### Rationale for this change

`KEYS` may have UTF-8 (non ASCII) characters. Ruby chooses the default encoding based on `LANG`. If `LANG=C`, Ruby uses the `US-ASCII` encoding as the default encoding. If Ruby uses the `US-ASCII` encoding, we can't process `KEYS` because it has non ASCII characters.

### What changes are included in this PR?

Use the `UTF-8` encoding explicitly for `KEYS`. If we specify the `UTF-8` encoding explicitly, our `KEYS` processing doesn't depend on `LANG`.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* Closes: apache#39074

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
…pache#38450)

### Rationale for this change

On macOS, "cp -a source/ destination/" copies "source/*" to "destination/" (such as "source/a" is copied to "destination/a") not "source/" to "destination/" (such as "source/a" is copied to "destination/source/a").

### What changes are included in this PR?

We need to remove the trailing "/" from "source/" to copy "source/" itself to "destination/source/".

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* Closes: apache#38449

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

k-anshul closed this Jun 24, 2024
k-anshul pushed a commit that referenced this pull request Dec 21, 2024
…n timezone (apache#45051)

### Rationale for this change

If the timezone database is present on the system but does not contain a timezone referenced in an ORC file, the ORC reader will crash with an uncaught C++ exception.

This can happen, for example, on Ubuntu 24.04, where some timezone aliases have been moved from the main `tzdata` package to a `tzdata-legacy` package. If `tzdata-legacy` is not installed, trying to read an ORC file that references e.g. the "US/Pacific" timezone would crash.

Here is a backtrace excerpt:
```
#12 0x00007f1a3ce23a55 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007f1a3ce39391 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007f1a3f4accc4 in orc::loadTZDB(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#15 0x00007f1a3f4ad392 in std::call_once<orc::LazyTimezone::getImpl() const::{lambda()#1}>(std::once_flag&, orc::LazyTimezone::getImpl() const::{lambda()#1}&&)::{lambda()#2}::_FUN() () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#16 0x00007f1a4298bec3 in __pthread_once_slow (once_control=0xa5ca7c8, init_routine=0x7f1a3ce69420 <__once_proxy>) at ./nptl/pthread_once.c:116
#17 0x00007f1a3f4a9ad0 in orc::LazyTimezone::getEpoch() const ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#18 0x00007f1a3f4e76b1 in orc::TimestampColumnReader::TimestampColumnReader(orc::Type const&, orc::StripeStreams&, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#19 0x00007f1a3f4e84ad in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#20 0x00007f1a3f4e8dd7 in orc::StructColumnReader::StructColumnReader(orc::Type const&, orc::StripeStreams&, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#21 0x00007f1a3f4e8532 in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#22 0x00007f1a3f4925e9 in orc::RowReaderImpl::startNextStripe() ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#23 0x00007f1a3f492c9d in orc::RowReaderImpl::next(orc::ColumnVectorBatch&) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#24 0x00007f1a3e6b251f in arrow::adapters::orc::ORCFileReader::Impl::ReadBatch(orc::RowReaderOptions const&, std::shared_ptr<arrow::Schema> const&, long) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
```

### What changes are included in this PR?

Catch C++ exceptions when iterating ORC batches instead of letting them slip through.
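The general pattern is to wrap calls into a library that may throw, such as the ORC reader's batch iteration, and convert any exception into an error status instead of letting it escape uncaught (which ends in `std::terminate`). The sketch below uses a minimal hypothetical `Status` struct standing in for `arrow::Status`, and `CatchAsStatus` is an invented helper name; it illustrates the technique, not the actual adapter code.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical minimal status type standing in for arrow::Status.
struct Status {
  bool ok;
  std::string message;
  static Status OK() { return {true, ""}; }
  static Status IOError(const std::string& msg) {
    return {false, "IOError: " + msg};
  }
};

// Runs a callable that may throw (e.g. reading the next batch from a
// third-party reader) and turns any C++ exception into an error Status.
Status CatchAsStatus(const std::function<void()>& body) {
  try {
    body();
    return Status::OK();
  } catch (const std::exception& e) {
    return Status::IOError(e.what());
  } catch (...) {
    return Status::IOError("unknown exception");
  }
}
```

Catching at the boundary keeps the exception inside C++ code and lets callers (including the Python bindings) see an ordinary error instead of a process crash.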

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#40633

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>