Squashed commit of the following:
commit 60287e1
Author: Thomas Li <[email protected]>
Date:   Mon Jul 1 17:56:34 2024 +0000

    address more comments

commit 25c25d4
Merge: 7806ce4 51fb873
Author: Thomas Li <[email protected]>
Date:   Mon Jul 1 17:31:44 2024 +0000

    Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers

commit 51fb873
Merge: 599ce95 e932fbd
Author: gpuCI <[email protected]>
Date:   Mon Jul 1 12:17:38 2024 -0400

    Merge pull request rapidsai#16145 from rapidsai/branch-24.06

    Forward-merge branch-24.06 into branch-24.08

commit e932fbd
Author: Vyas Ramasubramani <[email protected]>
Date:   Mon Jul 1 09:17:32 2024 -0700

    Add patch for incorrect cuco noexcept clauses (rapidsai#16077)

    [cuco previously marked a number of methods as noexcept that can in fact
    throw exceptions](NVIDIA/cuCollections#510).
    This causes problems for cudf functions that call these methods. The
    issue [was fixed in cuco
    upstream](NVIDIA/cuCollections#511), but we
    cannot easily update to the latest commit of cuco, especially in a patch
    fix for 24.06. This PR instead adds a rapids-cmake patch for the cuco
    clone to address this issue. The patch may be removed once we update to
    a commit of cuco that contains the necessary fix.

    Resolves rapidsai#16059

commit 599ce95
Author: Lawrence Mitchell <[email protected]>
Date:   Mon Jul 1 09:35:35 2024 +0100

    Implement handlers for series literal in cudf-polars (rapidsai#16113)

    A query plan can contain a "literal" polars Series, for example when calling a contains-like function. To translate these, introduce a new `LiteralColumn` node to capture the concept and add an evaluation rule (converting from arrow).

    Since list-dtype Series need the same casting treatment as in the dataframe scan case, factor the casting out into a utility, and take the opportunity to handle casting of nested lists correctly.

    Authors:
      - Lawrence Mitchell (https://github.com/wence-)

    Approvers:
      - Thomas Li (https://github.com/lithomas1)
      - Vyas Ramasubramani (https://github.com/vyasr)

    URL: rapidsai#16113

commit 7806ce4
Author: Thomas Li <[email protected]>
Date:   Sat Jun 29 00:47:53 2024 +0000

    simplify again

commit e57a677
Merge: e940e30 3c3edfe
Author: Thomas Li <[email protected]>
Date:   Sat Jun 29 00:26:03 2024 +0000

    Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers

commit 3c3edfe
Author: Yunsong Wang <[email protected]>
Date:   Fri Jun 28 13:58:22 2024 -0700

    Update implementations to build with the latest cuco (rapidsai#15938)

    This PR updates existing libcudf to accommodate a cuco breaking change introduced in NVIDIA/cuCollections#479. It helps avoid breaking cudf when bumping the cuco version in `rapids-cmake`.

    Redundant equal/hash overloads will be removed once the version bump is done on the `rapids-cmake` end.

    Authors:
      - Yunsong Wang (https://github.com/PointKernel)

    Approvers:
      - David Wendt (https://github.com/davidwendt)
      - Nghia Truong (https://github.com/ttnghia)

    URL: rapidsai#15938

commit df88cf5
Author: Bradley Dice <[email protected]>
Date:   Fri Jun 28 15:40:52 2024 -0500

    Use size_t to allow large conditional joins (rapidsai#16127)

    The conditional join kernels were using `cudf::size_type` where `std::size_t` was needed. This PR fixes that bug, which caused `cudaErrorIllegalAddress` as shown in rapidsai#16115. This closes rapidsai#16115.

    I did not add tests because we typically do not test very large workloads. However, I committed the test and reverted it in this PR, so there is a record of my validation code.
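
The overflow behind this class of bug can be sketched in plain Python. `cudf::size_type` is a 32-bit signed integer, so any count above 2^31 - 1 wraps around. The sketch below assumes, purely for illustration, that the overflowing quantity is the number of candidate row pairs in a join; the helper `wrap_int32` and the row counts are hypothetical and not taken from the PR.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Reinterpret an integer as a two's-complement signed 32-bit value."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return value - 2**32 if value > INT32_MAX else value


# Hypothetical table sizes: each one fits in 32 bits, their product does not.
left_rows, right_rows = 100_000, 50_000

pair_count = left_rows * right_rows      # exact count (what a 64-bit size_t can hold)
wrapped_count = wrap_int32(pair_count)   # what a 32-bit signed counter would hold

print("exact pair count:  ", pair_count)
print("32-bit signed view:", wrapped_count)
print("fits in int32:     ", pair_count <= INT32_MAX)
```

An index computed from the wrapped value points at the wrong location, which is one way an illegal address error can arise.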

    Authors:
      - Bradley Dice (https://github.com/bdice)

    Approvers:
      - Vyas Ramasubramani (https://github.com/vyasr)
      - https://github.com/nvdbaranec
      - Yunsong Wang (https://github.com/PointKernel)

    URL: rapidsai#16127

commit fb12d98
Author: Robert Maynard <[email protected]>
Date:   Fri Jun 28 12:14:58 2024 -0400

    Installed cudf header use cudf::allocate_like (rapidsai#16087)

    Remove usage of non-public cudf::allocate_like from implementations in the headers we install.

    Authors:
      - Robert Maynard (https://github.com/robertmaynard)

    Approvers:
      - Yunsong Wang (https://github.com/PointKernel)
      - Nghia Truong (https://github.com/ttnghia)

    URL: rapidsai#16087

commit 78f4a8a
Author: Robert Maynard <[email protected]>
Date:   Fri Jun 28 11:26:27 2024 -0400

    Move common string utilities to public api (rapidsai#16070)

    As part of rapidsai#15982, a subset of the strings utility functions has been identified as worth exposing as part of the cudf public API.

    The `create_string_vector_from_column`, `get_offset64_threshold`, and `is_large_strings_enabled` functions are now part of the public `cudf::strings` API.

    Authors:
      - Robert Maynard (https://github.com/robertmaynard)

    Approvers:
      - MithunR (https://github.com/mythrocks)
      - David Wendt (https://github.com/davidwendt)
      - Jayjeet Chakraborty (https://github.com/JayjeetAtGithub)
      - Lawrence Mitchell (https://github.com/wence-)

    URL: rapidsai#16070

commit a4b951a
Author: nvdbaranec <[email protected]>
Date:   Fri Jun 28 10:20:42 2024 -0500

    Templatization of fixed-width parquet decoding kernels. (rapidsai#15911)

    This PR merges all of the fixed-width parquet decoding kernels into a single templatized kernel that can be selectively instantiated with desired features (dictionary/no-dictionary, nested/non-nested, etc).  It also adds support for (non-list) nested columns in this path. So structs do not have to use the much slower general decode kernel any more.

    A new benchmark was added specific to structs containing only fixed width columns.  I added this because the performance improvement is fairly high (+20%) but we don't see it in the normal struct benchmarks because they include (and are dominated by) string decode times.  The new benchmark shows:

    Before this PR:
    ```
    | data_type |    io_type    | cardinality | run_length | bytes_per_second | peak_memory_usage | encoded_file_size |
    |-----------|---------------|-------------|------------|------------------|-------------------|-------------------|
    |    STRUCT | DEVICE_BUFFER |           0 |          1 |      21071216823 |         1.047 GiB |       511.675 MiB |
    |    STRUCT | DEVICE_BUFFER |        1000 |          1 |      18974392387 |       821.312 MiB |       128.884 MiB |
    |    STRUCT | DEVICE_BUFFER |           0 |         32 |      20429356824 |       621.787 MiB |        28.141 MiB |
    |    STRUCT | DEVICE_BUFFER |        1000 |         32 |      20572327813 |       598.421 MiB |        16.475 MiB |
    ```

    After this PR:

    ```
    | data_type |    io_type    | cardinality | run_length | bytes_per_second | peak_memory_usage | encoded_file_size |
    |-----------|---------------|-------------|------------|------------------|-------------------|-------------------|
    |    STRUCT | DEVICE_BUFFER |           0 |          1 |      25805996399 |         1.047 GiB |       511.675 MiB |
    |    STRUCT | DEVICE_BUFFER |        1000 |          1 |      22422306660 |       821.312 MiB |       128.884 MiB |
    |    STRUCT | DEVICE_BUFFER |           0 |         32 |      24460694014 |       621.787 MiB |        28.141 MiB |
    |    STRUCT | DEVICE_BUFFER |        1000 |         32 |      24674861214 |       598.421 MiB |        16.475 MiB |
    ```
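
The relative improvement can be recomputed from the two tables above. This is a small sketch: only the `bytes_per_second` figures come from the tables, and the variable names are illustrative.

```python
# bytes_per_second keyed by (cardinality, run_length), copied from the tables above.
before = {
    (0, 1): 21071216823,
    (1000, 1): 18974392387,
    (0, 32): 20429356824,
    (1000, 32): 20572327813,
}
after = {
    (0, 1): 25805996399,
    (1000, 1): 22422306660,
    (0, 32): 24460694014,
    (1000, 32): 24674861214,
}

# Throughput ratio per configuration; 1.20 means 20% more bytes per second.
speedups = {key: after[key] / before[key] for key in before}

for (cardinality, run_length), ratio in speedups.items():
    print(f"cardinality={cardinality:>4} run_length={run_length:>2} change={ratio - 1:+.1%}")
```

Each configuration comes out roughly 18% to 23% faster, consistent with the "+20%" quoted above.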

    Split-page decoding for fixed-width types and structs also goes through this new path. A new test was added.

    This brings us closer to eliminating the "general" kernel.  The only things left that run through it are lists and booleans.

    This is PR 1 of 2, with the followup moving a lot of code around.  At this point, I think it makes sense to start consolidating our files a bit.

    I also left some breadcrumbs (a few small commented out code blocks) in the core kernel `gpuDecodePageDataGeneric` for the next step of adding list support. They can be removed if people don't like them.

    Authors:
      - https://github.com/nvdbaranec

    Approvers:
      - Mike Wilson (https://github.com/hyperbolic2346)
      - Vukasin Milovanovic (https://github.com/vuule)
      - Muhammad Haseeb (https://github.com/mhaseeb123)

    URL: rapidsai#15911

commit e434fdb
Author: David Wendt <[email protected]>
Date:   Fri Jun 28 10:57:01 2024 -0400

    Update libcudf compiler requirements in contributing doc (rapidsai#16103)

    Updates the compiler requirements in the contributing document.

    Authors:
      - David Wendt (https://github.com/davidwendt)

    Approvers:
      - Bradley Dice (https://github.com/bdice)
      - Karthikeyan (https://github.com/karthikeyann)

    URL: rapidsai#16103

commit 565c0d1
Author: Matthew Murray <[email protected]>
Date:   Fri Jun 28 10:16:55 2024 -0400

    Migrate lists/contains to pylibcudf (rapidsai#15981)

    Part of rapidsai#15162.

    Authors:
      - Matthew Murray (https://github.com/Matt711)

    Approvers:
      - Vyas Ramasubramani (https://github.com/vyasr)

    URL: rapidsai#15981

commit c40e0cc
Author: Matthew Murray <[email protected]>
Date:   Fri Jun 28 10:10:31 2024 -0400

    Add support for proxy `np.flatiter` objects (rapidsai#16107)

    Closes rapidsai#15388

    Authors:
      - Matthew Murray (https://github.com/Matt711)

    Approvers:
      - Matthew Roeschke (https://github.com/mroeschke)

    URL: rapidsai#16107

commit 673d766
Author: Paul Mattione <[email protected]>
Date:   Fri Jun 28 09:38:57 2024 -0400

    Make binary operators work between fixed-point and floating args (rapidsai#16116)

    Some of the binary operators in cuDF don't work between fixed_point and floating-point numbers after [this earlier PR](rapidsai#15438) removed the ability to construct and implicitly cast fixed_point numbers from floating point numbers. This PR restores that functionality by detecting and performing the necessary explicit casts, and adds tests for the supported operators.

    Note that the `binary_op_has_common_type` code is modeled after `has_common_type` found in traits.hpp.

    This closes [issue 16090](rapidsai#16090)
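
The general idea of detecting a mixed pair and casting explicitly has a close analogue in Python, where `decimal.Decimal` and `float` also refuse to mix implicitly. The sketch below is only an analogy, not the libcudf code: `add_mixed` is a hypothetical helper, and casting the decimal operand to floating point is an assumption, since the description above does not say in which direction the casts go.

```python
from decimal import Decimal


def add_mixed(lhs, rhs):
    """Add two operands, casting explicitly when one is Decimal and the other float."""
    if isinstance(lhs, Decimal) and isinstance(rhs, float):
        lhs = float(lhs)  # assumed direction: decimal -> floating point
    elif isinstance(lhs, float) and isinstance(rhs, Decimal):
        rhs = float(rhs)
    return lhs + rhs


# Without the explicit cast, Python rejects the mixed operation.
try:
    Decimal("1.5") + 2.0
    implicit_mix_allowed = True
except TypeError:
    implicit_mix_allowed = False

result = add_mixed(Decimal("1.5"), 2.0)
print("implicit mixing allowed:", implicit_mix_allowed)
print("explicit cast result:", result)
```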

    Authors:
      - Paul Mattione (https://github.com/pmattione-nvidia)

    Approvers:
      - Jayjeet Chakraborty (https://github.com/JayjeetAtGithub)
      - Karthikeyan (https://github.com/karthikeyann)

    URL: rapidsai#16116

commit 224ac5b
Author: David Wendt <[email protected]>
Date:   Fri Jun 28 09:26:37 2024 -0400

    Add libcudf public/detail API pattern to developer guide (rapidsai#16086)

    Adds specific description for the public API to detail API function pattern to the libcudf developer guide.
    Also fixes some formatting issues and broken link.

    Authors:
      - David Wendt (https://github.com/davidwendt)

    Approvers:
      - Shruti Shivakumar (https://github.com/shrshi)
      - Karthikeyan (https://github.com/karthikeyann)

    URL: rapidsai#16086

commit 2b547dc
Author: Matthew Roeschke <[email protected]>
Date:   Fri Jun 28 03:11:01 2024 -1000

    Add ensure_index to not unnecessarily shallow copy cudf.Index (rapidsai#16117)

    The `cudf.Index` constructor will shallow copy a `cudf.Index` input. Sometimes we just need to make sure an input is a `cudf.Index`, so this PR adds `ensure_index` (pandas has something similar) to avoid shallow copying these inputs unnecessarily.
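
The pattern can be sketched without cudf. The `Index` class below is a hypothetical stand-in whose constructor shallow-copies an `Index` input, and this `ensure_index` is an illustrative reimplementation of the idea, not the function added by the PR.

```python
class Index:
    """Stand-in index type: constructing from an Index makes a new wrapper (shallow copy)."""

    def __init__(self, data):
        # An Index input shares its underlying values; anything else is materialized.
        self.values = data.values if isinstance(data, Index) else list(data)


def ensure_index(obj):
    """Return obj unchanged if it is already an Index, otherwise wrap it in one."""
    return obj if isinstance(obj, Index) else Index(obj)


idx = Index([1, 2, 3])

copied = Index(idx)          # new object that shares the same values
same = ensure_index(idx)     # no new object at all
wrapped = ensure_index([4, 5])

print("constructor made a new object:", copied is not idx)
print("ensure_index returned the input:", same is idx)
print("non-Index input was wrapped:", isinstance(wrapped, Index))
```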

    Authors:
      - Matthew Roeschke (https://github.com/mroeschke)

    Approvers:
      - GALI PREM SAGAR (https://github.com/galipremsagar)

    URL: rapidsai#16117

commit 57862a3
Author: Robert Maynard <[email protected]>
Date:   Fri Jun 28 08:43:12 2024 -0400

    stable_distinct public api now has a stream parameter (rapidsai#16068)

    As part of rapidsai#15982, we determined that the cudf `stable_distinct` public API needs to be updated so that a user-provided stream can be passed.

    Authors:
      - Robert Maynard (https://github.com/robertmaynard)

    Approvers:
      - Nghia Truong (https://github.com/ttnghia)
      - Srinivas Yadav (https://github.com/srinivasyadav18)
      - Bradley Dice (https://github.com/bdice)

    URL: rapidsai#16068

commit 6b04fd3
Author: Mads R. B. Kristensen <[email protected]>
Date:   Fri Jun 28 12:31:18 2024 +0200

    Memory Profiling (rapidsai#15866)

    Use [RMM's new memory profiler](rapidsai/rmm#1563) to profile all functions already decorated with `_cudf_nvtx_annotate`.

    Example
    ```python
    import cudf
    from cudf.utils.performance_tracking import print_memory_report

    cudf.set_option("memory_profiling", True)

    df1 = cudf.DataFrame({"a": [1, 2, 3]})
    df2 = cudf.DataFrame({"a": [2, 2, 3]})
    df3 = df1.merge(df2)

    print_memory_report()
    ```

    Output:
    ```
    Memory Profiling
    ================

    Ordered by: memory_peak

    ncalls     memory_peak    memory_total  filename:lineno(function)
         1             272             688  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:4072(DataFrame.merge)
         2              32              64  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:1043(DataFrame._init_from_dict_like)
         2              32              64  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:690(DataFrame.__init__)
         2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:1131(DataFrame._align_input_series_indices)
         7               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:214(RangeIndex.__init__)
         6               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:424(RangeIndex.__len__)
         4               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/frame.py:271(Frame.__len__)
         2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:3195(DataFrame._insert)
         2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:270(RangeIndex.name)
         2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:369(RangeIndex.copy)
         5               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/frame.py:134(Frame._from_data)
         2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/frame.py:1039(Frame._copy_type_metadata)
         2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/indexed_frame.py:315(IndexedFrame._from_columns_like_self)
    ```

    Authors:
      - Mads R. B. Kristensen (https://github.com/madsbk)

    Approvers:
      - Mark Harris (https://github.com/harrism)
      - Lawrence Mitchell (https://github.com/wence-)
      - Vyas Ramasubramani (https://github.com/vyasr)

    URL: rapidsai#15866

commit e35da6b
Author: Lawrence Mitchell <[email protected]>
Date:   Fri Jun 28 09:54:03 2024 +0100

    Implement Ternary copy_if_else (rapidsai#16114)

    A straightforward evaluation using `copy_if_else`.

    Authors:
      - Lawrence Mitchell (https://github.com/wence-)

    Approvers:
      - https://github.com/brandon-b-miller

    URL: rapidsai#16114

commit e940e30
Author: Thomas Li <[email protected]>
Date:   Thu Jun 27 21:44:41 2024 +0000

    Address code review

    Co-authored-by: Vyas Ramasubramani <[email protected]>

commit c847b98
Author: Lawrence Mitchell <[email protected]>
Date:   Thu Jun 27 21:33:29 2024 +0100

    Finish implementation of cudf-polars boolean function handlers (rapidsai#16098)

    The missing nodes were `is_in`, `not` (both easy), `is_finite` and `is_infinite` (obtained by translating to `contains` calls).

    While here, remove the implementation of `IsBetween` and just translate to an expression with binary operations. This removes the need for special-casing scalar arguments to `IsBetween` and reproducing the code for binop evaluation.
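
The rewrite of `IsBetween` into binary operations can be illustrated on plain Python values. This is a sketch, not the cudf-polars implementation: the function name `is_between`, the `closed` argument, and its four values are assumptions modeled on the polars `is_between` expression.

```python
import operator


def is_between(values, lower, upper, closed="both"):
    """Express a between test as two comparisons joined by a logical AND."""
    if closed not in ("both", "left", "right", "none"):
        raise ValueError(f"unknown closed value: {closed!r}")
    # Pick inclusive or exclusive comparison for each bound.
    lower_op = operator.ge if closed in ("both", "left") else operator.gt
    upper_op = operator.le if closed in ("both", "right") else operator.lt
    return [lower_op(v, lower) and upper_op(v, upper) for v in values]


data = [1, 2, 3, 4]
print(is_between(data, 2, 3))                 # both bounds inclusive
print(is_between(data, 2, 3, closed="left"))  # lower inclusive, upper exclusive
print(is_between(data, 2, 3, closed="none"))  # both bounds exclusive
```

Because the result is built only from comparisons and an AND, scalar and column bounds need no separate code path, which matches the motivation given above.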

    Authors:
      - Lawrence Mitchell (https://github.com/wence-)

    Approvers:
      - Vyas Ramasubramani (https://github.com/vyasr)

    URL: rapidsai#16098

commit 2ed69c9
Author: Matthew Roeschke <[email protected]>
Date:   Thu Jun 27 10:11:09 2024 -1000

    Ensure MultiIndex.to_frame deep copies columns (rapidsai#16110)

    Additionally, this allows simplification in `MultiIndex.__repr__` which avoids a shallow copy and also caught a bug where `NaT` was not supposed to be quoted

    Authors:
      - Matthew Roeschke (https://github.com/mroeschke)

    Approvers:
      - Vyas Ramasubramani (https://github.com/vyasr)

    URL: rapidsai#16110

commit a71c249
Author: GALI PREM SAGAR <[email protected]>
Date:   Thu Jun 27 14:29:31 2024 -0500

    Fix dtype errors in `StringArrays` (rapidsai#16111)

    This PR adds proxy classes for `ArrowStringArray` and `ArrowStringArrayNumpySemantics` that will increase the pandas test pass rate by 1%.

    Authors:
      - GALI PREM SAGAR (https://github.com/galipremsagar)

    Approvers:
      - Matthew Roeschke (https://github.com/mroeschke)

    URL: rapidsai#16111

commit 8fc139f
Merge: 79c1dfd f7cd9e6
Author: Thomas Li <[email protected]>
Date:   Thu Jun 27 18:33:52 2024 +0000

    Merge branch 'pylibcudf-io-writers' of github.com:lithomas1/cudf into pylibcudf-io-writers

commit 79c1dfd
Author: Thomas Li <[email protected]>
Date:   Thu Jun 27 18:33:40 2024 +0000

    clean source_or_sink

commit c5a3fbe
Merge: aff6178 5d49fe6
Author: Thomas Li <[email protected]>
Date:   Thu Jun 27 18:25:42 2024 +0000

    Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers

commit f7cd9e6
Author: Thomas Li <[email protected]>
Date:   Wed Jun 26 09:15:50 2024 -0700

    cleanup utils

commit aff6178
Author: Thomas Li <[email protected]>
Date:   Tue Jun 25 20:45:47 2024 +0000

    small test fixes

commit 0ed9af6
Author: Thomas Li <[email protected]>
Date:   Tue Jun 25 19:27:14 2024 +0000

    Fix error in testing utils

    Co-authored-by: Lawrence Mitchell <[email protected]>

commit 9a6a896
Merge: 186a2fb cdfb550
Author: Thomas Li <[email protected]>
Date:   Tue Jun 25 19:12:37 2024 +0000

    Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers

commit 186a2fb
Merge: 53b821c 0c6b828
Author: Thomas Li <[email protected]>
Date:   Mon Jun 24 17:19:39 2024 +0000

    Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers

commit 53b821c
Merge: 624d444 604c16d
Author: Thomas Li <[email protected]>
Date:   Mon Jun 24 17:19:12 2024 +0000

    Merge branch 'pylibcudf-io-writers' of github.com:lithomas1/cudf into pylibcudf-io-writers

commit 624d444
Author: Thomas Li <[email protected]>
Date:   Mon Jun 24 17:17:27 2024 +0000

    fix all nested struct cases

commit e6c3ec7
Author: Thomas Li <[email protected]>
Date:   Mon Jun 24 16:57:29 2024 +0000

    address more comments

commit 604c16d
Author: Thomas Li <[email protected]>
Date:   Mon Jun 24 16:57:29 2024 +0000

    address more comments

commit d22953f
Merge: e0901dd dcc153b
Author: Thomas Li <[email protected]>
Date:   Tue Jun 18 10:19:24 2024 -0700

    Merge branch 'branch-24.08' into pylibcudf-io-writers

commit e0901dd
Author: Thomas Li <[email protected]>
Date:   Mon Jun 17 09:45:19 2024 -0700

    fix bad merge

commit 564358f
Merge: e242182 87f6a7e
Author: Thomas Li <[email protected]>
Date:   Mon Jun 17 09:44:11 2024 -0700

    Merge branch 'branch-24.08' into pylibcudf-io-writers

commit e242182
Author: Thomas Li <[email protected]>
Date:   Thu Jun 13 20:52:23 2024 +0000

    address more comments

commit 699efd3
Author: Thomas Li <[email protected]>
Date:   Thu Jun 13 20:09:43 2024 +0000

    cleanup tests

commit 1228569
Author: Thomas Li <[email protected]>
Date:   Thu Jun 13 18:20:03 2024 +0000

    update following feedback

commit b1951d0
Author: Thomas Li <[email protected]>
Date:   Thu Jun 13 03:01:19 2024 +0000

    try fix

commit 9150a6c
Author: Thomas Li <[email protected]>
Date:   Wed Jun 12 23:48:18 2024 +0000

    try something else

commit 63358e9
Merge: 8c4c4e4 b35991c
Author: Thomas Li <[email protected]>
Date:   Wed Jun 12 23:30:56 2024 +0000

    Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers

commit 8c4c4e4
Author: Thomas Li <[email protected]>
Date:   Wed Jun 12 18:31:54 2024 +0000

    address comments

commit dc93356
Merge: c54316e 0891c5d
Author: Thomas Li <[email protected]>
Date:   Wed Jun 12 17:49:26 2024 +0000

    Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers

commit c54316e
Author: Thomas Li <[email protected]>
Date:   Tue Jun 11 20:41:18 2024 +0000

    update

commit cd6df5e
Merge: 2b3853f 8efa64e
Author: Thomas Li <[email protected]>
Date:   Tue Jun 11 17:00:05 2024 +0000

    Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers

commit 2b3853f
Author: Thomas Li <[email protected]>
Date:   Tue Jun 11 16:49:14 2024 +0000

    add some tests

commit 8c88c7c
Merge: c24664c 719a8a6
Author: Thomas Li <[email protected]>
Date:   Tue Jun 11 00:19:28 2024 +0000

    Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers

commit c24664c
Author: Thomas Li <[email protected]>
Date:   Fri Jun 7 18:25:06 2024 +0000

    update and start writing tests

commit 72204f1
Merge: 15daaaa 9bd16bb
Author: Thomas Li <[email protected]>
Date:   Fri Jun 7 16:02:25 2024 +0000

    Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers

commit 15daaaa
Author: Thomas Li <[email protected]>
Date:   Fri Jun 7 16:02:10 2024 +0000

    update docs

commit 591cdd2
Author: Thomas Li <[email protected]>
Date:   Thu Jun 6 23:54:58 2024 +0000

    Start migrating I/O writers to pylibcudf (starting with JSON)
lithomas1 committed Jul 1, 2024
1 parent 307e243 commit e1683a4
Showing 108 changed files with 3,837 additions and 1,762 deletions.
13 changes: 6 additions & 7 deletions CONTRIBUTING.md
@@ -71,15 +71,14 @@ for a minimal build of libcudf without using conda are also listed below.

Compilers:

-* `gcc` version 9.3+
-* `nvcc` version 11.5+
-* `cmake` version 3.26.4+
+* `gcc` version 11.4+
+* `nvcc` version 11.8+
+* `cmake` version 3.29.6+

-CUDA/GPU:
+CUDA/GPU Runtime:

-* CUDA 11.5+
-* NVIDIA driver 450.80.02+
-* Volta architecture or better (Compute Capability >=7.0)
+* CUDA 11.4+
+* Volta architecture or better ([Compute Capability](https://docs.nvidia.com/deploy/cuda-compatibility/) >=7.0)

You can obtain CUDA from
[https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads).
50 changes: 42 additions & 8 deletions cpp/benchmarks/io/parquet/parquet_reader_input.cpp
@@ -1,5 +1,5 @@
/*
-* Copyright (c) 2022-2023, NVIDIA CORPORATION.
+* Copyright (c) 2022-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -59,20 +59,18 @@ void parquet_read_common(cudf::size_type num_rows_to_read,
}

template <data_type DataType>
-void BM_parquet_read_data(nvbench::state& state, nvbench::type_list<nvbench::enum_type<DataType>>)
+void BM_parquet_read_data_common(nvbench::state& state,
+data_profile const& profile,
+nvbench::type_list<nvbench::enum_type<DataType>>)
{
auto const d_type = get_type_or_group(static_cast<int32_t>(DataType));
-auto const cardinality = static_cast<cudf::size_type>(state.get_int64("cardinality"));
-auto const run_length = static_cast<cudf::size_type>(state.get_int64("run_length"));
auto const source_type = retrieve_io_type_enum(state.get_string("io_type"));
auto const compression = cudf::io::compression_type::SNAPPY;
cuio_source_sink_pair source_sink(source_type);

auto const num_rows_written = [&]() {
-auto const tbl = create_random_table(
-cycle_dtypes(d_type, num_cols),
-table_size_bytes{data_size},
-data_profile_builder().cardinality(cardinality).avg_run_length(run_length));
+auto const tbl =
+create_random_table(cycle_dtypes(d_type, num_cols), table_size_bytes{data_size}, profile);
auto const view = tbl->view();

cudf::io::parquet_writer_options write_opts =
@@ -85,6 +83,32 @@ void BM_parquet_read_data(nvbench::state& state, nvbench::type_list<nvbench::enu
parquet_read_common(num_rows_written, num_cols, source_sink, state);
}

+template <data_type DataType>
+void BM_parquet_read_data(nvbench::state& state,
+nvbench::type_list<nvbench::enum_type<DataType>> type_list)
+{
+auto const cardinality = static_cast<cudf::size_type>(state.get_int64("cardinality"));
+auto const run_length = static_cast<cudf::size_type>(state.get_int64("run_length"));
+BM_parquet_read_data_common<DataType>(
+state, data_profile_builder().cardinality(cardinality).avg_run_length(run_length), type_list);
+}
+
+template <data_type DataType>
+void BM_parquet_read_fixed_width_struct(nvbench::state& state,
+nvbench::type_list<nvbench::enum_type<DataType>> type_list)
+{
+auto const cardinality = static_cast<cudf::size_type>(state.get_int64("cardinality"));
+auto const run_length = static_cast<cudf::size_type>(state.get_int64("run_length"));
+std::vector<cudf::type_id> s_types{
+cudf::type_id::INT32, cudf::type_id::FLOAT32, cudf::type_id::INT64};
+BM_parquet_read_data_common<DataType>(state,
+data_profile_builder()
+.cardinality(cardinality)
+.avg_run_length(run_length)
+.struct_types(s_types),
+type_list);
+}
+
void BM_parquet_read_io_compression(nvbench::state& state)
{
auto const d_type = get_type_or_group({static_cast<int32_t>(data_type::INTEGRAL),
@@ -247,3 +271,13 @@ NVBENCH_BENCH(BM_parquet_read_io_small_mixed)
.add_int64_axis("cardinality", {0, 1000})
.add_int64_axis("run_length", {1, 32})
.add_int64_axis("num_string_cols", {1, 2, 3});

// a benchmark for structs that only contain fixed-width types
using d_type_list_struct_only = nvbench::enum_type_list<data_type::STRUCT>;
NVBENCH_BENCH_TYPES(BM_parquet_read_fixed_width_struct, NVBENCH_TYPE_AXES(d_type_list_struct_only))
.set_name("parquet_read_fixed_width_struct")
.set_type_axes_names({"data_type"})
.add_string_axis("io_type", {"DEVICE_BUFFER"})
.set_min_samples(4)
.add_int64_axis("cardinality", {0, 1000})
.add_int64_axis("run_length", {1, 32});
7 changes: 6 additions & 1 deletion cpp/cmake/thirdparty/get_cucollections.cmake
@@ -1,5 +1,5 @@
# =============================================================================
-# Copyright (c) 2021-2022, NVIDIA CORPORATION.
+# Copyright (c) 2021-2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License at
@@ -15,6 +15,11 @@
# This function finds cuCollections and performs any additional configuration.
function(find_and_configure_cucollections)
include(${rapids-cmake-dir}/cpm/cuco.cmake)
+include(${rapids-cmake-dir}/cpm/package_override.cmake)
+
+set(cudf_patch_dir "${CMAKE_CURRENT_FUNCTION_LIST_DIR}/patches")
+rapids_cpm_package_override("${cudf_patch_dir}/cuco_override.json")
+
if(BUILD_SHARED_LIBS)
rapids_cpm_cuco(BUILD_EXPORT_SET cudf-exports)
else()
227 changes: 227 additions & 0 deletions cpp/cmake/thirdparty/patches/cuco_noexcept.diff
@@ -0,0 +1,227 @@
diff --git a/include/cuco/aow_storage.cuh b/include/cuco/aow_storage.cuh
index 7f9de01..5228193 100644
--- a/include/cuco/aow_storage.cuh
+++ b/include/cuco/aow_storage.cuh
@@ -81,7 +81,7 @@ class aow_storage : public detail::aow_storage_base<T, WindowSize, Extent> {
* @param size Number of windows to (de)allocate
* @param allocator Allocator used for (de)allocating device storage
*/
- explicit constexpr aow_storage(Extent size, Allocator const& allocator = {}) noexcept;
+ explicit constexpr aow_storage(Extent size, Allocator const& allocator = {});

aow_storage(aow_storage&&) = default; ///< Move constructor
/**
@@ -122,7 +122,7 @@ class aow_storage : public detail::aow_storage_base<T, WindowSize, Extent> {
* @param key Key to which all keys in `slots` are initialized
* @param stream Stream used for executing the kernel
*/
- void initialize(value_type key, cuda_stream_ref stream = {}) noexcept;
+ void initialize(value_type key, cuda_stream_ref stream = {});

/**
* @brief Asynchronously initializes each slot in the AoW storage to contain `key`.
diff --git a/include/cuco/detail/open_addressing/open_addressing_impl.cuh b/include/cuco/detail/open_addressing/open_addressing_impl.cuh
index c2c9c14..8ac4236 100644
--- a/include/cuco/detail/open_addressing/open_addressing_impl.cuh
+++ b/include/cuco/detail/open_addressing/open_addressing_impl.cuh
@@ -125,7 +125,7 @@ class open_addressing_impl {
KeyEqual const& pred,
ProbingScheme const& probing_scheme,
Allocator const& alloc,
- cuda_stream_ref stream) noexcept
+ cuda_stream_ref stream)
: empty_slot_sentinel_{empty_slot_sentinel},
erased_key_sentinel_{this->extract_key(empty_slot_sentinel)},
predicate_{pred},
@@ -233,7 +233,7 @@ class open_addressing_impl {
*
* @param stream CUDA stream this operation is executed in
*/
- void clear(cuda_stream_ref stream) noexcept { storage_.initialize(empty_slot_sentinel_, stream); }
+ void clear(cuda_stream_ref stream) { storage_.initialize(empty_slot_sentinel_, stream); }

/**
* @brief Asynchronously erases all elements from the container. After this call, `size()` returns
@@ -599,7 +599,7 @@ class open_addressing_impl {
*
* @return The number of elements in the container
*/
- [[nodiscard]] size_type size(cuda_stream_ref stream) const noexcept
+ [[nodiscard]] size_type size(cuda_stream_ref stream) const
{
auto counter =
detail::counter_storage<size_type, thread_scope, allocator_type>{this->allocator()};
diff --git a/include/cuco/detail/static_map/static_map.inl b/include/cuco/detail/static_map/static_map.inl
index e17a145..3fa1d02 100644
--- a/include/cuco/detail/static_map/static_map.inl
+++ b/include/cuco/detail/static_map/static_map.inl
@@ -123,7 +123,7 @@ template <class Key,
class Allocator,
class Storage>
void static_map<Key, T, Extent, Scope, KeyEqual, ProbingScheme, Allocator, Storage>::clear(
- cuda_stream_ref stream) noexcept
+ cuda_stream_ref stream)
{
impl_->clear(stream);
}
@@ -215,7 +215,7 @@ template <class Key,
class Storage>
template <typename InputIt>
void static_map<Key, T, Extent, Scope, KeyEqual, ProbingScheme, Allocator, Storage>::
- insert_or_assign(InputIt first, InputIt last, cuda_stream_ref stream) noexcept
+ insert_or_assign(InputIt first, InputIt last, cuda_stream_ref stream)
{
return this->insert_or_assign_async(first, last, stream);
stream.synchronize();
@@ -465,7 +465,7 @@ template <class Key,
class Storage>
static_map<Key, T, Extent, Scope, KeyEqual, ProbingScheme, Allocator, Storage>::size_type
static_map<Key, T, Extent, Scope, KeyEqual, ProbingScheme, Allocator, Storage>::size(
- cuda_stream_ref stream) const noexcept
+ cuda_stream_ref stream) const
{
return impl_->size(stream);
}
diff --git a/include/cuco/detail/static_multiset/static_multiset.inl b/include/cuco/detail/static_multiset/static_multiset.inl
index 174f9bc..582926b 100644
--- a/include/cuco/detail/static_multiset/static_multiset.inl
+++ b/include/cuco/detail/static_multiset/static_multiset.inl
@@ -97,7 +97,7 @@ template <class Key,
class Allocator,
class Storage>
void static_multiset<Key, Extent, Scope, KeyEqual, ProbingScheme, Allocator, Storage>::clear(
- cuda_stream_ref stream) noexcept
+ cuda_stream_ref stream)
{
impl_->clear(stream);
}
@@ -183,7 +183,7 @@ template <class Key,
class Storage>
static_multiset<Key, Extent, Scope, KeyEqual, ProbingScheme, Allocator, Storage>::size_type
static_multiset<Key, Extent, Scope, KeyEqual, ProbingScheme, Allocator, Storage>::size(
- cuda_stream_ref stream) const noexcept
+ cuda_stream_ref stream) const
{
return impl_->size(stream);
}
diff --git a/include/cuco/detail/static_set/static_set.inl b/include/cuco/detail/static_set/static_set.inl
index 645013f..d3cece0 100644
--- a/include/cuco/detail/static_set/static_set.inl
+++ b/include/cuco/detail/static_set/static_set.inl
@@ -98,7 +98,7 @@ template <class Key,
class Allocator,
class Storage>
void static_set<Key, Extent, Scope, KeyEqual, ProbingScheme, Allocator, Storage>::clear(
- cuda_stream_ref stream) noexcept
+ cuda_stream_ref stream)
{
impl_->clear(stream);
}
@@ -429,7 +429,7 @@ template <class Key,
class Storage>
static_set<Key, Extent, Scope, KeyEqual, ProbingScheme, Allocator, Storage>::size_type
static_set<Key, Extent, Scope, KeyEqual, ProbingScheme, Allocator, Storage>::size(
- cuda_stream_ref stream) const noexcept
+ cuda_stream_ref stream) const
{
return impl_->size(stream);
}
diff --git a/include/cuco/detail/storage/aow_storage.inl b/include/cuco/detail/storage/aow_storage.inl
index 3547f4c..94b7f98 100644
--- a/include/cuco/detail/storage/aow_storage.inl
+++ b/include/cuco/detail/storage/aow_storage.inl
@@ -32,8 +32,8 @@
namespace cuco {

template <typename T, int32_t WindowSize, typename Extent, typename Allocator>
-constexpr aow_storage<T, WindowSize, Extent, Allocator>::aow_storage(
- Extent size, Allocator const& allocator) noexcept
+constexpr aow_storage<T, WindowSize, Extent, Allocator>::aow_storage(Extent size,
+ Allocator const& allocator)
: detail::aow_storage_base<T, WindowSize, Extent>{size},
allocator_{allocator},
window_deleter_{capacity(), allocator_},
@@ -64,7 +64,7 @@ aow_storage<T, WindowSize, Extent, Allocator>::ref() const noexcept

template <typename T, int32_t WindowSize, typename Extent, typename Allocator>
void aow_storage<T, WindowSize, Extent, Allocator>::initialize(value_type key,
- cuda_stream_ref stream) noexcept
+ cuda_stream_ref stream)
{
this->initialize_async(key, stream);
stream.synchronize();
diff --git a/include/cuco/static_map.cuh b/include/cuco/static_map.cuh
index c86e90c..95da423 100644
--- a/include/cuco/static_map.cuh
+++ b/include/cuco/static_map.cuh
@@ -269,7 +269,7 @@ class static_map {
*
* @param stream CUDA stream this operation is executed in
*/
- void clear(cuda_stream_ref stream = {}) noexcept;
+ void clear(cuda_stream_ref stream = {});

/**
* @brief Asynchronously erases all elements from the container. After this call, `size()` returns
@@ -387,7 +387,7 @@ class static_map {
* @param stream CUDA stream used for insert
*/
template <typename InputIt>
- void insert_or_assign(InputIt first, InputIt last, cuda_stream_ref stream = {}) noexcept;
+ void insert_or_assign(InputIt first, InputIt last, cuda_stream_ref stream = {});

/**
* @brief For any key-value pair `{k, v}` in the range `[first, last)`, if a key equivalent to `k`
@@ -690,7 +690,7 @@ class static_map {
* @param stream CUDA stream used to get the number of inserted elements
* @return The number of elements in the container
*/
- [[nodiscard]] size_type size(cuda_stream_ref stream = {}) const noexcept;
+ [[nodiscard]] size_type size(cuda_stream_ref stream = {}) const;

/**
* @brief Gets the maximum number of elements the hash map can hold.
diff --git a/include/cuco/static_multiset.cuh b/include/cuco/static_multiset.cuh
index 0daf103..fbcbc9c 100644
--- a/include/cuco/static_multiset.cuh
+++ b/include/cuco/static_multiset.cuh
@@ -235,7 +235,7 @@ class static_multiset {
*
* @param stream CUDA stream this operation is executed in
*/
- void clear(cuda_stream_ref stream = {}) noexcept;
+ void clear(cuda_stream_ref stream = {});

/**
* @brief Asynchronously erases all elements from the container. After this call, `size()` returns
@@ -339,7 +339,7 @@ class static_multiset {
* @param stream CUDA stream used to get the number of inserted elements
* @return The number of elements in the container
*/
- [[nodiscard]] size_type size(cuda_stream_ref stream = {}) const noexcept;
+ [[nodiscard]] size_type size(cuda_stream_ref stream = {}) const;

/**
* @brief Gets the maximum number of elements the multiset can hold.
diff --git a/include/cuco/static_set.cuh b/include/cuco/static_set.cuh
index a069939..3517f84 100644
--- a/include/cuco/static_set.cuh
+++ b/include/cuco/static_set.cuh
@@ -240,7 +240,7 @@ class static_set {
*
* @param stream CUDA stream this operation is executed in
*/
- void clear(cuda_stream_ref stream = {}) noexcept;
+ void clear(cuda_stream_ref stream = {});

/**
* @brief Asynchronously erases all elements from the container. After this call, `size()` returns
@@ -687,7 +687,7 @@ class static_set {
* @param stream CUDA stream used to get the number of inserted elements
* @return The number of elements in the container
*/
- [[nodiscard]] size_type size(cuda_stream_ref stream = {}) const noexcept;
+ [[nodiscard]] size_type size(cuda_stream_ref stream = {}) const;

/**
* @brief Gets the maximum number of elements the hash set can hold.
14 changes: 14 additions & 0 deletions cpp/cmake/thirdparty/patches/cuco_override.json
@@ -0,0 +1,14 @@

{
"packages" : {
"cuco" : {
"patches" : [
{
"file" : "${current_json_dir}/cuco_noexcept.diff",
"issue" : "Remove erroneous noexcept clauses on cuco functions that may throw [https://github.com/rapidsai/cudf/issues/16059]",
"fixed_in" : ""
}
]
}
}
}
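As a rough illustration of why the patch drops these `noexcept` specifiers: when an exception escapes a function declared `noexcept`, the program calls `std::terminate` instead of propagating the error, so callers such as cudf cannot catch it. The sketch below is not cuco code. `synchronize_or_throw`, `clear_before`, `clear_after`, and `clear_after_throws` are hypothetical names, and the `bool fail` flag stands in for a stream synchronization that may fail.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a synchronizing call that can fail
// (for example, a CUDA stream synchronize that reports an error).
inline void synchronize_or_throw(bool fail)
{
  if (fail) { throw std::runtime_error("synchronize failed"); }
}

// Shape of the code before the patch: declared noexcept although the body can throw.
// If an exception escaped here, std::terminate would be called.
inline void clear_before(bool fail) noexcept { synchronize_or_throw(fail); }

// Shape of the code after the patch: no noexcept, so the exception reaches the caller.
inline void clear_after(bool fail) { synchronize_or_throw(fail); }

// The noexcept operator reports the declared specification, not the actual behavior.
static_assert(noexcept(clear_before(false)), "declared noexcept");
static_assert(!noexcept(clear_after(false)), "may throw");

// Returns true if clear_after(fail) threw an exception that was caught here.
inline bool clear_after_throws(bool fail)
{
  try {
    clear_after(fail);
  } catch (std::runtime_error const&) {
    return true;
  }
  return false;
}
```

Calling `clear_before(true)` would terminate the process, which is why the sketch only exercises the patched shape at run time.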

0 comments on commit e1683a4