feat: Introduce `CometTaskMemoryManager` and native side memory pool #83

sunchao · 2024-02-22T01:13:39Z

Which issue does this PR close?

Closes #34.

Rationale for this change

Currently Comet uses the default memory pool implementation in DataFusion, which is not aware of the memory manager on the JVM Spark side. Therefore, when a Spark job has both Spark and Comet operators, we need to initialize two separate memory pools, one for each, and make sure each has enough budget. In addition, since we cannot trigger spilling from native to JVM, or vice versa, the budget needs to be large enough, which means Comet will typically need more memory than Spark does.

Since Spark already has a UnifiedMemoryManager, this PR proposes a new memory pool implementation that delegates calls to the JVM-side UnifiedMemoryManager, which acts as the source of truth and serves memory acquisition and release requests from both the JVM and native sides.
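To make the rationale concrete, here is a minimal Rust sketch (not code from this PR) of a single budget shared by a JVM-side consumer and a native-side consumer. `SharedBudget` and `Consumer` are hypothetical names that stand in for Spark's memory manager and its consumers; the real implementation crosses JNI and is more involved.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the single, JVM-side source of truth.
struct SharedBudget {
    free: Mutex<usize>,
}

impl SharedBudget {
    fn new(total: usize) -> Self {
        Self { free: Mutex::new(total) }
    }

    /// Grants up to `size` bytes and returns the amount actually granted.
    fn acquire(&self, size: usize) -> usize {
        let mut free = self.free.lock().unwrap();
        let granted = size.min(*free);
        *free -= granted;
        granted
    }

    /// Returns `size` bytes to the budget.
    fn release(&self, size: usize) {
        *self.free.lock().unwrap() += size;
    }

    /// Bytes still available to any consumer.
    fn free(&self) -> usize {
        *self.free.lock().unwrap()
    }
}

/// A consumer (a JVM operator or the native pool) drawing from the shared budget.
struct Consumer {
    budget: Arc<SharedBudget>,
    held: usize,
}

impl Consumer {
    fn new(budget: Arc<SharedBudget>) -> Self {
        Self { budget, held: 0 }
    }

    /// Requests `size` bytes and records whatever was granted.
    fn acquire(&mut self, size: usize) -> usize {
        let granted = self.budget.acquire(size);
        self.held += granted;
        granted
    }

    /// Gives everything this consumer holds back to the budget.
    fn release_all(&mut self) {
        self.budget.release(self.held);
        self.held = 0;
    }
}
```

With one budget, memory released by one side immediately becomes available to the other, which is what two separately sized pools cannot offer.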

What changes are included in this PR?

This PR introduces a CometMemoryPool class on the native side, overriding the default memory pool used by DataFusion. This memory pool dispatches calls to Spark's TaskMemoryManager for acquiring and releasing memory.

The newly added memory pool will only be activated when spark.memory.offHeap.enabled is set to true. Otherwise, the behavior remains the same as before (and spark.executor.memoryOverhead needs to be large enough for native execution). In TestBosonBase, spark.memory.offHeap.enabled is set to true, so all the tests within Comet exercise the new feature.

How are these changes tested?

All the existing tests are updated to use the new memory manager implementation.

sunchao · 2024-02-23T22:16:08Z

cc @viirya

viirya · 2024-02-23T22:20:58Z

Thanks @sunchao. I will review this in the next few days (or week).

advancedxy

Wow, this is in great shape. Left some minor comments.

advancedxy · 2024-02-26T12:01:48Z

core/src/execution/jni_api.rs

@@ -103,6 +103,7 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_createPlan(
    iterators: jobjectArray,
    serialized_query: jbyteArray,
    metrics_node: JObject,
+    task_memory_manager_obj: JObject,


How about renaming this to comet_task_memory_manager_obj?

When I first read the code, I thought it was Spark's TaskMemoryManager object. However, it is Comet's CometTaskMemoryManager. It would be clearer to call it comet_task_memory_manager_obj.

Other occurrences could be renamed too.

advancedxy · 2024-02-26T12:03:29Z

core/src/execution/jni_api.rs

+        .get("use_unified_memory_manager")
+        .ok_or(CometError::Internal(
+            "Config 'use_unified_memory_manager' is not specified from Comet JVM side".to_string(),
+        ))?


I believe a more permissive way is to treat an unset use_unified_memory_manager as false?

Yeah sure, although this is an internal error on the developer side if it is not set.
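The two behaviors discussed here can be sketched side by side. This is an illustrative sketch over a plain `HashMap`, not the PR's code: the function names are hypothetical, errors are plain `String`s rather than `CometError`, and parsing the value as a Rust `bool` is an assumption.

```rust
use std::collections::HashMap;

const KEY: &str = "use_unified_memory_manager";

/// Strict lookup: a missing key is reported as an internal error.
fn use_unified_strict(configs: &HashMap<String, String>) -> Result<bool, String> {
    configs
        .get(KEY)
        .ok_or_else(|| format!("Config '{}' is not specified from Comet JVM side", KEY))?
        .parse::<bool>()
        .map_err(|e| format!("Config '{}' is not a boolean: {}", KEY, e))
}

/// Permissive lookup: a missing key is treated as `false`.
fn use_unified_permissive(configs: &HashMap<String, String>) -> Result<bool, String> {
    match configs.get(KEY) {
        Some(value) => value
            .parse::<bool>()
            .map_err(|e| format!("Config '{}' is not a boolean: {}", KEY, e)),
        None => Ok(false),
    }
}
```

The strict form surfaces a JVM-side wiring mistake early, while the permissive form silently falls back to the default pool.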

advancedxy · 2024-02-26T12:06:20Z

spark/src/main/java/org/apache/spark/CometTaskMemoryManager.java

+   * share the same API as Spark and cannot trigger spill when acquire memory. Therefore, when
+   * acquiring memory from native or JVM, spilling can only be triggered from JVM operators.
+   */
+  private class NativeMemoryConsumer extends MemoryConsumer {


The consumer's toString might be used when debug logging is turned on.

It would be great if we could override toString in this class and also add a unique flag/id to identify the corresponding consumer for the native plan/execution.

Sure, I can add a toString for now and we can figure out how to use it for debugging purposes later.

advancedxy · 2024-02-26T12:07:19Z

spark/src/main/scala/org/apache/comet/CometExecIterator.scala

+      cometBatchIterators,
+      protobufQueryPlan,
+      nativeMetrics,
+      new CometTaskMemoryManager)


I'm referring to this. I think we can pass an id to CometTaskMemoryManager and use it as an identity mark.

advancedxy · 2024-02-26T12:10:19Z

spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala

-    conf.set("spark.shuffle.manager", shuffleManager)
+    conf.set(SQLConf.ANSI_ENABLED.key, "false")
+    conf.set(SHUFFLE_MANAGER, shuffleManager)
+    conf.set(MEMORY_OFFHEAP_ENABLED.key, "true")


Are we still going to test the default memory pool implementation in DataFusion?

Seems like all the test code paths use the unified memory manager now.

In the long term I'm thinking of only using the memory pool defined in Comet. This currently requires users to turn on off-heap mode in Spark and set the off-heap memory accordingly, so configuration changes are necessary when they want to use Comet. Ideally we should be able to use DriverPlugin to override the memory settings so Comet may just work out of the box (this needs changes to Spark in a few places).

The default memory manager path is kept only until we are able to do the override through DriverPlugin. Internally we still run all the Spark SQL tests using the default memory manager, and can probably do the same here too.

> Ideally we should be able to use DriverPlugin to override the memory settings so Comet may just work out of the box (this needs changes to Spark in a few places).

Oh. It seems that DriverPlugin is initialized before the task scheduler. Which places in Spark do we need to change for CometDriverPlugin to override memory settings? The memory overhead is already overridden by Comet.

> and can probably do the same here too.

Is this still in this PR's scope?

> Oh. It seems that DriverPlugin is initialized before the task scheduler. Which places in Spark do we need to change for CometDriverPlugin to override memory settings? The memory overhead is already overridden by Comet.

I already made one change in Spark: apache/spark#45052 for this. We'll need a few more changes so we can completely overwrite the executor memory settings through DriverPlugin.

> Is this still in this PR's scope?

Not really. Will do that in #8

> I already made one change in Spark: apache/spark#45052 for this. We'll need a few more changes so we can completely overwrite the executor memory settings through DriverPlugin.

Great work. Looking forward to being able to completely overwrite memory settings through DriverPlugin.

advancedxy

LGTM from my side, except the conflict should be resolved.

snmvaughan

LGTM. Good catch on check_exception in jni_call.

viirya · 2024-03-04T06:45:16Z

core/src/execution/memory_pool.rs

+    fn grow(&self, _: &MemoryReservation, additional: usize) {
+        self.used.fetch_add(additional, Relaxed);
+    }


Don't we need to acquire the required memory from the JVM memory manager?

I think grow is not really used by DataFusion except in tests, which is why I didn't do it. But you are right, it's better to add it too, to be future-proof.

viirya · 2024-03-04T06:50:04Z

core/src/execution/memory_pool.rs

+                if acquired < additional as i64 {
+                    return Err(DataFusionError::Execution(format!(
+                        "Failed to acquire {} bytes, only got {}. Reserved: {}",
+                        additional,
+                        acquired,
+                        self.reserved(),
+                    )));
+                }


When we fail to get the required amount and return an error, I think we should notify the JVM memory manager to release the bytes that were already acquired.

Good point. Will add.
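The two review points on memory_pool.rs (acquire in `grow`, and hand back a partial grant on failure) can be sketched as below. This is a simplified, self-contained illustration rather than the PR's code: `TaskMemory` is a hypothetical trait standing in for the JNI calls into `CometTaskMemoryManager`, `FixedBudget` is a fake manager, the methods omit DataFusion's `MemoryReservation` argument, and panicking in `grow` is an assumption about how an infallible method might report a shortfall.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::sync::Mutex;

/// Hypothetical stand-in for the JNI bridge to the JVM-side manager.
trait TaskMemory {
    /// Returns the bytes actually granted, which may be fewer than requested.
    fn acquire(&self, size: usize) -> usize;
    fn release(&self, size: usize);
}

/// Fake manager with a fixed budget, for illustration only.
struct FixedBudget(Mutex<usize>);

impl FixedBudget {
    fn new(total: usize) -> Self {
        Self(Mutex::new(total))
    }
    fn free(&self) -> usize {
        *self.0.lock().unwrap()
    }
}

impl TaskMemory for FixedBudget {
    fn acquire(&self, size: usize) -> usize {
        let mut free = self.0.lock().unwrap();
        let granted = size.min(*free);
        *free -= granted;
        granted
    }
    fn release(&self, size: usize) {
        *self.0.lock().unwrap() += size;
    }
}

/// Native-side pool: a local counter plus delegation for every byte.
struct DelegatingPool<M: TaskMemory> {
    manager: M,
    used: AtomicUsize,
}

impl<M: TaskMemory> DelegatingPool<M> {
    fn new(manager: M) -> Self {
        Self { manager, used: AtomicUsize::new(0) }
    }

    fn try_grow(&self, additional: usize) -> Result<(), String> {
        let acquired = self.manager.acquire(additional);
        if acquired < additional {
            // Hand the partial grant back so the manager's accounting stays balanced.
            self.manager.release(acquired);
            return Err(format!(
                "Failed to acquire {} bytes, only got {}. Reserved: {}",
                additional, acquired, self.reserved()
            ));
        }
        self.used.fetch_add(additional, Relaxed);
        Ok(())
    }

    /// Infallible variant: still acquires from the manager, panics on shortfall.
    fn grow(&self, additional: usize) {
        self.try_grow(additional).unwrap_or_else(|e| panic!("{}", e));
    }

    fn shrink(&self, size: usize) {
        self.manager.release(size);
        self.used.fetch_sub(size, Relaxed);
    }

    fn reserved(&self) -> usize {
        self.used.load(Relaxed)
    }
}
```

Without the release on the failure path, a request that is only partly granted would leave those bytes charged to the native consumer even though the pool never counted them.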

viirya · 2024-03-04T06:51:41Z

core/src/jvm_bridge/comet_task_memory_manager.rs

+use crate::jvm_bridge::get_global_jclass;
+
+/// A DataFusion `MemoryPool` implementation for Comet, which delegate to the JVM
+/// side `CometTaskMemoryManager`.


The comment does not look correct, as CometMemoryPool is the one implementing DataFusion's MemoryPool. Maybe this should be moved to CometMemoryPool?

Yea let me move it.

viirya · 2024-03-04T06:54:18Z

spark/src/main/java/org/apache/spark/CometTaskMemoryManager.java

+  // Called by Comet native through JNI.
+  // Returns the actual amount of memory (in bytes) granted.
+  public long acquireMemory(long size) {
+    return internal.acquireExecutionMemory(size, nativeMemoryConsumer);
+  }


As we claim Send and Sync for CometMemoryPool, should we make this a synchronized method?

I think TaskMemoryManager already synchronizes acquireExecutionMemory and releaseExecutionMemory, so it doesn't seem necessary.

Oh okay, then it is fine.
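The reasoning in this exchange is that a thin wrapper needs no lock of its own when the object it delegates to already serializes its calls. A rough Rust analogue, with hypothetical `LockedManager` and `Wrapper` types standing in for `TaskMemoryManager` and `CometTaskMemoryManager`, might look like this:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical manager whose methods lock internally.
struct LockedManager {
    free: Mutex<usize>,
}

impl LockedManager {
    /// Grants up to `size` bytes under the internal lock.
    fn acquire(&self, size: usize) -> usize {
        let mut free = self.free.lock().unwrap();
        let granted = size.min(*free);
        *free -= granted;
        granted
    }
}

/// Thin wrapper with no synchronization of its own.
struct Wrapper {
    inner: LockedManager,
}

impl Wrapper {
    fn new(total: usize) -> Self {
        Self { inner: LockedManager { free: Mutex::new(total) } }
    }

    /// Simply forwards to the manager, which does the locking.
    fn acquire_memory(&self, size: usize) -> usize {
        self.inner.acquire(size)
    }

    fn free(&self) -> usize {
        *self.inner.free.lock().unwrap()
    }
}

/// Spawns `threads` workers that each request `per_thread` bytes and
/// returns the total number of bytes granted across all of them.
fn acquire_concurrently(wrapper: Arc<Wrapper>, threads: usize, per_thread: usize) -> usize {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let w = Arc::clone(&wrapper);
            thread::spawn(move || w.acquire_memory(per_thread))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

However the threads interleave, the total granted never exceeds the budget, because every grant happens under the manager's lock.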

viirya · 2024-03-04T06:57:47Z

spark/src/test/scala/org/apache/spark/sql/CometTPCHQuerySuite.scala

@@ -87,10 +89,11 @@ class CometTPCHQuerySuite extends QueryTest with CometTPCBase with SQLQueryTestH
      "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager")
    conf.set(CometConf.COMET_ENABLED.key, "true")
    conf.set(CometConf.COMET_EXEC_ENABLED.key, "true")
-    conf.set(CometConf.COMET_MEMORY_OVERHEAD.key, "2g")


We don't need it?

This is now replaced by

conf.set(MEMORY_OFFHEAP_ENABLED.key, "true")
conf.set(MEMORY_OFFHEAP_SIZE.key, "2g")

below

viirya · 2024-03-04T06:58:57Z

spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala

-        SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "1g",
-        SQLConf.ADAPTIVE_AUTO_BROADCASTJOIN_THRESHOLD.key -> "1g",


For broadcast join threshold configs, should we keep them?

Oops I accidentally removed these when rebasing. Let me add them back.

viirya · 2024-03-05T20:49:46Z

@sunchao Do you see we use less memory (e.g., no more extra memory overhead needed) than before this patch when running TPCH/TPCDS queries on EKS?

sunchao · 2024-03-06T05:50:55Z

I tried this with columnar shuffle some time back, and in TPC-DS there were other issues that caused OOM. I plan to try benchmarking this again soon. Let me merge this PR first and address the remaining issues in follow-ups.

sunchao · 2024-03-06T05:51:07Z

Thanks, merged

sunchao force-pushed the memory-manager-comet branch 2 times, most recently from 76002b9 to 26f4802 Compare February 22, 2024 21:58

sunchao marked this pull request as ready for review February 23, 2024 22:15

advancedxy reviewed Feb 26, 2024

sunchao force-pushed the memory-manager-comet branch from d58ab8d to 0cfb2f7 Compare February 28, 2024 18:53

advancedxy approved these changes Feb 29, 2024

snmvaughan approved these changes Mar 3, 2024

viirya reviewed Mar 4, 2024

sunchao added 6 commits March 4, 2024 11:23

feat: Introduce CometTaskMemoryManager and native side memory pool

460ba7f

add license header

808214a

add license header

2ad7aca

comments

a8ce1a9

fix format

c2ce536

comments

dd67291

sunchao force-pushed the memory-manager-comet branch from 54db52b to dd67291 Compare March 4, 2024 19:24

viirya approved these changes Mar 5, 2024

sunchao merged commit e83635a into apache:main Mar 6, 2024
19 checks passed

Kontinuation mentioned this pull request Aug 29, 2024

SparkOutOfMemoryError happens when running CometColumnarExchange #886

Closed

Kontinuation mentioned this pull request Sep 18, 2024

A more comprehensive tuning guide for memory related options #949

Closed
