Add batch query support for drop step [tp-tests]
- Reuse multi-query optimization for TinkerPop DropStep
- Change restriction on eligible multi-query traversals and allow multi-query optimizations to be used for queries with steps which contain drop() step

Signed-off-by: Oleksandr Porunov <[email protected]>
porunov committed Oct 30, 2024
1 parent af63176 commit 72e2427
Showing 24 changed files with 701 additions and 85 deletions.
26 changes: 26 additions & 0 deletions docs/changelog.md
Expand Up @@ -128,6 +128,32 @@ Example: `storage.berkeleyje.ext.je.lock.timeout=5000 ms`
For simplicity, JSON schema initialization options have been added to JanusGraph.
See [documentation](./schema/schema-init-strategies.md) to learn more about the JSON schema initialization process.

##### Batched Queries Enhancement: Introduction of `JanusGraphNoOpBarrierVertexOnlyStep`

In previous versions, when a query that could benefit from batch-query optimization (multi-query) was executed without
a user-defined barrier step, JanusGraph would inject a `NoOpBarrierStep` by default. This approach allowed batching
for edges and properties, which do not gain advantages from multi-query optimization.

Starting with JanusGraph 1.1.0, this behavior has been improved. The system now injects a
`JanusGraphNoOpBarrierVertexOnlyStep` instead of the standard `NoOpBarrierStep` when no barrier steps are detected.
This change ensures that batching is applied exclusively to vertices, which do benefit from batch queries,
while excluding edges and properties from the batching process.

If a user explicitly defines a `.barrier()` step in the query, the system will continue to use the `NoOpBarrierStep` as expected.

##### Batch Query Optimizations Now Support Traversals Containing the `drop()` Step

Starting with JanusGraph 1.1.0, batch optimizations for vertex removal have been introduced in the `drop()` step and
are enabled by default. Previously, any batch optimization would be skipped for queries containing at least one
`drop()` step. However, with this update, such queries are now eligible for batch query optimization (multi-query).

Please note that the `LazyBarrierStrategy` (a TinkerPop strategy) is disabled for any query that includes at least one `drop()` step.

To disable the `drop()` step optimization and maintain the previous behavior, users can set the following configuration:
```
query.batch.drop-step-mode=none
```
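
As a minimal sketch of the setting above, the following builds the same properties in code using only `java.util.Properties`. The class `DropStepModeConfig` and its methods are hypothetical helpers for illustration and are not part of JanusGraph; only the two option keys and the `all` / `none` values come from the configuration reference. `query.batch.enabled` is set explicitly because the drop-step mode is used only when batching is enabled.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper (not a JanusGraph class): assembles the batch-query
// options discussed above into a java.util.Properties object.
final class DropStepModeConfig {

    // Option keys as listed in the configuration reference.
    static final String BATCH_ENABLED = "query.batch.enabled";
    static final String DROP_STEP_MODE = "query.batch.drop-step-mode";

    // Returns properties with batching enabled and the given drop-step mode.
    // Only the two documented modes, "all" and "none", are accepted.
    static Properties withDropStepMode(String mode) {
        if (!"all".equals(mode) && !"none".equals(mode)) {
            throw new IllegalArgumentException("Unsupported drop-step-mode: " + mode);
        }
        Properties props = new Properties();
        props.setProperty(BATCH_ENABLED, "true");
        props.setProperty(DROP_STEP_MODE, mode);
        return props;
    }

    // Renders the properties in .properties file syntax.
    static String render(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the options that restore the pre-1.1.0 drop() behavior.
        System.out.print(render(withDropStepMode("none")));
    }
}
```

The rendered text can be saved as part of a graph's properties file alongside the storage settings.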

### Version 1.0.1 (Release Date: ???)

/// tab | Maven
1 change: 1 addition & 0 deletions docs/configs/janusgraph-cfg.md
Expand Up @@ -366,6 +366,7 @@ Configuration options to configure batch queries optimization behavior

| Name | Description | Datatype | Default Value | Mutability |
| ---- | ---- | ---- | ---- | ---- |
| query.batch.drop-step-mode | Batching mode for `drop()` step. Used only when `query.batch.enabled` is `true`.<br>Supported modes:<br>- `all` - Drops all vertices in a batch.<br>- `none` - Skips drop batching optimization.<br> | String | all | MASKABLE |
| query.batch.enabled | Whether traversal queries should be batched when executed against the storage backend. This can lead to significant performance improvement if there is a non-trivial latency to the backend. If `false` then all other configuration options under `query.batch` namespace are ignored. | Boolean | true | MASKABLE |
| query.batch.has-step-mode | Properties pre-fetching mode for `has` step. Used only when `query.batch.enabled` is `true`.<br>Supported modes:<br>- `all_properties` - Pre-fetch all vertex properties on any property access (fetches all vertex properties in a single slice query)<br>- `required_properties_only` - Pre-fetch necessary vertex properties for the whole chain of foldable `has` steps (uses a separate slice query per each required property)<br>- `required_and_next_properties` - Prefetch the same properties as with `required_properties_only` mode, but also prefetch<br>properties which may be needed in the next properties access step like `values`, `properties,` `valueMap`, `elementMap`, or `propertyMap`.<br>In case the next step is not one of those properties access steps then this mode behaves same as `required_properties_only`.<br>In case the next step is one of the properties access steps with limited scope of properties, those properties will be<br>pre-fetched together in the same multi-query.<br>In case the next step is one of the properties access steps with unspecified scope of property keys then this mode<br>behaves same as `all_properties`.<br>- `required_and_next_properties_or_all` - Prefetch the same properties as with `required_and_next_properties`, but in case the next step is not<br>`values`, `properties,` `valueMap`, `elementMap`, or `propertyMap` then acts like `all_properties`.<br>- `none` - Skips `has` step batch properties pre-fetch optimization.<br> | String | required_and_next_properties | MASKABLE |
| query.batch.label-step-mode | Labels pre-fetching mode for `label()` step. Used only when `query.batch.enabled` is `true`.<br>Supported modes:<br>- `all` - Pre-fetch labels for all vertices in a batch.<br>- `none` - Skips vertex labels pre-fetching optimization.<br> | String | all | MASKABLE |
3 changes: 2 additions & 1 deletion docs/operations/batch-processing.md
Expand Up @@ -159,7 +159,7 @@ Batched query processing takes into account two types of steps:

1. Batch compatible step. This is a step which executes batch requests. Currently, these steps are:
`out()`, `in()`, `both()`, `inE()`, `outE()`, `bothE()`, `has()`, `values()`, `properties()`, `valueMap()`,
`propertyMap()`, `elementMap()`, `label()`.
`propertyMap()`, `elementMap()`, `label()`, `drop()`.
2. Parent step. This is a parent step which has local traversals with the same start. Such parent steps also implement the
interface `TraversalParent`. There are many such steps; examples include `and(...)`, `or(...)`,
`not(...)`, `order().by(...)`, `project("valueA", "valueB", "valueC").by(...).by(...).by(...)`, `union(..., ..., ...)`,
Expand Up @@ -331,3 +331,4 @@ See configuration option `query.batch.has-step-mode` to control properties pre-f
See configuration option `query.batch.properties-mode` to control properties pre-fetching behaviour for `values`,
`properties`, `valueMap`, `propertyMap`, and `elementMap` steps.
See configuration option `query.batch.label-step-mode` to control labels pre-fetching behaviour for `label` step.
See configuration option `query.batch.drop-step-mode` to control drop batching behaviour for `drop` step.
Expand Up @@ -31,6 +31,7 @@
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.process.traversal.step.filter.DropStep;
import org.apache.tinkerpop.gremlin.process.traversal.step.filter.HasStep;
import org.apache.tinkerpop.gremlin.process.traversal.step.util.WithOptions;
import org.apache.tinkerpop.gremlin.process.traversal.strategy.decoration.SubgraphStrategy;
Expand Down Expand Up @@ -60,6 +61,7 @@
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.RelationType;
import org.janusgraph.core.SchemaViolationException;
import org.janusgraph.core.Transaction;
import org.janusgraph.core.VertexLabel;
import org.janusgraph.core.VertexList;
import org.janusgraph.core.attribute.Cmp;
Expand Down Expand Up @@ -138,10 +140,12 @@
import org.janusgraph.graphdb.relations.StandardVertexProperty;
import org.janusgraph.graphdb.serializer.SpecialInt;
import org.janusgraph.graphdb.serializer.SpecialIntSerializer;
import org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphDropStep;
import org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphElementMapStep;
import org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphHasStep;
import org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphPropertiesStep;
import org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphPropertyMapStep;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryDropStepStrategyMode;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryHasStepStrategyMode;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryLabelStepStrategyMode;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryPropertiesStrategyMode;
Expand Down Expand Up @@ -220,6 +224,7 @@
import static org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.DB_CACHE;
import static org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.DB_CACHE_CLEAN_WAIT;
import static org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.DB_CACHE_TIME;
import static org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.DROP_STEP_BATCH_MODE;
import static org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.FORCE_INDEX_USAGE;
import static org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.HARD_MAX_LIMIT;
import static org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.HAS_STEP_BATCH_MODE;
Expand Down Expand Up @@ -10074,11 +10079,7 @@ public void testMultiQueryDropsVertices() {

int verticesAmount = 42;

for (int i = 0; i < verticesAmount; i++) {
Vertex vertex = tx.addVertex("id", i);
vertex.property("name", "name_test");
vertex.property("details", "details_" + i);
}
addVerticesForDropTest(verticesAmount, tx);

clopen();

Expand All @@ -10090,20 +10091,161 @@ public void testMultiQueryDropsVertices() {
.map(v -> (JanusGraphVertex) v)
.collect(Collectors.toList());

int actualCount = tx.multiQuery(vertices).drop();
int actualCount = tx.multiQuery(vertices).drop().size();
clopen();

assertEquals(verticesAmount, actualCount);

int afterDropCount = tx.traversal()
.V()
.has("name", "name_test")
.toList()
.size();
long afterDropCount = getVerticesForDropTestCount(tx.traversal());

assertEquals(0, afterDropCount);
}

@Test
public void testMultiQueryDropsStrategyModes() {

mgmt.makePropertyKey("id").dataType(Integer.class).cardinality(Cardinality.SINGLE).make();
PropertyKey nameProp = mgmt.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.SINGLE).make();
mgmt.makePropertyKey("details").dataType(String.class).cardinality(Cardinality.SINGLE).make();
mgmt.buildIndex("nameIndex", Vertex.class).addKey(nameProp).buildCompositeIndex();

finishSchema();

long verticesAmount = 42;

// Mode: NONE

addVerticesForDropTest(verticesAmount);
graph.tx().commit();
clopen(option(DROP_STEP_BATCH_MODE), MultiQueryDropStepStrategyMode.NONE.getConfigName());
assertEquals(verticesAmount, getVerticesForDropTestCount());
TraversalMetrics profileT = graph.traversal().V().drop().profile().next();
assertTrue(profileT.getMetrics().stream().anyMatch(metrics -> metrics.getName().equals(DropStep.class.getSimpleName())));
graph.tx().commit();
assertEquals(0, getVerticesForDropTestCount());

// Mode: ALL

addVerticesForDropTest(verticesAmount);
graph.tx().commit();
clopen(option(DROP_STEP_BATCH_MODE), MultiQueryDropStepStrategyMode.ALL.getConfigName());
assertEquals(verticesAmount, getVerticesForDropTestCount());
profileT = graph.traversal().V().drop().profile().next();
assertEquals("true", profileT.getMetrics().stream().filter(metrics -> metrics.getName().equals(JanusGraphDropStep.class.getSimpleName())).findAny().get().getAnnotation("multi"));
graph.tx().commit();
assertEquals(0, getVerticesForDropTestCount());

// `limit` with `drop` step.

addVerticesForDropTest(verticesAmount);
graph.tx().commit();
clopen(option(DROP_STEP_BATCH_MODE), MultiQueryDropStepStrategyMode.NONE.getConfigName());
assertEquals(verticesAmount, getVerticesForDropTestCount());
int limitSize = 2;
profileT = graph.traversal().V().limit(limitSize).drop().profile().next();
assertTrue(profileT.getMetrics().stream().anyMatch(metrics -> metrics.getName().equals(DropStep.class.getSimpleName())));
graph.tx().commit();
long afterDropCount = getVerticesForDropTestCount();
assertEquals(verticesAmount-limitSize, afterDropCount);
graph.traversal().V().drop().iterate();
graph.tx().commit();
addVerticesForDropTest(verticesAmount);
graph.tx().commit();
clopen(option(DROP_STEP_BATCH_MODE), MultiQueryDropStepStrategyMode.ALL.getConfigName());
assertEquals(verticesAmount, getVerticesForDropTestCount());
profileT = graph.traversal().V().limit(limitSize).drop().profile().next();
assertEquals("true", profileT.getMetrics().stream().filter(metrics -> metrics.getName().equals(JanusGraphDropStep.class.getSimpleName())).findAny().get().getAnnotation("multi"));
graph.tx().commit();
afterDropCount = getVerticesForDropTestCount();
assertEquals(verticesAmount-limitSize, afterDropCount);
}

@Test
public void testMetaPropertiesDrop(){
mgmt.makePropertyKey("id").dataType(Integer.class).cardinality(Cardinality.SINGLE).make();
PropertyKey nameProp = mgmt.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.SINGLE).make();
mgmt.makePropertyKey("details").dataType(String.class).cardinality(Cardinality.SINGLE).make();
mgmt.buildIndex("nameIndex", Vertex.class).addKey(nameProp).buildCompositeIndex();

finishSchema();

long verticesAmount = 42;

for (int i = 0; i < verticesAmount; i++) {
Vertex vertex = tx.addVertex("id", i);
VertexProperty<String> property = vertex.property("name", "name_test");
property.property("details", "details_" + i);
}
graph.tx().commit();

clopen(option(DROP_STEP_BATCH_MODE), MultiQueryDropStepStrategyMode.ALL.getConfigName());

assertEquals(verticesAmount, graph.traversal().V().properties("name").properties("details").count().next());

graph.traversal().V().properties("name").properties("details").drop().hasNext();

assertEquals(0, graph.traversal().V().properties("name").properties("details").count().next());

graph.tx().commit();

assertEquals(0, graph.traversal().V().properties("name").properties("details").count().next());
assertEquals(verticesAmount, graph.traversal().V().has("name").count().next());
}

@Test
public void testEdgePropertiesDrop(){
mgmt.makePropertyKey("id").dataType(Integer.class).cardinality(Cardinality.SINGLE).make();
mgmt.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.SINGLE).make();
mgmt.makeEdgeLabel("relate").make();

finishSchema();

long verticesAmount = 42;

for (int i = 0; i < verticesAmount; i++) {
Vertex vertex = tx.addVertex("id", i);
Vertex vertex2 = tx.addVertex("id", i+verticesAmount);
vertex.addEdge("relate", vertex2).property("name", "name_"+i);
}

graph.tx().commit();

clopen(option(DROP_STEP_BATCH_MODE), MultiQueryDropStepStrategyMode.ALL.getConfigName());

assertEquals(verticesAmount, graph.traversal().E().properties("name").count().next());

graph.traversal().E().properties("name").drop().hasNext();

assertEquals(0, graph.traversal().E().properties("name").count().next());

graph.tx().commit();

assertEquals(0, graph.traversal().E().properties("name").count().next());
assertEquals(verticesAmount, graph.traversal().E().count().next());
}

private void addVerticesForDropTest(long verticesAmount){
addVerticesForDropTest(verticesAmount, graph);
}

private long getVerticesForDropTestCount(){
return getVerticesForDropTestCount(graph.traversal());
}

private void addVerticesForDropTest(long verticesAmount, Transaction tx){
for (int i = 0; i < verticesAmount; i++) {
Vertex vertex = tx.addVertex("id", i);
vertex.property("name", "name_test");
vertex.property("details", "details_" + i);
}
}

private long getVerticesForDropTestCount(GraphTraversalSource g){
return g.V()
.has("name", "name_test")
.count().next();
}

@ParameterizedTest
@ValueSource(booleans = {true, false})
public void testParallelBackendOps(boolean parallelBackendOpsEnabled) {
Expand Up @@ -27,6 +27,7 @@
import org.janusgraph.diskstorage.configuration.WriteConfiguration;
import org.janusgraph.diskstorage.cql.CQLConfigOptions;
import org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryDropStepStrategyMode;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
Expand Down Expand Up @@ -65,6 +66,7 @@ public WriteConfiguration getConfiguration() {
config.set(GraphDatabaseConfiguration.STORAGE_BACKEND,"cql");
config.set(CQLConfigOptions.LOCAL_DATACENTER, "dc1");
config.set(GraphDatabaseConfiguration.USE_MULTIQUERY, true);
config.set(GraphDatabaseConfiguration.DROP_STEP_BATCH_MODE, MultiQueryDropStepStrategyMode.NONE.getConfigName());
return config.getConfiguration();
}

Expand Down Expand Up @@ -103,7 +105,7 @@ public Integer dropVertices() {
.map(v -> (JanusGraphVertex) v)
.collect(Collectors.toList());

dropCount = tx.multiQuery(vertices).drop();
dropCount = tx.multiQuery(vertices).drop().size();
} else {
dropCount = tx.traversal()
.V()
Expand All @@ -117,6 +119,27 @@ public Integer dropVertices() {
return dropCount;
}

@Benchmark
public Integer dropVerticesGremlinQuery() {

JanusGraphTransaction tx;
if (isMultiDrop) {
tx = graph.buildTransaction().setDropStepStrategyMode(MultiQueryDropStepStrategyMode.ALL).start();
} else {
tx = graph.buildTransaction().setDropStepStrategyMode(MultiQueryDropStepStrategyMode.NONE).start();
}

Integer dropCount = tx.traversal()
.V()
.has("name", "name_test")
.drop()
.toList()
.size();

tx.rollback();
return dropCount;
}

private void addVertices() {
for (int i = 0; i < verticesAmount; i++) {
Vertex vertex = graph.addVertex("id", i);
Expand Up @@ -60,7 +60,6 @@ public interface JanusGraphMultiVertexQuery<Q extends JanusGraphMultiVertexQuery
*/
JanusGraphMultiVertexQuery addAllVertices(Collection<? extends Vertex> vertices);


@Override
Q adjacent(Vertex vertex);

Expand Down Expand Up @@ -156,7 +155,8 @@ public interface JanusGraphMultiVertexQuery<Q extends JanusGraphMultiVertexQuery
/**
* Drops all vertices that match this query
*
* @return Count of dropped vertices
* @return Map of vertices and their relations which were dropped
*/
Integer drop();
Map<JanusGraphVertex, Iterable<JanusGraphRelation>> drop();

}
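
Because `drop()` now returns a map instead of a count, callers that need the number of dropped vertices take the map's size, as the updated test and benchmark do with `.drop().size()`. The sketch below illustrates this with plain collections. `DropResultCounts` is a hypothetical helper, and `String` stands in for `JanusGraphVertex` and `JanusGraphRelation` so that the example runs without JanusGraph on the classpath. Whether a relation shared by two dropped vertices appears under both keys is not stated in this commit, so the relation total here is simply a sum over all entries.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a JanusGraph class) for reading a result shaped
// like Map<vertex, Iterable<relation>>.
final class DropResultCounts {

    // One map entry per dropped vertex, so the size replaces the old Integer result.
    static <V, R> int droppedVertices(Map<V, ? extends Iterable<R>> dropped) {
        return dropped.size();
    }

    // Sums the relations listed under every vertex entry.
    static <V, R> long droppedRelations(Map<V, ? extends Iterable<R>> dropped) {
        long total = 0;
        for (Iterable<R> relations : dropped.values()) {
            for (R ignored : relations) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in data: two vertices, the first with two relations.
        Map<String, List<String>> dropped = new LinkedHashMap<>();
        dropped.put("v1", List.of("name", "details"));
        dropped.put("v2", List.of());
        System.out.println(droppedVertices(dropped));
        System.out.println(droppedRelations(dropped));
    }
}
```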
Expand Up @@ -15,6 +15,7 @@
package org.janusgraph.core;

import org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryDropStepStrategyMode;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryHasStepStrategyMode;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryLabelStepStrategyMode;
import org.janusgraph.graphdb.tinkerpop.optimize.strategy.MultiQueryPropertiesStrategyMode;
Expand Down Expand Up @@ -186,6 +187,15 @@ public interface TransactionBuilder {
*/
TransactionBuilder setLabelsStepStrategyMode(MultiQueryLabelStepStrategyMode labelStepStrategyMode);

/**
* Sets `drop` step strategy mode.
* <p>
* Doesn't have any effect if multi-query was disabled via config `query.batch.enabled = false`.
*
* @return Object with the set drop strategy mode settings
*/
TransactionBuilder setDropStepStrategyMode(MultiQueryDropStepStrategyMode dropStepStrategyMode);

/**
* Sets the group name for this transaction which provides a way for gathering
* reporting on multiple transactions into one group.