From d25ba6da42c70b8bedc5c53c17f846f14cf5d177 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jens=20Pryce-=C3=85klundh?= <112686610+JPryce-Aklundh@users.noreply.github.com> Date: Wed, 10 Jan 2024 12:06:33 +0100 Subject: [PATCH] post-review update --- modules/ROOT/pages/indexes/index.adoc | 17 +++------ .../index-hints.adoc | 2 +- .../managing-indexes.adoc | 26 +++----------- .../search-performance-indexes/overview.adoc | 14 ++++---- .../using-indexes.adoc | 36 +++++++++---------- 5 files changed, 34 insertions(+), 61 deletions(-) diff --git a/modules/ROOT/pages/indexes/index.adoc b/modules/ROOT/pages/indexes/index.adoc index b48dcc93d..36264dc26 100644 --- a/modules/ROOT/pages/indexes/index.adoc +++ b/modules/ROOT/pages/indexes/index.adoc @@ -7,16 +7,9 @@ In short, much like indexes in a book, their function in a Neo4j graph database Once an index has been created, it will be automatically populated and updated by the DBMS. -Neo4j supports two categories of indexes: xref:indexes/search-performance-indexes/overview.adoc[search-performance indexes] (including range, text, point, and token lookup indexes) and xref:indexes/semantic-indexes/overview.adoc[semantic indexes] (including full-text and vector indexes). +Neo4j supports two categories of indexes: -[[naming-rules-and-recommendations]] -== Naming rules and best practices - -The following is true for naming indexes: - -* Best practice is to give the index a name when it is created. -If the index is not explicitly named, it gets an auto-generated name. -* The index name must be unique among both indexes and xref:constraints/index.adoc[constraints]. -* Index creation is by default not idempotent, and an error will be thrown if you attempt to create the same index twice. -Using the keyword `IF NOT EXISTS` makes the command idempotent, and no error will be thrown if you attempt to create the same index twice. -* As of Neo4j 5.16, indexes can be named using parameters. 
\ No newline at end of file +- xref:indexes/search-performance-indexes/overview.adoc[Search-performance indexes], for speeding up data retrieval based on _exact_ matches. +This category includes range, text, point, and token lookup indexes. +- xref:indexes/semantic-indexes/overview.adoc[Semantic indexes], for _approximate_ matches and to compute similarity scores between a query string and the matching data. +This category includes full-text and vector indexes. diff --git a/modules/ROOT/pages/indexes/search-performance-indexes/index-hints.adoc b/modules/ROOT/pages/indexes/search-performance-indexes/index-hints.adoc index 8b0986370..4944a7d77 100644 --- a/modules/ROOT/pages/indexes/search-performance-indexes/index-hints.adoc +++ b/modules/ROOT/pages/indexes/search-performance-indexes/index-hints.adoc @@ -1,7 +1,7 @@ :description: A planner hint is used to influence the decisions of the planner when building an execution plan for a query. [[query-using]] -= Index hints += Index hints for the Cypher planner A planner hint is used to influence the decisions of the planner when building an execution plan for a query. Planner hints are specified in a query with the `USING` keyword. diff --git a/modules/ROOT/pages/indexes/search-performance-indexes/managing-indexes.adoc b/modules/ROOT/pages/indexes/search-performance-indexes/managing-indexes.adoc index 6d0461024..acb4fcb33 100644 --- a/modules/ROOT/pages/indexes/search-performance-indexes/managing-indexes.adoc +++ b/modules/ROOT/pages/indexes/search-performance-indexes/managing-indexes.adoc @@ -1,5 +1,5 @@ :description: This page explains how to manage indexes used for search performance. -= Managing search-performance indexes += Create, show, and delete indexes This page describes how to create, list, and delete search-performance indexes. The following index types are included in this category: @@ -207,15 +207,11 @@ Note that the index name must be unique. 
Text indexes have no supported index configuration and, as of Neo4j 5.1, they have two index providers available, `text-2.0` (default) and `text-1.0` (deprecated). -[TIP] -Text indexes only store `STRING` values and do not support multiple properties. - [[text-indexes-supported-predicates]] [discrete] ==== Supported predicates Text indexes only solve predicates operating on `STRING` values. -That means that text indexes are only used in Cypher queries when it is known that the predicate evaluates to `null` or `false` for all non-`STRING` values. The following predicates that only operate on `STRING` values are always solvable by a text index: @@ -274,11 +270,10 @@ CONTAINS |=== As of Neo4j 5.11, the above set of predicates can be extended with the use of type constraints. -See xref:indexes/search-performance-indexes/using-indexes.adoc#index-compatibility-type-constraints[Index compatibility and type constraints] for more information. +See the section about xref:indexes/search-performance-indexes/using-indexes.adoc#type-constraints[index compatibility and type constraints] for more information. [TIP] -Unlike text indexes, full-text indexes can search the content of `STRING` properties. -For more information, see the page about xref:indexes/semantic-indexes/full-text-indexes.adoc[full-text indexes]. +Text indexes are only used for exact query matches. To perform approximate matches (including, for example, variations and typos), and to compute a similarity score between `STRING` values, use semantic xref:indexes/semantic-indexes/full-text-indexes.adoc[full-text indexes] instead. [discrete] [[text-indexes-examples]] @@ -377,16 +372,11 @@ Note that the index name must be unique. Point indexes have supported index configuration, but only one index provider available, `point-1.0`. -[TIP] -Point indexes only store `POINT` values and do not support multiple properties. 
- [discrete] [[point-indexes-supported-predicates]] ==== Supported predicates Point indexes only solve predicates operating on `POINT` values. -Therefore, point indexes are only used when it is known that the predicate evaluates to `null` or `false` for all non-`POINT` values. - Point indexes support the following predicates: @@ -634,18 +624,10 @@ Only one relationship type lookup index can exist at a time. If it is not known whether an index exists or not, add `IF NOT EXISTS` to ensure it does. -.Parameters -[source,javascript, indent=0] ----- -{ - "name": "node_label_lookup" -} ----- - .Creating a node label lookup index with `IF NOT EXISTS` [source, cypher] ---- -CREATE LOOKUP INDEX $name IF NOT EXISTS FOR (n) ON EACH labels(n) +CREATE LOOKUP INDEX node_label_lookup IF NOT EXISTS FOR (n) ON EACH labels(n) ---- The index will not be created if there already exists an index with the same schema and type, same name or both. diff --git a/modules/ROOT/pages/indexes/search-performance-indexes/overview.adoc b/modules/ROOT/pages/indexes/search-performance-indexes/overview.adoc index dcd120591..7b8429bd4 100644 --- a/modules/ROOT/pages/indexes/search-performance-indexes/overview.adoc +++ b/modules/ROOT/pages/indexes/search-performance-indexes/overview.adoc @@ -5,20 +5,20 @@ Search-performance indexes enable quicker retrieval of exact matches between an There are four different search-performance indexes available in Neo4j: * *Range indexes*: Neo4j’s default index. -Supports most types of predicates, including node label and relationship types. +Supports most types of predicates. * *Text indexes*: solves predicates operating on `STRING` values. -Optimized for queries filtering `STRING` properties for what they `CONTAIN` or `ENDS WITH`. +Optimized for queries filtering with the `STRING` operators `CONTAINS` and `ENDS WITH`. * *Point indexes*: solves predicates on spatial `POINT` values. -Optimized for queries filtered on distance or within bounding boxes. 
+Optimized for queries filtering on distance or within bounding boxes. -* *Token lookup indexes*: only solves node label and relationship type predicates (i.e. they cannot solve any predicates filtered on properties). +* *Token lookup indexes*: only solves node label and relationship type predicates (i.e. they cannot solve any predicates filtering on properties). Two token lookup indexes (one for node labels and one for relationship types) are present when a database is created in Neo4j. -To learn more about creating, listing, and deleting these indexes, as well as more details about the predicates supported by each index type, see xref:indexes/search-performance-indexes/managing-indexes.adoc[Managing search-performance Indexes]. +To learn more about creating, listing, and deleting these indexes, as well as more details about the predicates supported by each index type, see xref:indexes/search-performance-indexes/managing-indexes.adoc[]. -For information about how Cypher uses the various types of search-performance indexes, as well as some heuristics for when to use (and not to use) a search-performance index, see xref:indexes/search-performance-indexes/using-indexes.adoc[Using search-performance indexes]. +For information about how indexes impact the performance of Cypher queries, as well as some heuristics for when to use (and not to use) a search-performance index, see xref:indexes/search-performance-indexes/using-indexes.adoc[]. Search-performance indexes are used automatically, and if several indexes are available, the xref:planning-and-tuning/execution-plans.adoc[Cypher planner] will try to use the index (or indexes) that can most efficiently solve a particular predicate. -It is, however, possible to explicitly force a query to use a particular index with the `USING` keyword. For more information, see xref:indexes/search-performance-indexes/index-hints.adoc[Index hints]. 
+It is, however, possible to explicitly force a query to use a particular index with the `USING` keyword. For more information, see xref:indexes/search-performance-indexes/index-hints.adoc[]. diff --git a/modules/ROOT/pages/indexes/search-performance-indexes/using-indexes.adoc b/modules/ROOT/pages/indexes/search-performance-indexes/using-indexes.adoc index 2888e8d03..aab706148 100644 --- a/modules/ROOT/pages/indexes/search-performance-indexes/using-indexes.adoc +++ b/modules/ROOT/pages/indexes/search-performance-indexes/using-indexes.adoc @@ -1,6 +1,6 @@ :description: Information about how to use the search-performance indexes in Neo4j. :test-skip: true -= Using search-performance indexes += The impact of indexes on query performance Search-performance indexes enable quicker and more efficient pattern matching by solving a particular combination of node label/relationship type and property predicate. They are used automatically by the Cypher planner in `MATCH` clauses, usually at the start of a query, to scan the graph for the most appropriate place to start the pattern-matching process. @@ -14,8 +14,8 @@ It will also provide some general heuristics for when to use indexes, and advice The examples on this page center around finding routes and points of interest in Central Park, New York, based on data provided by link:https://www.openstreetmap.org/[OpenStreetMap]. The data model contains two node labels: -* `OSMNode` (Open Street Map Node) - a junction node with geo-spatial properties linking together routes from specific points. -* `PointOfInterest` - a subcategory of `OSMNode`. +* `OSMNode` (Open Street Map Node) -- a junction node with geo-spatial properties linking together routes from specific points. +* `PointOfInterest` -- a subcategory of `OSMNode`. In addition to geospatial properties, these nodes also contain information about specific points of interest, such as statues, baseball courts, etc. in Central Park. 
The data model also contains one relationship type: `ROUTE`, which specifies the distance in meters between the nodes in the graph. @@ -34,7 +34,7 @@ Two token lookup indexes are present by default when creating a Neo4j database. They store copies of all node labels and relationship types in the database and only solve node label and relationship type predicates. The following query footnote:[The example queries on this page are prepended with `PROFILE`. This both runs the query and generates its execution plan. -For more information, see xref:planning-and-tuning/index.adoc#profile-and-explain[Execution plans and query tuning -> Note on PROFILE and EXPLAIN].], which counts the number of `PointOfInterest` nodes that have a `baseball` `type` value, will access the node label lookup index: +For more information, see xref:planning-and-tuning/index.adoc#profile-and-explain[Execution plans and query tuning -> Note on PROFILE and EXPLAIN].], which counts the number of `PointOfInterest` nodes that have value `baseball` for the `type` property, will access the node label lookup index: .Query [source,cypher] @@ -223,7 +223,7 @@ RETURN n.name, n.type Total database accesses: 7, total allocated memory: 312 ---- -The reason for is that range indexes store `STRING` values alphabetically. +This is because range indexes store `STRING` values alphabetically. This means that, while they are very efficient for retrieving exact matches of a `STRING`, or for prefix matching, they are less efficient for suffix and contains searches, where they have to scan all relevant properties to filter any matches. Text indexes do not store `STRING` properties alphabetically, and are instead optimized for suffix and contains searches. That said, if no range index had been present on the name property, the previous query would still have been able to utilize the text index. 
@@ -256,7 +256,7 @@ For information about calculating the size of indexes, see link:https://neo4j.co Point indexes solve predicates operating on spatial xref:values-and-types/spatial.adoc#spatial-values-point-type[`POINT`] values. Point indexes are optimized for queries filtering for the xref:functions/spatial.adoc#functions-distance[distance] between property values, or for property values within a xref:functions/spatial.adoc#functions-withinBBox[bounding box]. -The following example creates a point index which is then accessed by a query that uses the `point.distance()` function to return the `name` and `type` of all `PointOfInterest` nodes within 100 meters of the `William Shakespeare` statue: +The following example creates a point index which is then accessed through the `point.distance()` function to return the `name` and `type` of all `PointOfInterest` nodes within 100 meters of the `William Shakespeare` statue: .Create a point index [source,cypher] @@ -315,7 +315,7 @@ For more information about the predicates supported by text indexes, see xref:in [[point-index-config-settings]] === Point index configuration settings -It is possible to configure point indexes to only index properties within a specific geographical area. +It is possible to xref:indexes/search-performance-indexes/managing-indexes.adoc#create-a-point-index-specifying-the-index-configuration[configure point indexes] to only index properties within a specific geographical area. This is done by specifying either of the following settings in the `indexConfig` part of the `OPTIONS` clause when creating a point index: * `spatial.cartesian.min` and `spatial.cartesian.max`: used for xref:values-and-types/spatial.adoc#spatial-values-crs-cartesian[Cartesian 2D] coordinate systems. @@ -340,7 +340,7 @@ OPTIONS { } ---- -Restricting the geographic area of a point index can improve the performance of spatial queries by making the index more efficient at retrieving the indexed `POINT` values. 
+Restricting the geographic area of a point index improves the performance of spatial queries. This is especially beneficial when dealing with complex, large geo-spatial data, and when spatial queries are a significant part of an application’s functionality. [[composite-indexes]] @@ -522,9 +522,6 @@ This is because, when using composite indexes, any predicate after a prefix sear [[composite-index-rules]] === Composite index rules -As indicated in the previous section, composite indexes follow specific rules that are useful to know before using them. -All rules concerning how composite indexes solve property predicates are listed below: - * If a query contains an equality check or a list membership check predicates, they need to be for the first properties defined when creating the composite index. * Queries utilizing a composite index can contain up to one range search or prefix search predicate. @@ -743,7 +740,7 @@ RETURN count(n) AS nodes 1+d|Rows:1 |=== -The query result now includes both the three nodes with an unset `name` value found in the previous query and the two nodes with a `name` value containing the `STRING` `'William'` (`William Shakespeare` and `William Tecumseh Sherman`). +The query result now includes both the three nodes with an unset `name` value found in the previous query and the two nodes with a `name` value containing `William` (`William Shakespeare` and `William Tecumseh Sherman`). .Execution plan ---- @@ -820,7 +817,7 @@ This plan shows that the previously created range index on the `name` property i Text indexes require that predicates only include `STRING` properties. -To use text indexes in situations where any of the queried properties may be either of an incompatible type or null rather than a STRING value, add the type predicate expression `IS {two-colons} STRING NOT NULL` (or its alias, introduced in Neo4j 5.14, `IS {two-colons} STRING!`) to the query. 
+To use text indexes in situations where any of the queried properties may be either of an incompatible type or `null` rather than a `STRING` value, add the type predicate expression `IS {two-colons} STRING NOT NULL` (or its alias, introduced in Neo4j 5.14, `IS {two-colons} STRING!`) to the query. This will enforce both the existence of a property and its `STRING` type, discarding any rows where the property is missing or not of type `STRING`, and thereby enable the use of text indexes. For example, if the `WHERE` predicate in the previous query is altered to instead append `IS {two-colons} STRING NOT NULL`, then the text index rather than the range index is used (range indexes do not support type predicate expressions): @@ -868,7 +865,7 @@ Since xref:constraints/examples.adoc#constraints-examples-node-property-type[typ To show this, the following example will first drop the existing range index on the `name` property (this is necessary because type constraints only extend the compatibility of type-specific indexes - range indexes are not limited by a value type). It will then run the same query with a `WHERE` predicate on the `name` property (for which there exists a previously created text index) before and after creating a type constraint, and compare the resulting execution plans. -.Query range index +.Drop range index [source,cypher] ---- DROP INDEX range_index_name @@ -947,7 +944,7 @@ Note that xref:constraints/examples.adoc#constraints-examples-node-property-exis While it is impossible to give exact directions on when a search-performance index might be beneficial for a particular use-case, the following points provide some useful heuristics for when creating an index might improve query performance: -* *Frequent property-based queries*: if particular node label/relationship type properties are used frequently for filtering or matching, consider creating an index on those properties. 
+* *Frequent property-based queries*: if some properties are used frequently for filtering or matching, consider creating an index on them. * *Performance optimization*: If certain queries are too slow, re-examine the properties that are filtered on, and consider creating indexes for those properties that may cause bottlenecking. * *High cardinality properties*: high cardinality properties have many distinct values (e.g., unique identifiers, timestamps, or user names). Queries that seek to retrieve such properties will likely benefit from indexing. * *Complex queries*: if queries traverse complex paths in a graph (for example, by involving multiple hops and several layers of filtering), adding indexes to the properties used in those queries can improve query performance. @@ -962,7 +959,7 @@ They should, however, be used judiciously for the following reasons: * *Storage space*: because each index is a secondary copy of the data in the primary database, each index essentially doubles the amount of storage space occupied by the indexed data. * *Slower write queries*: adding indexes impacts the performance of write queries. This is because indexes are updated with each write query. If a system needs to perform a lot of writes quickly, it may be counterproductive to have an index on the affected data entities. -In other words, if write performance is crucial for a particular use case, it may be beneficial to only add indexes where they are necessary for read-query purposes. +In other words, if write performance is crucial for a particular use case, it may be beneficial to only add indexes where they are necessary for read purposes. As a result of these two points, deciding what to index (and what not to index) is an important and non-trivial task. @@ -997,8 +994,9 @@ If any unused indexes are identified, it may be beneficial to delete them using * Point indexes are used when queries filter on distances and bounding boxes. 
-* Token lookup indexes are not defined in this order since they never solve the same predicates as other indexes. -Deleting them will negatively impact query performance. +* Token lookup indexes only solve node label and relationship type predicates. +They do not solve any property predicates. +Deleting token lookup indexes will negatively impact query performance. * Composite indexes are only used if the query filters on all properties indexed by the composite index. The order in which the properties are defined when creating a composite index impacts how the planner solves query predicates. @@ -1007,7 +1005,7 @@ The order in which the properties are defined when creating a composite index im * A Cypher query can use several indexes if the planner deems it beneficial to the performance of a query. -* Neo4j indexes do not store `null` values, and the planner must be able to rule out any unset properties in order to use an index. +* Neo4j indexes do not store `null` values, and the planner must be able to rule out any entities with properties containing `null` values in order to use an index. There are several strategies to ensure the use of indexes. * The columns `lastRead`, `readCount`, and `trackedSince` returned by the `SHOW INDEX` command can be used to identify redundant indexes that take up unnecessary space.