Logical Planning

After semantic analysis and symbol generation, the AST is converted to a tree of logical operators. This conversion is called planning, and the resulting tree of logical operators is called a plan. The planning process consists of the following steps.

  1. AST Preprocessing

    The first step is to preprocess the AST by collecting information on filters, dividing the query into parts, normalizing patterns in MATCH clauses, etc.

  2. Logical Operator Planning

    After the preprocessing step, planning is done with one of two planners: VariableStartPlanner and RuleBasedPlanner. The first generates multiple plans, where each plan starts matching the patterns in MATCH clauses from a different point. The second produces a single plan by mapping the query parts to logical operators as they are.

  3. Logical Plan Postprocessing

    In this stage, we perform various transformations on the generated logical plan, optimizing the operations to improve performance during execution. Naturally, these transformations need to preserve the semantic behaviour of the original plan.

  4. Cost Estimation

    After generation, the execution cost of each plan is estimated. The estimate is used to select the best plan, which is then executed.

The implementation can be found in the query/plan directory, with the public entry point being query/plan/planner.hpp.
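The four steps above can be sketched as a small pipeline. All names below are illustrative stand-ins, not the actual query/plan API, and the cost model is a deliberately silly toy:

```python
# Sketch of the planning pipeline; every name here is hypothetical.

def preprocess(ast):
    # Step 1: collect filters, divide the query into parts,
    # normalize patterns (all elided in this toy).
    return {"parts": ast["clauses"]}

def generate_plans(query_parts):
    # Step 2: a VariableStartPlanner-like planner yields one candidate
    # plan per starting point of the pattern search.
    return [{"ops": query_parts["parts"], "start": s} for s in (0, 1)]

def postprocess(plan):
    # Step 3: semantics-preserving rewrites; identity in this sketch.
    return plan

def estimate_cost(plan):
    # Step 4: a toy cost model standing in for real cost estimation.
    return len(plan["ops"]) + plan["start"]

def plan_query(ast):
    candidates = [postprocess(p) for p in generate_plans(preprocess(ast))]
    # The plan with the lowest estimated cost is the one executed.
    return min(candidates, key=estimate_cost)
```

The key point of the structure is that steps 2 and 3 operate on whole candidate plans, while step 4 only ranks them.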

AST Preprocessing

Each openCypher query consists of at least one single query. Multiple single queries are chained together using a query combinator. Currently, there is only one combinator: UNION. The preprocessing step starts in the CollectQueryParts function. This function examines each single query and divides it into parts, where each part ends with a WITH or RETURN clause. For example:

MATCH (n) CREATE (m) WITH m MATCH (l)-[]-(m) RETURN l
|                          |                        |
|------- part 1 -----------+-------- part 2 --------|
|                                                   |
|-------------------- single query -----------------|
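A toy version of this splitting, using clause names as stand-ins for AST nodes (CollectQueryParts itself works on the AST), might look like:

```python
def collect_query_parts(clauses):
    """Split a single query's clause list into parts; each WITH or
    RETURN clause ends the part it belongs to."""
    parts, current = [], []
    for clause in clauses:
        current.append(clause)
        if clause in ("WITH", "RETURN"):
            parts.append(current)
            current = []
    if current:  # trailing clauses after the last WITH/RETURN, if any
        parts.append(current)
    return parts

# The example query from above:
collect_query_parts(["MATCH", "CREATE", "WITH", "MATCH", "RETURN"])
# part 1: MATCH, CREATE, WITH;  part 2: MATCH, RETURN
```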

Each part is created by collecting all MATCH clauses and normalizing their patterns. Pattern normalization is the process of converting an arbitrarily long pattern chain of nodes and edges into a list of triplets (start node, edge, end node). The triplets should preserve the semantics of the match. For example:

MATCH (a)-[p]-(b)-[q]-(c)-[r]-(d)

is equivalent to:

MATCH (a)-[p]-(b), (b)-[q]-(c), (c)-[r]-(d)

With this representation, it becomes easier to reorder the triplets and choose different strategies for pattern matching.
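Normalization itself is mechanical: walk the chain and emit one triplet per edge. A minimal sketch, treating pattern atoms as plain strings rather than AST nodes:

```python
def normalize_pattern(pattern):
    """Convert a chain [n0, e0, n1, e1, n2, ...] of alternating node
    and edge atoms into (start node, edge, end node) triplets."""
    nodes, edges = pattern[::2], pattern[1::2]
    return [(nodes[i], edges[i], nodes[i + 1]) for i in range(len(edges))]

# MATCH (a)-[p]-(b)-[q]-(c)-[r]-(d)
triplets = normalize_pattern(["a", "p", "b", "q", "c", "r", "d"])
# [("a", "p", "b"), ("b", "q", "c"), ("c", "r", "d")]
```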

In addition to normalizing patterns, all of the filter expressions in patterns and inside the WHERE clause of the accompanying MATCH are extracted and stored separately. During the extraction, the symbols used in each filter expression are collected. This allows filters to be planned in a valid order as the triplets are matched. Another important benefit of having extra information on filters is recognizing when a database index could be used.
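The collected symbols make the "valid order" check trivial: a filter can be applied as soon as every symbol it references has been bound by the matching done so far. A sketch, with hypothetical names:

```python
def plannable_filters(filters, bound_symbols):
    """Return the filter expressions whose symbols are all bound,
    i.e. those that may be applied at this point of the matching."""
    return [expr for expr, symbols in filters if symbols <= bound_symbols]

# Each extracted filter is stored with the set of symbols it uses.
filters = [("n.age > 21", {"n"}), ("n.id = m.id", {"n", "m"})]

# After matching only (n), just the first filter can run; the second
# must wait until m is also bound.
plannable_filters(filters, {"n"})
```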

After each MATCH is processed, they are all grouped together, so that even whole MATCH clauses may be reordered. The important thing is to remember which symbols were used to name edges in each MATCH. With those symbols, we can plan for cyphermorphism, i.e. ensure that different edges in the search pattern of a single MATCH map to different edges in the graph. This preserves the semantics of the query even though the matching may have been reordered. The same steps are done for OPTIONAL MATCH.
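Conceptually, the cyphermorphism constraint is just pairwise distinctness over the edges bound to one MATCH's edge symbols. A sketch of the check (hypothetical names, not the actual operator):

```python
def satisfies_edge_uniqueness(edge_binding):
    """Check that the edge symbols of a single MATCH are bound to
    pairwise different graph edges (cyphermorphism)."""
    edges = list(edge_binding.values())
    return len(edges) == len(set(edges))

# Symbols p and q from the same MATCH must not share a graph edge:
satisfies_edge_uniqueness({"p": "edge-1", "q": "edge-2"})  # allowed
satisfies_edge_uniqueness({"p": "edge-1", "q": "edge-1"})  # rejected
```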

Another clause which needs processing is MERGE. Here we normalize its pattern, since MERGE behaves a bit like MATCH and CREATE combined.

All the other clauses are left as is.

In the end, each query part consists of: