In this phase, various semantic and variable type checks are performed. Additionally, we generate symbols which map AST nodes to stored values computed from evaluated expressions.
The implementation can be found in query/frontend/semantic/symbol_generator.cpp.
Symbols are generated for each AST node that represents data that needs to have storage. Currently, these are:
NamedExpression
CypherUnion
Identifier
Aggregation
You may notice that the above AST nodes may not correspond to something named by a user. For example, Aggregation can be part of a larger expression and thus remain unnamed. We still generate symbols in such cases so that query execution behaves uniformly and so that the results of expression evaluation can be cached.
AST nodes do not actually store a Symbol instance. Instead, they hold an int32_t index identifying the symbol in the SymbolTable class. This minimizes the size of AST types and makes it easier to share the same symbol among multiple AST node instances.
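The index-based design can be illustrated with a minimal sketch. The Symbol, SymbolTable and Identifier definitions below are simplified, hypothetical stand-ins written for this illustration (including the CreateSymbol and at member names); they are not the actual class definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the real Symbol class.
struct Symbol {
  std::string name;
  int32_t position;  // Index of this symbol inside the table.
};

// Hypothetical, simplified stand-in for the real SymbolTable class.
class SymbolTable {
 public:
  // Creates a new symbol and returns the index that AST nodes keep.
  int32_t CreateSymbol(std::string name) {
    const auto index = static_cast<int32_t>(symbols_.size());
    symbols_.push_back(Symbol{std::move(name), index});
    return index;
  }

  // Resolves an index stored in an AST node back to its symbol.
  const Symbol &at(int32_t index) const {
    assert(index >= 0 && static_cast<std::size_t>(index) < symbols_.size());
    return symbols_[static_cast<std::size_t>(index)];
  }

  std::size_t size() const { return symbols_.size(); }

 private:
  std::vector<Symbol> symbols_;
};

// The AST node carries only a small index, not a Symbol instance.
struct Identifier {
  std::string name;
  int32_t symbol_pos;  // -1 is used here to mean "no symbol assigned yet".
};
```

In this sketch, two Identifier nodes that refer to the same variable simply hold the same index, so sharing a symbol requires no copying.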
The storage for evaluated data is represented by the Frame class. Each symbol determines a unique position in the frame. During interpretation, evaluating an expression that has a symbol will either read a value from the frame or store one in it. For example, an instance of Identifier will use the symbol to find and read the value from the Frame. On the other hand, NamedExpression will take the result of evaluating its own expression and store it in the Frame.
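The read and store behaviour described above can be sketched as follows. This is an illustration under simplifying assumptions: Value is a plain integer instead of a typed query value, a NamedExpression wraps only an Identifier, and the Evaluate functions are hypothetical helpers rather than the actual interpreter interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: a plain integer stands in for the real typed value.
using Value = std::int64_t;

// Hypothetical, simplified Frame: one slot per symbol position.
class Frame {
 public:
  explicit Frame(std::size_t size) : values_(size) {}

  Value &operator[](int32_t position) {
    assert(position >= 0 && static_cast<std::size_t>(position) < values_.size());
    return values_[static_cast<std::size_t>(position)];
  }

 private:
  std::vector<Value> values_;
};

struct Identifier {
  int32_t symbol_pos;
};

// Assumption: the named expression is just an identifier, e.g. `n AS m`.
struct NamedExpression {
  int32_t symbol_pos;
  Identifier expression;
};

// An Identifier uses its symbol position to read from the frame.
Value Evaluate(const Identifier &identifier, Frame &frame) {
  return frame[identifier.symbol_pos];
}

// A NamedExpression evaluates its inner expression and stores the result
// in its own slot of the frame.
Value Evaluate(const NamedExpression &named, Frame &frame) {
  const Value result = Evaluate(named.expression, frame);
  frame[named.symbol_pos] = result;
  return result;
}
```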
When a symbol is created, the context of creation is used to assign a type to that symbol. This type is used for simple type checking operations. For example, MATCH (n) will create a symbol for variable n. Since MATCH (n) represents finding a vertex in the graph, we can set Symbol::Type::Vertex for that symbol. If we later see, for example, MATCH ()-[n]-(), the variable n is used as an edge. Since we already have a symbol for that variable, we detect this type mismatch and raise a SemanticException.
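A minimal sketch of this type check is shown below. Only Symbol::Type::Vertex and SemanticException are names taken from the text; the Edge enumerator, the TypeChecker class, its Bind method and the error message are assumptions made for this illustration.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

struct Symbol {
  // Assumption: only the two types needed for this example.
  enum class Type { Vertex, Edge };
  std::string name;
  Type type;
};

// Simplified stand-in for the real exception class.
struct SemanticException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical helper that records the type a variable was first used with.
class TypeChecker {
 public:
  // Creates the symbol on first use; on later uses, verifies that the
  // variable is used with the same type as before.
  const Symbol &Bind(const std::string &name, Symbol::Type type) {
    auto found = symbols_.find(name);
    if (found == symbols_.end()) {
      return symbols_.emplace(name, Symbol{name, type}).first->second;
    }
    if (found->second.type != type) {
      throw SemanticException("Type mismatch for variable: " + name);
    }
    return found->second;
  }

 private:
  std::unordered_map<std::string, Symbol> symbols_;
};
```

Binding n as a vertex, as in MATCH (n), and then as an edge, as in MATCH ()-[n]-(), makes the second Bind call throw.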
The basic rule of symbol generation is that variables inside MATCH, CREATE, MERGE, WITH ... AS and RETURN ... AS clauses establish new symbols.
Inside MATCH, symbols are created only if they didn’t exist before. For example, the patterns in MATCH (n {a: 5})--(m {b: 5}) RETURN n, m will create 2 symbols: one for n and one for m. The RETURN clause will, in turn, reference those symbols. Symbols established in one part of a pattern are immediately bound and visible in later parts. For example, MATCH (n)--(n) will create a symbol for variable n at the first (n). That symbol is referenced in the second (n). Note that the symbol is not bound inside the first (n) itself. This means that, for example, MATCH (n {a: n.b}) should raise an error, because n is not yet bound when encountering n.b. On the other hand, MATCH (n)--(n {a: n.b}) is fine.
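The binding order for MATCH can be sketched as below. This is a simplified model, not the actual visitor: NodeAtom, VisitMatchPattern and the error message are hypothetical, edges are omitted, and a property map is reduced to the list of variables it references.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

struct SemanticException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical model of a node pattern such as (n {a: m.b}).
struct NodeAtom {
  std::string name;                        // Variable of the node itself.
  std::vector<std::string> property_refs;  // Variables used in its property map.
};

// Visits the node patterns of one MATCH from left to right and returns the
// number of newly created symbols. `bound` holds the variables that already
// have a symbol.
std::size_t VisitMatchPattern(const std::vector<NodeAtom> &pattern,
                              std::unordered_set<std::string> &bound) {
  std::size_t created = 0;
  for (const auto &atom : pattern) {
    // Property expressions may only reference variables bound earlier, so
    // they are checked before the node's own variable is bound.
    for (const auto &ref : atom.property_refs) {
      if (bound.count(ref) == 0) {
        throw SemanticException("Unbound variable: " + ref);
      }
    }
    // A symbol is created only if the variable did not have one before.
    if (bound.insert(atom.name).second) {
      ++created;
    }
  }
  return created;
}
```

With this model, MATCH (n)--(n {a: n.b}) creates a single symbol, while MATCH (n {a: n.b}) throws because n is checked before it is bound.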
CREATE is similar to MATCH, but it always establishes symbols for variables which create graph elements. This means that, for example, MATCH (n) CREATE (n) is not allowed: CREATE wants to create a new node, for which we already have a symbol. In such a case, we need to throw an error that the variable n is being redeclared. On the other hand, MATCH (n) CREATE (n)-[r :r]->(n) is fine, because CREATE will only create the edge r, connecting the already existing node n. The remaining behaviour is the same as in MATCH. This means that we can simplify CREATE to be like MATCH with 2 special cases.
Are we creating a node, i.e. CREATE (n)? If yes, then the symbol for n must not have been created before. Otherwise, we reference the existing symbol.
Is there a variable for an edge inside CREATE? If yes, then that variable must not reference a symbol.
The MERGE clause is treated the same as CREATE with regard to symbol generation. The only difference is that we allow bidirectional edges in the pattern. When creating such a pattern, the direction of the created edge is arbitrarily determined.
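The two special cases for CREATE can be sketched as follows. This is a rough model under stated assumptions: Pattern, VisitCreatePattern and the error messages are hypothetical, a pattern is reduced to its node and edge variable names, and "creating a node" is interpreted as a pattern consisting of a single node with no edges. MERGE is not modelled separately, since the text treats it the same way for symbol generation.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

struct SemanticException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical model of one CREATE pattern, e.g. (n)-[r]->(n) has
// nodes {"n", "n"} and edges {"r"}.
struct Pattern {
  std::vector<std::string> nodes;
  std::vector<std::string> edges;
};

// `bound` holds the variables that already have a symbol.
void VisitCreatePattern(const Pattern &pattern,
                        std::unordered_set<std::string> &bound) {
  // Special case 1: a lone node such as CREATE (n) creates that node, so
  // its variable must not have a symbol yet.
  const bool lone_node = pattern.nodes.size() == 1 && pattern.edges.empty();
  for (const auto &node : pattern.nodes) {
    if (lone_node && bound.count(node) != 0) {
      throw SemanticException("Redeclaring variable: " + node);
    }
    // Otherwise behave like MATCH: reference the symbol or create it.
    bound.insert(node);
  }
  // Special case 2: an edge variable inside CREATE is always new.
  for (const auto &edge : pattern.edges) {
    if (!bound.insert(edge).second) {
      throw SemanticException("Redeclaring variable: " + edge);
    }
  }
}
```

Starting from a scope where n is already bound, as after MATCH (n), the pattern (n)-[r]->(n) is accepted and binds r, while the lone pattern (n) throws.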