How We Handle Parsing Errors

Currently, we have a very simple way of handling Antlr parsing errors. We just print the line and the character position at which the parser failed, and we spit out the "Anything could be wrong here. You figure it out." message generously provided by the Antlr parser.

We could do orders of magnitude better, as Antlr provides an entire toolbox for handling errors, from being able to inspect the parsing rule chain (i.e. the stack trace) to error recovery. Before we list the capabilities, let's first focus on our parsing workflow.

We feed Antlr the .g4 files (CypherLexer.g4, Cypher.g4, MemgraphCypherLexer.g4, MemgraphCypher.g4, and UnicodeCategories.g4) that define our custom Cypher grammar. Antlr then generates a host of .cpp/.hpp files that implement the Cypher language parser, along with extras such as error observers that can be plugged into the parser to modify its behavior. All the generated files can be found in the src/query/frontend/opencypher/generated directory.

Now, the MemgraphCypher parser is actually implemented in the src/query/frontend/opencypher/parser.hpp file. It's quite simple - it has a constructor that takes the raw query string and tries to parse it into an antlr4::tree::ParseTree object. In addition, the constructor sets up our custom parsing error handler, which is an object deriving from antlr4::BaseErrorListener. Take a look at the constructor definition:

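Parser(const std::string query) : query_(std::move(query)) {
  // Drop the default listener that prints to STDERR.
  parser_.removeErrorListeners();
  // Register our own listener, which records the first error message.
  parser_.addErrorListener(&error_listener_);
  // Parse starting from the top-level cypher rule.
  tree_ = parser_.cypher();
  if (parser_.getNumberOfSyntaxErrors()) {
    throw query::SyntaxException(error_listener_.error_);
  }
}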

You might be puzzled by the first statement (parser_.removeErrorListeners();). Here's how it works - the internal parser_ object (of type antlropencypher::MemgraphCypher), when constructed, registers a default error listener (a.k.a. observer) that prints the parser-generated error message to STDERR. If we don't remove that default listener, we'll get extra printouts that we don't want. So, we remove it.
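
To make the listener mechanics concrete without pulling in the Antlr runtime, here is a minimal sketch built on stand-in types. Everything in it (ToyListener, StderrListener, RecordingListener, ToyParser, ParseQuietly, and the rule that every '?' is a syntax error) is hypothetical and only mimics the behavior described above; it is not the Antlr API.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for an error listener interface (not the Antlr class).
struct ToyListener {
  virtual ~ToyListener() = default;
  virtual void OnError(const std::string &message) = 0;
};

// Mimics the default listener: every error goes straight to STDERR.
struct StderrListener : ToyListener {
  void OnError(const std::string &message) override { std::cerr << message << '\n'; }
};

// A custom listener that keeps the messages instead of printing them.
struct RecordingListener : ToyListener {
  std::vector<std::string> messages;
  void OnError(const std::string &message) override { messages.push_back(message); }
};

// Toy parser: like the real one, it starts with a default listener registered.
class ToyParser {
 public:
  ToyParser() { listeners_.push_back(&default_listener_); }

  void RemoveErrorListeners() { listeners_.clear(); }
  void AddErrorListener(ToyListener *listener) { listeners_.push_back(listener); }
  std::size_t ListenerCount() const { return listeners_.size(); }

  // Assumption for illustration only: each '?' counts as a syntax error.
  // Every error is delivered to every registered listener.
  void Parse(const std::string &query) {
    for (std::size_t i = 0; i < query.size(); ++i) {
      if (query[i] != '?') continue;
      const std::string message = "unexpected '?' at position " + std::to_string(i);
      for (ToyListener *listener : listeners_) listener->OnError(message);
    }
  }

 private:
  StderrListener default_listener_;
  std::vector<ToyListener *> listeners_;
};

// Same order of operations as the constructor above: remove the default
// listener, add our own, then parse. Nothing is written to STDERR.
std::vector<std::string> ParseQuietly(const std::string &query) {
  ToyParser parser;
  RecordingListener listener;
  parser.RemoveErrorListeners();
  parser.AddErrorListener(&listener);
  parser.Parse(query);
  return listener.messages;
}
```

In this sketch, skipping the RemoveErrorListeners() call would leave two listeners registered, so every error would be both recorded and printed to STDERR, which is the duplicate output the real constructor avoids.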

Then we add our custom error listener, defined as follows:

class FirstMessageErrorListener : public antlr4::BaseErrorListener {
  void syntaxError(antlr4::IRecognizer *, antlr4::Token *, size_t line, size_t position, const std::string &message,
                   std::exception_ptr) override {
    // Keep only the first reported error; later ones are ignored.
    if (error_.empty()) {
      error_ = "line " + std::to_string(line) + ":" + std::to_string(position + 1) + " " + message;
    }
  }

 public:
  // Read by the Parser constructor when it throws the SyntaxException.
  std::string error_;
};

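The syntaxError method, declared in the antlr4::BaseErrorListener class, is overridden. Note its signature - it takes, in order: the recognizer that reported the error, the offending token, the line number, the character position within that line, the error message produced by the parser, and an exception pointer. Our listener uses only the line, the position, and the message. The line number is 1-based while the character position is 0-based, which is why the listener prints position + 1.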

One important thing to note is that the syntaxError method's signature has changed in newer versions of Antlr (we checked for v4.8 and up): its first argument is now antlr4::Recognizer * instead of antlr4::IRecognizer *.
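
As a rough illustration of what the listener looks like against the newer signature, the sketch below re-creates just enough of the interface with stand-in types. The toy namespace and its Recognizer, Token, and BaseErrorListener types are hypothetical placeholders, not the Antlr runtime classes; only the parameter order follows the signature discussed above. Because the method is marked override, a mismatch in the first argument type becomes a compile error rather than a listener that silently never gets called.

```cpp
#include <cstddef>
#include <exception>
#include <string>

namespace toy {
// Hypothetical placeholders for the runtime types.
struct Recognizer {};
struct Token {};

// Stand-in base class using the newer first-argument type.
class BaseErrorListener {
 public:
  virtual ~BaseErrorListener() = default;
  virtual void syntaxError(Recognizer *, Token *, std::size_t /*line*/, std::size_t /*position*/,
                           const std::string & /*message*/, std::exception_ptr) {}
};
}  // namespace toy

// Same first-message policy as our listener, written against the
// stand-in base class above.
class FirstMessageErrorListener : public toy::BaseErrorListener {
 public:
  void syntaxError(toy::Recognizer *, toy::Token *, std::size_t line, std::size_t position,
                   const std::string &message, std::exception_ptr) override {
    if (error_.empty()) {
      // position is 0-based, so add 1 to report a 1-based column.
      error_ = "line " + std::to_string(line) + ":" + std::to_string(position + 1) + " " + message;
    }
  }

  std::string error_;
};
```

Calling syntaxError twice on the same listener leaves error_ holding the first message only, which matches the behavior of the listener shown earlier.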