The Tao of Antlr Parsing Errors

How We Handle Parsing Errors

Currently, we have a very simple way of handling Antlr parsing errors. We just print the line and the character position at which the parser failed, and we spit out the "Anything could be wrong here. You figure it out." message generously provided by the Antlr parser.

We could do orders of magnitude better, as Antlr provides an entire toolbox for handling errors, from being able to inspect the parsing rule chain (i.e. the stack trace) to error recovery. Before we list the capabilities, let's first focus on our parsing workflow.

We feed Antlr the .g4 files (CypherLexer.g4, Cypher.g4, MemgraphCypherLexer.g4, MemgraphCypher.g4, and UnicodeCategories.g4) that define our custom Cypher grammar. Antlr then generates a host of .cpp/.hpp files that implement the Cypher language parser, along with extra functionality, such as error observers, that can be plugged into the parser to modify its logic. All the generated files can be found in the src/query/frontend/opencypher/generated directory.

Now, the MemgraphCypher parser is actually implemented in the src/query/frontend/opencypher/parser.hpp file. It's quite simple - it has a constructor that takes the raw query string and tries to parse it into an antlr4::tree::ParseTree object. In addition, the constructor sets up our custom parsing error handler, which is an object deriving from antlr4::BaseErrorListener. Take a look at the constructor definition:

Parser(const std::string query) : query_(std::move(query)) {
  parser_.removeErrorListeners();
  parser_.addErrorListener(&error_listener_);
  tree_ = parser_.cypher();
  if (parser_.getNumberOfSyntaxErrors()) {
    throw query::SyntaxException(error_listener_.error_);
  }
}

You might be puzzled by the first statement (parser_.removeErrorListeners();). Here's how it works - the internal parser_ object (of type antlropencypher::MemgraphCypher), when constructed, registers a default error listener (a.k.a. observer) that prints out the parser-generated error message to STDERR. If we don't remove that default listener, we'll get extra printouts that we don't want. So, we remove it.
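To make the mechanism concrete, here is a minimal, self-contained sketch of the listener (observer) pattern described above. It does not use the Antlr runtime: ErrorListener, ConsoleListener, FirstMessageListener, MiniRecognizer and notifyErrorListeners are hypothetical stand-ins with a simplified signature, and only the names removeErrorListeners and addErrorListener are taken from the constructor shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for antlr4::BaseErrorListener (simplified signature).
struct ErrorListener {
  virtual ~ErrorListener() = default;
  virtual void syntaxError(std::size_t line, std::size_t position, const std::string &message) = 0;
};

// Stand-in for the default listener: prints every message to STDERR.
struct ConsoleListener : ErrorListener {
  void syntaxError(std::size_t line, std::size_t position, const std::string &message) override {
    std::cerr << "line " << line << ":" << position << " " << message << "\n";
  }
};

// Stand-in for our custom listener: remembers only the first message.
struct FirstMessageListener : ErrorListener {
  std::string error_;

  void syntaxError(std::size_t line, std::size_t position, const std::string &message) override {
    if (error_.empty()) {
      // position is zero-indexed, so report position + 1 to the user.
      error_ = "line " + std::to_string(line) + ":" + std::to_string(position + 1) + " " + message;
    }
  }
};

// Hypothetical stand-in for the generated parser.
class MiniRecognizer {
 public:
  // Assumption mirrored from the text: a console listener is registered by default.
  MiniRecognizer() { listeners_.push_back(&console_); }

  // The listener list points at a member, so copying is disabled.
  MiniRecognizer(const MiniRecognizer &) = delete;
  MiniRecognizer &operator=(const MiniRecognizer &) = delete;

  void removeErrorListeners() { listeners_.clear(); }

  void addErrorListener(ErrorListener *listener) {
    assert(listener != nullptr);
    listeners_.push_back(listener);
  }

  // Called on a syntax error: every registered listener is notified in turn.
  void notifyErrorListeners(std::size_t line, std::size_t position, const std::string &message) {
    for (ErrorListener *listener : listeners_) {
      listener->syntaxError(line, position, message);
    }
  }

  std::size_t listenerCount() const { return listeners_.size(); }

 private:
  ConsoleListener console_;
  std::vector<ErrorListener *> listeners_;
};
```

In this sketch, skipping the removeErrorListeners() call would leave two listeners registered, so each error would both be printed to STDERR and be stored in error_. That is the duplicate output the constructor avoids.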

Then we add our custom error listener, defined as follows:

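class FirstMessageErrorListener : public antlr4::BaseErrorListener {
  void syntaxError(antlr4::IRecognizer *, antlr4::Token *, size_t line, size_t position, const std::string &message,
                   std::exception_ptr) override {
    // Keep only the first reported error.
    if (error_.empty()) {
      // position is zero-indexed, so add 1 before showing it to the user.
      error_ = "line " + std::to_string(line) + ":" + std::to_string(position + 1) + " " + message;
    }
  }

 public:
  // Read by the Parser constructor when it throws query::SyntaxException.
  std::string error_;
};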

The syntaxError method, declared in the antlr4::BaseErrorListener class, is overridden. Note its signature - it takes the following:

- a pointer to an antlr4::IRecognizer object. This object is, in fact, a wrapper for the parser that provides an extra set of functionality on top of the parser itself, like debugging, printing, etc. It's quite powerful, as it gives you fine-grained control over the parser's execution. The private parser_ member that we mentioned actually derives from it. The default antlr4::IRecognizer type actually used in our code is antlr4::Recognizer, declared in libs/antlr4/runtime/Cpp/include/antlr4-runtime/Recognizer.h.
  As this object's API allows us to ask for various pieces of the parser's state, it could be immensely useful in enhancing our error messages.
- a pointer to an antlr4::Token object, representing the so-called offending symbol - the symbol that caused the parser to fail. If you want to print it out, just call its getText() method.
- a size_t line and a size_t position, which are just the line and the character position at which the parsing failed. Note that position is zero-indexed, so when we print it out, we add 1 to it so as not to confuse our users.
- a std::exception_ptr, which is just an opaque pointer to a general exception caught and handled by Antlr. In most cases, this will actually be a pointer to an antlr4::RecognitionException (declared in libs/antlr4/runtime/Cpp/include/antlr4-runtime/RecognitionException.h). It can be used to rethrow or handle the exception.
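Since a std::exception_ptr is opaque, the only way to look inside it is to rethrow it and catch the type you expect. The sketch below shows that pattern using the standard library only. FakeRecognitionException is a hypothetical stand-in for antlr4::RecognitionException, and describeException is a helper name made up for this illustration. The null check is a defensive assumption, in case a listener is called without an exception attached.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for antlr4::RecognitionException.
struct FakeRecognitionException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical helper: describes whatever the exception_ptr holds.
std::string describeException(std::exception_ptr e) {
  // An empty exception_ptr holds nothing; rethrowing it would be undefined behavior.
  if (!e) {
    return "no exception";
  }
  try {
    // Rethrowing is the only way to inspect the stored exception.
    std::rethrow_exception(e);
  } catch (const FakeRecognitionException &ex) {
    // The most specific type is listed first so it wins over std::exception.
    return std::string("recognition error: ") + ex.what();
  } catch (const std::exception &ex) {
    return std::string("other error: ") + ex.what();
  } catch (...) {
    return "unknown error";
  }
}
```

Inside a real listener, the same try/catch would go in the body of syntaxError, with the stand-in replaced by the actual exception type, and the catch block could pull extra context out of the exception to build a richer message.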

One important thing to note is that the syntaxError method's signature has changed in newer versions of Antlr (we checked v4.8 and up): its first argument is now a pointer to antlr4::Recognizer rather than antlr4::IRecognizer.