Currently, we have a very simple way of handling Antlr parsing errors. We just print the line and the character position at which the parser failed, and we spit out the "Anything could be wrong here. You figure it out." message generously provided by the Antlr parser.
We could do orders of magnitude better, as Antlr provides an entire toolbox for handling errors, from inspecting the chain of parsing rules that led to the failure (i.e., the parser's stack trace) to error recovery. Before we list these capabilities, let's first look at our parsing workflow.
We feed Antlr the .g4 files (CypherLexer.g4, Cypher.g4, MemgraphCypherLexer.g4, MemgraphCypher.g4, and UnicodeCategories.g4) that define our custom Cypher grammar. Antlr then generates a host of .cpp/.hpp files that implement the Cypher lexer and parser, along with listener and visitor classes for walking the resulting parse tree. All the generated files can be found in the src/query/frontend/opencypher/generated dir. The error listeners that can be plugged into the parser to customize its error reporting are not generated; their base classes, such as antlr4::BaseErrorListener, come from the Antlr C++ runtime.
Our Parser class, which wraps the generated MemgraphCypher parser, is implemented in the src/query/frontend/opencypher/parser.hpp file. It's quite simple: its constructor takes the raw query string and tries to parse it into an antlr4::tree::ParseTree object. In addition, the constructor sets up our custom parsing error handler, an object deriving from antlr4::BaseErrorListener. Take a look at the constructor definition:
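```cpp
// Takes the query by value so that std::move actually moves it
// (moving from a `const std::string` silently falls back to a copy).
Parser(std::string query) : query_(std::move(query)) {
  // Drop the default listener, which prints errors to STDERR...
  parser_.removeErrorListeners();
  // ...and plug in ours, which records the first error message.
  parser_.addErrorListener(&error_listener_);
  // Parse the whole query, starting from the top-level `cypher` rule.
  tree_ = parser_.cypher();
  if (parser_.getNumberOfSyntaxErrors()) {
    throw query::SyntaxException(error_listener_.error_);
  }
}
```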
You might be puzzled by the first statement (parser_.removeErrorListeners()). Here's how it works: the internal parser_ object (of type antlropencypher::MemgraphCypher), when constructed, registers a default error listener (a.k.a. observer), antlr4::ConsoleErrorListener, that prints the parser-generated error message to STDERR. If we didn't remove that default listener, we would get extra printouts that we don't want. So, we remove it.
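The effect of removeErrorListeners() can be illustrated without the Antlr runtime. The following is a minimal sketch, assuming a simplified listener interface: ErrorListener, ConsoleListener, FirstMessageListener and MockRecognizer are hypothetical stand-ins, not Antlr classes. Like the real recognizer, the mock registers a console-style listener when constructed and forwards every reported error to all registered listeners. The console listener writes to an injected stream instead of STDERR so its output can be checked, and FirstMessageListener copies the logic of the FirstMessageErrorListener shown next.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for antlr4::BaseErrorListener, with a simplified signature.
struct ErrorListener {
  virtual ~ErrorListener() = default;
  virtual void syntaxError(std::size_t line, std::size_t position, const std::string &message) = 0;
};

// Mimics the default listener: prints every error it is notified about,
// writing to an injected stream instead of STDERR so the output can be inspected.
class ConsoleListener : public ErrorListener {
 public:
  explicit ConsoleListener(std::ostream &out) : out_(out) {}
  void syntaxError(std::size_t line, std::size_t position, const std::string &message) override {
    out_ << "line " << line << ":" << position << " " << message << "\n";
  }

 private:
  std::ostream &out_;
};

// Same logic as FirstMessageErrorListener: keep only the first error, with a 1-based column.
class FirstMessageListener : public ErrorListener {
 public:
  void syntaxError(std::size_t line, std::size_t position, const std::string &message) override {
    if (error_.empty()) {
      error_ = "line " + std::to_string(line) + ":" + std::to_string(position + 1) + " " + message;
    }
  }
  std::string error_;
};

// Hypothetical recognizer: registers a default console listener when constructed
// and forwards every reported error to all currently registered listeners.
class MockRecognizer {
 public:
  explicit MockRecognizer(std::ostream &console) : console_listener_(console) {
    listeners_.push_back(&console_listener_);
  }
  // Non-copyable: listeners_ may point at our own console_listener_ member.
  MockRecognizer(const MockRecognizer &) = delete;
  MockRecognizer &operator=(const MockRecognizer &) = delete;

  void removeErrorListeners() { listeners_.clear(); }
  void addErrorListener(ErrorListener *listener) {
    assert(listener != nullptr);
    listeners_.push_back(listener);
  }
  void reportError(std::size_t line, std::size_t position, const std::string &message) {
    ++num_errors_;
    for (ErrorListener *listener : listeners_) listener->syntaxError(line, position, message);
  }
  std::size_t getNumberOfSyntaxErrors() const { return num_errors_; }

 private:
  ConsoleListener console_listener_;
  std::vector<ErrorListener *> listeners_;
  std::size_t num_errors_ = 0;
};
```

Mirroring the Parser constructor, a caller constructs the recognizer, calls removeErrorListeners(), adds a FirstMessageListener and lets errors be reported. Nothing then reaches the console stream, getNumberOfSyntaxErrors() still counts every error, and error_ holds only the first message. Skipping the removeErrorListeners() call would leave the console listener in place, so every error would also be printed.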
Then we add our custom error listener, defined as follows:
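```cpp
class FirstMessageErrorListener : public antlr4::BaseErrorListener {
  // Called by the parser for every syntax error it reports.
  void syntaxError(antlr4::IRecognizer *, antlr4::Token *, size_t line, size_t position, const std::string &message,
                   std::exception_ptr) override {
    // Keep only the first message; the parser recovers and may report more errors.
    if (error_.empty()) {
      // `position` is 0-based, so add 1 to report a 1-based column.
      error_ = "line " + std::to_string(line) + ":" + std::to_string(position + 1) + " " + message;
    }
  }

 public:
  // Read by the Parser constructor after parsing.
  std::string error_;
};
```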
The syntaxError method, declared in the antlr4::BaseErrorListener class, is overridden here. Note its signature; it takes the following:

- a pointer to an antlr4::IRecognizer object. This is the recognizer that reported the error, which in our case is the parser itself: antlr4::Recognizer is the common base class of Antlr lexers and parsers, and the private parser_ member we mentioned derives from it (via antlr4::Parser). The concrete antlr4::IRecognizer type actually used in our code is antlr4::Recognizer, declared in libs/antlr4/runtime/Cpp/include/antlr4-runtime/Recognizer.h. It's quite powerful, as it gives fine-grained access to the parser's state and execution, along with functionality for debugging, printing, etc. Because its API lets us query various pieces of the parser's state, it could be immensely useful in enhancing our error messages.
- a pointer to an antlr4::Token object, representing the so-called offending symbol: the token at which the parser failed. To print it out, just call its getText() method.
- two size_t values, line and position, which are the line and the character position within that line at which parsing failed. Antlr counts lines from 1 but positions from 0, so when we print position we add 1 to it so as not to confuse our users.
- a std::exception_ptr, an opaque pointer to the exception that Antlr caught and handled internally. In most cases it points to an antlr4::RecognitionException (declared in libs/antlr4/runtime/Cpp/include/antlr4-runtime/RecognitionException.h), but it can also be null, e.g. when the default error strategy reports a missing or an extraneous token. When it is not null, it can be rethrown with std::rethrow_exception to inspect or handle the exception.
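Put together, these arguments already carry more than we currently print. As a sketch of what an enhanced message could look like, the hypothetical BuildErrorMessage helper below combines them using only standard C++. Its inputs are assumptions about what a listener would extract: offending_text stands for offendingSymbol->getText(), and rule_stack for a list of rule names, innermost first, such as the one we expect antlr4::Parser::getRuleInvocationStack() to return after casting the recognizer to antlr4::Parser. It also assumes the stored exception derives from std::exception, which we believe holds for antlr4::RecognitionException in the C++ runtime; anything else falls through to a generic note.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: builds a richer message from the pieces syntaxError() receives.
//   offending_text - what Token::getText() would return for the offending symbol (may be empty)
//   rule_stack     - rule names, innermost first (assumed to come from the parser's rule invocation stack)
std::string BuildErrorMessage(const std::string &offending_text, std::size_t line, std::size_t position,
                              const std::string &message, std::exception_ptr error,
                              const std::vector<std::string> &rule_stack) {
  // Lines are 1-based and positions 0-based, so report a 1-based column.
  std::string result = "line " + std::to_string(line) + ":" + std::to_string(position + 1) + " " + message;
  if (!offending_text.empty()) result += " (near '" + offending_text + "')";
  if (!rule_stack.empty()) {
    result += " while parsing ";
    for (std::size_t i = 0; i < rule_stack.size(); ++i) {
      if (i > 0) result += " <- ";
      result += rule_stack[i];
    }
  }
  // The exception pointer may be null; only inspect it when an exception is attached.
  if (error) {
    try {
      std::rethrow_exception(error);
    } catch (const std::exception &e) {
      if (e.what()[0] != '\0') result += " [" + std::string(e.what()) + "]";
    } catch (...) {
      result += " [unknown exception]";
    }
  }
  return result;
}
```

For instance, a mismatched input "RETRN" at line 1, position 6, with a hypothetical rule stack {"singlePartQuery", "cypher"} and no attached exception, would be reported as line 1:7 mismatched input (near 'RETRN') while parsing singlePartQuery <- cypher. Which parts to show, and in what order, is a design choice left open here.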
One important thing to note is that the syntaxError method's signature has changed in newer versions of Antlr (we checked v4.8 and up): its first argument is now an antlr4::Recognizer pointer.
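One way to limit the impact of this change is to keep the actual logic in a free function and make the syntaxError override a thin shim, so that only the first parameter type has to change when upgrading. The sketch below illustrates this with mock types (OldIRecognizer, NewRecognizer, OldBaseErrorListener, NewBaseErrorListener, Token) that merely stand in for the two Antlr versions; they are hypothetical and are not the real runtime headers. Marking the shim override also matters: after an upgrade, a stale signature becomes a compile error instead of a method that silently never gets called.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <string>

// Hypothetical stand-ins for the runtime types; NOT the real Antlr classes.
struct OldIRecognizer {};
struct NewRecognizer {};
struct Token {};

// Older runtime: the first parameter is an IRecognizer pointer.
struct OldBaseErrorListener {
  virtual ~OldBaseErrorListener() = default;
  virtual void syntaxError(OldIRecognizer *, Token *, std::size_t, std::size_t, const std::string &,
                           std::exception_ptr) {}
};

// v4.8 and up: the first parameter is a Recognizer pointer.
struct NewBaseErrorListener {
  virtual ~NewBaseErrorListener() = default;
  virtual void syntaxError(NewRecognizer *, Token *, std::size_t, std::size_t, const std::string &,
                           std::exception_ptr) {}
};

// The formatting logic lives in one place and does not mention the recognizer type.
inline void RecordFirstError(std::string &error, std::size_t line, std::size_t position,
                             const std::string &message) {
  if (error.empty()) error = "line " + std::to_string(line) + ":" + std::to_string(position + 1) + " " + message;
}

// Thin shim: only the base class and the first parameter type differ between versions.
// `override` makes a mismatched pair fail to compile when the template is instantiated.
template <typename Base, typename RecognizerT>
class FirstMessageListener : public Base {
 public:
  void syntaxError(RecognizerT *, Token *, std::size_t line, std::size_t position, const std::string &message,
                   std::exception_ptr) override {
    RecordFirstError(error_, line, position, message);
  }
  std::string error_;
};

using OldListener = FirstMessageListener<OldBaseErrorListener, OldIRecognizer>;
using NewListener = FirstMessageListener<NewBaseErrorListener, NewRecognizer>;
```

The template is only there to show both variants side by side; in real code, simply editing the single parameter type of the override would likely be enough.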