C++

The C++ target supports all platforms that can run MS Visual Studio 2017 (or newer), XCode 7 (or newer), or CMake (C++17 required). All build tools can create static or dynamic libraries, for either 64-bit or 32-bit architectures. Additionally, XCode can create an iOS library. See also Antlr4 for C++ with CMake: A practical example.

How to create a C++ lexer or parser?

This is pretty much the same as creating a Java lexer or parser, except you need to specify the language target, for example:

$ antlr4 -Dlanguage=Cpp MyGrammar.g4

This call generates a number of files. If neither the visitor nor the listener is suppressed (which is the default), you'll get:

  • MyGrammarLexer.h + MyGrammarLexer.cpp
  • MyGrammarParser.h + MyGrammarParser.cpp
  • MyGrammarVisitor.h + MyGrammarVisitor.cpp
  • MyGrammarBaseVisitor.h + MyGrammarBaseVisitor.cpp
  • MyGrammarListener.h + MyGrammarListener.cpp
  • MyGrammarBaseListener.h + MyGrammarBaseListener.cpp

Where can I get the runtime?

Once you've generated the lexer and/or parser code, you need to download or build the runtime. Prebuilt C++ runtime binaries for Windows (Visual Studio 2013/2015), OSX/macOS and iOS are available on the ANTLR web site.

Use CMake to build a Linux library (this also works on OSX, but not for the iOS library).

Instead of downloading a prebuilt binary, you can also easily build your own library on OSX or Windows. Just use the provided projects for XCode or Visual Studio and build it. It should work out of the box without any additional dependencies.

How do I run the generated lexer and/or parser?

Putting it all together to get a working parser is really easy. Look in the runtime/Cpp/demo folder for a simple example. The README there briefly describes how to build and run the demo on OSX, Windows or Linux.

How do I create and run a custom listener?

The generation step above created a listener and base listener class for you. The listener class is an abstract interface, which declares enter and exit methods for each of your parser rules. The base listener implements all those abstract methods with an empty body, so you don't have to do it yourself if you just want to implement a single function. Hence use this base listener as the base class for your custom listener:

#include <fstream>
#include <iostream>

#include "antlr4-runtime.h"
#include "MyGrammarLexer.h"
#include "MyGrammarParser.h"
#include "MyGrammarBaseListener.h"

using namespace antlr4;

class TreeShapeListener : public MyGrammarBaseListener {
public:
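  // The generated base listener declares enterKey with the rule-specific
  // context type, so the override has to use the same parameter type.
  void enterKey(MyGrammarParser::KeyContext *ctx) override {
    // Do something when entering the key rule.
  }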
};


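int main(int argc, const char* argv[]) {
  // Expect the path of the input file as the first argument.
  if (argc < 2) {
    std::cerr << "Usage: " << argv[0] << " <input-file>" << std::endl;
    return 1;
  }
  std::ifstream stream;
  stream.open(argv[1]);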
  ANTLRInputStream input(stream);
  MyGrammarLexer lexer(&input);
  CommonTokenStream tokens(&lexer);
  MyGrammarParser parser(&tokens);

  tree::ParseTree *tree = parser.key();
  TreeShapeListener listener;
  tree::ParseTreeWalker::DEFAULT.walk(&listener, tree);

  return 0;
}

This example assumes your grammar contains a parser rule named key for which the enterKey function was generated.

Special cases for this ANTLR target

There are a couple of things that only the C++ ANTLR target has to deal with. They are described here.

Code Generation Aspects

The code generation step (running the ANTLR4 jar) lets you specify two values that you might find useful for better integration of the generated files into your application (both are optional):

  • A namespace: use the -package parameter to specify the namespace you want.
  • An export macro: especially in VC++, extra work is required to export your classes from a DLL. This is usually accomplished by a macro that has different values depending on whether you are creating the DLL or importing it. The ANTLR4 runtime itself also uses one for its classes:
  #ifdef ANTLR4CPP_EXPORTS
    #define ANTLR4CPP_PUBLIC __declspec(dllexport)
  #else
    #ifdef ANTLR4CPP_STATIC
      #define ANTLR4CPP_PUBLIC
    #else
      #define ANTLR4CPP_PUBLIC __declspec(dllimport)
    #endif
  #endif

Just like the ANTLR4CPP_PUBLIC macro here, you can specify your own macro for the generated classes using the -DexportMacro=... command-line parameter or the grammar option options {exportMacro='...';} in your grammar file.

To create a static lib in Visual Studio, define the ANTLR4CPP_STATIC macro in addition to the project settings that must be set for a static library (if you compile the runtime yourself).

For gcc and clang it is possible to use the -fvisibility=hidden setting to hide all symbols except those that are made default-visible (which has been defined for all public classes in the runtime).
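A custom export macro for the generated classes could follow the same pattern as ANTLR4CPP_PUBLIC. The following is only a sketch under stated assumptions: the names MYPARSER_PUBLIC, MYPARSER_EXPORTS and MYPARSER_STATIC are hypothetical, ExampleExportedClass is a stand-in for a generated class, and the non-Windows branch uses the GCC/Clang visibility attribute so that the class stays visible when compiling with -fvisibility=hidden.

```cpp
// Hypothetical export macro for generated classes, modeled on ANTLR4CPP_PUBLIC.
// MYPARSER_PUBLIC, MYPARSER_EXPORTS and MYPARSER_STATIC are made-up names.
#if defined(MYPARSER_STATIC)
  // Static library: no import/export decoration is needed.
  #define MYPARSER_PUBLIC
#elif defined(_WIN32)
  #ifdef MYPARSER_EXPORTS
    #define MYPARSER_PUBLIC __declspec(dllexport)  // building the DLL
  #else
    #define MYPARSER_PUBLIC __declspec(dllimport)  // consuming the DLL
  #endif
#else
  // GCC/Clang: keep the symbol visible even with -fvisibility=hidden.
  #define MYPARSER_PUBLIC __attribute__((visibility("default")))
#endif

// Stand-in for a generated class (assumption: the export macro is placed
// between the `class` keyword and the class name, as in the runtime itself).
class MYPARSER_PUBLIC ExampleExportedClass {
public:
  int answer() const { return 42; }
};
```

With such a macro in place, the generator would be invoked with -DexportMacro=MYPARSER_PUBLIC, and the header that defines the macro has to be visible to the generated files.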

Compile Aspects

When compiling generated files, you can configure a compile option according to your needs (also optional):

  • A thread-local DFA macro: adding -DANTLR4_USE_THREAD_LOCAL_CACHE=1 to the compilation options enables the thread-local DFA cache (disabled by default), so that each thread uses its own DFA. This increases memory usage, because every thread stores its own DFAs, and adds some redundant computation to build them (not too much). The benefit is better performance when running with multiple threads concurrently. In other words, if your concurrent throughput is not high enough, consider turning this option on.
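The trade-off described above can be illustrated with a generic per-thread cache. This is not the runtime's implementation; computeState, lookupThreadLocal and runWorkers are hypothetical names, and the "DFA state" is just a squared integer. The sketch only shows why a thread-local cache avoids locking while repeating the build work once per thread.

```cpp
#include <atomic>
#include <map>
#include <thread>
#include <vector>

// Counts how many cache entries had to be built across all threads.
std::atomic<int> cacheBuilds{0};

// Stand-in for an expensive DFA state computation (hypothetical).
int computeState(int key) { return key * key; }

// Looks up a value in a per-thread cache. No locking is needed because each
// thread owns its own map; the price is that every thread builds each entry
// once for itself.
int lookupThreadLocal(int key) {
  thread_local std::map<int, int> cache;
  auto it = cache.find(key);
  if (it == cache.end()) {
    ++cacheBuilds;
    it = cache.emplace(key, computeState(key)).first;
  }
  return it->second;
}

// Runs the same lookups twice on several threads and returns the total
// number of cache builds that were required.
int runWorkers(int threadCount, int keyCount) {
  cacheBuilds = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < threadCount; ++t) {
    workers.emplace_back([keyCount] {
      for (int round = 0; round < 2; ++round) {
        for (int key = 0; key < keyCount; ++key) {
          lookupThreadLocal(key);
        }
      }
    });
  }
  for (auto &worker : workers) {
    worker.join();
  }
  return cacheBuilds.load();
}
```

Calling runWorkers(4, 10) builds every entry once per thread rather than once overall, which is the extra memory and redundant computation mentioned above; in exchange, the lookups never contend for a shared lock.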

Memory Management

Since C++ has no built-in memory management, we need to take extra care. For that we rely mostly on smart pointers, which can, however, cause time penalties or memory side effects (like cyclic references) if not used with care. Currently, memory handling looks very stable. Generally, when you see a raw pointer in code, consider it as being managed elsewhere. You should never try to manage such a pointer yourself (delete it, assign it to a smart pointer, etc.).

Accordingly a parse tree is only valid for the lifetime of its parser. The parser, in turn, is only valid for the lifetime of its token stream, and so on back to the original ANTLRInputStream (or equivalent). To retain a tree across function calls you'll need to create and store all of these and delete all but the tree when you no longer need it.
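One way to keep the whole chain alive together is to bundle it in a single owning object. The sketch below uses stand-in types so that it compiles on its own: Input, Lexer, Tokens and Parser are hypothetical placeholders for ANTLRInputStream, the generated lexer, CommonTokenStream and the generated parser, and ParseSession and parseText are made-up names. The point is the member order: members are destroyed in reverse declaration order, so the parser (and the tree it owns) goes away before the objects it depends on.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Records destruction order so the effect of member ordering is observable.
std::vector<std::string> destroyed;

// Stand-ins for the runtime classes. Each one keeps a raw, non-owning
// pointer to the object it reads from.
struct Input {
  std::string text;
  explicit Input(std::string t) : text(std::move(t)) {}
  ~Input() { destroyed.push_back("input"); }
};
struct Lexer {
  Input *input;
  explicit Lexer(Input *in) : input(in) {}
  ~Lexer() { destroyed.push_back("lexer"); }
};
struct Tokens {
  Lexer *lexer;
  explicit Tokens(Lexer *lx) : lexer(lx) {}
  ~Tokens() { destroyed.push_back("tokens"); }
};
struct Parser {
  Tokens *tokens;
  std::string tree;  // owned by the parser, like a real parse tree
  explicit Parser(Tokens *ts) : tokens(ts) {}
  ~Parser() { destroyed.push_back("parser"); }
  const std::string *parse() {
    tree = "(key " + tokens->lexer->input->text + ")";
    return &tree;
  }
};

// Owns the whole chain. Declaration order matters: each member is built
// after the object it points to and destroyed before it.
struct ParseSession {
  Input input;
  Lexer lexer;
  Tokens tokens;
  Parser parser;
  const std::string *tree;  // raw pointer, valid only while parser lives

  explicit ParseSession(const std::string &text)
      : input(text), lexer(&input), tokens(&lexer), parser(&tokens),
        tree(parser.parse()) {}

  // Copying or moving would leave the internal pointers dangling.
  ParseSession(const ParseSession &) = delete;
  ParseSession &operator=(const ParseSession &) = delete;
};

// Returns the session on the heap so the tree can outlive the calling scope.
std::unique_ptr<ParseSession> parseText(const std::string &text) {
  return std::make_unique<ParseSession>(text);
}
```

A caller keeps the returned unique_ptr for as long as it needs the tree; resetting it releases parser, token stream, lexer and input in that order.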

Unicode Support

Encoding is mostly an input issue, i.e. when the lexer converts text input into lexer tokens. The parser is completely encoding unaware.

The C++ target always expects UTF-8 input (either in a string or stream) which is then converted to UTF-32 (a char32_t array) and fed to the lexer.
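To make the conversion step concrete, here is a minimal, self-contained UTF-8 to UTF-32 decoder. It is an illustration only and not the converter used by the runtime; the function name utf8ToUtf32 is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 bytes into UTF-32 code points. Rejects truncated, overlong,
// surrogate and out-of-range sequences by throwing std::invalid_argument.
std::u32string utf8ToUtf32(const std::string &bytes) {
  std::u32string out;
  std::size_t i = 0;
  while (i < bytes.size()) {
    const unsigned char lead = static_cast<unsigned char>(bytes[i]);
    char32_t cp = 0;
    std::size_t extra = 0;    // number of continuation bytes that follow
    char32_t minValue = 0;    // smallest code point allowed for this length
    if (lead < 0x80) {
      cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      cp = lead & 0x1F; extra = 1; minValue = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      cp = lead & 0x0F; extra = 2; minValue = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      cp = lead & 0x07; extra = 3; minValue = 0x10000;
    } else {
      throw std::invalid_argument("invalid UTF-8 lead byte");
    }
    if (i + extra >= bytes.size()) {
      throw std::invalid_argument("truncated UTF-8 sequence");
    }
    for (std::size_t k = 1; k <= extra; ++k) {
      const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
      if ((c & 0xC0) != 0x80) {
        throw std::invalid_argument("invalid UTF-8 continuation byte");
      }
      cp = (cp << 6) | (c & 0x3F);  // append the 6 payload bits
    }
    if (cp < minValue || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      throw std::invalid_argument("invalid code point in UTF-8 input");
    }
    out.push_back(cp);
    i += extra + 1;
  }
  return out;
}
```

For example, the three bytes E2 82 AC decode to the single code point U+20AC, so a lexer working on the converted input sees one symbol per character regardless of how many bytes it occupied in the UTF-8 source.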

Named Actions

To help customize the generated files, there are a number of additional so-called named actions. These actions are tied to specific areas in the generated code and allow you to add custom (target-specific) code. All targets support these actions

  • @parser::header
  • @parser::members
  • @lexer::header
  • @lexer::members

(and their scopeless alternatives @header and @members) where header doesn't mean a C/C++ header file, but the top of a code file. The content of the header action appears in all generated files at the first line. So it's good for things like license/copyright information.

The content of a members action is placed in the public section of lexer or parser class declarations. Hence it can be used for public variables or predicate functions used in a grammar predicate. Since all targets support header + members they are the best place for stuff that should be available also in generated files for other languages.

In addition to that the C++ target supports many more such named actions. Unfortunately, it's not possible to define new scopes (e.g. listener in addition to parser) so they had to be defined as part of the existing scopes (lexer or parser). The grammar in the demo application contains all of the named actions as well for reference. Here's the list:

  • @lexer::preinclude - Placed right before the first #include (e.g. good for headers that must appear first, for system headers etc.). Appears in both lexer h and cpp file.
  • @lexer::postinclude - Placed right after the last #include, but before any class code (e.g. for additional namespaces). Appears in both lexer h and cpp file.
  • @lexer::context - Placed right before the lexer class declaration. Use for e.g. additional types, aliases, forward declarations and the like. Appears in the lexer h file.
  • @lexer::declarations - Placed in the private section of the lexer declaration (generated sections in all classes strictly follow the pattern: public, protected, private, from top to bottom). Use this for private vars etc.
  • @lexer::definitions - Placed before other implementations in the cpp file (but after @postinclude). Use this to implement e.g. private types.

For the parser there are the same actions as shown above for the lexer. In addition to that there are even more actions for visitor and listener classes:

  • @parser::listenerpreinclude
  • @parser::listenerpostinclude
  • @parser::listenerdeclarations
  • @parser::listenermembers
  • @parser::listenerdefinitions
  • @parser::baselistenerpreinclude
  • @parser::baselistenerpostinclude
  • @parser::baselistenerdeclarations
  • @parser::baselistenermembers
  • @parser::baselistenerdefinitions
  • @parser::visitorpreinclude
  • @parser::visitorpostinclude
  • @parser::visitordeclarations
  • @parser::visitormembers
  • @parser::visitordefinitions
  • @parser::basevisitorpreinclude
  • @parser::basevisitorpostinclude
  • @parser::basevisitordeclarations
  • @parser::basevisitormembers
  • @parser::basevisitordefinitions

These should be self-explanatory by now. Note: there is no context action for listeners or visitors, simply because they would be used even less than the other actions and there are so many already.