During further testing, I ran across a major design flaw in how I had been exploiting LexerSharedInputState to handle macro expansion, i.e., expanding `macro_name into its defined value.
I was detecting the macro_name, determining/expanding its value and then effectively pushing the expansion into a StringReader and switching lexers to this Reader (using LexerSharedInputState) after pushing the current lexer onto a stack. Thus, after the macro expansion was lexed, the EOF would pop the stack and restore the previous lexer, etc.
Seemed a pretty elegant solution; in fact, `include files are handled exactly this way!
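To make the push/pop idea concrete, here is a minimal sketch in plain Java. It is not the ANTLR LexerSharedInputState code from the project; ReaderStackDemo, its methods and the character-level expansion are all hypothetical stand-ins, assuming only that a macro use looks like `name and that its body is a plain string.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Map;

// Hypothetical illustration: a stack of input sources, switched on `name and popped at EOF.
public class ReaderStackDemo {
    private final Deque<Reader> stack = new ArrayDeque<>();
    private final Map<String, String> macros;

    public ReaderStackDemo(String source, Map<String, String> macros) {
        this.macros = macros;
        stack.push(new StringReader(source));
    }

    // Return the next character, transparently entering and leaving macro bodies.
    public int read() throws IOException {
        while (!stack.isEmpty()) {
            Reader top = stack.peek();
            int c = top.read();
            if (c == -1) {            // EOF of the current source: pop and resume the previous one
                stack.pop().close();
                continue;
            }
            if (c != '`') return c;
            String name = readName(top);
            String body = macros.get(name);
            if (body == null) throw new IOException("undefined macro: " + name);
            stack.push(new StringReader(body));   // switch input to the expansion
        }
        return -1;
    }

    // Read identifier characters, then un-read the first character past the name.
    private static String readName(Reader r) throws IOException {
        StringBuilder name = new StringBuilder();
        r.mark(1);
        int c;
        while ((c = r.read()) != -1 && Character.isJavaIdentifierPart(c)) {
            name.append((char) c);
            r.mark(1);
        }
        r.reset();
        return name.toString();
    }

    public static String expand(String source, Map<String, String> macros) throws IOException {
        ReaderStackDemo in = new ReaderStackDemo(source, macros);
        StringBuilder out = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) out.append((char) c);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // prints: a = 4 + 1;
        System.out.println(expand("a = `N + 1;", Collections.singletonMap("N", "4")));
    }
}
```

At the character level this stitches the text together seamlessly. The trouble only shows up when each source on the stack gets its own lexer, because then token boundaries are forced at every switch.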
So, it's a good thing others and I continue to test, test, test... We came upon a valid (Verilog) input such as:
`define N 4
wire [N-1:0] a = `N'b0;
The failure occurred at the `N'b0. A lexer rule was defined to grab a sized number as a single token, e.g., 4'b0; but, since the `N was expanded using one lexer, which was then popped to continue with 'b0, the parser got 4 followed by 'b0, which matches neither a terminal nor a non-terminal, so an error!
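The token split can be reproduced with a toy lexer. This is only a sketch: the regex rules below (SIZED, INT, BASED, OTHER) are simplified assumptions, not the actual rules in Vlog.g, and they only cover binary literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy lexer: a sized number is only recognized when size and base arrive together.
public class SizedNumberDemo {
    static final Pattern TOKEN = Pattern.compile(
        "(?<SIZED>\\d+'[bB][01xXzZ_]+)|(?<INT>\\d+)|(?<BASED>'[bB][01xXzZ_]+)|(?<OTHER>\\S)");

    // Tokenize one input source in isolation, as a single lexer instance would.
    public static List<String> lex(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String kind = m.group("SIZED") != null ? "SIZED"
                        : m.group("INT") != null ? "INT"
                        : m.group("BASED") != null ? "BASED" : "OTHER";
            out.add(kind + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Stacked lexers: the expansion "4" and the remainder "'b0;" are lexed separately.
        List<String> split = new ArrayList<>(lex("4"));
        split.addAll(lex("'b0;"));
        System.out.println(split);          // [INT:4, BASED:'b0, OTHER:;]

        // Preprocess first, then lex the joined text with one lexer.
        System.out.println(lex("4'b0;"));   // [SIZED:4'b0, OTHER:;]
    }
}
```

With separate lexers the sized-number rule never sees both halves, so the downstream parser is handed two tokens where it expects one.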
Anyway, the more I thought about this, the more a cleaner split between preprocessing and the subsequent (parsing) stages made sense. (Guess it's been too long since I took my Compilers class!)
So, while ANTLR does allow one to chain one lexer stream into another, doing so would have amounted to preprocessing the entire file, either into a tempfile or an in-memory buffer. Not a long-term solution, especially since I do anticipate processing huge/netlist files.
In other words, creating a preprocessing lexer to handle all the `include, `define, `name, ... was easy (thanx to ANTLR!); but, connecting this pre-processed lexer stream to the next level lexer (connected to actual parser) was not trivial.
But, Java threads and PipedReader, PipedWriter to the rescue!
Using those, the preprocessor lexer runs in one thread, supplying input to the Verilog lexer+parser in another thread. The piped reader/writer pair removes the need to slurp in the whole tamale, and the threads keep a clean separation between the writing (file through preprocessor lexer) and the reading (by the Verilog lexer feeding the parser).
Java threading makes this easy!
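Here is a minimal sketch of that producer/consumer arrangement using java.io.PipedWriter and java.io.PipedReader. The preprocess method is a hypothetical regex stand-in for the real ANTLR preprocessor lexer (it only handles `define NAME VALUE and `NAME), and the consumer simply collects the characters that the Verilog lexer+parser would read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipedPreprocessDemo {
    static final Pattern DEFINE = Pattern.compile("^\\s*`define\\s+(\\w+)\\s+(.*)$");
    static final Pattern USE = Pattern.compile("`(\\w+)");

    // Hypothetical stand-in for the preprocessor lexer: expands macros line by line.
    static void preprocess(BufferedReader in, Writer out) throws IOException {
        Map<String, String> macros = new HashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            Matcher d = DEFINE.matcher(line);
            if (d.matches()) {                      // record the macro, emit nothing
                macros.put(d.group(1), d.group(2).trim());
                continue;
            }
            Matcher u = USE.matcher(line);
            StringBuffer sb = new StringBuffer();
            while (u.find()) {                      // unknown names are passed through unchanged
                String val = macros.get(u.group(1));
                u.appendReplacement(sb, Matcher.quoteReplacement(val != null ? val : u.group()));
            }
            u.appendTail(sb);
            out.write(sb.toString());
            out.write('\n');
        }
    }

    public static String run(String source) throws Exception {
        PipedWriter writer = new PipedWriter();
        PipedReader reader = new PipedReader(writer);

        // Producer thread: preprocessor writes expanded text into the pipe.
        Thread producer = new Thread(() -> {
            try (PipedWriter w = writer) {          // closing the writer signals EOF to the reader
                preprocess(new BufferedReader(new StringReader(source)), w);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        // Consumer (this thread): where the Verilog lexer+parser would read from the pipe.
        StringBuilder seen = new StringBuilder();
        try (BufferedReader r = new BufferedReader(reader)) {
            int c;
            while ((c = r.read()) != -1) seen.append((char) c);
        }
        producer.join();
        return seen.toString();
    }

    public static void main(String[] args) throws Exception {
        // prints: wire [N-1:0] a = 4'b0;
        System.out.print(run("`define N 4\nwire [N-1:0] a = `N'b0;\n"));
    }
}
```

The pipe has a bounded internal buffer, so the producer blocks when it gets ahead of the consumer; the whole file never needs to be in memory at once. Since the downstream lexer now sees 4'b0 as contiguous text, the sized-number rule can match it as one token.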
Another side effect of the preprocessor separation is the ability to dump out preprocessed data/files, too; akin to the -E and -C options to gcc. In fact, I co-opted the same options as shown by the usage options:
Usage: analyze (--tcl out.tcl)? (--rb out.rb)? (--outf out.f)?
--only_used? --exit_on_err? (--verbose n)? --abs_paths?
(--redefn n)? (-E -C?)?
-E : dump pre-processed files to "file.v.E".
Useful for debugging preprocessing issues.
-C : do not discard comments when "-E" is specified.
There is a good methodology paper and examples using the default application (analyze) of the v2kparse project here.
If you start diving into the source code, you will notice that I cleaned the parse tree code related to the analyze application out of the Vlog.g parser (language) definition. I was keen to get a useful application (analyze) out of the parser; but, in that interest, I muddied the parser infrastructure with application code. My original intent was to keep the parse tree free of any (reasonable) application/use-level bias or knowledge.
So, the SimpleParseTree class (in srcs/v2k/parser/tree/SimpleParseTree.java) serves as an example implementation (override) of the default ASTreeBase class methods. The SimpleParseTree is used by the analyze application.
I'm still mulling over a few other future uses for the ever-evolving ASTreeBase!