All about a Verilog V200x parser project hosted at SourceForge: http://v2kparse.sourceforge.net/

Monday, September 22, 2008

Some Cleanup and Examples

In this release uploaded to sourceforge, I added a fairly comprehensive document and examples: http://v2kparse.sourceforge.net/includes.pdf.

I ran across a major design flaw, during further testing, where I had been exploiting the LexerSharedInputState to handle macro expansion during the processing of `macro_name (i.e. when expanding macro_name into its defined value).

I was detecting the macro_name, determining/expanding its value and then effectively pushing the expansion into a StringReader and switching lexers to this Reader (using LexerSharedInputState) after pushing the current lexer onto a stack. Thus, after the macro expansion was lexed, the EOF would pop the stack and restore the previous lexer, etc.

Seemed a pretty elegant solution; in fact, `include files are handled exactly this way!

So, it's a good thing others and I continue to test, test, test... We came upon a valid (Verilog) input such as:


`define N 4
...
wire [`N-1:0] a = `N'b0;


The failure occurred at the `N'b0. A lexer rule was defined to grab a sized number token, i.e., 4'b0; but, since the `N was expanded using one lexer and then popped back to continue with 'b0, the parser got 4 followed by 'b0, which is not a (non-)terminal node, so an error!

Anyway, the more I thought about this, the more apparent it became that a cleaner split between preprocessing and the subsequent (parsing) stages made sense. (Guess it's been too long since I took my Compilers class!)
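
To make the failure concrete, here is a minimal sketch (not the v2kparse code) of why expanding macros in the character stream first helps: once `N has been replaced by 4 in the text, a single lexer sees 4'b0 as one sized-number token. The class name, the regular expressions and the helper methods are assumptions made up for this illustration; the real preprocessor is an ANTLR lexer, not a regex.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExpandThenLex {
    // A macro use: backtick followed by an identifier (simplified).
    private static final Pattern MACRO_USE =
        Pattern.compile("`([A-Za-z_][A-Za-z0-9_$]*)");
    // A simplified sized number: decimal size, tick, base letter, digits.
    private static final Pattern SIZED_NUMBER =
        Pattern.compile("[0-9]+'[bBoOdDhH][0-9a-fA-FxXzZ_]+");

    /** Replaces each known macro use in the text; unknown macros are left as-is. */
    static String expand(String text, Map<String, String> defines) {
        Matcher m = MACRO_USE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = defines.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Returns the first sized-number token found in the text, or null if none. */
    static String firstSizedNumber(String text) {
        Matcher m = SIZED_NUMBER.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        Map<String, String> defines = new HashMap<>();
        defines.put("N", "4");
        String expanded = expand("wire [`N-1:0] a = `N'b0;", defines);
        System.out.println(expanded);                    // wire [4-1:0] a = 4'b0;
        System.out.println(firstSizedNumber(expanded));  // 4'b0
    }
}
```

With the stacked-lexer approach, the 4 and the 'b0 come from two different lexers, so no single rule ever gets to match both halves.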

So, while ANTLR does allow one to chain one lexer stream into another, this would have amounted to preprocessing the entire file, either into a tempfile or an in-memory buffer. Not a long-term solution, especially since I do anticipate processing huge/netlist files.

In other words, creating a preprocessing lexer to handle all the `include, `define, `name, ... was easy (thanx to ANTLR!); but, connecting this pre-processed lexer stream to the next level lexer (connected to actual parser) was not trivial.

But, Java threads and PipedReader, PipedWriter to the rescue!

Using those, the preprocessor lexer runs in one thread supplying input to the Verilog lexer+parser in another thread. The Piped object mitigates the need to slurp in the whole tamale, and the Threads keep a clean separation between the writing (file through preprocessor lexer) and reading (by the Verilog lexer to parser).

Java threading makes this easy!
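
For readers who have not used the piped classes, a minimal sketch of the arrangement follows. It is not the v2kparse source: the class and method names are invented for illustration, the "preprocessor" is just a line copier that replaces `N with 4, and the "parser" merely collects lines. What it does show is the real java.io.PipedWriter/PipedReader pairing, where the writer thread blocks when the pipe's small internal buffer is full, so the whole file is never held in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

final class PipedPreprocessDemo {
    /** Stand-in for the preprocessor lexer: copies lines, replacing `N with 4. */
    static void preprocess(Reader source, Writer sink) throws IOException {
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            sink.write(line.replace("`N", "4"));
            sink.write('\n');
        }
    }

    /** Stand-in for the Verilog lexer+parser: here it only collects lines. */
    static List<String> consume(Reader preprocessed) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader in = new BufferedReader(preprocessed);
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    /** Runs the producer in its own thread and the consumer in the caller's thread. */
    static List<String> run(String source) {
        try {
            PipedWriter writer = new PipedWriter();
            PipedReader reader = new PipedReader(writer); // connects the pair
            Thread producer = new Thread(() -> {
                // Closing the writer is what delivers end-of-file to the reader.
                try (Writer w = writer) {
                    preprocess(new StringReader(source), w);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }, "preprocessor");
            producer.start();
            List<String> result;
            try (Reader r = reader) {
                result = consume(r);
            }
            producer.join();
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("wire a = `N'b0;\n")); // [wire a = 4'b0;]
    }
}
```

One design point worth noting: the producer must close its end of the pipe when it finishes, otherwise the reading side has no way to see end-of-file.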

Another side effect of the preprocessor separation is the ability to dump out preprocessed data/files, too; akin to the -E and -C options to gcc. In fact, I co-opted the same options as shown by the usage options:


Usage: analyze (--tcl out.tcl)? (--rb out.rb)? (--outf out.f)?
--only_used? --exit_on_err? (--verbose n)? --abs_paths?
(--redefn n)? (-E -C?)?
topModule vlogOpts+
...
-E : dump pre-processed files to "file.v.E".
Useful for debugging preprocessing issues.
-C : do not discard comments when "-E" is specified.

There is a good methodology paper and examples using the default application (analyze) of the v2kparse project here.

If you start diving into the source code, you will notice that I cleaned out parser tree code, related to the analyze application, from the Vlog.g parser (language) definition. I was compelled to get a useful application (analyze) off the parser; but, in that interest, muddied the parser infrastructure with an application. My original intent was to keep the parse tree free of any (reasonable) application/use level intent bias/knowledge.

So, the SimpleParseTree class (in srcs/v2k/parser/tree/SimpleParseTree.java) serves as an example implementation (override) of the default ASTreeBase class methods. The SimpleParseTree is used by the analyze application.

I'm still mulling over a few other future uses for the ever-evolving ASTreeBase!

Friday, August 8, 2008

Added Ruby file list generation

In this release uploaded to sourceforge, I added a Ruby (syntax) file list option:

Usage: analyze.rb (--tcl out.tcl)? (--rb out.rb)? (--exit_on_err)?
topModule vlogOpts+

--tcl out.tcl : dump details in tcl format to "out.tcl".
--rb out.rb : dump details in ruby format to "out.rb".
--exit_on_err : exit status != 0 if any parse errors.
: And, no "out.tcl" generated if errors.
...

I also ran across an interesting testcase of some wacky control character, 0x93, in a Verilog source file! I updated the (Antlr) lexer to accept a full 8-bit vocabulary, toss all the bizarre ones into a protected lexer rule, CNTRL, and simply skip them (excerpt from Vlog.g, with line numbers shown):

1219 class VlogLexer extends Lexer;
1220 options {
1221 k=3;
1222 charVocabulary='\u0000'..'\u00FF';
1223 testLiterals=false;
1224 }
...
1474 WS : (CNTRL|' '|'\r'|'\t'|'\n' {newline();})
1475 {$setType(Token.SKIP);}
1476 ;
1477
1478 protected
1479 CNTRL
1480 : '\u0000'..'\u0008'
1481 | '\u000B'..'\u000C'
1482 | '\u000E'..'\u001F'
1483 | '\u007F'..'\u00FF'
1484 ;
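
For anyone who wants to check which characters land in CNTRL, the four ranges can be restated as a plain Java predicate. This is only an illustrative sketch with made-up names, not code from the project; it mirrors the ranges 0x00..0x08, 0x0B..0x0C, 0x0E..0x1F and 0x7F..0xFF, and leaves tab, newline, carriage return and space alone, since the WS rule lists those separately.

```java
final class CntrlChars {
    /** True for characters covered by the four ranges of the CNTRL rule. */
    static boolean isCntrl(char c) {
        return c <= 0x08
            || (c >= 0x0B && c <= 0x0C)
            || (c >= 0x0E && c <= 0x1F)
            || (c >= 0x7F && c <= 0xFF);
    }

    /** Drops CNTRL characters, roughly what skipping them in the lexer achieves. */
    static String stripCntrl(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!isCntrl(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "wire" + (char) 0x93 + " a;";
        System.out.println(stripCntrl(dirty)); // wire a;
    }
}
```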


Let me know any bugs, improvements, ideas, praise, etc.

Thursday, July 31, 2008

Added Tcl file list generation

I uploaded the next version to sourceforge.

This version added a few more options relevant to my initial goal of building file lists and extracting other info for implementation related tasks.

(See the previous post for download and install details.)

After install:

> bin/analyze

Usage: analyze.rb (--tcl out.tcl)? (--exit_on_err)? topModule vlogOpts+

--tcl out.tcl : dump details in tcl format to "out.tcl".
--exit_on_err : exit status != 0 if any parse errors.
: And, no "out.tcl" generated if errors.

vlogOpts is one of
file.v
-f args.f
-v library.v
-y library_directory
+incdir+dir1(+dirn)*
+define+d(=v)?(+dn(=vn)?)*

The additional options are: --tcl and --exit_on_err.

You should try these out. The format/info contained in the generated out.tcl file should be self-explanatory.

Let me know any bugs, improvements, ideas, praise, etc.

Thursday, July 24, 2008

Parser + JRuby (quick) linker

I uploaded the next version to sourceforge.

This new version cleaned up a few missing constructs and pre-processor directives which I encountered while testing even more real design RTL.

So, while getting the parser/grammar stuff done was not too painful, thanx to ANTLR and the great debugger in Netbeans; moving on to the linker stage was a great experience: I bit the bullet and learned JRuby.

(An excellent book on Ruby itself is here.)

Why JRuby? Well, I am normally a Tcl guy, when it comes to complex scripting. I never learned Perl, since when I was at the crossroads of having to abandon complex awk, greps, seds and csh, I was fortunate enough to cross paths w/ John Ousterhout (the creator of Tcl) on a road show at Sun promoting Tcl. Never had any regrets: it's a great language, and ever prevalent in any self-respecting EDA tool.

Hmmm, I still haven't answered the why JRuby, huh?

So, while I could have continued to do more of the rudimentary stuff around the parser (such as quick linking) in Java, I was aware of the JRuby capabilities w/in Netbeans (did I say how much I luv Netbeans?)... so did a little more digging and playing... and it's great!

JRuby essentially gives you the Ruby language, which is a truly object-oriented interpreted language (while some may say scripting language, that has some negative connotations, IMHO, esp. if you compare it w/ other so-called scripting languages (like Perl --- yuk!)). BUT, the best part is that the "J" part brings Java into the picture, too. So, we can use Java for the more elegant parts of our design, and then Ruby for the more "one offs", experimental, oft-changing, ... parts. (Or, just an excuse to use/learn Ruby!)

So, the tack I took in this release is to use the Java part for the analyze part of the parser, and the Ruby side to parse options, iterate looking for candidate files (i.e., the -y +libext+ verilog options) and interact with the parse tree to (quick) link the design, in the interest of finding the complete (flat list of) files from the (typically) succinct .f files.

(Go back to my 1st blog post for a refresher on my original motivation for this project.)

So, after you download and untar this release, you will also need to install/add the path to jruby on your system.

If you do not have jruby, I would strongly suggest using the Netbeans "all" pack.

On my system, I have:

> which jruby
/opt/netbeans-6.1/ruby2/jruby-1.1/bin/jruby

After that is all set up, you can run a very simple testcase. (I will eventually add something more complex; but, didn't want to clog the sourceforge downloads w/ huge designs, like the free Sparc one... and the opencores.org ones may not be so easily redistributed... and didn't want any legal hassles at such a young age).

So, the simple one:

# I untar-d the download under /tmp/v2k, in this example
> cd /tmp; mkdir v2k; cd v2k
> download v2kparse-0.2.tar.gz; tar zxf v2kparse-0.2.tar.gz
> bin/analyze

Usage: analyze.rb topModule vlogOpts+

where vlogOpts is one of
file.v
-f args.f
-v library.v
-y library_directory
+incdir+dir1(+dirn)*
+define+d(=v)?(+dn(=vn)?)*

> cd data
# run the simple testcase
> ../bin/analyze m1 -f tc2.f
DBG1: after proc_args: +define+SYNTHESIS +incdir+tc2 -y tc2 tc2/m1.v
Info: Pass 1: -I tc2 -D SYNTHESIS tc2/m1.v.
Info : tc2/m1.v:1: include file "/tmp/v2k/data/tc2/defs.vh". (INCL-1)
Info: Unresolved: m2.
Info: Pass 2: /tmp/v2k/data/tc2/m2.v.
Info: Unresolved: m3.
Info: Pass 3: /tmp/v2k/data/tc2/m3.v.
Info: Link status: true.

In this (simple) testcase, the tc2.f is:

+define+SYNTHESIS
+incdir+tc2
-y tc2
tc2/m1.v
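
As a rough illustration of how entries like these can be sorted into defines, include directories, library directories and source files, here is a small sketch. It follows the +define+d(=v)?(+dn(=vn)?)* and +incdir+dir1(+dirn)* patterns from the usage message, but the class and field names are hypothetical and this is not how analyze.rb is actually written; -f and -v are deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DotFArgs {
    final Map<String, String> defines = new LinkedHashMap<>(); // name -> value ("" if none)
    final List<String> incDirs = new ArrayList<>();
    final List<String> libDirs = new ArrayList<>();
    final List<String> files = new ArrayList<>();

    /** Classifies whitespace-separated .f tokens; -f and -v are not handled in this sketch. */
    static DotFArgs parse(List<String> tokens) {
        DotFArgs args = new DotFArgs();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (token.startsWith("+define+")) {
                for (String def : token.substring("+define+".length()).split("\\+")) {
                    if (def.isEmpty()) {
                        continue;
                    }
                    int eq = def.indexOf('=');
                    if (eq < 0) {
                        args.defines.put(def, "");
                    } else {
                        args.defines.put(def.substring(0, eq), def.substring(eq + 1));
                    }
                }
            } else if (token.startsWith("+incdir+")) {
                for (String dir : token.substring("+incdir+".length()).split("\\+")) {
                    if (!dir.isEmpty()) {
                        args.incDirs.add(dir);
                    }
                }
            } else if (token.equals("-y") && i + 1 < tokens.size()) {
                args.libDirs.add(tokens.get(++i)); // -y takes the next token as its directory
            } else if (token.startsWith("-") || token.startsWith("+")) {
                throw new IllegalArgumentException("not handled in this sketch: " + token);
            } else {
                args.files.add(token);
            }
        }
        return args;
    }

    public static void main(String[] argv) {
        List<String> tokens = new ArrayList<>();
        tokens.add("+define+SYNTHESIS");
        tokens.add("+incdir+tc2");
        tokens.add("-y");
        tokens.add("tc2");
        tokens.add("tc2/m1.v");
        DotFArgs a = parse(tokens);
        System.out.println(a.defines.keySet() + " " + a.incDirs + " " + a.libDirs + " " + a.files);
    }
}
```

This is the same regrouping that shows up in the "Pass 1: -I tc2 -D SYNTHESIS tc2/m1.v" line of the log above.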

So, as the analyze runs, the analyze.rb script is iterating:
  1. starting with the fully specified file: tc2/m1.v
  2. running the (java-based) parser
  3. querying the parse tree (created by parser) for defined modules and their references
  4. (analyze.rb) keeps track of defined, linked and unresolved references
  5. uses the unresolved references and the -y +libext+ specs to find matching files
  6. repeatedly calls the parser (with the newly found files) and queries the incrementally (modified) parse tree
  7. goto 3 and repeat until fully linked, or nothing else to do
I've left some debug messages in the code; and, the final message indicates whether the link was successful.
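
The iteration above is essentially a fixed-point loop. A minimal sketch of it follows, written in Java rather than Ruby and with the parser replaced by lookup tables, so every name here (QuickLink, ParsedFile, libFileOf, parseOf) is a hypothetical stand-in rather than anything from analyze.rb. Leaf cells are not modeled. The sample data is shaped like the tc2 log: m1 references m2, which references m3.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class QuickLink {
    /** Hypothetical stand-in for what the parse tree reports about one file. */
    static final class ParsedFile {
        final String module;      // module defined in the file
        final List<String> refs;  // modules it instantiates
        ParsedFile(String module, String... refs) {
            this.module = module;
            this.refs = Arrays.asList(refs);
        }
    }

    /**
     * Returns the module names still unresolved (empty means fully linked).
     * libFileOf maps a module name to the file a -y search would find for it;
     * parseOf maps a file to its (pretend) parse result; filesUsed receives
     * the flat file list in the order the files were parsed.
     */
    static Set<String> link(List<String> startFiles,
                            Map<String, String> libFileOf,
                            Map<String, ParsedFile> parseOf,
                            List<String> filesUsed) {
        Set<String> defined = new LinkedHashSet<>();
        Set<String> referenced = new LinkedHashSet<>();
        Deque<String> toParse = new ArrayDeque<>(startFiles);
        while (true) {
            // "Parse" every pending file and record definitions and references.
            while (!toParse.isEmpty()) {
                String file = toParse.removeFirst();
                filesUsed.add(file);
                ParsedFile parsed = parseOf.get(file);
                if (parsed == null) {
                    continue; // file not modeled in this sketch
                }
                defined.add(parsed.module);
                referenced.addAll(parsed.refs);
            }
            Set<String> unresolved = new LinkedHashSet<>(referenced);
            unresolved.removeAll(defined);
            // Use the unresolved names to find new candidate files.
            for (String module : unresolved) {
                String file = libFileOf.get(module);
                if (file != null && !filesUsed.contains(file) && !toParse.contains(file)) {
                    toParse.addLast(file);
                }
            }
            if (toParse.isEmpty()) {
                return unresolved; // fully linked, or nothing else to do
            }
        }
    }

    public static void main(String[] args) {
        Map<String, ParsedFile> parseOf = new HashMap<>();
        parseOf.put("tc2/m1.v", new ParsedFile("m1", "m2"));
        parseOf.put("tc2/m2.v", new ParsedFile("m2", "m3"));
        parseOf.put("tc2/m3.v", new ParsedFile("m3"));
        Map<String, String> libFileOf = new HashMap<>();
        libFileOf.put("m2", "tc2/m2.v");
        libFileOf.put("m3", "tc2/m3.v");
        List<String> filesUsed = new ArrayList<>();
        Set<String> unresolved =
            link(Collections.singletonList("tc2/m1.v"), libFileOf, parseOf, filesUsed);
        System.out.println("files: " + filesUsed);
        System.out.println("Link status: " + unresolved.isEmpty());
    }
}
```

The loop stops either when nothing is unresolved or when a pass finds no new file to parse, which matches the "fully linked, or nothing else to do" exit in step 7.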

DISCLAIMER: this is a quick link, in the sense that elaboration is not done: there are no port/pin checks, etc. Just a simple check that a referenced module is defined, or is defined to be a leaf cell.

If you want to play around with your own designs, you will likely have instanced memories, standard cells, macros, etc. which you normally would not specify in a .f file, explicitly. These normally are added via a library.f for simulation; and, would use .lib/.db files for typical implementation tasks, like synthesis.

So, there needs to be a convenient way to specify which module names should be considered leafs. If you look in the ruby/srcs/analyze.rb file (at the bottom), you will see:

#TODO: Redefine the is_leaf method to describe module names
# which are not defined in any (user created) verilog contexts.
# These are typically leaf/library cells.
#
...

That describes how to specify leafs for your specific needs.

Have fun with this release, and I'll be back with more soon!

Monday, June 23, 2008

First pass uploaded to sourceforge

I finished the 1st pass of the ANTLR 2.7.7 based v2k parser and lexer and uploaded to sourceforge.

The lexer includes a Verilog preprocessor which handles:
  • `define,`undef
  • `include
  • `ifdef,`ifndef,`else,`endif
  • `timescale

Once you download, at the toplevel directory there is a runParser script which invokes java using the precompiled .jar files released under dist/.

Just type

> runParser

to see the usage.

I was compelled to use ANTLR 2.7.7 since the token stream mechanism does not try to slurp in the whole source file, an issue which I encountered with the more recent ANTLR 3.0.

While Verilog source files are not generally large, netlist files can be humungous, and one can quickly run out of memory by "slurping in the whole tamale."

Anyway, I've communicated the large file slurp issue to the author of ANTLR and he'll be working out a solution in future releases.

(If you think large verilog netlists are problematic to slurp, think about a SPEF file --- where I first encountered the problem using ANTLR 3.x. Anyway, going back to 2.7.7 works fine, even for large SPEF files.)

Anyway, back to the v2kparser...

I've been testing it out on a lot of the RTL for a large SOC I'm working on now; and, so far so good. It has even helped me find "bad style RTL" in the sense that (defined) `values were used within source files which never `included the file containing the `define. Now, that's asking for an order-of-analysis problem!

I always advocate using the software style for writing include files and using them too, as in:

//file to be included all over: file.vh
`ifndef _FILE_VH_
`define _FILE_VH_
`define N 5
`define M (1 << `N)
...
`endif //_FILE_VH_

and the file which wants to use it:

//rtl.v
`include "file.vh"

Wednesday, June 18, 2008

Implement what you verify

I spend a lot of my working hours doing ASIC design related tasks, primarily on the implementation side.

Hence, I've run through gazillions of lines of Verilog code, for lint tools, synthesis, formal EC, clock domain crossings, on and on.

I have yet to find a consistent way to feed Verilog file lists, include directories, library paths --- all that dot-f stuff: *.v, +incdir+, -y, -v, usu. dumped into a simulator "as-is" --- into the implementation tools. They do not take those verbatim: you have to spec a Tcl list of files, search paths and define arguments; and don't even try to do anything resembling -y and +libext+ type options.

I've seen environments where I really wonder whether an ASIC was built with the same design files it was actually simulated with. (I kid you not!)

So, having spent lots of hours mucking around with EDA tool development of various sorts: simulators, parsers, databases and the like, I decided to solve the problem of getting simulation file stuff, verbatim, into the implementation tools.

How?

Well, let's first start with a real parser, so we can process synthesizable Verilog RTL, including the preprocessor. From there, the sky is the limit.

For real parser development, there is one really excellent tool: ANTLR. I've been doing all kinds of hobby and real parser development with ANTLR since it was Terence's PhD thesis back in the late 80's.

There's all kinds of excellent info about ANTLR at the website, and even a great book to boot.