v2kparse

All about a Verilog V200x parser project hosted at SourceForge: http://v2kparse.sourceforge.net/

Wednesday, December 30, 2009

Latest version includes (Synopsys) Liberty File parser

I (finally) uploaded an update here.

The most interesting new feature is support for Liberty Files (a.k.a. .lib files).
Upon download, note a slight rearrangement of the directory structure:


slf/ --- files related to parsing of Synopsys Liberty Files
ssi/ --- files related to Simple Serialization Interface
v2k/ --- files related to Verilog parser



There is also an additional option, --gen_xref, which can be used to generate a module cross-reference.


> v2k/bin/analyze

Usage: analyze --flatten? (--tcl out.tcl)? (--rb out.rb)? (--outf out.f)?
               --only_used? --exit_on_err? (--verbose n)? --abs_paths?
               (--redefn n)? (-E -C?)?
               topModule vlogOpts+

  --flatten     : stop after flattening .f files.
                  Useful with "--outf" to capture flat .f
                  for subsequent processing.
  --tcl out.tcl : dump details in tcl format to "out.tcl".
  --rb out.rb   : dump details in ruby format to "out.rb".
  --outf out.f  : dump details into flat Verilog .f file "out.f"
                  using only +incdir+, +define+ and file.v.
  --only_used   : only dump files which contained a module
                  definition required for linking "topModule".
  --exit_on_err : exit status != 0 if any parse errors.
                  And, no "out.tcl" generated if errors.
  --verbose n   : Verbose messages during linking. "n" is:
                  2 (most verbose); 1 (default); 0 (off).
  --abs_paths   : Make/display all file/directory names absolute.
                  Useful for debugging (the where of) include files.
  --gen_xref    : Generate "topModule.refs.txt".
                  Contains a cross-reference of module references.
  -E            : dump pre-processed files to "file.v.E".
                  Useful for debugging preprocessing issues.
  -C            : do not discard comments when "-E" is specified.

vlogOpts is one of:
  file.v
  -f args.f
  -v library.v
  -y library_directory
  +incdir+dir1(+dirn)*
  +define+d(=val)?(+dn(=valn)?)*
  +no_defn+mod1(+modn)* : specify "mod1" as undefined
                          (a priori) so no link error.
  +slf+f1.lib(+fi.lib)* : specify Synopsys Liberty File "f1.lib".

NOTE: a .f file can contain entries of the form "${VAR}/foo.v"
to specify that the value of the environment variable "VAR"
be used (i.e., replace ${VAR} with ENV['VAR'], in ruby parlance)
during .f file processing. This is useful to root file/directory
locations using an environment variable, rather than hardcoding
them in the .f files themselves.
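To make the substitution concrete, here is a minimal ruby sketch of the idea (not the shipped .f processing code; the RTL_ROOT variable is just a hypothetical example):

# Minimal sketch (not the shipped code) of the ${VAR} substitution described
# in the NOTE above: each ${VAR} in a .f entry is replaced with ENV['VAR'].
def expand_env(f_entry)
  f_entry.gsub(/\$\{(\w+)\}/) { ENV[$1] || raise("Undefined environment variable: #{$1}") }
end

# Hypothetical usage: with RTL_ROOT=/proj/rtl, this prints "/proj/rtl/foo.v".
ENV['RTL_ROOT'] ||= '/proj/rtl'
puts expand_env('${RTL_ROOT}/foo.v')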

The +slf+ option can be used to pass .lib files that resolve instantiated library (leaf) or macro cells.

If any of the +slf+ files are actually required to link the design, their filenames are written into the out.tcl file specified with the --tcl option, as the Tcl list variable slf_used.

Wednesday, March 25, 2009

Unscheduled interruption (but a good one)

I have been adding a few fixes to the grammar and infrastructure: a symbol table and a Simple Streaming Interface (SSI, as I call it), a file persistence mechanism. Hopefully, I'll have these posted next week, in a very alpha-ish state.

On the interrupt front: in my day job, I do a lot with synthesis and timing/design closure for large ASICs. I have been looking a lot at design/RTL and netlist quality (QoR); for the latter, I do a lot of manipulation/querying in either the synthesis tool or the static timing analyzer, since both have excellent (Tcl-based) UIs for such.

However, since those tools/licenses are few and very $$$, it seemed natural to strip down the v2kparse stuff for netlist-only work. I had done an earlier (C++) project at this sourceforge project and decided to shift gears a bit. So, I am in the process of revamping that project to use Java and Ruby; let's all say it together: JRuby!

At this point, I have the (netlist) parser infrastructure completed in ANTLR and Java, and have added a JRuby layer around it. For example, to iterate over a full (hierarchical) module/netlist and find and print (inefficient) instances of sequential cells with constant clocks and/or constant async set/reset pins, you simply:


# Find important reg pins tied to 0
msg = ": pin tied 0"
top_module.foreach_on_const_net(0) do |conns, val, inst_name|
  conns.each do |conn|
    if conn.isPin
      pin = Pin.new(conn)
      # Warn on clock (CP) or async set/reset ([CS]DN) pins tied to the constant.
      Message.warn(pin.get_name(inst_name) + msg) if pin.get_name =~ /\/([CS]DN|CP)$/
    end
  end
end


Pretty cool!

Anyway, I want to flush through and post my updates to sourceforge and then update v2kparse shortly.

Stay tuned.

Monday, January 26, 2009

The parse tree evolves

The latest release to sourceforge includes the 1st pass of a full parse tree.
The parse tree classes are under:
 srcs/v2k/parser/tree
There are 133 .java source files there; no wonder it's taken a while.

NOTE: ANTLR does have (automatic) AST (Abstract Syntax Tree) building capability. However, I wanted the very hands-on experience of crafting my own for this project. Not to worry: there will be many more parsers, for many more DSLs, to work on; and automatic AST generation will be more efficient there.

There is a simple script
 > bin/ptree

Error: Must specify at least one "infile".
Usage: v2kparse [options] infile...

Options:

  -I dir        Add dir (must be readable) to `include file search path.

  -D name       Predefine name as a macro with value 1.

  -D name=defn  Predefine name as a macro with value defn.

Those options are less friendly than the (more common) Verilog-style options (as in bin/analyze; see earlier posts), but they are useful for testing.

Examine the bin/analyze and bin/ptree scripts to see that the latter simply defines the constant USEPT (use parse tree). This constant is detected in srcs/v2k/parser/Parser.java and selects which sub-class of
  public abstract class ASTreeBase
is used to build the tree. There are 2 sub-classes:
  1. v2k.parser.tree.basic.ParseTree
  2. v2k.parser.tree.ParseTree
The 1st, tree.basic.ParseTree, is used to build a simple tree which just tracks module names and instances. It is used by the bin/analyze script to glean info appropriate for source-file dependency tracking (described in earlier posts). The 2nd, tree.ParseTree, is used for full tree construction, as in the bin/ptree script.

So, get your hands dirty playing around with bin/ptree. And, if you embellish the bin/analyze script with the -DUSEPT argument to java, then you could also get a full-blown parse tree and use the more standard Verilog command-line arguments, too. However, since the analyze script also uses jruby hooks to generate usable output by walking the basic.ParseTree, the jruby scripts would also have to be changed (since the APIs of the basic.ParseTree and the (full) tree.ParseTree are slightly different).

The next phase of the project is to add elaboration...

Monday, September 22, 2008

Some Cleanup and Examples

In this release uploaded to sourceforge, I added a fairly comprehensive document and examples: http://v2kparse.sourceforge.net/includes.pdf.

During further testing, I ran across a major design flaw: I had been exploiting the LexerSharedInputState to handle macro expansion during the processing of `macro_name (i.e., when expanding macro_name into its defined value).

I was detecting the macro_name, determining/expanding its value and then effectively pushing the expansion into a StringReader and switching lexers to this Reader (using LexerSharedInputState) after pushing the current lexer onto a stack. Thus, after the macro expansion was lexed, the EOF would pop the stack and restore the previous lexer, etc.

Seemed a pretty elegant solution; in fact, `include files are handled exactly this way!

So, it's a good thing that I and others continue to test, test, test... We came upon valid (Verilog) input such as:


`define N 4
...
wire [`N-1:0] a = `N'b0;


The failure occurred at the `N'b0. A lexer rule was defined to grab a sized-number token, i.e., 4'b0; but since the `N was expanded using one lexer and then popped back to continue with 'b0, the parser got 4 followed by 'b0, which is not a (non-)terminal, so: an error!

Anyway, the more I thought about this, the more apparent it became that a cleaner split between preprocessing and the subsequent (parsing) stages made sense. (Guess it's been too long since I took my Compilers class!)

So, while ANTLR does allow one to chain one lexer stream into another, that would have amounted to preprocessing the entire file, either into a tempfile or an in-memory buffer. Not a long-term solution, especially since I do anticipate processing huge netlist files.

In other words, creating a preprocessing lexer to handle all the `include, `define, `name, ... was easy (thanx to ANTLR!); but connecting this pre-processed lexer stream to the next-level lexer (connected to the actual parser) was not trivial.

But, Java threads and PipedReader, PipedWriter to the rescue!

Using those, the preprocessor lexer runs in one thread, supplying input to the Verilog lexer+parser in another thread. The Piped objects obviate the need to slurp in the whole tamale, and the threads keep a clean separation between the writing (file through preprocessor lexer) and the reading (by the Verilog lexer into the parser).

Java threading makes this easy!
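Here is a minimal JRuby sketch of that producer/consumer split. The real code is Java inside the parser; the "preprocessor" below just copies a file verbatim and the input file name is a stand-in, so only the java.io piping and threading pattern is the point:

require 'java'
java_import 'java.io.PipedReader'
java_import 'java.io.PipedWriter'
java_import 'java.io.BufferedReader'

reader = PipedReader.new
writer = PipedWriter.new(reader)       # connect the two ends of the pipe

# Producer thread: stands in for the preprocessor lexer; the real one would
# write macro-expanded text, not a verbatim copy of the file.
producer = Thread.new do
  begin
    File.foreach('design.v') { |line| writer.write(line) }   # 'design.v' is hypothetical
  ensure
    writer.close    # closing the writer is what gives the reader its EOF
  end
end

# Consumer (here, the main thread): stands in for the Verilog lexer+parser
# pulling from the piped, already-preprocessed character stream.
buffered = BufferedReader.new(reader)
while (line = buffered.read_line)
  puts line
end
producer.join

No tempfile and no whole-file buffer: the producer simply blocks when the (small) pipe buffer is full and resumes as the consumer drains it.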

Another side effect of the preprocessor separation is the ability to dump out the preprocessed data/files, too, akin to the -E and -C options of gcc. In fact, I co-opted the same option names, as shown in the usage:


Usage: analyze (--tcl out.tcl)? (--rb out.rb)? (--outf out.f)?
               --only_used? --exit_on_err? (--verbose n)? --abs_paths?
               (--redefn n)? (-E -C?)?
               topModule vlogOpts+
  ...
  -E : dump pre-processed files to "file.v.E".
       Useful for debugging preprocessing issues.
  -C : do not discard comments when "-E" is specified.

There is a good methodology paper and examples using the default application (analyze) of the v2kparse project here.

If you start diving into the source code, you will notice that I cleaned the parse-tree code related to the analyze application out of the Vlog.g parser (language) definition. I had been compelled to get a useful application (analyze) off the parser quickly but, in that interest, had muddied the parser infrastructure with application code. My original intent was to keep the parse tree free of any application/use-level bias or knowledge.

So, the SimpleParseTree class (in srcs/v2k/parser/tree/SimpleParseTree.java) serves as an example implementation (override) of the default ASTreeBase class methods. The SimpleParseTree is used by the analyze application.

I'm still mulling over a few other future uses for the ever-evolving ASTreeBase!

Friday, August 8, 2008

Added Ruby file list generation

In this release uploaded to sourceforge, I added a Ruby (syntax) file list option:

Usage: analyze.rb (--tcl out.tcl)? (--rb out.rb)? (--exit_on_err)?
                  topModule vlogOpts+

  --tcl out.tcl : dump details in tcl format to "out.tcl".
  --rb out.rb   : dump details in ruby format to "out.rb".
  --exit_on_err : exit status != 0 if any parse errors.
                  And, no "out.tcl" generated if errors.
...

I also ran across an interesting testcase with a wacky control character, 0x93, in a Verilog source file! I updated the (ANTLR) lexer to accept a full 8-bit vocabulary, gather all the bizarre characters into a protected lexer rule, CNTRL, and simply skip them (excerpt from Vlog.g, with line numbers shown):

1219 class VlogLexer extends Lexer;
1220 options {
1221     k=3;
1222     charVocabulary='\u0000'..'\u00FF';
1223     testLiterals=false;
1224 }
...
1474 WS : (CNTRL|' '|'\r'|'\t'|'\n' {newline();})
1475     {$setType(Token.SKIP);}
1476     ;
1477
1478 protected
1479 CNTRL
1480     : '\u0000'..'\u0008'
1481     | '\u000B'..'\u000C'
1482     | '\u000E'..'\u001F'
1483     | '\u007F'..'\u00FF'
1484     ;


Let me know any bugs, improvements, ideas, praise, etc.

Thursday, July 31, 2008

Added Tcl file list generation

I uploaded the next version to sourceforge.

This version added a few more options relevant to my initial goal of building file lists and extracting other info for implementation-related tasks.

(See the previous post for download and install details.)

After install:

> bin/analyze

Usage: analyze.rb (--tcl out.tcl)? (--exit_on_err)? topModule vlogOpts+

  --tcl out.tcl : dump details in tcl format to "out.tcl".
  --exit_on_err : exit status != 0 if any parse errors.
                  And, no "out.tcl" generated if errors.

vlogOpts is one of:
  file.v
  -f args.f
  -v library.v
  -y library_directory
  +incdir+dir1(+dirn)*
  +define+d(=v)?(+dn(=vn)?)*

The additional options are: --tcl and --exit_on_err.

You should try these out. The format/info contained in the generated out.tcl file should be self-explanatory.

Let me know any bugs, improvements, ideas, praise, etc.

Thursday, July 24, 2008

Parser + JRuby (quick) linker

I uploaded the next version to sourceforge.

This new version added support for a few missing constructs and pre-processor directives which I encountered while testing even more real design RTL.

So, while getting the parser/grammar stuff done was not too painful (thanx to ANTLR and the great debugger in Netbeans), moving on to the linker stage was a great experience: I bit the bullet and learned JRuby.

(An excellent book on Ruby itself is here.)

Why JRuby? Well, I am normally a Tcl guy when it comes to complex scripting. I never learned Perl: when I was at the crossroads of having to abandon complex awk, grep, sed and csh, I was fortunate enough to cross paths w/ John Ousterhout (the creator of Tcl) on a road show at Sun promoting Tcl. Never had any regrets: it's a great language, and ever prevalent in any self-respecting EDA tool.

Hmmm, I still haven't answered the why JRuby, huh?

So, while I could have continued to do more of the rudimentary stuff around the parser (such as quick linking) in Java, I was aware of the JRuby capabilities w/in Netbeans (did I say how much I luv Netbeans?)... so I did a little more digging and playing... and it's great!

JRuby essentially gives you the Ruby language, which is a truly object-oriented interpreted language (some may say scripting language, but that has negative connotations, IMHO, esp. if you compare it w/ other so-called scripting languages, like Perl --- yuk!). BUT, the best part is that the "J" brings Java into the picture, too. So, we can use Java for the more elegant parts of our design, and Ruby for the more one-off, experimental, oft-changing... parts. (Or, just an excuse to use/learn Ruby!)

So, the tack I took in this release is to use the Java part for the analyze portion of the parser, and the Ruby side to parse options, iterate looking for candidate files (i.e., the -y +libext+ Verilog options) and interact with the parse tree to (quick) link the design, in the interest of finding the complete (flat) list of files from the (typically) succinct .f files.

(Go back to my 1st blog post to refresh on my original motivation for this project.)

So, after you download and untar this release, you will also need to install/add the path to jruby on your system.

If you do not have jruby, I would strongly suggest using the Netbeans "all" pack.

On my system, I have:

> which jruby
/opt/netbeans-6.1/ruby2/jruby-1.1/bin/jruby

After that is all set up, you can run a very simple testcase. (I will eventually add something more complex; but I didn't want to clog the sourceforge downloads w/ huge designs, like the free Sparc one... and the opencores.org ones may not be so easily redistributed... and I didn't want any legal hassles at such a young age.)

So, the simple one:

# I untar-d the download under /tmp/v2k, in this example
> cd /tmp; mkdir v2k; cd v2k
> download v2kparse-0.2.tar.gz; tar zxf v2kparse-0.2.tar.gz
> bin/analyze

Usage: analyze.rb topModule vlogOpts+

where vlogOpts is one of
file.v
-f args.f
-v library.v
-y library_directory
+incdir+dir1(+dirn)*
+define+d(=v)?(+dn(=vn)?)*

> cd data
# run the simple testcase
> ../bin/analyze m1 -f tc2.f
DBG1: after proc_args: +define+SYNTHESIS +incdir+tc2 -y tc2 tc2/m1.v
Info: Pass 1: -I tc2 -D SYNTHESIS tc2/m1.v.
Info : tc2/m1.v:1: include file "/tmp/v2k/data/tc2/defs.vh". (INCL-1)
Info: Unresolved: m2.
Info: Pass 2: /tmp/v2k/data/tc2/m2.v.
Info: Unresolved: m3.
Info: Pass 3: /tmp/v2k/data/tc2/m3.v.
Info: Link status: true.

In this (simple) testcase, the tc2.f is:

+define+SYNTHESIS
+incdir+tc2
-y tc2
tc2/m1.v

So, as analyze runs, the analyze.rb script iterates:
  1. starting with the fully specified file: tc2/m1.v
  2. running the (java-based) parser
  3. querying the parse tree (created by the parser) for defined modules and their references
  4. (analyze.rb) keeps track of defined, linked and unresolved references
  5. uses the unresolved references and the -y +libext+ specs to find matching files
  6. repeatedly calls the parser (with the newly found files) and queries the incrementally (modified) parse tree
  7. goto 3 and repeat until fully linked, or nothing else to do
I've left some debug messages in the code; the final message indicates whether the link was successful. A sketch of this loop is shown below.
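Here is a hedged ruby sketch of that loop. The method names (parse_and_merge, unresolved_refs) and the library search below are hypothetical stand-ins, not the actual analyze.rb API; only the overall flow mirrors the steps above:

# Hedged sketch of the quick-link loop; parse_and_merge, unresolved_refs and
# the -y style search are stand-ins for the real analyze.rb/Java calls.
def quick_link(top, files, lib_dirs, libext = '.v')
  to_parse = files.dup
  pass = 0
  until to_parse.empty?
    pass += 1
    puts "Info: Pass #{pass}: #{to_parse.join(' ')}."
    tree = parse_and_merge(to_parse)           # run the java parser, extend the tree
    unresolved = tree.unresolved_refs(top)     # referenced but not (yet) defined
    return true if unresolved.empty?           # fully linked
    puts "Info: Unresolved: #{unresolved.join(', ')}."
    # Map unresolved module names to candidate files using the -y directories.
    to_parse = unresolved.map { |m|
      lib_dirs.map { |d| File.join(d, m + libext) }.find { |f| File.file?(f) }
    }.compact
  end
  false    # something stayed unresolved and no more candidate files were found
end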

DISCLAIMER: this is a quick link, in the sense that an elaboration is not done: there are no port/pin checks, etc. Just a simple check that a referenced module is defined, or is defined to be a leaf cell.

If you want to play around with your own designs, you will likely have instantiated memories, standard cells, macros, etc. which you normally would not specify explicitly in a .f file. These are normally added via a library.f for simulation, and .lib/.db files would be used for typical implementation tasks, like synthesis.

So, there needs to be a convenient way to specify which module names should be considered leaves. If you look in the ruby/srcs/analyze.rb file (at the bottom), you will see:

210 #TODO: Redefine the is_leaf method to describe module names
211 # which are not defined in any (user created) verilog contexts.
212 # There are typically leaf/library cells.
213 #
...

That describes how to specify leaf cells for your specific needs; a sketch of such an override follows.
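For example, an override might look like the following. Only the is_leaf name comes from analyze.rb; the signature and the naming patterns are purely illustrative assumptions for your own library conventions:

# Purely illustrative override (assumed signature): treat modules matching
# common library-cell naming conventions as leaves so they do not cause
# link errors.
def is_leaf(module_name)
  case module_name
  when /^(AND|NAND|NOR|INV|BUF|DFF|SDFF)/ then true   # standard cells (assumed prefixes)
  when /^(ram|rom)_\w+$/i                 then true   # memory macros (assumed pattern)
  else false
  end
end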

Have fun with this release, and I'll be back with more soon!