JPGD Manual

Alexander Merz
$Id: doc.html,v 1.2 2006/03/27 00:33:35 Alexander Exp $

Introduction

The Java-based Parser for Graphviz Documents was created to read definitions of graph structures and their attributes stored in the file format used by the Graphviz Tool collection.

The parser reads a Graphviz document from a reader object and creates an easy-to-use data structure containing the definition of graphs, their nodes, clusters/subgraphs and edges, all including any given render attributes.

The parser is pure Java and built using JavaCC. All necessary Java class files are included in the Jar file that is shipped with the JPGD distribution.

JPGD is licensed under the LGPL.

Installation

You can obtain the latest release of JPGD from http://www.alexander-merz.com/graphviz/.

After downloading the archive, unpack it. The binaries are packed in a Jar file. Put this Jar file in a location where the Java interpreter can find it, for example on the class path.

The rest of the files in the archive are the source code of the library and documentation.

Invoking the parser

Only two methods of the class com.alexmerz.graphviz.Parser are needed to parse a Graphviz document: parse() parses the document, and getGraphs() returns a list of Graph objects holding the graph data.

parse() expects a java.io.Reader or StringBuffer object to read from. Please note: the beta version of JPGD let you pass an InputStream or a Reader object to the constructor of the Parser class. This will not work; pass the input to parse() instead.

Because a Graphviz document may contain more than one graph definition, getGraphs() returns a java.util.ArrayList. Each entry in the list is a com.alexmerz.graphviz.objects.Graph object. This class is described in the next section.

A simple example:


import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;

import com.alexmerz.graphviz.ParseException;
import com.alexmerz.graphviz.Parser;
import com.alexmerz.graphviz.objects.Edge;
import com.alexmerz.graphviz.objects.Graph;
import com.alexmerz.graphviz.objects.Id;
import com.alexmerz.graphviz.objects.Node;

public class Example1 {
    public static void main(String[] args) {
        // path of the Graphviz document, passed as first command line argument
        File f = new File(args[0]);
        // declared outside the try block so it is still visible afterwards
        Parser p = new Parser();

        try {
            FileReader in = new FileReader(f);
            p.parse(in);
        } catch (FileNotFoundException e) {
            // do something if the file couldn't be found
            return;
        } catch (ParseException e) {
            // do something if the parser caused a parser error
            return;
        }

        // everything ok
        ArrayList<Graph> gl = p.getGraphs();

        // do something with the Graph objects
    }
}

A ParseException is thrown if the document does not match the grammar. The exception message includes the line number and the position in the line.

The data structure

A Graph object holds the nodes, edges and clusters of a graph. These elements are also represented by objects:

Graph - com.alexmerz.graphviz.objects.Graph
Node - com.alexmerz.graphviz.objects.Node
Edge - com.alexmerz.graphviz.objects.Edge
Cluster - com.alexmerz.graphviz.objects.Graph

Two additional classes exist:

com.alexmerz.graphviz.objects.PortNode is a special decorator to Node objects containing a port attribute. A PortNode always refers to an existing Node object. PortNode objects are only used in Edge objects.
Objects of com.alexmerz.graphviz.objects.Id are used to identify Graphs and Nodes. Because the Graphviz format allows labels as identifiers, Graph and Node objects do not use a scalar value as identifier; they use Id objects instead. See also the section about Id objects.

Each of these objects can hold attributes such as render information or formatting hints. You can access them using the getAttribute() method present in each of the classes. The current version of JPGD places no restrictions on attribute names or values; it is up to you to check keywords and values. In future versions these checks will be an additional part of the parser.

Graph/Cluster objects

A Graph object is a container for all objects in the graph.

If a Graph object represents a cluster, it also has a corresponding Node object in the Node list of the parent Graph object. In that case the Graph object and the Node object have the same Id, and the isSubgraph() method of the Node object returns true.

A Graph object can hold generic attributes, which should apply to all nodes, edges or clusters. The values can be fetched via the getGeneric*Attribute() methods.

Although the generic attributes should be applied to all elements, the parser does not set them on the individual elements; it is up to the application to do this. This saves memory and makes parsing faster.
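An application that needs the effective attributes of an element therefore has to merge the generic values itself. The following is a minimal sketch of such a merge and not part of JPGD: plain java.util.Map objects stand in for the generic node attributes of a graph and for the attributes of a single node, and the names AttributeMerge and effective() are made up for this illustration. It assumes that a value set directly on an element should win over the generic value, as it does in Graphviz itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of JPGD.
class AttributeMerge {

    // Returns the generic attributes overlaid with the element's own
    // attributes. Neither input map is modified.
    static Map<String, String> effective(Map<String, String> generic,
                                         Map<String, String> own) {
        Map<String, String> merged = new LinkedHashMap<String, String>(generic);
        merged.putAll(own); // own values replace generic ones with the same name
        return merged;
    }

    public static void main(String[] args) {
        // stands in for the generic node attributes of a graph
        Map<String, String> generic = new LinkedHashMap<String, String>();
        generic.put("shape", "box");
        generic.put("color", "blue");

        // stands in for the attributes given on one node
        Map<String, String> own = new LinkedHashMap<String, String>();
        own.put("color", "red");
        own.put("label", "Node 1");

        System.out.println(effective(generic, own));
    }
}
```

Doing the merge on demand, instead of storing the result in every element, keeps the memory benefit described above.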

Every Graph object contains a list of the Node objects defined in the graph. Nodes defined inside a cluster are not part of the node list of the parent graph. For example:

graph MyGraph {
    node1 [label="Node 1"];
    node2 [label="Node 2"];
    subgraph MyCluster {
        node3 [label="Node 3"];
        node4 [label="Node 4"];
    }
}

The Node list of the Graph object for MyGraph only includes node1 and node2; the Graph object for MyCluster holds node3 and node4. Take care how you call getNodes(): if you call it on MyGraph with true as parameter, you get a list of all nodes, including those from MyCluster. With false as parameter, you only get a list containing node1 and node2.
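The effect of the boolean parameter can be illustrated with a small stand-in model. The class SketchGraph below is not the JPGD Graph class; it is a hypothetical, simplified model that stores node names as strings and leaves out the Node object that represents a cluster itself. It only shows the difference between collecting the direct nodes and collecting the nodes of all nested clusters as well.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for illustration only; this is not the JPGD Graph class.
class SketchGraph {
    final String name;
    final List<String> nodes = new ArrayList<String>();              // nodes defined directly here
    final List<SketchGraph> clusters = new ArrayList<SketchGraph>(); // nested clusters

    SketchGraph(String name) {
        this.name = name;
    }

    // recursive == false: only the nodes defined directly in this graph.
    // recursive == true:  additionally the nodes of all nested clusters.
    List<String> getNodes(boolean recursive) {
        List<String> result = new ArrayList<String>(nodes);
        if (recursive) {
            for (SketchGraph c : clusters) {
                result.addAll(c.getNodes(true));
            }
        }
        return result;
    }

    // Builds the MyGraph/MyCluster example from the text.
    static SketchGraph example() {
        SketchGraph g = new SketchGraph("MyGraph");
        g.nodes.add("node1");
        g.nodes.add("node2");
        SketchGraph c = new SketchGraph("MyCluster");
        c.nodes.add("node3");
        c.nodes.add("node4");
        g.clusters.add(c);
        return g;
    }

    public static void main(String[] args) {
        SketchGraph g = example();
        System.out.println(g.getNodes(false)); // direct nodes only
        System.out.println(g.getNodes(true));  // including cluster nodes
    }
}
```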

Node objects

A node has a unique Id object to identify it and can also hold attributes.

A Node object is also generated to represent a cluster in an Edge object.

Edge objects

An Edge object contains a source node and a target node. Nodes in edge statements are not represented by Node objects; a PortNode object is used instead, because port information for rendering can be added to the node in the statement.

If the source or the target is a cluster, the method isSubgraph() of the underlying Node object of the PortNode object returns true.

List assignments like {node1 node2} -> node3 are resolved into single edge statements, so the example is resolved to node1 -> node3 and node2 -> node3.
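The resolution of a list assignment can be pictured as pairing every source with every target. The sketch below is a hypothetical helper and not JPGD code: it works on plain node names and returns the single edge statements as strings. That a list on the target side is resolved the same way follows Graphviz semantics, but is an assumption as far as JPGD is concerned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of JPGD.
class EdgeListExpansion {

    // Pairs every source with every target and returns the resulting
    // single edge statements as "source -> target" strings.
    static List<String> expand(List<String> sources, List<String> targets) {
        List<String> edges = new ArrayList<String>();
        for (String s : sources) {
            for (String t : targets) {
                edges.add(s + " -> " + t);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // the list assignment from the text: node1 and node2 both point to node3
        List<String> edges = expand(Arrays.asList("node1", "node2"),
                                    Arrays.asList("node3"));
        for (String e : edges) {
            System.out.println(e);
        }
    }
}
```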

The Id object

The Graphviz format allows labels to be used as identifiers for graphs, clusters and nodes, for example:

"node1" -- "node2";
subgraph "cluster1" {...};

To make this easier to handle, the parser creates an Id object holding the identifier and/or a label.

To find out whether two Node objects or clusters are the same, you can use the isSame() method.

Take care with the label handling. Compare these two statements:

node1 [label="Test"];
"Node2";

In the first statement the Id object of the Node object for node1 contains the value "node1" for the identifier, but an empty string for the label value. Instead the label attribute is set and can be fetched via getAttribute().

In the second statement the Id object will have an empty value for the identifier and "Node2" for the label value.

In short: if a label is used as identifier, you must fetch it via getLabel() of the Id object. If the label is set in an attribute list, you must use getAttribute() of the Node or Graph object.
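An application that needs one name per node regardless of how it was written can pick whichever of the two values is set. The class SketchId below is a stand-in and not the JPGD Id class: it simply holds the two strings described above in public fields, and the method displayName() is made up for this illustration. It assumes that an unused value is an empty string, as described for the two statements above.

```java
// Stand-in for illustration only; this is not the JPGD Id class.
class SketchId {
    final String id;    // plain identifier, or "" if a label was used instead
    final String label; // label used as identifier, or "" if a plain identifier was used

    SketchId(String id, String label) {
        this.id = id;
        this.label = label;
    }

    // Returns whichever of the two values is set, preferring the identifier.
    String displayName() {
        return id.isEmpty() ? label : id;
    }

    public static void main(String[] args) {
        SketchId first = new SketchId("node1", "");  // models: node1 [label="Test"];
        SketchId second = new SketchId("", "Node2"); // models: "Node2";
        System.out.println(first.displayName());
        System.out.println(second.displayName());
    }
}
```

Note that the text "Test" from the first statement does not appear here at all: it is an attribute of the node, not part of its Id.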

The CLI program

The Jar file contains an executable Java class which expects a file name as parameter.

java -jar graphviz.jar MyFile.viz

It prints the content of the Graph structure to the standard output. The output format was designed to allow further processing with standard Unix tools like grep. In case of errors the exception message is printed to the standard error stream and the program exits with a specific error code:

11 - No file name given
12 - File could not be opened
13 - Parser error
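
A calling program can react to these values. The sketch below is a hypothetical wrapper and not part of JPGD. It assumes that the numbers above are the exit codes of the process, that a successful run exits with 0, and that graphviz.jar lies in the current directory; the class name RunJpgdCli is made up for this illustration.

```java
import java.io.IOException;

// Hypothetical wrapper for illustration; not part of JPGD.
class RunJpgdCli {

    // Maps the exit codes listed above to a short message.
    // Treating 0 as success is an assumption.
    static String describe(int exitCode) {
        switch (exitCode) {
            case 0:  return "ok";
            case 11: return "no file name given";
            case 12: return "file could not be opened";
            case 13: return "parser error";
            default: return "unexpected exit code " + exitCode;
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: RunJpgdCli <file>");
            return;
        }
        // Runs the CLI program as shown in the text and waits for it to finish.
        Process p = new ProcessBuilder("java", "-jar", "graphviz.jar", args[0])
                .inheritIO() // pass the tool's standard output and error through
                .start();
        System.err.println(describe(p.waitFor()));
    }
}
```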