Schemaball is an SQL database schema viewer. It requires Perl and a few modules, such as GD and optionally Math::Bezier and SQL::Translator. Schemaball creates flexible visualizations of database schemas. Schemas may be read from an SQL schema dump, flat file or live database.

schemaball

home about screenshots requirements

documentation

tour tutorials README CHANGES man page

download

Perl code

Schemaball man page



NAME

schemaball - visualize an SQL database schema in a circular composition


SYNOPSIS

  # all parameters read from default configuration file
  schemaball [-debug [-debug] ]
  # read parameters from specific configuration file
  schemaball -conf etc/schemaball.conf 
  # override database connection settings
  schemaball [-database DATABASE] [-host HOST] [-user USER] [-password PASSWORD]
  # dump configuration only
  schemaball -dumpconfig
  # read documentation
  schemaball -man


DESCRIPTION

Schemaball generates high-level database schema diagrams. The schema diagram is formatted along a circular track. Tables are represented by glyphs along the circumference and relationships between tables (constraints) are depicted by lines that join the table glyphs. Schemaball can follow these constraints from table to table and automatically annotate the image to help you identify tables participating in a constraint chain. Diagram elements can be flexibly hidden or highlighted to draw attention to specific parts of the schema.

Schemaball is primarily designed to help visualize the relationships between tables, rather than table structure. Currently, the schema diagrams do not indicate which field in a table participates in a constraint relationship and details of internal table structure are not shown.

The database structure can be obtained by connecting to a live server or by reading SQL schema dumps or flat files. The properties of the schema diagram are manipulated by setting parameter values stored in a configuration file.

Up-to-date documentation, code and FAQ can be viewed at http://mkweb.bcgsc.ca/schemaball.


Looking at Database Schemas

When database schemas are small, generating effective illustrations is very easy. Diagrams can be drawn manually without much effort. As the number of tables and constraints grows, it becomes more difficult and time consuming to depict the schema in a compact and informative manner. The schema diagram becomes very complex and automated drawing tools have difficulty in optimizing the placement and routing of the glyphs and lines in the schema diagram. In a highly connected schema, it may not be possible to draw the table glyphs in clusters to limit the total length of constraint lines. As a result, constraint paths are long and tracing them is awkward.

One way to limit the complexity inherent in the routing problem of the constraint lines is to use a pre-determined layout for the table glyphs. By placing these along a circle, constraint lines are easy to generate and follow. The circular schema diagram scales well with schema size - the diagram remains compact and light on the eyes.

Database schemas, especially for large databases, are usually examined in a piece-meal fashion. You don't look at the entire schema all at once - you examine groups of tables and constraints which form a functional group. By decomposing the schema in this manner, the data, objects and interactions modeled by the database can be more easily understood. Schemaball is ideally suited for such decomposition - the colour scheme of arbitrary elements (tables or constraints) can be adjusted. The colour scheme can be propagated along constraints, thereby generating highlighted paths through the schema.


Applications

Schemaball generates static images. Currently only bitmap images (PNG) can be generated. However, coordinates of each element can be obtained and may be used to generated vector images (e.g. SVG). The properties of the schema diagrams can be tuned flexibly making publication-quality output possible.


DATA ELEMENTS

To draw the schema diagram, a list of tables and constraints is required. In diagrams generated by Schemaball, internal table structure is not displayed. Tables and constraints may be loaded from (a) live database (b) SQL schema dump, or (c) list of table or constraint names.


Database Support

Schemaball needs to know how to communicate with your database to be able to

To connect to a live database, the <db> block in the configuration file stores the server name, database name and user authentication tokens. Interface to the database is provided by DBI and the DBI driver is specified in 'dbtype' in the <sqldb> block. This block additionally defines all the SQL commands which are used to poll database structure. The <sql> block is described further on.

In addition to polling a live database, Schemaball can collect schema information from an SQL schema dump or a flat file which lists the table names. The <db> block is used to specify the data source.

  • connecting to a live server

    To connect to a database server, the following need to be specified in the <db> block.

      <db>
      database = sequence
      host = seqdb01
      user = viewer
      password = viewer
      </db>
  • parsing SQL dumps

    The 'sqlfile' variable stores the filename of the schema dump.

      <db>
      sqlfile = schemas/hs.sql
      </db>
  • table lists

    If your table names are in a list, use 'tablefile'.

      <db>
      tablefile = tables.txt
      </db>

    Flat files are useful in cases when you'd like to draw a schema diagram but you don't want to create a database or generate the appropriate SQL commands to create tables.


Defining SQL commands

Named <sql> blocks store SQL commands. You may need to adjust the value of the 'sql' variable in these commands to make them compatible with your server, if you are not using MySQL. The name of the block should not be changed. Each SQL command is defined in the following way

  <sql NAME>
    sql = SQLCOMMAND
    colnum = COLNUM
  </sql>

where NAME is a unique identifier, SQLCOMMAND is the SQL query and COLNUM is the column in the query result that contains the desired information. The value of COLNUM depends on what the query is designed to achieve.

The SQL command may be applied to specific databases and tables by using the strings _DB_ and _TABLE_ in the query string. These strings will be substituted with the current database and table name. The following <sql> blocks are defined by default and are required to extract the required schema information.

  • listtables

    The listtables SQL command is expected to provide a list of tables in the database. In MySQL, the appropriate SQL query is ``SHOW TABLES''.

      <sql listtables>
      sql = "show tables"
      colnum = 0
      </sql>
  • describetable

    This SQL command generates a list of all fields in a table, along with the field type and other information.

  • tablecreate

    One way to gain access to the list of constraints for a table is to obtain the SQL statement that creates the table. In MySQL, this is done using ``SHOW CREATE TABLE tablename''. Schemaball will parse the SQL create command for constraint relationships and identify corresponding tables.

  • tablestatus

    Another way to list constraints is through the table status view. The MySQL query is ``SHOW TABLE STATUS FROM database LIKE tablename''.


Creating a Table List

Schemaball generates its table list from the data source defined in the <db> block. When the <db> block defines a database, the SQL query defined in the <sql listtables> block is used to generate the table list. When the <db> block defines an SQL schema dump, the CREATE TABLE statements in the dump are parsed to extract table names.


Determining Foreign Keys

MySQL does not support foreign keys currently for MyISAM tables (as of v4.0, see http://www.mysql.com/doc/en/ANSI_diff_Foreign_Keys.html). Support for constraints may be implemented by using (a) InnoDB tables which support foreign keys, or (b) application logic. To help automate application logic, foreign key fields may be named using a specific convention, which can be used to work out the target table and field for a given foreign key name. One such convetion is

  FKtype_tablename__field

where type is some arbitrary string to differentiate of the foreign key. This field is specific to an application, which associates different TYPEs with specific business logic. The TYPE makes it possible to have multiple foreign keys pointing to the same field in the same table. The TABLE is the name of the table that the foreign key references and the FIELD is the name of the FIELD in that table. If the canonical name of the referenced field is in the format TABLE_FIELD, then the FIELD string in the FK name does not include the TABLE_ string. For example, if a Person table has a foreign key which references the field Personality_ID in the table Personality, then the name of the key would be FK_Personality__ID (not FK_Personality__Personality_ID).

The <linkrule> block in Schemaball's configuration file specifies how to detect constraint relationships. There are three ways in which constraints may be determined.

  • constraints from SQL

    Constraint relationships can be extracted by polling the database with specific SQL statements. The SQL statement to use is stored in the 'sql' variable. The query result is parsed using a regular expression stored in 'rx'.

      <linkrule fk>
      sql  = tablestatus
      rx   = REFER \w+\/(\w+)
      use  = no
      </linkrule>

    The regular expression uses a capturing bracket to extract the table name to which the constraint refers.

      <linkrule tablecreate>
      sql  = tablecreate
      rx   = REFERENCES `?(\w+)
      use  = no
      </linkrule>
  • constraints from field names

    If you follow a naming convention for your foreign keys, a link rule as shown below should be used.

      <linkrule>
      type = a foreign key
      rx   = .*FK_(\w+)__(\w+)
      <fields>
        table = 0
        field = 1
      </fields>
      </linkrule>

    This link rule defines the regular expression to be applied to the field name. The regular expression contains capturing brackets, which store the table and field name pointed to by the foreign key.

  • constraints from a flat file

    You can specify all or additional constraints using a flat file. Each line specifies a single constrain, which can be in the format table1__table2, where the tables are delimited by __, or ``table1 table2'', where the tables are delimited by a space. In both cases, table1 contains a foreign key pointing to table2.

      <linkrule linkfile>
      file = fk.txt
      use  = yes
      </linkrule>
      ## fk1.txt
      table1__table2
      table2__table3
      ## fk2.txt
      table1 table2
      table2 table3


Making Table and Constraint Dumps

Once Schemaball has analyzed the data source (database, SQL dump, or table list) and compiled a list of the constraints (database, SQL dump, or constraint list), it can generate a dump of what it has parsed. If you toggle debugging (-debug), the dump is reported in lines starting with ``schemadump''. These data can be used to verify that Schemaball has correctly parsed your database structure and to create flat files of table and constraint names.


CONFIGURATION


Syntax

Schemaball gets most of its runtime parameters from an Apache-like configuration file. The Config::General module is required to parse the file. For the syntax of the configuration file, refer to the man page of Config::General.


Configuration File Location

If you do not specify the configuration file, Schemaball checks a few default locations before giving up. The default name of the configuration file is schemaball.conf, or SCRIPTNAME.conf where SCRIPTNAME is the name of the schemaball script. The following paths are checked for the SCRIPTNAME.conf file, in order. Relative paths are relative to the location of the script.


Configuration Blocks

Blocks in the configuration file group parameters that control disparate aspects of the schema diagram. These blocks are

  • <db>

    Data source: (a) server, database name, user and password, (b) SQL schema dump file, or (c) flat file containing table names.

  • <linkrule>

    Named blocks defining rules for parsing constraint relationships from (a) database using SQL queries, (b) field names, or (c) flat file containing constraint names.

  • <image>

    Parent block for schema diagram properties. Contains an <elements> block which stores properties of the following image elements: <link> (constraint lines), <bezier> (bezier curve properties for constraint lines), <label> (text labels for tables), <table> (table glyphs) and <highlight> (constraint tracing).

  • <sqldb>

    Database driver and custom SQL statements. Contains named <sql> blocks which store custom SQL statements.


Image

The image block stores the following entries

  <image>
    file
    image_element_dump
    size
    glyphradius
    annotation
    background
    foreground
    <elements> ... </elements>
  </image>


Image :: Elements :: Link

The link block controls how constraint lines should be drawn.

  <link>
   show_links
   color
   color_hi
   stroke
   stroke_hi
   colour_from
   stroke_from
   foot_fraction
   anchor
   type
   link_to_invisible
   hide_file
   <hide_rx> ... </hide_/rx>
   highlight_file
   <highlight_rx> ... </highlight_rx>
  </link>


Image :: Elements :: Bezier

  <bezier>
   resolution
   radius
  </bezier>

The <bezier> block controls properties of the bezier curves, if this constraint line type is chosen.

  • image :: elements :: bezier :: resolution = INT

    The curve is drawn using piece-wise straight lines. The resolution controls the number of these lines. The higher the number, the more smooth the curve will appear.

  • image :: elements :: bezier :: radius = FLOAT(-1..1)

    The location of the third point that defines the bezier curve. The third point will be half way between the table glyphs, in angular distance, and at a relative radius of 'radius'. The smaller the value of radius, which may be negative, the higher the curvature of the constraint line.


Image :: Elements :: Label

Properties of text and text labels.

  <label>
   font
   size
   margin
  </label>


Image :: Elements :: Table

Table glyph properties.

  <table>
   size
   fill
   fill_hi
   stroke
   outline
   outline_hi
   label
   anonymize
   hide_file
   <hide_rx> ... </hide_rx>
   invisible_hide
   highlight_file
   <highlight_rx> ... </highlight_rx>
  </table>


Image :: Elements :: Highlight

Controls how highlighting is inherited across constraint lines.

  <highlight>
   highlight_by_link
   highlight_by_table
   highlight_by_table_forward
   highlight_by_table_reverse
   highlight_by_iterations
   fade_factor_table
   fade_factor_link
  </highlight>

The inheritance of highlighting helps to follow contraint relationships between tables. For example, by first highlighting a specific table (see table :: highlight_rx)

  <table>
   <highlight_rx>
    mytable
   </highlight_rx>
  </table>

you can ask that all constraint lines originating from and ending at this table are highlighted too.

  highlight_by_table         = yes
  highlight_by_table_forward = yes
  highlight_by_table_reverse = yes
  highlight_by_iterations    = 1

Similarly, if you highlight a constraint line, you can highlight all the tables that participate in the lines

  <link>
   <highlight_rx>
   ^table1__
   <highlight_rx>
  </link>
 
  highlight_by_link = yes
  highlight_by_terations = 1

The effect will be to highlight all tables which participate in a constraint relationship with table1.

The parameter 'highlight_by_link' toggles whether tables should inherit link highlights. Similarly, links inherit table highlights if 'highlight_by_table' is set.

  • image :: elements :: highlight :: highlight_by_link = yes | no

    Toggles whether links are used to highlight tables. In order for tables to become highlighted, some links must be highlighted using the <highlight_rx> block or highlight_file parameter in the <link> block.

  • image :: elements :: highlight :: highlight_by_table = yes | no

    Toggles whether tables are used to highlight links. In order for links to become highlighted, some tables must be highlighted using the <highlight_rx> block or highlight_file parameter in the <table> block.

  • image :: elements :: highlight :: highlight_by_forward = yes | no

    When a table is highlighted, the inheritance of the highlight to its links can be controlled using 'highlight_by_forward' and 'highlight_by_reverse' (see below). To highlight outgoing constraints, use highlight_by_forward. To highlight incoming constraints, use 'highlight_by_reverse'. Both can be set to highlight all constraints that involve a highlighted table.

  • image :: elements :: highlight :: highlight_by_iterations = INT

    The number of iterations of highlight inheritance. Consider the linear schema

      T1 - LINK1 - T2 - LINK2 - T3 - LINK3 

    If we initially highlight T1,

      <table><highlight_rx>T1</highlight_rx></table>
     
    and set the number of iterations to 0, the resulting schema will have only T1 highlighted.
      T1(h) - LINK1 - T2 - LINK2 - T3 - LINK3 ...

    Setting iterations to 1,

      T1(h) - LINK1(h) - T2(h) - LINK2 - T3 - LINK3 

    When two iterations are used,

      T1(h) - LINK1(h) - T2(h) - LINK2(h) - T3(h) - LINK3 

    and so on.

  • image :: elements :: highlight :: fade_factor_table = FLOAT(0..1)
  • image :: elements :: highlight :: fade_factor_link = FLOAT(0..1)

    With each iteration, the colour of the highlights may be diluted to distinguish the inheritance depth. The first highlight is always drawn as specified by color_hi and stroke_hi in the <link> block and outline_hi and fill_hi in the <table> block. Subsequently, the highlight colours are diluted in HSB space using the following recipe

      color_hi(n) = RGB(HSB(color_hi(n-1)) * fade_factor ^ (iteration-1)
                        HSB(color) * (1-fade_factor)^ (iteration-1))

    The size of the fade factor determines how quickly the highlighted colour fades towards the regular colour for the highlighted element. If fade_factor = 1, the colour never fades. If fade_factor = 0.5, the colour fades half-way towards the regular colour (in HSB space). If fade_factor = 0, the highlighted colour immediately turns to the regular colour on the first iteration.


DEBUGGING

Schemaball has two debug levels, specified by the number of -debug command-line parameters. Level 1, set by -debug, will make printdebug() produce output. Level 2, set by -debug -debug, will in addition dump the structure of the $CONF variable.

  • $CONF

    The global $CONF variable holds all runtime information, parsed from the configuration file, as well as other settings. To see the structure of this variable run

      schemaball -dumpconfig


HISTORY

  • v0.11 3 Oct 2004
  • v0.10 19 May 2004

    Database schemas can be parsed from flat files, SQL dumps (using SQL::Translator) or (as before) live servers.

    Table names and constraints are dumped to STDOUT in schemadump lines when -debug is used.

    Added tutorials using the Ensembl (www.ensembl.org) human genome database.

    Fixed HSB dilution bug for greys.

  • v0.02 27 October 2003

    Now using stringFT instead of old stringttf GD API call to make labels (Steve Postma). Fixed label positioning to accomodate proportionately spaced fonts. Bundled minihaha font, since minimono does not have lowercase letter forms.

  • v0.01 26 October 2003

    Initial release.


BUGS

Please report bugs and suggestions to martink@bcgsc.ca.


AUTHOR

Martin Krzywinski, martink@bcgsc.ca

$Id: schemaball,v 1.5 2003/10/27 20:25:11 martink Exp $


CONTACT

  Martin Krzywinski
  Genome Sciences Centre
  Vancouver BC Canada
  www.bcgsc.ca
  martink@bcgsc.ca