These tutorials illustrate different features of Schemaball. Tutorials are divided into sections. Each section covers a particular aspect of a feature or set of features. Each section has its own documentation, configuration file and schema ball image.
sbtut_02 - extracting and drawing constraint relationships
<linkrule fieldregex> fieldrx = ^(\w+)_([a-z]+)$ use = yes avoidself = yes <fields> table = 0 field = 1 </fields> </linkrule>
<link> show_links = yes color = cccccc stroke = 2 anchor = edge type = bezier </link>
<bezier> resolution = 100 radius = 0.1 </bezier>
Schemaball can parse constraint relationships in a number of ways. It understands ``CONSTRAINT'' table options, it can parse field names to generate constraints and read constraints from external flat files. In this tutorial, we'll use the Ensembl human genome database to show how foreign key status can be parsed from field names.
MySQL does not supports foreign keys for MyISAM tables as of version 4.0. Consequently, application logic is required to maintain constraints and database integrity. To aid in this process, developers name foreign key fields in a distinct manner, so that these fields may be easily recognized.
In the Ensembl schema, foreign keys are named with the convention TABLENAME_FIELD, where TABLENAME is the name of the target table and FIELD is the field name.
To instruct Schemaball how to parse constraints from field names, a <linkrule> block is used with a 'fieldrx' parameter. This parameter defines the regular expression applied to the field name. Cpture brackets delineate the target table and target field and the ordinality of these brackets is stored in the <fields> subblock.
For example, the table density_type contains these fields:
density_type_id int(11) NOT NULL auto_increment, analysis_id int(11) NOT NULL default '0', block_size int(11) NOT NULL default '0', value_type enum('sum','ratio') NOT NULL default 'sum',
The analysis_id is a foreign key which points to the table analysis and the field analysis_id. The regular expression to parse this naming convention is therefore ``^([a-z_]+)_([a-z])+$''. The first capture bracket, which defines the target table, includes a ``_'' in the character class because tables may contain this character. Schemaball will perform a sanity check to make sure that the target field exists within the target table. In this specific case, since analysis.id does not exist, a check for analysis.analysis_id will be made.
The appropriate <linkrule> block, which is a named block and may be given an arbitrary name, is <linkrule fieldregex> fieldrx = ^(\w+)_([a-z]+)$ use = yes avoidself = yes <fields> table = 0 field = 1 </fields> </linkrule>
The 'avoidself' parameter removes any foreign keys which appear to be self-referencing.
In addition to the <linkrule> block, a <link> subblock is used to format the constraint lines.
<link> show_links = yes color = cccccc stroke = 2 anchor = edge type = bezier </link>
The 'anchor' determines whether the line will joint the table glyph at its edge, closest to the center of the circle, or at its center. The 'type' determines whether the constraint line is a straight 'line' or a 'bezier' curve. If 'bezier' is used, a <bezier> block is required to define the third point, as well as the rate at which the bezier curve is sampled. The third point is always placed so that its angular component bisects the angle formed by the start and end of the curve. The radial component is specified by 'radius', relative to the radius of the circle.
<bezier> resolution = 100 radius = 0.1 </bezier>
To parse the schema and produce the schema diagram,
> schemaball -conf tutorial/02/sbtut_022.conf -debug > parsed.txt
The diagram is included as sbtut_02.png.
Schemaball generates useful information about the schema when the -debug flag is used.
Now that the constraint relationships can be extracted from the schema, the ``schemadump'' lines in Schemaball's output contain these constraints.
> grep schemadump parsed.txt ... debug schemadump table coord_system debug schemadump table density_feature debug schemadump constraint density_feature density_type debug schemadump constraint density_feature seq_region debug schemadump table density_type debug schemadump constraint density_type analysis ...
The constraint lines list the source table first, and then the target table.
Please report errors and suggestions to martink@bcgsc.ca
Martin Krzywinski, mkweb.bcgsc.ca, martink@bcgsc.ca
$Id: schemaball,v 1.5 2003/10/27 20:25:11 martink Exp $
Martin Krzywinski Genome Sciences Centre Vancouver BC Canada www.bcgsc.ca mkweb.bcgsc.ca martink@bcgsc.ca