.ig >>
<STYLE TYPE="text/css">
<!--
        A:link{text-decoration:none}
        A:visited{text-decoration:none}
        A:active{text-decoration:none}
        OL,UL,P,BODY,TD,TR,TH,FORM { font-family: arial,helvetica,sans-serif; font-size:small; color: #333333; }

        H1 { font-size: x-large; font-family: arial,helvetica,sans-serif; }
        H2 { font-size: large; font-family: arial,helvetica,sans-serif; }
        H3 { font-size: medium; font-family: arial,helvetica,sans-serif; }
        H4 { font-size: small; font-family: arial,helvetica,sans-serif; }
-->
</STYLE>
<title>shsql: indexes</title>
<body bgcolor=99cc99 vlink=0000FF>
<br>
<br>
<center>
<table cellpadding=2 bgcolor=FFFFFF width=550 ><tr>
<td align=right><a href="shsql_home.html">
<img src="img/shsql.gif" border=0><br><small>SQL database system</small></a> &nbsp; </td></tr>
<tr><td>
.>>

.TH indexes TDH "09-FEB-2005   TDH scg@jax.org" 

.SH Indexes
Indexes may be used to speed up \fCSELECT\fRs, \fCUPDATE\fRs, and \fCDELETE\fRs
on larger tables or
.ig >>
<a href="ordfiles.html">
.>>
\0ordinary files.
.ig >>
</a>
.>>
\fBshsql\fR uses ISAM (indexed sequential) indexes, generally of two or three tiers
depending on table size.
Indexes are not updated automatically as the data changes;
instead they must be
.ig >>
<a href="maint.html">
.>>
\0rebuilt from time to time.
.ig >>
</a>
.>>

.ig >>
<br><br>
.>>
.SH Creating and removing
Indexes are created using the 
.ig >>
<a href="create.html#index">
.>>
\0CREATE INDEX command
.ig >>
</a>
.>>
and removed using the
.ig >>
<a href="drop.html">
.>>
\0DROP INDEX command.
.ig >>
</a>
.>>
You can also effectively remove indexes by removing the relevant files in
the \fCindexes\fR subdirectory of your project directory.

.ig >>
<br><br>
.>>
.SH Getting information about indexes
You can see which fields in a table have indexes (as well as attributes of the indexes) by using the
.ig >>
<a href="tabdef.1.html">
.>>
\0tabdef(1) command
.ig >>
</a>
.>>
from the unix command line.
Alternatively, look in
the \fCindexes\fR subdirectory of your project directory.
All files there are readable ASCII, with header information in the first line.

.ig >>
<br><br>
.>>

.SH Indexes and the WHERE clause
When a \fCSELECT\fR, \fCUPDATE\fR, or \fCDELETE\fR command is issued, the \fCWHERE\fR clause
is examined to determine whether an index can be used to speed up the operation.
.LP
\fBSingle conditionals:\fR
Indexed access will occur if \fBall of these are true\fR for the comparison condition:
.IP \(bu
it has a left operand that is a fieldname for which an index exists
.IP \(bu
it uses a comparison operator that is one of: \fC =  LIKE  >  >=  <  <=  IN  INLIKE  INRANGE  OUTRANGE  CONTAINS\fR
.IP \(bu
it has a right operand that is a constant and does not begin with a wild card character.
(With \fCINLIKE\fR, none of the list members may begin with a wild card.
With \fCCONTAINS\fR, wild card characters are treated as punctuation and ignored, so this 
restriction does not apply.)
.LP
\fBCompound conditionals:\fR
When several individual comparisons are connected by \fBAND\fR,
only the leftmost comparison will be used for indexing,
subject to the above rules.
.LP
When several conditionals are connected by \fBOR\fR,
the leftmost comparison in each \fCOR\fR term will be used for indexing, 
subject to the above rules.  If the leftmost \fCOR\fR term is eligible
for indexing, all \fCOR\fR terms are expected to be eligible for indexing;
if one turns out not to be eligible, an error is issued 
(this is not ideal and 
may be changed in a future release).
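.LP
The three rules for a single conditional can be sketched as a small shell function.
This is only an illustration of the rules as stated above, not how \fBshsql\fR is implemented:
the function name \fCindex_eligible\fR is hypothetical, and the characters "*" and "?" are
assumed here to stand in for the wild card characters.
.nf
```shell
# Hypothetical helper (not part of shsql): decide whether one comparison
# could use an index, following the three rules listed above.
# Assumption: "*" and "?" stand in for the wild card characters.
index_eligible() {
    has_index=$1   # "yes" if the left operand is an indexed field name
    op=$2          # comparison operator, written in upper case
    const=$3       # right operand constant (one list member for INLIKE)
    [ "$has_index" = yes ] || return 1
    case $op in
        =|LIKE|">"|">="|"<"|"<="|IN|INLIKE|INRANGE|OUTRANGE) ;;
        CONTAINS) return 0 ;;   # wild cards are ignored with CONTAINS
        *) return 1 ;;          # any other operator: not eligible
    esac
    case $const in
        [*?]*) return 1 ;;      # constant begins with a wild card
    esac
    return 0
}
```
.fi
For an \fCINLIKE\fR list the check would need to be repeated for every list member,
since none of them may begin with a wild card.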

.ig >>
<br><br><br>
.>>
.SH Table scans
Queries not eligible for indexed access will result in a
"table scan", meaning that all records in the table are examined.  For smaller tables this is 
not a problem, but for larger ones performance may be poor.
You can prohibit table scans on any table for which an index exists by setting
.ig >>
<a href="config.html#dbmustindex">
.>>
\0dbmustindex
.ig >>
</a>
.>>
in your project config file.


.ig >>
<br><br><br>
.>>

.SH Alpha vs. Numeric
Indexing uses alphanumeric comparison unless the index is created with
.ig >>
<a href="create.html#index">
.>>
\0ORDER = NUMERIC
.ig >>
</a>
.>>
in which case numeric comparison will be used.
\fCORDER = NUMERIC\fR is required for correct results when 
\fCWHERE\fR clause comparisons use the numeric comparison operators
\fC>\fR, \fC>=\fR, \fC<\fR, \fC<=\fR, \fCINRANGE\fR or \fCOUTRANGE\fR.
Non-numerics and \fCNULL\fR can be present in numeric fields, and indexing
may be used to access such values.
.LP
Note: for integer serial number fields, alpha is usually a better choice 
than numeric, since numeric magnitude comparison is usually not needed,
but operations involving \fCIN\fR (etc) are often useful.
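.LP
The difference between the two orderings can be seen with unix \fCsort\fR.
The sketch below is an analogy only: the function names and sample values are made up,
and it does not show how \fBshsql\fR compares index tags internally.
.nf
```shell
# Compare alphanumeric and numeric ordering of the same values.
# Function names and sample values are made up for illustration.
alpha_order() {
    for v in "$@"; do echo "$v"; done | LC_ALL=C sort
}
numeric_order() {
    for v in "$@"; do echo "$v"; done | LC_ALL=C sort -n
}

alpha_order 9 10 100 25     # prints, one per line: 10 100 25 9
numeric_order 9 10 100 25   # prints, one per line: 9 10 25 100
```
.fi
In alphanumeric order "10" and "100" come before "9", which is why magnitude
comparisons on such a field give unexpected results.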


.ig >>
<br><br><br>
.>>
.ig >>
<a name=indextype></a>
.>>

.SH Available index types

.LP
\fBstandard index\fR
.IP \0
A \fBstandard\fR index is the default type and is used in most cases.
A two-tier or three-tier ISAM index will be built depending on table size.

.ig >>
<br><br><br>
.>>

.LP
\fBdirect\fR 
.IP \0
A \fBdirect index\fR is useful with data files that will seldom or never be updated
using shsql \fCINSERT\fR or \fCUPDATE\fR. 
Direct indexes are faster and use less disk space
than a standard index.  
However, you must sort the data file yourself before building or rebuilding a direct index.
.IP
If the field to be direct-indexed is alphanumeric and could contain a mixture of upper and lower case
values, the data file must be sorted on that field without regard to case (\fCsort -f\fR does this; e.g. it yields the order abc, Abd, aBe, Abf).
If you're using \fCORDER = NUMERIC\fR then the data file must be sorted on that field in 
numeric order (\fCsort -n\fR).  
Remember that \fBthe field name header must be put back at the top of the file\fR after sorting;
this can be done in a text editor.
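.IP
As an alternative to restoring the header by hand, the sort can be done so that the header
line never moves. The following is a minimal sketch, assuming that the first line of the
data file is the field name header and that fields are separated by whitespace;
the function name \fCsort_keep_header\fR is hypothetical.
.nf
```shell
# Sort a data file on one field while keeping the header line on top.
# Assumptions: the first line is the field-name header and fields are
# separated by whitespace. The function name is hypothetical.
sort_keep_header() {
    file=$1      # data file to sort
    field=$2     # 1-based position of the field to be direct-indexed
    shift 2      # remaining arguments go to sort, e.g. -f or -n
    head -n 1 "$file"
    tail -n +2 "$file" | LC_ALL=C sort "$@" -k "$field,$field"
}
```
.fi
For example, \fCsort_keep_header dictionary 1 -f > dictionary.sorted\fR would write a copy sorted
on the first field without regard to case; the file name and field position here are placeholders.
Write to a new file and check the result before replacing the original.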
.IP 
Any \fCWHERE\fR clause comparison that works with a standard-indexed field can also be used with a direct-indexed field.
Data files that have a direct index may be updated by shsql INSERT or UPDATE, but
the table must be manually sorted again into the correct order before doing an
.ig >>
<a href="maint.html">
.>>
\0index rebuild
.ig >>
</a>
.>>
(otherwise subsequent retrievals will not work properly).
.IP
To create a direct index use an SQL command like this: 
\fCcreate index type=direct on dictionary ( term )\fR

.ig >>
<br><br><br>
.>>

.LP
\fBword\fR 
.IP \0
A \fBword index\fR is useful when searching fields that contain titles, descriptions, or
lists of values.  
Each word gets its own entry.  
\fBword\fR indexes are often used together with multiword search queries that use \fCCONTAINS\fR.
Described in more detail
.ig >>
<a href="multiword.html#wordindex">
.>>
\0here.
.ig >>
</a>
.>>

.ig >>
<br><br><br>
.>>
.LP
\fBcombinedword\fR 
.IP \0
A \fBcombinedword index\fR is similar to a \fBword\fR index, but it takes values from several
database fields to build the index, instead of just one, for better search efficiency
when several fields will frequently be searched together, as is often the case with search engine
applications.
Described in more detail
.ig >>
<a href="multiword.html#combinedwordindex">
.>>
\0here.
.ig >>
</a>
.>>

.ig >>
<br><br><br>
.>>
.LP
\fBcombined\fR 
.IP \0
A \fBcombined index\fR is the same as a \fBcombinedword index\fR, except that database fields
are not parsed into multiple words, but each is taken as a whole.  
The same rules mentioned
.ig >>
<a href="multiword.html#combinedwordindex">
.>>
\0here
.ig >>
</a>
.>>
apply.  (New in version 1.27)
.IP
Example of where this is useful:
an application that searches 3 fields, each of which contains
a single identifier token, but where some of the identifier fields contain
punctuation characters.  A \fBcombinedword\fR index would break these identifiers up into multiple "words",
which is incorrect.  A \fBcombined\fR index allows one index to cover all 3 of these fields
without any word parsing.  

.ig >>
<br><br><br>
.>>

.SH Notes
.LP
Retrievals that use an index will be ordered on the indexed field by default.
.LP
\fCDISTINCT\fR is automatically in effect on index-eligible \fCSELECT\fR retrievals when:
.IP \(bu
\fCOR\fR is present
.IP \(bu
\fCCONTAINS\fR is present
.IP \(bu
a \fBdirect\fR index is involved and any list-based comparison operators (such as \fCIN\fR
or \fCINLIKE\fR) are present
.LP
This is done to avoid unwanted duplication in the result row set as a consequence of the iterative 
method that \fBshsql\fR uses to retrieve rows in such situations.
.LP
If duplicate list members are specified in an \fCIN\fR or \fCINLIKE\fR expression 
(for example when expressions are generated dynamically), 
\fCSELECT DISTINCT\fR should be used in order to eliminate duplication in the 
result row set.
.LP
Alphanumeric index tags are truncated to a certain length, by default 15 characters.
This can be raised in 
.ig >>
<a href="config.html">
.>>
\0your config file.
.ig >>
</a>
.>>
.LP
Indexes are implemented as tabular ascii files located in the \fCindexes\fR directory.
.LP
Index building is actually done by buildix(1) which in turn invokes unix \fCsort\fR.
\fCsort\fR is invoked such that alphanumerics will be sorted in ascii order (case sensitive).
.LP
A list of table fields that have indexes is maintained by \fBshsql\fR in files called
\fItablename\fC.\fIfieldname\fC.0\fR

.ig >>
<br>
<br>
</td></tr>
<tr><td align=right>
<a href="shsql_home.html">
<img src="img/shsql.gif" border=0></a><br>
<a href="Copyright.html">Copyright Steve Grubb</a> &nbsp;
</td></tr>
</table>
.>>
