Administration: Omnidex Indexing

Index Creation

Performance

Omnidex puts a premium on performance. Applications often use hundreds of Omnidex indexes, so it is important that all of them can be built quickly. Omnidex also puts a premium on index size, using compression algorithms to make the indexes as small as possible while still allowing fast access. Smaller indexes can be retrieved from disk quickly and keep memory usage to a minimum, which in turn improves performance by reducing I/O and maximizing caching.

To aid performance, administrators should keep the Omnidex indexes on disk drives that are regularly defragmented and have plenty of free space. It is also valuable to isolate Omnidex indexes on a separate disk drive, and even a separate disk controller, from the data files and the temporary file directory. Ideally, each of these three kinds of file would have its own drive and its own disk controller to minimize disk contention.

For large databases, administrators should also ensure that indexes are not loaded while queries are being performed. The two operations compete with each other, so both run quite slowly. Administrators should either load indexes on a separate server or avoid running queries while the indexes are being loaded.

Predicting Indexing Time

It is possible to estimate the time required to index a table. The result is only approximate, because many variables affect indexing speed.

The following formula provides a rough estimate of the time needed to index a table:

For each table ...
  For that table, plus any prejoined tables ...

    (Number of rows) x (Number of keywords per row)
    -----------------------------------------------
            1 billion keywords per hour

    Number of rows:              the number of rows in the table
    Number of keywords per row:  the number of parsed words or unparsed values per row

On commonly available hardware, Omnidex has been seen to index as fast as 5 billion keywords per hour and as slow as 100 million keywords per hour. After a few tests, you should be able to substitute a rate that is better tailored to your environment.
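
The estimate can be worked through in a few lines of Python. This is a minimal sketch, not part of Omnidex: the names TableStats, estimate_hours and measured_rate are hypothetical, and the row and keyword counts are made-up sample values. It assumes that the total keyword count for a table is its number of rows multiplied by its keywords per row, summed over the table and any prejoined tables.

```python
from dataclasses import dataclass

# Reference rates quoted in the text, in keywords per hour.
DEFAULT_RATE = 1_000_000_000   # baseline used by the formula
FASTEST_RATE = 5_000_000_000   # fastest rate reported
SLOWEST_RATE = 100_000_000     # slowest rate reported


@dataclass
class TableStats:
    """Hypothetical container for the two inputs the formula needs."""
    rows: int
    keywords_per_row: float


def estimate_hours(tables, rate=DEFAULT_RATE):
    """Estimate hours to index a table together with its prejoined tables.

    Assumption: total keywords = rows * keywords_per_row, summed over tables.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    total_keywords = sum(t.rows * t.keywords_per_row for t in tables)
    return total_keywords / rate


def measured_rate(keywords_indexed, elapsed_hours):
    """Turn a timed test load into a keywords-per-hour rate."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed_hours must be positive")
    return keywords_indexed / elapsed_hours


if __name__ == "__main__":
    # Sample values only: one main table and one prejoined table.
    tables = [TableStats(50_000_000, 40), TableStats(2_000_000, 10)]
    for label, rate in (("baseline", DEFAULT_RATE),
                        ("fastest", FASTEST_RATE),
                        ("slowest", SLOWEST_RATE)):
        print(f"{label}: {estimate_hours(tables, rate):.2f} hours")
```

Once a test load has been timed, passing the result of measured_rate as the rate argument replaces the generic baseline with a figure from your own environment.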
