Administration: Omnidex Indexing

Index Creation

Performance

Omnidex puts a premium on performance. Applications often use hundreds of Omnidex indexes, so it is important that all of these indexes can be built quickly. Omnidex also puts a premium on index size. It uses compression algorithms to make the indexes as small as possible while still allowing fast access. Smaller indexes can be read from disk more quickly, and they keep memory usage to a minimum. This in turn improves performance by reducing I/O and allowing more of the indexes to be cached.

To aid performance, administrators should keep the Omnidex indexes on disk drives that are regularly defragmented and have plenty of free space. It is also valuable to place the Omnidex indexes on a separate disk drive, and even a separate disk controller, from the data files and the temporary file directory. Ideally, each of these three kinds of files would have its own drive and its own disk controller to minimize disk contention.
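One way to check this layout is to look at which filesystem each directory lives on. The Python sketch below is a hypothetical helper, not part of Omnidex: it reports the device ID (`os.stat().st_dev`) and free space of each directory you pass in, and lists pairs of directories that share a device. A shared device ID means two directories are on the same filesystem; different device IDs do not by themselves prove separate physical drives or controllers (two partitions of one disk also differ), so treat the result as a first pass only.

```python
import os
import shutil


def check_layout(paths: dict) -> dict:
    """Report device ID and free space for each directory.

    `paths` maps a role (e.g. "data", "indexes", "temp") to a directory path.
    """
    report = {}
    for role, path in paths.items():
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        report[role] = {
            "device": os.stat(path).st_dev,  # filesystem/device identifier
            "free_gb": usage.free / 1e9,
            "free_pct": 100.0 * usage.free / usage.total,
        }
    return report


def shared_devices(report: dict) -> list:
    """Return pairs of roles whose directories are on the same device."""
    roles = sorted(report)
    return [
        (a, b)
        for i, a in enumerate(roles)
        for b in roles[i + 1:]
        if report[a]["device"] == report[b]["device"]
    ]
```

For example, `shared_devices(check_layout({"data": "/data", "indexes": "/omnidex/idx", "temp": "/tmp"}))` would flag any pair of roles that share a filesystem; those paths are placeholders for your own directories.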

For large databases, administrators should also ensure that indexes are not loaded while queries are being performed. The two operations compete for the same resources, so both run much more slowly. Administrators should either load indexes on a separate server or avoid running queries while the indexes are being loaded.

Predicting Indexing Time

It is possible to get a rough estimate of the time required to index a table. It is only a rough estimate, because there are many variables. Chief among them are:

  • The speed of the server. Omnidex indexing benefits from fast CPUs and fast I/O channels.
  • The load on the CPU. Omnidex benefits from having unrestricted access to one core for each table being indexed concurrently, plus one unrestricted core available to the operating system for managing file caching.
  • The load on the I/O channel. Omnidex benefits from accessing the data files, the index files and the temporary directory over separate, uncongested I/O channels.
  • The speed of the underlying database. Omnidex depends on access to the underlying data. Raw data files usually provide faster performance than even a well-tuned relational database. Relational databases that are accessed remotely, such as through ODBC or a proprietary interface like Oracle's SQL*Net, perform noticeably slower. This impact is most noticeable on tables with very few indexes, where reading the data makes up a larger share of the total indexing time.
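The CPU guideline above reduces to simple arithmetic: with N cores, leave one for the operating system and index at most N - 1 tables at once. The Python sketch below uses a hypothetical helper name. Note that `os.cpu_count()` reports logical CPUs, which on hyperthreaded machines exceeds the number of physical cores, so you may prefer to pass the physical core count explicitly.

```python
import os
from typing import Optional


def max_concurrent_tables(cores: Optional[int] = None) -> int:
    """How many tables to index at once under the one-core-per-table rule.

    Reserves one core for the operating system's file caching. Defaults to
    os.cpu_count(), which counts logical CPUs, not physical cores.
    """
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() can return None
    # Never go below one job; on a single-core machine the OS core cannot
    # actually be reserved, so indexing and caching will share it.
    return max(cores - 1, 1)
```

On an 8-core server, `max_concurrent_tables(8)` gives 7 concurrent tables.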

The following formula provides a rough estimate of the time needed to index a table:

For each table ...
  For that table, plus any prejoined tables ...

    (Number of rows) × (Number of keywords per row)
    -----------------------------------------------
            1 billion keywords per hour

    Number of rows:              the number of rows in the table
    Number of keywords per row:  the number of parsed words or unparsed values per row
    Result:                      the estimated indexing time, in hours
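A small calculator makes the formula easier to apply. Because the rate is expressed in keywords per hour, the numerator is the total number of keywords, so each entry contributes its row count multiplied by its keywords per row. This Python sketch sums those contributions for a table and whatever prejoined tables you list, then divides by the rate. The `TableLoad` class and `estimate_hours` function are illustrative names, and how you count rows and keywords for a prejoined table is left to you.

```python
from dataclasses import dataclass

DEFAULT_RATE = 1_000_000_000  # keywords per hour, the baseline in the formula


@dataclass
class TableLoad:
    name: str
    rows: int
    keywords_per_row: float  # parsed words or unparsed values per row


def estimate_hours(tables: list, rate: float = DEFAULT_RATE) -> float:
    """Rough indexing time in hours for a table plus its prejoined tables.

    Each entry contributes rows * keywords_per_row keywords; the total
    keyword count is divided by the indexing rate (keywords per hour).
    """
    total_keywords = sum(t.rows * t.keywords_per_row for t in tables)
    return total_keywords / rate
```

For example, a table of 50 million rows with 20 keywords per row holds 1 billion keywords, so at the default rate `estimate_hours([TableLoad("orders", 50_000_000, 20)])` returns `1.0` hour (the table name is hypothetical).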

On commonly available hardware, Omnidex has been seen to index as many as 5 billion keywords per hour and as few as 100 million keywords per hour. After a few tests, you should be able to substitute a rate that is better tailored to your environment.
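To tailor the rate, time a few test loads and divide the keywords indexed by the elapsed hours. The Python sketch below (hypothetical helper names) pools several test runs into a single rate by dividing total keywords by total hours, which weights long runs more heavily than averaging the per-run rates would. Substitute the result for 1 billion keywords per hour in the formula.

```python
def measured_rate(keywords_indexed: float, elapsed_hours: float) -> float:
    """Keywords per hour observed in one test run."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed_hours must be positive")
    return keywords_indexed / elapsed_hours


def calibrated_rate(runs: list) -> float:
    """Combine several (keywords, hours) test runs into one rate.

    Pools the runs (total keywords / total hours) rather than averaging
    per-run rates, so longer runs carry more weight than short ones.
    """
    total_keywords = sum(k for k, _ in runs)
    total_hours = sum(h for _, h in runs)
    return measured_rate(total_keywords, total_hours)
```

For example, a test that indexed 300 million keywords in 30 minutes gives `measured_rate(3e8, 0.5)`, or 600 million keywords per hour, which falls inside the range quoted above.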

Additional Resources

See also:

 
admin/indexing/creation/performance.txt · Last modified: 2016/06/28 22:38 (external edit)