admin:indexing:creation:performance [2012/01/26 17:18]
127.0.0.1 external edit
admin:indexing:creation:performance [2016/06/28 22:38] (current)
====== Administration: Omnidex Indexing ======

===== Index Creation =====

[[admin:indexing:creation:home|Overview]] |
[[admin:indexing:creation:files|Index Files]] |
**[[admin:indexing:creation:performance|Performance]]**
----
==== Performance ====
  
Omnidex puts a premium on performance.  Applications often use hundreds of Omnidex indexes, so it is important that all of these indexes can be built quickly.  Omnidex also puts a premium on index size.  Omnidex uses compression algorithms to make the indexes as small as possible while still allowing fast access.  Small indexes can be read quickly from disk and keep memory usage to a minimum, which in turn improves performance by ensuring the fastest I/O and the most effective caching.
  
To aid performance, administrators should keep the Omnidex indexes on disk drives that are regularly defragmented and have plenty of free space.  It is also valuable to place the Omnidex indexes on a separate disk drive, and even a separate disk controller, from the data files and the temporary file directory.  Ideally, each of these three kinds of file would have its own drive and its own disk controller to minimize disk contention.
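
One quick way to check whether two of these locations share a device is to compare the ''st_dev'' value that Python's ''os.stat'' reports for each directory.  The following is a minimal sketch, assuming the three directories are passed in as a dictionary of labels to paths; the function name ''shared_devices'' is hypothetical and is not part of Omnidex.  Note that ''st_dev'' identifies a file system device, so it cannot detect a shared disk controller, and two partitions on the same physical drive will still report different values.

<code python>
import os


def shared_devices(paths):
    """Group labelled directories that live on the same device.

    paths maps a label (for example "indexes", "data", "temp") to a directory.
    Returns {device_id: [labels]} for every device holding more than one of
    the directories; an empty dict means each directory has its own device.
    """
    by_device = {}
    for label, path in paths.items():
        device_id = os.stat(path).st_dev  # id of the device holding this path
        by_device.setdefault(device_id, []).append(label)
    # Keep only devices shared by two or more of the locations.
    return {dev: sorted(labels) for dev, labels in by_device.items() if len(labels) > 1}
</code>

For example, ''shared_devices({"indexes": "/omnidex/idx", "data": "/data", "temp": "/tmp"})'' (hypothetical paths) returns an empty dictionary when each location is on its own device.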

For large databases, administrators should also ensure that indexes are not loaded while queries are being performed.  These two operations compete with each other, so both will run quite slowly.  Administrators should either load indexes on a separate server or avoid running queries while the indexes are being loaded.

=== Predicting Indexing Time ===

It is possible to get a rough estimate of the time required to index a table.  The estimate is only approximate, because many variables affect indexing speed.  Chief among them are:

  * **The speed of the server**.  Omnidex indexing benefits from fast CPUs and fast I/O channels.
  * **The load on the CPU**.  Omnidex benefits from having unrestricted access to one core for each table being indexed concurrently, plus one unrestricted core available to the operating system for managing file caching.
  * **The load on the I/O channel**.  Omnidex benefits from accessing the data files, the index files and the temporary directory over separate, uncongested I/O channels.
  * **The speed of the underlying database**.  Omnidex depends on access to the underlying data.  Raw data files usually provide faster performance than even a well-tuned relational database.  Relational databases that are accessed remotely, such as through ODBC or a proprietary interface like Oracle's SQL*Net, will perform noticeably slower.  This impact is most noticeable on tables with very few indexes.
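
The CPU guideline above can be turned into a simple sizing rule: with N cores, index at most N - 1 tables concurrently so that one core stays free for the operating system.  The sketch below is a hypothetical Python helper, not an Omnidex setting.  It uses ''os.cpu_count()'', which reports logical CPUs, so on hyperthreaded machines it may overstate the number of physical cores.

<code python>
import os


def max_concurrent_tables(total_cores=None):
    """Suggested number of tables to index at the same time.

    Follows the rule of thumb of one core per table being indexed plus one
    core left free for the operating system's file caching.
    """
    if total_cores is None:
        # cpu_count() counts logical CPUs and may return None.
        total_cores = os.cpu_count() or 1
    # Reserve one core for the OS, but never suggest fewer than one table;
    # on a single-core machine the OS core guideline simply cannot be met.
    return max(total_cores - 1, 1)
</code>

For example, ''max_concurrent_tables(8)'' returns 7.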

The following formula provides a rough estimate of the time needed to index a table, by dividing the total number of keywords to be indexed by an indexing rate:

<code>
For each table ...
  For that table, plus any prejoined tables ...

    (Number of rows) x (Number of keywords per row)
    -----------------------------------------------
            1 billion keywords per hour

    Number of rows:              the number of rows in the table
    Number of keywords per row:  the number of parsed words or unparsed values per row

</code>
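
The estimate can be expressed as a small Python function: multiplying the number of rows by the keywords per row gives the total keywords, which is then divided by the indexing rate.  This is a sketch under the assumption that each table is described by a list of ''(rows, keywords_per_row)'' pairs, one for the table itself and one for each prejoined table whose keywords it indexes; the function and parameter names are hypothetical.  The default rate is the rule-of-thumb 1 billion keywords per hour and should be replaced with a measured rate.

<code python>
# Rule-of-thumb rate from the estimate above; replace with a measured rate.
DEFAULT_KEYWORDS_PER_HOUR = 1_000_000_000


def estimate_table_hours(parts, keywords_per_hour=DEFAULT_KEYWORDS_PER_HOUR):
    """Rough hours needed to index one table.

    parts: (rows, keywords_per_row) pairs for the table itself and for each
    prejoined table included in its indexes.
    """
    total_keywords = sum(rows * keywords_per_row for rows, keywords_per_row in parts)
    return total_keywords / keywords_per_hour


def estimate_total_hours(tables, keywords_per_hour=DEFAULT_KEYWORDS_PER_HOUR):
    """Sum of per-table estimates, assuming tables are indexed one after another.

    Indexing tables concurrently on separate cores may finish sooner.
    """
    return sum(estimate_table_hours(parts, keywords_per_hour) for parts in tables)
</code>

For example, a hypothetical table of 10 million rows with 20 keywords per row yields 200 million keywords, so ''estimate_table_hours([(10_000_000, 20)])'' returns 0.2 hours, or about 12 minutes.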

On commonly available hardware, Omnidex has been observed to index as fast as 5 billion keywords per hour and as slow as 100 million keywords per hour.  After a few tests, you should be able to substitute a rate that is better tailored to your environment.
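
To derive a rate for your own environment, one approach is to time a test load, convert the result to keywords per hour, and bracket estimates with the slowest and fastest rates observed.  The Python sketch below assumes you can count the keywords indexed during the test (for example, rows loaded multiplied by keywords per row); the function names are hypothetical, and the default bounds are the 100 million and 5 billion keywords per hour quoted above.

<code python>
SLOWEST_KEYWORDS_PER_HOUR = 100_000_000     # slowest rate seen on common hardware
FASTEST_KEYWORDS_PER_HOUR = 5_000_000_000   # fastest rate seen on common hardware


def measured_rate(keywords_indexed, elapsed_seconds):
    """Convert a timed test load into keywords per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return keywords_indexed * 3600 / elapsed_seconds


def hours_range(total_keywords,
                slow=SLOWEST_KEYWORDS_PER_HOUR,
                fast=FASTEST_KEYWORDS_PER_HOUR):
    """(best case, worst case) indexing hours for a given number of keywords."""
    return total_keywords / fast, total_keywords / slow
</code>

For example, a test load that indexes 50 million keywords in 120 seconds gives ''measured_rate(50_000_000, 120)'' of 1.5 billion keywords per hour, while ''hours_range(1_000_000_000)'' returns ''(0.2, 10.0)'': between 12 minutes and 10 hours for 1 billion keywords.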
  
admin/indexing/creation/performance.1327598284.txt.gz · Last modified: 2016/06/28 22:38 (external edit)