Differences

This shows you the differences between two versions of the page.

admin:indexing:creation:performance [2012/01/26 21:30]
doc
admin:indexing:creation:performance [2016/06/28 22:38] (current)
Line 4: Line 4:
 ====== Administration: Omnidex Indexing ======
  
-===== Indexing Creation =====
 +===== Index Creation =====
  
 [[admin:indexing:creation:home|Overview]] |
Line 11: Line 11:
 [[admin:indexing:creation:files|Index Files]] |
 **[[admin:indexing:creation:performance|Performance]]**
 +----
 ==== Performance ====
  
-Omnidex puts on premium on indexing performance.  Applications often use hundreds of Omnidex indexes, and so it is important that all of these indexes can be built quickly.  Omnidex also puts a premium on index size.  Omnidex uses compression algorithms to make the indexes as small as possible while still allowing fast access.  This allows the index to be quickly retrieved from the disk, and it also insures that memory usage is kept at a minimum.  This improves performance by insuring the fastest I/O and insuring the most caching.
 +Omnidex puts a premium on performance.  Applications often use hundreds of Omnidex indexes, so it is important that all of these indexes can be built quickly.  Omnidex also puts a premium on index size.  Omnidex uses compression algorithms to make the indexes as small as possible while still allowing fast access.  This allows the index to be retrieved quickly from disk, and it also ensures that memory usage is kept to a minimum.  This in turn improves performance by ensuring the fastest I/O and the most caching.
  
-The help insure performance, administrators should maintain the Omnidex indexes on disk drives that are regularly defragmented, and have plenty of extra space.  It is also valuable to isolate Omnidex indexes onto a separate disk drive, and even a separate disk controller, than the data files and the temporary file directory.  In an ideal world, each of these three kinds of file would have their own drive and their own disk controller to minimize disk contention.
 +To aid performance, administrators should maintain the Omnidex indexes on disk drives that are regularly defragmented and have plenty of free space.  It is also valuable to isolate Omnidex indexes on a separate disk drive, and even a separate disk controller, from the data files and the temporary file directory.  In an ideal world, each of these three kinds of file would have its own drive and its own disk controller to minimize disk contention.
 +
 +For large databases, administrators should also ensure that indexes are not loaded at the same time queries are being performed.  These two operations compete with each other, so both will run quite slowly.  Administrators should either load indexes on a separate server or avoid queries while the indexes are being loaded.
  
 === Predicting Indexing Time ===
  
-(Number of rows) + (Number of prejoined rows) + (Number of keywords)
----------------------------------------------------------------------
-                    1 billion keywords per hour
 +It is possible to get a rough estimate of the time required to index a table.  It is only a rough estimate, as there are many variables.  Chief among them are:
 +
 +  * **The speed of the server**.  Omnidex indexing benefits from fast CPUs and fast I/O channels.
 +  * **The load on the CPU**.  Omnidex benefits from having unrestricted access to one core for each concurrent table being indexed, plus one unrestricted core available to the operating system for managing file caching.
 +  * **The load on the I/O channel**.  Omnidex benefits from having access to the data files, the index files and the temporary directory using separate, uncongested I/O channels.
 +  * **The speed of the underlying database**.  Omnidex is dependent on access to the underlying data.  Raw data files usually provide faster performance than even a well-tuned relational database.  Relational databases that are accessed remotely, such as through ODBC or a proprietary interface like Oracle's SQL*Net, will perform noticeably slower.  This impact is most noticeable on tables with very few indexes.
 +
 +The following formula provides a rough estimate of the time needed to index a table:
 + 
 +<code>
 +For each table ...
 +  For that table, plus any prejoined tables ...
 +
 +    (Number of rows) + (Number of keywords per row)
 +    -----------------------------------------------
 +            1 billion keywords per hour
 +
 +    Number of rows:      the number of rows in the table
 +    Number of keywords:  the number of parsed words or unparsed values per row.
 +
 +</code>
  
 +On commonly available hardware, Omnidex has been seen to index as fast as 5 billion keywords per hour, and as slow as 100 million keywords per hour.  After a few tests, you should be able to substitute a rate that is better tailored to your environment.
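
The arithmetic behind the estimate can be sketched in a few lines of Python. This is only an illustration, not part of Omnidex: the names `TableStats`, `work_units` and `estimate_hours` are hypothetical, and the sample row counts are made up. It also assumes one particular reading of the formula, namely that the numerator is the row count plus the total keyword count, with the total taken as rows multiplied by keywords per row. The three rates are the ones quoted in the text.

```python
from dataclasses import dataclass

# Rates quoted in the text, in keywords per hour.
DEFAULT_RATE = 1_000_000_000   # divisor used in the formula
FAST_RATE = 5_000_000_000      # fastest rate reported on common hardware
SLOW_RATE = 100_000_000        # slowest rate reported on common hardware


@dataclass
class TableStats:
    """Hypothetical container for per-table figures; not an Omnidex API."""
    name: str
    rows: int
    keywords_per_row: float


def work_units(table: TableStats) -> float:
    # Assumption: numerator = rows + total keywords,
    # where total keywords = rows * keywords per row.
    return table.rows + table.rows * table.keywords_per_row


def estimate_hours(tables, rate=DEFAULT_RATE) -> float:
    """Sum the work for a table and its prejoined tables, then divide by the rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return sum(work_units(t) for t in tables) / rate


if __name__ == "__main__":
    # Made-up example: one table plus one prejoined table.
    tables = [
        TableStats("orders", rows=50_000_000, keywords_per_row=9),
        TableStats("customers", rows=5_000_000, keywords_per_row=19),
    ]
    for label, rate in (("default", DEFAULT_RATE),
                        ("fast", FAST_RATE),
                        ("slow", SLOW_RATE)):
        print(f"{label}: {estimate_hours(tables, rate):.2f} hours")
```

Running the estimate at both the fast and the slow rate gives a bracket rather than a single number, which suits a formula described as a rough estimate. Once a few real index builds have been timed, the measured rate can be passed as `rate` in place of the default.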
  
 ====  ====
 
admin/indexing/creation/performance.1327613432.txt.gz · Last modified: 2016/06/28 22:38 (external edit)