Action disabled: source

Administration: Administration Basics

Building Omnidex Applications

Design

As with most applications, this is the most important step of the entire application. During the design step, key decisions are made about how to approach Omnidex queries and updates in the application. These decisions can lead to high-performing applications, and they can also lead to poor performance. It is worth spending the needed time on this step.

This article focuses on the design steps for an Omnidex application. It does not focus on the steps for application design in general. These steps should be interwoven with the design steps needed for the overall application.

Designing an Omnidex application has three important steps:

1. Deciding on an Architecture

An Omnidex architecture is a plan for where the various components of the application will reside and how they will talk to each other. In the simplest applications, a single database resides on a single machine accessed using a single approach. There may not even be a web server or an application server. In more complex applications, there can be multiple web servers, accessing multiple application servers written in multiple languages, accessing multiple databases located on multiple data servers.

Omnidex easily integrates into both simple and complex architectures. Omnidex also adds three additional concepts that affect the application architecture: Omnidex Snapshots, Omnidex Grids and Omnidex Index Servers. These concepts add a great deal of flexibility to the application architecture and are worth considering before embarking on a full application.

Development Languages

Typically, Omnidex is treated as a database layer, accessed through ODBC or JDBC. Omnidex in turn will index and access the underlying database layer as needed. The application can also maintain separate connections to the underlying database in scenarios where Omnidex indexing is not needed.

Relational Databases and Raw Data Files

Omnidex supports the major relational databases, including Oracle, SQL Server and MySQL, and also supports other databases through a generic ODBC interface. Many businesses also use Omnidex as a SQL engine for raw data files. Omnidex can treat raw data files as tables, allowing complex SQL statements to be fully optimized as though the raw data files were a relational database. This approach can avoid the overhead of loading into a relational database and the expense of relational database licenses.

Omnidex Snapshots

In brief, Omnidex Snapshots are simple copies of the database stored in flat files that can be indexed and queried as independent databases. Omnidex Snapshots are quite convenient because they can be heavily indexed and then easily distributed to different servers. Many businesses direct much of their query traffic to Omnidex Snapshots, gaining performance in their application while reducing the load on their relational database. This assures the highest performance without requiring changes to the data model.

Omnidex Grids

Omnidex Grids allow large databases to be partitioned to improve performance and scalability. Databases with over 20 million rows are candidates for Omnidex Grids, and there are several strategies for distributing the nodes of the grid to achieve the greatest flexibility and performance. Omnidex Grids are also a common way to incorporate large amounts of new data. Large volumes of data can be added to a new node in the grid without having to impact the entire database.

Omnidex Index Servers

Omnidex Index Servers are similar to Omnidex Snapshots, but they only distribute the Omnidex indexes rather than a full copy of the database. In many applications, Omnidex fulfills most queries while accessing only the Omnidex indexes. This is especially true in applications that rely heavily on obtaining counts or aggregations. Applications may direct these types of queries to Omnidex Index Servers, improving performance and easing the load on the data server.

Architects would benefit from a deeper understanding of these concepts, and are encouraged to read the article on Omnidex Architecture.

2. Understanding the Data Model

A good indexing strategy requires an understanding of the data model. Specifically, it is important to understand the database schema, the table and column cardinalities and the pattern of queries.

Database Schema

A basic database schema shows all the tables, their respective columns and datatypes, and their respective primary and foreign constraints. This schema provides an understanding of the table relationships and will be used to optimize table joins. The schema also provides a list of columns for identifying likely candidates for indexing.

Database schemas can be obtained from the underlying relational database, or can be obtained using the Omnidex Administrator program.

Table and Column Cardinalities

Table cardinalities reflect the number of rows in each table. Column cardinalities reflect the number of distinct values in each column. For example, a table containing name and addresses about people may have a table cardinality of 300 million, meaning that there are 300 million rows in the table. The GENDER column may have a column cardinality of two, meaning that there are two distinct values in the column. These cardinalities are important to predicting query performance and to tuning queries.

Cardinalities can be obtained from the underlying relational database, or can be obtained using the Omnidex Administrator after an UPDATE STATISTICS statement has been completed on a table.

Sample SQL Queries

Sample SQL queries show the patterns of queries that are expected in an application. These query patterns are key to determining an indexing strategy. When an application is first prototyped, administrators often index everything using Omnidex, but this can result in over-indexing and it doesn't insure that the correct indexing options are used on each column. It also does not insure that table joins and aggregations will be properly optimized. A review of the query patterns insures the best performance for the least cost.

Sample queries can be logged in most relational databases. Omnidex can also log queries; however, that requires that Omnidex is already integrated into the application.

When analyzing SQL queries, there are several patterns to recognize:

  • Table Joins - It is important to identify which tables are being accessed and how they are being joined together. Omnidex frequently optimizes table joins. Also note any nested queries, as these are similar to table joins.
  • Criteria Columns - Columns found in the WHERE clause of a SQL statement usually correlate with Omnidex indexes. The literal values in the criteria are not particularly important unless they contain LIKE operators and wildcards. Note any columns that would benefit from textual indexing.
  • Aggregations - Queries that use the COUNT, SUM, AVERAGE, MIN or MAX functions indicate opportunities for optimization in Omnidex. Note the columns being aggregated and the columns referenced in the GROUP BY clause.
  • Ordering - Queries that use ORDER BY clauses indicate opportunities for optimization in Omnidex. Note the sequence of columns referenced in ascending ORDER BY clauses.
  • Select Items - In some situations, Omnidex may fulfill the query solely from the Omnidex indexes. Select items can be drawn from the indexes, eliminating the need to go to the underlying data. Note queries that return just a few columns.

3. Designing an Indexing Strategy

An Omnidex indexing strategy is quite different from a relational database indexing strategy. With relational databases, it is customary to identify a few commonly-used columns to index. The relational database uses those indexes to retrieve portions of underlying data, thereby reducing the amount of disk access. At the same time, these indexes may help resolve some of the criteria, some of the aggregations and some of the ordering. Once this subset of data has been retrieved from disk, the relational database then processes the data to fulfill the remaining aspects of the query.

Omnidex takes a different approach. Omnidex encourages administrators to index ALL of the elements of the query, and it will use many indexes at the same time to satisfy a query. A query may have 20 pieces of criteria, 20 table joins and a complex aggregation; yet with proper indexing, Omnidex can satisfy the query without ever going to the underlying data. This dramatically reduces the amount of disk access, leading to tremendous increases in performance.

Omnidex can afford to take this approach because it builds indexes quickly and batches multiple indexes together. Omnidex also compresses its indexes, resulting in much less disk space. It is fairly common to have over a hundred Omnidex indexes on a table, yet the indexes are built more quickly and take less disk space than just a few indexes in the relational database.

Designing an indexing strategy requires reviewing the query patterns as follows:

  • Table Joins - Table joins are usually optimized by indexing the join columns. In the case of parent-child relationships, the foreign key in the child is usually indexed. Some parent-child relationships can even be pre-joined, meaning that the index is created with the internal join information embedded. Nested queries are usually optimized in a fashion similar to table joins. The inner query is optimized independently, and then the outer query's corresponding criteria column is indexed.
  • Criteria Columns - Most administrators index all of the columns used in criteria. Omnidex's SQL engine can process criteria that is not indexed, but it is affordable enough that administrators usually start by indexing all of the columns used in criteria. Each index can also have options, such as textual indexes and case sensitivity.
  • Aggregations - Simple counts and distinct counts do not require any special indexing beyond indexing the criteria columns and optimizing the table joins. More complex aggregations, such as sums, averages, mins, maxs, and grouped distinct counts require an index containing the group by columns and the aggregated columns in any order. Multi-column indexes may be needed for high-cardinality group by columns as well. When there are a lot of combinations of columns for aggregations, larger indexes can be created that satisfy multiple scenarios.
  • Ordering - Ascending order by clauses are optimized with an index containing the order by columns. These indexes often overlap those created to optimize aggregations.
  • Select Items - The columns that are returned from a SQL statement can often come directly from the Omnidex indexes. This is automatically available for most primary keys, and it is also common with counts and aggregations. This can also extend to data columns as long as all columns are found in a single index.

Administators would benefit from a deeper understanding of the Omnidex indexing features, and are encouraged to read the article on Omnidex Indexing.

Additional Resources

See also:

 
Back to top
admin/admin/applications/design.txt · Last modified: 2016/06/28 22:38 (external edit)