Once the database is partitioned into grid nodes, the nodes must be distributed to grid servers. Some Omnidex Grids reside on a single server, such as a large, multi-core machine with a RAID array or a SAN. Other Omnidex Grids are distributed to multiple servers, each of which can have its own multi-core processors and independent disk drives. There are several questions that should be answered to determine the distribution plan.
Many Omnidex customers use Omnidex Snapshots for their grids. An Omnidex Snapshot is a physical copy of the underlying data stored in one of several portable file formats, such as fixed-length records or comma-delimited records. The excellent performance and high portability of Omnidex Snapshots make them an ideal vehicle for distributing a database across a grid.
Other customers logically partition their relational databases through the use of relational views. Each partition is supported by a relational view that narrows the table to specific rows based on the partition qualifier. These grids are usually maintained on a single server since they must remain tethered to the relational database.
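As an illustration of partitioning through views, the following minimal sketch uses Python's built-in sqlite3 module as a stand-in for the relational database. The table name, column names, view names and region values are all hypothetical, and the sketch shows only the general idea of one view per partition qualifier, not any Omnidex-specific syntax.

```python
import sqlite3

# Hypothetical table; "region" plays the role of the partition qualifier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, region) VALUES (?, ?, ?)",
    [(1, "Ada", "EAST"), (2, "Ben", "WEST"), (3, "Cy", "EAST"), (4, "Di", "WEST")],
)

# One view per partition. Each view narrows the table to the rows
# that match its partition qualifier. (View definitions cannot take
# bound parameters in SQLite, so the literal is placed in the SQL text;
# the values here are fixed, trusted constants.)
partitions = {"customers_east": "EAST", "customers_west": "WEST"}
for view_name, region in partitions.items():
    conn.execute(
        f"CREATE VIEW {view_name} AS "
        f"SELECT * FROM customers WHERE region = '{region}'"
    )

# Count the rows visible through each view and in the base table.
rows_per_view = {
    view_name: conn.execute(f"SELECT COUNT(*) FROM {view_name}").fetchone()[0]
    for view_name in partitions
}
total_rows = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

In this sketch the views are disjoint and together cover every row of the base table, which is the property a partition qualifier is expected to provide.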
Many Omnidex customers have requirements for scalability, replication or redundancy. These requirements are usually met by having a series of Omnidex Grids, sometimes with grid nodes shared between multiple grid controllers. These strategies are generally easier to deploy with Omnidex Snapshots since snapshots are easily copied to multiple servers.
Two types of queries require large network transfers from the grid node on a grid server to the grid controller. Queries that return many rows require that the grid server transfer those rows to the grid controller. If the grid is distributed across multiple grid servers, this will put an increased load on the network. Similarly, if a number of queries perform distinct counts or grouped aggregations on high-cardinality columns that share values between nodes (described earlier), this will put an increased load on the network. These situations benefit from a grid whose nodes are on one server with multiple processors and high-speed disk drives or a SAN.
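The reason distinct counts on shared values are expensive can be sketched in a few lines of Python. This is a simplified model, not a description of how Omnidex implements the operation: the node names, the sample values and both helper functions are hypothetical. It shows that per-node distinct counts cannot simply be added when nodes share values, so the distinct values themselves have to reach the controller.

```python
# Hypothetical data: the customer IDs held by each grid node.
# "c1" and "c3" each appear on more than one node.
node_values = {
    "node1": ["c1", "c2", "c3", "c2"],
    "node2": ["c3", "c4", "c4"],
    "node3": ["c1", "c5"],
}


def node_distinct(values):
    """What a node would send to the controller: its set of distinct values."""
    return set(values)


def controller_distinct_count(per_node_sets):
    """Merge the per-node sets so that shared values are counted once."""
    merged = set()
    for node_set in per_node_sets:
        merged |= node_set
    return len(merged)


per_node_sets = [node_distinct(values) for values in node_values.values()]

# Adding per-node distinct counts overstates the answer when values are shared.
naive_sum = sum(len(node_set) for node_set in per_node_sets)

# The correct answer requires merging the actual values at the controller.
true_count = controller_distinct_count(per_node_sets)

# In this model, every distinct value on every node crosses the network.
values_transferred = naive_sum
```

Here the naive sum is 7 while the true distinct count is 5. On a high-cardinality column the number of values transferred grows with the number of distinct values on each node, which is why such queries favor nodes that sit on the same server as the controller.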
If these types of queries are not expected, or if their performance falls within requirements, then there are advantages to spreading the load across multiple servers. A multi-core server with plenty of memory allows for concurrent processing in the CPU, but these servers are often bottlenecked by disk I/O. If many concurrent processors all try to read data from the same disk drives, the disk drives may have difficulty keeping up. The disk drives become the weakest link in the chain. This can be somewhat alleviated by distributing nodes across multiple disk controllers or a SAN, but these solutions often don’t perform as well as distributing the nodes to separate hardware dedicated to processing and caching those specific nodes.
The application is unaware of the distribution plan, just as it is unaware of the partitioning scheme. It is possible to alter the distribution plan if needed without changing the application.
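The separation between the application and the distribution plan can be pictured with a small Python model. Everything in it is hypothetical (the plan dictionaries, the row counts, the GridController class and the application_query function); it is not the Omnidex API, only a sketch of the idea that the plan lives behind the controller and can be swapped without touching application code.

```python
# Hypothetical distribution plans: grid node -> grid server.
single_server_plan = {"node1": "serverA", "node2": "serverA", "node3": "serverA"}
multi_server_plan = {"node1": "serverA", "node2": "serverB", "node3": "serverC"}

# Hypothetical number of rows on each node that qualify for some query.
NODE_ROWS = {"node1": 120, "node2": 80, "node3": 50}


class GridController:
    """Toy controller that knows where each node lives; the application does not."""

    def __init__(self, plan):
        self.plan = plan
        self.servers_contacted = set()

    def count(self):
        # Visit every node through whichever server the plan assigns it to.
        total = 0
        self.servers_contacted = set()
        for node, server in self.plan.items():
            self.servers_contacted.add(server)
            total += NODE_ROWS[node]
        return total


def application_query(controller):
    # Application code: identical regardless of the distribution plan.
    return controller.count()


single = GridController(single_server_plan)
multi = GridController(multi_server_plan)
result_single = application_query(single)
result_multi = application_query(multi)
```

Both calls return the same total even though the first contacts one server and the second contacts three; only the plan handed to the controller differs.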
Now that you have a partitioning scheme and a distribution plan, it is time to create the grid.