The application domain of the IDIOMS (Intelligent Decision making In On-line Management Systems) machine is combined On-line Transaction Processing (OLTP) and Management Information Services (MIS). The machine is specifically geared to the relational paradigm, and can be considered an SQL engine. It has been designed to support applications where the storage requirement is measured in at least tens of gigabytes of data, and probably hundreds.
OLTP is a mission-critical service, and takes priority over MIS queries. However, within this constraint, it is possible to service MIS queries concurrently. This is a considerable improvement over the typical industrial situation, in which MIS queries are sent to a Data Processing department for processing, where turn-round times may stretch to many weeks. We are working closely with a number of industrial companies, and many of the assumptions on which we base our design are a reflection of current and perceived future requirements.
For MIS queries we choose not to worry about data inconsistency because MIS cannot get an exact picture of the OLTP data unless the whole database is locked (or there is some form of version control). Clearly, we cannot lock the whole database in order to get a consistent snapshot. Given this practical constraint, there is therefore no point in worrying about consistency at the individual tuple level. For all practical purposes, small inconsistencies in MIS queries are irrelevant, although it is vital that the OLTP data is consistent and integral at all times.
Due to the mission critical nature of OLTP processing, there is likely to be a dense index on OLTP data to optimise access to that data. However, IDIOMS does not maintain indices for the MIS system, since MIS queries generally involve range searches on columns, or access columns which are not suitable for indexing. For example, an index on gender is not likely to be very useful, and a column containing age data is likely to be accessed by range for which an index is useless. Furthermore, the cost of maintaining indices can be high. The IDIOMS machine therefore scans a table partition serially identifying the rows that satisfy a query predicate. Once the selected rows have been identified an in-memory sort can be undertaken on particular columns so that subsequent processing of relational operations can be more efficiently undertaken. The in-memory sort mechanism also permits the elimination of duplicates (i.e. the DISTINCT operation).
Initial research has shown that over a set of typical MIS queries only a few columns have any associated selection predicates. Clearly, it is these columns which should be partitioned. The partitioning (or indexing) of other columns serves no purpose, since if there is no selection predicate, all rows are required. Of course, when discussing selection predicates, we refer to the underlying logical query tree, rather than the superficial SQL query.
The underlying design principle of the IDIOMS machine is that of taking a parsed query tree and using it to construct a dataflow graph where the nodes specify the operations to be undertaken. This is shown in section 6. [KERR91a] contains full details of the design of the IDIOMS machine. The basic architecture consists of storage processors and associated disks, relational processors, and a communications ring. Storage processors (S) are of two types, those that hold OLTP data, and those that hold MIS data, e.g. summary tables. OLTP disks are connected to transaction processors, which service OLTP queries. Each table or partition that is stored on disk has an associated Table Handler (TH) that implements SQL statements (select, update, insert, delete). We also have specific processes to deal with referential constraint processing and selections that refer to a column which has a unique constraint. Each Relational (R) processor has two associated input buffers, labelled BO and B1, and an associated Insert (1) processor which is used to send intermediate results back to the communications ring. Figure 8 is a schematic of the machine which shows just those parts pertaining to the flow of data for MIS queries.
A simple concurrency control strategy is used, which permits more than one SELECT query on a table partition, and provided that there is sufficient relational processing power available, this should not lead to degradation of response times for individual queries. Only one INSERT, UPDATE or DELETE operation is permitted at one time on each partition. Thus we implement a table partition locking strategy, which for an MIS based system is not unreasonable. Further research is required to implement a row level or column level locking strategy in a shared-nothing parallel database machine.