One consideration is how to keep the number of message hops to a minimum. The physical location of relational processing resources is important, and we want tuples to be sent to physically adjacent sites where possible. This raises the question of whether, on completion of an operation, immediate re-allocation of processors (where possible) is in fact the best option: it may be better to keep clusters of processors together.
An access plan is generated using heuristics. It may be that derived tables become very large or very small, and therefore it might be appropriate to re-evaluate the execution strategy, and modify the remainder of the access plan. In other words, dynamic access plan modification allows changes to operator parallelism whilst a query is being evaluated.
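The idea can be sketched as follows. This is a minimal illustration, not the system's actual algorithm: after an operator completes, the observed cardinality of its derived table is compared with the optimiser's estimate, and the degree of parallelism for the remaining operators is rescaled. The scaling rule and the bounds are illustrative assumptions.

```python
# Sketch of dynamic access-plan modification: rescale operator
# parallelism in proportion to the cardinality estimation error.
# The proportional rule and degree bounds are assumptions for
# illustration only.

def adjust_parallelism(planned_degree, estimated_rows, observed_rows,
                       min_degree=1, max_degree=32):
    """Return a revised degree of parallelism for the remaining operators."""
    if estimated_rows == 0:
        return planned_degree          # no basis for adjustment
    ratio = observed_rows / estimated_rows
    new_degree = round(planned_degree * ratio)
    return max(min_degree, min(max_degree, new_degree))

# A derived table came out 4x larger than estimated: widen parallelism.
print(adjust_parallelism(4, 10_000, 40_000))   # 16
# It came out much smaller: shrink, freeing processors for other work.
print(adjust_parallelism(8, 10_000, 1_000))    # 1
```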
For each MIS query, we could either allocate a fixed number of (available) relational processors (static allocation), or take resources as needed at each stage of the evaluation (dynamic allocation). Our initial work uses static allocation: although dynamic allocation allows better utilisation of resources, it is considerably more complex, and we can envisage situations in which evaluation becomes deadlocked under the dynamic strategy. Additionally, processor and memory costs are low, and the typical application environment is not cost sensitive at this level. Further work is required in this area.
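The contrast can be sketched with a simple pool abstraction. This is an illustrative sketch only (the pool size and allocation sizes are assumptions): under static allocation, a query acquires its full complement of processors up front, all-or-nothing, and holds nothing while it waits, which is precisely what rules out the hold-and-wait deadlocks possible under incremental (dynamic) acquisition.

```python
# Sketch of static processor allocation from a fixed pool of
# relational processors. All names and sizes are illustrative.

class ProcessorPool:
    def __init__(self, size):
        self.free = size

    def allocate(self, n):
        """Grant n processors only if all n are available (all-or-nothing)."""
        if n <= self.free:
            self.free -= n
            return True
        return False   # the query waits holding nothing, so it cannot deadlock

    def release(self, n):
        self.free += n

pool = ProcessorPool(16)
print(pool.allocate(10))   # True  - query A receives its full allocation
print(pool.allocate(10))   # False - query B must wait, holding nothing
pool.release(10)           # query A completes
print(pool.allocate(10))   # True  - query B now runs
```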
Currently, our table design only allows for a single data stream for each MIS query to be sent to the Front End or the MIS disks. For implementations that can accept multiple streams this is clearly a limitation, and we are working on an enhanced table design to deal with this.
The collection of statistics to aid the database administrator in choosing an optimal partitioning strategy is one of our goals. We envisage that once the requirements for the statistics have been established, automatic collection and evaluation could take place. Should re-partitioning be required, messages to this effect could be generated. We do not imagine that (in the near future) re-partitioning could be an automatic process, because partitioning requires semantic knowledge of the business system within which the database operates. By this we mean that, for example, although metrics could be used to derive an allocation strategy that "optimised" all queries, it might be a business decision that some queries have priority and others do not.
If a particular column is used for partitioning, then additional statistics will be held on it. Specifically, we aim to investigate the relationship between query predicates and partitioning values. It may be that small adjustments to partitioning values could lead to large savings in response time. For example, let us say that a certain class of query usually contains the selection predicate "age <= 21". If our partitioning strategy uses the ranges
age < 21
age >= 21 AND age < 50
age >= 50
then two partitions must be scanned. By changing the partitioning boundary (e.g. to age < 22), we need only access one partition. This is feasible because, as discussed previously, we do not envisage having a large number of partitions for each partitioning column.
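The pruning effect of the boundary adjustment can be sketched as follows. The partition names and the pruning routine are illustrative assumptions, not part of our system; the bounds are those of the example above.

```python
# Sketch of range-partition pruning for a "column <= value" predicate.
# Each partition covers [low, high) on the partitioning column;
# None means unbounded. Partition names are hypothetical.

partitions = [
    ("p0", None, 21),   # age < 21
    ("p1", 21, 50),     # age >= 21 AND age < 50
    ("p2", 50, None),   # age >= 50
]

def partitions_for_le(value, parts):
    """Return the partitions that may hold rows satisfying column <= value."""
    return [name for name, low, high in parts
            if low is None or low <= value]

# With the original bounds, "age <= 21" must scan two partitions...
print(partitions_for_le(21, partitions))   # ['p0', 'p1']

# ...but moving the first boundary from 21 to 22 prunes the scan
# to a single partition.
adjusted = [("p0", None, 22), ("p1", 22, 50), ("p2", 50, None)]
print(partitions_for_le(21, adjusted))     # ['p0']
```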