Appendix 6
Glossary of terms and abbreviations
List of general acronyms
For definitions of the following acronyms, refer to the glossary.
- AMP
- Access Module Processor, Teradata's name for a storage node.
- ATM
- Automated Teller Machine.
- DBA
- Database Administrator.
- DBM
- Database Machine.
- DRAT
- Dynamically Reconfigurable Array of Transputers, a forerunner of IDIOMS (J. Kerridge, "A proposal for a dynamically reconfigurable array of transputers to support database applications", Proc. 7th technical occam user group, Grenoble 1987, IOS, Amsterdam).
- DSDL
- Data Storage Description Language, a language for defining the allocation of data to storage media.
- IDIOMS
- Intelligent Decision making In Online Management Systems.
- MIS
- Management Information Service.
- OLTP
- Online Transaction Processing.
- ORQ
- Orthogonal Range Query.
- POS
- Point of Sale.
- SKI
- Single Key Index.
- SQL
- Structured Query Language.
- TP
- Transaction Processing.
- TSB
- Trustee Savings Bank, collaborators in the IDIOMS project.
- VRP
- Value Range Partitioning.
- access frequency
- the frequency with which a data object is accessed.
- access plan
- used in this thesis to mean the graph which represents the implementation of a logical query tree. For example, a single logical relational operation may be carried out on more than one processor; the query tree shows a single node for the operation, whereas the access plan shows all the replicated instances of it.
- associative memory device
- a hardware architecture which retrieves data on the basis of the value of data items, rather than by using data location pointers. Occasionally researchers also use the term to refer to a software simulation of such a device.
- automated teller machine
- a machine which provides banking services, typically for withdrawal of funds by the user.
- B+-tree
- a tree structure consisting of an index set and a sequence set. The index set is a tree structured index (a B-tree), and the sequence set is a list of pointers to data buckets.
- bucket
- the unit of transfer between disc storage and main memory, also known as a data page.
- cell
- a) an alternative name for a (logical) data partition b) a hardware unit of a cellular database machine.
- cellular database machine
- consists of a set of cells, each of which is composed of memory and a processor. There is one cell for each track of a rotating device (used in a loose sense to mean disc, drum or magnetic bubble memory). If the whole database is stored on a set of these cells, it can be searched in a single revolution. Data is physically accessed by value rather than by address. An example is Ozkarahan's RAP machine.
- clustered/clustering
- the placement of records physically close together on disc, based (usually) on some attribute value(s).
- condition clause
- see predicate clause.
- data allocation
- the collective term for partitioning and placement of data on storage devices.
- data dictionary
- contains metadata (data about data), for example database and query statistics, data partitioning and placement information, and the machine state.
- dataflow
- in multi-processor database terminology, used to describe a pipelined producer-consumer system: as soon as data is produced it is (usually) sent to the next process for further processing. This usage should not be confused with the term as it is used to describe dataflow computer architectures (i.e. computers which do not use an instruction pointer).
- dataflow graph
- (in multi-processor database terminology) a graph where the nodes represent processes and the arcs represent transfers of data between processes. Often used synonymously with query tree.
- data fragmentation
- see data partitioning.
- data migration
- the transfer of a data object from one partition to another.
- data mining
- the scanning of large amounts of data in order to extract information e.g. statistical information on ATM usage.
- data partitioning
- partitioning of a (possibly notional) data file or partition into a number of smaller files or partitions.
- data placement
- placement of data partitions on storage media.
- data skew
- uneven distribution of data values from the data domain.
- data storage description language
- a language which defines data partitioning and placement.
- database administrator
- person responsible for technical administration of a database.
- database machine
- specialised software or hardware configuration designed to manage a database system.
- database management system
- "… can be defined as a software package that provides all data management facilities for database creation, retrieval, manipulation, and maintenance of databases". Su, Database Computers, p. 20.
- declustered/declustering
- an alternative name for horizontally partitioned/horizontal partitioning.
- DegDecl
- is the degree of declustering, the Bubba team's term for the number of processors over which data is horizontally partitioned.
- directory
- a) the Grid File structure which dynamically maps logical cells to physical data buckets b) in general, a structure which maps logical partitions to physical media c) a structure defining the location of data objects.
- distributed database
- "implies several computers, each one with a DBMS managing data stored on attached permanent storage devices; a general or local network…; and some facilities to manage data across the network". C. Esculier, Distributed Databases: state of the art, Computer Bulletin vol. 3, page 3, June 1987.
- foreign key
- is an attribute (or combination) in one relation whose values are required to match those of the primary key in some other relation.
- hash partitioning
- the partitioning of data dependent upon the value of the result of applying a randomising (i.e. hash) function to a data value or set of values of a record.
- head-per-track
- an architecture in which each track of a rotating device has its own read/write head.
- heat
- an alternative name for access frequency.
- horizontal partitioning
- the fragmentation of a (possibly notional) single file by placing complete records in different partitions.
- management information service
- used in this thesis to mean those parts of a business which use or require access to large amounts of data (e.g. planning, product targeting, analysis of consumer trends, direct marketing).
- multikey hash function
- a hash function based on many fields of a record.
- online transaction processing
- the processing of user queries (transactions) in an interactive manner (i.e. with fast response). See transaction processing.
- orthogonal range query
- a range query in more than one dimension.
- partition cardinality
- the number of records in a data partition.
- point query
- used in this thesis to mean a query which references a single point from the domain of an attribute. The term should not be confused with single hit, which refers to a query accessing just a single record, although if an attribute has unique values the terms are equivalent.
- point of sale terminal
- a machine which, in conjunction with a special card, debits a user's bank account directly by the value of the goods purchased.
- predicate
- a condition which evaluates to either true or false, and is thus used to specify the conditions under which data is required.
- predicate clause
- also known as condition clause. That part of a query which contains or defines the query predicate (e.g. the SQL WHERE clause).
- primary key
- a unique identifier composed of one or more attributes.
- processing element
- a processor and associated local memory.
- query
- used in this thesis to refer to MIS queries, not OLTP transactions.
- query selectivity
- is the fraction of records referenced by a query (i.e. satisfying a condition), see query size.
- query size
- used in this thesis to mean the fraction of records referenced by a range query in a given dimension.
- range query
- a query which accesses data based on a range of data values specified in the predicate clause.
- range partitioned/partitioning
- placement of data in partitions based upon the value of one or more attributes, each partition containing data within a given range of values.
- rotational latency
- also known as rotational delay. The time taken for a disc to rotate from the current position (on the required track) to the position containing the first required data block.
- round robin
- the placement of records on disc (or in a partition) in a (notionally) sequential manner.
- scan time
- the time taken to access data by means of a sequential scan of the data, rather than by the use of an index to specify which data pages to access.
- schema
- the view, or definition, or description of data.
- seek time
- time needed to reposition the disc arm from its current track to the required track.
- shared-disc
- a generic multi-processor database machine architecture in which all of the discs are accessible by all the processors, but memory is local to each of the processors.
- shared-everything
- a generic multi-processor database machine architecture in which all discs and memory are directly accessible by all processors.
- shared-nothing
- a generic multi-processor database machine architecture in which each processor has its own disc(s) and memory, the only shared resource being the interconnection network.
- single key index
- Liou and Yao's term for an index on the primary key [LIOU77].
- table handler
- the name of the process which deals with file handling in the IDIOMS machine; essentially a small distributed file handler.
- transaction
- used in this thesis to refer to OLTP transactions cf. query.
- transaction processing
- the processing of small fixed queries which access one or perhaps a few records in a database.
- transaction processor
- the set of processes in IDIOMS that deals with the processing of OLTP transactions.
- transputer
- a microprocessor with built-in communications links which can operate concurrently with the main processor, developed and produced by INMOS.
- value range partitioning
- the round robin placement of records from each of a set of (possibly notional) partitions, the partitions themselves containing records which were placed (possibly notionally) using range partitioning.
- vertical partitioning
- the fragmentation of a file into subfiles by splitting each record into two or more subrecords, and placing these subrecords in different partitions.
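The horizontal partitioning schemes defined above (range partitioning, hash partitioning, round robin and value range partitioning) can be illustrated with a short Python sketch. It is not the IDIOMS table handler. It assumes records are dictionaries, partitions on a single attribute, and uses CRC-32 as a stand-in randomising function. All function names are hypothetical. For value range partitioning it further assumes that a single round-robin counter runs across all the notional range partitions. Comparing partition cardinalities shows how data skew unbalances range partitioning while round robin stays even.

```python
import zlib
from bisect import bisect_right
from typing import Dict, List, Sequence

Record = Dict[str, object]
Partitions = List[List[Record]]

# Hypothetical helpers illustrating the glossary definitions; not the IDIOMS code.

def range_partition(records: Sequence[Record], key: str,
                    boundaries: Sequence[int]) -> Partitions:
    """Range partitioning on one attribute: partition i holds key values in
    [boundaries[i-1], boundaries[i]). Boundaries must be sorted ascending."""
    parts: Partitions = [[] for _ in range(len(boundaries) + 1)]
    for r in records:
        parts[bisect_right(boundaries, r[key])].append(r)
    return parts

def hash_partition(records: Sequence[Record], key: str, n: int) -> Partitions:
    """Hash partitioning: CRC-32 (an assumed randomising function) of the key, mod n."""
    parts: Partitions = [[] for _ in range(n)]
    for r in records:
        h = zlib.crc32(str(r[key]).encode("utf-8"))
        parts[h % n].append(r)
    return parts

def round_robin(records: Sequence[Record], n: int) -> Partitions:
    """Round robin: the i-th record goes to partition i mod n."""
    parts: Partitions = [[] for _ in range(n)]
    for i, r in enumerate(records):
        parts[i % n].append(r)
    return parts

def value_range_partition(records: Sequence[Record], key: str,
                          boundaries: Sequence[int], n: int) -> Partitions:
    """Value range partitioning: range-partition into notional partitions, then
    deal their records round robin over n real partitions (one shared counter)."""
    notional = range_partition(records, key, boundaries)
    return round_robin([r for part in notional for r in part], n)

def cardinalities(parts: Partitions) -> List[int]:
    """Partition cardinality (number of records) of each partition."""
    return [len(p) for p in parts]

if __name__ == "__main__":
    balances = [5, 12, 18, 25, 31, 47, 52, 88, 90, 95, 96, 99]
    records = [{"acct": i, "balance": b} for i, b in enumerate(balances)]
    print(cardinalities(range_partition(records, "balance", [25, 75])))  # [3, 4, 5]
    print(cardinalities(round_robin(records, 3)))                        # [4, 4, 4]
    print(cardinalities(hash_partition(records, "acct", 4)))
    vrp = value_range_partition(records, "balance", [25, 75], 3)
    print([r["balance"] for r in vrp[0]])                                # [5, 25, 52, 95]
```

In this example, the uneven spread of balance values shows up as unequal range-partition cardinalities. Round robin and value range partitioning keep every partition at four records, and value range partitioning additionally gives each partition records from every value range, which is the property that lets a range query be spread over all processors.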