Exploiting Relational Database Technology in GIS

Peter Batty, July 1990

(For a scanned PDF version of the original article, 1.26MB, click here)

All systems for managing data face common problems such as backup, recovery, auditing, security, data integrity and concurrent update. Other challenges include the ability to share data easily between applications and to distribute data across several computers, while continuing to manage the problems already mentioned. Geographic information systems are no exception, and need to tackle all these issues.

Standard relational database management systems provide many features to help solve the issues mentioned so far. This article describes how the IBM geoManager product approaches these issues by storing all its geographic data in a standard relational database product in order to take advantage of such features. Areas in which standard relational database functions need to be extended are highlighted, and the way in which geoManager does this is explained. The performance implications of storing all data in the relational database are discussed.

An important distinction is made between the storage and management of geographic data and the manipulation and analysis of geographic data, which needs to be made when considering the applicability of relational database technology to GIS.

Introduction

This article looks at some of the issues involved in implementing a large corporate database containing geographic information. First a brief categorisation of 'non-database approaches' to storing geographic data is given. A description follows of the way that the IBM geoManager product approaches geographic data management using a standard relational database management system.

The article then examines a number of database issues, discussing the ways in which standard relational database functions can help a GIS address these, and also highlighting where extra functions are required for a GIS. In each case the approach taken by geoManager is compared with a non-database approach. Finally the performance implications of taking the geoManager approach are discussed, and the arguments presented are summarised in a conclusion.

Throughout this article the examples given refer to geoManager and DB2, IBM's relational database product which runs under the MVS operating system. The first release of geoManager also runs under the VM operating system using the SQL/DS database product. The arguments given are by no means specific to these database products but apply in general to any relational database management systems used in the way described in this article.

Alternative approaches to storing geographic data

Non-database approaches

Most Geographic Information Systems store graphics and some or all geographic information in specialised data structures, and store alphanumeric attribute information in a standard database management system, often a relational one.

Various reasons are cited for this approach, the most important of which is generally agreed to be performance. Relational databases are designed primarily for managing alphanumeric data, not for rapid display of graphical information or certain kinds of complex processing such as some types of geographical analysis, so there are issues to be addressed in this area, which will be discussed later.

Clearly these generalised statements cover a very wide range of approaches to storage of geographic information. However, for the purposes of this article the important fact which distinguishes these approaches is that they do not store all aspects of their geographic data in a single database management system. In the rest of this article, these will be referred to as non-database approaches.

The GFIS database approach

IBM's Geographic Facilities Information System (GFIS) takes an object-based approach to storing geographic information, rather than a map-based approach. (Note that the term 'object-based', as used in this article, is not synonymous with the term 'object-oriented', which has a quite specific meaning in computer science. With an object-oriented system relationships between objects are stored as an integral part of the database. With the object-based approach referred to in this article, relationships between objects are maintained by applications using software tools which are provided.) All the information for an object, including alphanumeric attributes, geographical location, network relationships and graphical representation, is stored together as a single logical object in the relational database. The GFIS database has no system of tiling or pre-defined maps, just a continuous set of objects. The appropriate tables are created and managed by the geoManager product. Attribute fields stored in the database for each object include the following:

• Spatial index key - used to cluster data efficiently and optimise area retrievals (discussed in more detail in the section on performance).

• Object extents (min x, min y, max x, max y) - used for area retrieval.

• Absolute points and point connectors - network nodes which are used for network retrieval.

• Detailed graphics (for example the co-ordinates of a multi-point line) are stored in a compact binary format in a variable length character string.

• Alphanumeric attributes.

Geographic analysis and graphical display is not done directly against the geoManager data structure. An important distinction is drawn in this article between the storage and management of geographic data and the manipulation and analysis of geographic data. When looking at the manipulation and analysis of geographic data, different data structures are appropriate for different kinds of application. For example, the data structure used by IBM's Graphics Program Generator product (GPG) is extremely efficient for complex network analysis, while a raster or quadtree data structure such as that used by Tydac Technologies' SPANS™ is much more efficient than a vector data structure for overlaying several sets of polygons. However, it is highly desirable to have some means of storing and managing geographic data in a way which allows it to be accessed by multiple geographic and non-geographic applications within a consistent environment. The geoManager approach is intended to provide an integrated corporate database which fulfils this role.

When a user wants to work with some geographic data, he makes a request to geoManager specifying the geographic area of interest and the objects required within that area (for example buildings, roads and sewers). It is possible for geoManager to extract the requested objects from the database in one of the following three formats:

1. The Interface Format File (IFF), which contains graphical, alphanumeric and relationship information, and is used to pass data to GPG or other GIS packages for full function GIS applications.

2. The Open Format File, which contains just selected alphanumeric data, and can be used to pass data to other applications such as report generators or spreadsheets.

3. The geoManager Graphic Analysis Format, which contains just graphical data, together with a database key for each object which refers to the relevant attributes in the database. This is used by the geoManager Graphic Analysis application which provides functions to display, pan and zoom these graphics on any standard business graphics terminal, as well as the capability of viewing and updating alphanumeric data and performing various types of simple analysis and reporting.

When an IFF extract is requested, the user must specify whether any of the retrieved objects may be updated. If so then geoManager puts locks on the appropriate objects so that they cannot be updated by other users. This is discussed in more detail in the section on concurrent update. Any changes made in the GIS application are passed back to geoManager in an IFF, and this updates the database and releases the locks.

Database issues

Backup and recovery

Regular backups need to be made of any database system in order to be able to restore the database in the event of a major system failure. The larger the database, the longer it will take to back up, and therefore there may be a reasonable length of time between backups. In order to provide the capability of restoring a system to a state which is more recent than that of most recent full backup, database systems such as DB2 provide the ability to log all transactions against the database. This transaction log can be saved on tape much more regularly than a full backup can be done, since the data volumes involved are much smaller. When restoring a database from the backup tapes, the most recent full backup of the database is restored, and the transaction log can then be applied to this to bring it to the most up to date state possible. DB2 also provides facilities to back up individual tables, or parts of large tables, so a full backup can be done more quickly by running concurrent jobs to back up different parts of the database.

Since geoManager stores all GFIS data in DB2, it can exploit all these capabilities automatically. However, a system which stored its graphics in a separate database would have to write a specific logging mechanism for graphical transactions (which included a mechanism for synchronising this log with the log for alphanumeric transactions), in order to provide the same recovery capability.

Security

In a corporate GIS with many users, it is vital to be able to control access to the GIS data. The three main types of access to data are update, read only, or no access. These categories may be subdivided further. For example, geoManager provides three types of update privilege: the ability to create entirely new data; the ability to make structural updates to existing data (i.e. change graphics or network connectivity); and the ability to update attribute data only. It is necessary to be able to grant these privileges at an object level, since some users may be permitted to update or view certain objects but not others. This requirement applies equally to both graphical and alphanumeric aspects of the data.

DB2 provides standard facilities to provide read, update, insert and delete authority on any table for any user. Since the three types of update required by geoManager do not correspond exactly to these DB2 update authorities, the geoManager application implements its own security mechanism at an object level. This security mechanism is complemented by the standard DB2 access control mechanism, which controls access to the data by non-geoManager users. In this way privileges for all users can be controlled in the appropriate way.

It is generally easier to provide a flexible approach to security using an object-based system rather than a map-based system. If the basic unit of information being handled is a map, or a layer of a map, then it is likely to be more difficult to restrict access to certain objects within a layer than it is if objects are the basic unit of information being handled.

Data integrity

There are many data integrity issues which need to be considered in a corporate GIS. This section looks at some of these issues, and the following section looks at concurrent update, which is one particular aspect of data integrity which is particularly complicated in a GIS compared to most applications.

A fundamental issue is ensuring that graphics and alphanumeric data are maintained in synchronization with each other. At the very simplest level this means ensuring that when a graphic record is deleted the corresponding alphanumeric record is deleted, for example. This is not an issue in GFIS because of its integrated object-based approach. However, in a system where alphanumeric and graphical data are stored in separate databases, the system must ensure that this sort of synchronization is maintained.

Synchronisation at this simple level should be straightforward to achieve, but there are far more subtle issues in the same area. Suppose that an operation has been carried out which changes both alphanumeric and graphical data for an object, and that in the process of saving the results of the operation in the database(s), the transaction fails. If at this stage the graphical change has been saved but the alphanumeric change has not, then the two are out of synchronisation. The system must be capable of recognising this, and either rolling back the graphical change which has been made or ensuring the alphanumeric change is also made. It would be extremely difficult to provide this level of data integrity in a system where the graphics are stored in their own data structure. However, DB2 provides function which geoManager exploits to provide exactly this sort of capability. When a group of related transactions are being carried out on the database, they will not be committed until they have all been completed successfully. If the update process fails at any stage then all the updates which have been made will be rolled back so that everything remains synchronised.

Data integrity issues also arise if relationships exist between objects which have been extracted from the database for update and other objects which have not been extracted. The geoManager system handles this by marking such objects as 'partially retrieved', which restricts the type of update operations which can be done on these objects in GPG. For example, if a pipe was extracted which crossed the selected area, and this was connected to pipes which were outside the selected area and therefore not extracted, then that pipe would be marked as partially retrieved. GPG would then not allow the ends of that pipe to be moved, for example, since that would cause the integrity of the network connectivity to be lost.

Concurrent update

A key issue in a large database with many users is the management of concurrent update problems. This is an area where GIS poses some more complex problems than simple alphanumeric applications. The native locking mechanism in standard relational database systems is based on the 'short transaction'. This means that if an application tries to access a record which is locked, the database management system will wait until the lock is released and then return the data to the application. The underlying assumption is that the application which has locked the data will only do so for a short time, a few seconds at most. However, a record which is extracted from the GFIS database for update may be checked out for hours or even days, so this approach is not appropriate.

Issues relating to such 'long transactions' are discussed by Newell and Easterfield (1990). They propose a solution based on version management, which is essentially an 'optimistic approach' which allows multiple users to concurrently update their own copies of a set of data and store both these versions in the database, on the assumption that in the majority of cases there will probably not be any conflict between versions. This approach allows great flexibility, but also has some risks. The main risk, as they acknowledge, is that the amount of work submitted before a conflict is discovered could be large. One of the implicit assumptions underlying optimistic approaches to concurrency control is that it should be relatively easy to modify and re-submit an update job if a conflict occurs. In GIS this is not generally the case, and weeks' work could be wasted. In the worst case it could be that important decisions had been made on the basis of inconsistent information. This may be an unacceptable risk in many organisations.

The geoManager locking mechanism

A different approach is taken by geoManager, which provides some of the features of version management but which does not incur the risks just mentioned. Whenever a user requests some data from geoManager to be put into an IFF, he is asked whether he wishes to update any of this data. If this is the case, then he is asked to indicate which objects he wishes to update. The specified objects within the extraction area are then marked by geoManager as checked out for update (a special status attribute in each object, which is described in more detail below, is used for this purpose). The user is asked to give details of the nature of his or her update for the benefit of other users. Other users are still free to extract this data in read only mode, but are not allowed to extract any objects for update which are already checked out. If an attempt is made to extract an object which is already checked out for update, then the requestor can find out who has checked the object out and obtain any information which was entered about the nature of the update. The requestor can then contact the person who has checked out the objects if necessary.

The fact that the database is object-based means that the minimum possible amount of data can be locked in order to try to minimise contention. In order to further reduce the possibility of contention, geoManager provides the capability of carrying out short geographic transactions using geoManager Graphic Analysis. When an extract is made to Graphic Analysis Format, only graphics are extracted, together with a key for each object which refers to the appropriate set of attributes in the database. No objects are locked at this stage. Using Graphic Analysis the user is restricted to performing attribute update operations, which can be handled as short transactions. A user can point to an object on the screen whose attributes are to be updated. The attributes for this object are then fetched from the database and the object is marked as checked out (assuming of course that it is not already checked out by someone else, in which case the user can view the attributes but not update them). The user can then make any updates and these are returned to the database and the object is checked in again. This ability to handle both long and short transactions is extremely useful in minimising contention. Furthermore, the same checking out mechanism can also be used by other applications to enable them to directly update data stored in geoManager, while ensuring that data integrity is maintained (this is discussed in more detail in the section on application integration).

Version management

The GFIS approach also allows a degree of version management at two levels. First, a GPG user who has extracted an IFF for update can save a copy of this data in GPG workspace format in his own personal storage. He can then create various alternative designs and save each of those in separate workspaces. He could go on and create further alternatives from any of these alternatives, creating a set of workspaces which correspond conceptually to the 'version tree' described by Newell and Easterfield. Each extra workspace duplicates data, but this will only be temporary. These workspaces are all accessible only by this user, or by any other user he chooses to pass them to. The only information the database has about this transaction at this stage is the set of objects which were originally checked out by the user. When the user decides which alternative should be passed back to the database he creates an IFF from the appropriate workspace and passes it back to geoManager.

It is at the stage of updating the database that the second level of version management can be used. When database updates are made they can either be done immediately, or they can be denoted as 'pending'. For example, if a new housing estate had been designed but would not be completed for a year then the updates showing the layout of the estate could be put into the database as pending. When pending work is put into the database, new objects and any modifications to old objects are stored in addition to all the old objects in that area. When a user extracts data from geoManager he can choose to see only current objects, only pending objects, or current and pending objects together (in which case any current objects which were updated by the pending job will be extracted in their new form). The user who put the pending work into the database can subsequently either cancel the pending work, in which case all the original objects are left unchanged, or commit the pending work, in which case the pending work becomes current and any changes to the original objects are made.

This user can also make modifications to the pending work, but only a single level of pending work is maintained in the database. In this way geoManager provides a form of version management at a database level which is simpler than that proposed by Newell and Easterfield but easier to manage.

Pending work is marked as such in the database using the same status field which geoManager uses for indicating whether an object is checked out. This field can take the following values:

• ' ' - current

• 'C' - checked out for update

• 'P' - pending

• 'A' - current but affected by pending work

Only current objects (which are unaffected by pending work) can be checked out for update. The status field will then be changed from ' ', to 'C'. If an object is checked back in again directly the status field willbe reset to ' '. If a set of objects is checked back in as pending then updates will be treated as follows. If an object is to be deleted then its status will be set to 'A'. If an object is to be modified then the original object record will have its status set to 'A' and another record will be added with details of the modified object with a status of 'P'. If a new object is to be added then a new record will be inserted with a status of 'P'. Separate tables maintain lists of all the objects which belong to each pending retrieval set. To cancel a pending retrieval set all the objects in that set with status 'P' are deleted, and all those with status 'A' have this changed to ' '. To commit a pending retrieval set, all objects with status 'A' are deleted and all objects with status 'P' have this changed to ' '.

The three types of retrieval mentioned earlier will retrieve objects with the following status values:

• Current objects only - ' ', 'C' or 'A'.

• Pending objects only - 'P'.

• Current and pending objects - ' ', 'C' or 'P'.

Only objects with a status of ' ', can be checked out for update.

Summary

In summary, geoManager provides a method of locking at object level to prevent concurrent update. Contention problems are minimised in the following ways:

• Locking at object level minimises the amount of data which needs to be locked for any transaction.

• Read only access is permitted to locked objects.

• Attribute updates ean be handled as short transactions using geoManager Graphic Analysis.

• When contention does occur, the user who has checked out the objects which have caused the contention can be identified, so that the two users involved can try to resolve the situation if necessary.

In addition, geoManager provides the ability to store pending work in the database alongside current work, so that users can be aware of work which is in progress.

Data sharing and integration between applications

One of the main advantages of a relational database is that it is possible for multiple applications to access common data in a very flexible way. Since all geoManager data is stored in standard DB2 tables it can be easily accessed either by other applications or by flexible query tools such as QMF (Query Management Facility). This makes it possible to query attribute data stored in the GIS from any alphanumeric terminal within an organisation. There are no real issues with regard to other applications reading the geoManager data, but when it comes to updating the data then issues relating to long transactions arise, as was discussed in the section on concurrent update. Since the updates are being done to geographic data, which could be undergoing update by a long transaction via geoManager, it is necessary for the updating applications to recognise the geoManager status field and only update objects which are not checked out for update. Provided this rule is followed it is possible for other applications to directly update attribute data in the geoManager tables. Other applications should not, in general, directly insert records into, or delete records from, tables which are controlled by geoManager. This is because information about an object and its relationships is stored in multiple tables and to ensure data integrity the addition or deletion of objects should be done via geoManager. These sort ot" complex data integrity constraints are known as semantic integrity constraints, which cannot be enforced automatically within the relational database model itself, but must be enforced through applications (see Elmasri and Navathe, 1989). This is the sort of area where object-oriented databases, which are currently the subject of much research, may be able to give the application developer, and database administrator, more freedom in the future, by allowing such semantic integrity constraints to be stored within the database itself as 'rules' or 'methods'.

lf non-GIS applications are going to be doing most of the updating on certain tables then it is possible to leave these tables outside the control of geoManager (but in the same database system) and just access the data as appropriate from geoManager or GPG. For example, one would probably keep a customer database outside geoManager but refer to this by storing customer reference numbers in appropriate objects within geoManager.

This is one area where a non-database approach which used a relational database to store all its alphanumeric data could provide a similar level of integration for alphanumeric data. The issues which have already been mentioned still need to be addressed, such as the management of long transactions. There is a danger if an appropriate mechanism is not put in place to manage these that a non-GIS application could change an attribute value during the course of a long transaction, in a way which would conflict with what the GIS user was doing.

In the area of 'corporate GIS', geoManager Graphic Analysis gives significant benefits by allowing graphical data to be accessed by a very large number of users. Any user with a standard business graphics screen can view the graphics for a selected area, update related alphanumeric data, and perform simple forms of analysis. This type of facility would be much harder to implement if the graphics were not also stored in the database, especially as one starts to move to a distributed environment, as described in the next section.

Figure 2 shows an overview of some applications which could be integrated with GFIS in a typical environment. TSO, IMS, and CICS are all access methods which can access the DB2 database using the SQL query language. QMF (Query Management Facility) is a flexible query and reporting tool which enables users to analyse data using either the SQL or QBE (Query By Example) languages. AS (Application System) is a Decision Support Tool which, in addition to query and reporting features, provides functions in other areas including business planning, financial modelling, business graphics, project control and management and statistical analysis. AS can run directly against the DB2 database, or against a flat file such as the geoManager Open Format. The latter option is particularly useful for doing analysis on data within a specified geographic area using AS. Any maps, showing thematic information if appropriate, which are produced in GPG or geoManager Graphic Analysis, can be output in a standard graphical format which can be imbedded in AS reports if desired. These reports, including graphics, can be circulated to any number of people using standard office products. This is just a brief overview of some of the ways that other standard products can use the GIS data without having to write any special interfaces, because it is stored in a standard database management system.

Distributed database

The development of database software which can manage a database spread across multiple machines is something which major database vendors, such as IBM, are putting great efforts into. Various benefits come from being able to do this, such as being able to store each district's data locally for improved performance and availability, whilst still being able to access data from other districts in a transparent way (see Elmasri and Navathe for a more detailed discussion). Providing such distributed capability, whilst still maintaining the same function to manage data integrity and other issues, is an extremely complex task. For example, the rollback capability mentioned earlier must be able to work across multiple machines. If a complex transaction makes one update on one machine and is about to make a related update on a second machine when the transaction fails (for any reason, such as the failure of either database system or a network link), then the system must ensure that synchronisation is maintained between both machines.

The ability to provide a distributed database, with this sort of integrity, is perhaps the biggest single argument in favour of storing all aspects of geographic data in a standard database management system. The complexities of producing a true distributed database management system are such that it is difficult to see how any GIS developer could justify taking on this task independently. If a non-database approach is taken and some data is stored outside the database management system then it is not possible for the GIS to exploit any distributed function provided by the database management system.

Since geoManager stores all data in DB2 it can automatically take advantage of any distributed function in DB2 as it appears. There is already some basic distributed function in the latest version of DB2, and it is IBM's intention to greatly enhance this in future versions, first in terms of distribution across multiple DB2 systems on mainframe computers connected over a network, and ultimately including SQL databases on a whole range of hardware platforms from mainframes to PS/2s, within the framework of IBM's Systems Application Architecture (for further details see IBM, 1988).

Performance

As was stated earlier, the main reason for storing some data in a separate database is to optimise performance for graphical display or certain kinds of geographic analysis. It has also been pointed out already that these sort of operations are not done directly against data stored in DB2 by geoManager. Instead, data is extracted from geoManager into a format suitable for interactive display and analysis by GPG, geoManager Graphic Analysis or other GIS packages. Therefore the only performance question which needs to be examined in comparing the geoManager approach with a non-database approach is that of the time taken to retrieve a geographic area from the moment it is initially requested.

Clearly by using a non-database approach and storing the graphics in a specialised format suitable for immediate display, it is possible to achieve a very fast response to certain types of request. For example, if the data is stored as a set of regular map tiles then it should be possible to satisfy a request for a specific map from this set very quickly. The same is true for a request to view an area covering a small number of these maps. In an application where the data viewed is generally quite standard (for example, where it can be specified as one or more of a number of layers), and this is generally viewed at a reasonably constant scale, then this approach is likely to give good performance in terms of elapsed time from initial request to initial display.

In a corporate system rather more flexibility is required because different applications will need to display and analyse data at greatly differing scales. While some applications may need very detailed data at a large scale, other applications may require a sparse set of data, perhaps qualified by attributes, across a much wider area (for example, one might want to retrieve all crimes committed between 10 pm and midnight on a Friday or Saturday night during the last six months in county X). With a map-based (and possibly layered) system it is likely to be rather difficult to create this type of data set. This is where an object-based system has an advantage, since this sort of request is easy to satisfy with an object-based approach. Of course the extra flexibility provided by an object-based approach does not come for free, since in order to display a map at any scale it is necessary to retrieve each individual object in the requested area, rather than just accessing a single predefined file.

Because of this, the initial retrieval of a geographic area at a large scale in an object-based system is likely to be slower than one can achieve with a map-based approach.

Spatial indexing in geoManager

The importance of performance issues has been recognized in the design of geoManager, and a major design point has been to optimise retrieval times from the database. In line with the general philosophy of geoManager, the techniques used have been designed so that they will exploit as much DB2 function as possible, and also take advantage of future enhancements. The key to efficient area retrieval from geoManager is a spatial indexing system based on a quad tree approach. For a general discussion of spatial indexing methods, see Samet (1988) or Vanzella (1988). The specific approach used by geoManager is similar to that described by Abel and Smith (1983), generally known as the Smallest Containing Quad Method or MX-CIF quadtree, with suitable modifications for a relational environment.

A very brief overview of the approach is given here. The area covered by the application is recursively subdivided into quad cells to a predefined level. Each quad cell can be assigned a code as illustrated in Figure 3. A unique key for each quad cell is given by concatenating this code with the level of the quad cell in the tree. The example illustrates this using three levels, but in practice one would use considerably more than this. The spatial index key assigned to each object is given by the key of the smallest quad cell which completely encloses it. The spatial index key is stored as an attribute of every object in the database.

This spatial index key has some important properties which allow us to use it to perform efficient area retrievals from the database. The single most important property is that, in general, objects which are geographically close together in the real world will be assigned spatial index keys which are similar. To exploit this fact, geoManager specifies to DB2 that it should use the spatial index key as a clustering index, which means that it will physically sort the data records on disk in spatial index key order, and try to maintain this physical clustering as far as possible even when records are inserted or deleted. One consequence of doing this is that objects which are geographically close together in the real world are, in general, stored physically close together on disk. This can considerably enhance performance when accessing a number of objects which are geographically close together, because of the way that disk access works. When a request is made to read a record from disk a block of data is read into memory which contains a number of records, including the one which was requested. If a request is made to read another record subsequently, the system checks whether the requested record is already in memory because of a previous disk access. If it is then the record can be returned much more quickly because it can be read far more quickly from memory than from disk This technique can be extended in various ways. One technique is caching, where the disk subsystem stores large pages of data in memory to increase the probability that a requested record will be in memory rather than on disk, and will therefore be able to be accessed much more quickly. DB2 also has the capability of further improving performance using a technique called prefetch, which involves asynchronously reading pages of data in from disk before they are required for processing. The appropriate pages to read in are determined by looking at the index which the query is following. The geoManager spatial index key allows all these methods of improving performance to be used automatically when doing area retrievals from the database.

As objects are inserted and deleted the physical clustering of objects will gradually be lost. Therefore it is necessary to reorganise the tables from time to time to restore the physical clustering. This is a standard function of DB2. The DB2 system allows the database administrator to obtain statistics about each table to enable him to decide when it is appropriate to reorganise that table.

Another property of the spatial index key which helps geoManager perform efficient extractions is that the key for any given quad cell is immediately followed in the ordered list of keys by all its descendents, that is all smaller quad cells entirely contained within it. This means that the quad cells which are candidates for containing objects in the requested retrieval area typically have key values which can be specified as a relatively small number of continuous ranges, rather than a large number of scattered values, which allows the retrieval to be specified using a relatively small number of SQL queries. Furthermore, these queries can be executed efficiently since each one includes a selection qualification which is a continuous range of values of a field which has a clustering index defined on it.

A more detailed discussion of the spatial indexing techniques used by geoManager is really beyond the scope of this article. Perhaps the most important thing to note is that the techniques used are designed to exploit the standard relational database indexing system as far as possible, which means that geoManager will automatically be able to take advantage of future hardware and software developments which enhance the performance of the relational database.

Conclusion

This discussion has looked at a wide range of database issues which apply to all corporate database systems, including corporate Geographic Information Systems. Some ways in which a standard relational database management system can help to tackle these issues have been highlighted, as have areas where additional function must be provided by the GIS. It has been proposed that many of the benefits which can be provided by standard relational database management systems can only be realised by storing all aspects of the geographic data in the database. This applies most of all to providing distributed database function.

An important distinction has been made between the storage and management of geographic data and the manipulation and analysis of geographic data. Many people have claimed that relational databases are not suitable for GIS, but they are normally talking about the manipulation and analysis of geographic data. The main assertion of this article is that commercial relational database management systems can offer significant benefits for the storage and management of geographic information, as outlined with reference to the approach taken by geoManager.

The role of geographic storage and management systems is likely to become increasingly important as GIS develops. An increasing number of specialised GISs are likely to appear which are suitable for specific applications within an organisation, and it will become increasingly unlikely that a single GIS will be suitable for all the manipulation and analysis requirements of an organisation. However, it is vital that all the GIS applications which are used, as well as non-geographic applications, have a means of accessing data from a single consistent database. The trend towards such geographic database management systems is also likely to be encouraged as different organisations seek to solve the problems of using common databases.

REFERENCES

1. Abel, D.J. and Smith, J.L., A Data Structure and Algorithm Based on a Linear Key for a Rectangle Retrieval Problem, Computer Vision, Graphics and Image Processing. Number 24, 1983.

2. Elmasri, Ramez and Navathe, Shamkant B.. Fundamentals of Database Systems, Benjamin/Cummings, Redwood City, California, 1989.

3. IBM UK Programming Announcement ZP88-0472, Distributed Relational Data in Systems Application Architecture, November 1988.

4. Newell. Richard G. and Easterfield, Mark, Version Management the Problem of the Long Transaction, Proceedings of the Mapping Awareness Conference, Oxford, January 1990.

5. Samet. Hanan Hierarchical representations of collections of small rectangles, ACM Computing Surveys, Volume 20, Number 4, December 1988.

6. Vanzella, Luca, Classification of Data Structures for Thematic Data, Technical Report TR 88-14, Department of Computing Science, University of Alberta, June 1988.

The author would like to acknowledge helpful comments given at various stages in the preparation of this article by David Adler, Steven Brown, Paul Cocking, Peter Gee, Wilhelm Kerbl, Scott Kutz, Tim Lloyd and Kurt Mayer.

PETER BATTY is a Systems Engineer in the GIS Market Development Group at IBM, Warwick.

Editor- We are pleased to print this IBM response to the challenges laid down in our earlier database feature and in particular, to the article entitled 'Towards a blueprint for database vendors' which was described as being 'a valuable step towards a database specification'. We will be pleased to publish similar contributions from other suppliers and examples of live applications.

Reprinted from

MAPPING AWARENESS

Published by

MILES ARNOLD, HIGH WINDS, CASSINGTON, OXFORD OX8 1DL

TELEPHONE 0865 880236

MAPPING AWARENESS VOL. 4 NO.6 JULY/AUGUST 1990