Disk Enabled Index Using R Trees Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Spatial data objects often cover areas in multidimensional spaces and are not well represented by point locations. For example, map objects like countries, census tracts etc occupy regions of non-zero size in two dimensions A common operation on spatial data 1s a search for all objects in an area, for example to find all countries that have land within 20 miles of a particular point. This kind of spatial search occurs frequently in computer aided design (CAD) and Geo-data applications, and therefore it is important to be able to retrieve objects efficiently according to their spatial location [1]

In general Spatial objects are indexed through R-trees. A R-tree is a heighted balanced tree such as B+-tree such that all of the leaf nodes contain the pointers to the data objects. Those pointers are also indexed. R-tree can be memory resident as well as disk-resident.

A spatial database consists of a collection of tuples representing spatial objects and each tuple has a unique identifier which can be used to retrieve its leaf nodes in an R-tree contains index record entries of the form (I, tuple-identifier) where tuple-identifier refers to tuple in the database and I is an multidimensional rectangle represents bounding box of the spatial object to be indexed. I=(I0,I1,......,In-1) where n is the number of dimensions.

Related Work

Spatial networks find application in the areas of transportation GIS, network analysis, city planning and others. To effectively use

them, it is essential to store spatial networks inside database systems. This paper shows how a user would conceptually view and make use of spatial networks in a database.

Spatial Networks

Database View

Spatial networks including transportation networks like road networks for cars, buses, and taxis or railway networks, water pipelines and power networks are a ubiquitous spatial concept. The primary purpose of spatial networks is to provide spatially constrained environment for materials to move or flow through them Networks have linear component called channels through which flow of material takes place. These are the roads in a highway network or the pipes in a pipeline network. Each channel has a geometry as well as some thematic information attached to it.

Examples of thematic information are the name of the road or the speed limit of a road in a highway network. It is assumed that a particular network has a set of attributes associated with it and the set defines the type of the network it is. These channels can intersect among themselves to create junction points. Intersection of two roads can be considered as a junction point. Sometimes, the channels cross over or under one another. These points are called crossover points. Bridges over a road are examples of crossover points. Since a network

can have at a minimum of one channel, a single channel can also be considered a spatial network. Spatial networks are directly stored in databases as a stand-alone entity as objects of the table attribute type snet. An entire snet object including the geometry, connectivity and semantics is stored as a single object in the database as a binary large object (blob). The blob has an internal structure which helps quick access and efficient execution of operations on the snet object. This mechanism avoids the need for making objects of snet data type dependent on multiple tables which in turn makes the data type snet a truly first class citizen of the database. The snet data type is described as an abstract data type in this paper so as to provide a user's view of spatial networks in databases without specifying how they are implemented. All user access to a spatial network has to be performed using the operators defined in the interface of snet. This strategy prevents the user from directly accessing and modifying the internals of a spatial networks object thus potentially rendering it inconsistent

Proposed work

This paper focuses on the structures required to build spatial data for index and querying to support disk based access also. Before that there is a need to understand some basic definitions related to spatial data.

What are Spatial Data Types ?

Spatial data types

are special data types needed to model geometry and to suitably represent

geometric data in database systems

• Examples: point, line, region; partitions (maps), graphs (networks)

provide a fundamental abstraction for modelling the geometric structure of objects in

space, their relationships, properties, and operations

are an important part of the data model and the implementation of a spatial DBMS

The definition of Spatial Data Types

is to a large degree responsible for a successful design of spatial data models

decisively affects the performance of spatial database systems

exerts a great influence on the expressiveness of spatial query languages

should be independent from the data model used by a DBMS

An understanding of SDTs is a prerequisite for an effective construction of important components of a spatial database system, spatial index structures, optimizers for spatial data, spatial query languages, storage management, graphical user interfaces for cooperation with extensible DBMS providing spatial type extension packages


Query Processing in Road Network Data

It is expensive to perform spatial operations (e.g., intersect,

contain) on real spatial data. Thus, simpler structure that approximates the objects are used: Minimum Bounding Rectangle or circle

Multi-step spatial query processing


The spatial index prunes the search space to a set of candidates

Based on the approximations of candidates, some of the false hits can be further filtered away

Finally, the actual objects are examined to identify those that match the query

The multi-step strategy can effectively reduce the number of pages accessed, the number of data to be fetched and tested and the computation time through the approximations

- Types of spatial queries

• Spatial selection: point query, range(window) query

• Spatial join between two or more different entities sets

Taxonomy of spatial indexes

Classification of spatial indexes

The transformation approach

The non-overlapping native space indexing approach

The overlapping native space indexing approach

In this paper we are considered the last approach, The Overlapping native space indexing using R-Trees

The structure of the R-tree

R-trees are tree data structures used for spatial access methods, i.e., for indexing multidimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984[1] and has found significant use in both theoretical and applied contexts.[2] A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc. and then find answers quickly to queries such as "Find all museums within 2 km of my current location", "retrieve all road segments within 2 km of my location" or "find the nearest gas station"

Illustration 1: Structure of R-Tree

The management of rectangles in external data structures is more difficult than that of points because rectangles, unlike points, generally do not fall into a unique cell of a partition, but intersect partition boundaries. There are three solutions for this problem:

The transformation approach: Instead of k-dimensional rectangles, we store 2k-dimensional points, using a point data structure.

Overlapping regions: Partitioning space is abandoned; bucket regions may overlap.

Clipping: We keep partitioning space; if a rectangle intersects partition boundaries it is clipped into several pieces and represented within each cell that it intersects.

The transformation approach. A rectangle, represented by four coordinates (xleft, xright, ybottom, ytop), can be regarded as a point in four dimensions. The various types of queries then map to regions of the 4d-space. This approach is usually illustrated by the case of intervals mapped into 2d-space.

It is a multiway tree, like the B-tree, and stores in each node a set of rectangles. For the leaves, these are the rectangles of the set R to be represented. For an internal node, each rectangle is associated with a pointer to a son p and represents the bucket region of p which is the bounding box of all rectangles represented within p. For example, in Figure 11 the root node contains a rectangle A which is the bounding box of the rectangles D, E, and F stored in the son associated with A. Rectangles may overlap; hence, a rectangle can intersect several bucket regions but will be represented only in one of them. An advantage is that a spatial object can be kept in just one bucket. A problem is that search needs now to branch and follow several paths whenever one is interested in a region lying in the overlap of two son regions. To keep search efficient, it is crucial to minimize the overlap of node regions.

Illustration 2: Branch and Bound Algorithm

BB is called with N being the root node of D . If N is a non-leaf node. Lines 3-5 compute the scores T (e) for non-leaf entries e concurrently. Recall that T (e) is an upper bound score for any point in the subtree of e . The techniques for computing T (e) will be discussed shortly. Like Equation 3, with the component scores Tc(e) known so far, we can derive T+(e) , an upper bound of T (e) . If T+(e) µ , then the subtree of e cannot contain better results than those in Wk and it is removed from V . In order to obtain points with high scores early, we sort the entries in descending order of T (e) before invoking the above procedure recursively on the child nodes pointed by the entries in V . If N is a leaf node, we compute the scores for all points of N concurrently and then update the set Wk of the top-k results. Since both Wk and are global variables, the value of is updated during recursive call of BB.

This structure of R-Tree is extended to support disk based access in case of out of memory problem occur. The R-Tree is developed in such a way, first it will store the data records those can fit in main memory. If the number of records' size exceeds the size of the main memory. Then the current R-Tree structure is saved in disk and loaded into the memory based on the query requirement.

Experimental Results

We are used Real Datasets for Spatial Databases: Road Networks and Points of Interest from http://www.cs.fsu.edu/~lifeifei/SpatialDataset.htm.

California Road Network's Nodes (Node ID, Longitude, Latitude)

California Road Network's Edges (Edge ID, Start Node ID, End Node ID, L2 Distance)


Many interesting issues related to spatial database systems could not be included in this survey, for example:

• spatio-temporal modelling

• spatial objects with imprecise boundaries

• multi-scale modelling/cartographic generalization

• data lineage (maintaining information about precision, collection method etc. of data)

• spatial reasoning/deductive spatial databases

Integrating solutions to such problems with the spatial database technology described here will remain a fascinating challenge for database researchers for quite some time