IGNOU University Help Center and Study material : Represent the relational algebra expression in Query processing Optimisation

A relational algebra expression may have many equivalent expressions.
For example,

σ (salary < 5000) (πsalary (EMP)) is equivalent to πsalary (σsalary <5000 (EMP)).

Each relational algebraic operation can be evaluated using one of the several different algorithms. Correspondingly, a relational-algebraic expression can be evaluated in many ways.

An expression that specifies detailed evaluation strategy is known as evaluation-plan, for example, you can use an index on salary to find employees with salary < 5000 or we can perform complete relation scan and discard employees with salary ≥ 5000. The basis of selection of any of the scheme will be the cost.

Query Optimisation: Amongst all equivalent plans choose the one with the lowest cost. Cost is estimated using statistical information from the database catalogue, for example, number of tuples in each relation, size of tuples, etc.
Thus, in query optimisation we find an evaluation plan with the lowest cost. The cost estimation is made on the basis of heuristic rules.

Measure of Query Cost :Cost is generally measured as total elapsed time for answering the query. There are many factors that contribute to time cost. These are disk accesses, CPU time, or even network communication.
Typically disk access is the predominant cost as disk transfer is a very slow event and is also relatively easy to estimate. It is measured by taking into account the following activities:

Number of seeks × average-seek-cost
Number of blocks read × average-block-read-cost
Number of blocks written × average-block-written-cost.

Please note that the cost for writing a block is higher than the cost for reading a block. This is due to the fact that the data is read back after being written to ensure that the write was successful. However, for the sake of simplicity we will just use number of block transfers from disk as the cost measure. We will also ignore the difference in cost between sequential and random I/O, CPU and communication costs. The I/O cost depends on the search criteria i.e., point/range query on an ordering/other fields and the file structures: heap, sorted, hashed. It is also dependent on the use of indices such as primary, clustering, secondary, B+ tree, multilevel, etc. There are other cost factors also, these may include buffering, disk placement, materialisation, overflow / free space management etc.
In the subsequent section, let us try to find the cost of various operations.

IGNOU University Help Center and Study material

Pages

From Twitter

Represent the relational algebra expression in Query processing Optimisation

No comments:

Post a Comment

Labels

Blog Archive