The choice of join algorithm is based on the cost estimate. Let us use the following information to elaborate the same:
MARKS (enroll no, subject code, marks):10000 rows, 500 blocks
STUDENT (enroll no, name, dob): 2000 rows, 100 blocks
The merge-join is applicable to equi-joins and natural joins only. It has the following process:
1) Sort both relations on their join attribute (if not already sorted).
2) Merge the sorted relations to join them. The join step is similar to the merge stage of the sort-merge algorithm, the only difference lies in the manner in which duplicate values in join attribute are treated, i.e., every pair with same value on join attribute must be matched
The number of block accesses:
Each block needs to be read only once (assuming all tuples for any given value of the join attributes fit in memory). Thus number of block accesses for merge-join is:
Blocks of STUDENT + Blocks of MARKS + the cost of sorting if relations are unsorted.
Hybrid Merge-Join
This is applicable only when the join is an equi-join or a natural join and one relation is sorted and the other has a secondary B+-tree index on the join attribute.
The algorithm is as follows:
Merge the sorted relation with the leaf entries of the B+-tree. Sort the result on the addresses of the unsorted relation’s tuples. Scan the unsorted relation in physical address order and merge with the previous results, to replace addresses by the actual tuples. Sequential scan in such cases is more efficient than the random lookup method.
MARKS (enroll no, subject code, marks):10000 rows, 500 blocks
STUDENT (enroll no, name, dob): 2000 rows, 100 blocks
The merge-join is applicable to equi-joins and natural joins only. It has the following process:
1) Sort both relations on their join attribute (if not already sorted).
2) Merge the sorted relations to join them. The join step is similar to the merge stage of the sort-merge algorithm, the only difference lies in the manner in which duplicate values in join attribute are treated, i.e., every pair with same value on join attribute must be matched
The number of block accesses:
Each block needs to be read only once (assuming all tuples for any given value of the join attributes fit in memory). Thus number of block accesses for merge-join is:
Blocks of STUDENT + Blocks of MARKS + the cost of sorting if relations are unsorted.
Hybrid Merge-Join
This is applicable only when the join is an equi-join or a natural join and one relation is sorted and the other has a secondary B+-tree index on the join attribute.
The algorithm is as follows:
Merge the sorted relation with the leaf entries of the B+-tree. Sort the result on the addresses of the unsorted relation’s tuples. Scan the unsorted relation in physical address order and merge with the previous results, to replace addresses by the actual tuples. Sequential scan in such cases is more efficient than the random lookup method.
No comments:
Post a Comment