The choice of join algorithm is based on the cost estimate. Let us use the following information to elaborate the same:
MARKS (enroll no, subject code, marks):10000 rows, 500 blocks
STUDENT (enroll no, name, dob): 2000 rows, 100 blocks
Let us show the algorithm for the given relations.
To compute the theta or equi-join
for each tuple s in STUDENT
{
for each tuple m in MARKS
{
test s.enrolno = m.enrolno to see if they satisfy the join condition if they do, output joined tuple to the result.
};
};
• In the nested loop join there is one outer relation and one inner relation.
• It does not use or require indices. It can be used with any kind of join condition. However, it is expensive as it examines every pair of tuples in the two relations.
• If there is only enough memory to hold one block of each relation, the number of disk accesses can be calculated as:
For each tuple of STUDENT, all the MARKS tuples (blocks) that need to be accessed.
However, if both or one of the relations fit entirely in the memory, block transfer will be needed only once, so the total number of transfers in such a case, may be calculated as:
= Number of blocks of STUDENT + Number of blocks of MARKS
= 100 + 500 = 600.
If only the smaller of the two relations fits entirely in the memory then use that as the inner relation and the bound still holds.
Cost for the worst case:
Number of tuples of outer relation × Number of blocks of inner relation + Number of blocks of outer relation.
2000 ∗ 500 + 100 = 1,000,100 with STUDENT as outer relation.
There is one more possible bad case when MARKS is on outer loop and STUDENT in the inner loop. In this case, the number of Block transfer will be:
10000 ∗ 100+500 = 1,000,500 with MARKS as the outer relation.
MARKS (enroll no, subject code, marks):10000 rows, 500 blocks
STUDENT (enroll no, name, dob): 2000 rows, 100 blocks
Let us show the algorithm for the given relations.
To compute the theta or equi-join
for each tuple s in STUDENT
{
for each tuple m in MARKS
{
test s.enrolno = m.enrolno to see if they satisfy the join condition if they do, output joined tuple to the result.
};
};
• In the nested loop join there is one outer relation and one inner relation.
• It does not use or require indices. It can be used with any kind of join condition. However, it is expensive as it examines every pair of tuples in the two relations.
• If there is only enough memory to hold one block of each relation, the number of disk accesses can be calculated as:
For each tuple of STUDENT, all the MARKS tuples (blocks) that need to be accessed.
However, if both or one of the relations fit entirely in the memory, block transfer will be needed only once, so the total number of transfers in such a case, may be calculated as:
= Number of blocks of STUDENT + Number of blocks of MARKS
= 100 + 500 = 600.
If only the smaller of the two relations fits entirely in the memory then use that as the inner relation and the bound still holds.
Cost for the worst case:
Number of tuples of outer relation × Number of blocks of inner relation + Number of blocks of outer relation.
2000 ∗ 500 + 100 = 1,000,100 with STUDENT as outer relation.
There is one more possible bad case when MARKS is on outer loop and STUDENT in the inner loop. In this case, the number of Block transfer will be:
10000 ∗ 100+500 = 1,000,500 with MARKS as the outer relation.
No comments:
Post a Comment