IGNOU University Help Center and Study material : July 2014

Relational Database Design Tool and Mapping of 4NF And 5NF and ENHANCED ER TOOLS (EER) model Tool Generalisation

ENHANCED ER TOOLS

ER modeling concepts the EER model includes:
1) Subclass and Super class
2) Inheritance
3) Specialisation and Generalisation.
To describe the concepts of subclass and super class first let us revisit the concept of
‘entity’. The basic object that an E-R model represents is an entity, which is a “thing”
in the real world with an independent existence. An entity may be an object with a
physical existence or it may be an object with a conceptual existence. Each entity has
attributes (the particular properties that describe it).

Example : the entity vehicle describes the type (that is, the attributes and
relationship) of each vehicle entity and also refers to the current set of vehicle entities
in the showroom database. Some times to signify the database application various
meaningful sub-groupings of entity is done explicitly. For example, the members of
the entity vehicle are further meaningfully sub- grouped as: Car, Scooter, truck and so
on.
The set of entities in each of the groupings is a subset of the entities that belongs to
the entity set vehicle. In other words every sub-grouping must be vehicle. Therefore,
these sub-groupings are called a subclass of the vehicle entity type and the vehicle
itself is called the super class for each of these subclasses.

The relationship between a super class and any of its subclasses is called
class/subclass relationship. It is often called an IS-A or relationship because of the
way we refer to the concept, we say, “car is-a vehicle”. The member entity of the
subclass represents the same real world as the member entity of the super class. If
an entity is a member of a subclass, by default it must also become a member of
the super class whereas it is not necessary that every entity of the super class must
be a member of its subclass. From the discussion above on sub/super classes we
can say that an entity that is a member of a subclass inherits all the attributes of
9
Relational Database
Design
the entity as a member of the super class. Notice that the type of an entity is
defined by the attributes it possesses and the relationship types in which it
participates; therefore, the entity also inherits all the relationships in which the
super class participates. According to inheritance the subclass has its own
attributes and relationships together with all attributes and relationships it inherits
from the super class.

The process of defining the subclasses of an entity type is called specialisation,
where the entity type is called the super class of the specialisation. The above said
specialised set of subclasses are defined on the basis of some common but
distinguishing characteristics of the entities in the super class. For example, the set
of subclasses (car, scooter, truck) is a specialisation of the super class vehicle that
distinguished among vehicles entities based on the vehicle type of each entity. We
may have several other specialisations of the same entity type based on different
common but distinctive characteristics. Figure 2 shows how we can represent a
specialisation with the help of an EER diagram.

The subclasses that define a specialisation are attached by lines to a circle, which
is connected further with the super class. The circle connecting the super class
with the subclass indicates the direction of the super class/ subclass relationship.
The letter ‘d’ in the circle indicates that all these subclasses are disjoint
constraints.

Attributes that apply only to entities of a particular subclass – such as mileage of
car, stock of scooter and capacity of truck are attached to the rectangle
representing that subclass. Notice that an entity that belongs to a subclass
represents the same real-world entity as the entity connected to super class, even
though the same entity is shown twice − one in the subclass and the other in the
super class. A subclass is defined in order to group the entities to which these
attributes apply. The members of a subclass may still share the majority of their
attributes with the other members of the super class (as shown in Figure 3).

Hence the specialisation is a set of subclasses of an entity type, which establishes
additional specific attributes with each subclass and also establishes additional
specific relationship types between each subclass and other entity types or other
subclasses.

write short note on Merge-Join

The choice of join algorithm is based on the cost estimate. Let us use the following information to elaborate the same:
MARKS (enroll no, subject code, marks):10000 rows, 500 blocks
STUDENT (enroll no, name, dob): 2000 rows, 100 blocks

The merge-join is applicable to equi-joins and natural joins only. It has the following process:

1) Sort both relations on their join attribute (if not already sorted).

2) Merge the sorted relations to join them. The join step is similar to the merge stage of the sort-merge algorithm, the only difference lies in the manner in which duplicate values in join attribute are treated, i.e., every pair with same value on join attribute must be matched

The number of block accesses:

Each block needs to be read only once (assuming all tuples for any given value of the join attributes fit in memory). Thus number of block accesses for merge-join is:

Blocks of STUDENT + Blocks of MARKS + the cost of sorting if relations are unsorted.

Hybrid Merge-Join
This is applicable only when the join is an equi-join or a natural join and one relation is sorted and the other has a secondary B+-tree index on the join attribute.
The algorithm is as follows:
Merge the sorted relation with the leaf entries of the B+-tree. Sort the result on the addresses of the unsorted relation’s tuples. Scan the unsorted relation in physical address order and merge with the previous results, to replace addresses by the actual tuples. Sequential scan in such cases is more efficient than the random lookup method.

write sort note on Indexed Nested-Loop Join

The choice of join algorithm is based on the cost estimate. Let us use the following information to elaborate the same:
MARKS (enroll no, subject code, marks):10000 rows, 500 blocks
STUDENT (enroll no, name, dob): 2000 rows, 100 blocks

Index scans can replace file scans if the join is an equi-join or natural join, and an index is available on the inner relation’s join attribute.

For each tuple si in the outer relation STUDENT, use the index to look up tuples in MARKS that satisfy the join condition with tuple si. In a worst case scenarios, the buffer has space for only one page of STUDENT, and, for each tuple in MARKS, then we should perform an index lookup on MARKS index.

Worst case: Block transfer of STUDENT+ number of records in STUDENT * cost of searching through index and retrieving all matching tuples for each tuple of STUDENT.

If a supporting index does not exist than it can be constructed as and when needed.
If indices are available on the join attributes of both STUDENT and MARKS, then use the relation with fewer tuples as the outer relation.

Example of Index Nested-Loop Join Costs

Compute the cost for STUDENT and MARKS join, with STUDENT as the outer relation. Suppose MARKS has a primary B+-tree index on enroll no, which contains 10 entries in each index node. Since MARKS has 10,000 tuples, the height of the tree is 4, and one more access is needed to the actual data.

The STUDENT has 2000 tuples. Thus, the cost of indexed nested loops join as:

100 + 2000 * 5 = 10,100 disk accesses

Write Short note on Block Nested Loop Join

The choice of join algorithm is based on the cost estimate. Let us use the following information to elaborate the same:
MARKS (enroll no, subject code, marks):10000 rows, 500 blocks
STUDENT (enroll no, name, dob): 2000 rows, 100 blocks

This is a variant of nested-loop join in which a complete block of outer loop is joined with the block of inner loop.
The algorithm for this may be written has:
for each block s of STUDENT
{
     for each block m of MARKS
      {
         for each tuple si in s
           {
              for each tuple mi in m
                {
                   Check if (si and mi) satisfy the join condition if they do output joined tuple to the result
                 };
           };
      };
};

Worst case of block accesses in this case = Number of Blocks of outer relation (STUDENT) ×Number of blocks of inner relation (MARKS) + Number of blocks of outer relation (STUDENT).

Best case: Blocks of STUDENT + Blocks of MARKS

Number of block transfers assuming worst case:

100*500 + 100 = 50,100 (much less than nested-loop join)

Number of block transfers assuming best case:

400 + 100 = 500 (same as with nested-loop join)

Improvements to Block Nested-Loop Algorithm
The following modifications improve the block Nested method:

Use M – 2 disk blocks as the blocking unit for the outer relation, where M = memory size in blocks.

Use one buffer block to buffer the inner relation.

Use one buffer block to buffer the output.

This method minimizes the number of iterations.

Write a short Note On Nested loop join

The choice of join algorithm is based on the cost estimate. Let us use the following information to elaborate the same:
MARKS (enroll no, subject code, marks):10000 rows, 500 blocks
STUDENT (enroll no, name, dob): 2000 rows, 100 blocks

Let us show the algorithm for the given relations.
To compute the theta or equi-join
for each tuple s in STUDENT
{
   for each tuple m in MARKS
     {
         test s.enrolno = m.enrolno to see if they satisfy the join condition if they do, output joined tuple to the   result.
};
};
• In the nested loop join there is one outer relation and one inner relation.
• It does not use or require indices. It can be used with any kind of join condition. However, it is expensive as it examines every pair of tuples in the two relations.
• If there is only enough memory to hold one block of each relation, the number of disk accesses can be calculated as:
For each tuple of STUDENT, all the MARKS tuples (blocks) that need to be accessed.

However, if both or one of the relations fit entirely in the memory, block transfer will be needed only once, so the total number of transfers in such a case, may be calculated as:
= Number of blocks of STUDENT + Number of blocks of MARKS
= 100 + 500 = 600.

If only the smaller of the two relations fits entirely in the memory then use that as the inner relation and the bound still holds.

Cost for the worst case:
Number of tuples of outer relation × Number of blocks of inner relation + Number of blocks of outer relation.
2000 ∗ 500 + 100 = 1,000,100 with STUDENT as outer relation.

There is one more possible bad case when MARKS is on outer loop and STUDENT in the inner loop. In this case, the number of Block transfer will be:

10000 ∗ 100+500 = 1,000,500 with MARKS as the outer relation.

SELECT OPERATION in Query Processing

The selection operation can be performed in a number of ways. Let us discuss the algorithms and the related cost of performing selection operations.

File scan: These are the search algorithms that locate and retrieve records that fulfil a selection condition in a file. The following are the two basic files scan algorithms for selection operation:

1) Linear search: This algorithm scans each file block and tests all records to see whether they satisfy the selection condition.
The cost of this algorithm (in terms of block transfer) is defined as:

Cost of searching records satisfying a condition = Number of blocks in a database = Nb.

Cost for searching a key attribute value = Average number of block transfer for locating the value
(on an average, half of the file needs to be traversed) so it is = Nb/2. Linear search can be applied regardless of selection condition or ordering of records in the file, or availability of indices.

2) Binary search: It is applicable when the selection is an equality comparison on the attribute on which file is ordered. Assume that the blocks of a relation are stored continuously then, cost can be estimated as:

Cost = Cost of locating the first tuple by a binary search on the blocks + sequence of other blocks that continue to satisfy the condition

=  log2 (Nb) + (number average of tuples with the same value / Blocking factore Number of Tuples in a block of the relations)

These two values may be calculated from the statistics of the database.
Index scan: Search algorithms that use an index are restricted because the selection condition must be on the search-key of the index.

These two values may be calculated from the statistics of the database.

Index scan: Search algorithms that use an index are restricted because the selection condition must be on the search-key of the index

Represent the relational algebra expression in Query processing Optimisation

A relational algebra expression may have many equivalent expressions.
For example,

QUERY PROCESSING AND EVALUATION

The Query Language – SQL is one of the main reasons of success of RDBMS. A user just needs to specify the query in SQL that is close to the English language and does not need to say how such query is to be evaluated. However, a query needs to be evaluated efficiently by the DBMS. But how is a query-evaluated efficiently? This unit attempts to answer this question. The unit covers the basic principles of query evaluation, the cost of query evaluation, the evaluation of join queries, etc. in detail. It also provides information about query evaluation plans and the role of storage in query evaluation and optimisation. This unit thus, introduces you to the complexity of query evaluation in DBMS.

let us begin by defining query processing. Let us take a look at the Figure

In the first step Scanning, Parsing, and Validating is done to translate the query into its internal form. This is then further translated into relational algebra (an intermediate query form). Parser checks syntax and verifies relations. The query then is optimised with a query plan, which then is compiled into a code that can be executed by the database runtime processor.
We can define query evaluation as the query-execution engine taking a query-evaluation plan, executing that plan, and returning the answers to the query. The query processing involves the study of the following concepts:
• how to measure query costs?
• algorithms for evaluating relational algebraic operations.
• how to evaluate a complete expression using algorithms on individual operations?

IGNOU University Help Center and Study material

Pages

From Twitter