Computer Science Lecturer Test Preparation (Part-3)


Normalization:

Normalization rules are divided into the following normal forms.

  1. First Normal Form
  2. Second Normal Form
  3. Third Normal Form
  4. Boyce-Codd Normal Form (BCNF)

First Normal Form

As per First Normal Form, no two rows of data may contain a repeating group of information, i.e. each set of columns must have a unique value, such that multiple columns cannot be used to fetch the same row. Each table should be organized into rows, and each row should have a primary key that distinguishes it as unique.

The primary key is usually a single column, but sometimes more than one column can be combined to create a single primary key. For example, consider a table that is not in First Normal Form.

In First Normal Form, a row must not have any column in which more than one value is saved, e.g. values separated by commas. Instead, such data must be separated into multiple rows. In First Normal Form, data redundancy increases, as the same values are repeated across multiple rows, but each row as a whole is unique.
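As a rough illustration, here is a minimal Python sketch of this conversion; the table, column and student names are invented for the example:

    # Hypothetical student table violating 1NF: the "subjects" column
    # stores multiple comma-separated values in a single cell.
    non_1nf = [
        {"student": "Adam", "age": 15, "subjects": "Biology, Maths"},
        {"student": "Alex", "age": 14, "subjects": "Maths"},
    ]

    # To reach 1NF, split every multi-valued cell into its own row so
    # that each column holds a single atomic value.
    first_nf = [
        {"student": row["student"], "age": row["age"], "subject": s.strip()}
        for row in non_1nf
        for s in row["subjects"].split(",")
    ]

    for row in first_nf:
        print(row)
    # Adam's age now appears in two rows (the redundancy mentioned
    # above), but each row as a whole is unique.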

Second Normal Form

As per Second Normal Form, there must not be any partial dependency of any column on the primary key. This means that, for a table with a concatenated (composite) primary key, each column that is not part of the primary key must depend on the entire concatenated key for its existence. If any column depends on only one part of the concatenated key, the table fails Second Normal Form.

In the example for First Normal Form there are two rows for Adam, to include the multiple subjects that he has opted for. While this is searchable and follows First Normal Form, it is an inefficient use of space. Also, in that table the candidate key is {Student, Subject}, yet the student's Age depends only on the Student column, which is incorrect as per Second Normal Form. To achieve Second Normal Form, split the subjects out into an independent table and match them up using the student names as foreign keys, as in the sketch below.
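A minimal Python sketch of that decomposition, with invented names; the Age column, which depends only on Student, moves to its own table so every remaining non-key column depends on the entire key:

    # 1NF table whose candidate key is {student, subject}; "age" depends
    # only on "student", a partial dependency that violates 2NF.
    first_nf = [
        {"student": "Adam", "age": 15, "subject": "Biology"},
        {"student": "Adam", "age": 15, "subject": "Maths"},
        {"student": "Alex", "age": 14, "subject": "Maths"},
    ]

    # Decompose: one table keyed by student alone, and one that records
    # the student/subject pairs (student acting as the foreign key).
    students = {row["student"]: row["age"] for row in first_nf}
    enrolments = [{"student": r["student"], "subject": r["subject"]}
                  for r in first_nf]

    print(students)    # {'Adam': 15, 'Alex': 14}
    print(enrolments)  # every non-key column depends on the whole key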

Third Normal Form (3NF)

Third Normal Form requires that every non-prime attribute of a table be dependent on the primary key directly; in other words, there should be no case where a non-prime attribute is determined by another non-prime attribute. Such a transitive functional dependency must be removed from the table, and the table must also be in Second Normal Form.
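As a sketch with invented columns: in an employee table where emp_id determines zip and zip determines city, city is a non-prime attribute determined by another non-prime attribute, so the zip-to-city mapping is moved to its own table:

    # 2NF table with primary key emp_id; "city" depends on the key only
    # transitively (emp_id -> zip -> city), which violates 3NF.
    employees_2nf = [
        {"emp_id": 1, "name": "Sara", "zip": "54000", "city": "Lahore"},
        {"emp_id": 2, "name": "Omar", "zip": "54000", "city": "Lahore"},
    ]

    # Remove the transitive dependency by splitting the zip -> city
    # mapping into a separate table.
    employees_3nf = [{"emp_id": r["emp_id"], "name": r["name"],
                      "zip": r["zip"]} for r in employees_2nf]
    zip_city = {r["zip"]: r["city"] for r in employees_2nf}

    print(employees_3nf)
    print(zip_city)  # the city is stored once per zip, not per employee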

Boyce and Codd Normal Form (BCNF)

Boyce-Codd Normal Form is a stricter version of Third Normal Form. It deals with a certain type of anomaly that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is automatically in BCNF. For a table to be in BCNF, the following condition must be satisfied: for every non-trivial functional dependency X → Y, X must be a super key of the table.

 

E-R Diagram

An ER diagram is a visual representation of data that describes how data elements are related to one another.

Components of E-R Diagram

The E-R diagram has three main components.

1) Entity

An Entity can be any object, place, person or class. In E-R Diagram, an entity is represented using rectangles. Consider an example of an Organisation. Employee, Manager, Department, Product and many more can be taken as entities from an Organisation.

Weak Entity

A weak entity is an entity that depends on another entity. Weak entities don't have a key attribute of their own. A double rectangle represents a weak entity.

2) Attribute

An attribute describes a property or characteristic of an entity. For example, Name, Age, Address etc. can be attributes of a Student. An attribute is represented using an ellipse.

Key Attribute

A key attribute represents the main characteristic of an entity and is used to represent the primary key. An ellipse with an underlined label represents a key attribute.

Composite Attribute

An attribute can itself have its own attributes. Such attributes are known as composite attributes.

3) Relationship

A relationship describes how entities are related to one another. A relationship is represented using a diamond.

There are three types of relationship that exist between Entities.

  • Binary Relationship
  • Recursive Relationship
  • Ternary Relationship

The sorting algorithms

Insertion sort

Insertion sort is good only for sorting small arrays (usually fewer than 100 items). In fact, the smaller the array, the faster insertion sort is compared to any other sorting algorithm. However, being an O(n²) algorithm, it becomes very slow very quickly as the size of the array increases. It was used in the tests with arrays of size 100.
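For reference, a minimal, standard Python implementation (the article's actual test code is not reproduced here):

    def insertion_sort(arr):
        """Sort arr in place; O(n^2) comparisons in the worst case."""
        for i in range(1, len(arr)):
            key = arr[i]
            j = i - 1
            # Shift larger items one slot right, then drop the key in.
            while j >= 0 and arr[j] > key:
                arr[j + 1] = arr[j]
                j -= 1
            arr[j + 1] = key
        return arr

    print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]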

Shell sort

Shell sort is a rather curious algorithm, quite different from the other fast sorting algorithms. It's actually so different that it isn't even an O(n log n) algorithm like the others; instead it's something between O(n log² n) and O(n^1.5), depending on implementation details. Given that it's an in-place, non-recursive algorithm and it compares very well to the other algorithms, shell sort is a very good alternative to consider.
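A minimal Python sketch using the simple halving gap sequence (other gap sequences give the different bounds mentioned above):

    def shell_sort(arr):
        """In-place, non-recursive shell sort with the n/2 gap sequence."""
        gap = len(arr) // 2
        while gap > 0:
            # A gapped insertion sort: every slice arr[i::gap] ends up
            # sorted; the final pass with gap == 1 is plain insertion sort.
            for i in range(gap, len(arr)):
                key = arr[i]
                j = i
                while j >= gap and arr[j - gap] > key:
                    arr[j] = arr[j - gap]
                    j -= gap
                arr[j] = key
            gap //= 2
        return arr

    print(shell_sort([23, 12, 1, 8, 34, 54, 2, 3]))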

Heap sort

Heap sort is the other (and by far the more popular) in-place, non-recursive sorting algorithm used in this test. Heap sort is not the fastest in all (nor even in most) cases, but it's the de facto sorting algorithm when one wants to make sure that the sorting will not take longer than O(n log n). Heap sort is generally appreciated because it is trustworthy: there aren't any "pathological" cases which would cause it to be unacceptably slow. Besides that, sorting in place and non-recursively also ensures that it will not take extra memory, which is often a nice feature.
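A minimal in-place, non-recursive Python sketch (again, not the original test code):

    def heap_sort(arr):
        """In-place heap sort; O(n log n) even in the worst case."""
        n = len(arr)

        def sift_down(root, end):
            # Push arr[root] down until the max-heap property holds.
            while 2 * root + 1 < end:
                child = 2 * root + 1
                if child + 1 < end and arr[child] < arr[child + 1]:
                    child += 1  # pick the larger child
                if arr[root] >= arr[child]:
                    return
                arr[root], arr[child] = arr[child], arr[root]
                root = child

        # Build a max-heap, then repeatedly move the maximum to the end.
        for start in range(n // 2 - 1, -1, -1):
            sift_down(start, n)
        for end in range(n - 1, 0, -1):
            arr[0], arr[end] = arr[end], arr[0]
            sift_down(0, end)
        return arr

    print(heap_sort([9, 4, 7, 1, 8, 2]))  # [1, 2, 4, 7, 8, 9]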

One interesting thing I wanted to test in this project was whether heap sort was considerably faster or slower than shell sort when sorting arrays.

Merge sort

The virtue of merge sort is that it's a true O(n log n) algorithm (like heap sort) and that it's stable (in other words, it doesn't change the order of equal items, as e.g. heap sort often does). Its main problem is that it requires a second array of the same size as the array to be sorted, thus doubling the memory requirements.

In this test I helped merge sort a bit by passing it the second array as a parameter so that it wouldn't have to allocate and deallocate it on each call (which would probably have slowed it down somewhat, especially with the arrays of bigger items). Also, instead of doing the basic "merge to the second array, copy the second array back to the main array" procedure that the basic algorithm description suggests, I simply merged from one array to the other alternately (and at the end copied the final result to the main array only if necessary).
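For reference, a minimal Python sketch of the basic buffered version: the caller-supplied second array avoids repeated allocation, though the alternating-direction merge described above is omitted for brevity:

    def merge_sort(arr, buf=None):
        """Stable O(n log n) merge sort with a caller-suppliable buffer."""
        if buf is None:
            buf = arr[:]  # the second array of the same size
        _merge_sort(arr, buf, 0, len(arr))
        return arr

    def _merge_sort(arr, buf, lo, hi):
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        _merge_sort(arr, buf, lo, mid)
        _merge_sort(arr, buf, mid, hi)
        # Merge the two sorted halves into buf, then copy back.
        i, j = lo, mid
        for k in range(lo, hi):
            if j >= hi or (i < mid and arr[i] <= arr[j]):  # <= keeps it stable
                buf[k] = arr[i]
                i += 1
            else:
                buf[k] = arr[j]
                j += 1
        arr[lo:hi] = buf[lo:hi]

    print(merge_sort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]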

Quick sort

Quick sort is the most popular sorting algorithm. Its virtue is that it sorts in place (even though it's a recursive algorithm) and that it usually is very fast. One reason for its speed is that its inner loop is very short and can be optimized very well.

The main problem with quick sort is that it's not trustworthy: its worst-case behavior is O(n²) (in the worst case it's as slow as, if not a bit slower than, insertion sort), and the pathological cases tend to appear unexpectedly and without warning, even if an optimized version of quick sort is used (as I noticed myself in this project).

I used two versions of quick sort in this project: A plain vanilla quick sort, implemented as the most basic algorithm descriptions tell, and an optimized version (called Median Hybrid Quick Sort in the source code below). The optimized version chooses the median of the first, last and middle items of the partition as its pivot, and it stops partitioning when the partition size is less than 16. In the end it runs insertion sort to finish the job.
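The original source code is not reproduced here, but the following Python sketch follows the same idea: a median-of-three pivot, a partition-size cutoff of 16, and a final insertion-sort pass:

    import random

    CUTOFF = 16  # partitions smaller than this are left for insertion sort

    def median_hybrid_quick_sort(arr):
        _quick(arr, 0, len(arr) - 1)
        _insertion(arr)  # one cheap pass finishes the nearly-sorted array
        return arr

    def _quick(arr, lo, hi):
        while hi - lo + 1 >= CUTOFF:
            # Median-of-three pivot: first, middle and last item.
            mid = (lo + hi) // 2
            pivot = sorted((arr[lo], arr[mid], arr[hi]))[1]
            i, j = lo, hi
            while i <= j:  # Hoare-style partition around the pivot value
                while arr[i] < pivot:
                    i += 1
                while arr[j] > pivot:
                    j -= 1
                if i <= j:
                    arr[i], arr[j] = arr[j], arr[i]
                    i += 1
                    j -= 1
            # Recurse into the smaller side, loop on the larger side.
            if j - lo < hi - i:
                _quick(arr, lo, j)
                lo = i
            else:
                _quick(arr, i, hi)
                hi = j

    def _insertion(arr):
        for i in range(1, len(arr)):
            key, j = arr[i], i - 1
            while j >= 0 and arr[j] > key:
                arr[j + 1] = arr[j]
                j -= 1
            arr[j + 1] = key

    data = [random.randint(0, 999) for _ in range(1000)]
    print(median_hybrid_quick_sort(data) == sorted(data))  # True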

The optimized version of quick sort is called “Quick2” in the bar charts.

The fourth number distribution (the array is sorted except for the last 256 items, which are random) is, quite expectedly, a pathological case for the vanilla quick sort and thus was skipped with the larger arrays. Very unexpectedly, this distribution was also pathological for the optimized quick sort implementation with larger arrays, and so I skipped it in the worst cases too (otherwise it would have negatively affected the scale of the bar charts). I don't have an explanation for why this happened.

Definition – What does Bootstrap mean?

A bootstrap is the program that initializes the operating system (OS) during startup. The term bootstrap or bootstrapping originated in the early 1950s. It referred to a bootstrap load button that was used to initiate a hardwired bootstrap program, or smaller program that executed a larger program such as the OS. The term was said to be derived from the expression “pulling yourself up by your own bootstraps,” starting small and loading programs one at a time while each program is “laced” or connected to the next program to be executed in sequence.

 

Definition – What does Relational Database Management System (RDBMS) mean?

A relational database management system (RDBMS) is a database engine/system based on the relational model specified by Edgar F. Codd, the father of modern relational database design, in 1970.

Most modern commercial and open-source database applications are relational in nature. The most important relational database features include an ability to use tables for data storage while maintaining and enforcing certain data relationships.

 

In 1970, Edgar F. Codd, a British computer scientist with IBM, published “A Relational Model of Data for Large Shared Data Banks.” At the time, the renowned paper attracted little interest, and few understood how Codd’s groundbreaking work would define the basic rules for relational data storage, which can be simplified as:

  1. Data must be stored and presented as relations, i.e., tables that have relationships with each other, e.g., primary/foreign keys.
  2. To manipulate the data stored in tables, a system should provide relational operators: code that enables relationships to be tested between two entities. A good example is the WHERE clause of a SELECT statement; the SQL statement SELECT * FROM CUSTOMER_MASTER WHERE CUSTOMER_SURNAME = 'Smith' will query the CUSTOMER_MASTER table and return all customers with a surname of Smith.

Codd later published another paper that outlined the 12 rules that all databases must follow to qualify as relational. Many modern database systems do not follow all 12 rules, but these systems are considered relational because they conform to at least two of the 12 rules.

Most modern commercial and open-source database systems are relational in nature and include such well-known products as Oracle Database (Oracle Corporation), SQL Server (Microsoft), and MySQL and PostgreSQL (open source).

Relationships

A relationship, in the context of databases, is a situation that exists between two relational database tables when one table has a foreign key that references the primary key of the other table. Relationships allow relational databases to split and store data in different tables, while linking disparate data items.
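As a small runnable illustration, here is a Python/sqlite3 sketch of such a relationship; the table and column names are invented for the example:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("""CREATE TABLE orders (
                        id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customer(id),
                        product TEXT)""")

    conn.execute("INSERT INTO customer VALUES (1, 'Smith')")
    conn.execute("INSERT INTO orders VALUES (10, 1, 'Laptop')")

    # The foreign key lets the split data be linked back together.
    for row in conn.execute("""SELECT c.name, o.product
                               FROM customer c JOIN orders o
                                 ON o.customer_id = c.id"""):
        print(row)  # ('Smith', 'Laptop')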

ER Diagram / ER Data Model

What is an Entity Relationship Diagram (ERD)?

An entity relationship diagram (ERD) shows the relationships of entity sets stored in a database. An entity in this context is a component of data. In other words, ER diagrams illustrate the logical structure of databases.

At first glance an entity relationship diagram looks very much like a flowchart. It is the specialized symbols, and the meanings of those symbols, that make it unique.

The History of Entity Relationship Diagrams

Peter Chen developed ERDs in 1976. Since then Charles Bachman and James Martin have added some slight refinements to the basic ERD principles.

Common Entity Relationship Diagram Symbols

An ER diagram is a means of visualizing how the information a system produces is related. There are five main components of an ERD:

  • Entities, which are represented by rectangles. An entity is an object or concept about which you want to store information. A weak entity is an entity that must be defined by a foreign key relationship with another entity, as it cannot be uniquely identified by its own attributes alone.
  • Actions, which are represented by diamond shapes, show how two entities share information in the database. In some cases, entities can be self-linked. For example, employees can supervise other employees.
  • Attributes, which are represented by ovals. A key attribute is the unique, distinguishing characteristic of the entity. For example, an employee’s social security number might be the employee’s key attribute.
    A multivalued attribute can have more than one value. For example, an employee entity can have multiple skill values. A derived attribute is based on another attribute. For example, an employee’s monthly salary is based on the employee’s annual salary.
  • Connecting lines, solid lines that connect attributes to show the relationships of entities in the diagram.
  • Cardinality, which specifies how many instances of an entity relate to one instance of another entity. Ordinality is closely linked to cardinality: while cardinality specifies the occurrences of a relationship, ordinality describes the relationship as either mandatory or optional. In other words, cardinality specifies the maximum number of relationships and ordinality specifies the absolute minimum number of relationships.

An entity-relationship diagram (ERD) is a graphical representation of an information system that shows the relationship between people, objects, places, concepts or events within that system. An ERD is a data modeling technique that can help define business processes and can be used as the foundation for a relational database.

While useful for organizing data that can be represented by a relational structure, an entity-relationship diagram can’t sufficiently represent semi-structured or unstructured data, and an ERD is unlikely to be helpful on its own in integrating data into a pre-existing information system.

The three main components of an ERD are the entities, which are objects or concepts that can have data stored about them; the relationships between those entities; and the cardinality, which defines each relationship in terms of numbers.

For example, an ER diagram representing the information system for a company's sales department might start with graphical representations of entities such as the sales representative, the customer, the customer's address, the customer's order, the product and the warehouse. Lines or other symbols can then be used to represent the relationships between entities, and text can be used to label the relationships.

Finally, cardinality notations define the attributes of the relationship between the entities. Cardinalities can denote that an entity is optional (for example, a sales rep could have no customers or could have many) or mandatory (for example, there must be at least one product listed in an order).

The three main cardinal relationships are:

  • One-to-one (1:1). For example, each customer in a database is associated with one mailing address.
  • One-to-many (1:M). For example, a single customer might place an order for multiple products. The customer is associated with multiple entities, but all those entities have a single connection back to the same customer.
  • Many-to-many (M:N). For example, at a company where all call center agents work with multiple customers, each agent is associated with multiple customers, and multiple customers might also be associated with multiple agents.

While there are tools to help draw entity-relationship diagrams, such as CASE (computer-aided software engineering) tools, some relational database management systems also have design capabilities built in.

What is Regression Testing?

Regression testing is a black-box testing technique that consists of re-executing those tests that are impacted by code changes. These tests should be executed as often as possible throughout the software development life cycle.

Types of Regression Tests:

  • Final Regression Tests: A "final regression test" is performed to validate a build that hasn't changed for a period of time, before that build is deployed or shipped to customers.
  • Regression Tests: A normal regression test is performed to verify that the build has NOT broken any other parts of the application through recent code changes for defect fixes or enhancements.

Selecting Regression Tests:

  • Requires knowledge of the system and of how changes affect existing functionality.
  • Tests are selected based on the areas of frequent defects.
  • Tests are selected to include the areas which have undergone code changes many times.
  • Tests are selected based on the criticality of the features.

Regression Testing Steps:

Regression tests are ideal candidates for automation, which results in a better Return on Investment (ROI). A minimal automation sketch follows the list of steps below.

  • Select the Tests for Regression.
  • Choose the apt tool and automate the Regression Tests
  • Verify applications with Checkpoints
  • Manage Regression Tests/update when required
  • Schedule the tests
  • Integrate with the builds
  • Analyze the results
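As promised above, a minimal sketch of what an automated regression test can look like in Python with pytest; compute_invoice_total is a hypothetical function under test, not taken from any real codebase:

    import pytest

    def compute_invoice_total(items, tax_rate=0.05):
        """Hypothetical function under test: sums (price, quantity) pairs."""
        return round(sum(price * qty for price, qty in items) * (1 + tax_rate), 2)

    # Each case pins down behavior that earlier builds got right;
    # re-running the suite after every change catches regressions.
    @pytest.mark.parametrize("items,expected", [
        ([(10.0, 2)], 21.0),             # single line item
        ([(10.0, 2), (5.0, 1)], 26.25),  # multiple line items
        ([], 0.0),                       # edge case: empty invoice
    ])
    def test_invoice_total_regression(items, expected):
        assert compute_invoice_total(items) == expected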

Testing Techniques

Software Testing Types:

Black box testing – Internal system design is not considered in this type of testing. Tests are based on requirements and functionality.

White box testing – This testing is based on knowledge of the internal logic of an application’s code. Also known as Glass box Testing. Internal software and code working should be known for this type of testing. Tests are based on coverage of code statements, branches, paths, conditions.

Unit testing – Testing of individual software components or modules. Typically done by the programmer and not by testers, as it requires detailed knowledge of the internal program design and code. May require developing test driver modules or test harnesses.

Incremental integration testing – A bottom-up approach to testing, i.e. continuous testing of an application as new functionality is added. Application functionality and modules should be independent enough to test separately. Done by programmers or by testers.

Integration testing – Testing of integrated modules to verify combined functionality after integration. Modules are typically code modules, individual applications, client and server applications on a network, etc. This type of testing is especially relevant to client/server and distributed systems.

Functional testing – This type of testing ignores the internal parts and focuses on whether the output meets the requirements. It is black-box testing geared to the functional requirements of an application.

System testing – The entire system is tested against the requirements. Black-box testing that is based on the overall requirements specification and covers all combined parts of the system.

End-to-end testing – Similar to system testing, involves testing of a complete application environment in a situation that mimics real-world use, such as interacting with a database, using network communications, or interacting with other hardware, applications, or systems if appropriate.

Sanity testing – Testing to determine whether a new software version is performing well enough to accept it for a major testing effort. If the application crashes during initial use, the system is not stable enough for further testing, and the build or application is sent back to be fixed.

Regression testing – Testing the application as a whole after modification of any module or functionality. It is difficult to cover the entire system in regression testing, so automation tools are typically used for this type of testing.

Acceptance testing – Normally this type of testing is done to verify that the system meets the customer-specified requirements. Users or customers perform this testing to determine whether to accept the application.

Load testing – Performance testing to check system behavior under load. Testing an application under heavy loads, such as testing a web site under a range of loads to determine at what point the system's response time degrades or fails.

 
