£61.99
FREE Delivery in the UK.
Only 1 left in stock.
Dispatched from and sold by Amazon.
Gift-wrap available.
Database Solutions: A Ste... has been added to your Basket
Trade in your item
Get a £4.75
Gift Card.
Have one to sell?
Flip to back Flip to front
Listen Playing... Paused   You're listening to a sample of the Audible audio edition.
Learn more
See this image

Database Solutions: A Step-by-Step Guide to Building Databases Paperback – 23 Oct 2003


See all formats and editions Hide other formats and editions
Amazon Price New from Used from
Paperback
"Please retry"
£61.99
£39.99 £12.73

Trade In Promotion


Frequently Bought Together

Database Solutions: A Step-by-Step Guide to Building Databases + Software Project Management
Price For Both: £103.38

Buy the selected items together


Trade In this Item for up to £4.75
Trade in Database Solutions: A Step-by-Step Guide to Building Databases for an Amazon Gift Card of up to £4.75, which you can then spend on millions of items across the site. Trade-in values may vary (terms apply). Learn more

Product details


More About the Author

Discover books, learn about writers, and more.

Product Description

From the Back Cover

Database Solutions: A step-by-step guide to building databases 2/e

Are you responsible for designing and creating the databases that keep your business running? Or are you studying for a module in database design? If so, Database Solutions is for you! This fully revised and updated edition will make the database design and build process smoother, quicker and more reliable.

Recipe for database success

  • Take one RDMS – any of the major commercial products will do: Oracle, Informix, SQL Server, Access, Paradox
  • Add one thorough reading of Database Solutions if you are an inexperienced database designer, or one recap of the methodology if you are an old hand
  • Use the design and implementation frameworks to plan your timetable, use a common data model that fits your requirements and adapt as necessary

Features

  • Includes hints and tips for success with comprehensive guidance on avoiding pitfalls and traps
  • Shows how to create data models using the UML design notation
  • Includes two full-length coded example databases written on Microsoft Access 2002 and Oracle 9i, plus 15 sample data models to adapt to your needs, chosen from seven common business areas

New for this edition!

  • New chapters on SQL (Structured Query Language) and QBE (Query by Example), plus a chapter on database administration and security. A new chapter on current and emerging trends in the area ensures that the book is up to date
  • The database design methodology has been improved and simplified
  • A companion website contains an implementation of the StayHome database that runs throughout the book, plus SQL scripts for 15 sample data models, lecture slides, sample solutions for all exercises from the book plus suggested exam questions and answers, and a variety of possible courseworks

The authors

Thomas Connolly was a designer of RAPPORT, the world’s first commercial portable DBMS, and of the LIFESPAN configuration management tool – a winner of the British Design Award. Carolyn Begg specializes in the application of database systems in biological research. They are both authors of the best selling Database Systems, also published by Pearson Education, which has sold nearly 200,000 copies since publication in 1995.

Excerpt. © Reprinted by permission. All rights reserved.

Chapter 1 Introduction In this chapter you will learn:

Some common uses of database systems.

The meaning of the term database.

The meaning of the term Database Management System (DBMS).

The major components of the DBMS environment.

The typical functions and services a DBMS should provide.

The advantages and disadvantages of DBMSs.

The database is now such an integral part of our day-to-day life that often we're not aware we are using one. To start our discussion of databases, we briefly examine some applications of database systems. For the purposes of this discussion, we consider a database to be a collection of related data and the Database Management System (DBMS) to be the software that manages and controls access to the database. We also use the more inclusive term database system to be a collection of application programs that interact with the database. We provide more accurate definitions in Section 1.2. Later in the chapter, we'll look at the typical functions of a modern DBMS and briefly review the main advantages and disadvantages of DBMSs. 1.1

Examples of the use of database systems Purchases from the supermarket When you purchase goods from your local supermarket, it's likely that a database is accessed. The checkout assistant uses a bar code reader to scan each of your #purchases. This is linked to an application program that uses the bar code to find the price of the item from a product database. The program then reduces the number of such items in stock and displays the price on the cash register. If the reorder level falls below a specified threshold, the database system may automatically place an order to obtain more stocks of that item. Purchases using your credit card When you purchase goods using your credit card, the assistant normally checks that you have sufficient credit left to make the purchase. This check may be carried out by telephone or it may be done automatically by a card reader linked to a computer system. In either case, there is a database somewhere that contains information about the purchases that you've made using your credit card. To check your credit, there is a database application program that uses your credit card number to check that the price of the goods you wish to buy, together with the sum of the purchases you've already made this month, is within your credit limit. When the purchase is confirmed, the details of your purchase are added to this database. The application program also accesses the database to check that the credit card is not on the list of stolen or lost cards before authorizing the purchase. There are other database application programs to send out monthly statements to each cardholder and to credit accounts when payment is received. Booking a holiday at the travel agents When you make inquiries about a holiday, the travel agent may access several databases containing holiday and flight details. When you book your holiday, the database system has to make all the necessary booking arrangements. In this case, the system has to ensure that two different agents don't book the same holiday or overbook the seats on the flight. For example, if there is only one seat left on the flight from London to New York and two agents try to reserve the last seat at the same time, the system has to recognize this situation, allow one booking to proceed, and inform the other agent that there are now no seats available. The travel agent may have another, usually separate, database for invoicing. Using the local library Whenever you visit your local library, there is probably a database containing details of the books in the library, details of the readers, reservations, and so on. There will be a computerized index that allows readers to find a book based on its title, or its authors, or its subject area. The database system handles reservations to allow a reader to reserve a book and to be informed by mail when the book is available. The system also sends out reminders to borrowers who have failed to return books on the due date. Typically, the system will have a bar code reader, similar to that used by the supermarket described earlier, which is used to keep track of books coming into and going out of the library.

These are only a few of the applications for database systems, and you'll no doubt be aware of plenty of others. Although we take many of these applications for granted, behind them lies some highly complex technology. At the center of this technology is the database itself. For the system to support the applications that the end-users want, in as efficient a manner as possible, requires a suitably structured database. Producing this structure is known as database design, and it's this important activity that we're going to concentrate on in this book. Whether the database you wish to build is small, or large like the ones above, database design isw a fundamental issue, and the methodology presented in this book will help you build your database correctly with relative ease. Having a well-designed database will allow you to produce a system that satisfies the requirements of the users and, at the same time, provides acceptable performance. 1.2

Database approach In this section, we provide a more formal definition of the terms database and Database Management System (DBMS) than we used in the last section. 1.2.1 The database Let's examine the definition of a database in detail to understand this concept fully. The database is a single, large repository of data, which can be used simultaneously by many departments and users. All data that is required by these users is integrated with a minimum amount of duplication. And importantly, the database is normally not owned by any one department or user but is a shared corporate resource.

As well as holding the company's operational data, the database also holds a description of this data. For this reason, a database is also defined as a self-describing collection of integrated records. The description of the data, that is the meta-data - the 'data about data', is known as the system catalog or data dictionary. It is the self-describing nature of a database that provides what's known as data independence. This means that if new data structures are added to the database or existing structures in the database are modified then the application programs that use the database are unaffected, provided they don't directly depend upon what has been modified. For example, if we add a new column to a record or create a new table, existing applications are unaffected. However, if we remove a column from a table that an application program uses, Database A shared collection of logically related data (and a description of this data), designed to meet the information needs of a company.System catalog Holds data about all objects in the database.then that application program is affected by this change and must be modified accordingly.

The final term in the definition of a database that we should explain is 'logically related'. When we analyze the company's information needs, we attempt to identify the important objects that need to be represented in the database and the logical relationships between these objects. The methodology we'll present for database design will give you guidelines for identifying these important objects and their logical relationships. 1.2.2 The Database Management System (DBMS) The DBMS is the software that interacts with the users, application programs, and the database. Among other things, the DBMS allows users to insert, update, delete, and retrieve data from the database. Having a central repository for all data and data descriptions allows the DBMS to provide a general inquiry facility to this data, called a query language. The provision of a query language (such as SQL) alleviates the problems with earlier systems where the user has to work with a fixed set of queries or where there is a proliferation of programs, giving major software management problems. We'll discuss the typical functions and services of a DBMS in the next section.

Figure 1.1 illustrates the database approach. It shows the Sales and Stock Control Departments using their application programs to access the database through the DBMS. Each set of departmental application programs handles data entry, data maintenance, and the generation of reports. The physical structure and storage of the data are managed by the DBMS. 1.2.3 Views With the functionality described above, the DBMS is an extremely powerful tool. However, as end-users are not too interested in how complex or easy a task is for the system, it could be argued that the DBMS has made things more complex because they may now see more data than they actually need or want to do their job. In recognition of this problem, a DBMS provides another facility known as a view mechanism, which allows each user to have his or her own view of the database, where a view is some subset of the database. As well as reducing Methodology covered in Chapters 7-16DBMS A software system that enables users to define, create, and maintain the database and also provides controlled access to this database.The Structured Query Language (SQL - pronounced 'S-Q-L' or sometimes 'See-Quel'), is the main query language for relational DBMSs.SQL covered in Section 2.4 and Chapter 17complexity by letting users see the data in the way they want to see it, views have several other benefits: n

Views provide a level of security. Views can be set up to exclude data that some users should not see. For example, we could create a view that allows a branch manager and the Payroll Department to see all staff data, including salary details. However, we could create a second view that excludes salary details, which other staff use. n

Views provide a mechanism to customize the appearance of the database. For example, the Stock Control Department may wish to call the Daily Rental Rate column for videos by the simpler name, Daily Rental. n

A view can present a consistent, unchanging picture of the structure of the database, even if the underlying database is changed (for example, columns added or removed, relationships changed, data files split, restructured, or renamed). If columns are added or removed from a data file, and these columns are not required by the view, the view is not affected by this change. Thus, a view helps provide additional data independence to that provided by the system catalog, as we described in Section 1.2.1. Figure 1.1 Sales and Stock Control Departments using a DBMS. 1.2.4

Components of the DBMS environment We can identify five major components in the DBMS environment: hardware, software, data, procedures, and people: (1)

Hardware The computer system(s) that the DBMS and the application programs run on. This can range from a single PC, through a single mainframe, to a network of computers. (2)

Software The DBMS software and the application programs, together with the operating system, including network software if the DBMS is being used over a network. (3)

Data The data acts as a bridge between the hardware and software components and the human components. As we've already said, the database contains both the operational data and the meta-data (the 'data about data'). (4)

Procedures The instructions and rules that govern the design and use of the database. This may include instructions on how to log on to the DBMS, make backup copies of the database, and how to handle hardware or software failures. (5)

People This includes the database designers, database administrators (DBAs), application programmers, and end-users. 1.2.5

DBMS architectures Before the advent of the Web, generally a DBMS would be divided into two parts: n

a client program that handles the main business and data processing logic and interfaces with the user; n

a server program (sometimes called the DBMS engine) that manages and controls access to the database.

This is known as a two-tier client-server architecture. Figure 1.2 illustrates a simplified client-server architecture for a video rental company called StayHome that has offices throughout America. It shows a centralized database and server located at the company's headquarters in Seattle and a number of clients, located at some of the branches around the US.

In the mid-1990s, as applications became more complex and potentially could be deployed to hundreds or thousands of end-users, the client side of this architecture gave rise to two problems: n

A 'fat' client, requiring considerable resources on the client's computer to run effectively. This includes disk space, RAM, and CPU power. n

A significant client-side administration overhead.

By 1995, a new variation of the traditional two-tier client-server model appeared to solve these problems called the three-tier client-server architecture. This new architecture proposed three layers, each potentially running on a different platform: StayHome is used throughout this book and discussed more fully in Chapter 4Figure 1.2 Simplified two-tier client-server configuration for StayHome.(1)

The user interface layer, which runs on the end-user's computer (the client). (2)

The business logic and data processing layer. This middle tier runs on a server and is often called the application server. One application server is designed to serve multiple clients. (3)

A DBMS, which stores the data required by the middle tier. This tier may run on a separate server called the database server.

The three-tier design has many advantages over the traditional two-tier design, such as: n

A 'thin' client, which requires less expensive hardware. n

Simplified application maintenance, as a result of centralizing the business logic for many end-users into a single application server. This eliminates the concerns of software distribution that are problematic in the traditional two-tier client-server model. n

Added modularity, which makes it easier to modify or replace one tier without affecting the other tiers. n

Easier load balancing, again as a result of separating the core business logic from the database functions. For example, a Transaction Processing Monitor (TPM) can be used to reduce the number of connections to the database server. (A TPM is a program that controls data transfer between clients and servers in order to provide a consistent environment for Online Transaction Processing (OLTP).)

An additional advantage is that the three-tier architecture maps quite naturally to the Web environment, with a Web browser acting as the 'thin' client, and a Web server acting as the application server. The three-tier client-server architecture is illustrated in Figure 1.3. 1.3

Functions of a DBMS In this section, we briefly look at the functions and services we would expect a full-scale DBMS to provide nowadays. Data storage, retrieval, and update This is the fundamental function of a DBMS. From our earlier discussion, clearly in providing this functionality the DBMS should hide the internal physical implementation details (such as file organization and storage structures) from the user. A user-accessible catalog A key feature of a DBMS is the provision of an integrated system catalog to hold data about the structure of the database, users, applications, and so on. The simplified three-tier client-server configuration for StayHome.catalog is expected to be accessible to users as well as to the DBMS. The amount of information and the way the information is used vary with the DBMS. Typically, the system catalog stores: Figure 1.3 n

names, types, and sizes of data items; n

integrity constraints on the data; n

names of authorized users who have access to the data. Transaction support Some simple transactions for the StayHome video rental company might be to add a new member of staff to the database, to update the salary of a particular member of staff, or to delete a member from the register. A more complicated example might be to delete a manager from the database and to reassign the branch that he or she managed to another member of staff. In this case, there is more than one change to be made to the database. If the transaction fails during execution, perhaps because of a computer crash, the database will be in an inconsistent state: some changes will have been made and others not. For example, a branch is not allocated a new manager. Consequently, the changes that have been made will have to be undone to return the database to a consistent state again.

To overcome this, a DBMS should provide a mechanism that will ensure that either all the updates corresponding to a given transaction are made or that none of them are made. Concurrency control services One major objective in using a DBMS is to enable many users to access shared data concurrently, this is known as concurency control. Concurrent access is relatively easy if all users are only reading data, as there is no way that they can interfere with one another. However, when two or more users are accessing the database simultaneously and at least one of them is updating data, there may be interference that can result in inconsistencies. For example, consider two transactions T1 and T2 that are executing concurrently as illustrated in Figure 1.4.

T2 is withdrawing $20 from a StayHome member's account (with a balance, balx, currently $50) and T1 is crediting $5 to the same account. If these transactions were executed one after the other with no interleaving of operations, the final balance would be $35 regardless of which was performed first. Transactions T1 and T2 start at nearly the same time and both read the balance as $50. T2 decreases balx by $20 to $30 and stores the update in the database. Meanwhile, transaction T1 increases its copy of balx by $5 to $55 and stores this value in the database, overwriting the previous update and thereby 'losing' $20. Transaction An action, or series of actions, carried out by a single user or application program, which accesses or changes the contents of the database. When multiple users are accessing the database, the DBMS must ensure that interference like this cannot occur. Recovery services When discussing transaction support, we mentioned that if the transaction fails the database has to be returned to a consistent state, this is known as recovery control. This may be a result of a system crash, media failure, a hardware or software error causing the DBMS to stop, or it may be the result of the user detecting an error during the transaction and aborting the transaction before it completes. In all these cases, the DBMS must provide a mechanism to recover the database to a consistent state. Authorization services It's not difficult to envisage instances where we would want to protect some of the data stored in the database from being seen by all users. For example, we may want only branch managers and the Payroll Department to see salary-related information for staff and prevent all other users from seeing this data. Additionally, we may want to protect the database from unauthorized access. The term security refers to the protection of the database against unauthorized access, either intentional or accidental. We expect the DBMS to provide mechanisms to ensure the data is secure. Support for data communication Most users access the database from terminals. Sometimes, these terminals are connected directly to the computer hosting the DBMS. In other cases, the terminals are at remote locations and communicate with the computer hosting the DBMS over a network. In either case, the DBMS must be capable of integrating with networking/communication software. Even DBMSs for PCs should be capable of being run on a local area network (LAN) so that one centralized database can be established for users to share, rather than having a series of disparate databases, one for each user. Figure 1.4 The lost update problem.Security covered in Step 7 in Chapter 15 Integrity services Database integrity refers to the correctness and consistency of stored data. It can be considered as another type of database protection. While it's related to security, it has wider implications; integrity is concerned with the quality of data itself. Integrity is usually expressed in terms of constraints, which are consistency rules that the database is not permitted to violate. For example, we may want to specify a constraint that no member of StayHome can rent more than 10 videos at the one time. Here, we would want the DBMS to check when we assign a video to a member that this limit would not be exceeded and to prevent the rental from occurring if the limit has been reached. Services to promote data independence Data independence is normally achieved through a view mechanism, as we discussed in Section 1.2.3. There are usually several types of changes that can be made to the physical characteristics of the database without affecting the views, such as using different file organizations or modifying indexes. This is called physical data independence. However, complete logical data independence is more difficult to achieve. The addition of a new table or column can usually be accommodated, but not their removal. In some systems, any type of change to an existing table is prohibited. Utility services Utility programs help the DBA to administer the database effectively. Some examples of utilities are: n

import facilities, to load the database from flat files, and export facilities, to unload the database to flat files; n

monitoring facilities, to monitor database usage and operation.

The above discussion is intentionally brief but should be sufficient to provide a general overview of DBMS functionality. For more information, the interested reader is referred to Connolly and Begg (1999). The above discussion is general. The actual level of functionality offered by a DBMS differs from product to product. For example, a DBMS for a PC may not support concurrent shared access, and it may only provide limited security, integrity, and recovery control. However, modern, large multi-user DBMS products offer all the above functions and much more. Modern systems are extremely complex pieces of software consisting of millions of lines of code, with documentation comprising many volumes. 1.4

Database design Until now, we've taken it for granted that there is a structure to the data in the database. But how do we get this structure? The answer is quite simple: the structure of the database is determined during database design. However, carrying out database design can be extremely complex. To produce a system that will satisfy the company's information needs requires a data-driven approach, which means we think of the data first and the applications second. For the system to be acceptable to the end-users, database design is crucial. A poorly designed database will generate errors that may lead to bad decisions being made, which may have serious repercussions for the company. On the other hand, a well-designed database produces a system that provides the correct information for the decision-making process to succeed, in an efficient way.

We devote several chapters to the presentation of a complete methodology for database design (see Chapters 7-16). We present it as a series of simple-to-follow steps, with guidelines provided throughout. In these chapters, we use a case study based on a video rental company called StayHome. To help reinforce the methodology, in Chapters 18 and 19 we go through a second case study, this time a veterinary clinic called Perfect Pets. In addition, in Appendix D we provide a number of common business data models that you are likely to encounter in one form or another.

Unfortunately, database design methodologies are not very popular, which may be a major cause of failure in the development of database systems. Owing to the lack of structured approaches to database design, the time and resources required for a database project are typically underestimated, the databases developed are inadequate or inefficient in meeting the demands of applications, documentation is limited, and maintenance is difficult.

We hope the methodology presented in this book will help change this attitude. 1.5

Advantages and disadvantages of DBMSs The fact that you are reading this book probably means that you already know many of the advantages of DBMSs, such as: n

Control of data redundancy The database approach eliminates redundancy where possible. However, it does not eliminate redundancy entirely, but controls the amount of redundancy inherent in the database. For example, it's normally necessary to duplicate key data items to model relationships, and sometimes it's desirable to duplicate some data items to improve performance. The reasons for controlled duplication will become clearer when you read the chapters on database design. n

Data consistency By eliminating or controlling redundancy, we're reducing the risk of inconsistencies occurring. If a data item is stored only once in the database, any update to its value has to be performed only once and the new value is immediately available to all users. If a data item is stored more than once and the system is aware of this, the system can ensure that all copies of the item are kept consistent. Unfortunately, many of today's DBMSs don't automatically ensure this type of consistency. n

Sharing of data In a file-based approach, typically files are owned by the people or departments that use them. On the other hand, the database belongs to the entire company and can be shared by all authorized users. In this way, more users share more of the data. Furthermore, new applications can build on the existing data in the database and add only data that is not currently stored, rather than having to define all data requirements again. The new applications can also rely on the functions provided by the DBMS, such as data definition and manipulation, and concurrency and recovery control, rather than having to provide these functions themselves. n

Improved data integrity As we've already stated, database integrity is usually expressed in terms of constraints, which are consistency rules that the database is not permitted to violate. Constraints may apply to data items within a single record or they may apply to relationships between records. Again, integration allows us to define, and the DBMS to enforce, integrity constraints. n

Improved maintenance through data independence Since a DBMS separates the data descriptions from the applications, it helps make applications immune to changes in the data descriptions. This is known as data independence and its provision simplifies database application maintenance.

Other advantages include: improved security, improved data accessibility and responsiveness, increased productivity, increased concurrency, and improved backup and recovery services.

There are, however, some disadvantages of the database approach, such as: n

Complexity As we've already mentioned, a DBMS is an extremely complex piece of software, and all users (database designers and developers, DBAs, and end-users) must understand this functionality to take full advantage of it. n

Cost of DBMS The cost of DBMSs varies significantly, depending on the environment and functionality provided. For example, a single-user DBMS for a PC may only cost $100. However, a large mainframe multi-user DBMS servicing hundreds of users can be extremely expensive, perhaps $100,000 to $1,000,000. There is also the recurrent annual maintenance cost, which is typically a percentage of the list price. n

Cost of conversion In some situations, the cost of the DBMS and any extra hardware may be insignificant compared with the cost of converting existing applications to run on the new DBMS and hardware. This cost also includes the cost of training staff to use these new systems, and possibly the employment of specialist staff to help with the conversion and running of the system. This cost is one of the main reasons why some companies feel tied to their current systems and cannot switch to more modern database technology. The term 'legacy system' is sometimes used to refer to an older, and usually inferior, system (such as file-based, hierarchical, or network systems). n

Performance Typically, a file-based system is written for a specific application, such as invoicing. As a result, performance is generally very good. However, the DBMS is written to be more general, to cater for many applications rather than just one. The effect is that some applications may not run as fast any more. n

Higher impact of a failure The centralization of resources increases the vulnerability of the system. Since all users and applications rely on the availability of the DBMS, the failure of any component can bring operations to a complete halt until the failure is repaired. Chapter summary

A database is a shared collection of logically related data (and a description of this data), designed to meet the information needs of a company. A DBMS is a software system that enables users to define, create, and maintain the database, and also provides controlled access to this database.

All access to the database is through the DBMS. The DBMS provides facilities that allow users to define the database, and to insert, update, delete, and retrieve data from the database.

The DBMS environment consists of hardware (the computer), software (the DBMS, operating system, and applications programs), data, procedures, and people. The people include database administrators (DBAs), database designers, application programmers, and end-users.

The DBMS provides controlled access to the database. It provides security, integrity, concurrency and recovery control, and a user-accessible catalog. It also provides a view mechanism to simplify the data that users have to deal with.

Some advantages of the database approach include control of data redundancy, data consistency, sharing of data, and improved security and integrity. Some disadvantages include complexity, cost, reduced performance, and higher impact of a failure. Chapter 2 The relational model In this chapter you will learn:

What a data model is and what its uses are.

The terminology of the relational model.

How tables are used to represent data.

Properties of database relations.

How to identify candidate, primary, alternate, and foreign keys.

The meaning of entity integrity and referential integrity.

About SQL and QBE, the two most widely used relational languages.

The Relational Database Management System (often called an RDBMS for short) has become the dominant data-processing software in use today, with estimated sales of approximately $8-10 billion per year ($25 billion with tools sales included), and growing at a rate of about 25 per cent per year. This software represents the second generation of DBMS and is based on the relational data model proposed by Dr E.F. Codd in his seminal paper 'A Relational Model of Data for Large Shared Data Banks' in 1970. In the relational model, all data is logically structured within relations (tables). A great strength of the relational model is this simple logical structure. Yet, behind this simple structure is a sound theoretical foundation that is lacking in the first generation of DBMSs (the network and hierarchical DBMSs).

The design methodology we present in this book is based on the relational data model, as this is the one most of you will be using. In this chapter, we discuss the basic principles of the relational data model. Let's start by first looking at what a data model is. 2.1

What is a data model? A model is a representation of 'real world' objects and events, and their associations. It concentrates on the essential, inherent aspects of a company and ignores the accidental properties. A data model attempts to represent the company, or the part of the company, that you wish to model. It should provide the basic concepts and notations that will allow database designers and end-users to communicate their understanding of the company data unambiguously and accurately. A data model can be thought of as comprising three components: (1)

a structural part, consisting of a set of rules that define how the database is to be constructed; (2)

a manipulative part, defining the types of operations that are allowed on the data (this includes the operations that are used for updating or retrieving data and for changing the structure of the database); (3)

possibly a set of integrity rules, which ensures that the data is accurate.

The purpose of a data model is to represent data and to make the data understandable. If it does this, then it can be easily used to design a database. In the remainder of this chapter, we examine one such data model: the relational data model. 2.2

Terminology The relational model is based on the mathematical concept of a relation, which is physically represented as a table. Codd, a trained mathematician, used terminology taken from mathematics, principally set theory and predicate logic. In this section, we explain the terminology and structural concepts of the relational model. In Section 2.3, we'll discuss the integrity rules for the model and in Section 2.4 we'll examine the manipulative part of the model. 2.2.1

Relational data structure A relational DBMS requires only that the database be perceived by the user as tables. Data model An integrated collection of concepts for describing data, relationships between data, and constraints on the data used by a company. Relation A relation is a table with columns and rows. Note, however, that this perception applies only to the way we view the database; it does not apply to the physical structure of the database on disk, which we can implement using a variety of storage structures (such as a heap file or hash file).Storage structures discussed in Appendix C. In the relational model, we use relations to hold information about the objects that we want to represent in the database. We represent a relation as a table in which the rows of the table correspond to individual records and the table columns correspond to attributes. Attributes can appear in any order and the relation will still be the same relation, and therefore convey the same meaning.

For example, in the StayHome video rental company, the information on branches is represented by the Branch relation, with columns for attributes branchNo (the branch number), street, city, state, zipCode, and mgrStaffNo (the staff number corresponding to the manager of the branch). Similarly, the information on staff is represented by the Staff relation, with columns for attributes staffNo (the staff number), name, position, salary, and branchNo (the number of the branch the staff member works at). Figure 2.1 shows instances of the Branch and Staff relations. As you can see from this example, a column contains values for a single attribute; for example, the branchNo columns contain only numbers of branches.

Domains are an important feature of the relational model. Every attribute in a relational database is associated with a domain. Domains may be distinct for each attribute, or two or more attributes may be associated with the same domain. Figure 2.2 shows the domains for some of the attributes of the Branch and Staff relations.

The domain concept is important because it allows us to define the meaning and source of values that attributes can hold. As a result, more information is available to the system and it can reject operations that don't make sense. For example, it would not be sensible for us to compare a staff number with a branch number, even though the domain definitions for both these attributes are character strings. Unfortunately, you'll find that many RDBMSs don't currently support domains.

The elements of a relation are the tuples or records in the table. In the Staff relation, each record contains five values, one for each attribute. As with attributes, tuples can appear in any order and the relation will still be the same relation, and therefore convey the same meaning.

A relational database consists of tables that are appropriately structured. We refer to this appropriateness as normalization. We'll defer the discussion of normalization until Chapter 6. Attribute An attribute is a named column of a relation.StayHome is used throughout this book and discussed more fully in Chapter 4Domain A domain is the set of allowable values for one or more attributes.Note that, at any given time, typically there will be values in a domain that don't currently appear as values in the corresponding attribute. In other words, a domain describes possible values for an attribute. Tuple A tuple is a record of a relation Relational database A collection of normalized tables. Figure 2.1 An example of the Branch and Staff relations.Attributes (columns) Branch relation Tuples (records){branchNo

street city

state zipCode mgrStaffNo B001

8 Jefferson Way Portland

OR

97201 S1500 B002

City Center Plaza

Seattle WA

98122 S0010 B003

14 - 8th Avenue New York

NY

10012 S0415 B004

16 - 14th Avenue

Seattle WA

98128 S2250 Related columnsPrimary keysForeign keysstaffNo name

position

salary branchNo S1500 Tom Daniels

Manager 46000 B001 S0003 Sally Adams

Assistant

30000 B001 S0010 Mary Martinez Manager 50000 B002 S3250 Robert Chin

Supervisor

32000 B002 S2250 Sally Stern

Manager 48000 B004 S0415 Art Peters

Manager 41000 B003Tuples (records){Staff relation Figure 2.2 Domains for some attributes of the Branch and Staff relations.Alternative terminology The terminology for the relational model can be quite confusing. In this chapter, we've introduced two sets of terms: (relation, attribute, tuple) and (table, column, record). Other terms that you may encounter are file for table, row for record, and field for column. You may also find various combinations of these terms, such as table, row, and field. 2.2.2

Properties of relational tables A relational table has the following properties: n

The table has a name that is distinct from all other tables in the database. n

Each cell of the table contains exactly one value. (For example, it would be wrong to store several telephone numbers for a single branch in a single cell. In other words, tables don't contain repeating groups. A relational table that satisfies this property is said to be normalized or in first normal form.) n

Each column has a distinct name. n

The values of a column are all from the same domain. n

The order of columns has no significance. In other words, provided a column name is moved along with the column values, we can interchange columns. n

Each record is distinct; there are no duplicate records. n

The order of records has no significance, theoretically. (However, in practice, the order may affect the efficiency of accessing records, as we'll see in Chapter 13.) Attribute

Domain name

Meaning Domain definition branchNo

Branch_Numbers Set of all possible branch numbers.

Character: size 4, range B001-B999 street Street_Names

Set of all possible street names.

Character: size 60 staffNo Staff_Numbers Set of all possible staff numbers.

Character: size 5, range S0001-S9999 position

Staff_Positions Set of all possible staff positions.

One of Director, Manager, Supervisor,

Assistant, Buyer salary Staff_Salaries Possible values of staff salaries.

Monetary: 8 digits, range

$10 000.00-$100 000.00 From now on, we will tend to drop the formal terms of relation, tuple, and attribute, and instead use the more frequently used terms table, record, and column. 2.2.3 Relational keys As we've just stated, each record in a table must be unique. This means that we need to be able to identify a column or combination of columns (called relational keys) that provides uniqueness. In this section, we explain the terminology used for relational keys.

Since a superkey may contain additional columns that are not necessary for unique identification, we're interested in identifying superkeys that contain only the minimum number of columns necessary for unique identification.

A candidate key for a table has two properties: n

Uniqueness

In each record, the values of the candidate key uniquely identify that record. n

Irreducibility

No proper subset of the candidate key has the uniqueness property.

There may be several candidate keys for a table. When a key consists of more than one column, we call it a composite key. Consider the Branch table shown in Figure 2.1. For a given value of city, we would expect to be able to determine several branches (for example, Seattle has two branches). This column cannot be selected as a candidate key. On the other hand, since StayHome allocates each branch a unique branch number, then for a given value of the branch number, branchNo, we can determine at most one record, so that branchNo is a candidate key. Similarly, as no two branches can be located in the same zip code, zipCode is also a candidate key for Branch.

Now consider a table called Role, which represents the characters played by actors in videos. The table comprises an actor number (actorNo), a catalog number (catalogNo), and the name of the character played (character), as shown in Figure 2.3. For a given actor number, actorNo, there may be several different videos the actor has starred in. Similarly, for a given catalog number, catalogNo, there may be several actors who have starred in this video. Therefore, actorNo by itself or catalogNo by itself cannot be selected as a candidate key. However, the combination of actorNo and catalogNo identifies at most one record. Superkey A column, or set of columns, that uniquely identifies a record within a table.Candidate key A superkey that contains only the minimum number of columns necessary for unique identification.actorNo catalogNo

character A1002 207132 James Bond A3006 330553 Sean Archer A3006 902355 Jack Stranton A2019 330553 Castor Troy A2019 445624 Stanley Goodspeed A7525 634817 Captain Steve Hiller A4343 781132 Cruella De VilFigure 2.3 An example of the Role table. Be careful not to look at sample data and try to deduce the candidate key(s), unless you are certain the sample is representative of the data that will be stored in the table. Generally, an instance of a table cannot be used to prove that a column or combination of columns is a candidate key. The fact that there are no duplicates for the values that appear at a particular moment in time does not guarantee that duplicates are not possible. However, the presence of duplicates in an instance can be used to show that some column combination is not a candidate key. Identifying a candidate key requires that we know the 'real world' meaning of the column(s) involved so that we can decide whether duplicates are possible. Only by using this semantic information can we be certain that a column combination is a candidate key.

For example, from the data presented in Figure 2.1, we may think that a suitable candidate key for the Staff table would be name, the employee's name. However, although there is only a single value of Tom Daniels in this table at present, if a new member of staff with the same name joins the company, this would invalidate the choice of name as a candidate key. Since a table has no duplicate records, it's always possible to identify each record uniquely. This means that a table always has a primary key. In the worst case, the entire set of columns could serve as the primary key, but usually some smaller subset is sufficient to distinguish the records. The candidate keys that are not selected to be the primary key are called alternate keys. For the Branch table, if we choose branchNo as the primary key, zipCode would then be an alternate key. For the Role table, there is only one candidate key, comprising actorNo and catalogNo, so these columns would automatically form the primary key.

When a column appears in more than one table, its appearance usually represents a relationship between records of the two tables. For example, in Figure 2.1 the inclusion of branchNo in both the Branch and Staff tables is quite deliberate and links branches to the details of staff working there. In the Branch table, branchNo is the primary key. However, in the Staff table the branchNo column exists to match staff to the branch they work in. In the Staff table, branchNo is a foreign key. We say that the column branchNo in the Staff table targets the primary key column branchNo in the home table, Branch. Primary key The candidate key that is selected to identify records uniquely within the table. Foreign key A column, or set of columns, within one table that matches the candidate key of some (possibly the same) table.You may recall that we stated in Chapter 1 that one of the advantages of the DBMS approach was control of data redundancy. This is an example of controlled redundancy - these common columns play an important role in modeling relationships, as we'll see in later chapters. 2.2.4 Representing relational databases A relational database consists of one or more tables. The common convention for representing a description of a relational database is to give the name of each table, followed by the column names in parentheses. Normally, the primary key is underlined. The description of the relational database for the StayHome video rental company is: Figure 2.4 shows an example of the StayHome database. Branch (branchNo, street, city, state, zipCode, mgrStaffNo) Staff (staffNo, name, position, salary, branchNo) Video (catalogNo, title, category, dailyRental, price, directorNo) Director

(directorNo, directorName) Actor (actorNo, actorName) Role

(actorNo, catalogNo, character) Member (memberNo, fName, lName, address) Registration

(branchNo, memberNo, staffNo, dateJoined) RentalAgreement (rentalNo, dateOut, dateReturn, memberNo, videoNo) VideoForRent

(videoNo, available, catalogNo, branchNo) Figure 2.4 An example of the StayHome video rental database. BranchbranchNo street city

state zipCode mgrStaffNo B001

8 Jefferson Way Portland

OR

97201 S1500 B002

City Center Plaza

Seattle WA

98122 S0010 B003

14 - 8th Avenue New York

NY

10012 S0415 B004

16 - 14th Avenue

Seattle WA

98128 S2250 StaffstaffNo

name

position

salary branchNo S1500 Tom Daniels

Manager 46000 B001 S0003 Sally Adams

Assistant

30000 B001 S0010 Mary Martinez Manager 50000 B002 S3250 Robert Chin

Supervisor

32000 B002 S2250 Sally Stern

Manager 48000 B004 S0415 Art Peters

Manager 41000 B003 Figure 2.4 Continued Video catalogNo

title category

dailyRental

price directorNo 207132 Tomorrow Never Dies

Action 5.00

21.99 D1001 902355 Primary Colors Comedy 4.50

14.50 D7834 330553 Face/Off

Thriller

5.00

31.99 D4576 781132 101 Dalmatians Children

4.00

18.50 D0078 445624 The Rock

Action 4.00

29.99 D5743 634817 Independence Day

Sci-Fi 4.50

32.99 D3765Director directorNo

directorName D1001 Roger Spottiswoode D7834 Mike Nichols D4576 John Woo D0078 Stephen Herek D5743 Michael Bay D3765 Roland EmmerickActor

actorNo actorName A1002 Pierce Brosnan A3006 John Travolta A2019 Nicolas Cage A7525 Will Smith A4343 Glenn CloseRole actorNo catalogNo

character A1002 207132 James Bond A3006 330553 Sean Archer A3006 902355 Jack Stranton A2019 330553 Castor Troy A2019 445624 Stanley Goodspeed A7525 634817 Captain Steve Hiller A4343 781132 Cruella De VilMember memberNo

fName lName address M250178 Bob

Adams 57 - 11th Avenue, Seattle, WA, 98105 M166884 Art

Peters 89 Redmond Rd, Portland, OR, 97117 M115656 Serena Parker 22 W. Capital Way, Portland, OR, 97201 M284354 Don

Nelson 123 Suffolk Lane, Seattle, WA, 98117Registration branchNo

memberNo

staffNo dateJoined B002

M250178 S3250 1-Jul-98 B001

M166884 S0003 4-Sep-99 B001

M115656 S0003 12-May-97 B002

M284354 S3250 9-Oct-98Figure 2.4 (continued) RentalAgreement rentalNo

dateOut dateReturn

memberNo

videoNo R753461 4-Feb-00

6-Feb-00

M284354 245456 R753462 4-Feb-00

6-Feb-00

M284354 243431 R668256 5-Feb-00

7-Feb-00

M115656 199004 R668189 2-Feb-00

4-Feb-00

M115656 178643VideoForRent videoNo available

catalogNo

branchNo 199004 N

207132 B001 245456 Y

207132 B002 178643 N

634817 B001 243431 N

634817 B0022.3 Relational integrity In the previous section, we discussed the structural part of the relational data model. As we mentioned in Section 2.1, a data model has two other parts: a manipulative part, defining the types of operations that are allowed on the data, and a set of integrity rules, which ensure that the data is accurate. In this section we discuss the relational integrity rules, and in the following section we discuss the main relational manipulation languages.

Since every column has an associated domain, there are constraints (called domain constraints) in the form of restrictions on the set of values allowed for the columns of tables. In addition, there are two important relational integrity rules, which are constraints or restrictions that apply to all instances of the database. The two principal rules for the relational model are known as entity integrity and referential integrity. Before we define these terms, we need first to understand the concept of nulls. Domains defined in Section 2.2.1 Null Represents a value for a column that is currently unknown or is not applicable for this record. 2.3.1

Nulls A null can be taken to mean 'unknown'. It can also mean that a value is not applicable to a particular record, or it could just mean that no value has yet been supplied. Nulls are a way to deal with incomplete or exceptional data. However, a null is not the same as a zero numeric value or a text string filled with spaces; zeros and spaces are values, but a null represents the absence of a value. Therefore, nulls should be treated differently from other values.

For example, suppose it was possible for a branch to be temporarily without a manager, perhaps because the manager has recently left and a new manager has not yet been appointed. In this case, the value for the corresponding mgrStaffNo column would be undefined. Without nulls, it becomes necessary to introduce false data to represent this state or to add additional columns that may not be meaningful to the user. In this example, we may try to represent the absence of a manager with the value 'None at present'. Alternatively, we may add a new column 'currrentManager?' to the Branch table, which contains a value Y (Yes), if there is a manager, and N (No), otherwise. Both these approaches can be confusing to anyone using the database.

We're now in a position to define the two relational integrity rules. 2.3.2 Entity integrity The first integrity rule applies to the primary keys of base tables.

From an earlier definition, we know that a primary key is a minimal identifier that is used to identify records uniquely. This means that no subset of the primary key is sufficient to provide unique identification of records. If we allow a null for any part of a primary key, we're implying that not all the columns are needed to distinguish between records, which contradicts the definition of the primary key. For example, as branchNo is the primary key of the Branch table, we should not be able to insert a record into the Branch table with a null for the branchNo column. 2.3.3 Referential integrity The second integrity rule applies to foreign keys.

In Figure 2.1, branchNo in the Staff table is a foreign key targeting the branchNo column in the home table, Branch. It should not be possible to create a staff Entity integrity In a base table, no column of a primary key can be null.A base table is a named table whose records are physically stored in the database. This is in contrast to a view, which we mentioned in Section 1.2.3. A view is a 'virtual table' that does not actually exist in the database but is generated by the DBMS from the underlying base tables whenever it's accessed.Referential integrity If a foreign key exists in a table, either the foreign key value must match a candidate key value of some record in its home table or the foreign key value must be wholly null.record with branch number B300, for example, unless there is already a record for branch number B300 in the Branch table. However, we should be able to create a new staff record with a null branch number to allow for the situation where a new member of staff has joined the company but has not yet been assigned to a particular branch. 2.3.4 Business rules It's also possible for users to specify additional constraints that the data must satisfy. For example, if StayHome has a rule that a member can only rent a maximum of 10 videos at any one time, then the user must be able to specify this rule and expect the DBMS to enforce it. In this case, it should not be possible for a member to rent a video if the number of videos the member currently has rented is 10. Unfortunately, the level of support for relational integrity varies from system to system. We'll discuss the implementation of relational integrity in Chapters 12 and 19. 2.4

Relational languages In Section 2.1, we stated that one part of a data model is the manipulative part, which defines the types of operations that are allowed on the data. This includes the operations that are used for updating or retrieving data from the database, and for changing the structure of the database. The two main languages that have emerged for relational DBMSs are: n

SQL (Structured Query Language) and n

QBE (Query-by-Example).

SQL has been standardized by the International Standards Organization (ISO), making it both the formal and de facto standard language for defining and manipulating relational databases. The main characteristics of SQL are: n

It's relatively easy to learn. n

It's a non-procedural language: you specify what information you require, rather than how to get it. n

Like most modern languages, SQL is essentially free-format, which means that parts of statements don't have to be typed at particular locations on the screen. n

The command structure consists of standard English words such as CREATE TABLE, INSERT, SELECT. For example, using the tables defined in Figure 2.4, we could list the titles and daily rental rate of all videos in the Action category, sorted according to the video title, using the following SQL statement: Business rules Additional rules specified by the users or database administrators of a database.SELECT title, dailyrental FROM video WHERE category = 'Action' ORDER BY title ASC; n

SQL can be used by a range of users including Database Administrators (DBAs), management personnel, application programmers, and many other types of end-users.

However, SQL contains only commands to define and manipulate the database; it does not currently contain flow of control commands, such as IF : THEN : ELSE, GO TO, DO : WHILE. Owing to this lack of computational completeness, SQL can be used in two ways. The first way is to use SQL interactively by entering the statements at a terminal. The second way is to embed SQL statements in a procedural language, such as Visual Basic, Delphi, C, C++, Java, COBOL, Fortran, Ada, or Pascal. In the latter case, a popular approach nowadays is to use Microsoft's ODBC (Open Database Connectivity) standard, which provides a DBMS-independent approach to embedding SQL in a programming language, thus making the code more portable between DBMSs.

QBE is an alternative, graphical-based, 'point-and-click' way of querying the database, which is particularly suited for queries that are not too complex, and can be expressed in terms of a few tables. QBE has acquired the reputation of being one of the easiest ways for non-technical users to obtain information from the database. QBE provides a visual means for querying the data through the use of templates. Querying the database is achieved by illustrating the query to be answered. The screen display is used instead of typing in column names and formats; however, we must indicate the columns we want to see and specify data values that we want to use to restrict the query. Languages like QBE can be a highly productive way to query or update the database interactively. Figure 2.5 shows how we would create the SQL query illustrated above using Microsoft Access QBE. Figure 2.5 Sample QBE query.

Unfortunately, unlike SQL, there is no official standard for these languages. However, the functionality provided by vendors is generally very similar and the languages are usually more intuitive to use than SQL. We'll return to SQL and QBE in Chapter 17. Chapter summary

The Relational Database Management System (RDBMS) has become the dominant data-processing software in use today, with estimated sales of approximately $8-10 billion per year ($25 billion with tools sales included), and growing at a rate of about 25 per cent per year. This software represents the second generation of DBMS and is based on the relational data model proposed by Dr E.F. Codd.

Relations are physically represented as tables, with the records corresponding to individual tuples and the columns to attributes.

Properties of relational tables are: each cell contains exactly one value, column names are distinct, column values come from the same domain, column order is immaterial, record order is immaterial, and there are no duplicate records.

A superkey is a column, or set of columns, that identifies records of a table uniquely, while a candidate key is a minimal superkey. A primary key is the candidate key chosen for use in identification of records. A table must always have a primary key. A foreign key is a column, or set of columns, within one table that is the candidate key of another table.

A null represents a value for a column that is unknown at the present time or is not defined for this record.

Entity integrity is a constraint that states that in a base table no column of a primary key can be null. Referential integrity states that foreign key values must match a candidate key value of some record in the home table or be wholly null.

The two main languages for accessing relational databases are SQL (Structured Query Language) and QBE (Query-by-Example). --This text refers to an out of print or unavailable edition of this title.


Inside This Book (Learn More)
Browse Sample Pages
Front Cover | Copyright | Table of Contents | Excerpt | Index | Back Cover
Search inside this book:

Customer Reviews

There are no customer reviews yet on Amazon.co.uk.
5 star
4 star
3 star
2 star
1 star

Most Helpful Customer Reviews on Amazon.com (beta)

Amazon.com: 1 review
2 of 2 people found the following review helpful
Everything to begin 6 Jan. 2006
By Arlan Purdy - Published on Amazon.com
Format: Paperback
I first became interested in databases while trying to make something helpful for a university library reserve department where I worked. Experimentation and consultation with the people at utteraccess.com led me to a good conception of what I would need to accomplish. Unfortnately, it was more than I could do in my spare time before graduation.

This book covers everything I learned and ties in with what I learned from my Abstract Data Structures course. Anyone who finds themselves working on a database without any prior experience should read this book; it's an exellent introduction to the concepts and methods of database design.

One thing I wondered at was the authors' failure to even mention the argument against using real data as primary keys. On utteraccess they make a good case for why primary keys should be only identification numbers for the database and not object attributes. Even if Connolly and Begg have a different view, I think they should have discussed the controversy and the reasons for their preference.

In all other respects I found this a thorough, helpful book that will not overwhelm the beginner with technicalities, but will establish a solid foundation for database work.
Was this review helpful? Let us know


Feedback