Database Systems: The Complete Book- P4

Chia sẻ: Thanh Cong | Ngày: | Loại File: PDF | Số trang:50

Thêm vào BST

Báo xấu

227
lượt xem 7
download

Download Vui lòng tải xuống để xem tài liệu đầy đủ

Database Systems: The Complete Book- P4: Database Systems and Database Design and Application courses offered at the junior, senior and graduate levels in Computer Science departments. Written by well-known computer scientists, this introduction to database systems offers a comprehensive approach, focusing on database design, database use, and implementation of database applications and database management systems

Chủ đề:

Bình luận(0) Đăng nhập để gửi bình luận!

Lưu

Nội dung Text: Database Systems: The Complete Book- P4

276 CH.4PTER 6. THE DATABASE LANGUAGE SQL 6.4. FULL-RELATION OPERATIONS StarsIn(movieTitle, movieyear, starname) 6.4 Full-Relation Operations biovieStar(name, address, gender, birthdate) ~ovieExec(name, address, cert#, networth) In this section we shall study some operations that act on relations as a whole, Studio(name, address, presC#) rather than on tuples individually or in small numbers (as do joins of several relations, for instance). First, we deal with the fact that SQL uses relations that describe the tuples that would appear in the following SQL expressions: are bags rather than sets, and a tuple can appear more than once in a relation. We shall see how to force the result of an operation to be a set in Sectiori 6.4.1, a) Studio CROSS JOIN MovieExec; and in Section 6.4.2 we shall see that it is also possible to prevent the elimination b) StarsIn NATURAL FULL OUTER JOIN MovieStar; of duplicates in circumstances where SQL systems \~ould normally eliminate them. c) StarsIn FULL OUTER JOIN MovieStar ON name = starName; Then, we discuss how SQL supports the grouping and aggregation operator y that we introduced in Section 5 4.4. SQL has aggregation operators and *! Exercise 6.3.8 : Using the database schema a GROUP-BY clause. There is also a "HAVING" clause that allows selection of Product (maker, model, type) certain groups in a way that depends on the group as a whole, rather than on PC(mode1, speed, ram, hd, rd, price) individual tuples. Laptop(mode1, speed, ram, hd, screen, price) Printer(mode1, color, type, price) 6 4 1 Eliminating Duplicates .. write an SQL query that will produce information about all products - PC'\. AS mentioned in Section 6.3.4, SQL's notion of relations differs from the abstract laptops, and printers - including their manufacturer if available, and whatever notion of relations presented in Chapter 3. A relation, being a set, cannot have information about that product is relevant (i.e.. found in the relation for that more than one copy of any given tuple. When an SQL query creates a new type of product). relation, the SQL system does not ordinarily eliminate duplicates. Thus. the SQL response to a query may list the same tuple several times. Exercise 6.3.9 : Using the two relations Recall from Section 6.2.1 that one of several equivalent definitions of the Classes(class, type, country, numGuns, bore, displacement) meaning of an SQL select-from-where query is that we begin lvith the Carte- v Ships(name, class, launched) sian product of the relations referred to in the FROM clause. Each tuple of the product is tested by the condition in the WHERE clause. and the ones that pass from our database schema of Exercise 5.2.4, mite an SQL query that will pro- tb the test are given t,he output for projection according to the SELECT clause. duce all available information about ships, including that information available This projection may cause the same tuple to result from different tuples of t,he in the Classes relation. You need not produce information about classes if product, and if so, each copy of the resulting tuple is printed in its turn. Fur- there are no ships of t,hat class mentioned in Ships. ther, since there is nothing wrong with an SQL relation having duplicates, the relations from ~vhich Cartesian product is formed may have duplicates. and the ! Exercise 6.3.10: Repeat Exercise 6.3.9, but aleo include in the result, for an!- each identical copy is paired with the tuples from the other relations, yielding class C that is not nientioned in Ships, inforniation about the ship that has a proliferation of duplicates in the product. the same nanle C as its class. If we do not rvish duplicates in the result, then \ye may follow the key- --ord SELECT by the keyword DISTINCT. That word tells SQL to produce only ! Exercise 6.3.11 : The join operators (other than outerjoin) lye learned in thi- one copy of any tuple and is the SQL analog of applying the 6 operator of section arc redundant. in the sense that they call always be replaced by sclcct- Section 3.4.1 to the result of the query. from-x~hcre csprc,ssions. Explain how to write expressions of the follo~ing o r m f using s e l c r t - f r o ~ n - ~ h ~ ~ ~ : Example 6.27 : Let us reconsider the query of Fig. 6.9: where we asked for the * a) R CROSS JOIN S; producers of Harrison Ford's movies using no subqueries. .Is written, George Lucas will appear many times in the output. If \ye want only to see each b) R NATURAL JOIN S; producer once: n-e may change line (1) of the query to c) R JOIN S ON C ; where C is an SQL condition. : 1) SELECT DISTINCT name Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
278 CHAPTER 6. THE DATABASE LANGUAGE SQL 6.4. FULL-RELATION OPER4TIONS 279 listed in StarsIn (so the movie appeared in three different tuples of StarsIn), The Cost of Duplicate Elimination then that movie's title and year would appear four times in the result of the One might be tempted to place DISTINCT after every SELECT,on the theory that it is harmless. In fact, it is very expensive to eliminate duplicates from As for union, the operators INTERSECT ALL and EXCEPT ALL are intersection a relation. The relation must be sorted or partitioned so that identical and difference of bags. Thus, if R and S are relations, then the result of tuples appear next to each other. These algorithms are discussed starting in Section 15.2.2. Only by grouping the tuples in this way can we determine whether or not a given tuple should be eliminated. The time it takes to R INTERSECT ALL S sort the relation so that duplicates may be eliminated is often greater than the time it takes to execute t.he query itself. Thus, duplicate elimination is the relation in which the number of times a tuple t appears is the minimum should be used judiciously if we want our queries to run fast. of the number of times it appears in R and the number of times it appears in The result of expression Then, t.hc list of producers will have duplicate occurrences of names elirilinated before printing. R EXCEPT ALL S Incidentally, the query of Fig. 6.7, where we used subqueries, does not nec- essarily suffer from the problem of duplicate answers. True, the subquery at has tuple t as many times as the difference of the number of times it appears in line (4) of Fig. 6.7 will produce the certificate number of George Lucas several R minus the number of times it appears in S1provided the difference is positive. times. However, in the "main" query of line (I), we examine each tuple of Each of these definitions is what we discussed for bags in Section 5.3.2. MovieExec once. Presumably, there is only one tuple for George Lucas in that relation, and if so, it is only this tuple that satisfies the WHERE clause of line (3). 6.4.3 Grouping and Aggregation in SQL Thus, George Lucas is printed only once. In Section 5.4.4, we introduced the grouping-and-aggregation operator y for our extended relational algebra. Recall that this operator allo\\-s us to partition 6.4.2 Duplicates in Unions, Intersections, and Differences the tuples of a relation into "groups," based on the values of tuples in one or more attributes, as discussed in Section .3.4.3. lye are then able to aggregate Unlike the SELECT statement, which preserves duplicates as a default and only certain other columns of the relation by applying "aggregation" operators to eliminates them when instructed to by the DISTINCT keyword. the union. inter- those columns. If there are groups, t,hen the aggregation is done separately for section, and difference operations, which tve introduced in Sectio~l 6.2.3: nor- each g o u p . SQL provides all the capability of the 7: operator tlirough the use mally eliminate duplicates. That is, bags are converted to sets, and the set of aggregation operators in SELECT clauses and a special GROUP BY clause. \-c,rsion of the operation is applied. In order to prevent t,he eliminat,ionof dupli- cates, 13-emust follow the operator UNION, INTERSECT,or EXCEPT by the keyn-ord ALL. If we do, then we get the bag semantics of these operators as was discussed 6.4.4 Aggregation Operators in Section 5.3.2. SQL uses the five aggregation operators SUM, AVG. MIN. MAX. and COUNT that rve Exanlpie 6.28 : Consider again the union expression fro111 Esanlple 6.13. but niet in Section 5.4 2. These operators are used by applying them to a scalar- ilo\\- add the kq~vord ALL, as: valued espression. typically a colu~iin nanie. in a SELECT clause. One exception , is the expression COUNT(*). 11-hich counts all the tuples in the relation that is (SELECT title, year F O Movie) RM constructed from the F O clause and WHERE clause of the query. RM UNION ALL In addition, 11-e have the option of eliminating duplicates from the column (SELECT movieTitle AS title, movieyear AS year F O StarsIn); RM before applying the aggregation operator by using the keyrx-ord DISTINCT. That is, an expression such as COUNT(DIST1NCT x) counts the number of distinct ~o~~~ title and year will appear as many times in the result as it appears in a values in column x. \Ve could use any of the other operators in place of COUNT each of the relations Movie and StarsIn put toget,her. For instance, if a movie here, but expressions such as SUM(D1STINCT x) rarely make sense: since it asks appeared once irl the Movie relation and there ~i-ere three stars for that movie us to sum the different values in colunin s. Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
280 CHAPTER 6. THE DAT.4BASE LANGUAGE SQL 6.4. FliLL-RELtlTION OPERATIONS E x a m p l e 6.29 : The following query finds the average net worth of all movie E x a m p l e 6.31 : The problem of finding, from the relation executives: M o v i e ( t i t l e , y e a r , l e n g t h , i n c o l o r , s t u d i o l a m e , producerC#) SELECT AVG(netWorth1 F O MovieExec; RM the sum of the lengths of all movies for each studio is expressed by Note that there is no W E E clause a t all, so the keyword W E E is properly HR HR SELECT studioName, SUM(1ength) omitted. This query examines the n e t w o r t h column of the relation F O Movie RM MovieExec(name, a d d r e s s , c e r t # , networth) GROUP B studioName; Y sums the values found there, one value for each tuple (even if the tuple is a We may imagine t h a t the tuples of relation Movie arc reorganized and grouped duplicate of some other tuple), and divides the sum by the number of tuples. so that all the tuples for Disney studios are together, all those for MGM are If there are no duplicate tuples, then this query gives the average net worth together, and so on, as was suggested in Fig. 5.17. The sums of the length as we expect. If there were duplicate tuples, then a movie executive whose components of all the tuples in each group are calculated, and for each group, tuple appeared n times would have his or her net worth counted n times in the the studio name is printed along with that sum. average. Observe in Example 6.31 how the SELECT clause has t ~ v o kinds of terms. E x a m p l e 6.30 : The following query: 1. Aggregations, where a n aggregate operator is applied to a n attribute or SELECT C U T (*) ON expression involving attributes. As mentioned, these terms are evaluated FO StarsIn; RM on a per-group basis. counts the number of tuples in the S t a r s I n relation. The similar query: 2. Attributes, such as studioName in this example, that appear in the GROUP B clause. In a SELECT clause that has aggregations, only those attributes Y SELECT C U T (starName) ON that are mentioned in the GROUP B clause may appear unaggregated in Y FO StarsIn; RM the SELECT clause. counts the number of values in the starName column of the relation. Since, While queries il~volvi~ig GROUP BY generally have both grouping attributes duplicate values are not eliminated when we project onto the starName coltimn and aggregations in the SELECT clause, it is technically not necessary to have in SQL, this count should be the same as the count produced by the query with both. For example, we could m i t e C U T (*) . ON If we want to be certain that we d o not count duplicate values more than SELECT studioName once, we can use the keyword DISTINCT before the aggregated attribute. as: F O Movie RM SELECT COUNT(DIST1NCT starName) GROUP B studioName; Y FO StarsIn; RM This query rvould group the tuples of Movie according t o their studio name and Sox~\., star is counted once, no matter in how many movies they appearcc!. each then print the studio name for each group, no matter how many tuples there are with a gii-en studio name. Thus, the above query has the same effect as SELECT DISTINCT studioName 6.4.5 Grouping F O Movie; RM To group tuples, vie use a GROUP B clause; follo~ving W E E clause. The Y the H R It is also possible to use a GROUP B clause in a query about several relations. Y ke~~l-ordsOUP BY are followed by a list of grouping attributes. In tlle simplest G R Such a query is interpreted by the following sequence of steps: situation, there is only one relation reference in the F O clause, and t,his relation RM has its tuples grouped according to their values in the grouping attributes. 1. Evaluate the relation R expressed by the F O and W E E clauses- That RM HR li-hateyer aggregation operators are used in the SELECT clause are applied only is, relation R is the Cartesian product of the relations mentioned in the within groups. FROM clause. t o which the selection of the W E E clause is applied. HR Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
282 CHAPTER 6. THE DATABASE LANGUAGE SQL 6.4. FULL-RELATION OPER~~TIOIVS 283 - -- - -- - 2. Group the tuples of R according to the attributes in the GROUP BY clause. Grouping, Aggregation, and Nulls 3. Produce as a result the attributes and aggregations of the SELECT clause. as if the query were about a stored relation R. When tuples have nulls, there are a few rules we must remember: Example 6.32 : Suppose we wish to print a table listing each producer's total The value NULL is ignored in any aggregation. It does not contribute lcngth of film produced. l i e need to get information from the two relations to a sum, average, or count, nor can it be the minimum or masi- mum in its column. For example, COUNT(*) is always a count of the Movie(title, year, length, incolor, studioName, producerC#) number of tuples in a relation, but COUNT(A1 is the number of t~iples MovieExec(name, address, certtt, networth) with non-NULL values for attribute A. so we begin by taking their theta-join, equating the certificate numbers from On the other hand, NULL is treated as an ordinary value in a grouped the two relations. That step gives us a relation in which each MovieExec tuple attribute. For example, SELECT a, AVG(b) FROM R GROUP BY a is paired with the Movie tuples for all the movies of that producer. Note that will produce a tuple with NULL for the value of a and the aI7erage an executive who is not a producer will not be paired with any movies: and value of b for the tuplcs with a = NULL, if there is at least one tuple therefore will not appear in the relation. Now, we can group the selected tuplcs in R with a component NULL. of this relation according to the name of the producer. Finally, we sum the lengths of the movies in each group. The query is shown in Fig. 6.13. HAVING MIN(year) < 1930 SELECT name, SUM (length) FROM MovieExec, Movie The resulting quer3; shown in Fig. 6.14, ~vouldremove froin the grouped relation W E R E producerC# = cert# all those groups in which every tuple had a year component 1930 or lliglier. GROUP BY name; Figure 6.13: Computing the length of movies for each produce1 SELECT name, SUM(1ength) FROM MovieExec, Movie 6.4.6 HAVING Clauses WHERE producerC# = cert# GROUP BY name Suppose that we did not wish to include all of the producers in our table of HAVING MIN(year) < 1930; Example 6.32. We could restrict the tuples prior to grouping in a way that \\-ould make undesired groups empty. For instance, if we only wanted the total length of movies for producers with a net worth of more than $10.000,000. we Figure 6.14: Computing the total length of film for early producers could change the third line of Fig. 6.13 to WHERE producerC# = cert# AND networth > 1OOOOOOO There are several rules we must remember about HAVING clauses: Ho~ve\-cr:sometinies we want to choose our groups based on some aggrt.gatt3 * i n aggregation in a HAVING clause applies only to the tuples of the group Property of the group itself. Then we follo117 the GROUP BY clause xvith a HAVING being tested. clause. The latter clausc consists of the keyword HAVING followed by a conditioll about the group. Any attribute of relations in the FROM clause may be aggregated in the HAVING clause, but only those attribut,es that are in the GROUP BY list Example 6-33: Suppose we want to print the total film length for only thosc may appear unaggregated in the HAVING clause (the same rule as for the producers who made at least one film prior to 1930. I r e may append to Fig. 6.13 SELECT clause). the clause Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
284 CHAPTER 6. THE DATABASE ,CAlVGUAGE SQL 6.4. FULL-RELATION OPER4T10ArS 285 - *! f) Find for each manufacturer, the average screen size of its laptops. Order of Clauses in SQL Queries ! g) Find the manufacturers that make at least three different models of PC. Me have now met all six clauses that can appear in an SQL "select-from- 7 where" query: SELECT, F O , W E E G O P BY, HAVING, and O D R BY. R M HR, R U RE ! h) Find for each manufacturer who sells PC's the maximum price of a PC. Only the first two are required, but you can't use a HAVING clause without a G O P BY clause. Whichever additional clauses appear must be in the RU *! i) Find, for each speed of PC above 800, the average price. order listed above. !! j) Find the average hard disk size of a PC for all those manufacturers that make printers. 6.4.7 Exercises for Section 6.4 Exercise 6.4.7 : Write the following queries, based on the database schema Exercise 6.4.1: Write each of the queries in Exercise 5.2.1 in SQL, making Classes ( c l a s s , type, country, numGuns , bore, displacement) sure that duplicates are eliminated. Ships(name, c l a s s , launched) B a t t l e s (name, date) Exercise 6.4.2: Write each of the queries in Exercise 5.2.4 in SQL, making Outcomes (ship, b a t t l e , r e s u l t ) sure that duplicates are eliminated. ! Exercise 6.4.3: For each of your answers to Exercise 6.3.1, determine whether of Exercise 5.2.4, and evaluate your queries using the data of that exercise. or not the result of your query can have duplicates. If so, rewrite the query a) Find the number of battleship classes. to eliminate duplicates. If not, write a query without subqueries that has the same, duplicate-free answer. b) Find the average number of guns of battleship classes. ! Exercise 6.4.4: Repeat Exercise 6.4.3 for your answers to Exercise 6.3.2. ! c) Find the average number of guns of battleships. Xote the difference be- *! Exercise 6.4.5 : In Example 6.27, we mentioned that different versions of the t~veen(b) and (c); do 11-e weight a class by the number of ships of that query "find the producers of Harrison Ford's movies" can hare different answers class or not'? as bags, even though they yield the same set of answers. Consider the version ! d) Find for each class the year in which the first ship of that class was of the query in Example 6.22, where we used a subquery in the F O clause. RM launched. Does this version produce duplicates, and if so, why? ! e) Find for each class the number of ships of that class sunk in battle. Exercise 6.4.6: Write the following queries, based on the database schema !! f) Find for each class with at least three ships the number of ships of that Product (maker, model, type) class sunk in battle. PC(mode1, speed, ram, hd, r d , price) Laptop(mode1, speed, ram, hd, screen, price) !! g) The n-eight (in pounds) of the shell fired from a naval gun is approximately Printer(mode1, color, type, price) one half the cube of the bore (in inches). Find the average weight of the shell for each country's ships. of Exercise 3.2.1. and evaluate your queries using the data of that exercise. * a) Find the average speed of PC's. Exercise 6.4.8 : In Example 5.23 Xe gave an example of the query: "find?for v each star ~ h has appeared in at least threc movies, the earliest year in which o 1)) Find the at-erage speed of laptops costing over $2000. they appeared." \\e wrote this query as a y operation. Write it in SQL. c) Find the average price of PC's made by manufacturer "A." *! Exercise 6.4.9 : The y operator of estended relational algebra does not have ! d) Find the average price of PC's and laptops made by manufacturer '.D..' a feature that corresponds to the HAVING clause of SQL. Is it ~ossible mimic to an SQL query n-ith a HAVING clause in relational algebra? If so, how n'ould we e) Find, for each different speed the average price of a PC. do it in general? Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
286 CHAPTER 6. THE DATABASE LANGUAGE SQL -5. DATABASE AIODIFIC.4TIOiS 287 6.5 Database Modifications ~ f as in Example 6.34, we p r o ~ i d e , values for all attributes of the relation, n we may omit. the list of attributes that follows the relation name. That is, To t.his point, we have focused on the normal SQL query form: the select-from- where st,atement. There are a number of other statement forms that do not return a result, but rather change the state of the database. In this section, we INSERT INTO S t a r s I n shall focus on three types of st.atements that allow us to VALUES('The Maltese F a l c o n ' , 1942, 'Sydney G r e e n s t r e e t ' ) ; 1. Insert tuples into a relation. Howvever, if we take t,his option, we must b e sure t h a t the order of the values 2. Delete certain tuples from a relation. is the same as the standard order of attributes for the relation. We shall see in Section 6.6 how relation schemas are declared, and we shall see that as we d o so 3. Update values of certain components of certain existing tuples. we provide an order for the attributes. This order is assumed when matching values t o attributes, if the list of attributes is missing from a n INSERT statement. We refer to these three types of operations collectively as modifications. If you are not sure of t h e standard order for the attributes, it is best to 6.5.1 Insertion list them in the INSERT clause in the order you choose for their values in the VALUES clause. The basic form of insertion statement consists of: 1. The keywords INSERT INTO, The simple INSERT described above only puts one tuple into a relation. Instead of using explicit values for one tuple, we can compute a set of tuples to 2. The name of a relation R, . be inserted, using a subquery. This subquery replaces t,he keyrvord VALUES and the tuple expression in the INSERT statement form described above. 3. A parenthesized list of attributes of the relation R, 4. The keyword VALUES, and E x a m p l e 6.35 : Suppose we want to add t o the relation 5. A tuple expression, that is, a parenthesized list of concrete values, one for Studio(name, address, presC#) each attribute in the list (3). all movie studios that are mentioned in the relation That is, the basic insertion form is M o v i e ( t i t l e , y e a r , l e n g t h , i n c o l o r , studioName, producerC#) INSERT INTO R(.41,. . . , A,) VALUES (vl;. . . ,v,) ; but do not appear in Studio. Since there is no way t o determine a n address or A tuple is created using the value vi for attribute Ai,for i = 1,2,. . . ,n. I f a president for such a studio, we shall have to be content with value NULL for the list of attributes does not include all attributes of the relation R , then the attributes a d d r e s s and presC# in the inserted S t u d i o tuples. -4 Ivay t o make tuple created has default values for all missing attributes. The most common this insertion is shown in Fig. 6.15. default wlue is NULL, the null value, but there are other options to be discussed in Sect,ion 6.6.4. .1) INSERT INTO Studio(name) E x a m p l e 6.34: Suppose we wish to add Sydney Greenstreet to t,he list of SELECT DISTINCT studioName stars of The hfaltese Falcon. IVe say: F O Movie RM 1) INSERT INTO StarsIn(movieTitle, movieyear, starName) W E E studioName N T I N HR O 2 ) VALUES('The Maltese Falcon',1942, 'Sydney G r e e n s t r e e t ' ) ; (SELECT name F O Studio) ; RM The effect of executing this statement is that a tuple with the three components on line (2) is inserted into the relation S t a r s I n . Since all attributes of S t a r s I n are mentioned on line (I), there is no need to add default components. The Figure 6.1.5: Xdding new studios values on line (2) are matched with the attributes on line (1) in the order given, so 'The Maltese Falcon' becomes the value of the component for attribute Like most SQL statements with nesting, Fig. 6.1.5 is easiest t o examine from movieTitle, and so on. 0 the inside out. Lines (5) and (6) generate all the studio names in the relation Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
288 CHAPTER 6. T H E DATABASE LANGUAGE SQL 6.5. D..IT..1BASE AIODIFICATIOArS That is, the form of a deletion is The Timing of Insertions DELETE FROM R WHERE ; Figure 6.15 illustrates a subtle point about the semantics of SQL state- The effect of executing this statement is that every tuple satisfying the condition ments. In principle, the evaluation of the query of lines (2) through ( 6 ) should be accomplished prior to executing the insertion of line (1). Thus? (4) will be deleted from relation R. there is no possibility that new tuples added to Studio a t line (1) will Example 6.36 : We can delete from relation affect the condition on line (4). However, for efficiency purposes, it is pos- sible that an implementation will execute this statement so that changes S t a r s I n ( m o v i e T i t l e , movieyear, starName) t o Studio are made as soon as new studios are found, during the execution of lines (2) through (6). the fact that Sydney Greenstreet w s a star in The Maltese Falcon by the SQL a In this particular example, it does not matter whether or not inser- tions are delayed until the query is completely evaluated. However, there are other queries where the result can be changed by varying the timing DELETE F O S t a r s I n RM of insertions. For example, suppose DISTINCT were removed from line (2) W E E movieTitle = 'The Maltese Falcon' AND HR of Fig. 6.15. If we evaluate the query of lines (2) through (6) before doing movieyear = 1942 AND any insertion, then a new studio name appearing in several Movie tuples starName = 'Sydney G r e e n s t r e e t ' ; would appear several times in the result of this query and therefore would Notice that unlike the insertion statement of Example 6.34, we cannot sirnply be inserted several times into relation Studio. However, if we inserted specify a tuple t o b e deleted. Rather, we must describe the tuple exactly by a new studios into Studio as soon as we found them during the evaluation W E E clause. HR of the query of lines (2) through (6), then the same new studio would not be inserted twice. Rather, as soon as the new studio was inserted once, its Example 6.37: Here is another example of a deletion. This time, we delete name would no longer satisfy the condition of lines (4) through (6), and from relation it would not appear a second time in the result of the query of lines (2) through (6). MovieExec(name , a d d r e s s , c e r t # , networth) several tuples at once by using a condition that can be satisfied by more than one tuple. The statement Studio. Thus, line (4) tests that a studio name from the Movie relation is none of these studios. DELETE F O MovieExec RM Now, we see that lines (2) through (6) produce the set of studio names W E E networth < 10000000; HR found in Movie but not in Studio. The use of DISTINCT on line (2) assures that each studio will appear only once in this set, no matter how many movies it deletes all movie eseciltives whose net worth is low - less than ten million 0'-ns. Finally, line (1) inserts each of these studios, with NULL for the attributes dollars. address and presC#, into relation Studio. 0 6.5.3 Updates 6.5.2 Deletion U-hile we migllt think of both insertions and deletions of tuples as "updates" to the d a t a b a ~ r .an ~lprlatein SQL is a very specific kind of change to the -4 deletion statement consists of: database: olle or lllore t,lplcs that alreatly esist in thc database have some of 1. The keywords DELETE FROM, their colnponcIits changed. The general form of an update statement is: 2. The name of a relation, say R, 1. The keyword UPDATE, 3. The keyword WHERE, and 2. .A relation name, say I?, 1. A condition. 3. The key\\-ord SET, Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
290 CHAPTER 6. THE DATABASE LANGUAGE SQ . DriTABASE MODIFIC-4TIOXS 291 4. A list of formulas that each set an attribute of the relation R equal to til Exercise 5.2.1. Describe the effect of the modifications on the data of that value of a n expression or constant, 5. The keyword WHERE, and a) Using two INSERT statements store in the database the fact that P C model 1100 is made by manufacturer C, has speed 1800, RAM 256, hard disk 6. A condition. 80, a 20x DVD, and sells for $2499. That is, the form of an update is ) Insert the facts that for every P C there is a laptop with the same manu- UPDATE R SET
- - 292 CHAPTER 6. T H E DATABASE LANGUAGE SQL DEFI;I'IXTG 4 RELATION SCHEAM IN SQL 293 6.6 Defining a Relation Schema in SQL of bits permitted may be less, depending on the inlplementation (as with the types i n t and s h o r t i n t in C). In this section we shall begin a discussion of data definition, the portions of SQL that involve describing the structure of information in the database. In contrast, 5. Floating-point numbers can be represented in a variety of ways. We may the aspects of SQL discussed previously - queries and modifications - are use the type FLOAT or REAL (these are synonyms) for typical floating- often called data m a n i p ~ l a t i o n . ~ point numbers. A higher precision can be obtained with the type DOUBLE The subject of this section is declaration of the schemas of stored relations. PRECISION; again the distinction between these types is as in C. SQL also We shall see how to describe a new relation or table as it is called in SQL. has types that are real numbers with a fixed decimal point. For exam- Section 6.7 covers the declaration of "views," which are virtual relatiorls thar ple, DECIMAL(n,d) allolvs values that consist of n decimal digits, with the are not really stored in the database, while some of the more complex issues decimal point assumed to be d positions from the right. Thus, 0123.45 regarding constraints on relations are deferred to Chapter 7. is a possible value of type DECIMAL(6,2). NUMERIC is almost a syllollym for DECIMAL, although there are possible implementation-dependent dif- 6.6.1 Data Types 6. Dates and times can be represented by the data types DATE and TIME, To begin, let us introduce the principal atomic'data types that are supported respectively. Recall our discussion of date and time values in Section by SQL systems. All attributes must have a data type. 6.1.4. These values are essentially character strings of a special form. itre may, in fact, coerce dates and times to string types, and we may do the 1. Character strings of fixed or varying length. The type CHAR(n) dcnoies a fixed-length string of n characters. That is, if an attribute has type reverse if the string "makes sense" as a dabe or time. CHAR(n1, then in any tuple the component for this attribute will be a string of n characters. VARCHAR(n1 denotes a string of u p t o n characters. 6.6.2 Simple Table Declarations Components for a n attribute of this type will he strings of between 0 and n characters. SQL permits reasonable coercions between values of The simplest form of declaration of a relation schema consists of the keyrl-ords character-string types. Sormally, a string is padded by trailing bl;lnks CREATE TABLE follo\$:ed by the name of the relation and a parenthesized list of if it becomes the value of a component t,hat is a fixed-length st,ring of the attribute names and their types. greater length. For example, the string f o o ' , if it became the value of Example 6.39: The relation schema for our example Moviestar relation, a component for an attribute of type CHAR(5), would assume the valiie 'foo ' (with two blanks following the second 0). The padding blanks which ,\-as described informally in Section 5.1, is expressed in SQL as in Fig. 6.16. The first two attributes, name and address, have each been declared t o be can then be ignored if the value of this conlponent were compared (see Section 6.1.3) with another string. character strings. However, with the name, we have made the decision t o use a fixed-length string of 30 characters: padding a name out with blanks a t the end 2. Bit strings of fixed or varying length. These strings are analogous to fised if necessary and truncating a name to 30 characters if it is longcr. In contrast, and varying-length character st,rings, but their values are strings of bits ti-e have declared addresses t o be variable-length character strings of up to 255 rather than characters. The type BIT(n) denotes bit strings of length n. c h a r a ~ t e r s . ~ is not clear that these two choices are the best possible, but we It while B I T VARYING(^) denotes bit.strings of length up to n. use them to illustrate two kinds of string dat,a types. The gender attribute has values that are a single letter, M or F. Thus: we 3. The type BOOLEAN denotes an attribute ~i-hose value is logical. The po.4- can safe1)- use a single character as the type of this attribute. Fi~lally.the ble values of such an attribute are TRUE. FALSE, and - although it ~~-oulrl b i r t h d a t e attribute naturally deserves the data type DATE. If this type w r e surprise George Boole - U K O N NNW. not available ill a system that did not conforrn to the SQL standard, we could use CHAR(10) instead, since all DATE values arc actual1:- strings of 10 characters: The type INT or INTEGER (these nanies are synonj-ms) denotes typical eight digits and two hyphens. integer values. The type SHORTINT also denotes integers, but the number SThe number 255 is not the result of some weird notion of what typical addresses look like. the material of this section is in the realm of database design, and thus should r\ single byte can store integers between 0 and 255, so it is ~ o s s i b l e represent a v a ~ i n g - to have been 'Overed earlier in the book, like the analogous ODL for object-oriented databases. length character string of rip to 255 bytes by a single byte for the count of characters pills the H"'vever7 there are good reasons to group all SQL study together, so we took the liberty of bytes t o store the string itself. Commercial systems generally support longer varying-length \-iolating our own organization. strings, howe\-er. Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
6.6. DEFDTIiYG. RELATIO;LrS.CHEiII.4 ILVSQL 4 295 294 CH.4PTER 6. THE DAT4BASE LAhTGUAGESQL 6.6.4 Default Values 1) CREATE TABLE MovieStar ( 2) name CHAR(BO), When we create or modify tuples, we sometimes do not have values for all 3) address VARCHAR(255) , components. For example, we mentioned in Example 6.40 that when s-e add 4) gender CHAR( 1) , a column to a relation schema, the esisting tuples do not have a known value, 5) birthdate D T AE and it was suggested that N L could be used in place of a "real" wlue. Or, UL n-e suggested in Example 6.35 that we could insert new tuples into the Studio 1; relation knowing only the studio name and not the address or president's cer- tificate number. Again, it would be necessary to use some value that says "I Figure 6.16: Declaring the relation schema for the MovieStar relation don't know" in place of real values for the latter two attributes. To address these problems, SQL provides the N L wlue, which becomes UL the value of any component whose value is not specified, with the exception 6.6.3 Modifying Relation Schemas of certain situations where the N L value is not permitted (see Section 7.1). UL However, there are times when we ~vould prefer to use another choice of default We can delete a relation R by the SQL statement: value, the value that appears in a column if no other value is known. In general, any place lye declare an attribute and its data type, we may add D O TABLE R; RP the keyword DEFAULT and an appropriate value. That value is either N L or UL a constant. Certain other values that are provided by the system, such as the Relation R is no longer part of the database schema, and we can no longer current time, may also be options. access any of its tuples. Xlore frequently than we would drop a relation that is part of a long-lived Example 6.41: Let us consider Esample 6.39. We might wish to use the database, we may need to modify the schema of an existing relation. These character ? as the default for an unknown gender, and n-e might also wish to - - modifications are done by a statement that begins with the key~vordsA T RLE use t,he earliest possible date. D T '0000-00-00' for an unknown b i r t h d a t e . AE T B E and the name of the relation. \Ve then have several options, the most AL We could replace lines (4) and (5) of Fig. 6.16 by: important of which are 4) gender CHAR(1) DEFAULT I ? ' , 1. ADD followed by a column name and its data type. 5) birthdate D T DEFAULT D T JOOOO-OO-OO' AE AE 2. D O follolved by a column name. RP -4s another esample. n-e could have declared the default value for new at- tribute phone to be ' u n l i s t e d J when 11-eadded this attribute in Example 6.10. Example 6.40 : Thus, for instance, we could modify the MovieStar relation The alteration statement m-ould then look like: by adding an attribute phone with A T R TABLE MovieStar ADD phone CHAR(16) DEFAULT J u n l i s t e d ' ; LE A T R TABLE Moviestar ADD phone CHAR(16); LE 0 .is a result, the Moviestar schema now has five attributes: the four mentioned ' in Fig. 6.16 and the attribute phone, which is a fised-length string of 16 bytes. 6.6.5 Indexes In the actual relation, tuples ~vould have con~potients phone, but xx-e knoty all for An index on an attribute .-I a relation is a data structure that makes it of of no phone numbers to put there. Thus, the value of each of these components efficient to find those tuples that have a fixed value for attribute -4. Iildexes ~vouid IIULL. In Section 6.6.1: we shall see how it is possible to choose another be usually help with queries in ~vhichtheir attribute .-lcompared with a constant: is "default" value to be used instead of N L for unknown values. UL for instance -4 = 3, or even -4 5 3. The technology of implementing indexes -4s another example, we could delete the b i r t h d a t e attribute by on large relations is of central importance in the implementation of DBMS's. Chapter 13 is devoted to this topic. A T R TABLE Moviestar DROP b i r t h d a t e ; LE When relations are very large, it becomes expensive to scan all the tuples of a relation to find those (perhaps very few) tuples that match a given condition. For example, consider the first query we examined: Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
296 CHAPTER 6. THE DATABASE L-WGUAGE SQL 6.6. DEFINIXG A RELATION SCHES4-4 I;V SQL 297 SELECT * than a year for a movie, then we would prefer to order t h e attributes as above; F O Movie RM if a year were more likely t o be specified, then we would ask for a n index on W E E studioName = 'Disney' AND year = 1990; HR (year, t i t l e ) . from Example 6.1. There might be 10,000 Movie tuples, of which only 200 were If we wish to delete the index, we sirnply use its name in a statement like: made in 1990. The naive way t o implement this query is to get all 10,000 tuples and test DROP INDEX YearIndex; the condition of the W E E clause on each. It would be much more efficient if we HR had some way of getting only the 200 tuples from the year 1990 and testing each 6.6.6 Introduction to Selection of Indexes of them to see if the studio was Disney. It would be even more efficient if n-e could obtain directly only the 10 or so tuples that satisfied both the conditions Selection of indexes requires a trade-off by the database designer, and in prac- of the WHERE clause - that the studio be Disney and the year be 1990; see the tice, this choice is one of the principal factors that influence whether a database discussion of "multiattribute indexes," below. design is acceptable. Two important factors t o consider are: Although the creation of indexes is not part of any SQL standard up to The existence of an index on a n attribute greatly speeds up queries in and including SQL-99, most commercial systems have a way for the database which a value for that attribute is specified. and in some cases can speed designer to say that the system should create an index on a certain attribute up joins involving that attribute as well. for a certain relation. The following syntax is typical. Suppose we want to have an index on attribute y e a r for the relation Movie. Then we say: On the other hand, every index built for a n attribute of some relation makes insertions, deletions, and updates t o that relation more complex CREATE INDEX YearIndex ON ~ o v i e ( y e a r ) ; and time-consuming. The result will be that a n index whose name is YearIndex ~vill created on be Index selection is one of the hardest parts of database design, since it requires attribute year of the relation Movie. Henceforth, SQL queries that specify a estimating what the typical mix of queries and other operations on the database year may be executed by the SQL query processor in such a way that only those will be. If a relation is queried much more frequently than it is modified, then tuples of Movie with the specified year are ever esamined: there is a resulting indexes on the attributes that are most frequently specified in queries make decrease in the time needed t o answer the query. sense. Indexes are useful for attributes that tend to be compared with constants Often, a DBMS allows us to build a single index on multiple attribute> in WHERE clauses of queries, but indeses also are useful for attributes that appear This type of index takes values for several attributes and efficiently finds the frequently in join conditions. tuples with the given values for these attributes. E x a m p l e 6.43 : Recall Figure 6.3. ~vhere suggested a n exhaustive pairing we Example 6.42 : Since t i t l e and year form a key for Movie, we might expect of tuples to compute a join. .in index on M o v i e . t i t l e would help us find it to be common that values for both these attributes will be specified, or neithcr the Movie tuple for Star Tf~'ars q~~ickly, then. after finding its producer- and will. The following is a typical declaration of an index on these two attributes: certificate-number. an index on MovieExec. c e r t # ~ o u l d help us quickly find that person in the MovieExec relation. CREATE INDEX KeyIndex ON M o v i e c t i t l e , y e a r ) ; If modifications are the predominant action. then we should be very con- Since ( t i t l e : year) is a key, then when 1-e are given a title and year. n(' servative about creating indeses. Even then. it may be a n efficiency gain t o know the index will find only one tuple. and that will be the desired tuple. 111 create a n indes on a frequently used attribute. In fact. since some modification contrast. if the query specifies both the title and year, but only YearIndex ic commands involve querying the datahasc (e.g.. a n INSERT tvith a select-from- available. then the best the system can do is retrieve all the movies of that year where subquery or a DELETE with a condition) one must be very careful h o ~ v and cheek through them for the giren title. one estimates the relative frequency of modifications and queries. If: as is often the case, the key for the multiattribute index is really the We d o not yet have the details - how data is typically stored and how concatenation of the attributes in some order, then we can even use this index indexes are implemented - that are needed t o see the complete picture. HOW- to find all the tuples with a given value in the first of the the attributes. Thus. ever, we can see part of the problem in the follo\ving example. We should be Part of the design of a multiattribute index is the choice of the order in ~vhich aware that the typical relation is stored over many disk blocks. and the prin- the attributes are listed. For instance, if we were more likely t o specify a title cipal cost of a query or modification is often the number of disk blocks that Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
298 CHAPTER 6. THE DATABASE LANGUAGE SQL 6.6. DEFIfi-IXG A REL-4TION SCHESIA IN SQL 299 need to be brought to main memory (see Section 11.4.1). Thus, indexes that 3. Since the tuples for a given star or a given movie are likely to be spread let us find a tupIe without examining the entire relation can save a lot of time. over the 10 disk blocks of StarsIn, even if we have an index on starName However, the indexes themselves have to be stored, at least partially, on disk, or on the combination of movieTitle and movieyear, it will take 3 disk so accessing and modifying the indexes themselves cost disk accesses. In fact, accesses to find the (average of) 3 tuples for a star or movie. If me have no modification, since it requires one disk access to read a block and another disk index on the star or movie, respectively, then 10 disk accesses are required. access to write the changed block, is about twice as expensive as accessing the 1. One disk access is needed to read a block of the index every t,ime we use index or the data in a query. that index to locate tuples with a given value for the indexed attribute(s). If the index block must be modified (in the case of an insertion), then Example 6.44 : Let us consider the relation another disk access is needed to write back the modified block. StarsIn(movieTit;le, movieyear, starlame) 5. Likewise, in the case of an insertion, one disk access is needed to read a block on which the new tuple will be placed, and another disk access is Suppose that there are three database operations that we sometimes perform needed to write back this block. \Ye assume that, even without an index, on this relation: scanning the entire relation. Q1: Uic look for the title and year of movies in which a given star appeared. That is, we execute a query of the form: SELECT movieTitle, movieyear FO StarsIn RM W E E starName = S ; HR for some constant s. Q2: \?'e look for the stars that appeared in a given movie. That is, we esecut? Figure 6.li: Costs associated with the three actions, as a function of which a query of the form: indexes are selected Figure 6.17 gives the costs of each of the three operations: Q1 (query given a SELECT starName star), 9 2 (query given a movie), and I (insertion). If there is no index, then we FROM StarsIn nlust scan the entire relation for Q1 or Qz (cost 10): while an insertion requires W E E movieTitle = t AND movieyear = y ; HR merely that we access a block with free space and relyrite it with the new t,uple (cost of 2, since n-e assume that block can be found n-itllout an indes). These for constants t and y. observations esplain the column labeled -So Index." I: \Ye insert a new tuple into StarsIn. That is, we execute an insertio~lof If there is an index on stars only, then Qg still requires a scan of the entire the form: relation (cost 10). Howeyer, Q1 can be answered by accessing one index block to find the tliree tuples for a given st:ar and then making three more accesses to find those tuples. Ilisertion I requires that n-e read and m i t e both a disk block INSERT INTO S t a r s I n VALUES(t, ?/,s); for the indes and a disk block for the data. for a total of 1 disk accesses. The case \\-here there is an indes on movies o1i1y is 5:-mmetric to the case for constants t : y, and s. for stars only. Finally. if there are irideses on both stars and movies. then it takes 4 disk accesses to ansxver either Q1 or Q2. I*on-ever. insertion I requires Let us make the following assumpt,ions about the data: that we read and write t~vo index blocks as n-ell as a data block, for a total of 1. StarsIn is stored in 10 disk blocks, so if we need to exanline the entire 6 disk accesses. That observation explains the last column in Fig. 6.17. relation the cost is 10. The final roTv in Fig. 6.17 gives the average cost of an action, on the as- sumption that the fraction of the time \ye do Q1 is pl and the fraction of the 2. On the average, a star has appeared in 3 niovies and a movie has 3 stars. time we do Q yis p p : therefore, the fraction of the time 11.e do I is 1 - pl - p2. Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
300 CHAPTER 6. THE DATABASE LANGUAGE SQL Depending on pl and pz, any of the four choices of indexlno index can yield * f) An alteration to your Laptop schema from (c) t o add the attribute cd. the best average cost for the three actions. For example, if pl = pz = 0.1. then Let the default value for this attribute be 'none' if the laptop does not + the expression 2 8p1 f 8p2 is the smallest, so we would prefer not to create any have a CD reader. indexes. That is, if we are doing mostly insertion, and very few queries, then we don't want an index. On the other hand, if pl = p.2 = 0.4, then the formula Exercise 6.6.3 : Here is the informal schema from Exercise 5.2.4. 6 - 2pl - 2pz turns out t o b e the smallest, so we would prefer indexes on both starName and on the (movieTitle, movieyear) combination. Intuitively, if C l a s s e s ( c l a s s , t y p e , country, numGuns , bore, displacement) we are doing a lot of queries, and the number of queries specifying mo\-ies and S h i p s (name, c l a s s , launched) stars are roughly equally frequent, then both indexes are desired. Battles(name, d a t e ) If we have pl = 0.5 and pz = 0.1, then it turns out that an index 011 Outcomes(ship, b a t t l e , r e s u l t ) + stars only gives the best average value, because 4 6p2 is the formula with the Write the following declarations: smallest value. Likewise, pl = 0.1 and pz = 0.5 tells us t o create an index on only movies. The intuition is that if only one type of query is frequent, create a ) A suitable schema for relation Classes. only the index that helps that type of query. C] b) A suitable schema for relation Ships 6 6 7 Exercises for Section 6.6 .. c) .A suitable schema for relation B a t t l e s . * Exercise 6.6.1: In this section, we gave a formal declaration for only the d) A suitable schema for relation Outcomes. relation Moviestar among the five relations of our running example. Give suitable declarations for the other four relations: e) An alteration t o your Classes relation from (a) to delete t h e attribute bore. Movie(title, y e a r , l e n g t h , i n c o l o r , studioName, producercit) StarsIn(movieTitle, movieyear, starName) f) An alteration t o your Ships relation from (b) to include the attribute MovieExec(name, a d d r e s s , c e r t # , networth) yard giving the shipyard rvhere the ship was built. Studio(name, a d d r e s s , presC#) E x e r c i s e 6.6.4 : Explain the difference between the statement DROP R and the Exercise 6.6.2: Below we repeat once again the informal database scllc.nl;i statement DELETE F O R. RM from Exercise 5.2.1. E x e r c i s e 6.6.5 : Suppose that the relation S t a r s I n discussed in Exanlple 6.44 required 100 bloclcs rather than 10, but all other assu~llptions that exanlple of Product (maker, model, t y p e ) continued to hold. Give formulas in terms of pl and p.2 to measure the cost of PC(mode1, speed, ram, hd, r d , p r i c e ) Laptop(mode1, speed, ram, hd, s c r e e n , p r i c e ) queries Q1 and Q1 and illsertioll I. under the four combinations of index/no in- d e s discussed there. Printer(mode1, c o l o r , t y p e , p r i c e ) \\rite the following declarations: 6.7 View Definitions a) A suitable schema for relation Product. Relations that are defined with a CREATE TABLE statement actually esist in the database. That is. an SQL systeln stores tables in some physical organization. 11) -4 suitable schema for relation PC. Thev are r)ersistent. in the sense that thev can be expected to esist indefinitely * c) -4 suitable schenla for relation Laptop. and not t o change unless they ale explicitly told t o change by an INSERT or one of the other modification statements 1 - discussed in Section 6.5. 1e d ) -4 suitable schema for relation P r i n t e r . There is another class of SQL relationsl called views: that d o not esist physically. Rather, they are defined by an expression much like a query. Vie~t-s~ e, A n to your P r i n t e r schema from (d) to delete the attribute in turn, can be queried as if they existed physically, and in some cases, lve can color. el-en ~nodifyv i e w . Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
302 CHAPTER 6. THE DATABASE L A N G U U 4 GSQL, ~ 303 6.7.1 Declaring Views The simplest form of view definition is Relations, Tables, and Views 1. The keywords CREATE VIEW, SQL programmers tend t o use the term "table" instead of "relation." The reason is that it is important to make a distinction between stored rela- 2. The name of the view, tions, which are "tables," and virtual relations, which are "views." Now 3. The keyword AS, and that we know the distinction between a table and a view, we shall use "re- lationv only where either a table or view could be used. When we want t o 4. A query Q. This query is the definition of the view. Any tirne we query emphasize that a relation is stored, rather than a view, we shall sometimes the view, SQL behaves as if Q were executed a t that time and the cluer!. use t h e term "base relation" or '.base table." were applied to the relation produced by Q. There is also a third kind of relation, one that is neither a view nor stored permanently. These relations are temporary results, as might be That is, a simple view declaration has the form constructed for some subquery. Temporaries will also be referred t o as "relations" subsequently. CREATE VIEW A ; S E x a m p l e 6.45: Suppose we want to have a view that is a part of the M o v i e ( t i t l e , y e a r , l e n g t h , i n c o l o r , studioName, producerC#) The definition of the view ParamountMovieis used t o turn the query above into a new query that addresses only the base table Movie. We shall illustrate how relation, specifically, the titles and years of the movies made by Paramoullt to convert queries on views to queries on base tables in Section 6.7.5. Hon-erer, Studios. We can define this view by in this simple case it is riot hard to deduce what the example query about the view means. We observe that ParamountMovie differs from Movie in only t ~ v o 1) CREATE VIEW ParamountMovie A S 2) SELECT t i t l e , y e a r 3) FROMMovie 1. Only attributes t i t l e and year are produced by ParamountMovie. 4) W E E studioName = 'Paramount' ; HR 2. The condition studioName = 'Paramount' is part of any W E E clause HR First, the name of the view is ParamountMovie, as we see from line (1).Tlir about ParamountMovie. attributes of the view are those listed in line (2), namely t i t l e and year. T!ic definition of the view is the query of lines (2) through (4). Since our query xvants only the t i t l e produced, (1)does not, present a problem. For (2): we need only t o introduce the condit,ion studioName = 'Paramount' into the W E E clause of our query. Then, we can use Movie in place of HR 6.7.2 Querying Views ParamountMovie in the F O clause. assured that the meaning of our query RM Relation ParamountMovie does not contain tuples in the usual sense. Rathcr. if is preserved. Thus, the query: lve query ParamountMovie, the appropriate tuples are obtained from the hiis(? table Hovie, so the query can be answered. As a result, we can ask the s;l;li{' SELECT t i t l e query about ParamountMovie twice and get different answcrs. The reas011 i , ~ F O Movie RM that. even though we have not changed the definition of view ParamountMovie. W E E studioName = 'Paramount' AND year = 1979; HR the base table Movie may have changed in the interim. is a query about the base table Movie that has the same effect as our origi~lal Example 6-46 : 11-e may query the view ParamountMovie just as if it \,-ere a quer>- about the vielv ParamountMovie. Sote that it is the job of the SQL stored table, for instance: system t o do this translation. We show the reasoning process only t o indicate what a query about a view means. SELECT t i t l e F O ParamountMovie RM E x a m p l e 6.47: It is also possible to write queries inrrolving both views and W E E year = 1979; HR base tables. An example is Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
304 CHAPTER 6. T H E DATABASE LAhTGUAGESQL 6.7, V I E W DEFINITIONS' SELECT DISTINCT starName CREATE VIEW MovieProd(movieTitle, ~ r o d ~ a m eAS ) FROM ParamountMovie, StarsIn SELECT title, name WHERE title = movieTitle AND year = movieyear; FROM Movie, MovieExec WHERE producerC# = cert#; This query asks for the name of all stars of movies made by Paramount. Sote that the use of DISTINCT assures that stars will be listed only once, even if they The view is the same, but its columns are headed by attributes movieTitle appeared in several Paramount movies. mid prodName instead of title and name. Example 6.48: Let us consider a more complicated query used to define a view. Our goal is a relation MovieProd with movie titles and the names of their 6.7.4 Modifying Views producers. The query defining the view involves both relation In limited circumstances it is possible t o execute an insertion, deletion, or up- date to a view. At first, this idea makes no sense a t all, since the view does not Movie(title, year, length, incolor, studioName, producerC#) exist the way a base table (stored relation) does. What could it mean, say, t o insert a new tuple into a view? Where would the tuple go, and how would the from which we get a producer's certificate number, and the relation database system remember that it !&-assupposed to be in the view? MovieExec(name, address, cert#, networth) For many views, the anslrer is simply "you can't do that." However: for sufficiently simple views, called updatable views, it is possible to translate the where we connect the certificate t o the name. We may mite: modification of the view into an equivalent modification on a base table: and the modification can be done to the base table instead. SQL provides a for- CREATE VIEW Movieprod AS ma1 definition of when modifications to a view are permitted. The SQL rules SELECT title, name are complex, but r o u g h l ~ they permit modifications on views that are defined FROM Movie, MovieExec by selecting (using SELECT, not SELECT DISTINCT) some attributes from one WHERE producerC# = cert#; relation R (which may itself be an updatable view). TITOimportant technical \Ve can query this view as if it were a stored relation. For instance, to find the producer of Gone With the Wind, ask: R The WHERE clause must not i~irolve in a subquery. SELECT name The list in the SELECT clause must include enough attributes that for every FROM Movieprod tuple inserted into the \rie\v: \ve can fill the other attributes out with NULL WHERE title = 'Gone With the Wind'; values or the proper default and have a tuple of the base relation that will yield the inserted tuple of the view. AS with any view, this query is treated as if it were an equivalent query ovcr the base tables alone, such as: E x a m p l e 6.49 : Suppose we try to insert into view ParamountMovie of Exam- ple G.43 a tuple like: SELECT name FROM Movie, MovieExec INSERT INTO ParamountMovie WHERE producerC# = cert#.AND title = 'Gone With the Wind'; VALUES('Star Trek', 1979) ; \.ie~vParamountMovie ahnost meets the SQL uptlatability conditions, since the view asks only for sorne components of some tuples of one base table: 6-73 Renaming Attributes Movie(title, year, length, incolor, studioName, ~roducerc#) Solnetinles, we might prefer t o give a viexv's attributes names of our own choos- ing, rather than use the names that come out of the query defining the view. The only problem is that since attribute studioName of Movie is not an at- may specify the attributes of the view by listing them, surrounded by paren- tribute of the view, the tuple we insert into Movie ~vouldhave NULL rather theses, after the name of the view in the CREATE VIEW statement. For instance. than 'Paramount as its value for studioName. That tuple docs not meet the we could rewrite the view defi1lition of Elample 6.48 as: condition that its studio be Paramount. Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
306 CHAPTER 6. THE DATABASE LANGUAGE SQL ~ h u s t,o make the view ParamountMovie updatable, we shall add attribute studioName to its SELECT clause, even though it is obvious to us that the studio Why Some Views Are ~ o Updatable t name will be Paramount. The revised definition of view ParamountMovie is: Consider the view MovieProd of Example 6.48, which relates movie titles CREATE VIEW ParamountMovie AS and producers' names. This view is not updatable according to the SQL SELECT studiolame, t i t l e , y e a r definition, because there are two relations in the F O clause: Movie and RM F O Movie RM MovieExec. Suppose ~ v e tried to insert a tuple like W E E studioName = 'Paramount'; HR ( ' G r e a t e s t Show on E a r t h ' , ' C e c i l B. DeMille') Then, we write the insertion into updatable view ParamountMovie as: We would have to insert tuples into both Movie and MovieExec. \ire could use the default value for attributes like l e n g t h or address, but INSERT INTO ParamountMovie what could bc done for the two equated attributes producerC# and c e r t # VALUES('Paramount', ' S t a r T r e k ' , 1979); that both represent the unknown certificate number of Dehlille? We could use NULL for both of these. However, when joining relations with NULL'S, To effect the insertion, we invent a Movie tuple that yields the inserted view SQL does not recognize two NULL values as equal (see Section 6.1.5). tuple when the view definition is applied t o Movie. For the particular insertion Thus. ' G r e a t e s t Show on E a r t h ' would not be connected with ' C e c i l above, the studioName component is 'Paramount', the t i t l e component is B. DeMille' in the MovieProd view, and our insertion would not have ' S t a r Trek', and the year component is 1979. been done correctly. T h e other three attributes that do not appear in the view - length. i n c o l o r , and producerC# - surely exist in the inserted Movie tuple. Ho\vevcr. we cannot deduce their values. As a result, the new Movie tuple must have in the components for each of these three attributes the appropriate default value: is the resulting delete statement. . either NULL or some other default that was declared for an attribute. For ex- ample. if thc default value 0 was declared for attribute l e n g t h , but the other Similarly. an update on an updatable view is passed through to the under- t11-o use NULL for thc default, then the resulting inserted Movie tuple would he: lying relation. The view update thus has the effect of updating all tuples of the underlying relation that give rise in the view t o updated view tuples. title I year I length I inColor I studioName I producerC# ' S t a r Trek' 1 1979 1 0 I NULL I 'Paramount' I NULL Example 6.51 : The view update UPDATE ParamountMovie SET year = 1979 \Ye nlay also delete from an updatable view. The deletion, like the insertion. is passed through to the underlying relation R and causes the deletion of ever! W E E t i t l e = ' S t a r Trek t h e MovieJ; HR tuplc of R that gives rise to a deleted tuple of the ricw. is turned into the base-table update Example 6.50: Suppose we wish to delete from the updatable Paramount- Movie view all movies with "Trek" in their titles. L\'e may issue the deletion UPDATE Movie statement SET year = 1979 W E E t i t l e = ' S t a r Trek t h e Movie' AND HR DELETE F O ParamountMovie RM studioName = 'Paramount'; W E E t i t l e LIKE '%Trek%'; HR This deletion is translated into an equivalent deletion on the Movie base table: the 0111~difference is that the condition defining the view ParamountMovie is -4final liind of modification of a vie\\- is to delete it altogether. This mod- added to the conditions of the W E E clause. HR ification ma!- be done whether or not the view is updatable. -4typical DROP statement is DELETE F O Movie RM W E E t i t l e LIKE '%Trek%' AND studioName = 'Paramount'; HR DROP VIEW ParamountMovie; Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
308 CHAPTER 6. T H E DATABASE LANGUAGE SQL 6.7. VIEW DEFINITIONS title, yeor Note that this statement deletes t,he definition of the view, so we may no longer make queries or issue modification commands involving this view. However dropping the view does not affect any tuples of the underlying relation Movie. In contrast, ' I ~nrdioName = ' Paramount ' DROP TABLE Movie would not only make the Movie table go away. It would also make the view Movie ParamountMovie unusable, since a query that used it would indirectly refer to the nonexistent relation Movie. Figure 6.19: Expression tree for view ParamountMovie 6.7.5 Interpreting Queries Involving Views SELECT t i t l e We can get a good idea of what view queries mean by following the way a query F O ParamountMovie RM involving a view would be processed. The matter is taken up in more generality W E E year = 1979; HR in Section 16.2, when nre examine query processing in general. asking for the Paramount movies made in 1979. This query has the expression The basic idea is illustrated in Fig. 6.18. A query Q is there represented tree shown in Fig. 6.20. Sote that the one leaf of this tree represents the view by its expression tree in relational algebra. This expression tree uses as leaves ParamountMovie. some relations that are views. We have suggested two such leaves, the view V and W. To interpret Q in terms of base tables, we find the definition of the views V and W. These definitions are also expressed as expression trees of relational algebra. Figure 6.20: Expression tree for the query Figure 6.18: Substituting view definit,ions for view references \re therefore interpret the query by substituting the tree of Fig. 6.19 for the leaf ParamountMovie in Fig. 6.20. The resulting tree is shown in Fig. 6.21. To form the query over base tables, we substitute, for each leaf in the tree The tree of Fig. 6.21 is an acceptable interpretation of the query. However, for Q that is a view, the root of a copy of the tree that defines that view. it is expressed in an unnecessarily complex way. .An SQL system would apply Thus. in Fig. 6.18 we have shown the leaves labeled V and 1.V replaced by the transformations to this tree in order to make it look like the expression tree for definitions of these views. The resulting tree is a query over base tables that i q the query ~ v esuggested in Example 6.46: equiralerit to the original query about views. SELECT t i t l e Example 6.52 : Let us consider the view definition and qurry of Example 6.46. F O Movie RM Recall thc definition of view ParamountMovie is: W E E studioName = 'Paramount' AND year = 1979; HR CREATE VIEW ParamountMovie AS SELECT t i t l e , year For example, ne can move the projection xtitles year above the selection F O Movie RM Uyear=lo;e The reason is that delaying a projection until after a selection can W E E studioName = 'Paramount'; HR never change the result of an expression. Then, we have two projections in a row, first onto t i t l e and year and then onto t i t l e alone. Clearly the first of -in expression tree for the query that defines this view is shown in Fig. 6.19. these is redundant, and we can eliminate it. Thus: the two projections can be The query of Example 6.46 is replaced by a single projection onto t i t l e . Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
6.7. VIEW DEFINITIONS Moviestar (name, address, gender, b i r t h d a t e ) MovieExec(name, address, c e r t # , networth) Studio(name, address, presC#) Construct the following views: * a) A view RichExec giving the name, address, certificate number and net worth of all executives with a net worth of a t least $10,000,000. b) A view StudioPres giving the name, address, and certificate number of all executives who are studio presidents. c) A view Executivestar giving the name, address, gender, birth date, cer- tificate number, and net worth of all individuals who are both executives and stars. Exercise 6.7.2 : Which of the views of Exercise 6.7.1 are updatable? Figure 6.21: Exercise 6.7.3: Write each of the queries below, using one or more of the views from Exercise 6.7.1 and no base tables. The two selections can also be combined. In general, two consecutive se- lections can be replaced by one selection for the AND of their conditions. The a) Find the names of females who are both stars and executives. resulting expression tree is shown in Fig. 6.22. It is the tree that we would * b) Find the names of those executives who are both studio presidents and obtain from the query worth at least $10,000,000. SELECT t i t l e ! c) Find the names of studio presidents who are also stars and are worth at F O Movie RM least $50,000,000. W E E studioName = 'Paramount' AND year = 1979; HR *! Exercise 6.7.4 : For the view and query of Example 6.48: directly. 0 I a) Show the expression tree for the view Movieprod. I b) Show the expression tree for the query of that example. I c) Build from your answers to (a) and (b) an expression for the query in terms of base tables. (T year = 1979 AND smdioName = ' Paramount ' I Movie I d) Explain how to change your expression from (c) so it is an equivalent expression that matches the suggested solution in Example 6.48. ! Exercise 6.7.5 : For each of the queries of Exercise 6.7.3, express the query and views as relational-algebraexpressions, substitute for the uses of the view in the Figure 6.22: Simplifying the query over base tables query expression, and simplify the resulting expressions as best you can. Write SQL queries corresponding to your resulting expressions on the base tables. Exercise 6.7.6 : Using the base tables 6.7.6 Exercises for Section 6.7 Classes(class, type, country, numGuns, bore, displacement) Exercise 6.7.1 : From the following base tables of our running example Ships (name, c l a s s , launched) Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
312 CHAPTER 6. THE DATABASE LAfVG U.4GE SQL 6.9. REFERENCES FOR CHAPTER 6 313 from Exercise 5.2.4: + Outerjoins: SQL provides an OUTER JOIN operator that joins relations but also includes in the result dangling tuples from one or both relations; a) Define a view BritishShips that gives for each ship of Great Britain its the dangling tuples are padded with NULL'S in the resulting relation. class, type, number of guns, bore, displacement, and year launched. + The Bag Model of Relations: SQL actually regards relations as bags of b) Write a query using your view from (a) asking for the number of guns and tuples, not sets of tuples. We can force elimination of duplicate tuples displacements of all British battleships launched before 1919. with the keyword DISTINCT, while keyword ALL alloxvs the result to be a bag in certain circumstances where bags are not the default. ! c) Express the query of (b) and view of (a) as relational-algebra exprt%sions, substitute for the uses of the view in the query expression, and simplify + Aggregations: The values appearing in one column of a relation can be the resulting expressions as best you can. summarized (aggregated) by using one of the keywords SUM, AVG (average value), MIN, MAX, or COUNT. Tuples can be partitioned prior to aggregation ! d) Write an SQL query corresponding to your expression from (c) on the with the keywords GROUP BY. Certain groups can be eliminated with a base tables Classes and Ships. clause introduced by the keyword HAVING. + Modification Statements: SQL allo~vsus to change the tuples in a relation. 6.8 Summary of Chapter 6 We may INSERT (add new tuples), DELETE (remove tuples)? or UPDATE (change some of the existing tuples); by writing SQL statements using + SQL: The language SQL is the principal query language for relational one of these three keywords. database systems. The current standard is called SQL-99 or SQL3. Com- mercial systems generally wry from this standard. + Data Definition: SQL has statements to declare elements of a database schema. The CREATE TABLE statement allows us to declare the schema for + Select-From-Where Queries: The most common form of SQL query has stored relations (called tables), specifying the attributes and their types, the form select-from-where. It allows us to take the product of several and default values. relations (the FROM clause), apply a condition to the tuples of the rcsult (t,heWHERE clause), and produce desired components (the SELECT rlausc). + Altering Schemas: TVe can change aspects of the database schema with an ALTER statement. These changes include adding and removing attributes + Subqueries: Select-from-where queries can also be used as subqucric+ from relation schemas and changing the default value associated with an within a WHERE clause or FROM clause of another query. The operator> attribute or domain. T e may also use a DROP statement to completely V EXISTS, IN, ALL, and ANY may be used to express boolean-valued con- eliminate relations or other schema element,^. ditions about the relations that are the result of a subquery in a WHERE clause. + Indexes: While not part of the SQL standard, comn~ericalSQL systems allow the declaration of indexes on attributes; these indexes speed up + Set Operations on Relations: We can take the union, intersection, or certain queries or modifications that involve specification of a value for difference of relations by connecting the relations, or connecting queries the indexed attribute. defining the relations, with the keywords UNION, INTERSECT,and EXCEPT. respectively. + Views: -1view is a definition of how one relation (the view) nlay be constructed from tables stored in the database. T'iews may be queried as 4 Join Expressions: SQL has operators such as NATURAL JOIN that may be if they were stored relations, and an SQL svstem modifies queries about a applied to relations, either as queries by themselves or to define relation. view so the query is instead about the base tables that are used to define in a FROM clause. the view. + l h l l Values: SQL provides a special value NULL that appears in compo- nents of tuples for which no concrete value is available. The arithmetic 6.9 References for Chapter 6 and logic of NULL is unusual. Comparison of any value to NULL, even another NULL, gives the truth value UNKNOWN. That truth value, in turn. The SQL2 and SQL-99 standards are published on-line via anonymous FTP. behaves in boolean-valued expressions as if it were halfway between TRUE The primary site is f tp: //j erry .ece .umassd. edu/isowg3, with mirror sites and FALSE. at ftp: //math0 .rnath.ecu.edu/isowg3 and ftp: //tiu. ac. jp/iso/wg3. In Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
314 each case the subdirectory is dbl/BASEdocs. As of the time of the printing of this book, not all sites were accepting F T P requests. ?fie shall endeavour to keep the reader up t o date on the situation through this book's iVeb site (set the Preface). Several books are available that give more details of SQL programming. Some of our favorites are [2], [4], and [6]. [5] is an early exposition of the recent SQL-99 standard. SQL was first defined in [3]. I t was implemented as part of System R [I], one of the first generation of relational database prototypes. 1. Astrahan, 14.h,1. et a]., "System R: a relational approach to data manage- ment," ACM Trcmsactions on Database Systems 1:2, pp. 97-137, 1976. 2. Celko, J., SQL for Smarties, Morgan-Icaufmann, San Francisco, 1999. Constraints and Triggers 3. Chamberlin, D. D., e t a]., "SEQUEL 2: a unified approach to data defi- nition, manipulation, and control," IBhl Journal of Research and Devel- opment 20:6, pp. 560-575, 1976. In this chapter we shall cover those aspects of SQL that let us create "active" 3. Date, C. J. and H. Darwen, A Guide to the SQL Standard, .4dtlisc,ll- elements. An active element is an expression or statement that we write once, Wesley, Reading, SIA, 1997. store in the database, and expect t h e element to execute a t appropriate times. The time of action might be when a certain event occurs, such as a n insertion 3. Gulutzan, P. and T. Pelzer, SQL-99 Complete, Really, R&D Books, La\\-- into a particular relation, or it might be whenever the database changes s o that rence, I