I am not alone in
thinking that the treatment of view updating within the overall theory of
relational databases has always been rather unsatisfactory for one reason or
another. As Nat Goodman puts it:
"... there is no theory behind any of this. Each [rule] seems intuitively correct, but
there is no overall framework. It would
be better to have a general rule that states what a correct view update
algorithm has to do, and then derive the special rules for each case from that
general rule. Without this, the [rules]
feel like a crazy patchwork of exceptions and special notes. In some cases, two or more [rules] are
equally sensible ..." (and so on).
In the paper that follows, we (i.e., David McGoveran and
myself) present such a general rule. We
then derive in this and subsequent sections the special rules for unions,
intersections, differences, joins and the other relational operators from that
general rule.
It is perhaps of interest to say that these proposals grew
out of my attempts to come up with a view updating scheme that would treat the
semantically equivalent views V1 = A INTERSECT B and V2
= A MINUS (A MINUS B) equivalently--in particular, taking
into account the fact that the definition of V1 is symmetric in A
and B while the definition of V2 is not. The key observation (obvious, after the
fact!) turned out to be that at all times, every table T must satisfy the
table predicate that applies to T.
It follows (to spell it out in detail) that when an update is performed
on any table T:
Before the update, table T cannot include any
row(s) that cause the table predicate for T to be violated.
After the update, table T cannot include any
row(s) that cause the table predicate for T to be violated.
Thus, for example, if a row r is presented for
insertion into V = A MINUS B, then row r:
Ø
Must satisfy the predicate for A (for otherwise it
cannot appear in A and hence cannot appear in A MINUS B a fortiori);
Ø
Must not satisfy the predicate for B (for otherwise it
might already appear in B, in which case inserting it into A will cause it not
to appear in A MINUS B);
Ø
Must not already appear in A (for otherwise we will be
attempting to insert a duplicate row);
Ø
Cannot already appear in B (because it does not satisfy
the predicate for B).
Note: The
foregoing statements are slightly simplified; as the body of the paper
explains, it's not really rows that have to satisfy predicates, it's tables. Also, regarding the second of the bullet
items above, the alert reader might suggest that if r does already
appear in B, the desired effect (of inserting r into A
MINUS B) might be achieved by deleting it from B before
(if necessary) inserting it into A.
We reject this possibility for reasons explained in the body of the
paper (see in the section "Further Principles," especially Principle
No. 7).
Portions of this material previously appeared in somewhat
different form elsewhere. Also--to repeat from the "Comments on Republication"--I
am grateful to David McGoveran for allowing me to republish the material in the
present book.
Introduction
Historically, the question of view updatability has typically
been treated in a fairly ad hoc manner.
Certainly this is the case:
Ø
In the SQL standard
Ø
In today's commercial SQL products
Ø
Even--to some extent--in the more research-oriented
technical literature
It is not at all
unusual, for example, to find that a given DBMS will
Ø Prohibit
updates on a view that is logically updatable, or
Ø
Permit updates on a view that is logically not
updatable, or
Ø
Implement view updates in a logically incorrect way,
or--most likely in practice!--do all of the above, depending
on circumstances.
The lack of a
systematic approach to the problem is further underscored by the undue emphasis
that has historically been laid on restriction, projection, and join
views. Union, intersection, and
difference views, which are--or should be--at least as important in practice,
have received comparatively little attention.
The authors of this article have recently developed a
systematic and formal approach to the view updating problem. In this first part we present an informal
introduction (emphasis on "informal"!) to that formal approach. We also illustrate the approach as it
applies to union, intersection, and difference views specifically. The second
part will deal with joins and other kinds of views.
1. Integrity Constraints
Before we can get into the specifics of our approach, we need
to lay some groundwork. The first
point--and this one is absolutely crucial--is that every table has an
interpretation or meaning. In
order to explain this point, we must first digress for a moment to consider the
general issue of integrity constraints.
For the purposes of this discussion, it is convenient to classify such
constraints into four kinds, namely domain constraints, column constraints,
table constraints, and database constraints, as follows (this classification is
slightly different (but not dramatically so) from that previously given by
Date.
A domain constraint
is just the definition of the set of values that go to make up the domain in
question; in other words, it effectively just enumerates the values in the
domain (We mention domain constraints merely for completeness--they play no
part in the view updating scheme that is the primary focus of this paper).
A column constraint states that the values appearing in a
specific column must be drawn from some specific domain. For example, consider the employees base
table (see Fig. 1 for a sample tabulation).
EMP
{EMP#,ENAME,DEPT#,SAL}
The columns of that
table are subject to the following column constraints:
+----------------------------+
¦ EMP# ¦ ENAME ¦ DEPT# ¦ SAL ¦
+------+-------+-------+-----¦
¦ E1
¦ Lopez ¦ D1 ¦ 25K ¦
¦ E2 ¦ Cheng ¦ D1 ¦ 42K ¦
¦ E3
¦ Finzi ¦ D2 ¦ 30K ¦
¦ E4
¦ Saito ¦ D2 ¦ 45K ¦
+----------------------------+
Fig. 1: Base table EMP (sample
values)
e.EMP# IN EMP#_DOM
e.ENAME IN NAME_DOM
e.DEPT# IN DEPT#_DOM
e.SAL IN US_CURRENCY_DOM
Here e represents an arbitrary row of the table and
EMP#_DOM, NAME_DOM, etc., are the names of the relevant domains (throughout
this paper we use a modified form of conventional SQL syntax, for reasons of
simplicity and explicitness.)
A table constraint states that a specific table must satisfy
some specific condition, where the condition in question refers solely
to the table under consideration--i.e., it does not refer to any other table,
nor to any domain. For example, here
are two table constraints for the base table EMP:
1. IF e.DEPT# = 'D1' THEN e.SAL < 44K
2. IF e.EMP# = f.EMP# THEN e.ENAME =
f.ENAME
AND e.DEPT# = f.DEPT#
AND e.SAL = f.SAL
The first of these says that employees in department D1 must
have a salary less than 44K. The second
says that if two rows e and f have the same EMP# value, then they
must also have the same ENAME value, the same DEPT# value, and the same SAL
value--in other words, they must be the same row (this is just a longwinded way
of saying that EMP# is a candidate key).
Note: Observe that
we talk of table constraints in general, not just base table constraints. The point is, all tables, base or
otherwise, are subject to table constraints, as we will see later.
For the purposes of this paper we define a database to be
some user-defined--or DBA-defined--collection of named tables (base tables
and/or views). A database constraint,
then, states that the database in question must satisfy some specific
condition, where the condition in question can refer to as many named tables as
desired. For example, suppose the
database containing base table EMP were extended to include a departments base
table DEPT. Then the referential
constraint from EMP to DEPT would be a database constraint (referring, as it
happens, to exactly two base tables).
Here is another example of a database constraint (also
referring to the same two base tables):
d.BUDGET > 2 * SUM (e WHERE e.DEPT# = d.DEPT#,SAL)
Here d and e represent an arbitrary DEPT row
and an arbitrary EMP row, respectively.
The constraint says that every department has a budget that is at least
twice the sum of all salaries for employees in that department.
Note: Unlike column constraints and base table constraints, which can
always be checked immediately (i.e., after each individual update
operation), database constraints must--at least conceptually--be deferred
(i.e., checked at end-of-transaction).
In practice there will be many cases where database constraints too can
be checked immediately, but such "early" checking should be regarded
as nothing more than an optimization. [Ed. Note:
Chris Date has changed his mind and now believes that all
checking should be immediate. See an allusion to this in the series on Constraints
and Predicates Parts 1, 2, and 3.
(The authors would
like to thank Nagraj Alur, Hugh Darwen, Fabian Pascal, and Paul Winsberg for
their helpful comments on earlier drafts of this article.)
Comments On Republication: Originally published in Database
Programming & Design 7, No. 6 (June 1994) and published as a two-part
article in RELATIONAL
DATABASE WRITINGS1991-94. It is republished here by permission of David
McGoveran, Miller Freeman Inc. and Pearson
Education, Inc. © All rights reserved by C.J. Date. Research has shown
that certain detail level corrections might be needed, which we may undertake
in the future. However, we still believe strongly that the overall approach is
sound.
(Continued in Part 2)
Posted
11/15/02
[ABOUT]
[QUOTES]
[LINKS]