MORE ON POFN AND POOD
with Fabian Pascal and Hugh Darwen

 

 

 

From: Michael Osuna

Date: 28 Feb 2006

 

In response to Hugh Darwen's second response to Brian Selzer in On POFN and POOD [...], I believe that all the confusion is merely do to the way you denote relations.  When you write

 

ENROLLMENT (StudentId,Name,CourseId )

 

you actually imply a whole slew of constraints. If we choose an alternative notation we can make these explicit ... A domain is a set of values.  A function is a mapping from one set of values to another ...

 

 

From: Fabian Pascal

To: Michael Osuna

Cc: Hugh Darwen

 

I will let Hugh respond. I will just note that work is being done along these lines. Stay tuned to dbdebunk.

 

 

From: Hugh Darwen

To: Michael Osuna

 

I agree ... Indeed, you are echoing one of the discomforts (with the way normalization theory is presented) I have had myself for many years; and I thought I had at least tacitly expressed this discomfort in my reply to Brian Selzer.

 

I'll stay with my example, but excuse me for reverting to my (UK) spelling of it (what right did you have to change it? he asks indignantly! :-):

 

ENROLMENT { StudentId, Name, CourseId }

 

A decomposition into CALLED { StudentId, Name } and IS_ENROLLED_ON { StudentId, CourseId} requires some constraints to be expressed to maintain logical equivalence with ENROLMENT, exactly as Michael Osuna details.

 

But the teaching of normalization has always included teaching the problem referred to as "update anomaly", and how normalization addresses such anomalies.  An update anomaly pertaining to ENROLMENT is that "deleting the last of the courses on which a particular student is enrolled deletes all record of that student's name".  Another (really the same one) is that "we can't record a student's name until that student enrolls on some course".  Given those "anomalies", I always wanted to ask, screamingly, "In that case, why did we propose ENROLMENT in the first place?  If a student can be named without being enrolled, then ENROLMENT is not a true reflection of that part of our enterprise that we wish our database to record information about; it expresses constraints that are not constraints out there in the real world!"

 

That said, if it really is the case that we want to avoid recording names of students who are not enrolled, then we will very likely still want to decompose, to address the other problem that normalization addresses: redundancy.

 

Hence my preferred approach to (relational) database design: start with 6NF (guaranteed to avoid the kind of redundancy the NFs address, also guaranteed to avoid "update anomalies", also very easy to understand) and then consider opportunities for expressing certain constraints conveniently by denormalizing! 

 

E.g., if every student that has a student id also has a name, and every student that has a student id and a birth date also has a name, and every student that has a student id and a name also has a birth date, then we can save ourselves a lot of writing—and probably the system a lot of work—by joining the three 6NF relvars together into one.

 

But the old advice would remain, in general: in so denormalizing, do not go so far as to violate 5NF.

 

 

From: Fabian Pascal

To: Hugh Darwen

 

Well, if you read my writings on normalization, you will see that I keep repeating that correct database design that accurately reflects the conceptual model will yield fully normalized (5NF) databases. Normalization as such is pertinent only as a repair of poorly designed databases.

 

And also (to reiterate): update anomalies and redundancy are not the only drawbacks of databases that are not fully normalized (that is, poorly designed). I counted several others: database application bias, harder to understand databases, more complex queries, and harder to interpret results.

 

 

From: Michael Osuna

To: Fabian Pascal

 

I disagree with the statement: "correct database design that accurately reflects the conceptual model will yield fully normalized (5NF) databases".  If you take "correct" and "accurately" to mean that the database design is logically equivalent to the conceptual  model.

 

I think a counter example to your claim would be that this:

 

IS_ENROLLED ( StudentId,CourseId )

IS_NAMED ( StudentId,Name )

 

is logically equivalent to this:

 

IS_ENROLLED ( StudentId,Name,CourseId )

IS_NAMED ( StudentId,Name )

 

with the additional constraint that IS_ENROLLED's Name and IS_NAMED's Name are equal when joined.  Of course the latter example is almost certainly "poor" design intuitively, and is definitely "poor" design if being in 5NF is the definition of "good" design.  But it is correct and accurate.

 

A relation is essentially shorthand for specifying a number constraints (and other things).  In fact, I argue that the normalization of relations into simpler relations is logically equivalent to explicitly specifying the constraints that being a relation implies.

 

In summary, I argue that the Normal Forms are a formal way to say this design is "good" and that design is "poor", but that it does not necessarily follow that a "correct" and "accurate" design will be a "good" design. Further, I argue that other formalizations of "good" can be made, and they could include models that measure the logical cost of certain operations on the database and seek to minimize them.  In some of these models my "poor" design may be the "good" one.

 

 

From: Fabian Pascal

To: Michael Osuna

 

Well, I could not in an email expand on what I meant by "correct" and "accurate". It is explained, however, in my papers, particularly #2, which is now being updated.

 

Ed. Note: I agree with most of the message, but I do not know what the last paragraph means. An explanation is not invited, though.

 

 

From: Hugh Darwen

To: Fabian Pascal

 

Yes, I agree with all those too, Fabian, of course.   But I don't agree that 5NF is "fully normalized".  As a matter of fact, I'm not even sure that 6NF is!

 

Consider

 

S {S#,City,Status,Name }

 

In 6NF this becomes

 

SC { S#,City }

SS { S#,Status }

SN { S#,Name }

 

But what if a supplier can be registered—i.e., allocated a number—who (as yet) has no name, no status, and no city?  (Silly example, but one can think of better ones to illustrate the point.)

 

I know what a proper n-ary relation represents in the real world. I do not know what SC, SS and SN do. I know what you will say they represent, but it is a convincing and practical way of representing the real world.

 

What do you mean by "proper n-ary"?  What's an improper one?  I hope you don't mean that n must be >2 to be proper!

 

More importantly, what do you propose in lieu of my 6NF design in the case where those various properties of suppliers really can exist independently of each other over time?  And why isn't the 6NF design practical under those circumstances?

 

 

From: Fabian Pascal

To: Hugh Darwen

 

Sorry, I misspoke. I prefer R-table to relation (explained in my paper #1, Truly Relational: What It Really Means), and I meant to say a proper R-table—a table that obeys the relational discipline.

 

Perhaps it would be best to wait for the update of my paper #2, Conceptual Modeling and Database Design that I am currently working on to see what we propose. I will just say that (a) a collection of properties defines a class of possible entities, subsets of which occur at various times (b) entities are collections of values of those properties which are not independent. Within this framework I do not know what "properties of suppliers existing independently of each other" means.

 

[If the example involves entities with unknown property values] the solution is described in my paper #4, The Final NULL in the Coffin. [If your example involves entities with variable collections of properties at different times] the solution is proper entity super/sub types [a paper on that may be forthcoming].

 

Ed. Note: the SC/SS/SN design violates the POOD by splitting an entity into three R-tables and duplicating key values.

 

 

From: Hugh Darwen

To: Fabian Pascal

 

No, I don't talk about properties having UNKNOWN values..

 

I consider "atomic" predicates, such as:

 

"Supplier S# is registered."

"Supplier S# is named SNAME"

"Supplier S# is located in CITY"

 

"Student SID is enrolled on course CID"

"Staff member TID is assigned to student SID on course CID".

 

A predicate is atomic if its corresponding relation is irreducible.  A 6NF relation is irreducible.

 

Irreducible relvars can conveniently be combined (via join) only if the constraints we have been talking about hold true.  Note "conveniently".  They don't have to be combined if proper relational technology is available (it isn't).

 

Of course we can have SNC { S#, Sname, City } even if not every registered supplier has both a name and a city, but then that relvar will be usable only for those suppliers who do have both.  We will have to have additional relvars for those who have a name but no city, those who have a city but no name, and those who have neither name nor city.  That does not strike me as being very convenient, for obvious reasons.

 

I don't see how these observations can possibly be controversial, even in an SQL environment where alternative approaches are available—alternatives that are antirelational and troublesome, of course.

 

As you know, I did not agree with the approach you outlined (but did not flesh out) in the paper you mention.

 

 

From: Fabian Pascal

To: Hugh Darwen

 

Well, you and Chris eschew entities. I sympathize, but I don't think they can be avoided. [After what exactly (a) are the subjects of propositions (b) what exactly do key values uniquely identify? (c) whose are the attributes?]

Your design violates the POOD, but you don't accept it. [Ed. Note: Irreducibility or not, POOD violations introduce certain redundancy: the existence of an employee is stated multiple times].

 

Not exactly. Proper RDBMS support of entity type hierarchies would handle that situation transparently, without inconvenient user intervention.

 

Since I don't agree with your approach either, we will have to agree to disagree and wait until the formal foundation for our approach (David’s actually) can be published.

 

 

Posted 4/21/06

© Fabian Pascal 2000-2006 All Rights Reserved