ON NORMALIZATION, PERFORMANCE AND INTEGRITY
with Fabian Pascal

 

 

 

From: MG

To: Editor

 

I'd like to comment on your other recent articles: on denormalization. Of course you prove that denormalization does not improve performance, because you pay for it by maintaining integrity. But, when people say that de-normalization improves performance, they usually mean just on one side. For example, I can merge DEPT and EMP tables into a third table DE and achieve a better query performance by replacing a join by a simple select from the new table. If this is the most frequent and most important operation in my application (vs. updates, inserts, deletes), then my overall performance will be improved (and that's what usually happens in DW). But if the opposite is true, then performance will suffer. I didn't see these considerations in your articles.

 

 

From: Fabian Pascal

To: MG

 

The article makes it clear that the reason you may get better performance is only because you ignore integrity, you trade it off for performance. Now, if you do it consciously, by accepting the risk it's one thing; but the problem is that most practitioners are not even aware that there's a risk, they all think that there is no price to be paid for denormalization.

 

Poor performance is due to physical implementation factors: physical design, DBMS, hardware, network loads, etc. Normalization by definition has nothing to do with it, it's impossible. The point is that users should stop blaming normalization and trying to avoid it and should demand better products and learn proper design. As long as they think the way you do, they will never get anything better.

 

This new technology I mentioned demonstrates why (a) it is wrong to blame normalization for performance [a correct implementation can have much better performance then current products, without sacrificing normalization and integrity] and [(b) if users continue to confuse the logical and physical levels] and accept denormalization as a valid solution, they will never get it deployed. I cannot say more about it yet due to legal reasons.

 

 

From: MG

MIME-Version: 1.0

Content-type: text/plain; charset=us-ascii

Status: U

X-UIDL: PUWyB9HkIfdPEQE

 

Many people, yes, but not nobody. I always considered the effects of denormalization. I know many people in this field that do the same; however, I do agree with you that many people, especially those "younger" ones learning from more "modern" books on database design, especially those in the OO field, are not aware, and what's worse, don't even want to be aware.

 

That's exactly how I always thought and when I had discussions with people, that's what I always said to them (not that it made a big difference in their thinking). However, when I read your articles on this topic, I had another thought. As you always say (and again, I fully agree with you on this), we must always separate logical and physical. I always considered denormalization as one of the things done at the physical level. So, denormalization shouldn't even be your concern, because it has nothing to do with the relational model. The rule I always follow is that whatever I do at the physical level, it should not destroy my logical model, which must stay normalized. If I denormalize to achieve some performance gains for a selected set of functions, then I do pay for it by writing additional logic to preserve the integrity and by creating views that represent the entities on my logical model, which I had to "destroy". So as long as I separate these two levels, I don't think I'm in any conflict with the relational model. Of course if DBMS gave me more options in physical design while protecting the integrity of my logical model, I wouldn't have to do this myself.

 

Theoretically, I think the way you do, and that's why I enjoy reading your columns. But I also have to deliver practical results to my users. Unfortunately, I can't go to my users and tell them that their response time is slow because of Oracle's technology. And I don't believe screaming at Oracle will do me any good either (and yes I know what you will say to this). So until that mysterious technology you mentioned many times is implemented, I have to do what I can.

 

 

From: Fabian Pascal

 

No, it is not [that important]. The important points are to realize that (a) it's physical implementation and not logical design that's the source of trouble; and (b) implementations forces you to trade integrity for performance. As it is, they think it's normalization that's the source of the problem and they are unaware of the cost or of the correct solution. Emphasizing the point you insist on would not achieve my objective of getting them to understand the real problem. I want people to realize that they are forced to trade integrity for performance and that the problem is products, not normalization or relational theory.

 

The vast majority doesn’t know and I have demonstrated this with dozens of articles where I quote both users and vendors (watch this space for a recent exchange that proves it yet again). Moreover, I have challenged in writings, lectures and seminars to specify the additional constraints for controlling redundancy and nobody has ever done so. They have no clue, because they don’t know relational. Were you able to come up with the join constraint in the article before you saw my explanation and constraint formulation?

 

Data independence has been a main objective of RM since day one, but that is another thing practitioners are ignorant of, because SQL products don't support it very well and induces them to confuse the two levels without realizing it.

 

You are correct in principle, however I would not use the term "denormalize at the physical level": Normalization is a purely logical concept and there is no such thing as physical normalization--another way to get people confused. Simply say that you must have complete freedom to choose any physical design and change it at will to maximize performance, without bastardizing your logical design and incur integrity risk. To the extent that you must do the latter, know that it's the product and other implementation details that are the culprits.

 

You cannot generalize from yourself. You are in a minuscule minority.

 

 

Posted 09/06/02

 

 

 

[ABOUT] [QUOTES] [LINKS]