Slide 1 - PDF 免费下载

Chapter 15 Outline Informal Design Guidelines for Relation Schemas Functional Dependencies Normal Forms Based on Primary Keys General Definitions of Second and Third Normal Forms Boyce-Codd Normal Form

Chapter 15 Outline (cont d.) Multivalued Dependency and Fourth Normal Form Join Dependencies and Fifth Normal Form

Introduction Levels at which we can discuss goodness of relation schemas Logical (or conceptual) level (how users interpret the relation schemas and the meaning of their attributes.) Implementation (or physical storage) level (how the tuples in a base relation( which will be physically stored as file ) are stored and updated) a base relation ) Approaches to database design: Bottom-up or top-down

Informal Design Guidelines for Relation Schemas Measures of quality Making sure attribute semantics are clear Reducing redundant information in tuples Reducing NULL values in tuples Disallowing possibility of generating spurious tuples these measures are not always independent of one another( 相互独立 ),as we shall see.

Imparting Clear Semantics to Attributes in Relations Semantics of a relation Meaning resulting from interpretation of attribute values in a tuple Easier to explain semantics of relation Indicates better schema design IN general, the easier it is to explain the semantics of the relation,the better the relation schema design will be. To illustrate this consider the following fig 15.1

Continues

Guideline 1 Design a relation schema so that it is easy to explain its meaning Do not combine attributes from multiple entity types and relationship types into a single relation Example of violating Guideline 1: Figure 15.3

Guideline 1 (cont d.)

Redundant Information in Tuples and Update Anomalies Grouping attributes into relation schemas Significant effect on storage space Storing natural joins of base relations leads to update anomalies Types of update anomalies: Insertion Deletion Modification Compare fig 15.2 with fig 15.4

Redundant Information in Tuples and Update Anomalies Insertion anomalies can be diffenrentiated into two types : to insert a new employee tuple into emp_dept,we must include either the attribute values for the department that the employee works for,or null, (consistent with values for department 5 in other tuples It is difficult to insert a new department that has no employees as yet in the emp_dept relation. Deletion anomalies. If we delete from emp_dept an employee tuple that happens to represent the last employee working for a particular department,the information concerning that department is lost from the database.(which is related to the second insertion anomally situation discussed earlier Modification anomalies :in emp_dept,if we change the value of one of the attributes of a particular department-say, the manage of department 5 we must update the tuples of all employees who work in that department ;otherwise,the database will become inconsistent.

Guideline 2 Design base relation schemas so that no insertion, deletion or modification(update (anomalies are present in the relations If any anomalies are present: Note them clearly Make sure that the programs that update the database will operate correctly Note :It is important to note that these guidelines may sometimes have to be violated in order to improve the performance of certain queries.(an important query concering the department of an employee along with employee attributes,the emp_dept schema may be used as a base relation)

NULL Values in Tuples In some design, we may group many attributes together into a fat relation Can end up with many NULLs Problems with NULLs Wasted storage space Problems with understanding meaning

Guideline 3 Avoid placing attributes in a base relation whose values may frequently be NULL() If NULLs are unavoidable: Make sure that they apply in exceptional cases only, not to a majority of tuples

Generation of Spurious Tuples Figure 15.5(a) Relation schemas EMP_LOCS and EMP_PROJ1(can be used instead of the single emp_proj(fig15.3b) NATURAL JOIN Result produces many more tuples than the original set of tuples in EMP_PROJ(fig15.6) Called spurious tuples Represent spurious information that is not valid

Guideline 4 Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys in a way that Guarantees that no spurious tuples are generated Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations

Summary and Discussion of Design Guidelines Anomalies cause redundant work to be done Waste of storage space due to NULLs Difficulty of performing operations and joins due to NULL values Generation of invalid and spurious data during joins

Functional Dependencies Formal tool for analysis of relational schemas Enables us to detect and describe some of the above-mentioned problems in precise terms Theory of functional dependency

关系模式的形式化定义关系模式由五部分组成, 即它是一个五元组 : R(U, D, DOM, F) R: 关系名 U: 组成该关系的属性名集合 D: 属性组 U 中属性所来自的域 DOM: 属性向域的映象集合 F: 属性间数据的依赖关系集合

关系模式的简化关系模式 R(U, D, DOM, F) 简化为一个三元组 : R(U, F) 当且仅当 U 上的一个关系 r 满足 F 时,r 称为关系模式 R(U, F) 的一个关系

关系数据库逻辑设计问题构造几个关系模式? 每个关系由哪些属性组成?

Definition of Functional Dependency A FD is a constraint between two sets of attributes from the database we also say there is a functional dependency from X to Y,or that Y is functionally dependent on X A FD is a property of semantics or meaning of the attributes Legal relation states Satisfy the functional dependency constraints

函数依赖规范化设 R(U) 是属性集 U 上的关系模式,X, Y U, r 是 R(U) 上的任意一个关系, 如果成立对 t, s r, 若 t[x] = s[x], 则 t[y] = s[y] 那么称 X 函数决定 Y, 或 Y 函数依赖于 X, 记作 X Y 称 X 为决定因素, 即对于关系模式 R 中的属性子集 X 的每一个值, 任何时刻只有一个确定的 Y 值与之对应如 S# SN, (S#,C#) G

规范化函数依赖是语义范畴的概念只能根据语义来确定一个函数例如姓名年龄这个函数依赖只有在没有同名人的条件下成立如果允许在同一关系中有相同姓名存在则年龄就不再依赖于姓名了如果设计人员限定不允许相同姓名出现则姓名年龄函数依赖就成立实际上可以通过分析属性之间的联系来确定函数的依赖关系在关系模式R中设X和Y为属性集U 的子集

Definition of Functional Dependency (cont d.) Given a populated relation Cannot determine which FDs hold and which do not Unless meaning of and relationships among attributes known Can state that FD does not hold if there are tuples that show violation of such an FD

规范化如果X和Y之间的联系是1 1 则存在函数依赖X Y和Y X 例如课程号C#与课程名CN 如果X和Y之间的联系是m 1 则它们之间只存在函数依赖X Y 例如一个教师可上几门课则C#与教师Tname之间是m 1 所以C# Tname 如果X和Y之间的联系是m n 则它们之间不存在函数依赖例有一关系模式R 学号姓名出生日期系编号系负责人在R的关系r中存在着如下函数依赖学号姓名每个学号只能对应一个姓名学号出生日期每个学生只能对应一个出生日

几种函数依赖函数依赖平凡函数依赖与非平凡函数依赖如果 X Y, 但 Y X, 则称其为非平凡的函数依赖, 否则称为平凡的函数依赖如 (S#,SN) SN 是平凡的函数依赖对于任一关系模式, 平凡函数依赖都是必然成立的, 它不反映新的语义, 因此若不特别声明, 我们总是讨论非平凡函数思考 : 一个关系模式有 n 个属性, 那么在它上面成立的所有可能的函数依赖有多少个? 非平凡的函数依赖又有多少个?

函数依赖完全函数依赖与部分函数依赖在 R(U) 中, 如果 X Y, 且对于任意 X 的真子集 X, 都有, 则称 Y 对 X 完全函数依赖, 记 X Y 作 X f Y (S#,C#) (S#,C#) f p G SN

函数依赖例 : 有一关系模式 S( 学号, 姓名, 系名称, 出生日期 ), 在 S 中存如下完全函数依赖 : 学号系名称学号 f f 出生日期若无重名则存在学号姓名函数依赖

函数依赖定义在关系模式中如果X Y X Y p X Y 则称X对Y部分函数依赖记作例有一关系模式SC 学号课程号成绩教师编号 f 学号课程号成绩在SC中因为学号课程号教师编号课程号教师编号 p 教师编号因此学号课程号

函数依赖传递函数依赖在 R(U) 中, 如果 X Y,Y Z,Y X, 且 Z Y T 则称 Z 对 X 传递函数依赖 :X Z S# SD,SD DEAN T S# DEAN

函数依赖例 : 在关系模式 R( 学号, 姓名, 出生日期, 系编号, 系负责人 ) 中, 有如下函数依赖 : 学号系编号, 系编号学号, 系编号系负责人, 因此, 在 R 中存在传递函数依赖, 学号 T 系负责人

范例关系模式 S(S#, SN, SD, DEAN, C#, G) 主码 :(S#,C#) 函数依赖 : (S#,C#) f S# SN,(S#,C#) SN G p p S# SD,(S#,C#) SD SD DEAN

Armstrong 公理系统推理规则系统正确的完备的推理规则集公理定理推论 ( 如欧几里德集合 ) Armstrong 公理设有关系模式 R(U,F), 属性集 X,Y,Z,W U 自反律 (reflexivity): 若 Y X, 则 X Y 增广律 (augmentation): 若 X Y, 则 XZ YZ 传递律 (transitivity): 若 X Y,Y Z, 则 X Z

Armstrong 公理系统由 Armstrong 公理导出的推理规则合并律 (union rule) 若 X Y,X Z, 则 X YZ 分解律 (decomposition rule) 若 X YZ, 则 X Y,X Z 伪传递律 (pseudotransitivity rule) 若 X Y,WY Z, 则 WX Z

Normal Forms Based on Primary Keys Normalization process Approaches for relational schema design Perform a conceptual schema design using a conceptual model then map conceptual design into a set of relations Design relations based on external knowledge derived from existing implementation of files or forms or reports Following either of these approaches, it is then useful to evaluate the relations for goodness and decompose them further as needed to achieve higher normal forms,using the normalization theory presented in this chapter and the next.

Normalization of Relations The normalization process,takes a relation schema through a series of tests Certify whether it satisfies a certain normal form The process which proceeds in a top-down fashion by evaluating each relation against the criteria for normal forms ( 遵循 / 按照标准 )and decomposing relations as necessary, can thus be considered as relational design by analysis

Normalization of Relations(cont;) Normalization of data: can be looked upon as a process of analyzing the given relation schemas based on their FDs and primariy keys to achieve the desirable properties of (1) minimizing redundancy (2) minimizing the insertion,deletion and update anomalies discussed before Normal form tests: unsatisfactory relation schemas that don t meet certain conditions, are decomposed into smaller relation schemas that meet the tests and hence posses the desirable properties.

Normalization of Relations(cont) The normalization procedure provides database designers with the following : a formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes a series of normal form tests that can be carried out on individual relation schemas so that the relational database can be normalized to any desired degree.

Normal forms don t guarantee a good database design,the process of normalizaion through decomposition must also confirm the existence of additional properties,taken toghter,should possess Properties that a good database design for the relational schemas should have: The lossless join or Nonadditive join property( 非加连接性质 )which guarantees that the spurious tuple generation problem discussed before doesn t occur with respect to the relation schemas created after decomposition Dependency preservation property( 依赖保持属性 ) which ensures that each functional dependency is represented in some individual relation resulting after decompositon The Nonadditive join property is extremely critical, and must be achieved at any cost whereas, the dependency preservation property although desirable is sometimes sacrificed for other factors

Practical Use of Normal Forms Normalization is carried out in practice Resulting designs are of high quality and meet the desirable properties stated previously Pays particular attention to normalization only up to 3NF, BCNF, or at most 4NF Another point worth noting is that the database designers need not to normalize to the highest possible normal form

Definitions of Keys and Attributes Participating in Keys( 键定义和键中的属性 ) Definition of superkey and key Candidate key If more than one key in a relation schema One is primary key Others are secondary keys

First Normal Form Part of the formal definition of a relation in the basic (flat) relational model(1nf disallows relations within relations ) Only attribute values permitted are single atomic (or indivisible) values Techniques to achieve first normal form Remove attribute and place in separate relation (eg dnumber,dlocation) Expand the key(eg fig 15.9 c ) Use several atomic attributes(a maximum is known, dlocation1,dlocation2.,dlocation3,[null,how to qurey,spuriou

First Normal Form (cont d.) Does not allow nested relations Each tuple can have a relation within it(eg fig 15.10 a) To change to 1NF: Remove nested relation attributes into a new relation and propagate the primary key into it This procedure can be applied recursively Unnest relation into a set of 1NF relations Eg person (ss#,{car_lic#},{phone#}),the set braces {} indentify the attributes as multivalued

Second Normal Form Based on concept of full functional dependency Versus partial dependency Second normalize into a number of 2NF relations Nonprime attributes are associated only with part of primary key on which they are fully functionally dependent

Third Normal Form Based on concept of transitive dependency Problematic FD Left-hand side is part of primary key Left-hand side is a nonkey attribute

General Definitions of Second and Third Normal Forms

General Definitions of Second and Third Normal Forms (cont d.) Prime attribute Part of any candidate key will be considered as prime Consider partial, full functional, and transitive dependencies with respect to all candidate keys of a relation

General Definition of Second Normal Form

General Definition of Third Normal Form

Boyce-Codd Normal Form Every relation in BCNF is also in 3NF A relation in 3NF is not necessarily in BCNF Difference: Condition which allows A to be prime is absent from BCNF Most relation schemas that are in 3NF are also in BCNF

Ideally,relational database design should strive to achieve BCNF or 3NF for every relation schema. Achieving the normalization status of just 1NF or 2NF is not considered adequate, since they are developed historically as stepping stones to 3NF and BCNF.

Another example

Another example FD1 : {student,course} Instructor FD2: Instructor course each instructor teaches one course Decomposition of this relation schema into two schemas is not straightforwad because it may be decomposed into one of the three following possible pairs: 1 {student,course} and {student, Instructor} 2 {course, Instructor} and {course,student} 3 {Instructor, course,} and {Instructor,student} All three decompositions lose the functional dependency FD1,The desirable decomposition of those just shown is 3,because it will not generate spurious tuples after a join.

Multivalued Dependency and Fourth Normal Form Multivalued dependency (MVD) Consequence of first normal form (1NF)

Multivalued Dependency and Fourth Normal Form (cont d.) Relations containing nontrivial MVDs All-key relations Fourth normal form (4NF) Violated when a relation has undesirable multivalued dependencies

Join Dependencies and Fifth Normal Form Join dependency Multiway decomposition into fifth normal form (5NF) Very peculiar semantic constraint Normalization into 5NF is very rarely done in practice

Join Dependencies and Fifth Normal Form (cont d.)

5.2.8 关系模式的规范化关系模式规范化的基本步骤如下图所示 1NF 消除决定属性消除非主属性对关键字的部分函数依赖集非码的非平 2NF 凡函数依赖消除非主属性对关键字的传递函数依赖 3NF 消除主属性对关键字的部分和传递函数依赖 BCNF 消除非平凡且非函数依赖的多值依赖 4NF 消除不是由候选键所蕴含的连接依赖 5NF

5.2.8 关系模式的规范化 ①对1NF关系进行投影消除原关系中非主属性对关键字的部分函数依赖将1NF关系转换为若干个2NF关系 ②对2NF关系进行投影消除原关系中非主属性对关键字的传递函数依赖从而产生一组3NF ③对3NF关系进行投影消除原关系中主属性对关键字的部分函数依赖和传递函数依赖也就是使决定属性成为投影的候选键得到一组 BCNF

5.2.8 关系模式的规范化以上三步也可以合并一步对原关系进行投影消除决定属性不是候选键的任何函数依赖 ④对BCNF关系进行投影消除原关系中非平凡且非函数依赖的多值依赖从而产生一组4NF关系 ⑤对4NF关系进行投影消除原关系中不是由候选键所蕴含的连接依赖即可得到一组5NF 5NF是最终范式规范化程度过低的关系可能会存在插入异常删除异常修改复杂数据冗余等问题需要对其进行规范化转换成高级范式但并不意味着规范化程度越高的关系模式就越好在设计数据库模式结构时必须对现实世界的实际情况和用户需求作进一步分析确定一个合适的能够反映现实世界的模式这也就说上面的规范化步骤可以在其中任何一步终止

盘点任何一个二目关系模式 R(A,B) 一定属于 BCNF 吗? 一定属于 4NF 吗? 关系模式 R(A,B,C),AB C 一定成立吗? 一个全是主属性的关系模式一定可以达到第几范式? 一个全码的关系模式一定可以达到第几范式?

Summary Informal guidelines for good design Functional dependency Basic tool for analyzing relational schemas Normalization: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF