Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley
Chapter 15 Outline Informal Design Guidelines for Relation Schemas Functional Dependencies Normal Forms Based on Primary Keys General Definitions of Second and Third Normal Forms Boyce-Codd Normal Form
Chapter 15 Outline (cont d.) Multivalued Dependency and Fourth Normal Form Join Dependencies and Fifth Normal Form
Introduction Levels at which we can discuss goodness of relation schemas Logical (or conceptual) level (how users interpret the relation schemas and the meaning of their attributes.) Implementation (or physical storage) level (how the tuples in a base relation( which will be physically stored as file ) are stored and updated) a base relation ) Approaches to database design: Bottom-up or top-down
Informal Design Guidelines for Relation Schemas Measures of quality Making sure attribute semantics are clear Reducing redundant information in tuples Reducing NULL values in tuples Disallowing possibility of generating spurious tuples these measures are not always independent of one another( 相互独立 ),as we shall see.
Imparting Clear Semantics to Attributes in Relations Semantics of a relation Meaning resulting from interpretation of attribute values in a tuple Easier to explain semantics of relation Indicates better schema design IN general, the easier it is to explain the semantics of the relation,the better the relation schema design will be. To illustrate this consider the following fig 15.1
Continues
Guideline 1 Design a relation schema so that it is easy to explain its meaning Do not combine attributes from multiple entity types and relationship types into a single relation Example of violating Guideline 1: Figure 15.3
Guideline 1 (cont d.)
Redundant Information in Tuples and Update Anomalies Grouping attributes into relation schemas Significant effect on storage space Storing natural joins of base relations leads to update anomalies Types of update anomalies: Insertion Deletion Modification Compare fig 15.2 with fig 15.4
Redundant Information in Tuples and Update Anomalies Insertion anomalies can be diffenrentiated into two types : to insert a new employee tuple into emp_dept,we must include either the attribute values for the department that the employee works for,or null, (consistent with values for department 5 in other tuples It is difficult to insert a new department that has no employees as yet in the emp_dept relation. Deletion anomalies. If we delete from emp_dept an employee tuple that happens to represent the last employee working for a particular department,the information concerning that department is lost from the database.(which is related to the second insertion anomally situation discussed earlier Modification anomalies :in emp_dept,if we change the value of one of the attributes of a particular department-say, the manage of department 5 we must update the tuples of all employees who work in that department ;otherwise,the database will become inconsistent.
Guideline 2 Design base relation schemas so that no insertion, deletion or modification(update (anomalies are present in the relations If any anomalies are present: Note them clearly Make sure that the programs that update the database will operate correctly Note :It is important to note that these guidelines may sometimes have to be violated in order to improve the performance of certain queries.(an important query concering the department of an employee along with employee attributes,the emp_dept schema may be used as a base relation)
NULL Values in Tuples In some design, we may group many attributes together into a fat relation Can end up with many NULLs Problems with NULLs Wasted storage space Problems with understanding meaning
Guideline 3 Avoid placing attributes in a base relation whose values may frequently be NULL() If NULLs are unavoidable: Make sure that they apply in exceptional cases only, not to a majority of tuples
Generation of Spurious Tuples Figure 15.5(a) Relation schemas EMP_LOCS and EMP_PROJ1(can be used instead of the single emp_proj(fig15.3b) NATURAL JOIN Result produces many more tuples than the original set of tuples in EMP_PROJ(fig15.6) Called spurious tuples Represent spurious information that is not valid
Guideline 4 Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys in a way that Guarantees that no spurious tuples are generated Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations
Summary and Discussion of Design Guidelines Anomalies cause redundant work to be done Waste of storage space due to NULLs Difficulty of performing operations and joins due to NULL values Generation of invalid and spurious data during joins
Functional Dependencies Formal tool for analysis of relational schemas Enables us to detect and describe some of the above-mentioned problems in precise terms Theory of functional dependency
关系模式的形式化定义 关系模式由五部分组成, 即它是一个五元组 : R(U, D, DOM, F) R: 关系名 U: 组成该关系的属性名集合 D: 属性组 U 中属性所来自的域 DOM: 属性向域的映象集合 F: 属性间数据的依赖关系集合
关系模式的简化 关系模式 R(U, D, DOM, F) 简化为一个三元组 : R(U, F) 当且仅当 U 上的一个关系 r 满足 F 时,r 称为关系模式 R(U, F) 的一个关系
关系数据库逻辑设计问题 构造几个关系模式? 每个关系由哪些属性组成?
Definition of Functional Dependency A FD is a constraint between two sets of attributes from the database we also say there is a functional dependency from X to Y,or that Y is functionally dependent on X A FD is a property of semantics or meaning of the attributes Legal relation states Satisfy the functional dependency constraints
函数依赖 规范化 设 R(U) 是属性集 U 上的关系模式,X, Y U, r 是 R(U) 上的任意一个关系, 如果成立 对 t, s r, 若 t[x] = s[x], 则 t[y] = s[y] 那么称 X 函数决定 Y, 或 Y 函数依赖于 X, 记作 X Y 称 X 为决定因素, 即对于关系模式 R 中的属性子集 X 的每一个值, 任何时刻只有一个确定的 Y 值与之对应 如 S# SN, (S#,C#) G
规范化 函数依赖是语义范畴的概念 只能根据语义来确定 一个函数 例如 姓名 年龄 这个函数依 赖只有在没有同名人的条件下成立 如果允许在 同一关系中有相同姓名存在 则年龄就不再依赖 于姓名了 如果设计人员限定不允许相同姓名出 现 则 姓名 年龄 函数依赖就成立 实际上可以通过分析属性之间的联系来确定函数的 依赖关系 在关系模式R中 设X和Y为属性集U 的子集
Definition of Functional Dependency (cont d.) Given a populated relation Cannot determine which FDs hold and which do not Unless meaning of and relationships among attributes known Can state that FD does not hold if there are tuples that show violation of such an FD
规范化 如果X和Y之间的联系是1 1 则存在函数依赖X Y和Y X 例如 课程号C#与课程名CN 如果X和Y之间的联系是m 1 则它们之间只存 在函数依赖X Y 例如 一个教师可上几门课 则C#与教师Tname之间是m 1 所以C# Tname 如果X和Y之间的联系是m n 则它们之间不存 在函数依赖 例 有一关系模式R 学号 姓名 出生日期 系 编号 系负责人 在R的关系r中 存在着如下函数依赖 学号 姓名 每个学号只能对应一个姓名 学号 出生日期 每个学生只能对应一个出生日
几种函数依赖 函数依赖 平凡函数依赖与非平凡函数依赖 如果 X Y, 但 Y X, 则称其为非平凡的函数依赖, 否则称为平凡的函数依赖 如 (S#,SN) SN 是平凡的函数依赖 对于任一关系模式, 平凡函数依赖都是必然成立的, 它不反映新的语义, 因此若不特别声明, 我们总是讨论非平凡函数 思考 : 一个关系模式有 n 个属性, 那么在它上面成立的所有可能的函数依赖有多少个? 非平凡的函数依赖又有多少个?
函数依赖 完全函数依赖与部分函数依赖 在 R(U) 中, 如果 X Y, 且对于任意 X 的真子集 X, 都有, 则称 Y 对 X 完全函数依赖, 记 X Y 作 X f Y (S#,C#) (S#,C#) f p G SN
函数依赖 例 : 有一关系模式 S( 学号, 姓名, 系名称, 出生日期 ), 在 S 中存如下完全函数依赖 : 学号系名称 学号 f f 出生日期 若无重名则存在学号 姓名函数依赖
函数依赖 定义 在关系模式中 如果X Y X Y p X Y 则称X对Y部分函数依赖 记作 例 有一关系模式SC 学号 课程号 成 绩 教师编号 f 学号 课程号 成绩 在SC中 因为 学号 课程号 教师编号 课程号 教师编号 p 教师编号 因此 学号 课程号
函数依赖 传递函数依赖 在 R(U) 中, 如果 X Y,Y Z,Y X, 且 Z Y T 则称 Z 对 X 传递函数依赖 :X Z S# SD,SD DEAN T S# DEAN
函数依赖 例 : 在关系模式 R( 学号, 姓名, 出生日期, 系编号, 系负责人 ) 中, 有如下函数依赖 : 学号 系编号, 系编号 学号, 系编号 系负责人, 因此, 在 R 中存在传递函数依赖, 学号 T 系负责人
范例 关系模式 S(S#, SN, SD, DEAN, C#, G) 主码 :(S#,C#) 函数依赖 : (S#,C#) f S# SN,(S#,C#) SN G p p S# SD,(S#,C#) SD SD DEAN
Armstrong 公理系统 推理规则系统 正确的 完备的推理规则集 公理 定理 推论 ( 如欧几里德集合 ) Armstrong 公理 设有关系模式 R(U,F), 属性集 X,Y,Z,W U 自反律 (reflexivity): 若 Y X, 则 X Y 增广律 (augmentation): 若 X Y, 则 XZ YZ 传递律 (transitivity): 若 X Y,Y Z, 则 X Z
Armstrong 公理系统由 Armstrong 公理导出的推理规则 合并律 (union rule) 若 X Y,X Z, 则 X YZ 分解律 (decomposition rule) 若 X YZ, 则 X Y,X Z 伪传递律 (pseudotransitivity rule) 若 X Y,WY Z, 则 WX Z
Normal Forms Based on Primary Keys Normalization process Approaches for relational schema design Perform a conceptual schema design using a conceptual model then map conceptual design into a set of relations Design relations based on external knowledge derived from existing implementation of files or forms or reports Following either of these approaches, it is then useful to evaluate the relations for goodness and decompose them further as needed to achieve higher normal forms,using the normalization theory presented in this chapter and the next.
Normalization of Relations The normalization process,takes a relation schema through a series of tests Certify whether it satisfies a certain normal form The process which proceeds in a top-down fashion by evaluating each relation against the criteria for normal forms ( 遵循 / 按照标准 )and decomposing relations as necessary, can thus be considered as relational design by analysis
Normalization of Relations(cont;) Normalization of data: can be looked upon as a process of analyzing the given relation schemas based on their FDs and primariy keys to achieve the desirable properties of (1) minimizing redundancy (2) minimizing the insertion,deletion and update anomalies discussed before Normal form tests: unsatisfactory relation schemas that don t meet certain conditions, are decomposed into smaller relation schemas that meet the tests and hence posses the desirable properties.
Normalization of Relations(cont) The normalization procedure provides database designers with the following : a formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes a series of normal form tests that can be carried out on individual relation schemas so that the relational database can be normalized to any desired degree.
Normal forms don t guarantee a good database design,the process of normalizaion through decomposition must also confirm the existence of additional properties,taken toghter,should possess Properties that a good database design for the relational schemas should have: The lossless join or Nonadditive join property( 非加连接性质 )which guarantees that the spurious tuple generation problem discussed before doesn t occur with respect to the relation schemas created after decomposition Dependency preservation property( 依赖保持属性 ) which ensures that each functional dependency is represented in some individual relation resulting after decompositon The Nonadditive join property is extremely critical, and must be achieved at any cost whereas, the dependency preservation property although desirable is sometimes sacrificed for other factors
Practical Use of Normal Forms Normalization is carried out in practice Resulting designs are of high quality and meet the desirable properties stated previously Pays particular attention to normalization only up to 3NF, BCNF, or at most 4NF Another point worth noting is that the database designers need not to normalize to the highest possible normal form
Definitions of Keys and Attributes Participating in Keys( 键定义 和键中的属性 ) Definition of superkey and key Candidate key If more than one key in a relation schema One is primary key Others are secondary keys
First Normal Form Part of the formal definition of a relation in the basic (flat) relational model(1nf disallows relations within relations ) Only attribute values permitted are single atomic (or indivisible) values Techniques to achieve first normal form Remove attribute and place in separate relation (eg dnumber,dlocation) Expand the key(eg fig 15.9 c ) Use several atomic attributes(a maximum is known, dlocation1,dlocation2.,dlocation3,[null,how to qurey,spuriou
First Normal Form (cont d.) Does not allow nested relations Each tuple can have a relation within it(eg fig 15.10 a) To change to 1NF: Remove nested relation attributes into a new relation and propagate the primary key into it This procedure can be applied recursively Unnest relation into a set of 1NF relations Eg person (ss#,{car_lic#},{phone#}),the set braces {} indentify the attributes as multivalued
Second Normal Form Based on concept of full functional dependency Versus partial dependency Second normalize into a number of 2NF relations Nonprime attributes are associated only with part of primary key on which they are fully functionally dependent
Third Normal Form Based on concept of transitive dependency Problematic FD Left-hand side is part of primary key Left-hand side is a nonkey attribute
General Definitions of Second and Third Normal Forms
General Definitions of Second and Third Normal Forms (cont d.) Prime attribute Part of any candidate key will be considered as prime Consider partial, full functional, and transitive dependencies with respect to all candidate keys of a relation
General Definition of Second Normal Form
General Definition of Third Normal Form
Boyce-Codd Normal Form Every relation in BCNF is also in 3NF A relation in 3NF is not necessarily in BCNF Difference: Condition which allows A to be prime is absent from BCNF Most relation schemas that are in 3NF are also in BCNF
Ideally,relational database design should strive to achieve BCNF or 3NF for every relation schema. Achieving the normalization status of just 1NF or 2NF is not considered adequate, since they are developed historically as stepping stones to 3NF and BCNF.
Another example
Another example FD1 : {student,course} Instructor FD2: Instructor course each instructor teaches one course Decomposition of this relation schema into two schemas is not straightforwad because it may be decomposed into one of the three following possible pairs: 1 {student,course} and {student, Instructor} 2 {course, Instructor} and {course,student} 3 {Instructor, course,} and {Instructor,student} All three decompositions lose the functional dependency FD1,The desirable decomposition of those just shown is 3,because it will not generate spurious tuples after a join.
Multivalued Dependency and Fourth Normal Form Multivalued dependency (MVD) Consequence of first normal form (1NF)
Multivalued Dependency and Fourth Normal Form (cont d.) Relations containing nontrivial MVDs All-key relations Fourth normal form (4NF) Violated when a relation has undesirable multivalued dependencies
Join Dependencies and Fifth Normal Form Join dependency Multiway decomposition into fifth normal form (5NF) Very peculiar semantic constraint Normalization into 5NF is very rarely done in practice
Join Dependencies and Fifth Normal Form (cont d.)
5.2.8 关系模式的规范化 关系模式规范化的基本步骤如下图所示 1NF 消除决定属性 消除非主属性对关键字的部分函数依赖 集非码的非平 2NF 凡函数依赖 消除非主属性对关键字的传递函数依赖 3NF 消除主属性对关键字的部分和传递函数依赖 BCNF 消除非平凡且非函数依赖的多值依赖 4NF 消除不是由候选键所蕴含的连接依赖 5NF
5.2.8 关系模式的规范化 ①对1NF关系进行投影 消除原关系中非主属性 对关键字的部分函数依赖 将1NF关系转换为若 干个2NF关系 ②对2NF关系进行投影 消除原关系中非主属性 对关键字的传递函数依赖 从而产生一组3NF ③对3NF关系进行投影 消除原关系中主属性对 关键字的部分函数依赖和传递函数依赖 也就是 使决定属性成为投影的候选键 得到一组 BCNF
5.2.8 关系模式的规范化 以上三步也可以合并一步 对原关系进行投影 消除决定 属性不是候选键的任何函数依赖 ④对BCNF关系进行投影 消除原关系中非平凡且非函 数依赖的多值依赖 从而产生一组4NF关系 ⑤对4NF关系进行投影 消除原关系中不是由候选键所 蕴含的连接依赖 即可得到一组5NF 5NF是最终范式 规范化程度过低的关系可能会存在插入异常 删除异常 修改复杂 数据冗余等问题 需要对其进行规范化 转 换成高级范式 但并不意味着规范化程度越高的关系模 式就越好 在设计数据库模式结构时 必须对现实世界 的实际情况和用户需求作进一步分析 确定一个合适的 能够反映现实世界的模式 这也就说 上面的规范化步 骤可以在其中任何一步终止
盘点 任何一个二目关系模式 R(A,B) 一定属于 BCNF 吗? 一定属于 4NF 吗? 关系模式 R(A,B,C),AB C 一定成立吗? 一个全是主属性的关系模式一定可以达到第几范式? 一个全码的关系模式一定可以达到第几范式?
Summary Informal guidelines for good design Functional dependency Basic tool for analyzing relational schemas Normalization: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF