CS 325 - Week 9 Lecture 1 - 2016-10-18

*   normalization and normal forms
    ...part of basic "database" vocabulary;

*   functional dependencies play a vital role here!

*   as we mentioned earlier --
    for day-to-day, operational use,
    not all relations are equally "good" --
        some ARE "better" than others;

    *   why? well, for one thing, for SOME relations,
        changing the data can have UNDESIRABLE
	consequences called
	MODIFICATION ANOMALIES

        ...and we'll see that such anomalies can
	often be reduced or eliminated by
	redefining the relation(s) involved;

	...often by breaking up a relation
	into one or more related relations;

    *   when we break up relations to reduce/eliminate
        modification anomalies,
	that process is called NORMALIZATION;

	(and when you go the other way,
	combining relations, that's
	DEnormalization, but that's not really a topic
	for this course...)

*   modification anomalies, then;
    *   some texts break these up into more categories,
        but we'll go with two:

	deletion anomalies
        *   occurs when deleting a row of a relation
	    may cause the loss of "additional" info
	    which it may not be reasonable
	    to have completely disappear

	    or, if you prefer:
	    it occurs when facts about two or
	    more entities are lost with one deletion;

        *   (for example, losing a project's name
	    because an employee is deleted in the
	    example table 1)

	insertion anomalies
        *   when we cannot insert information that's
	    important to the scenario -- about some
	    entity -- but we cannot,
	    and this resulting restriction seems
	    unreasonable;

        *   (for example, that I cannot add to example
	    table 1 that new project 4 has the
	    name rainbow,
	    or that CompSci job class has a charge
	    per hour of $75)

*   normalization is the process of "breaking up"
    a relation into multiple relations
    for the purpose of reducing modification
    anomalies;

    ...breaking up relations with multiple "themes" [Kroenke]
    so each resulting relation only has one "theme";

    (but also maintaining desired relationships;
    ...referential integrity constraints tend to
    be added during this breaking-up;)

*   normal forms
    *   one classic set of normal forms:

        first normal form (1NF)
	second normal form (2NF)
	third  normal form (3NF)
	Boyce-Codd normal form (BCNF)
        fourth normal form (4NF)
	fifth normal form (5NF)
	sixth normal form (6NF)

        ^ these are superset/subsets --
	  if a set of relations is in 2NF, it is also in 1NF,
	  and if it is in 3NF, it is also in 1NF and 2NF, etc.

    *   and it turns out that you can think of normalization
        as a process of working through these normal
	forms to the level that makes sense for your
	database;

	*   (I've read that 3NF is usually sufficient for
	    more operational databases)

1NF (First Normal Form)
*   this one is pretty easy!
    any true relation is, by definition, in 1NF.

    *   as a true relation, then, the relation meets
        that definition:
	*   single-valued cells!
	*   unique and non-null primary keys!
        *   no duplicate rows!

    *   do you ever need to do much work to
        get a set of relations into 1NF?
	...most common issue I've seen/read about
	is the realization that something is not truly
	a relation because it has a multivalued attribute;

	*   and what you'll do is break out that
	    multivalued attribute into its own
	    relation;

            *   you remove the multivalued attribute to
	        a new relation,

		adding the primary key of the original
		relation TO the new relation AS a foreign
		key referencing the original relation,

		and make the primary key of the new
		relation BOTH that foreign key and the
		multivalued attribute

2NF - 2nd Normal Form
*   because, relations in 1NF can have modification
    anomalies due to certain functional dependencies;

    ...for example, there's a category of
    functional dependencies called

    PARTIAL dependencies -
    *   a partial dependency is when a non-primary key attribute is
        dependent on just PART of its relation's primary key;

    *   a relation is said to be in
        2NF if:
        1) it is in 1NF
        2) it includes NO partial dependencies

           (ALL of its non-primary-key attributes
	    depend on their relation's "entire" primary key)

    *   SO, here's an algorithm for converting
        a non-2NF 1NF relation into 2NF relations:

	1. write each key component on its own line,
	   and each combination of key components on
	   its own line

	   empl_num
	   proj_num
	   (empl_num, proj_num)

        2. for each non-primary-key attribute,
	   write it on the line for which it neeeeeeds
	   that "entire" primary key to determine it

	   empl_num - empl_name, job_class, chg_per_hr
	   proj_num - proj_name 
	   (empl_num, proj_num) - hrs

        3. for lines with a non-primary-key attribute
	   added, make a new relation out of that line,
	   whose primary key is the primary-key-combo
	   given, and (typically for the primary key whose
	   set includes multiple attributes)
	   make each a foreign key to its "base" relation
           
	   Employee(EMPL_NUM, empl_name, job_class, chg_per_hr)

	   Project(PROJ_NUM, proj_name) 

	   Work_on_Proj(EMPL_NUM, PROJ_NUM, hrs)
               foreign key (empl_num) references Employee,
	       foreign key (proj_num) references Project