CS 325 - Week 5 Lecture 1 - 2016-09-20

*   quick review of natural join/equi-join
    *   equi-join: combination of Cartesian prodoct
        and selection,
	selecting tuples from the Cartesian product
	where a column in the 1st joined table has
	the same value as a column in the 2nd joined table

	*   equi - equal! a join based on two things
	    being equal
        *   assumes there is a column in each joined
	    table that shares a domain (has the same
	    meaning)

    *   natural join: an equi-join with an additional
        projection, projecting all but one of the
	two columns that were joined upon;

    *   (there are a number of different kinds of
        joins -- we are most interested in
	equi-join and natural joins here,
	to the point that if we say "join"
	we mean one of these --
	
	for the others, I will give their
	"full" names...)

*   several points to remember:
    *   IMPORTANT: the attributes being joined upon
        in these joins SHOULD have a common domain!!
	(or results won't make semantic "sense")

    *   don't worry, under the hood the actual algorithms
        for these joins are more efficient than implied
	above...

    *   CAN join on other conditions than equality,
        BUT we won't be going there...

        *   that's a theta join,
	    when the join condition is not based on equals

    *   there's another type of join, OUTER JOIN, that
        we *might* get around to --

        in an outer join, the "unmatched" pairs are
	included -- students with no enrollments are
	included, enrollments involving no student (?!)
	are included --
	and the values for the "unmatched" columns are left
	NULL (so result is not always a "true" relation, I think...)

        *   there are other types of joins as well...!

*   we'll come back to the other set-theoretic operations --
    union, intersection, and difference -- later on;

========================
*   INTRO to ENTITY-RELATIONSHIP MODELING,
    part 1!

    *   also called E/R modeling or E-R modeling...!

    *   entering the world of DATA MODELING!
        *   the process of creating an REPRESENTATION
	    of the USERS' VIEW of the data in their
	    scenario

        *   Kroenke says it's the MOST important task
	    in the development of effective database
	    APPLICATIONS;

    *   we are discussing a classic approach for expressing
        a user's data model, in the form of an
	ENTITY-RELATIONSHIP MODEL

	(the result of entity-relationship modeling is
	an entity-relationship model)

        *  *****************************************
           VERY IMPORTANT: at THIS phase of a the
	   database life cycle,

	   YOU SHOULD NOT BE THINKING IN TERMS OF TABLES
	   (or relations!) YET!!!!!!!!!!!!!!!!
           *****************************************

	   don't worry, there are GOOD approaches for
	   then TURNING an E/R model INTO a
	   set of appropriate relations FOR that
	   scenario being modeled;

           *   AVOID jumping to the table-design phase
	       too early!

           *   DURING the modeling phase,
	       WE WILL BE TALKING about ENTITY CLASSES
	       and ENTITIES, not tables/relations and
	       tuples/rows;

	       (we'll see that an entity class in an E/R
	       model will eventually turn into ONE OR MORE
	       relations in the eventual design;)

               AND: WE WILL BE TALKING about RELATIONSHIPS
	       between entity classes and entities,
	       NOT foreign keys in tables/relations;

	       ...there are NO foreign keys in the E/R models
	       we'll be coming up with;

    *   when we are happy with our E/R model for a scenario,
        we will CONVERT it into a DATABASE SCHEMA or DATABASE DESIGN:
	*   mathematical definition of db schema or db design:
	    *   relation structures plus constraints on allowable
	        data values

        *   slightly-more-pragmatic definition of db scheme or db design:
	    *   (Kroenke's)
	    *   its relation structures,
	    *   relationships, <-- foreign keys
	    *   domains,
	    *	and business rules

                *   business rule: is a day-to-day rule of operation
		    WITHIN the scenario

                    In a library: a patron can only have 10 books
		    checked out at a time

		    In a hair-cutting business: all stylists/barbers
                    must provide
		    at least one working phone number for
		    reaching them
		    
		    ^ see how that can also be a desired constraint
		      on the allowble data in an eventual database?

		      CAN'T always enforce these in eventual
		      database -- but we do what we can...

*   note that E/R models are going to be 
    DBMS-independent -- doesn't matter what DBMS you
    will eventually use;

    (we'll even see that eventual db designs/schemas
    are often RDBMS-independent, able to be
    implemented on any RDBMS that provides SQL...)

*   There ARE variations in expressing both E/R models
    and database schemas/designs --
    I'll be letting you know which variations we'll be
    using in this class;

==============================
*   Peter Chen introduced the entity-relationship model in a 1976
    paper;
    *   commonly abbreviate e-r or e/r or E/R or e-r
        (I'll accept all of these!)

    *   the diagrams based on the E/R model are called
        entity-relationship diagrams, or ERDs

    *   (beware, there are variations in ERD formatting,
        you are required to use our class variation so
	I can read it!)

*   the elements in this model:
    *   KEY elements are entities (and entity classes),
                         relationships,
			 identifiers, and
			 attributes!

    *   ENTITIES and ENTITY CLASSES
        *   an entity is an "object" that exists and is
	    distinguishable from other "objects"

            ... it is something significant in the
	        users' scenario, that they may have some
		interest in keeping track of;

        *   an entity CLASS is a set of entities of the
	    same "type" (same category?)

        *   for a credit union,
	    an entity might be the savings account #12467,
	    and an entity class might be the set of
	        all savings accoounts at that credit union;

	    for a university, 
	    an entity might be the student Ann Carroll,
	    and an entity class might be the set of all
	    currently-enrolled students;

        *   in an E/R model,
	    we'll be teasing out the significant entity
	    classes in a scenario;

        *   in an ERD, 
	    each entity class will be represented as (some sort of)
	    a rectangle