CS 325 - Week 4 Lecture 1 - 2016-09-13

*   now: for MORE on Codd's relational model;
    *   and relational algebra/relational operations

    *   and a relational DBMS - an RDBMS --
        enforces this abstraction that a database
	is a set of relations

*   in an RDBMS,
    a relational database is a set of relations;

    *   what is a relation?
        *   formal:
	    (Ulmann, 2nd ed. p. 19)
	    "a subset of the Cartesian product of a list
	    of domains"

    *   let's dig into this a little bit:
        *   consider our relation structure form:

	    Employee(EMPL_ID, empl_lname, empl_str_addr,
	             empl_salary)

            *   this could also be written as a 
	        relation scheme as:

		Employee = (EMPL_ID, empl_lname, empl_str_addr,
	             empl_salary)

        *   each thing in the parentheses above
	    is an ATTRIBUTE of that relation;

	    we have a relation Employee
	    with the attributes empl_id, empl_lname,
	    empl_str_addr, empl_salary

        *   each attribute name A is associated with
	    a domain, dom(A), a SET of values, which
	    often includes the special value null
	    (null means LACK of a value)

            ...the set of values it is REASONABLE
	    for that attribute to have;

            *   dom(empl_salary) might be a positive
	        number,

		dom(empl_id) might be the set of 4-digit
		values between 0000 and 9999,

		dom(empl_str_addr) might be valid
		street addresses in Arcata

    *   given a relation scheme
    
        R = (A1, A2, ... An)

        a relation r on scheme R is defined
	as any finite subset of the Cartesion product
	    dom(A1) x dom(A2) x ... x dom(An)

       so...for relation scheme Employee,
       a relation under this relation scheme could be
       any SUBSET of COMBINATIONS of 

       (element from dom(EMPL_ID), 
        element from dom(empl_lname), 
        element from dom(empl_str_addr),
	element from dom(empl_salary))

	{ (1111, 'Jones', '111 Ash Str', 500),
          (2222, 'Smith', '123 Elm Str', 505),
	  ...
          (9999, 'Nguyen', '333 Third Str', 503) }

        *   see how this is looking somewhat tabular-ish?

	    the above is a relation, or a relational table,
	    (and yes, we'll often call it a table --
	    BUT it needs to WORK with the above definition!)

        *   one element in this set,
	    is called a TUPLE (from the relational algebra end)
	    is called a ROW (from the tabular terminology end)
	    and is even sometimes called a RECORD
	                    (from the old-mainframe-world end)

            either way -- notice that a relation
	    is essentially a set of ROWS...
	    (not a set of cells <- in the spreadsheet sense)

        *   note that the ATTRIBUTE (from the relational alg end)
            is called a COLUMN (from the tab terminology end)
	    and is even sometimes called a FIELD
	                    (from the old-mainframe-world end)

*   hopefully each relation corresponds to some
    set of significant "things" in our scenario 
    (or some set of portions of significant things)...

    *   hopefully each attribute is a significant characteristic
        of that significant thing or thing-portion;

    *   each attribute has a domain, a set of
        VALID values (which may include the null value),
	and a domain may have a physical definition
	   (for example, number(3) or varchar2(27) -- a type,
	    for example)

	AND we also can talk about a semantic definition
        of an attribute -- dom(empl_str_addr) being
	the set of actual street addresses in Arcata...

	(yes, there is often a gap between the semantic
	definition of an attribute and what the RDBMS can
	actually support...)

*   SO -- consider plain old tables -- I've said
    not ALL tabular-like-things are relations;
    what kind of restrictions "flow" from the relational
    algebra definition?
    *   a set doesn't have a concept of duplicates --
        an element is a member of the set, or it is not;

	THUS, since a relation is a set of tuples,
	a relation cannot have duplicate tuples
	(no duplicate rows!)

    *   the relation definition say it is finite of
        the Cartesian product dom(A1) x dom(A2) ...

	...these tuples cannot have MULTIPLE values
	for a given attribute!

	SO: a true relation does NOT have more
	than one value in a single "cell";

    *   order of the elements in a set is not
        significant in set theory --
	so the order of the tuples in a relation
	is not significant

    *   each attribute must have a unique name

        BUT the order of attributes is not significant
	either

    *   all of the values for a given attribute
        must be from that attribute's domain
	(which might include null, remember)

*   BUT -- we need another piece before getting
    to our relational operations --

    functional dependency

    *   a relationship between attributes

        if, given the value of attribute A,
	you can UNIQUELY determine the value of
	another attribute B (no matter what)

	...based on the "RULES" of that scenario

	we say that B is functionally dependent on A;

	we say that A -> B  
	            A functionally determines B
                    A determines B	

        SO -- empl_id -> empl_salary
	      BUT, in a world where salaries are based
	          on many factors, and many employees
		  may happen to have the same salary,

		  it is NOT the case that empl_salary -> empl_id

              IF your Employee relation scheme also
	      included job_title, and it turned out that
              salaries were based on job_title alone,
	      then you COULD say that

	         job_title -> empl_salary

    *   one text argues that the storage and retrieval of
        functional dependencies is a MAIN reason for
	even having a database!
	  
    *   note: a functional dependency CAN involve
        SETS of attributes --

	(student_id, class_crn) -> class_grade

        ...this does NOT imply that:

		    student_id -> class_grade

		    or that: 

		    class_crn -> class_grade --

		    ...it takes the PAIR to uniquely
		    determine the class grade;

    *   BUT:
        empl_id -> (empl_lname, empl_str_addr, empl_salary)

	and this DOES imply that 
	
        empl_id -> empl_lname
        empl_id -> empl_str_addr
        empl_id -> empl_salary
        empl_id -> (empl_lname, empl_str_addr)
        ...