CS 325 - Week 4 Lecture 2 - 2016-09-15

*   another definition:

    when you write a functional dependency
    using:

    A -> B

    ...A can also be called a DETERMINANT

*   for a relation,
    each superkey MUST be a determinant;

*   BUT -- it is also possible for
    a determinant -- in SOME functional
    dependency within a scenario --
    to NOT be superkey of a relation;

*   Consider:

    Student(STU_ID, Stu_lname, Stu_phone,
            Advisor_id, Advisor_lname,
	    Advisor_phone)

    ...it IS the case that
    Advisor_id -> Advisor_lname, Advisor_phone

    ...so, Advisor_id IS a determinant,
    but definitely NOT a superkey for the
    Student relation...

*   foreign key -
    *   relationally, a foreign key is a set
        of one or more attributes within
	one relation that is also a
	candidate key in another relation
	(Conolly & Begg)

    *   THAT SAID, I've noticed that Oracle --
        and I suspect other RDBMSs? --
	only permit a foreign key to
	be a set of one or more attributes
	that is also a PRIMARY key in
	another relation

	(JUST being a candidate, minimal
	key is not sufficient for
	at least some RDBMSs...)

    *   this gives us a way to relate
        or link relations by using
	controlled redundancy --
	by adding these foreign keys
	to be able to relate tuples in
	different relations;

*   NOTE:
    we will be discussing WHAT foreign
    keys to put in WHERE

    WHEN we go FROM a database model TO
    a database DESIGN --

    we'll see there are pretty specific rules
    for when and where to do so;

    *   SO: do NOT try to put them in at the
        modelling stage!
	(as we'll continue to note next
	week as we start our modeling
	discussion...)

*   we mentioned referential integrity
    in lab yesterday --
    (DBMS tries to ensure that foreign key
    values have "matching" primary key values
    in the referenced relation)

    I think we've mentioned domain integrity
    also 
    (DBMS tries to ensure that the value given
    for an attribute is part of that attribute's
    domain, as much as it can...)

    And I hope we mentioned entity integrity
    as well --
    a DBMS tries to ensure that
    every tuple in a relation HAS a non-null
    primary key,
    and no two tuples in a relation have the
    SAME primary key;

    (and also, thus, cannot have duplicate
    rows...)

*   RELATIONAL algebra OPERATIONS!
    *   in relational algebra,
        we can talk about doing operations
	on relations;

    *   and we will be interested especially
        in operations that are CLOSED
	over the set of relations --

	the result of such operations
	is ITSELF also a relation!

	(so you can compose quite
	complex operations over relations
	using these operations)

    *   (and note, since relations are
        sets -- sets of tuples, as
	discussed --
	we can often also include
	set operations as part of our
	collection of relational operations)

*   why useful to know some relational
    operations?
    *   help to understand what
        kinds of manipulations are
	possible on relational tables

    *   and help to be able to write
        better QUERIES, to ask questions
	to get answers we want about our
	data that is in the form of
	relations;

    *   help to understand/appreciate
        the kind of work the RDBMS is doing
	for you;

    *   (and perhaps provide early
        warning about when some query
	MIGHT take a long time to
	be computed...)
    
*   we'll be discussing a useful SUBSET
    of these operations,
    especially useful for developing
    database queries;

    *   set-theoretic:
        *   union
	*   difference
	*   intersection
	*   Cartesian product

    *   relation-theoretic
        *   selection
        *   projection
	*   many joins,
	    including natural join
	    and equi-join
	*   ...

*   and least 2 DB texts have noted
    that if a DBMS is considered at all
    relational,
    it better at LEAST support
    the operations of selection, projection,
    and either or both of natural and/or
    equi-join;

*   we are DELIBERATELY stating these
    in NON-SQL terms at this point;
    
    *   and we'll use the notation
        that the "full" name of an attribute
	is the tbl_name.attrib_name

*   SELECTION
    *   "pure" relational selection
        means to SELECT specified rows
	from a relation
        to result in a new relation
	with just those rows;

    *   student where age < 25
        select rows from student relation
	    where attribute age is < 25

*   PROJECTION
    *   projection grabs JUST specified
        attributes from a relation,

	and removes any duplicate tuples
	in the result,

	to result in another relation;

    *   student[stud_major, stud_grade, 
                stud_grade_level]    
	project the attributes stud_major, 
                stud_grade, 
                stud_grade_level from
		the student relation,
		(and throw out any duplicate
		rows so the result is still
		a relation)

*   Cartesian product
    *   you rarely actually want this by
        itself,
	BUT it is CONCEPTUALLY part of
	natural and equi-joins, so...

    *   Cartesian product in plain sets
        is the set of pairs where you
	have every combo of an element
	from one set and an element from
	another

        ...here, then, the sets are all
	relations,
	which are sets of tuples/rows,

	and so the result is a new relation
	where each tuple is a row from
	one relation appended to a row from
	the other
	(and ALL such combos!)

    *   YES, there can be MANY rows in the
        result!
	...in fact, for a Cartesian product
	of a relation with X rows and
	another with Y rows,
	there are X*Y rows in the result!

	(and if there are F attributes in
	a relation and G in another,
	there are F+G attributes in the result!)

*   equi-join, natural join
    *   you JOIN, or combine, relations
        based on some common attribute(s);

    *   SO, you can say,
        I would like the relation that results
	from JOINING relation A and relation B
	where some attribute in each has the
	SAME value

	^ that's an equi-join

    *   equi-join is CONCEPTUALLY:
        *   FIRST take the Cartesian product
	    of the two tables,
	*   THEN do a SELECTION of ONLY
	    those rows in which the
	    "common" attribute in A has
	    the SAME VALUE as the "common"
	    attribute in B

    *   and, natural join?
        *   do the above, and add an
	    extra projection, projecting
	    all by one of the "common"
	    attribute columns;