CS 325 - Week 4 Lecture 1 - 2016-09-13
* now: for MORE on Codd's relational model;
* and relational algebra/relational operations
* and a relational DBMS - an RDBMS --
enforces this abstraction that a database
is a set of relations
* in an RDBMS,
a relational database is a set of relations;
* what is a relation?
* formal:
(Ulmann, 2nd ed. p. 19)
"a subset of the Cartesian product of a list
of domains"
* let's dig into this a little bit:
* consider our relation structure form:
Employee(EMPL_ID, empl_lname, empl_str_addr,
empl_salary)
* this could also be written as a
relation scheme as:
Employee = (EMPL_ID, empl_lname, empl_str_addr,
empl_salary)
* each thing in the parentheses above
is an ATTRIBUTE of that relation;
we have a relation Employee
with the attributes empl_id, empl_lname,
empl_str_addr, empl_salary
* each attribute name A is associated with
a domain, dom(A), a SET of values, which
often includes the special value null
(null means LACK of a value)
...the set of values it is REASONABLE
for that attribute to have;
* dom(empl_salary) might be a positive
number,
dom(empl_id) might be the set of 4-digit
values between 0000 and 9999,
dom(empl_str_addr) might be valid
street addresses in Arcata
* given a relation scheme
R = (A1, A2, ... An)
a relation r on scheme R is defined
as any finite subset of the Cartesion product
dom(A1) x dom(A2) x ... x dom(An)
so...for relation scheme Employee,
a relation under this relation scheme could be
any SUBSET of COMBINATIONS of
(element from dom(EMPL_ID),
element from dom(empl_lname),
element from dom(empl_str_addr),
element from dom(empl_salary))
{ (1111, 'Jones', '111 Ash Str', 500),
(2222, 'Smith', '123 Elm Str', 505),
...
(9999, 'Nguyen', '333 Third Str', 503) }
* see how this is looking somewhat tabular-ish?
the above is a relation, or a relational table,
(and yes, we'll often call it a table --
BUT it needs to WORK with the above definition!)
* one element in this set,
is called a TUPLE (from the relational algebra end)
is called a ROW (from the tabular terminology end)
and is even sometimes called a RECORD
(from the old-mainframe-world end)
either way -- notice that a relation
is essentially a set of ROWS...
(not a set of cells <- in the spreadsheet sense)
* note that the ATTRIBUTE (from the relational alg end)
is called a COLUMN (from the tab terminology end)
and is even sometimes called a FIELD
(from the old-mainframe-world end)
* hopefully each relation corresponds to some
set of significant "things" in our scenario
(or some set of portions of significant things)...
* hopefully each attribute is a significant characteristic
of that significant thing or thing-portion;
* each attribute has a domain, a set of
VALID values (which may include the null value),
and a domain may have a physical definition
(for example, number(3) or varchar2(27) -- a type,
for example)
AND we also can talk about a semantic definition
of an attribute -- dom(empl_str_addr) being
the set of actual street addresses in Arcata...
(yes, there is often a gap between the semantic
definition of an attribute and what the RDBMS can
actually support...)
* SO -- consider plain old tables -- I've said
not ALL tabular-like-things are relations;
what kind of restrictions "flow" from the relational
algebra definition?
* a set doesn't have a concept of duplicates --
an element is a member of the set, or it is not;
THUS, since a relation is a set of tuples,
a relation cannot have duplicate tuples
(no duplicate rows!)
* the relation definition say it is finite of
the Cartesian product dom(A1) x dom(A2) ...
...these tuples cannot have MULTIPLE values
for a given attribute!
SO: a true relation does NOT have more
than one value in a single "cell";
* order of the elements in a set is not
significant in set theory --
so the order of the tuples in a relation
is not significant
* each attribute must have a unique name
BUT the order of attributes is not significant
either
* all of the values for a given attribute
must be from that attribute's domain
(which might include null, remember)
* BUT -- we need another piece before getting
to our relational operations --
functional dependency
* a relationship between attributes
if, given the value of attribute A,
you can UNIQUELY determine the value of
another attribute B (no matter what)
...based on the "RULES" of that scenario
we say that B is functionally dependent on A;
we say that A -> B
A functionally determines B
A determines B
SO -- empl_id -> empl_salary
BUT, in a world where salaries are based
on many factors, and many employees
may happen to have the same salary,
it is NOT the case that empl_salary -> empl_id
IF your Employee relation scheme also
included job_title, and it turned out that
salaries were based on job_title alone,
then you COULD say that
job_title -> empl_salary
* one text argues that the storage and retrieval of
functional dependencies is a MAIN reason for
even having a database!
* note: a functional dependency CAN involve
SETS of attributes --
(student_id, class_crn) -> class_grade
...this does NOT imply that:
student_id -> class_grade
or that:
class_crn -> class_grade --
...it takes the PAIR to uniquely
determine the class grade;
* BUT:
empl_id -> (empl_lname, empl_str_addr, empl_salary)
and this DOES imply that
empl_id -> empl_lname
empl_id -> empl_str_addr
empl_id -> empl_salary
empl_id -> (empl_lname, empl_str_addr)
...