CS 325 - Week 3 Lecture 1 - 2016-09-06

*   more formal definition of database:
    a self-describing collection of integrated records

    *   self-describing: contains data ABOUT the
        data, as well as the desired data to be stored;

	this information about the structure of the
	data is called METADATA

	(sometimes metadata is stored/called a
	data dictionary or data directory)

    *   why bother? why is this considered useful?
        *   because it can help with the
	    goal of promoting increased
	    application program independence

    *   collection of integrated records
        *   you have the source data,
            you ALSO have some description
	    of RELATIONSHIPS among the data records

        *   (some for integration,
            some for performance/efficiency)

*   note that a database can be thought of as
    a model of a model
    *   ideally, the database is a model that
        works with the USER's model of their
	scenario --

	the database isn't a model of reality

    *   the degree of detail included in a
        database should be based on the users
	needs/desired in that scenario,
	leavened by ethical and/or legal
	concerns;

*   a little MORE history...
    *   early database systems tended to organizational
        level, and the applications being supported
        were often organization-wide,
	TRANSACTION-processing systems

	*   transaction - a representation of
	    a single event (that may actually
	    consist of multiple steps)

	    for example: a sale, 
	    a withdrawal, a deposit

        *   early database processing systems
	    focused on supporting regular
	    transactions

            ...these got VERY good at dealing
	    with regular, predictable transactions

	    NOT very flexible,
	    NOT very good for ad-hoc "new" uses
	    of the data,
	    and application programs using them
	       tended to need to be written
	       in procedural languages such
	       as COBOL and PL/I

     *   database models used here:
         hierarchical and network database models

*   1970 
    E. F. Codd
    he developed the relational model based
    on relational algebra

    in a relational database
    you consider data to be stored in the form
    of relations, which are tabular and meet
    certain criteria;

    *   and because these are relations,
        you can perform relational algebra operations
	ON these relations,

	which allow for very flexible use of
	and querying of that data;

        (relational DBMS, or RDBMS,
	maintains this abstraction that the
	data is in the form of these relations)

   *   it is also easier to think about
       data organized as relations than about
       data organized using the older
       hierarchical or network models;

   *   it was initially - and for a while -
       thought this could be never be
       practical --
       thanks to Moore's Law and hardware
       and memory development,
       (and some smooth algorithms under the
       hood!)

       ...it became practical enough for many
       purposes, after all

*   and don't forget the microcomputer revolution
    (starting late 70s, moving into the 80s and
    beyond)
    ...combo of this, the relational model,
    improved and cheaper hardware/memory and
    improved user interfaces enable database
    technology to move from the organizational
    context to a personal-computing context
    (and all the levels in between...)

    *   now add client-server, and local-area
        networks and wide-area networks can get
	involved;

    *   (and development in the area of databases
        continues -- object databases,
	distributed databases, and more...)

*   FOUR main elements of a database:
    *   user data
        *   in a relational database, these
	    conceptually are tables

        *   for now, consider a relation to
	    be a tabular collection with
	    rows and columns

    *   metadata
    *   indexes to improve performance and data access
    *   application metadata
  
*   consider user data for a moment --

    *   even with the relational model,
        there can be more than one view of
	a relation;

        ...and more than one way to depict a
	relation;

    *   one way: tabular form

        Student relation

        stu_lname    stu_fname stu_phone   advisor_lname
        ----------   --------- ---------   --------------
        Jones        Jane      123-4567    Smith
        Nguyen       Anh       234-5678    Silva
        Garze        Juan      345-6789    Schmidt

        *   straightforward
        *   can see contents
        *   can see basic relation structure
        *   can't directly tell the primary
            key, or domain info (what can be
 	    in each column), or how
	    different tables are related

	    (might be able to guess...)

        *   can be a pain to type

    *   what if you don't need to see the current
        contents of a relation, but just its
        structure?

        relation structure form

        relation(PRIMARY_KEY_ATT, PRIMARY_KEY_ATT, ...
            another_att, another_att, ...)

        Student(STU_LNAME, STU_FNAME, stu_phone, 
                advisor_lname)

        *   in CS 325, we will supplement the
            above with SQL-style foreign
	    key clauses to describe any foreign
	    keys

        [NOTE!!!! THESE ARE BADDDDDDLY-DESIGNED TABLES!
        DON'T USE PEOPLE NAMES AS PRIMARY KEYS!]

        Student(STU_LNAME, STU_FNAME, stu_phone, 
                advisor_lname)
        foreign key(advisor_lname) references Advisor

        Advisor(ADVISOR_LNAME, advisor_email,
               advisor_phone)

        *   can explicitly see which attributes
	    make up the primary key and, in
	    our variant, which make up any
	    foreign keys as well
	*   cannot see relation contents in this
	    form
	*   easier to type
	*   still cannot tell domain info (what
	    can be in each column)
    
*   create table statement form -
    describe the relation's structure using
    a SQL create table statement!
    *   includes column domain info!
    *   is executable!
    *   not as convenient to type as relation
        structure form, but easier to type than
	tabular form
    *   also does not include relation contents