CS 325 - Week 3 Lecture 1 - 2016-09-06
* more formal definition of database:
a self-describing collection of integrated records
* self-describing: contains data ABOUT the
data, as well as the desired data to be stored;
this information about the structure of the
data is called METADATA
(sometimes metadata is stored/called a
data dictionary or data directory)
* why bother? why is this considered useful?
* because it can help with the
goal of promoting increased
application program independence
* collection of integrated records
* you have the source data,
you ALSO have some description
of RELATIONSHIPS among the data records
* (some for integration,
some for performance/efficiency)
* note that a database can be thought of as
a model of a model
* ideally, the database is a model that
works with the USER's model of their
scenario --
the database isn't a model of reality
* the degree of detail included in a
database should be based on the users
needs/desired in that scenario,
leavened by ethical and/or legal
concerns;
* a little MORE history...
* early database systems tended to organizational
level, and the applications being supported
were often organization-wide,
TRANSACTION-processing systems
* transaction - a representation of
a single event (that may actually
consist of multiple steps)
for example: a sale,
a withdrawal, a deposit
* early database processing systems
focused on supporting regular
transactions
...these got VERY good at dealing
with regular, predictable transactions
NOT very flexible,
NOT very good for ad-hoc "new" uses
of the data,
and application programs using them
tended to need to be written
in procedural languages such
as COBOL and PL/I
* database models used here:
hierarchical and network database models
* 1970
E. F. Codd
he developed the relational model based
on relational algebra
in a relational database
you consider data to be stored in the form
of relations, which are tabular and meet
certain criteria;
* and because these are relations,
you can perform relational algebra operations
ON these relations,
which allow for very flexible use of
and querying of that data;
(relational DBMS, or RDBMS,
maintains this abstraction that the
data is in the form of these relations)
* it is also easier to think about
data organized as relations than about
data organized using the older
hierarchical or network models;
* it was initially - and for a while -
thought this could be never be
practical --
thanks to Moore's Law and hardware
and memory development,
(and some smooth algorithms under the
hood!)
...it became practical enough for many
purposes, after all
* and don't forget the microcomputer revolution
(starting late 70s, moving into the 80s and
beyond)
...combo of this, the relational model,
improved and cheaper hardware/memory and
improved user interfaces enable database
technology to move from the organizational
context to a personal-computing context
(and all the levels in between...)
* now add client-server, and local-area
networks and wide-area networks can get
involved;
* (and development in the area of databases
continues -- object databases,
distributed databases, and more...)
* FOUR main elements of a database:
* user data
* in a relational database, these
conceptually are tables
* for now, consider a relation to
be a tabular collection with
rows and columns
* metadata
* indexes to improve performance and data access
* application metadata
* consider user data for a moment --
* even with the relational model,
there can be more than one view of
a relation;
...and more than one way to depict a
relation;
* one way: tabular form
Student relation
stu_lname stu_fname stu_phone advisor_lname
---------- --------- --------- --------------
Jones Jane 123-4567 Smith
Nguyen Anh 234-5678 Silva
Garze Juan 345-6789 Schmidt
* straightforward
* can see contents
* can see basic relation structure
* can't directly tell the primary
key, or domain info (what can be
in each column), or how
different tables are related
(might be able to guess...)
* can be a pain to type
* what if you don't need to see the current
contents of a relation, but just its
structure?
relation structure form
relation(PRIMARY_KEY_ATT, PRIMARY_KEY_ATT, ...
another_att, another_att, ...)
Student(STU_LNAME, STU_FNAME, stu_phone,
advisor_lname)
* in CS 325, we will supplement the
above with SQL-style foreign
key clauses to describe any foreign
keys
[NOTE!!!! THESE ARE BADDDDDDLY-DESIGNED TABLES!
DON'T USE PEOPLE NAMES AS PRIMARY KEYS!]
Student(STU_LNAME, STU_FNAME, stu_phone,
advisor_lname)
foreign key(advisor_lname) references Advisor
Advisor(ADVISOR_LNAME, advisor_email,
advisor_phone)
* can explicitly see which attributes
make up the primary key and, in
our variant, which make up any
foreign keys as well
* cannot see relation contents in this
form
* easier to type
* still cannot tell domain info (what
can be in each column)
* create table statement form -
describe the relation's structure using
a SQL create table statement!
* includes column domain info!
* is executable!
* not as convenient to type as relation
structure form, but easier to type than
tabular form
* also does not include relation contents