* XML - eXtensible Markup Language
(then: JSON - JavaScript Object Notation)
* XML - a metalanguage designed to describe data
and to focus on what data is;
* XML can be used to describe data
so that reasonably self-describing data can
be transferred BETWEEN applications
(with databases on the outskirts...)
* XML *looks* like a markup language,
like HTML --
it describes a SYNTAX for making
customized markup languages, essentially;
* XML tags are NOT predefined;
* BUT when a group (informally or formally)
decides on a set of XML tags and rules
for their domain,
they can be FORMALIZED in a DTD - Document
Type Definition or an XML Schema,
and then applications can
USE the DTD or XML Schema
to help them validate data, use it more easily,
etc.
* some terminology:
* XML with correct syntax is called well-formed XML
* XML validated against a DTD or XML Schema
is called valid XML
* A well-formed XML document must follow a standard
structure:
* it begins with an XML prologue
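(says: this is an XML document, and this is
the version and character encoding in which it is written;
strictly, XML 1.0 makes this declaration optional,
but it is good practice to include it)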
<?xml version="1.0" encoding="ISO-8859-1" ?>
* IF the XML document is USING a DTD,
you'll include a DOCTYPE declaration for it
(an XML Schema is instead typically referenced
from attributes on the root element):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
* FOLLOWING that 1- or 2-piece XML prologue,
you have a root element:
a single, outermost element enclosing ALL of the other
content of the document
* ALL well-formed XML documents MUST have
a single root element (containing everything
else)
* XHTML is an XML implementation of HTML --
an XHTML document has root element html
...XML prologue...
<html>
....
</html>
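The prologue and root element described above can be inspected from code. This is a minimal sketch, assuming Python's standard xml.dom.minidom module; the sample document and the helper name describe_prolog are made up for illustration. Note that minidom does NOT fetch the DTD or validate against it; it only records what the DOCTYPE says.

```python
from xml.dom import minidom

# Hypothetical sample document. Bytes are used so the parser reads the
# encoding from the XML declaration itself.
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html>
  <head><title>moo</title></head>
  <body><p>hello</p></body>
</html>
"""


def describe_prolog(xml_bytes):
    """Return the prologue details and the root element name of a document."""
    doc = minidom.parseString(xml_bytes)
    doctype = doc.doctype  # None when the document has no DOCTYPE
    return {
        "version": doc.version,
        "encoding": doc.encoding,
        "doctype_name": doctype.name if doctype else None,
        "public_id": doctype.publicId if doctype else None,
        "system_id": doctype.systemId if doctype else None,
        "root": doc.documentElement.tagName,
    }


info = describe_prolog(SAMPLE)
```

For an XHTML document like this one, the name in the DOCTYPE and the root element name are both html.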
* all XML elements MUST have a start tag
and an end tag (although a no-content
element CAN combine these into a single empty-element tag):
<thingy ... />
* XML element names are case-sensitive
(<Moo> and <moo> are different elements)
* all XML elements must be properly
nested
<strong><em> moo </strong></em> // NOT well-formed XML!
* XML elements MAY have attributes in their
start tags,
BUT all attributes must have values
and those values must be in quotes
(and there's an equal sign...)
e.g.,
<programmer level="fabulous">Grace Hopper</programmer>
* XML comments: <!-- comment -->
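One way to see the well-formedness rules above in action is to hand small snippets to a parser and see which ones it rejects. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample snippets are made up for illustration. It checks well-formedness ONLY, not validity against a DTD or XML Schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Each hypothetical snippet illustrates one of the rules in these notes.
samples = {
    "single root": "<a><b/></a>",
    "two roots": "<a/><b/>",
    "empty-element tag": "<thingy size='3' />",
    "missing end tag": "<p>moo",
    "case mismatch": "<Moo>text</moo>",
    "bad nesting": "<strong><em> moo </strong></em>",
    "unquoted attribute": "<programmer level=fabulous>Grace Hopper</programmer>",
    "attribute without value": "<option selected>x</option>",
    "quoted attribute": '<programmer level="fabulous">Grace Hopper</programmer>',
}

results = {name: is_well_formed(text) for name, text in samples.items()}

for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'NOT well-formed'}")
```

Only the "single root", "empty-element tag", and "quoted attribute" snippets are accepted; every other snippet breaks one of the rules and is rejected by the parser.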
* we DO use tree-terminology talking about XML!
root is the root element of a tree;
children of an element are the top-level elements
nested within;
siblings have the same parent;
every element except the root HAS exactly one parent
(the root element has none)
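The tree terminology maps directly onto code. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the sample document (including the second programmer and the project element) and the helper names are made up for illustration. ElementTree elements do not remember their parent, so the sketch builds a child-to-parent lookup first.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for illustration.
DOC = """<team>
  <programmer level="fabulous">Grace Hopper</programmer>
  <programmer level="pioneering">Ada Lovelace</programmer>
  <project><name>COBOL</name></project>
</team>"""

root = ET.fromstring(DOC)

# Map every non-root element to its (single) parent.
parent_of = {child: parent for parent in root.iter() for child in parent}


def children(element):
    """Top-level elements nested directly within element."""
    return list(element)


def siblings(element):
    """Other elements that share element's parent (none for the root)."""
    parent = parent_of.get(element)
    if parent is None:
        return []
    return [other for other in parent if other is not element]


first = root[0]                   # the first <programmer>
name = root.find("project/name")  # a grandchild of the root

child_tags = [child.tag for child in children(root)]
sibling_tags = [sib.tag for sib in siblings(first)]
```

Here the root never appears as a key in parent_of, which matches the idea that the root is the one element with no parent.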
* elements can have different content types
* an element can have:
* element content - an element contains
(just) element(s) as content
* mixed content - contains both text
and other element(s)
* simple content - contains just text
as content
* empty content - has no content
...and an element can have attributes
(attributes aren't content!)
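The four content types can be told apart by looking at an element's child elements and text. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name content_type is made up, and treating whitespace-only text as "no text" is a simplifying assumption of this sketch (a DTD or XML Schema is what formally declares an element's content type).

```python
import xml.etree.ElementTree as ET


def content_type(element):
    """Classify element as 'element', 'mixed', 'simple', or 'empty' content."""
    has_children = len(element) > 0
    # Text can appear before the first child (.text) or after any child (.tail).
    pieces = [element.text] + [child.tail for child in element]
    # Assumption: whitespace-only text (e.g. indentation) does not count.
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


# Hypothetical snippets, one per content type.
examples = {
    "element": ET.fromstring("<team><programmer/><project/></team>"),
    "mixed": ET.fromstring("<p>Hello <em>there</em> world</p>"),
    "simple": ET.fromstring("<programmer>Grace Hopper</programmer>"),
    "empty": ET.fromstring('<thingy size="3" />'),
}

classified = {label: content_type(el) for label, el in examples.items()}
```

The empty example carries an attribute but is still classified as empty, since attributes aren't content.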