Amazon.co.uk Review
Author Erik T Ray begins with an excellent summary of XML's history as an outgrowth of SGML and HTML. He outlines very clearly the elements of markup, demystifying concepts such as attributes, entities and namespaces with numerous clear examples. To illustrate a real-world XML application, he gives the reader a look at a document written in DocBook--a publicly available XML document type for publishing technical writings--and explains the sections of the document step by step. A simplified version of DocBook is used later in the book to illustrate transformation--a powerful benefit of XML.
The all-important Document Type Definition (DTD) is covered in depth, but the still-unofficial alternative--XML Schema--is only briefly addressed. The author makes liberal use of graphical illustrations, tables and code to demonstrate concepts along the way, keeping the reader engaged and on track. Ray also gets into a deep discussion of programming XML utilities with Perl.
Learning XML is a highly readable introduction to XML for readers with existing knowledge of markup and Web technologies, and it meets its goals very well--to deliver a broad perspective of XML and its potential. --Stephen W Plain
Review
James Kalback, Journal of the American Society for Information Science and Technology, Oct 2001
Product Description
The arrival of support for XML--the Extensible Markup Language--in browsers and authoring tools has followed a long period of intense hype. Major databases, authoring tools (including Microsoft's Office 2000), and browsers are committed to XML support. Many content creators and programmers for the Web and other media are left wondering, "What can XML and its associated standards really do for me?" Getting the most from XML requires being able to tag and transform XML documents so they can be processed by web browsers, databases, mobile phones, printers, XML processors, voice response systems, and LDAP directories, just to name a few targets.
In Learning XML, the author explains XML and its capabilities succinctly and professionally, with references to real-life projects and other cogent examples. Learning XML shows the purpose of XML markup itself, the CSS and XSL styling languages, and the XLink and XPointer specifications for creating rich link structures.
The basic advantages of XML over HTML are that XML lets a web designer define tags that are meaningful for the particular documents or database output to be used, and that it enforces an unambiguous structure that supports error-checking. XML supports enhanced styling and linking standards (allowing, for instance, simultaneous linking to the same document in multiple languages) and a range of new applications.
For writers producing XML documents, this book demystifies files and the process of creating them with the appropriate structure and format. Designers will learn what parts of XML are most helpful to their team and will get started on creating Document Type Definitions. For programmers, the book makes syntax and structures clear. It also discusses the stylesheets needed for viewing documents in the next generation of browsers, databases, and other devices.
Excerpted from Learning XML by Erik T. Ray. Copyright © 2003. Reprinted by permission. All rights reserved.
There's a Far Side cartoon by Gary Larson about an unusual chicken ranch. Instead of strutting around, pecking at seed, the chickens are all lying on the ground or draped over fences as if they were made of rubber. You see, it was a boneless chicken ranch.
Just as skeletons give us vertebrates shape and structure, markup does the same for text. Take out the markup and you have a mess of character data without any form. It would be very difficult to write a computer program that did anything useful with that content. Software relies on markup to label and delineate pieces of data, the way suitcases make it easy for you to carry clothes with you on a trip.
This chapter focuses on the details of XML markup. Here I will describe the fundamental building blocks of all XML-derived languages: elements, attributes, entities, processing instructions, and more. And I'll show you how they all fit together to make a well-formed XML document. Mastering these concepts is essential to understanding every other topic in the book, so read this chapter carefully.
All of the markup rules for XML are laid out in the W3C's technical recommendation for XML version 1.0. This is the second edition of the original, which first appeared in 1998. You may also find Tim Bray's annotated, interactive version useful.
Tags
If XML markup is a structural skeleton for a document, then tags are the bones. They mark the boundaries of elements, allow insertion of comments and special instructions, and declare settings for the parsing environment. A parser, the front line of any program that processes XML, relies on tags to help it break down documents into discrete XML objects. There are a handful of different XML object types.
Elements are the most common XML object type. They break up the document into smaller and smaller cells, nesting inside one another like boxes. Figure 2-1 shows the document in Chapter 1 partitioned into separate elements. Each of these pieces has its own properties and role in a document, so we want to divide them up for separate processing.
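The nesting of elements can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration: the memo document and its element names are invented here and are not the Chapter 1 example, and outline is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are invented for illustration.
SAMPLE = """<memo>
  <to>Sarah</to>
  <body>
    <para>Meet at <time>noon</time>.</para>
  </body>
</memo>"""

def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(SAMPLE)
print("\n".join(outline(root)))
```

Each level of indentation in the printed outline corresponds to one box nested inside another.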
Inside element start tags, you sometimes will see some extra characters next to the element name in the form of name="value". These are attributes. They associate information with an element that may be inappropriate to include as character data. In the telegram example earlier, look for an attribute in the start tag of the telegram element.
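As a rough sketch of how a program sees attributes, the snippet below reads one from a start tag with Python's standard xml.etree.ElementTree module. The telegram shown is a stand-in written for this illustration, and the attribute name pri and its value are assumptions rather than the book's actual example.

```python
import xml.etree.ElementTree as ET

# Hypothetical telegram; the attribute "pri" and its value are assumptions.
SAMPLE = (
    '<telegram pri="important">'
    "<to>Sarah</to>"
    "<message>Arriving Tuesday</message>"
    "</telegram>"
)

telegram = ET.fromstring(SAMPLE)

# Attributes are kept apart from character data, as a name-to-value mapping.
print(telegram.attrib)
print(telegram.get("pri"))
print(telegram.get("missing", "no such attribute"))  # default when absent

# Character data lives in the child elements, not in the attributes.
print(telegram.findtext("message"))
```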
Declarations are never seen inside elements, but may appear at the top of the document or in an external document type definition file. They are important in setting parameters for the parsing session. They define rules for validation or declare special entities to stand in for text.
The next three objects are used to alter parser behavior while it's going over the document. Processing instructions are software-specific directives embedded in the markup for convenience (e.g., storing page numbers for a particular formatter). Comments are regions of text that the parser should strip out before processing, as they only have meaning to the author. CDATA sections are special regions in which the parser should temporarily suspend its tag recognition.
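To make these three objects concrete, here is a minimal sketch using Python's standard xml.dom.minidom module. The sample document, the formatter target, and its page-break instruction are all invented for illustration. Note that minidom keeps comments in the tree so a program can choose to skip them, whereas other parsers may discard them outright.

```python
from xml.dom import minidom, Node

# Hypothetical document: one processing instruction, one comment, and one
# element whose content is wrapped in a CDATA section.
SAMPLE = (
    "<doc>"
    "<?formatter page-break?>"
    "<!-- note to self -->"
    "<code><![CDATA[if (a < b && b < c) { }]]></code>"
    "</doc>"
)

dom = minidom.parseString(SAMPLE)
pi, comment, code = dom.documentElement.childNodes

# The processing instruction has a target (which software) and data (what to do).
print(pi.nodeType == Node.PROCESSING_INSTRUCTION_NODE, pi.target, pi.data)

# The comment is its own node type, so it is easy to ignore.
print(comment.nodeType == Node.COMMENT_NODE, repr(comment.data))

# Inside the CDATA section, "<" and "&" were read as plain characters.
cdata = code.firstChild
print(cdata.nodeType == Node.CDATA_SECTION_NODE, cdata.data)
```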
Rounding out the list are entity references, commands that tell the parser to insert predefined pieces of text in the markup. These objects don't follow the pattern of other tags in their appearance. Instead of angle brackets for delimiters, they use the ampersand and semicolon.
In upcoming sections, I'll explain each of these objects in more detail.
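The following sketch, again in Python with the standard xml.etree.ElementTree module, shows a parser replacing entity references with text. The entities amp and lt are predefined by XML itself. The entity pub and its replacement text are assumptions made up for this example, declared at the top of the document in the way described for declarations above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document. "pub" is an entity declared in the internal subset;
# "amp" and "lt" are built into XML and need no declaration.
SAMPLE = """<!DOCTYPE note [
  <!ENTITY pub "Example Press">
]>
<note>Fish &amp; chips cost &lt; 5 pounds at &pub;.</note>"""

note = ET.fromstring(SAMPLE)

# By the time the program sees the text, every reference has been expanded.
print(note.text)  # Fish & chips cost < 5 pounds at Example Press.
```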
Documents
An XML document is a special construct designed to archive data in a way that is most convenient for parsers. It has nothing to do with our traditional concept of documents, like the Magna Carta or Time magazine, although those texts could be stored as XML documents. It simply is a way of describing a piece of XML as being whole and intact for parsing.
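One way to picture "whole and intact for parsing" is to hand a few strings to a parser and see which ones it accepts. The sketch below does that with Python's standard xml.etree.ElementTree module. The helper name is_well_formed is hypothetical, and the check covers well-formedness only, not validation against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<a><b>ok</b></a>"))       # True: properly nested
print(is_well_formed("<a><b>overlap</a></b>"))  # False: tags overlap
print(is_well_formed("<a/><b/>"))               # False: two root elements
```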
It's important to think of the document as a logical entity rather than a physical one. In other words, don't assume that a document will be contained within a single file on a computer. Quite often, a document may be spread out across many files, and some of these may live on different systems. All that is required is that the XML parser reading the document has the ability to assemble the pieces into a coherent whole. Later, we will talk about mechanisms used in XML for linking discrete physical entities into a complete logical unit.