
Regular Meeting of the
Los Angeles Chapter of ACM

Wednesday, January 5, 2005

"XML: The Common Denominator"

Steve Rowell
Inferential Solutions

The popularity of XML (eXtensible Markup Language) is growing. XML has become the preferred method of transferring data and information across the Internet. It has come a long way from its roots as a subset of SGML (Standard Generalized Markup Language). Steve will survey XML technologies and explain why XML is a common denominator of most significant software strategies.

Definition, along with pros and cons of use and abuse, will be covered for XML, schemas, DOM/SAX, transformations (XSLT) and Web Services. XML provides interoperability across languages, operating systems, middleware infrastructures and tools. Steve will provide insight into why that interoperability will soon be expected for all modern software subsystems.

Steve Rowell is a software architect currently working at Boeing as a consultant. At Boeing he has architected and implemented ground server software, a payload simulator and various other projects using technologies such as Rational Rose, C++, Windows, XML, XSLT and DII COE.

Prior to Boeing, Steve was manager of software development for AST Research, where his charter covered device drivers, PC BIOS, diagnostics, and automated software installation. During much of this period AST was the number three PC manufacturer, behind IBM and Compaq. He has also developed an event-driven operating system for stress testing at Basic Four and a multitasking diagnostic that ran under Windows to increase the throughput of PC manufacturing.

~Summary~

LA ACM Chapter January Meeting
Held Wednesday, January 5, 2005

The presentation was "XML: The Common Denominator" by Steve Rowell of Inferential Solutions. Steve is currently working at Boeing as a consultant. This was a regular meeting of the Los Angeles Chapter of ACM. Steve presented a short version of the one-day PDS that he gave recently, using descriptive material from that presentation.

Extensible Markup Language (XML) is becoming a common denominator. There have been attempts to provide program interoperability, such as Java and C#, but these have not been very successful in being recognized as standards. XML is intended to provide data interoperability, and everyone seems to be embracing it.

XML describes, contains and orders (structures) data. You must define your own XML tags, unlike HTML, where the tags are already defined. XML is not a language but a metalanguage that is used to create other languages. XML uses a Document Type Definition (DTD) or an XML Schema to describe the data. XML and HTML also differ in purpose: HTML is used to display data, while XML carries data and complements HTML rather than replacing it.

XML has a number of advantages over HTML. XML can be used to separate the data from the presentation, so you can change the data or the display options independently. With XML, data can be exchanged between incompatible systems. Data can be stored and shared using plain text documents, which makes the data available to more users. XML has been used to make new languages (WML, SVG, SOAP, MathML, CML, and WAP).
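The separation of data from presentation can be sketched in a few lines. This is a minimal illustration using Python's standard library, not material from the talk; the `members` document, its element names and the two rendering functions are all invented for the example. The same XML data is rendered once as an HTML table and once as plain text, and either rendering can change without touching the data.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data document; the tag names are our own, as XML allows.
DATA = (
    "<members>"
    "<member><name>Ada</name><role>Chair</role></member>"
    "<member><name>Grace &amp; Co.</name><role>Treasurer</role></member>"
    "</members>"
)

def rows(xml_text):
    """Extract (name, role) pairs from the data, independent of any display."""
    root = ET.fromstring(xml_text)
    return [(m.findtext("name"), m.findtext("role")) for m in root.findall("member")]

def as_html(xml_text):
    """One presentation: an HTML table."""
    cells = "".join(
        f"<tr><td>{escape(n)}</td><td>{escape(r)}</td></tr>" for n, r in rows(xml_text)
    )
    return f"<table>{cells}</table>"

def as_text(xml_text):
    """Another presentation: plain text lines."""
    return "\n".join(f"{n}: {r}" for n, r in rows(xml_text))

print(as_html(DATA))
print(as_text(DATA))
```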

In the real world, IT systems and databases contain data in incompatible formats, and exchanging data between such systems is a time-consuming challenge. Converting the data to XML can greatly reduce this complexity and create data that can be read by many different types of applications, providing interoperability between diverse systems.

Why XML? Most new applications will exchange their information in XML. This can solve the problem of information being stored in proprietary formats and frees up the flow of data. HTML's simplicity has allowed the Internet to grow fast, but that simplicity now restricts it as a foundation for future progress. XML makes up for HTML's shortcomings. It allows you to make your information self-describing, giving it structure and content and making it more easily understandable to people. It increases the efficiency of Business to Business (B2B) applications in filling purchase orders, tracking sales information, and so on. Many mature vocabularies defined by DTDs or XML Schemas, along with processors for them, are available, such as SVG, SMIL, VoiceXML and CML. Big companies such as IBM and Microsoft are using XML to provide XML-based web services. Errors in XML syntax or in DTD/Schema validation flag bad data before it is committed to program execution. Stopping "garbage in" is much of the solution to the problem of "garbage out".

Steve presented simple examples of HTML and XML code. The first line of an XML document is a declaration which specifies the XML version and the character set used. A parser uses this to know which version of the recommendations the document follows. All XML documents must have a root element, and all other elements must be within the root element. All elements can have child elements, and child elements must be correctly nested in their parent element. Every element must have a closing tag; the declaration has none because it is not an element. XML tags are case sensitive. Every element other than the root has exactly one parent, and elements can be extended to add more information.

Elements can have simple content, element content or mixed content. Simple content is just text. An element with element content contains other elements. Mixed content is both simple content and element content. XML elements can also have attributes. An attribute is a name-value pair attached to the element's start tag. There is an equal sign between the attribute's name and its value. Values are enclosed in either single or double quotes.

There are XML rules that apply to both element and attribute names. XML names may contain any alphanumeric character and three punctuation marks: the period, the underscore and the hyphen. Names may not contain any white space and may not begin with the string xml. Names must start with a letter or an underscore, not with a number, period, or hyphen.
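The structural rules above can be seen in a small example. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the `order` document and all of its element and attribute names are invented for illustration and are not taken from Steve's slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the declaration comes first, then a single root element.
# Attribute values may use either single or double quotes.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<order id="1001" status='open'>
  <customer>Acme Corp</customer>
  <item sku="A-17"><name>Widget</name><qty>3</qty></item>
  <note>Ship <b>today</b> if possible</note>
</order>"""

root = ET.fromstring(DOC.encode("utf-8"))

print(root.tag)      # name of the root element
print(root.attrib)   # attributes as name-value pairs

# Child elements are visited in document order.
for child in root:
    print(child.tag, child.attrib)

customer = root.find("customer")   # simple content: just text
item = root.find("item")           # element content: only child elements
note = root.find("note")           # mixed content: text and child elements

print(customer.text)
print([c.tag for c in item])
# ElementTree stores mixed content as the parent's text, the child element,
# and the child's "tail" (the text that follows it).
print(note.text, note[0].text, note[0].tail)
```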

A well-formed XML document follows XML syntax. A valid document is a well-formed document that also conforms to the specifications of a Document Type Definition (DTD) or XML Schema. The W3C XML Schema is an alternative to the DTD and is also used for validation. The purpose of validation is to define the legal building blocks of an XML document. Any document that conforms to a Schema or DTD is an instance of it. A Schema is a formal description of what comprises a valid XML document. A Schema is to an XML document as a regular expression is to a string or as a grammar is to a language. DTDs provide basic validation of the following items in XML documents: element nesting, element occurrence, allowed attributes, and attribute types and default values. The Schema provides fine control over the format and data types of element and attribute values. The Schema standard provides simple and complex data types, type derivation and inheritance, and element occurrence constraints.
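The well-formedness half of this distinction is easy to demonstrate. The sketch below assumes Python's standard `xml.etree.ElementTree` parser, which checks XML syntax only; it does not validate against a DTD or XML Schema, which requires a validating parser and is not shown here. The helper name `is_well_formed` and the sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text follows XML syntax (well-formed), else False.

    This says nothing about validity against a DTD or XML Schema.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples, one per syntax rule.
samples = {
    "correctly nested": "<a><b>text</b></a>",
    "overlapping tags": "<a><b>text</a></b>",
    "missing closing tag": "<a><b>text</a>",
    "case mismatch in tags": "<Note>text</note>",
    "two root elements": "<a/><b/>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```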

To address weaknesses in the DTD, the W3C developed the XML Schema. The industry has developed other schema languages, each with its own strengths and weaknesses; examples are RELAX NG and Schematron. The lack of strong data types in XML was the motivation for developing the schemas. About half of the specification is devoted to data types. XML Schema data types are defined in XML Schema Part 2: Datatypes, which became a W3C recommendation in May 2001.

XSL (Extensible Stylesheet Language) is divided into two parts: XSL Transformations (XSLT) and XSL Formatting Objects (XSL-FO). There is not much industry support for XSL-FO. IE 5.0 and 5.5 only support an old working draft of XSLT. IE 6.0 supports newer versions.

The DOM (Document Object Model) Level 2 Core specification was released on November 13, 2000. Steve showed a sample DOM and a diagram of that DOM viewed as a tree. The DOM is an object model in the traditional object-oriented design sense. Documents are modeled using objects, and the model encompasses not only the structure of the document but also the behavior of a document and the objects of which it is composed. As an object model, the DOM identifies the interfaces and objects used to represent and manipulate a document, the semantics of these interfaces and objects, including both behavior and attributes, and the relationships and collaborations among these interfaces and objects. The DOM is language and operating system neutral. It uses IDL (Interface Definition Language) from the OMG (Object Management Group) to define its interfaces, which can have multiple language bindings. The DOM requires loading the entire XML document into memory, which can be difficult for large documents; the in-memory DOM can be ten times the size of the text file. It has the advantage of providing easy random access to all parts of the document, and it is one of the most mature and best supported technologies available.
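As a rough illustration of the DOM style of processing, the sketch below uses `xml.dom.minidom`, the lightweight DOM implementation in Python's standard library. The `library` document and its names are hypothetical. The whole document is parsed into a tree first, after which any node can be reached or changed directly.

```python
from xml.dom import minidom

# Hypothetical document used only for this example.
DOC = (
    "<library>"
    "<book id='b1'><title>XML Basics</title></book>"
    "<book id='b2'><title>DOM and SAX</title></book>"
    "</library>"
)

dom = minidom.parseString(DOC)   # the entire document is loaded into memory
root = dom.documentElement

# Random access: go straight to the second book without reading the first.
books = root.getElementsByTagName("book")
second = books[1]
title = second.getElementsByTagName("title")[0].firstChild.data
print(title)

# The tree can be modified in place through the DOM interfaces.
second.setAttribute("checked_out", "yes")
new_book = dom.createElement("book")
new_book.setAttribute("id", "b3")
root.appendChild(new_book)

print(root.toxml())
```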

SAX (Simple API for XML) is event driven. The parser reads an XML document and passes its parts to your program in order as they are read, so the document is presented one piece at a time from top to bottom. Data can be saved, processed or discarded as it is received. SAX has an advantage over DOM because the entire document does not need to be in memory.
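The event-driven model can be sketched with the `xml.sax` module from Python's standard library. The `library` document and the `TitleCollector` handler are invented for this example. The parser calls the handler's methods as it encounters each start tag, run of text and end tag; the handler keeps only the titles and discards everything else.

```python
import xml.sax

# Hypothetical document used only for this example.
DOC = (
    "<library>"
    "<book id='b1'><title>XML Basics</title></book>"
    "<book id='b2'><title>DOM and SAX</title></book>"
    "</library>"
)

class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element as events arrive."""

    def __init__(self):
        super().__init__()
        self.events = []      # record of start/end events, in document order
        self.titles = []      # the data we choose to keep
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        self.events.append(("end", name))
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

handler = TitleCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)

print(handler.titles)
print(handler.events[:3])
```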

Web Services have become important. A Web Service is programmable application logic that is accessed using standard Internet protocols. Web Services offer functionality that is easily reused without knowing how the service is implemented. They have emerged as a powerful tool for integrating disparate IT systems and assets. IBM and Microsoft have championed Web Services but disagree on implementation details. Web Services communicate using platform-independent and language-neutral protocols that ease the integration of heterogeneous environments. A Web Service provides an interface that can be called from another program and can be invoked from any type of application client or from another Web Service. The interface acts as a liaison between the Web and the actual application logic that implements the service.

There are a number of Web Service technologies. The Simple Object Access Protocol (SOAP) defines a standard communication protocol. The Web Services Description Language (WSDL) defines a standard mechanism to use a Web Service. The Universal Description, Discovery and Integration (UDDI) provides a standard mechanism to register and discover Web Services.
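To give a feel for what a SOAP message looks like, here is a minimal sketch that builds and reads a SOAP 1.1 request envelope with Python's standard `xml.etree.ElementTree`. Only the envelope namespace is standard; the service namespace `http://example.com/stock`, the `GetQuote` operation and the `Symbol` parameter are hypothetical. No network call is made, and WSDL and UDDI are not shown.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.com/stock"                    # hypothetical service namespace

def build_request(symbol: str) -> str:
    """Build a SOAP envelope whose body carries one hypothetical GetQuote call."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetQuote")
    ET.SubElement(call, f"{{{APP_NS}}}Symbol").text = symbol
    return ET.tostring(envelope, encoding="unicode")

def read_request(message: str) -> tuple:
    """Return (operation tag, symbol) from a message built by build_request."""
    envelope = ET.fromstring(message)
    call = envelope.find(f"{{{SOAP_NS}}}Body")[0]
    return call.tag, call.findtext(f"{{{APP_NS}}}Symbol")

message = build_request("IBM")
print(message)
print(read_request(message))
```

Because the message is plain XML, any client or server that can parse XML can produce or consume it, whatever language or operating system it runs on.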

Steve Rowell presented an excellent overview of his well-received, more detailed XML PDS presentation, which took an entire day to present. His talk showed what XML is, what its advantages are, and why you should want to learn about it and use it. This DATA-LINK article contains a considerable portion of the slides he presented, but does not have the insights and explanations so well provided by Steve.

This was the fifth meeting of the LA Chapter year and was attended by about 15 persons.
Mike Walsh, LA ACM Secretary 

The next chapter meeting will be on Jan. 5th. Watch this space in December for program information.
Come join us!


The Los Angeles Chapter normally meets the first Wednesday of each month at the LAX Plaza Hotel, 6333 Bristol Parkway, Culver City. The program begins at 8 PM. From the San Diego Freeway (405) take the Sepulveda/Centinela exit southbound or the Slauson/Sepulveda exit northbound.

5:15 p.m.  Business Meeting

6:30 p.m.  Cocktails & Social

7:00 p.m.  Dinner

The menu choices are listed in the table above.

Avoid a $3 surcharge!!
Reservations must be made by the Sunday preceding the meeting to avoid the surcharge.

Make your reservations early.

8:00 p.m.  Presentation

 
Reservations

To make a reservation, call or e-mail Matt Reese, (626)794-5626, and give your name and telephone number, by the Sunday before the dinner meeting.

There is no charge or reservation required to attend the presentation at 8:00 p.m. Parking is FREE!

For membership information, contact Mike Walsh, (818)785-5056 or follow this link.

Other Affiliated groups

SIGAda   SIGCHI SIGGRAPH  SIGPLAN

LA SIGAda


LA  SIGGRAPH

Please visit our website for meeting dates, and news of upcoming events.

For further details contact the SIGPHONE at (310) 288-1148 or at Los_Angeles_Chapter@siggraph.org, or www.siggraph.org/chapters/los_angeles



 Last revision: 2005 0111 [Webmaster]