I have to do an overview of XML for my manager/client/investor/advisor. What should I mention?
* XML is not a markup language. XML is a ‘metalanguage’, that is, it's a language that lets you define your own markup languages (see definition).
* XML is a markup language [two (seemingly) contradictory statements one after another is an attention-getting device that I'm fond of], not a programming language. XML is data: it does not ‘do’ anything, it has things done to it.
* XML is non-proprietary: your data cannot be held hostage by someone else.
* XML allows multi-purposing of your data.
* Well-designed XML applications most often separate ‘content’ from ‘presentation’. You should describe what something is rather than what something looks like (the exception being data content which never gets presented to humans).
Saying ‘the data is in XML’ is a relatively useless statement, similar to saying ‘the book is in a natural language’. To be useful, the former needs to specify ‘we have used XML to define our own markup language’ (and say what it is), similar to specifying ‘the book is in French’.
A classic example of multipurposing and separation that I often use is a pharmaceutical company. They have a large base of data on a particular drug that they need to publish as:
* reports to the FDA;
* drug information for publishers of drug directories/catalogs;
* ‘prescribe me!’ brochures to send to doctors;
* little pieces of paper to tuck into the boxes;
* labels on the bottles;
* two pages of fine print to follow their ad in Reader's Digest;
* instructions to the patient that the local pharmacist prints out;
* etc.
Without separation of content and presentation, they need to maintain essentially identical information in 20 places. If they miss a place, people die, lawyers get rich, and the drug company gets poor. With XML (or SGML), they maintain one set of carefully validated information, and write 20 programs to extract and format it for each application. The same 20 programs can now be applied to all the hundreds of drugs that they sell.
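As a rough illustration of the one-source, many-outputs idea, here is a minimal Python sketch using the standard library's ElementTree. The drug name, the element names, and the two formatter functions are all invented for this example; a real system would have its own markup language and far more formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-source record; the element names are invented for illustration.
DRUG_XML = """<drug>
  <name>Examplitol</name>
  <dose unit="mg">50</dose>
  <warning>Do not take with alcohol.</warning>
</drug>"""

def bottle_label(drug):
    """Short form, as might go on the bottle label."""
    dose = drug.find("dose")
    return f"{drug.findtext('name')} {dose.text} {dose.get('unit')}"

def patient_leaflet(drug):
    """Longer form, as might be tucked into the box."""
    dose = drug.find("dose")
    return (f"{drug.findtext('name')}: take {dose.text} {dose.get('unit')}. "
            f"Warning: {drug.findtext('warning')}")

drug = ET.fromstring(DRUG_XML)
label = bottle_label(drug)
leaflet = patient_leaflet(drug)
print(label)
print(leaflet)
```

Both outputs are derived from the same record, so a correction to the dose is made once and reaches every format.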
In the Web development area, the biggest thing that XML offers is fixing what is wrong with HTML:
* browsers allow non-compliant HTML to be presented;
* HTML is restricted to a single set of markup (‘tagset’).
If you let broken HTML work (be presented), then there is no motivation to fix it. Web pages are therefore tag soup that is useless for further processing. XML specifies that processing must not continue if the XML is non-compliant, so you keep working at it until it complies. This is more work up front, but the result is not a dead-end.
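The stop-on-error behaviour can be seen with any conforming parser. The sketch below uses Python's standard ElementTree; the helper name is_well_formed and the two sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False if it refuses to continue."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<p><b>bold</b> text</p>"
tag_soup = "<p><b>bold text</p>"   # <b> is never closed

print(is_well_formed(good))
print(is_well_formed(tag_soup))
```

A browser would typically still display the second string as HTML; an XML parser rejects it outright.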
If you wanted to mark up the names of things: people, places, companies, etc in HTML, you don't have many choices that allow you to distinguish among them. XML allows you to name things as what they are:
<person>Charles Goldfarb</person> worked
at <company>IBM</company>
gives you a flexibility that you don't have with HTML:
<B>Charles Goldfarb</B> worked at <B>IBM</B>
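To make the difference concrete, here is a small Python sketch (standard library only). The wrapping p element is an assumption added so that each fragment has a single root and can be parsed.

```python
import xml.etree.ElementTree as ET

# Both fragments are wrapped in <p> (an addition for this example) so they parse as XML.
xml_text = "<p><person>Charles Goldfarb</person> worked at <company>IBM</company></p>"
html_text = "<p><B>Charles Goldfarb</B> worked at <B>IBM</B></p>"

# Descriptive element names let a program ask for exactly what it wants.
people = [e.text for e in ET.fromstring(xml_text).iter("person")]
companies = [e.text for e in ET.fromstring(xml_text).iter("company")]

# With presentational markup, all that can be asked for is "whatever is bold".
bold = [e.text for e in ET.fromstring(html_text).iter("B")]

print(people, companies, bold)
```

With the bold-only version, a program has no way of telling which item is the person and which is the company.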
How can I include a conditional statement in my XML?
You can't: XML isn't a programming language, so you can't say things like
<google if {DB}="A">bar</google>
If you need to make an element optional, based on some internal or external criteria, you can do so in a Schema. DTDs have no internal referential mechanism, so it isn't possible to express this kind of conditionality in a DTD at the individual element level.
It is possible to express presence-or-absence conditionality in a DTD for the whole document, by using parameter entities as switches to include or ignore certain sections of the DTD based on settings either hardwired in the DTD or supplied in the internal subset. Both the TEI and Docbook DTDs use this mechanism to implement modularity.
Alternatively you can make the element entirely optional in the DTD or Schema, and provide code in your processing software that checks for its presence or absence. This defers the checking until the processing stage: one of the reasons for Schemas is to provide this kind of checking at the time of document creation or editing.
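Deferring the check to the processing stage might look like the following Python sketch. The order and discount element names and the describe function are hypothetical, chosen only to show the presence-or-absence test.

```python
import xml.etree.ElementTree as ET

def describe(order_xml):
    """Branch in the processing code on whether the optional element is present."""
    order = ET.fromstring(order_xml)
    discount = order.find("discount")   # find() returns None when the element is absent
    # Compare against None explicitly: an element with no children is falsy in ElementTree.
    if discount is None:
        return "no discount"
    return f"discount {discount.text}"

with_discount = describe("<order><item>pen</item><discount>10%</discount></order>")
without_discount = describe("<order><item>pen</item></order>")
print(with_discount)
print(without_discount)
```

The conditional logic lives in the program, not in the XML, which stays pure data.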
Can I (and my authors) still use client-side inclusions?
The same rule applies as for server-side inclusions, so you need to ensure that any embedded code which gets passed to a third-party engine (eg calls to SQL, VB, Java, etc) does not contain any characters which might be misinterpreted as XML markup (ie no angle brackets or ampersands). Either use a CDATA marked section to avoid your XML application parsing the embedded code, or use the standard &lt; and &amp; character entity references instead.
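Both options can be tried out in a few lines of Python with the standard library. The script element name and the sample code string are assumptions made up for this sketch.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical embedded code containing characters that look like XML markup.
code = "if (a < b && ready) run();"

# Option 1: replace the troublesome characters with entity references.
escaped = escape(code)
via_entities = f"<script>{escaped}</script>"

# Option 2: wrap the code in a CDATA section (this breaks if the code itself contains "]]>").
via_cdata = f"<script><![CDATA[{code}]]></script>"

# Either way, a parser hands the original code back unchanged.
from_entities = ET.fromstring(via_entities).text
from_cdata = ET.fromstring(via_cdata).text
print(escaped)
```

Embedding the raw string without either treatment would make the document fail to parse.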
Can I still use server-side inclusions?
Yes, so long as what they generate ends up as part of an XML-conformant file (ie either valid or just well-formed).
Server-side tag-replacers like shtml, PHP, JSP, ASP, Zope, etc store almost-valid files using comments, Processing Instructions, or non-XML markup, which gets replaced at the point of service by text or XML markup (it is unclear why some of these systems use non-HTML/XML markup). There are also some XML-based preprocessors for formats like XVRL (eXtensible Value Resolution Language) which resolve specialised references to external data and output a normalised XML file.
I am trying to understand the XML Spec: why does it have such difficult terminology?
"Answer not available"
Do I have to change any of my server software to work with XML?
The only changes needed are to make sure your server serves up .xml, .css, .dtd, .xsl, and whatever other file types you will use as the correct MIME content (media) types.
The details of the settings are specified in RFC 3023. Most new versions of Web server software come preset.
If not, all that is needed is to edit the mime-types file (or its equivalent: as a server operator you already know where to do this, right?) and add or edit the relevant lines for the right media types. In some servers (eg Apache), individual content providers or directory owners may also be able to change the MIME types for specific file types from within their own directories by using directives in a .htaccess file. The media types required are:
* text/xml for XML documents which are ‘readable by casual users’;
* application/xml for XML documents which are ‘unreadable by casual users’;
* text/xml-external-parsed-entity for external parsed entities such as document fragments (eg separate chapters which make up a book) subject to the readability distinction of text/xml;
* application/xml-external-parsed-entity for external parsed entities subject to the readability distinction of application/xml;
* application/xml-dtd for DTD files and modules, including character entity sets.
The RFC has further suggestions for the use of the +xml media type suffix for identifying ancillary files such as XSLT (application/xslt+xml).
If you run scripts generating XHTML which you wish to be treated as XML rather than HTML, they may need to be modified to produce the relevant Document Type Declaration as well as the right media type if your application requires them to be validated.
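If you serve files from a script rather than a stock Web server, the same mapping has to be set up by hand. The following Python sketch uses the standard mimetypes module; the choice of application/xml for .xml files and the file names are assumptions for the example, and text/css is the usual type for stylesheets rather than something taken from the list above.

```python
import mimetypes

# Extension-to-type pairs; which XML type to use for .xml is a judgment call (see the list above).
XML_TYPES = {
    ".xml": "application/xml",
    ".dtd": "application/xml-dtd",
    ".xsl": "application/xslt+xml",
    ".css": "text/css",
}

# Register (or override) each mapping in Python's MIME type table.
for ext, media_type in XML_TYPES.items():
    mimetypes.add_type(media_type, ext)

# guess_type returns a (type, encoding) pair; only the type is of interest here.
dtd_type = mimetypes.guess_type("book.dtd")[0]
xsl_type = mimetypes.guess_type("style.xsl")[0]
print(dtd_type, xsl_type)
```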
What are the special characters in XML?
For normal text (not markup), there are no special characters: just make sure your document refers to the correct encoding scheme for the language and/or writing system you want to use, and that your computer correctly stores the file using that encoding scheme. See the question on non-Latin characters for a longer explanation.
If your keyboard will not allow you to type the characters you want, or if you want to use characters outside the limits of the encoding scheme you have chosen, you can use a symbolic notation called ‘entity referencing’. Entity references can either be numeric, using the decimal or hexadecimal Unicode code point for the character (eg if your keyboard has no Euro symbol (€) you can type &#8364;); or they can be character, using an established name which you declare in your DTD (eg <!ENTITY euro "&#8364;">) and then use as &euro; in your document. If you are using a Schema, you must use the numeric form for all except the five below because Schemas have no way to make character entity declarations.
If you use XML with no DTD, then these five character entities are assumed to be predeclared, and you can use them without declaring them:
&lt;
The less-than character (<) starts element markup (the first character of a start-tag or an end-tag).
&amp;
The ampersand character (&) starts entity markup (the first character of a character entity reference).
&gt;
The greater-than character (>) ends a start-tag or an end-tag.
&quot;
The double-quote character (") can be symbolised with this character entity reference when you need to embed a double-quote inside a string which is already double-quoted.
&apos;
The apostrophe or single-quote character (') can be symbolised with this character entity reference when you need to embed a single-quote or apostrophe inside a string which is already single-quoted.
If you are using a DTD then you must declare any other character entities you need to use. XML processors are required to recognise the five above whether or not they are declared, but the XML Specification recommends that valid documents declare them too, for interoperability.
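The behaviour of numeric references, the five predeclared names, and an undeclared name can be checked with a short Python sketch using the standard ElementTree parser. The price and t element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal numeric references both resolve to the Euro sign.
decimal = ET.fromstring("<price>&#8364;10</price>").text
hexadecimal = ET.fromstring("<price>&#x20AC;10</price>").text

# The five predeclared names work without any DTD.
predeclared = ET.fromstring("<t>&lt;b&gt; &amp; &quot;q&quot; &apos;a&apos;</t>").text

# A named entity such as &euro; is an error unless something declares it.
try:
    ET.fromstring("<price>&euro;10</price>")
    named_ok = True
except ET.ParseError:
    named_ok = False

print(decimal, hexadecimal, predeclared, named_ok)
```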
How can I handle embedded HTML in my XML?
Apart from using CDATA Sections, there are two common occasions when people want to handle embedded HTML inside an XML element:
1. when they have received (possibly poorly-designed) XML from somewhere else which they must find a way to handle;
2. when they have an application which has been explicitly designed to store a string of characters containing &lt; and &amp; character entity references with the objective of turning them back into markup in a later process (eg FreeMind, Atom).
Generally, you want to avoid this kind of trick, as it usually indicates that the document structure and design has been insufficiently thought out. However, there are occasions when it becomes unavoidable, so if you really need or want to use embedded HTML markup inside XML, and have it processable later as markup, there are a couple of techniques you may be able to use:
* Provide templates for the handling of that markup in your XSLT transformation or whatever software you use which simply replicates what was there, eg
<xsl:template match="b">
<b>
<xsl:apply-templates/>
</b>
</xsl:template>
* Use XSLT's ‘deep copy’ instruction, which outputs nested well-formed markup verbatim, eg
<xsl:template match="ol">
<xsl:copy-of select="."/>
</xsl:template>
* As a last resort, use the disable-output-escaping attribute on the xsl:text element of XSL[T] which is available in some processors, eg
<xsl:text disable-output-escaping="yes"><![CDATA[<b>Now!</b>]]></xsl:text>
* Some processors (eg JX) are now providing their own equivalents for disabling output escaping. Their proponents claim it is ‘highly desirable’ or ‘what most people want’, but it still needs to be treated with care to prevent unwanted (possibly dangerous) arbitrary code from being passed untouched through your system. It also adds another dependency to your software.
For more details of using these techniques in XSL[T], see the relevant question in the XSL FAQ.
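Outside XSLT, the second case (escaped markup stored as a string and turned back into markup later) can be sketched in Python with the standard library. The content and div element names are assumptions, and the approach only works when the embedded markup is itself well-formed; it should not be applied to untrusted input without further checks.

```python
import xml.etree.ElementTree as ET

# Hypothetical element holding escaped markup as plain character data.
entry = ET.fromstring("<content>&lt;b&gt;Now!&lt;/b&gt; or later</content>")

# After parsing, the entity references have become literal characters.
raw = entry.text

# Re-parse the string as markup, wrapped in a single root so it forms a document.
fragment = ET.fromstring(f"<div>{raw}</div>")
bold = fragment.find("b")

print(raw)
print(bold.text, bold.tail)
```

The second parse fails with a ParseError if the stored string is tag soup, which is one reason this design is best avoided where possible.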
When should I use a CDATA Marked Section?
"Answer not available"
What is parsing and how do I do it in XML?
"Answer not available"