Article Excerpt
Integrating efficient XML publishing into high-volume content environments remains a significant challenge. Among the many real-world barriers are the need to convert large quantities of paper and other legacy documents, the need to integrate easy-to-use XML publishing tools into the content-creation process, and the lack of workflow management tools necessary for mass-conversion environments.
In many environments content creators resist using XML authoring tools, preferring traditional word processing or desktop publishing applications, and simplified "template-style" DTDs are used to accommodate productivity requirements. Consequently, high-volume XML conversions are typically accomplished through "brute force" solutions, where mass OCR (optical character recognition) scanning and tagging are done through expensive outsourcing, often to developing countries where labor for repetitive high-volume publishing tasks is plentiful and inexpensive.
To realize the full benefits of XML for both highly structured and mixed-structure content in high volumes -- without the cost, cycle time requirements, and other outcomes inherent in outsourcing -- an XML publishing system that minimizes the ongoing intervention of XML programmers is essential. To efficiently convert documents in Word, HTML, PDF, RTF, or other formats into XML, intelligent, rule-based automated markup solutions are required.
For large-volume projects, efficient XML publishing also requires batch processing and workflow management solutions that optimize productivity. In this article I'll discuss the process requirements for high-volume XML creation and introduce new tools and technologies expressly developed for these mass-conversion environments.
The High-Volume XML Publishing Challenge
Any document can be represented in XML, but document types vary widely, creating diverse challenges for high-volume publishing requirements. Document content ranges from highly consistent, repetitive data to extremely diverse structures that defy accurate digital representation. For example, accident reports, product catalogs, employment applications, financial forms, and other such documents are very amenable to automated XML conversion solutions. On the other hand, dissertations, marketing reports, resumes, news articles, and similar documents feature abstract intellectual information with highly diverse components and inconsistent composition.
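For the highly structured end of that range, the conversion can be little more than a field-to-element mapping. The following is a minimal sketch in Python, not a description of any particular product: the `record_to_xml` function, the `product` root element, and the sample field names are all hypothetical, and it assumes every field name is already a valid XML element name.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record, root_name="product"):
    """Serialize a flat record (dict) as one XML element.

    Assumption: each key is a valid XML element name. Real catalog
    data would need its field names validated or mapped first.
    """
    root = ET.Element(root_name)
    for field, value in record.items():
        # One child element per field; ElementTree escapes the text.
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical catalog row, for illustration only.
row = {"sku": "A-100", "name": "Widget", "price": 9.5}
print(record_to_xml(row))
```

Because every record has the same fields, no human judgment is needed at tagging time, which is why this class of document converts so readily.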
For highly structured content, identifying and tagging variables can easily be automated through forms, scripts, and other techniques, but mixed-structure data requires an XML authoring tool or a postauthoring conversion process. The assignment of tags to elements in a document is fundamentally a separate and distinct exercise from the authoring process. Authors may intuitively recognize elements -- such as a "phone number," "chapter heading," "ingredient," or "customer type" -- but identifying each one as an element requires a start tag, an element type, and an end tag delimited by angle brackets (< >). This process can be simplified and accelerated, but it can't be eliminated.
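To make the tagging step concrete, here is a minimal rule-based sketch in Python. It is an illustration under stated assumptions rather than the approach of any tool discussed in this article: the `RULES` table, the `phone_number` element type, and the simple North American phone pattern are all hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical rule table: (element type, pattern that recognizes it).
RULES = [
    ("phone_number", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
]


def tag_text(text, rules=RULES):
    """Wrap every rule match in a start tag and an end tag.

    Text outside the matches is XML-escaped and otherwise unchanged.
    If two matches overlap, the later-starting one is skipped.
    """
    matches = []
    for name, pattern in rules:
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), name))
    matches.sort()

    out = []
    pos = 0
    for start, end, name in matches:
        if start < pos:
            continue  # overlaps a span that was already tagged
        out.append(escape(text[pos:start]))
        out.append(f"<{name}>{escape(text[start:end])}</{name}>")
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)


print(tag_text("Call 555-123-4567 & ask for sales."))
```

The author still writes plain text; the rules supply the start tag, element type, and end tag afterward, which is the sense in which tagging can be accelerated but not eliminated.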
Requiring content creators -- knowledge workers such as technical writers, paralegals, insurance adjusters, law...