
Parsing filters.

Publication: XML Journal
Publication Date: 01-FEB-02

Article Excerpt
Web applications often need their HTML/XML documents cleaned, either synchronously or asynchronously. "Cleaning" could mean checking for well-formedness, manipulating an expression in the documents, or parsing them for application-specific functionality. To my knowledge, no existing tool can be adapted to clean documents in an application-specific way. So why not write your own parsing tool -- one that is specific to your application logic and can also integrate seamlessly with, or wrap, other existing tools?

This article presents a simple and open framework that eases custom parsing and checking of any type of document -- XML, HTML, text, and so on -- through the concept of Filters and XML-based configurations. The idea is to define filters, chain them together as a pipeline, and stream the input documents through the chained filters. As a document streams through, each filter applies its specific logic to it, with all the filters running at the same time.

Each filter serves a single, specific purpose in massaging the input stream it is given. A filter could be an implementation of your own logic or a wrapper around an existing tool. If you're familiar with the Unix operating system, the concept is similar to its piping feature. One difference is that here the parameters are supplied through an XML-based configuration file.

Filters are implemented as Runnable instances and run as threads, so all the filters in a chain execute in parallel on the input stream, much as the stages of a Unix pipeline do.
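
The threading model described here can be sketched with the standard java.io piped streams. This is only an illustration under assumptions, not the article's implementation: the ByteFilter and PipedChainDemo classes and the per-byte transform are hypothetical names introduced for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

/** Hypothetical filter: copies bytes from in to out, transforming each one. */
class ByteFilter implements Runnable {
    private final InputStream in;
    private final OutputStream out;
    private final IntUnaryOperator transform;

    ByteFilter(InputStream in, OutputStream out, IntUnaryOperator transform) {
        this.in = in;
        this.out = out;
        this.transform = transform;
    }

    @Override
    public void run() {
        // Closing the output signals end-of-stream to the next filter in the chain.
        try (InputStream i = in; OutputStream o = out) {
            int b;
            while ((b = i.read()) != -1) {
                o.write(transform.applyAsInt(b));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Hypothetical driver: one thread per filter, adjacent filters joined by a pipe. */
class PipedChainDemo {
    static String run(String input, List<IntUnaryOperator> transforms) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        List<Thread> threads = new ArrayList<>();
        try {
            InputStream in = new ByteArrayInputStream(input.getBytes(StandardCharsets.US_ASCII));
            for (int k = 0; k < transforms.size(); k++) {
                OutputStream out = sink;      // the last filter writes to the sink
                InputStream nextIn = null;
                if (k < transforms.size() - 1) {
                    PipedOutputStream pipeOut = new PipedOutputStream();
                    nextIn = new PipedInputStream(pipeOut);  // read end for the next filter
                    out = pipeOut;
                }
                threads.add(new Thread(new ByteFilter(in, out, transforms.get(k))));
                in = nextIn;
            }
            for (Thread t : threads) t.start();
            for (Thread t : threads) t.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for filters", e);
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

Each filter starts processing as soon as bytes arrive from its predecessor, so the stages overlap in time rather than running one after another over a fully buffered document.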

Piping Diagram

Now let's jump into the details of the filtering framework. The piping shown in Figure 1 is a rough version of what we're going to look into. The Filters to be applied are identified through a configurable XML document and are connected by piping the output of each filter into the input of the next. Once the pipe is established, input documents are streamed through these piped filters to the desired location. As the documents are being streamed, each filter applies its own filtering logic to them in parallel. The filtering framework provides plug-and-filter functionality through changes to the configuration file alone. You can apply some filters to one class of documents and other filters to other classes of documents by modifying the configuration. If you need a new filtering feature, write one and add its details to the configuration document -- the rest is transparent.
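
To make the configuration-driven selection concrete, here is a minimal sketch of reading such a document with the JDK's DOM parser. The excerpt does not show the article's actual configuration format, so the element and attribute names (filter-chains, chain, documents, filter, class), the example class names, and the FilterConfig helper are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical loader for a hypothetical filter configuration format. */
class FilterConfig {
    // Assumed layout: one <chain> per class of documents, filters listed in pipe order.
    static final String SAMPLE =
        "<filter-chains>"
        + "<chain documents=\"html\">"
        + "<filter class=\"example.WellFormedFilter\"/>"
        + "<filter class=\"example.TagStripFilter\"/>"
        + "</chain>"
        + "<chain documents=\"text\">"
        + "<filter class=\"example.WhitespaceFilter\"/>"
        + "</chain>"
        + "</filter-chains>";

    /** Returns the filter class names configured for one class of documents, in order. */
    static List<String> filterClassesFor(String xml, String documentType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            NodeList chains = doc.getElementsByTagName("chain");
            for (int i = 0; i < chains.getLength(); i++) {
                Element chain = (Element) chains.item(i);
                if (!documentType.equals(chain.getAttribute("documents"))) {
                    continue;  // this chain is for another class of documents
                }
                NodeList filters = chain.getElementsByTagName("filter");
                for (int j = 0; j < filters.getLength(); j++) {
                    names.add(((Element) filters.item(j)).getAttribute("class"));
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Invalid filter configuration", e);
        }
    }
}
```

A framework built this way could instantiate each listed class by name and pipe the instances together in the listed order, so adding or reordering filters would require no code changes.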

[FIGURE 1 OMITTED]

Design

The class diagram for the filtering parser is shown in Figure 2. It has a Filter interface that all custom filters implement. The interface has methods for setting the input stream, the output stream, and a filtering message, plus a method for the filtering logic itself. Thus each filter reads from its input source, applies its specific filtering logic, and writes the result to its output stream.
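
One possible shape for such an interface is sketched below. The excerpt names the responsibilities but not the method signatures, so the method names, the meaning given to the filtering message, and the TabToSpaceFilter example are assumptions rather than the article's actual API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical filter contract; the method names are assumptions. */
interface Filter extends Runnable {
    void setInputStream(InputStream in);
    void setOutputStream(OutputStream out);
    void setMessage(String message);   // assumed: text reported if the filter fails
    void filter() throws IOException;  // the filtering logic itself
}

/** Example filter: replaces every tab byte with a space. */
class TabToSpaceFilter implements Filter {
    private InputStream in;
    private OutputStream out;
    private String message = "tab-to-space filter failed";

    @Override
    public void setInputStream(InputStream in) { this.in = in; }

    @Override
    public void setOutputStream(OutputStream out) { this.out = out; }

    @Override
    public void setMessage(String message) { this.message = message; }

    @Override
    public void filter() throws IOException {
        int b;
        while ((b = in.read()) != -1) {
            out.write(b == '\t' ? ' ' : b);
        }
        out.flush();
    }

    @Override
    public void run() {
        // Lets the filter run on its own thread; failures carry the configured message.
        try {
            filter();
        } catch (IOException e) {
            throw new UncheckedIOException(message, e);
        }
    }
}
```

Because the filter only sees an InputStream and an OutputStream, it can be tested with in-memory byte arrays and later wired to pipes or files without change.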

[FIGURE 2 OMITTED]

To add your own custom filter, implement this interface and register the filter in the configuration file so that its logic is applied. The AbstractFilter class implements Filter and takes care of the common functionality implementations...
