|
...particular a number of systems have been built to design information graphics based on users' information needs (e.g., visualizing a file system using a cone tree (1)). However, information graphics created by these systems are carefully designed and preprogrammed by hand. Handcrafting information graphics is a difficult and time-consuming task, and relatively few information system developers have had training in graphic design. Moreover, handcrafting information graphics in advance is inadequate for an interactive environment, where different users may have different tasks in mind (e.g., comparing two data items vs. understanding a trend) or may change their tasks in the course of information exploration. To support dynamic design and customization, researchers have been investigating approaches to the automated generation of information graphics. There are two main approaches: rule-based and example-based graphics generation.
For the past decade, researchers have been developing systems that can automatically create information graphics using a set of design rules. Given a set of data entities, a presentation intent (e.g., summarize a set of information), and a presentation context (e.g., a specific type of users and presentation devices), an automated graphics generation system automatically constructs a customized information graphic that communicates the input data to the intended users in the specified context. For example, systems like APT, (2) SAGE, (3) and IMPROVISE (4) employ handcrafted design rules to map data onto proper information graphics. Nevertheless, the rule-based approaches present two major problems. First, acquiring design rules manually is difficult. Handcrafting design rules can be laborious, as experts may need to process large amounts of evidence before extracting rules. Second, maintaining and extending a large rule base is difficult. As the rule base grows, it is difficult to integrate new rules with existing rules, and to discover and rectify inconsistencies among the rules.
Alternatively, researchers have started to explore example-based design approaches. (5,6) Instead of using rules, an example-based approach starts with a set of existing examples. Upon a user's request (e.g., find presentations that are suitable for my data), this approach searches through a graphics database and retrieves the most relevant examples. The retrieved examples can then be directly reused for, or adapted to, the new situation (e.g., new data or new visual preference). However, an example-based approach alone is usually inadequate, because it is difficult to learn all low-level design details directly from examples (e.g., the exact scale and position of every graphic element).
To overcome these problems, we are exploring a hybrid approach, which makes use of both rule-based and example-based generation. Our approach is embodied in a prototype system, IMPROVISE*, which is an extension of our previous rule-based generation system, IMPROVISE. (4) On the one hand, IMPROVISE* employs an example-based approach to improve extensibility, allowing new graphic examples to be easily added and reused. On the other hand, IMPROVISE* utilizes rules to determine various graphic details efficiently, complementing example-based generation.
Our focus here is on applying machine learning to support systems like IMPROVISE*. To the best of our knowledge, our work is the first to apply machine learning to automated information graphics generation. Our contributions are threefold. First, we introduce a unique feature selection/extraction method for information graphics design. In particular, we define an object-oriented, integrated hierarchical feature representation for annotating information graphics. As described later, our annotation captures cohesively both data and visual characteristics. Our annotation formalism provides the needed foundation for applying machine learning to automated graphics generation.
Second, we address how to systematically acquire design rules from existing graphic examples. Specifically, we employ a decision-tree learning algorithm to induce design rules automatically from a set of annotated graphic examples. Our results demonstrate that machine learning helps the system to acquire useful design rules automatically and to improve the overall quality of a rule base (e.g., by simplifying expert-derived rules and removing redundancies).
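As a rough illustration of how design rules can be induced from annotated examples, the following sketch builds a small decision tree and reads each root-to-leaf path as a rule. It is only a sketch under stated assumptions: the ID3-style information-gain criterion, the flat feature names (`data_type`, `intent`), the technique labels, and the four training examples are all hypothetical and are not taken from IMPROVISE*.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple, Union

# Hypothetical annotated example: (feature values, visual technique used).
Example = Tuple[Dict[str, str], str]
# A tree is either a leaf label or (feature to test, {value: subtree}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]


def entropy(examples: List[Example]) -> float:
    """Shannon entropy of the technique labels in a set of examples."""
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def split(examples: List[Example], feature: str) -> Dict[str, List[Example]]:
    """Group examples by their value for one feature."""
    groups: Dict[str, List[Example]] = {}
    for feats, label in examples:
        groups.setdefault(feats[feature], []).append((feats, label))
    return groups


def induce(examples: List[Example], features: List[str]) -> Tree:
    """ID3-style induction: split on the feature leaving the least entropy."""
    labels = Counter(label for _, label in examples)
    if len(labels) == 1 or not features:
        return labels.most_common(1)[0][0]

    def remainder(feature: str) -> float:
        groups = split(examples, feature).values()
        return sum(len(g) / len(examples) * entropy(g) for g in groups)

    best = min(features, key=remainder)
    rest = [f for f in features if f != best]
    branches = {v: induce(g, rest) for v, g in split(examples, best).items()}
    return (best, branches)


def rules(tree: Tree, conditions: tuple = ()) -> list:
    """Flatten a tree into (conditions, technique) pairs, one per leaf."""
    if isinstance(tree, str):
        return [(list(conditions), tree)]
    feature, branches = tree
    out = []
    for value, subtree in branches.items():
        out.extend(rules(subtree, conditions + ((feature, value),)))
    return out


# Hypothetical training set, invented for this illustration only.
training = [
    ({"data_type": "quantitative", "intent": "compare"}, "bar_chart"),
    ({"data_type": "quantitative", "intent": "trend"}, "line_chart"),
    ({"data_type": "nominal", "intent": "compare"}, "table"),
    ({"data_type": "nominal", "intent": "trend"}, "table"),
]
tree = induce(training, ["data_type", "intent"])
for conditions, technique in rules(tree):
    print(conditions, "->", technique)
```

In this toy data set the nominal branch collapses to a single rule that ignores `intent`, which hints at how induction can simplify a rule base by dropping conditions that do not affect the outcome.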
Third, we investigate how to create new information graphics directly from existing graphics to augment our previous rule-based generation approach. (4) In particular, we develop a case-based learning method. We use a semantics-based, quantitative visual similarity measuring model to retrieve top-matched graphic examples, which can then be directly reused for, or be adapted to, the new situation.
The paper is organized as follows. We first discuss related work and then provide an overview of IMPROVISE*. We then introduce an object-oriented, integrated hierarchical feature representation for annotating graphic examples. Using our annotation formalism, we present the systematic application of decision-tree learning to extract design rules from annotated graphic examples. We then describe a case-based learning method that uses a semantics-based quantitative visual similarity measuring model to retrieve relevant graphic examples. Finally, we conclude and indicate future research directions.
Related work
Before presenting our machine-learning approaches to information graphics generation, we discuss related work from two aspects: machine learning in rule acquisition and example-based graphics generation.
Automatic rule acquisition. Machine-learning techniques, especially supervised learning, have been widely applied to rule acquisition in other domains (e.g., speech synthesis (7)). However, applying these techniques to the acquisition of graphic design rules has hardly been addressed. One of the causes lies in the difficulty of expressing information graphics accurately and comprehensively.
To identify features for machine learning, we have adopted previous research results on data and graphics characterization. On the data side, researchers have established data characterization taxonomies to abstract which presentation-related data properties influence visual encoding strategies, and how. (8-10) On the visual side, to describe visual patterns systematically, researchers have characterized different visual formats, (11) formulated a set of graphics languages, (2) and defined the formal syntax or semantics of a particular visual representation. (12) We have extended these efforts to characterize both the data and visual patterns more comprehensively. In particular, we developed an object-oriented hierarchical feature representation to uniformly express the semantic, meta-level, and structural properties of a graphic at multiple levels of abstraction. This representation also differs from the conventional feature representation used in many machine-learning applications, where a flat structure of simple features is used.
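The contrast between a hierarchical and a flat feature representation can be pictured with a small data structure. The sketch below is illustrative only: the class name `AnnotatedNode`, its fields, and the feature names and values are assumptions made for this example and do not reproduce the actual annotation formalism.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class AnnotatedNode:
    """One level of a graphic, pairing data features with visual features."""
    name: str
    data_features: Dict[str, str] = field(default_factory=dict)
    visual_features: Dict[str, str] = field(default_factory=dict)
    children: List["AnnotatedNode"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "AnnotatedNode"]]:
        """Yield (depth, node) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Hypothetical annotation of a bar chart; every value here is invented.
chart = AnnotatedNode(
    name="bar_chart",
    data_features={"relation": "comparison"},
    visual_features={"layout": "horizontal"},
    children=[
        AnnotatedNode("bar", {"type": "quantitative"}, {"encoding": "length"}),
        AnnotatedNode("label", {"type": "nominal"}, {"encoding": "text"}),
    ],
)

for depth, node in chart.walk():
    print("  " * depth + node.name, node.data_features, node.visual_features)
```

Because each node carries its own data and visual features, a property can be described at the level of the whole graphic or of a single component, which a flat feature vector cannot express directly.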
Example-based graphics generation. Programming by demonstration is perhaps the earliest technique that supports example-based graphics generation. (13) Through user demonstration, this approach attempts to generalize the behavior to a design principle, which can then be applied to an entire class of tasks in the future. However, coming up with the right set of examples requires design knowledge on the part of the user. (14)
Instead of asking users to supply the right examples, a recent example-based generation approach allows a user to select an example to be automated from the user's past requests or operations. (6) The user can modify the selected example or refine it for performing future tasks. This approach also uses a hierarchical representation to express the semantics of stored examples and new user operations. However, it matches a new user operation with a single selected example based solely on the similarity of lowest-level components, without considering compositional structures. In contrast, IMPROVISE* matches a user request against multiple examples, and it takes into account both content and structural similarities and differences.
Another closely related example-based generation system is Sage/SageBook. (5) Upon a user's request, Sage searches its stored visual presentations and retrieves those relevant for reuse in the new situation. It uses both data features and visual properties to describe the stored presentations and user requests. Moreover, Sage matches a user request with a presentation example by qualitatively comparing lowest level data and visual contents (e.g., data elements and graphemes). IMPROVISE* differs significantly from Sage in its use and representation of examples and in its method for matching requests to examples. First, Sage uses only examples that it creates, (3) whereas IMPROVISE* exploits graphic examples from a wide variety of sources, which may or may not be generated by our system. Second, Sage employs a flat data characterization, separated from its hierarchical visual feature description. In contrast, we express both visual and data features hierarchically and integrate visual features with their corresponding data features at every level of the abstraction. Third, Sage uses a qualitative matching method to retrieve desired examples, whereas we develop a quantitative similarity measuring method to facilitate a more accurate comparison between examples and user requests. In addition, we allow users to dynamically adjust various weights in our similarity model and to submit partial requests with different matching criteria.
IMPROVISE* overview
Figure 1 shows the high-level components of IMPROVISE*. The initial input is a set of user requests, including the data to be conveyed, presentation intent, and presentation context. The output is an information graphic that conveys the input data. A typical generation process includes three main steps.
Step 1. Starting with a set of inputs, IMPROVISE* always attempts to find the matched examples first, because the examples imply the most specific rules. To search for a set of relevant examples, IMPROVISE* calculates the similarity distances between the user request and the existing graphics. It then retrieves the top-k matched examples (e.g., k = 3). If there are no matched examples (e.g., the similarity distance exceeds a certain threshold), IMPROVISE* will attempt to generate a sketch using a rule-based approach. (15)
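The retrieval logic of Step 1 can be sketched as follows. This is a minimal illustration rather than the system's implementation: the flat feature dictionaries, the mismatch-counting distance, the default threshold of 0.5, and the names `distance` and `retrieve_top_k` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Features = Dict[str, str]


def distance(request: Features, example: Features) -> float:
    """Fraction of requested features the example fails to match (0 = all match)."""
    if not request:
        return 0.0
    mismatches = sum(1 for key, value in request.items()
                     if example.get(key) != value)
    return mismatches / len(request)


def retrieve_top_k(request: Features,
                   examples: Dict[str, Features],
                   k: int = 3,
                   threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return up to k nearest examples whose distance is within the threshold.

    An empty list means no example matched; the caller would then fall back
    to rule-based generation.
    """
    scored = sorted((distance(request, feats), name)
                    for name, feats in examples.items())
    return [(name, d) for d, name in scored[:k] if d <= threshold]


# Hypothetical example database and requests, invented for illustration.
examples = {
    "bar": {"intent": "compare", "data": "quantitative"},
    "line": {"intent": "trend", "data": "quantitative"},
    "map": {"intent": "locate", "data": "spatial"},
}
request = {"intent": "compare", "data": "quantitative"}

matches = retrieve_top_k(request, examples, k=2)
if matches:
    print("reuse or adapt:", matches)
else:
    print("no match; fall back to rule-based generation")
```

Returning an empty list instead of raising an error keeps the fallback decision with the caller, mirroring the order described above: examples are tried first, and rules are used only when no example is close enough.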
Step 2. Using the matched examples, IMPROVISE* first creates a sketch, which is an intermediate representation of an information graphic. A sketch outlines the basic visual...
NOTE: All illustrations and photos
have been removed from this article.

|