Home | Industry Information | Business News | Browse by Publication | M | Management Science

Analyzing consumer-product graphs: empirical findings and applications in recommender systems.

Publication: Management Science
Publication Date: 01-JUL-07
Format: Online
Delivery: Immediate Online Access

Article Excerpt
1. Introduction

The past few years we have witnessed dramatic growth in research using random graph theory to study complex systems in a wide variety of scientific, engineering, and social domains. Examples include telecommunication networks and the World Wide Web (Faloutsos et al. 1999); as...

View more below

Read this article now - Try Goliath Business News - FREE!   
You can view this article PLUS...

  • Over 5 million business articles
  • Hundreds of the most trusted magazines, newswires, and journals (see list)
  • Premium business information that is timely and relevant
  • Unlimited Access

Now for a Limited Time, try Goliath Business News - Free for 7 Days!
Tell Me More   Terms and Conditions

Purchase this article for $6.95

Already a subscriber? Log in to view full article

...biological systems such genetic and metabolic regulation, protein folding, and neural networks (Amaral et al. 2000, Jeong et al. 2001); scientific literature coauthorship and citation networks (Barabasi et al. 2002, Newman et al. 2002); and social networks of collaborators and acquaintances (Amaral et al. 2000, Watts 1999). Various graph-generation mechanisms have been explored to explain the topological structure of these real-world networks and to predict global and local network features--e.g., the robustness of the Internet (Albert et al. 2000) and the creativity and performance of teams in collaboration networks (Guimera et al. 2005, Uzzi and Spiro 2005).

Several recent studies have adopted the methodology of random graph modeling to study phenomena in the business world, including interlocking boards of directors (Conyon and Muldoon 2004, Davis et al. 2003, Robins and Alexander 2004), corporate governance and ownership (Baum et al. 2003, Kogut and Walker 2001), and electronic bidding behavior (Yang et al. 2003). These studies have focused on characterizing the statistical topological properties of the business networks and linking these properties to underlying individual and organizational behavioral explanations. Many of these studies have shown that random graph modeling methodology can be fruitfully applied to business problems and can bring about useful insights. Two examples of such findings are the resilience of the structure of the corporate elite to macro and micro changes affecting corporate governance (Davis et al. 2003), and the domination of online auctions by an unusually active minority (Yang et al. 2003).

This paper applies random graph modeling to study consumer purchase behavior in e-commerce settings. In particular, we are interested in studying the evolution of interactions among consumers and products reflected in online-sales transactions. It has been recognized in marketing research that observed consumer-purchase behavior is the key to predicting consumer behavior offline (Ehrenberg 1988) and online (Bellmann et al. 1999). By representing consumers and products as vertices, and sales transactions as edges linking consumer and product vertices, we view the entire transaction history as a growing consumer-product graph. We examine the structure of this consumer-product graph with an attempt to gain insights regarding the underlying mechanism that governs consumer-purchase behavior and improve predictive analysis of the consumers' future purchases.

Sales transactions, as a major form of interaction between consumers and products, have also been extensively studied computationally in recommender system research (Resnick and Varian 1997). As an important component of the marketing and customer relationship management process, a recommender system suggests products and services to potential customers based on the observed customer behavior and the customer and product attributes. Sales transaction data are a major input to many algorithmic engines for commercial recommender systems and personalization systems (Huang et al. 2004, Schafer et al. 2001). In fact, the collaborative filtering approach (Hill et al. 1995, Resnick et al. 1994), arguably the most successful recommendation approach, relies on transaction data alone to make recommendations. The success of such a recommendation approach relies on the existence of consistent consumer-purchase patterns reflected in the sales transactions alone (as opposed to the consumer demographic and product attribute information) and on how well the algorithms can capture such patterns. Our research is aimed at exploring the use of random graph modeling of sales transactions to reveal the characteristics of consumer-behavior patterns and to improve the performance of recommender systems.

The intended contributions of this paper are twofold. First, to the best of our knowledge, we present the first empirical study of consumer-product graphs using the complex systems/random graph analysis methodology. Using two online retail data sets, we compare measures of topological characteristics of consumer graphs and product graphs (projected from consumer-product graphs representing the sales transactions) with the theoretical predictions given by a random graph model. We find that these consumer and product graphs show a consistently higher tendency to cluster over time and have path lengths or distances that are relatively short, but still longer than those predicted by the random graph model. These findings strongly suggest that the product choices of a community of consumers are not random, and in turn empirically confirm the fundamental assumption of recommender system research. Further simulation-based investigation embedding two popular recommendation algorithms as the graph-generation mechanism shows that such algorithms can lead to graphs that provide better matches with real-world consumer-product graphs than those produced by the purely random model. However, significant deviations remain. Our second contribution is a new graph partitioning-based recommendation algorithm motivated by the empirical and simulation-based findings summarized above. Our experimental study using the real-world e-commerce data sets shows empirically that this algorithm significantly outperforms representative collaborative filtering algorithms in situations where the clustering coefficients of the consumer-product graphs are sufficiently larger than can be accounted for by these standard algorithms.

The remainder of this paper is structured as follows. In [section]2, we provide a brief review of the two streams of literature relevant to this paper. Section 2.1 reviews the literature on random graphs and introduces topological measures and random graph models that are relevant to our research. Section 2.2 briefly surveys the recommender system literature and discusses representative recommendation algorithms relevant to our research. Section 3 describes the consumer-product graph representation of sales transactions and illustrates how to project this consumer-product graph into consumer and product graphs. We also summarize theoretical predictions of the topological measures of the consumer and product graphs under the assumption of a randomly generated consumer-product graph. In [section]4, we present the empirical and simulation-based findings on consumer-product graphs using two online retail data sets. In [section]5, we discuss conceptual implications of our findings in the context of recommender systems. These findings motivated the development of a new graph-based recommendation algorithm. Section 5 contains a detailed discussion of this algorithm and related computational evaluation. We conclude the paper in [section]6 by summarizing our findings and pointing out future research directions.

2. Research Background and Related Work

2.1. Related Random Graph Modeling Research

Graphs have been used to represent various types of relationships in a wide range of complex systems. As is common in the random graph modeling literature, we use graph and network interchangeably in this paper. Random, graph modeling research exploits a graph representation of complex systems and is aimed at capturing the mechanisms that determine the network topology of such systems. The key assumption is that the fundamental mechanism that governs the generation of relationships among components of a system leaves certain identifiable traits in the resulting network topology. Thus, a simple graph-generation model that can reproduce similar topological features of the real network may bring important insights to understanding the actual mechanism that governs the real system. The domain-specific interpretations of such abstract graph-generation models could potentially lead to development of theories regarding the interactions among the system constituents.

Many recent studies show that real-world networks demonstrate surprisingly consistent topological characteristics across different domains (Albert and Barabasi 2002). Three major concepts related to such topological features are small world, clustering, and scale-free phenomena.

Small World. The small world concept describes the fact that despite their often large size, most networks exhibit a relatively short path between any two vertices. The distance between two vertices is defined as the number of edges along the shortest path connecting them. The average path length (or typical/characteristic distance) measure L, defined as the average of the path lengths of all connected vertex pairs, quantifies this property.

Clustering. Many real-world networks show an inherent tendency to cluster. A typical example is social networks, in which cliques form, representing circles of friends or acquaintances in which every member knows every other member. Such a tendency is quantified by the clustering-coefficient measure (Newman et al. 2001, Watts and Strogatz 1998). We adopt the Newman definition:

C = [3 x (number of triangles in the graph)]/number of connected triples, (1)

where a triangle is a set of three vertices, each of which is connected to both of the others, and a connected triple is three vertices x-y-z, with both vertices x and z connected with y (note that x-y-z and x-y-z are considered the same connected triple). The factor 3 in the numerator accounts for the fact that each triangle contributes to three connected triples of vertices. The clustering coefficient C is strictly bounded between and 1 and measures the extent to which being a neighbor is a transitive property. In our context, for example, a consumer graph represents relationships between consumers who purchase the same products. In a consumer graph with a high clustering coefficient (close to 1), such a copurchase relationship tends to be transitive under most cases, i.e., if consumers a and b purchase the same products and consumers b and c purchase the same products, then consumers a and c are highly likely to do so as well.

Scale-Free Phenomena. The scale-free property is linked to the degree distribution of a graph. The degree of a vertex in a graph is the number of edges incident on that vertex. We define [p.sub.k], known as the degree distribution of the graph, to be the probability that a vertex chosen uniformly at random has degree k (i.e., the fraction of vertices that have degree k). Scale-free graphs refer to graphs with power-law degree distributions as described by (2):

[p.sub.k] ~ [k.sup.-[alpha]] (2)

where [alpha] is a positive constant. Power-law degree distributions have been observed in a wide range of networks, including many of the real networks mentioned previously.

Various graph-generation models have been explored to explain empirically observed topological characteristics of real-world networks. The classic random graph model developed by Paul Erdos and Alfred Renyi (Erdos and Renyi 1959), called the ER model, studies a graph as N labeled nodes connected by n edges that are chosen randomly from N(N - 1)/2 possible edges. In the ER model, the generation of the graphs is conditional only on the size of the graph and the vertex connection probability (or number of edges). Despite its simplicity, the ER model serves as the foundation for the analysis of statistical properties of graphs generated by a wide range of probabilistic graph-generation models. Although these models assume certain nonrandom principles governing the graph-generation process, they still contain certain random elements that differentiate them from deterministic graph models. In the literature, the term random graph is sometimes used to refer to the purely random ER model. For simplicity, in this paper we use this term in a broader sense to refer to both the purely random ER model and newer...

NOTE: All illustrations and photos have been removed from this article.



More articles from Management Science
Complex systems--a new paradigm for the integrative study of managemen..., July 01, 2007

Looking for additional articles?
Search our database of over 3 million articles.

Looking for more in-depth information on this industry?
Search our complete database of Industry & Market reports by text, subject, publication name or publication date.

About Goliath
Whether you're looking for sales prospects, competitive information, company analysis or best practices in managing your organization, Goliath can help you meet your business needs.

Our extensive business information databases empower business professionals with both the breadth and depth of credible, authoritative information they need to support their business goals. Whether it be strategic planning, sales prospecting, company research or defining management best practices - Goliath is your leading source for accurate information.