Article Excerpt
This paper introduces a form of neural network known as the self-organising map (SOM), which has been used extensively outside of marketing. The SOM clusters data in a manner similar to cluster analysis, but has the additional benefit of ordering the clusters, enabling the visualisation of large numbers of clusters. The technique is particularly well suited to the analysis of large datasets.
Introduction
The importance of exploratory data analysis as a means of gaining insights and ensuring the correct specification of statistical models has long been recognised (Ehrenberg 1975; Tukey 1977). Its employment in 'data mining' has seen exploratory data analysis become an integral part of modern quantitative marketing practice. The large number of variables we commonly employ in marketing applications presents a great challenge, as the problem of how best to describe multivariate data is far from being well understood.
The last two decades have seen a dramatic growth in techniques aimed at describing multivariate data. Among academic marketers, the key focus has been on the application of clustering algorithms and mixture models (see Wedel & Kamakura 2000). While the popularity of these techniques is testimony to their usefulness, they have largely been developed with market segmentation issues in mind, rather than the goal of providing the analyst with an understanding of the data. (1)
Graphs such as histograms, scatter plots and box plots are generally only useful for understanding the relationship between small numbers of variables. Some approaches have been developed for understanding datasets containing more than three variables, such as Chernoff Faces (Chernoff 1973; Huff et al. 1981) and Andrews Curves (Andrews 1972; Darden & Flaschner 1974), but these methods cannot be used with more than a few dozen data points, as they require the analyst to compare pictographs representing each of the data points.
In marketing, where we are normally faced with a minimum of a few hundred data points, we traditionally attempt to understand multivariate data using factor analysis, cluster analysis, correspondence analysis, or by 'jumping in' and specifying regression models. This paper demonstrates a technique known as the self-organising map, a form of unsupervised neural network developed by Teuvo Kohonen (1982), which has great potential for application in exploratory data analysis in marketing. While rarely used in marketing (for an exception, see Curry et al. 2003), self-organising maps have been used in more than 3000 studies in other fields, such as acoustics, robotics, physics and neurophysiology (Kohonen 1995; Grabowski 1998).
The paper is organised as follows. First, self-organising maps are described. Next, an empirical comparison of self-organising maps with cluster analysis and correspondence analysis is presented. Finally, conclusions and directions for future research are discussed.
Self-organising maps
The self-organising map (SOM) is a technique that has been widely adopted in computer science. In this paper, the SOM is presented as a type of cluster analysis, where the clusters are ordered to provide an understanding of the relationship between different clusters. As such, the method makes all of the assumptions of most types of k-means cluster analysis (e.g. two-mode interval-scale data, Euclidean distances). Many representations of the SOM make stronger claims for its use, arguing that it is a model of the brain (Kohonen 1995), a mapping algorithm with similar applications to MDS (hence the name, Chen et al. 1995; Kaski & Kohonen 1996), a clustering algorithm and an algorithm that simultaneously maps and clusters (Kohonen 1995; Murtagh 1996). These stronger claims are not explored in this paper.
SOMs attempt to group 'similar' observations. In marketing, the observations most commonly represent people or organisations. How SOMs work is best understood by comparison to a form of cluster analysis known as batch k-means. The following five steps show both the mechanics of SOMs and of batch k-means. The text in italics describes the mechanics of batch k-means. The text in its entirety (i.e. italic and roman) shows the mechanics of SOMs.
Step 1: Specify the number of clusters (i.e. groups of observations) that are required. These clusters are pre-ordered on a lattice (similar to the lattices used to train vines in gardens). Figure 1 shows 12 clusters, each represented by a hexagon, with the 12 clusters ordered on a lattice containing three rows and four columns. (Within the literature on SOMs, the term 'neuron' is generally used instead of 'cluster'.)
Step 2: Randomly assign each observation in the dataset to one cluster.
Step 3: Calculate the average value of the observations in each cluster on each variable. For example, if five pubs have been assigned to a cluster, and they have purchased 0, 10, 12, 18 and 0 kegs of Guinness respectively, the average for the cluster is 8 kegs. This average value of a cluster on each of the variables is referred to...
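Steps 1 to 3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the paper: the function names are hypothetical, the lattice is stored simply as (row, column) positions, and the keg figures include two zero purchases that are an assumption chosen so that the five values average to the 8 kegs quoted in the text.

```python
import random
from statistics import mean


def make_lattice(rows, cols):
    """Step 1: one cluster per lattice position, identified by (row, column)."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def random_assignment(n_observations, clusters, rng):
    """Step 2: assign every observation to one randomly chosen cluster."""
    return [rng.choice(clusters) for _ in range(n_observations)]


def cluster_averages(data, assignment):
    """Step 3: average each variable over the observations in each cluster."""
    members = {}
    for observation, cluster in zip(data, assignment):
        members.setdefault(cluster, []).append(observation)
    return {
        cluster: [mean(column) for column in zip(*observations)]
        for cluster, observations in members.items()
    }


# Assumed data: kegs purchased by five pubs (one variable per observation).
kegs = [[0.0], [10.0], [12.0], [18.0], [0.0]]

lattice = make_lattice(3, 4)  # 12 clusters, as in Figure 1
assignment = random_assignment(len(kegs), lattice, random.Random(0))
averages = cluster_averages(kegs, assignment)

# If all five pubs fall in the same cluster, its average is 8 kegs.
same_cluster = cluster_averages(kegs, [lattice[0]] * len(kegs))
```

The lattice positions play no part in these three steps; they only identify the clusters, which is why the same sketch serves for both batch k-means and the SOM up to this point.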