This research provides theoretical arguments and empirical evidence for how Genetic Algorithms (GA) can be used for efficient estimation of macro-level diffusion models. Using simulations, we find that GA and Sequential Search-Based Nonlinear Least Squares (SSB-NLS) provide comparable parameter estimates when the data used include peak sales, across a range of error variances and true parameter values commonly encountered in the literature. From empirical analyses, we find that the forecasting performance of the GA estimates is better than that of SSB-NLS, Augmented Kalman Filter, Hierarchical Bayes, and Kalman Filter when only pre-peak sales data are available for estimation. When sales data until the peak time period are available for estimation, SSB-NLS is able to obtain parameter estimates when the starting values provided are the GA estimates. The estimates from GA are not biased and do not change in a systematic fashion when post-peak sales data are used, whereas the estimates from SSB-NLS are biased and change in a systematic fashion. In summary, we find that GA may be better suited for diffusion-model estimation under the three conditions where SSB-NLS has been found to have problems.

By Rajkumar Venkatesan, Trichy V. Krishnan, and V. Kumar

Originally published in “Marketing Science,” Volume 23, Number 3, Summer 2004

### 1. Introduction to Research

Data available for estimation of the Bass model or its extensions are usually restricted to a set of 12 to 15 observations. There are two reasons for this. First, sales data are typically collected annually to avoid within-year fluctuations in sales and seasonality issues. Second, sales of most new products tend to stop growing, and in fact start decreasing, after 7 to 10 years. Because a manager’s interest is very likely to diminish after the growth stage of a new product, researchers have to work with a smaller dataset in many cases. A natural outcome of this problem is researchers’ interest in exploring increasingly sophisticated estimation techniques that can extract as much information as possible from small datasets. Alternatively, researchers have also investigated using information such as advance purchase orders (Moe and Fader 2002) and spatial dimensions of product adoption (Garber et al. 2003) for early prediction of new product sales.

The chronology of the various estimation techniques and the benefits and drawbacks of each method are outlined in Table 1. Of the three time-invariant estimation techniques employed in the diffusion literature, namely, Ordinary Least Squares (OLS), Maximum Likelihood (ML), and Nonlinear Least Squares (NLS), it is generally accepted that NLS is the best option among the current alternatives (Putsis and Srinivasan 2000). For the Bass (1969) model, NLS is applied to the equation

$$s(t) = m\,[F(t) - F(t-1)] + \varepsilon(t),$$

where $s(t)$ is the sales function, $m$ is the market potential parameter, $\varepsilon(t)$ is the normal additive error, and $F(t)$ is the cumulative distribution function of the time of adoption, given by

$$F(t) = \frac{1 - e^{-(p+q)t}}{1 + (q/p)\,e^{-(p+q)t}},$$

with $t$ = time period, $p$ = coefficient of innovation, and $q$ = coefficient of imitation.
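As a rough illustration of the model just described, the sketch below computes the Bass cumulative adoption function and the expected sales per period, assuming the standard closed form F(t) = (1 - exp(-(p+q)t)) / (1 + (q/p) exp(-(p+q)t)). The function names and the parameter values are hypothetical and chosen only for illustration; they are not taken from the study's datasets.

```python
import math

def bass_cdf(t, p, q):
    """Cumulative fraction of adopters F(t) under the Bass (1969) model."""
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)

def bass_sales(n_periods, m, p, q):
    """Expected sales for periods 1..n_periods: m * [F(t) - F(t-1)]."""
    return [m * (bass_cdf(t, p, q) - bass_cdf(t - 1, p, q))
            for t in range(1, n_periods + 1)]

# Hypothetical parameter values, chosen only for illustration.
m, p, q = 1000.0, 0.03, 0.38
sales = bass_sales(15, m, p, q)

peak_period = sales.index(max(sales)) + 1   # period with the highest expected sales
peak_time = math.log(q / p) / (p + q)       # continuous-time peak of the Bass curve
print(f"peak period: {peak_period}, continuous peak time: {peak_time:.2f}")
```

With 12 to 15 annual observations, a series like `sales` is roughly the amount of data an estimation routine has to work with, which is why the position of the peak within the sample matters so much.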

| Author | Estimation Method | Advantages | Drawbacks |
|---|---|---|---|
| Bass (1969) | Ordinary least squares (OLS) | Easy and straightforward implementation. Good fit. | Discrete form for a continuous process. No standard errors for the parameter estimates. Parameters may sometimes be outside the allowable range. |
| Schmittlein and Mahajan (1982) | Maximum likelihood (ML) | Continuous form operationalization. Minimizes sampling error. | Not efficient in reducing errors from sources other than sampling. |
| Srinivasan and Mason (1986); Jain and Rao (1990) | Nonlinear least squares (SSB-NLS) | Continuous form operationalization. Provides better fit than other methods, especially maximum likelihood. | Problems with convergence when data do not contain peak time period. Bias and systematic change in parameter estimates. |
| Lenk and Rao (1990) | Hierarchical Bayes (HB) | Provides good predictions of future sales including before peak time period when compared with ML. Utilizes variation in previous diffusion histories to avoid convergence in local minima. | Distribution assumptions about parameters. Lower accuracy when diffusion curves are skewed. Not easy for practitioners to use. |
| Xie et al. (1997) | Augmented Kalman filter (continuous observation–discrete measurement) | Provides better predictions of future sales including before peak time period when compared with adaptive filter, SSB-NLS, OLS, and ML. | Does not provide as good a forecast as the GA estimates (shown in the present study). Not easy for practitioners to use. |
| Present study | Genetic algorithm (GA) | Addresses all three problems associated with SSB-NLS. Consistently performs better than other estimation methods. | Needs specific software. |

NLS as implemented in popular computer packages employs a sequential search technique to obtain parameter estimates. This widely used sequential search-based (SSB) NLS runs into a problem at each of the three stages of a diffusion curve: pre-peak sales, peak sales, and post-peak sales (see Figure 1). With pre-peak sales data, SSB-NLS has been repeatedly found not to achieve convergence (Srinivasan and Mason 1986, Lenk and Rao 1990). With data up to peak sales, the convergence of SSB-NLS depends largely on the initial values provided for the parameters. With post-peak sales data, the SSB-NLS estimates of the Bass model have been found to be biased and to change systematically as datapoints from later years are added (Van den Bulte and Lilien 1997, Bemmaor and Lee 2002, Venkatesan et al. 2000).
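The dependence on starting values can be explored with a small experiment. The sketch below is only a stand-in: it uses a simple compass (pattern) search rather than the Gauss-Newton or Marquardt-type routines found in statistical packages, and the simulated data, noise level, and starting points are all hypothetical. It fits the Bass model to pre-peak data from several starting values so that the resulting estimates and error sums can be compared; they may or may not agree across starts.

```python
import math
import random

def bass_sales(n_periods, m, p, q):
    """Expected Bass-model sales for periods 1..n_periods."""
    def cdf(t):
        decay = math.exp(-(p + q) * t)
        return (1.0 - decay) / (1.0 + (q / p) * decay)
    return [m * (cdf(t) - cdf(t - 1)) for t in range(1, n_periods + 1)]

def sse(params, observed):
    """Sum of squared errors between observed and fitted sales."""
    m, p, q = params
    fitted = bass_sales(len(observed), m, p, q)
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))

def local_search(start, observed, n_iter=2000):
    """Compass search: step one parameter at a time, keep only improvements."""
    current = list(start)
    best = sse(current, observed)
    steps = [0.1 * abs(v) for v in current]
    for _ in range(n_iter):
        improved = False
        for i in range(3):
            for direction in (1.0, -1.0):
                trial = list(current)
                trial[i] += direction * steps[i]
                if trial[i] <= 0.0:            # keep m, p, q strictly positive
                    continue
                value = sse(trial, observed)
                if value < best:
                    current, best, improved = trial, value, True
        if not improved:
            steps = [s / 2.0 for s in steps]   # no gain at this scale: refine the steps
    return current, best

# Simulated pre-peak data: 5 periods from hypothetical true values, plus noise.
rng = random.Random(0)
observed = [s + rng.gauss(0.0, 2.0) for s in bass_sales(5, 1000.0, 0.03, 0.38)]

# Three hypothetical starting points for (m, p, q).
starts = [(500.0, 0.01, 0.2), (2000.0, 0.05, 0.5), (5000.0, 0.005, 0.8)]
results = [local_search(s, observed) for s in starts]
for start, (params, value) in zip(starts, results):
    print(start, "->", [round(v, 4) for v in params], "SSE:", round(value, 3))
```

Because the search only ever accepts improvements, each run ends at a point that is locally no worse than its start; nothing guarantees that the three runs end at the same point.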

In §2, we provide theoretical arguments and intuition for how GA is, under certain circumstances, able to arrive at globally optimal parameter estimates more efficiently even when the response surface is multimodal and noisy. We also show how SSB-NLS can converge to a locally optimal solution in these cases. In §3, using simulated data, we show that the estimates from GA are similar to estimates from SSB-NLS under commonly encountered error variances and parameter values, provided full datasets are used for estimation. Then, using empirical datasets, we compare the performance of GA with SSB-NLS and other techniques proposed in the literature when the data do not contain peak sales, when there are data until peak sales, and when datapoints are added sequentially to post-peak sales data. Based on the results of our analyses in §4 and Appendix 2, we conclude that GA is able to produce better parameter estimates than SSB-NLS, as evidenced by lower Mean Squared Error (MSE) and Mean Absolute Deviation (MAD) under the three data-related scenarios mentioned above.
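To make the general idea concrete, here is a minimal sketch of a real-coded GA that minimizes the sum of squared errors of the Bass model, followed by MSE and MAD on a holdout sample. This section does not describe the study's GA settings, so the encoding, operators (tournament selection, arithmetic crossover, Gaussian mutation, elitism), population size, search bounds, and simulated data below are all assumptions made for illustration.

```python
import math
import random

def bass_sales(n_periods, m, p, q):
    """Expected Bass-model sales for periods 1..n_periods."""
    def cdf(t):
        decay = math.exp(-(p + q) * t)
        return (1.0 - decay) / (1.0 + (q / p) * decay)
    return [m * (cdf(t) - cdf(t - 1)) for t in range(1, n_periods + 1)]

def sse(params, observed):
    """Sum of squared errors between observed and fitted sales."""
    fitted = bass_sales(len(observed), *params)
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))

# Hypothetical search bounds for (m, p, q); not taken from the study.
BOUNDS = [(100.0, 5000.0), (0.001, 0.2), (0.05, 1.0)]

def genetic_search(observed, pop_size=40, generations=150, seed=0):
    """Real-coded GA; returns the best individual and the best SSE per generation."""
    rng = random.Random(seed)
    fitness = lambda ind: sse(ind, observed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [list(best)]                    # elitism: carry the best forward
        while len(children) < pop_size:
            a = min(rng.sample(pop, 2), key=fitness)   # tournament selection
            b = min(rng.sample(pop, 2), key=fitness)
            w = rng.random()
            child = [w * x + (1.0 - w) * y for x, y in zip(a, b)]  # arithmetic crossover
            for i, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < 0.2:             # Gaussian mutation on some genes
                    child[i] += rng.gauss(0.0, 0.1 * (hi - lo))
                child[i] = min(hi, max(lo, child[i]))  # clip to the search bounds
            children.append(child)
        pop = children
        best = min(pop, key=fitness)
        history.append(fitness(best))
    return best, history

def mse(actual, forecast):
    """Mean squared error of the forecasts."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def mad(actual, forecast):
    """Mean absolute deviation of the forecasts from the actual values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

# Simulated data with hypothetical true values; fit on 6 periods, forecast the next 4.
noise = random.Random(1)
full = [s + noise.gauss(0.0, 2.0) for s in bass_sales(10, 1000.0, 0.03, 0.38)]
train, holdout = full[:6], full[6:]
estimate, history = genetic_search(train)
forecast = bass_sales(10, *estimate)[6:]
print("estimate (m, p, q):", [round(v, 4) for v in estimate])
print("holdout MSE:", round(mse(holdout, forecast), 2),
      "MAD:", round(mad(holdout, forecast), 2))
```

Unlike a single sequential search, the population samples the whole bounded parameter space at once, which is the intuition behind its robustness to poor starting values; the elitism step ensures the best error found never gets worse from one generation to the next.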