When I begin working with a client who is unfamiliar with my applied view of complexity science, one of the first questions they ask is "Why can't I do this with a spreadsheet?" If you haven't read it already, here is my answer to that one.
The second question in analytically sophisticated organizations is usually "How is this different from a multi-regression or econometric model?" That's the subject of this post.
Statistical multi-regression models and System Dynamics or other "complexity science" models are complementary approaches to understanding how a system works.
When there is significant historical data to examine and to guide our development of a model, statistical approaches such as multi-regression modeling can be applied. Regression models depend upon identifying correlations between variables that imply viable causal relationships. For example, in a simple regression model to explain ice cream consumption we might observe that consumption is correlated with daytime high temperatures. This makes causal sense to us, and we can show via various statistical techniques that it is a valid hypothesis.
However, when creating regression models of complex or poorly understood situations, it is easy to identify correlations that suggest relationships that do not, in fact, exist. For example, taking our ice cream example a step further, we might observe that the petty crime rate is correlated with ice cream consumption. Are we to believe that ice cream consumption causes a higher crime rate? Of course not. Petty crime is driven by more people being out and about, which is in turn driven by the seasonal weather pattern.
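To make the ice cream and petty crime point concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the coefficients, the noise levels, and the variable names are synthetic, not measurements. Temperature drives both series, and neither series affects the other. The raw correlation between them still comes out strong, and it largely disappears once the temperature effect is removed from each.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """Residuals of the least-squares line ys ~ a + b * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = my - b * mx
    return [y - (a + b * x) for x, y in zip(xs, ys)]


rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic data (assumed for illustration): temperature is the common driver.
temps = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temps]
petty_crime = [1.5 * t + rng.gauss(0, 10) for t in temps]

# Correlation a naive analysis would find between the two series.
raw_corr = pearson(ice_cream, petty_crime)

# Correlation that remains after removing the temperature effect from both.
partial_corr = pearson(residuals(ice_cream, temps), residuals(petty_crime, temps))

print(f"raw correlation:                 {raw_corr:.2f}")
print(f"after controlling for temperature: {partial_corr:.2f}")
```

The catch, of course, is that controlling for temperature only works if you already suspect that temperature is the common driver. The statistics cannot supply that causal hypothesis for you.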
Further, many common causal relationships are difficult or impossible to represent within the framework of a regression model. Causal structures that include time delays between cause and effect; non-linearity between cause and effect; and, most importantly, feedback relationships between cause and effect will not be clearly represented in a regression model. In these cases, which include most of the interesting business questions, a lot of useful information about the system cannot be captured by a regression model. This limits what a statistical approach can accomplish, both in terms of understanding and in terms of prediction.
A System Dynamics (SD) model is based on an explicit causal hypothesis that is developed by the modeling team. A statistical analysis can suggest aspects of the causal hypothesis. As part of the development of the model, the team must also show that the SD model does in fact produce the correlations that are observed via statistical analysis. Other clues about the underlying causal structure come from direct observation of the system and from anecdotal evidence. The causal structure of the model must be consistent with the “physics” of the real world that we are modeling, and a set of tests can be devised to ensure that this is so. SD models can easily represent time delays, nonlinearities, and feedback relationships, and do it in a way that is easily understood by non-modelers.
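For readers who have never seen one, here is a minimal sketch of what a stock-and-flow model looks like when written out as code rather than drawn in an SD tool. It is a hypothetical word-of-mouth adoption model, not one from a client engagement, and every parameter value and name (`contact_rate`, `onboarding_delay`, and so on) is an assumption chosen for illustration. It contains the three ingredients named above: a feedback loop (existing customers recruit new ones), a nonlinearity (adoption depends on the product of customers and remaining prospects), and a time delay (new adopters spend time onboarding before they become active customers). It uses simple Euler integration with a fixed time step.

```python
def simulate(total=10_000.0, seed_customers=10.0, contact_rate=0.8,
             onboarding_delay=4.0, dt=0.25, weeks=60):
    """Euler simulation of a three-stock adoption model.

    Stocks: potential -> onboarding -> customers (people, conserved).
    All parameter values are illustrative assumptions.
    """
    potential = total - seed_customers
    onboarding = 0.0
    customers = seed_customers
    history = []

    for step in range(int(weeks / dt) + 1):
        history.append((step * dt, potential, onboarding, customers))

        # Flows, in people per week.
        # Feedback + nonlinearity: word of mouth needs both customers and prospects.
        adoption = contact_rate * customers * potential / total
        # First-order delay: onboarding takes onboarding_delay weeks on average.
        activation = onboarding / onboarding_delay

        # Update the stocks; whatever leaves one stock enters the next.
        potential -= adoption * dt
        onboarding += (adoption - activation) * dt
        customers += activation * dt

    return history


history = simulate()

# A simple "physics" test: people are neither created nor destroyed.
for week, potential, onboarding, customers in history:
    assert abs(potential + onboarding + customers - 10_000.0) < 1e-6
    assert min(potential, onboarding, customers) >= 0.0

# Print the customer stock every 10 weeks.
for week, potential, onboarding, customers in history[::40]:
    print(f"week {week:5.1f}: customers = {customers:8.1f}")
```

The conservation check in the middle is a small instance of the kind of test mentioned above: if the model ever creates or loses people, the causal structure or its implementation is wrong, no matter how well the output fits the historical data.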
In fact, we find that one of the most valuable aspects of the approach is that the management team, not just the modeling team, can view, understand, and challenge the underlying causal hypothesis of an SD model. Try that with a regression model! An SD model produces actionable insights by building the understanding of the relevant system within the management team. This in turn surfaces new opportunities and questions about the business at hand. In this way an SD model is far, far superior to a statistical model. It also produces time series for historical “goodness of fit” type tests and makes predictions about the future just as you would do with a statistical model.
I love a story, possibly apocryphal, that the late Barry Richmond used to tell that illustrates some of the differences between the approaches. An econometrician wrote a model to describe milk production. He used a variety of “right hand side” variables including GDP, weather, and so forth and he was able to get an excellent fit on the historical time series. Nowhere in his model was there any mention of cows. Knowledge of cows was not needed to model milk production! An SD modeler approaching this problem would start by observing that the physics of the case is that cows produce milk and go from there.
In practice, when starting to work with a new client I often need to model a system for which there is little or no data available. In this case we turn things around a bit and use an explicit causal hypothesis, incorporated into a simulation model, to specify a data collection process. This is like creating a "hold-out" sample in advance of the data collection and provides a robust test of the causal hypothesis and model implementation. As observations are collected, they can be compared with the predictions made by the simulation and analyzed statistically.
In closing, the high-power approach is to use statistical techniques in conjunction with complexity science techniques to gain actionable insights into the system that is driving your results.
