Data as stories, models as narratives

Ever since the first humans gathered around their first fires (or even before that!), we have loved to listen to (and tell) stories, parables and narratives about real or fictional events. Beyond the important roles these activities played in facilitating social organization, there are reasons to pay particular attention to modes of storytelling if we are to understand the way science works. By that I do not mean the particular mechanics of a given scientific theory (e.g. how atoms are structured, whether a group of medications works in a disease, or even whether stimulus packages or austerity work), but rather how theories come about, how they flourish and how they are then abandoned for something else. The sociology of the scientific process has been described by Thomas Kuhn in The Structure of Scientific Revolutions, but drawing an analogy with more familiar territory may be of some value.
As anyone who has ever told a story (including but hopefully not limited to gossip) knows, the nucleus of a storytelling activity is formed by experience: things we saw, heard or observed in the world. These observable experiences usually unfold in space and time, and the first version of a story usually consists of recounting the events in the particular order we observed them. Picture a group of primitive humans describing an avalanche that starts off as a small, non-threatening snowball, then gathers size and momentum, generating lots of noise and clearing everything in its path as part of a natural land reclamation project. At this step there is no personal interpretation: the storytelling is nothing more than the (recollected) experiences, and it is likely that nothing more will ever be said about the first occurrence. However, if the experience happens to be relived, close scrutiny is very likely to follow: the events will be re-examined and debated, and the subsequent exposition will be more organized in order to reflect similarities between the two instances of the experience. A third, fourth, etc. repetition will add even more experiences to what is now perceived to be a noteworthy feature of the world, and more similarities are likely to be noted. It is also very likely that the resulting story will become even more organized, including additional interpretative elements that link observations together or even causal statements that tie together the observed sequence of events. Going back to our primitive ancestors watching physical phenomena such as avalanches and thunder, these stories now become much more like narratives: they are not just about a big, white ball or a loud noise and light in the sky, but also about the mountain deity (or Zeus himself!) who started the avalanche or threw his thunder. Hence, what started off as a local story becomes much more like an overarching statement guiding human understanding of the world and subsequent behavior.
So whenever one hears thunder (observable), one should think that Zeus is angry with the mortals (unobservable but undeniable “fact”, because all thunder is thrown by Zeus!) and go off to sacrifice (decision/action) the proverbial virgin to appease the god.

A close analogy can be drawn with an example familiar from high school physics: whenever one sees an object accelerating (observable), one should infer that a force is acting on it (remember Newton’s second law) and act accordingly (Star Trek fans would probably activate the Enterprise’s force fields). In both primitive religion (thunder/avalanche) and high school physics (acceleration), an explanation is put forward for the observable (data) in terms of unobservable variables (deity/Newtonian force), thus linking the particular case to the general explanation (model). The model does not attempt to capture ALL details of the particular experiences analyzed, only their commonalities, and it leaves some aspects unexplained (noise). As experiences and observations accumulate, so do the narratives about them, which may become both more specific (increasing quality) and more profuse (increasing quantity).
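
To make the split between model and noise concrete, here is a minimal sketch in Python. The mass, the measured accelerations and all variable names are hypothetical values chosen for illustration; the only physics assumed is Newton's second law, a = F / m, with a single constant force.

```python
import statistics

# Hypothetical observables (data): repeated acceleration measurements, in m/s^2,
# of an object whose mass is assumed to be known.
mass_kg = 2.0
observed_accelerations = [4.9, 5.2, 5.0, 4.8, 5.1]

# Model: one unobservable constant force F explains every observation via a = F / m.
# Here F is estimated from the average of the observed accelerations.
estimated_force = mass_kg * statistics.mean(observed_accelerations)
predicted_acceleration = estimated_force / mass_kg

# Noise: the part of each particular observation that the model leaves unexplained.
residuals = [a - predicted_acceleration for a in observed_accelerations]

print(f"estimated force: {estimated_force:.2f} N")
print("unexplained residuals:", [round(r, 2) for r in residuals])
```

The model keeps only what the observations have in common (one force); everything specific to a single measurement ends up in the residuals.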

How are these observations relevant to statistics? The relevance becomes apparent if one identifies narratives with statistical models that specify likelihood functions, p(D|H), linking model premises, H, to observable data, D, and identifies the refinement of narratives with the iterative process of model building and evaluation. Returning to the primitive human example about natural phenomena, we can identify:

  • real-world avalanches/thunder with the data (D)
  • the mountain deity/Zeus as the cause of the avalanche/thunder with the premise of a hypothesis or model (H). This model also specifies a narrative about the sequence of events that one should expect to experience under the hypothesis.
  • the extent to which future, yet to be experienced, avalanches/thunder match the deity/Zeus-induced ones with the likelihood of the model, p(D|H)
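
The identification above can be sketched numerically. The snippet below is only an illustration under loud assumptions: the thunder counts are made up, each narrative is reduced to a single expected number of thunderclaps per stormy night, and the counts are assumed to follow a Poisson distribution. None of these choices come from the post itself; they simply give p(D|H) a concrete form.

```python
import math


def poisson_log_likelihood(counts, rate):
    """Return log p(D|H) for independent Poisson counts with the given rate."""
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)


# Hypothetical data D: thunderclaps counted on five stormy nights.
thunder_counts = [4, 6, 5, 3, 7]

# Hypothetical narratives H, each reduced to an expected count per night.
narratives = {"angry Zeus": 5.0, "mildly annoyed Zeus": 2.0}

# The likelihood measures how well each narrative matches what was observed.
log_likelihoods = {
    name: poisson_log_likelihood(thunder_counts, rate)
    for name, rate in narratives.items()
}

for name, value in log_likelihoods.items():
    print(f"log p(D | {name}) = {value:.2f}")
```

A narrative whose expected sequence of events resembles the observed one receives a higher likelihood; with these made-up numbers, that is the narrative expecting about five thunderclaps per night.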

By making this link, one can ponder how one would choose to believe one narrative, among the many possible ones, about a particular topic of interest. This introspection is important when one considers the more abstract (but directly analogous) situation of evaluating models in the context of scientific theories (a topic that will be dealt with in a subsequent post).

