Big data – Scientific analysis.

This fourth article in the series on big data focuses on data analysis. The classical scientific method searches for causal connections by testing hypotheses. Big data analysis, by contrast, searches for trends, which are often based on correlation.

Scientific method.

The traditional approach to data analysis is based on the scientific method, a systematic way of acquiring knowledge. Its methods differ between the formal, empirical and social sciences, and insights are continually discussed and revised. In every case, however, the method must be rationally justified when the results are published, so that both the method and the results can be tested.

Causation or correlation.

The analysis of big data focuses mainly on trends and developments, which are then used to make predictions. These analyses often do not follow the principles of scientific testing, so there is a considerable risk that a correlation between two datasets is mistaken for a causal relationship. In statistics, two series of measurements, or the possible values of two random variables, are said to be correlated when there appears to be a (more or less linear) relationship between them. The strength of this relationship is expressed by the correlation coefficient.
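The correlation coefficient mentioned above is usually Pearson's r. The sketch below computes it from scratch so the idea is visible. It divides the covariance of the two series by the product of their standard deviations, which gives a value between -1 and +1. The example data (hours of screen time against a behaviour score) is purely hypothetical. A high r says only that the two series move together, not that one causes the other. In Python 3.10 and later, `statistics.correlation` from the standard library computes the same quantity.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series.

    Returns a value in [-1, 1]: +1 is a perfect increasing linear
    relationship, -1 a perfect decreasing one, 0 no linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations; the 1/n factors
    # cancel in the ratio, so they are left out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example data: hours of screen time vs. a behaviour score.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, score), 3))  # prints 0.775
```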

A well-known example of correlation is that people who watch violent films also tend to show violent behavior themselves. The question is: do people become violent because of the films, or do violent people prefer to watch violent films? Later scientific research has shown that there is no causal relationship; the cause of the behavior must be sought elsewhere.

There is a causal connection when events occur as a result of certain other events that preceded them; a cause precedes an effect. In general, three conditions must be met before one can speak of causality:

  • Covariation or correlation: the two variables change together
  • Temporal precedence: the cause comes before the effect
  • Elimination of alternative explanations: no “third” variable, such as a confounder that influences both, accounts for the relationship.
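The third condition is the one most easily overlooked. The simulation below is a minimal sketch of how a hidden third variable can produce a correlation without any causal link. The model and all numbers are hypothetical assumptions. A variable `z` (think of it as an aggressive temperament) drives both `x` (watching violent films) and `y` (violent behaviour), while `x` has no effect on `y` at all. The raw correlation between `x` and `y` is clearly positive. The partial correlation, computed by correlating what is left of `x` and `y` after a least-squares fit on `z`, drops to roughly zero.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, zs):
    """Residuals of an ordinary least-squares fit ys ~ a + b * zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    b = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum(
        (z - mz) ** 2 for z in zs
    )
    a = my - b * mz
    return [y - (a + b * z) for y, z in zip(ys, zs)]


def simulate(n=10_000, seed=42):
    """Hypothetical model: z causes both x and y; x does not cause y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]       # hidden third variable
    x = [zi + rng.gauss(0, 1) for zi in z]        # depends on z only
    y = [zi + rng.gauss(0, 1) for zi in z]        # depends on z only
    return x, y, z


x, y, z = simulate()
raw = pearson_r(x, y)                                      # roughly 0.5
partial = pearson_r(residuals(x, z), residuals(y, z))      # roughly 0.0
print(f"correlation x-y: {raw:.2f}")
print(f"after controlling for z: {partial:.2f}")
```

Controlling for `z` only works here because the simulation makes `z` observable. In real data the third variable is often unknown or unmeasured, which is why correlation alone cannot satisfy the third condition.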

Why the distinction matters.

To return to violent films: in the United States, debate arises from time to time about whether to curb violent films, music and other “violent manifestations.” Supporters point to studies that show a correlation, but not a causal relationship. Restricting freedom of speech and related freedoms in this way constitutes a fundamental violation of a constitutional right. When such decisions are made on the basis of correlation, we give up a great deal of freedom. The ban ultimately achieves nothing, because there is no causal connection.

As big data becomes increasingly important in strategic business decisions, it is essential that the conclusions drawn rest on a proper scientific analysis that establishes cause and effect. Relationships wrongly assumed to be causal can lead to completely wrong decisions, which can threaten the continuity of an organization.

So alongside all the technological solutions for collecting data, organizations will have to make room for specialists such as statisticians and econometricians. These experts can carry out sound analyses and keep the organization from being misled.

