IP90 and the Datasaurus: The dangers of summary statistics

June 13, 2017 by

I recently read the article Same Stats, Different Graphs, which discusses Anscombe’s Quartet: four very different datasets that look nearly identical when described by summary statistics alone. The article then takes the idea to the next level with the Datasaurus Dozen, a collection of highly varied datasets that all produce the same summary statistics.

Datasaurus. Data source: Alberto Cairo

The article states: “It can be difficult to demonstrate the importance of data visualization. Some people are of the impression that charts are simply ‘pretty pictures’, while all of the important information can be divined through statistical analysis.” It also references Alberto Cairo, who created the Datasaurus dataset to urge people to “never trust summary statistics alone; always visualize your data”, because visualization can reveal valuable insights that would otherwise be missed.
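To make the point concrete, here is a minimal Python sketch (my own illustration, not code from the article) that computes the usual summary statistics for Anscombe’s published quartet. The helper names `pearson` and `summarize` are hypothetical, and the data values are the ones from Anscombe’s 1973 paper as I have them.

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973): four small x/y datasets.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, written out to avoid extra dependencies."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Collect the summary statistics that are usually reported for x/y data."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),  # sample variance
        "var_y": variance(ys),
        "r": pearson(xs, ys),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, s in summaries.items():
    print(name, {key: f"{value:.2f}" for key, value in s.items()})
```

Every dataset reports a mean of y of about 7.5 and a correlation of about 0.82, yet a scatter plot of each one looks completely different. The numbers alone give no hint of that.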

This inspired me to revisit my 2015 blog post, How useful are IP30, IP60, IP90 … initial production measures, where I illustrated how two wells with identical IP90 production performance measures had very different production profiles.

To further illustrate the dangers of using near-term production performance measures like IP90 without visualizing the production, I’ve expanded the dataset from my previous blog to show just how different the production profiles are for 33 Montney Regional Heritage wells that have essentially the same IP90 (between 4,400 and 4,500 mcf/day). The chart on the left shows the rate vs. time profile of each well, while the chart on the right shows cumulative production vs. time.

IP90 is the average daily production rate over the first 2,160 hours (90 days × 24 hours) of production. In our sample set of 33 wells, it takes anywhere from 3 to 22 months to accumulate 2,160 producing hours, and the cumulative production after 24 months varies more than fivefold, from 516 mmcf to 2,693 mmcf. This further demonstrates that IP90 is not a reliable indicator of longer-term production. The need to visualize the production profile is also becoming increasingly important as rate restriction becomes a more common operational practice (look for wells without an obvious decline profile in the chart on the left).
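As a rough sketch of how an IP90-style measure can be calculated, the Python below accumulates monthly volumes until 2,160 producing hours are reached. It rests on several assumptions of mine: production is reported monthly as (volume in mcf, hours on production) pairs, the final month is prorated at a constant rate, and the function name `ip_rate` and both wells are hypothetical, not taken from the 33-well dataset.

```python
IP90_HOURS = 90 * 24  # 2,160 producing hours


def ip_rate(monthly, target_hours=IP90_HOURS):
    """Average daily rate (mcf/day) over the first `target_hours` producing hours.

    `monthly` is a list of (volume_mcf, hours_on) tuples in calendar order.
    Returns (rate, calendar_months_needed), or None if the well has not yet
    accumulated `target_hours` of production.
    """
    volume = 0.0
    hours = 0.0
    for month_number, (vol, hrs) in enumerate(monthly, start=1):
        if hrs <= 0:
            continue  # shut-in month: calendar time passes, producing time does not
        if hours + hrs >= target_hours:
            # Assumption: constant rate within the month, so prorate by hours.
            fraction = (target_hours - hours) / hrs
            volume += vol * fraction
            return volume / (target_hours / 24), month_number
        volume += vol
        hours += hrs
    return None


# Hypothetical well A: produces around the clock and declines steeply.
well_a = [(160_000, 720), (134_000, 720), (106_000, 720),
          (90_000, 720), (80_000, 720), (72_000, 720)]
# Hypothetical well B: on production half the time at a restricted, flatter rate.
well_b = [(70_000, 360), (68_000, 360), (67_000, 360),
          (66_000, 360), (65_000, 360), (64_000, 360)]

for name, well in (("A", well_a), ("B", well_b)):
    rate, months = ip_rate(well)
    six_month_cum = sum(vol for vol, _ in well[:6])
    print(f"Well {name}: IP90 = {rate:.0f} mcf/day, reached in {months} months, "
          f"6-month cumulative = {six_month_cum:,} mcf")
```

Both made-up wells land on the same IP90, but one gets there in three calendar months and the other in six, and their six-month cumulative volumes differ substantially. That is the same effect the charts show for the real wells.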

IP90 is sometimes used as a proxy for the long-term production potential of a well (i.e. Estimated Ultimate Recovery, or EUR). It’s important to note that the uncertainty of any predictive measure decreases as you include more production history. The challenge is often to find the right balance: include enough history to have reasonable certainty, but not so much that more recent wells are excluded from your analogue selection. The table below is useful in deciding how much history you should use to reduce the uncertainty of your production performance measure (full details of this correlation analysis can be found in my previous blog).


  1. More data is better
  2. Don’t rely on summary statistics alone
  3. Visualize your data, from multiple perspectives, to elevate your understanding