Compared with in-house methods, machine learning models that forecast production for PDP or PUD oil and gas wells may increase accuracy, save engineering time, or replace deterministic models. These are good reasons to switch to machine learning models and, not coincidentally, they are often the focal points of machine learning sales pitches.

However, when it comes to evaluating the utility of a machine learning model, uncertainty quantification is critical, yet it is often completely absent from machine learning product discussions.

To better understand the importance of uncertainty quantification, I will describe how Novi creates individual well-level prediction bands that give relative confidence in each forecast and describe why it is so important to separate error scores on whole datasets from the expected error on a single forecast.

## In this Post

- Differentiating Single Well Uncertainty from Large Well-count Total Error
- Computing Well-level PUD & PDP Oil and Gas Forecast Uncertainty with Prediction Bands
- The Novi approach to PUD & PDP Oil and Gas forecast uncertainty quantification
- The Novi Forecast Method Versus Traditional Approaches
- Conclusions

## Differentiating Single Well Uncertainty from Large Well-count Total Error

**The Folly of Focusing on Total Error**

For starters, let’s imagine someone is trying to sell you their product and tells you that the error on their machine learning model for your use case could be as low as 15%. Now, let’s also imagine the first question you ask is “What metric are you using?”, and they answer “adjusted mean absolute percent error on a hold-out dataset”.

It might be convenient to dismiss the whole proposition because your own adjusted mean absolute percent error is 10% or, conversely, you might jump at the chance to buy this product because your error is actually 30%.

While your benchmark is absolutely important in this evaluation, what you should be doing is asking more questions about that hold-out dataset, the method for generating it, and what the individual error bands look like for a forecast starting from today and extending into the future.

By directing the conversation towards methodology and ensuring that error is evaluated both in the aggregate on collections of wells and on each individual well’s forecast, you might soon realize that the 15% error quoted is the average error on cumulative production for all IP days across 1,000 wells, computed on past production by a model trained on similar wells over the same time span. But the product cannot provide you any level of uncertainty for a single well’s forecast.

If your goal is to assess risk on your total inventory and you do not care about each individual well forecast, then maybe this approach works for you. However, in almost all other cases, this approach is inadequate and misleading.
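To make the gap between aggregate and well-level error concrete, here is a minimal sketch in Python. The six wells and their volumes are invented for illustration, and plain mean absolute percent error stands in for whatever adjusted metric a vendor might actually quote.

```python
import statistics


def absolute_percent_errors(actual, predicted):
    """Return the absolute percent error for each well, in percent."""
    return [abs(p - a) / a * 100.0 for a, p in zip(actual, predicted)]


# Hypothetical cumulative oil volumes (bbl) for six wells; invented numbers.
actual = [100_000, 120_000, 80_000, 150_000, 90_000, 110_000]
predicted = [98_000, 123_000, 81_000, 147_000, 135_000, 77_000]

errors = absolute_percent_errors(actual, predicted)
mean_error = statistics.mean(errors)  # the single number in the sales pitch
worst_error = max(errors)             # what one unlucky well actually sees
wells_within_mean = sum(e <= mean_error for e in errors)

print(f"mean absolute percent error: {mean_error:.1f}%")
print(f"worst single-well error:     {worst_error:.1f}%")
print(f"wells at or below the mean:  {wells_within_mean} of {len(errors)}")
```

In this toy dataset the average lands just under 15%, yet two of the six wells miss by 30% and 50%. The aggregate number is true and still tells you very little about any one forecast.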

**You Should Be Asking For (Forecast) Uncertainty**

Among the many problems with this approach are that 1) it will lead you to believe your forecast accuracy is higher than it would be if your evaluation were actually based on a simulated, forward-looking prediction of future, held-out data; and, 2) it doesn’t tell you what the expected forecast error for a single well would look like or how much you should trust it.

For example, suppose you are interested in a single PDP well and want to understand the uncertainty of that specific forecast, e.g. “how much could the forecast be wrong in the positive or negative direction?”, or, “how much risk is there in using the volumetric forecast to project future revenue?”. Without a quantification of forecast uncertainty, you are forced to make your own using crude estimates. Since you have a 15% anchor in-hand, you’re left to wonder, “Is it acceptable to just add/subtract 15% to get the interval?”. If not that, what is available in your data to even make a rudimentary guess about the error?

Most engineers have used confidence intervals that bracket certain types of predictions. These are great for evaluating what happened in the past or for mapping static inputs to noisy outputs. However, it is not simply a matter of computing confidence intervals on past production and translating those to your forecast. This is because 1) your forecast uncertainty should grow as a function of time because your errors compound with each new forecast step (see Figure 1c); and, 2) you would actually be using the error on your training data, which you have complete knowledge of, to estimate the error on unknown projections. Spoiler alert: This will not work out as expected. Hence, you need a prediction band that brackets each forecast step with some prescribed level of confidence: in effect, a confidence interval built strictly for a sequence of forecasts!
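The compounding effect is easy to see in a toy simulation. The sketch below assumes, purely for illustration, that every forecast step adds an independent Gaussian error in log space; real forecast errors are unlikely to be this tidy, and the step size of 0.05 is an arbitrary choice.

```python
import random
import statistics


def error_spread_by_step(n_paths=2000, n_steps=12, step_sigma=0.05, seed=0):
    """Standard deviation of cumulative log error at each forward step.

    Toy assumption: every forecast step adds an independent Gaussian error
    in log space, so errors accumulate like a random walk.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        cumulative, path = 0.0, []
        for _ in range(n_steps):
            cumulative += rng.gauss(0.0, step_sigma)
            path.append(cumulative)
        paths.append(path)
    return [
        statistics.pstdev([path[step] for path in paths])
        for step in range(n_steps)
    ]


spread = error_spread_by_step()
for step in (1, 6, 12):
    print(f"step {step:2d}: spread of log error = {spread[step - 1]:.3f}")
```

Under this independence assumption the spread grows roughly with the square root of the number of steps, so a single fixed-width interval would be too wide early in the forecast and too narrow late in it.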

Novi Labs built our prediction band technology to address these problems. It is specifically designed to give the most realistic depiction of forecasting into the future with a proprietary prediction band generation algorithm that is designed to give you a sense for how confident we feel about one forecast versus another.

Our thesis is simple – we believe that the real question is, “How confident am I in the forecast for well “A” given its proposed engineering design and subsurface conditions, versus well “B” given its proposed engineering design and subsurface conditions?”. This is the elemental question that should drive capital allocation decisions – a question of forecasting many different scenarios while at the same time understanding the risk.

Now, you’re probably wondering, “How exactly do we do this?”, so let me explain.

## Computing Well-level PUD & PDP Oil and Gas Forecast Uncertainty with Prediction Bands

### Step 1: Label Data by Date and Retain Date Throughout Fitting Process

First, our data ecosystem is built to partition thousands of individual well-level production histories by date, so at any given point in time we separate past production from future production. This allows us to artificially create a time machine that mimics the real-world use case to ask hypothetical questions such as “If I built this model in 2017, how well would it perform in 2018?” or “How did production from 2018 change which wells I’m most uncertain about?”. We utilize this time machine to march forward from the beginning of your data to the end and, in the process, determine which hyperparameters to use and assess the aggregate-level and well-level errors.
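As a rough sketch of the idea (not our production code), a walk-forward split over dated records might look like the following. The record layout, well identifiers, and cutoff dates are all hypothetical.

```python
from datetime import date


def split_by_cutoff(records, cutoff):
    """Split (well_id, month, volume) records into past and future."""
    past = [r for r in records if r[1] <= cutoff]
    future = [r for r in records if r[1] > cutoff]
    return past, future


def walk_forward(records, cutoffs):
    """Yield (cutoff, past, future) for each cutoff, oldest first."""
    for cutoff in sorted(cutoffs):
        past, future = split_by_cutoff(records, cutoff)
        yield cutoff, past, future


# Hypothetical monthly volumes for two wells; identifiers and values invented.
records = [
    ("well_a", date(2016, 12, 1), 900.0),
    ("well_a", date(2017, 6, 1), 700.0),
    ("well_a", date(2018, 6, 1), 500.0),
    ("well_b", date(2017, 9, 1), 1200.0),
    ("well_b", date(2018, 3, 1), 950.0),
    ("well_b", date(2018, 12, 1), 800.0),
]
cutoffs = [date(2017, 12, 31), date(2016, 12, 31), date(2018, 6, 30)]

folds = list(walk_forward(records, cutoffs))
for cutoff, past, future in folds:
    print(cutoff, "train on", len(past), "records, evaluate on", len(future))
```

Each fold answers one “If I built this model at the cutoff, how would it have done afterwards?” question, and no record dated after the cutoff ever leaks into the training side.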

### Step 2: Compute Standard Errors on Full Dataset, PUD & PDP Oil and Gas Wells

Second, after we finish training a model, we can easily break down error according to whatever bucket you like and aggregate it via whichever methodology makes the most sense for your use case. For example, perhaps you are most curious about the IP Day-based production rate median percent error for all wells extending one quarter into the future. This is no problem for our software and is exactly the sort of thing we constantly do to compare new models to old models. However, our work does not stop there.
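For a flavor of what “bucket and aggregate” means here, the sketch below computes a median percent error per 90-day bucket of well life. The rows, field names, and the choice of a 90-day quarter are assumptions made for this example.

```python
from collections import defaultdict
import statistics


def median_percent_error_by_bucket(rows, bucket_key):
    """Group signed percent errors by bucket and return each bucket's median."""
    buckets = defaultdict(list)
    for row in rows:
        pct_error = (row["predicted"] - row["actual"]) / row["actual"] * 100.0
        buckets[bucket_key(row)].append(pct_error)
    return {bucket: statistics.median(vals) for bucket, vals in buckets.items()}


# Hypothetical daily rates (bbl/day) at several IP days; values invented.
rows = [
    {"ip_day": 30, "actual": 1000.0, "predicted": 1100.0},
    {"ip_day": 60, "actual": 900.0, "predicted": 855.0},
    {"ip_day": 85, "actual": 800.0, "predicted": 840.0},
    {"ip_day": 120, "actual": 700.0, "predicted": 560.0},
    {"ip_day": 150, "actual": 600.0, "predicted": 660.0},
]

# Bucket by 90-day quarter of well life: 0 = IP days 0-89, 1 = IP days 90-179.
by_quarter = median_percent_error_by_bucket(rows, lambda row: row["ip_day"] // 90)
print(by_quarter)
```

Swapping the `bucket_key` function (by operator, by vintage, by forward step) changes the breakdown without touching the error calculation.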

### Step 3: Repeat Fitting Process on Some Data, Estimate Errors Given Features, and Quantify Forecast Errors on all Unseen Data

Third, with a trained model in-hand, we generate prediction bands for each well by repeating the time machine process again and building new models strictly designed to evaluate the cone of uncertainty for each well. Unlike other methodologies for doing this, our approach assumes no distribution for the residuals, does not require hundreds or thousands of refits, and gives a unique prediction band for each well forecast. The output is something like the plot below, where the uncertainty cone around each forecast shows our 90% prediction band, and our uncertainty for the left forecast is higher than for the right forecast.

Now, at this point, you may be very intrigued but might also be wondering whether we are using Monte Carlo methods or other statistical methods for generating those uncertainties, so let me further explain.

## The Novi approach to PUD & PDP Oil and Gas forecast uncertainty quantification

### Conformal Inference Combines Error from Known Production History with Error from External Factors

The underlying methodology leverages a peer-reviewed technique called Split Conformal Inference, enhanced by our own proprietary modifications to fit the use case of forecasting collections of histories into the future. Using Split Conformal Inference we build only two models: one for our best stand-in of the full model using only past production data and one for how uncertain we feel about each production forecast data point for unseen wells in the future. By doing this we 1) use a slightly less accurate model to partially inflate our uncertainty for future wells, 2) evaluate all forecast data points from all possible prediction starting points (more on this next), and 3) use the full set of features, including geological features, completions, and spacing, to understand which wells are most similar or dissimilar to the wells we’ve already seen.

Thus, we might end up with large forecast uncertainty because the well history before the forecast is chaotic or because the well was drilled in a fringe area in geographical or feature space that doesn’t match anything we’ve seen before. Our method natively incorporates both types of uncertainty.
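For readers who want to see the general shape of the technique, here is a minimal sketch of a locally weighted variant of split conformal inference as it appears in the conformal prediction literature. It is not Novi's proprietary algorithm: the single feature `x`, the straight-line models, and the synthetic wells are all assumptions made to keep the example small.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns a predict function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def conformal_quantile(scores, alpha):
    """The ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return math.inf if rank > n else sorted(scores)[rank - 1]


def fit_locally_weighted_band(train, calibration, alpha=0.10, min_spread=1e-6):
    """Two models: one for the forecast, one for the expected error size."""
    xs, ys = zip(*train)
    mean_model = fit_line(xs, ys)
    abs_residuals = [abs(y - mean_model(x)) for x, y in train]
    raw_spread = fit_line(xs, abs_residuals)

    def spread_model(x):
        return max(raw_spread(x), min_spread)  # keep the scale positive

    # Calibration wells were never used for fitting either model.
    scores = [abs(y - mean_model(x)) / spread_model(x) for x, y in calibration]
    q = conformal_quantile(scores, alpha)

    def band(x):
        center, half_width = mean_model(x), q * spread_model(x)
        return center - half_width, center, center + half_width

    return band


def make_wells(n, rng):
    """Synthetic wells whose noise grows with the feature value (assumed)."""
    wells = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        wells.append((x, 50.0 + 20.0 * x + rng.gauss(0.0, 2.0 + 3.0 * x)))
    return wells


rng = random.Random(0)
train, calibration, unseen = (make_wells(n, rng) for n in (300, 300, 1000))
band = fit_locally_weighted_band(train, calibration)
coverage = sum(band(x)[0] <= y <= band(x)[2] for x, y in unseen) / len(unseen)
width_low_x = band(1.0)[2] - band(1.0)[0]
width_high_x = band(9.0)[2] - band(9.0)[0]
print(f"coverage on unseen wells: {coverage:.2f}")
print(f"band width at x=1: {width_low_x:.1f}, at x=9: {width_high_x:.1f}")
```

Two things are worth noticing. Only two models are fit, with no resampling loop, and the band width differs from well to well because the second model scales it by the expected error for that well's features. No distribution is assumed for the residuals; the coverage guarantee comes from ranking the calibration scores.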

### Enhancing Uncertainty Quantification on Unseen Data With Multiple Forecasts per PDP Oil and Gas Well

Now about that second point. Suppose for a single well, I train on data for a PDP well up to the end of 2017 but have data for that well up to the start of 2019. A naive way of evaluating uncertainty would be to restrict the uncertainty evaluation by making a single forecast that starts at the end of the known data (i.e., end of 2017) and extends to the end of available actual data (i.e., start of 2019), so that every date in the forecast is also strictly tied to the number of forward forecast steps. While that should always be true for PUD wells that do not incorporate any prior production, we may want to start forecasting a PDP well at any point in its history and need to understand the interaction between IP Day, number of forward forecast steps, and the exogenous features.

So, a more robust error evaluation of that hypothetical PDP well is to steadily march the cutoff day forward in time such that all actual data in 2018 is used to understand error at every forward step that is possible for a particular date. For example, for a monthly model, we could evaluate the error in December by starting the prediction in January (i.e., twelve steps forward), September (i.e., four steps forward), or any other month, but error in February is available only if our first prediction is in January (i.e., two steps forward) or February (i.e., one step forward).
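The bookkeeping in that example can be written down in a few lines. In this sketch, months are numbered 1 to 12 within a single year and the start month counts as step one, which is the convention the example above implies; the function names are made up for illustration.

```python
def forward_steps(start_month, target_month):
    """Forward steps from a forecast start to a target month, inclusive.

    Months are numbered 1-12 within one year, and the start month itself
    counts as step one, matching the example in the text.
    """
    if start_month > target_month:
        raise ValueError("a forecast cannot start after its target month")
    return target_month - start_month + 1


def evaluation_pairs(n_months=12):
    """Map each target month to every (start_month, forward_steps) pair."""
    return {
        target: [
            (start, forward_steps(start, target))
            for start in range(1, target + 1)
        ]
        for target in range(1, n_months + 1)
    }


pairs = evaluation_pairs()
total_evaluations = sum(len(p) for p in pairs.values())
print("February:", pairs[2])
print("December from September:", forward_steps(9, 12), "steps")
print("total (start, target) evaluations in one year:", total_evaluations)
```

A single forecast from January yields 12 error measurements for the year; marching the start forward yields 78 from the same well, each tagged with its own date and forward-step count.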

Using this approach, we get a full picture for how past production for a PDP oil and gas well influences the error at a particular point in time and how exogenous features, such as proppant per foot, consistently affect error for all points in time. This provides a comprehensive model for taking arbitrary inputs in space and joining them to an existing production history to create a tailored cone of uncertainty.

So, now you ask, “Wait, explain to me one more time why this ‘Split Conformal Inference’ is better than a Monte Carlo or bootstrap approach?”. So, let me explain the Novi Labs philosophy on this.

## The Novi Forecast Method Versus Traditional Approaches

### Why don’t we use Monte Carlo Methods?

For simple models, such as an Arps equation, there is a fixed set of parameters that you may be uncertain about and whose distributions you may even have a good idea of, so it makes perfect sense to model the uncertainty of a forecast by varying the parameters. Monte Carlo approaches are perfect in this use case.

However, with such simple models you obviously cannot evaluate both the uncertainty of the forecast given previous production and the uncertainty of the forecast as a result of exogenous variables (e.g., proppant per foot) without additional human influence or semi-empirical heuristics. Hypothetically, you could imagine simply applying a scaling factor to the uncertainty that increases with the distance from the middle of the distribution for each exogenous variable.
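As an illustration of that use case, the sketch below draws Arps hyperbolic parameters from assumed distributions and reads percentiles off the resulting rates. The distributions, units, and parameter ranges are invented for the example and are not calibrated to any basin.

```python
import math
import random


def arps_hyperbolic(t, qi, di, b):
    """Arps hyperbolic decline: rate at time t for initial rate qi,
    initial decline di (1/time), and b-factor b (b > 0)."""
    return qi / (1.0 + b * di * t) ** (1.0 / b)


def nearest_rank(sorted_values, fraction):
    """Value at the given fraction of a sorted list (simple nearest rank)."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


def monte_carlo_band(months, n_draws=2000, seed=0):
    """Return (low, mid, high) rates per month as 5th/50th/95th percentiles."""
    rng = random.Random(seed)
    # Assumed parameter distributions, chosen only for illustration.
    draws = [
        (
            rng.lognormvariate(math.log(1000.0), 0.10),  # qi, bbl/day
            rng.uniform(0.05, 0.15),                     # di, 1/month
            rng.uniform(0.5, 1.2),                       # b-factor
        )
        for _ in range(n_draws)
    ]
    band = []
    for t in months:
        rates = sorted(arps_hyperbolic(t, qi, di, b) for qi, di, b in draws)
        band.append(tuple(nearest_rank(rates, f) for f in (0.05, 0.50, 0.95)))
    return band


months = list(range(0, 37, 6))
band = monte_carlo_band(months)
for t, (low, mid, high) in zip(months, band):
    print(f"month {t:2d}: low {low:7.1f}  mid {mid:7.1f}  high {high:7.1f}")
```

Everything in the band traces back to the three parameter distributions you typed in, which is exactly why this works well for a parametric model and has no obvious analogue for a model with no such parameters.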

While this method says something about your prior knowledge and incorporates additional uncertainty about exogenous features, it doesn’t actually give you a good sense for what influences your uncertainty, nor does it increase your confidence in the model. Even with more advanced Monte Carlo methods, such as Markov Chain Monte Carlo (MCMC), the downside is that the models are largely required to be parametric; that is, they must have a set of tunable parameters that can be varied.

However, at Novi, we often build nonparametric models that make predictions via localized averages across a collection of wells or well segments, which naturally constrains predicted production to resemble real-world examples. To accommodate Novi’s highly accurate nonparametric models, we require a technique for prediction band computation that is generalizable to a range of model types.

### Why don’t we use Bootstrap Methods?

For some complex models, the cost of fitting a single model may not be exorbitant and the results may be stable when resampling, so it might make sense to bootstrap your way to an uncertainty quantification with 1000 bootstrapped models; however, if you’re not careful you could easily start to confuse an abundance of data for an abundance of information.

For example, in setting up such an approach, would each new bootstrapped model and data resampling randomly select by well with disregard for date or randomly select by date with disregard for the well? Will you aggregate all errors to give a per-IP day, per-forward step uncertainty profile? And if so, how would you incorporate exogenous variables? Alternatively, would you aggregate all errors to give single per-well, per-IP day uncertainty profiles? And if so, how would you handle completely unseen wells?
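The first of those questions can be made concrete with a small sketch of the two resampling choices. The records and the helper name `resample_groups` are hypothetical; this shows only the resampling step, not the model refits that a full bootstrap would need.

```python
import random


def resample_groups(records, group_index, rng):
    """Draw groups with replacement and keep every record of each drawn group."""
    groups = {}
    for record in records:
        groups.setdefault(record[group_index], []).append(record)
    keys = sorted(groups)
    drawn = [rng.choice(keys) for _ in keys]
    return [record for key in drawn for record in groups[key]]


# Hypothetical (well_id, month, volume) records: 3 wells x 4 months.
well_ids = ("well_a", "well_b", "well_c")
month_ids = (1, 2, 3, 4)
records = [
    (well, month, 1000.0 - 50.0 * month)
    for well in well_ids
    for month in month_ids
]

rng = random.Random(0)
by_well = resample_groups(records, group_index=0, rng=rng)  # whole wells
by_date = resample_groups(records, group_index=1, rng=rng)  # whole months

print("wells drawn: ", sorted({r[0] for r in by_well}))
print("months drawn:", sorted({r[1] for r in by_date}))
```

Resampling by well keeps every production history intact but ignores the calendar, while resampling by date keeps each month's cross-section intact but punches holes in every well's history. Neither choice is obviously right for a forecast that depends on both.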

At the end of the day, using a bootstrap approach is not an inherently bad idea, but we feel that our approach provides all the necessary tools for evaluating uncertainty at the lowest computational cost. This is because we spend the bulk of our computational budget on exhaustively computing the error on simulated forecasts (see *Enhancing Uncertainty Quantification on Unseen Data With Multiple Forecasts per PDP Oil and Gas Well*) given an adequate single model instead of generating an exhaustive number of realizations that individually provide trivial information gain.

## Conclusions

- As machine learning becomes more widely used for production forecasting of PDP and PUD wells and its business value is increasingly accepted as truth within our industry, it becomes even more important to understand the limitations of a single machine learning model forecast.
- Simple approaches for calculating or reporting aggregated error fail to capture the granularity of a single forecast and might lead to incorrect conclusions about the trustworthiness of a forecast.
- It is absolutely critical that every forecast be paired with an uncertainty envelope, the prediction band. This prediction band will let you understand which forecasts you can be confident about and give you a sense for how changing something like proppant or spacing might lead to an increase in forecast uncertainty.
- Without prediction bands, there is no effective tool for evaluating choices or outcomes and it would be foolish to think you could make fruitful projections about the future.

Our next posts will cover **The Benefits of Automation**. Check back soon!
