Whether exploring for oil offshore Brazil or scaling type curves in Lea County, engineers and geologists rely on the power of analogy to estimate the productivity of a given area or engineering design choice. We use geologic and engineering similarities to group oil and gas wells, then use the average of that group to make a prediction. Decision-tree-based machine learning algorithms follow a similar process, but at computerized speed: they search all of the available well data for the most predictive ways to group “similar” production wells.
First, we explain the intuition behind these algorithms. Second, we walk through the underlying computation with a simple example. Third, we present a case study showing how they can aid decision making when developing new acreage.
Tree-Based Models for Oil Well Data
Novi has employed tree-based models in its oil and gas analytics software since its founding in 2014. We authored a patent on applying these types of models to oil & gas data, which the USPTO granted in November 2018.
Tree-based models start with all of the training data in a single group. The algorithm then splits the wells into groups with similar production according to a given feature. In this example, the first split is whether water saturation is above or below 25%. The wells are then split further according to another feature, and another, and so on. In the example tree below, the model has split the wells according to water saturation, proppant loading, and spacing. The group of wells in the bottom right has an average 1-year cumulative production of 108,000 bbl, quite a large difference relative to the other groups.
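To make the splitting step concrete, here is a minimal sketch of how a single split on one feature could be chosen. It assumes a CART-style criterion (pick the threshold that minimizes the total squared error of 1-year cum around each group’s mean), which is common in regression trees but is not confirmed here as Novi’s exact criterion. The well values are invented.

```python
"""Hypothetical sketch: choosing one split for a regression tree.

Assumes a CART-style criterion: pick the threshold that minimizes the
total squared error of 1-year cum around each group's mean. The well
values below are invented for illustration.
"""


def sse(values):
    """Sum of squared deviations from the group mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, error) for the best 'x <= threshold' split."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        err = sse(left) + sse(right)
        if err < best[1]:
            best = (threshold, err)
    return best


# Invented wells: water saturation (fraction) and 1-year cum (bbl).
water_sat = [0.15, 0.18, 0.20, 0.23, 0.27, 0.33, 0.40, 0.45]
cum_1yr = [95_000, 101_000, 98_000, 104_000, 61_000, 55_000, 58_000, 49_000]

threshold, err = best_split(water_sat, cum_1yr)
print(f"split at water saturation <= {threshold:.2f}")
```

With these made-up values, the search settles on a threshold of 0.25, separating the four higher producers from the four lower ones. A full tree simply repeats this search within each resulting group, across every candidate feature.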
Novi Models Build Thousands of Well Performance Trees

Our algorithms build thousands of trees, with each of the following choices guided by what is most predictive:
- Which variables to use?
- How should it split on those features?
- How many levels deep should the trees be?
- How many trees should it build?
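The choices in the list above map naturally onto the settings of a random-forest-style ensemble. The sketch below is a hypothetical pure-Python illustration, not Novi’s implementation: `n_features` controls how many variables each split may consider, splits are chosen by squared error, `max_depth` limits the number of levels, and `n_trees` sets the count. Bootstrap resampling of wells is an added assumption borrowed from standard random forests, and the training wells are invented.

```python
"""Hypothetical sketch: a small random-forest-style ensemble in pure Python.

Assumptions (not Novi's implementation): squared-error splits, a random
subset of features at each split, bootstrap resampling of wells, and a
fixed depth limit. The training wells below are invented.
"""
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow(rows, features, max_depth, n_features, rng):
    """Grow one tree from rows of (feature_dict, target) pairs."""
    targets = [y for _, y in rows]
    if max_depth == 0 or len(rows) < 2:
        return {"leaf": mean(targets)}
    best = None  # (error, feature, threshold)
    # Which variables? Consider a random subset of features at this split.
    for f in rng.sample(features, n_features):
        values = sorted({x[f] for x, _ in rows})
        # How to split? Try midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not left or not right:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:  # the sampled features cannot separate these wells
        return {"leaf": mean(targets)}
    _, f, t = best
    return {
        "feature": f,
        "threshold": t,
        "left": grow([r for r in rows if r[0][f] <= t], features, max_depth - 1, n_features, rng),
        "right": grow([r for r in rows if r[0][f] > t], features, max_depth - 1, n_features, rng),
    }


def build_forest(rows, features, n_trees, max_depth, n_features, seed=0):
    """How many trees? n_trees. How many levels deep? max_depth."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap resample of wells
        forest.append(grow(sample, features, max_depth, n_features, rng))
    return forest


def predict(tree, well):
    """Run a well down one tree and return that leaf's average."""
    while "leaf" not in tree:
        tree = tree["left"] if well[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


FEATURES = ["water_saturation", "proppant_lb_per_ft", "spacing_ft"]
# Invented training wells: (features, 1-year cum in bbl).
WELLS = [
    ({"water_saturation": 0.15, "proppant_lb_per_ft": 2500, "spacing_ft": 1320}, 108_000),
    ({"water_saturation": 0.18, "proppant_lb_per_ft": 1500, "spacing_ft": 880}, 82_000),
    ({"water_saturation": 0.24, "proppant_lb_per_ft": 1200, "spacing_ft": 1320}, 76_000),
    ({"water_saturation": 0.28, "proppant_lb_per_ft": 2600, "spacing_ft": 1320}, 101_000),
    ({"water_saturation": 0.31, "proppant_lb_per_ft": 1400, "spacing_ft": 880}, 58_000),
    ({"water_saturation": 0.42, "proppant_lb_per_ft": 1000, "spacing_ft": 660}, 49_000),
]

forest = build_forest(WELLS, FEATURES, n_trees=4, max_depth=3, n_features=2)
planned = {"water_saturation": 0.35, "proppant_lb_per_ft": 2500, "spacing_ft": 1320}
print(mean([predict(tree, planned) for tree in forest]))
```

Because each tree sees a different bootstrap sample and a different random subset of features at each split, the trees end up splitting on different variables or at different thresholds. The final line averages the per-tree predictions, as a standard random forest would. Choosing these settings by accuracy on held-out wells is one way to let “what is most predictive” drive them.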
How Tree-Based Models Make Predictions for Future Wells
Let’s stay with the single-tree example from above. To make a prediction, the algorithm simply runs the hypothetical well (shown in red) down the tree. If this well has 35% water saturation, 2,500 lb/ft of proppant, and 1,320 ft spacing, it ends up in the leaf at the bottom with a set of oil and gas wells whose average 1-year production is 104,000 bbl.
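Running a well down a tree is just a loop of comparisons. In this sketch, only the 25% water-saturation split, the 104,000 bbl leaf, and the 108,000 bbl bottom-right group come from the example; the proppant and spacing thresholds and the other leaf averages are hypothetical placeholders added so the tree is complete.

```python
"""Hypothetical sketch: running one planned well down a hand-built tree.

Only the 0.25 water-saturation split and the 104,000 / 108,000 bbl
leaves come from the example; all other thresholds and leaf values
are placeholders.
"""


def leaf(avg_cum_bbl):
    return {"avg_cum_bbl": avg_cum_bbl}


def node(feature, threshold, left, right):
    """Internal node: go left if well[feature] <= threshold, else right."""
    return {"feature": feature, "threshold": threshold, "left": left, "right": right}


TREE = node(
    "water_saturation", 0.25,
    node("proppant_lb_per_ft", 1800, leaf(70_000), leaf(85_000)),  # placeholder values
    node(
        "proppant_lb_per_ft", 2000,
        leaf(60_000),  # placeholder value
        node("spacing_ft", 1500, leaf(104_000), leaf(108_000)),  # 1500 ft is a placeholder
    ),
)


def run_down_tree(tree, well):
    """Follow the splits until a leaf is reached; return (path, leaf average)."""
    path = []
    while "avg_cum_bbl" not in tree:
        f, t = tree["feature"], tree["threshold"]
        go_left = well[f] <= t
        path.append(f"{f} {'<=' if go_left else '>'} {t}")
        tree = tree["left"] if go_left else tree["right"]
    return path, tree["avg_cum_bbl"]


planned = {"water_saturation": 0.35, "proppant_lb_per_ft": 2500, "spacing_ft": 1320}
path, estimate = run_down_tree(TREE, planned)
print(" -> ".join(path), "=>", estimate, "bbl")
```

The returned path doubles as a plain-language explanation of why the well landed where it did.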

Let’s take this a step further and add more trees. Typical Novi models have hundreds or thousands of trees; here we stick with a simple set of four trees, each split three levels deep. Each tree would be different, splitting on different variables or at different thresholds.

Forecasting a Hypothetical New Well
Imagine we are planning to drill a well and want to know what its production curves will look like. The planned well can be specified by location, length, azimuth, completion size, and much more. As it traverses each tree, it ends up in a final (leaf) node with some wells from the training set. These are the wells the model considers analogous.

For each prediction, the model returns the set of training wells that landed in the same leaf as the planned well, along with the number of trees in which this occurred. Novi refers to these as “contributing wells” because they contribute to the prediction via a weighted average. The percentage of trees in which a contributing well and the planned well share a leaf forms the similarity score. To convert similarity scores into weights, we simply divide each score by the sum of all the scores, which ensures the weights sum to one. For a simple set of sixteen training wells and four trees (like above), the “contributing wells” table might look like this:

Novi Forecast Engine Analog Well Table
Novi’s Forecast Engine software generates a table of analog wells with every prediction, and Novi Cloud makes this available in the “contributing_wells” table. The weights in the table above are much larger than those in Novi’s predictive models, because those models typically draw on one thousand or more analog wells for each prediction.
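The similarity and weight calculation described above can be sketched in a few lines. This is an illustration under assumptions: the inputs (which training wells share the planned well’s leaf in each tree, and each well’s 1-year cum) are hypothetical, and the function name `contributing_wells` is ours, not part of Novi’s software.

```python
"""Hypothetical sketch of the contributing-wells weighting described above.

Inputs are assumed: for each tree, the IDs of training wells sharing the
planned well's leaf, plus each well's 1-year cum. IDs and volumes are invented.
"""
from collections import Counter


def contributing_wells(leaf_members_per_tree, cum_by_well):
    n_trees = len(leaf_members_per_tree)
    # Number of trees in which each training well shares the planned well's leaf.
    counts = Counter(w for members in leaf_members_per_tree for w in set(members))
    # Similarity score: fraction of trees where the two wells share a leaf.
    similarity = {w: c / n_trees for w, c in counts.items()}
    total = sum(similarity.values())
    # Divide by the sum of similarity scores so the weights sum to one.
    weights = {w: s / total for w, s in similarity.items()}
    prediction = sum(weights[w] * cum_by_well[w] for w in weights)
    table = sorted(
        ((w, counts[w], similarity[w], weights[w]) for w in weights),
        key=lambda row: row[3],
        reverse=True,
    )
    return table, prediction


# Invented leaf memberships for four trees and invented 1-year cums (bbl).
LEAF_MEMBERS = [
    ["W01", "W04", "W07"],
    ["W01", "W04", "W12"],
    ["W01", "W09"],
    ["W04", "W07", "W09"],
]
CUM_BBL = {"W01": 110_000, "W04": 98_000, "W07": 104_000, "W09": 90_000, "W12": 120_000}

table, prediction = contributing_wells(LEAF_MEMBERS, CUM_BBL)
for well, n, sim, wt in table:
    print(f"{well}: {n}/{len(LEAF_MEMBERS)} trees, similarity {sim:.0%}, weight {wt:.3f}")
print(f"weighted-average 1-year cum: {prediction:,.0f} bbl")
```

Because the weights always sum to one, a prediction drawing on a thousand or more contributing wells has an average weight of 0.001 or less, which is why real tables show far smaller weights than this four-tree toy. This follows the weighting as described in the text; a textbook random forest average would additionally divide each tree’s contribution by the number of wells in its leaf.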
These similarity scores and weights form a powerful tool for understanding how the model arrives at its prediction. The similarity scores can also augment the traditional analog selection process: the algorithm may find similar wells that would likely have slipped past an engineer or geologist. For example, another operator may already have tried that high-fluid-loading design in rock with similar pressure and brittleness, even though it was two counties over.
To leverage this powerful oil and gas dataset, we developed the Novi Model Engine software. It lets users interrogate the contributing wells and the weights derived from polling the model’s underlying tree structure. This video shows an example use case in the Williston Basin, looking at analogs for a barely developed part of the basin:
Case Study for Contributing Wells: Parsley-Jagged Peak Deal
In October, Parsley announced its $2.27B acquisition of Jagged Peak, a huge move into the Delaware Basin for a producer focused on the Midland Basin. We used Novi Forecast Engine and Novi Cloud software to analyze the deal a few weeks ago, charting economic returns at reduced WTI strip prices. A critical part of the development plan for this acreage is the Bone Springs 3rd Sand, which has shown very promising results from relatively limited development across the position.
How can Novi Contributing Oil Well Data help provide context to interpret these well results and guide future development plans?

Novi Makes Accurate Forecasts Using Oil Well Production Data
We used Novi’s Forecast Engine software to make predictions and generate Contributing Oil Well Production Data for a set of planned wells across the acreage in the Bone Springs Third Sand, Wolfcamp A-Lower, and Wolfcamp C zones. Figure 7 below shows maps of contributing wells, built from the Novi Contributing Well Data, for a 3rd Bone Springs Sand well in the Big Tex area (left) and a Wolfcamp A-Lower well in the Cochise area (right).
Note: Much of the prediction for the 3rd Bone Springs well comes from analogs in the far northern and northwestern parts of the Delaware Basin, while nearby wells dominate the Wolfcamp A-Lower prediction.

This pattern arises because Bone Springs development is concentrated in the northern part of the basin, just off the paleo-shelf edge (see the shelf-edge map from Montgomery’s classic 1997 work). The model finds similar geologic properties at Big Tex and in the northern Delaware because the two areas share similar depositional environments. Differences remain, of course, such as the steepness of the margin and the siliciclastic source material, but those geologic factors can be turned into interpretation products and run through the model to quantify their impact.
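The observation in the note above, that the 3rd Bone Springs prediction leans on distant analogs while nearby wells dominate the Wolfcamp A-Lower prediction, can be quantified by summing contributing-well weights within a chosen radius. This sketch assumes each contributing well has a surface latitude/longitude and a weight; the coordinates, weights, and 10-mile radius are hypothetical, and haversine great-circle distance is our choice of metric.

```python
"""Hypothetical sketch: share of prediction weight from nearby analogs.

Assumes each contributing well has (lat, lon, weight). All coordinates,
weights, and the radius below are invented for illustration.
"""
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def haversine_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def weight_share_within(planned, contributors, radius_mi):
    """Fraction of total weight coming from contributing wells within radius_mi."""
    lat0, lon0 = planned
    total = sum(w for _, _, w in contributors)
    near = sum(
        w for lat, lon, w in contributors
        if haversine_mi(lat0, lon0, lat, lon) <= radius_mi
    )
    return near / total


# Invented planned well and contributing wells: (lat, lon, weight).
planned = (31.50, -103.50)
contributors = [
    (31.52, -103.48, 0.10),  # roughly 1.8 miles away
    (32.30, -103.90, 0.40),
    (32.60, -104.20, 0.50),
]
share = weight_share_within(planned, contributors, radius_mi=10)
print(f"{share:.0%} of the weight comes from wells within 10 miles")
```

Sweeping the radius and plotting the cumulative weight share would show at a glance whether a prediction rests on local offsets or on distant geologic analogs.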

Conclusions
- Tree-based models mirror traditional analogy-based workflows, such as the type curve process or exploration analogs, to estimate production for a given well, but they run at computerized speed and learn from basin-wide datasets.
- Novi Contributing Oil and Gas Well Data, which is automatically generated as an artifact of Novi’s Forecast Engine software workflow, provides a powerful tool to understand how a model came up with its prediction AND a rich dataset to supplement and accelerate existing engineering and geoscience workflows.
- Our model finds geologic similarity in the 3rd Bone Springs Sand between Parsley’s Delaware Basin acreage and northern Eddy & Lea Counties. Parsley should consider studying 3rd Bone development in those areas for ideas on completion designs, well spacing, and targeting strategies.