What drives unconventional oil production? Of course completions, parent-child, and well spacing all play a huge role — but it all starts with the rocks.
Understanding the impact and relative importance of the different geologic variables requires a REGIONAL PERSPECTIVE. Only then do you have sufficient variation in the underlying properties to learn their effects. The combination of machine learning models, explainability datasets, and software for oil and gas developments unlocks Machine Learning Regional Profiles.
We are kicking off our ML Regional Profiles series in the Bakken-Three Forks play of the Williston Basin, North Dakota. In future posts, we’ll move from geology to completions and spacing.
In This Post:
- Using machine learning software for regional profiles
- Results: forecasted unconventional oil production for the Bakken and Three Forks
- Production Driver: the impact of Clay Volume on Bakken production
- Production Driver: the water saturation production driver in the Three Forks Play
- Conclusions: Improving unconventional oil and gas developments
Using machine learning software for regional profiles
Setting up a regional profile in Forecast Engine takes minutes. All you have to do is lay out a series of pads across your Area of Interest. In the video below, I show you how I configured the profile for the Williston, and share some tricks to speed up your own workflows. This profile (or cross-section) will be our framework for the analysis!
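If you want to picture what "laying out a series of pads" amounts to, here is a minimal sketch in Python. It is not Forecast Engine's API: the `Pad` class, the `layout_profile` function, and the coordinates are all hypothetical, and it simply assumes the profile is a straight line with evenly spaced pads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pad:
    """A hypothetical pad location on the profile (arbitrary projected units)."""
    name: str
    x: float  # easting
    y: float  # northing


def layout_profile(start, end, n_pads):
    """Place n_pads evenly spaced pads on the straight line from start to end."""
    if n_pads < 2:
        raise ValueError("a profile needs at least two pads")
    (x0, y0), (x1, y1) = start, end
    pads = []
    for i in range(n_pads):
        t = i / (n_pads - 1)  # fractional distance along the profile, 0 to 1
        pads.append(Pad(f"PAD-{i + 1:02d}", x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return pads


# Made-up southwest and northeast endpoints, for illustration only
profile = layout_profile((0.0, 0.0), (120.0, 90.0), n_pads=7)
for pad in profile:
    print(pad)
```

Each pad in the list would then become one forecast location along the cross-section.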
Next, we run a forecast set on that regional profile. Our software makes a forecast for each of those unconventional oil well locations. It also generates a production drivers dataset (SHAP values) that shows how the model arrives at its forecasts. We set the completions design for every well to 500 lbs/ft of proppant, 1.0 ppg fluid, 225′ stages, and 10,000′ laterals (the big items for unconventional oil and gas development). Though we are focused on regional geology in this post, the same tools can be used for standard shale well planning.
Results: forecasted unconventional oil production for the Bakken and Three Forks
Let’s take a look at the model’s forecasted 2-year cumulative oil production going from southwest to northeast. Our configuration included 10,000′ laterals and 500 pounds per foot proppant for each well:

Clearly, Bakken targets outperform Three Forks targets across nearly the entire profile. We do see that the Three Forks formation has a very narrow fairway that (mostly) overlaps with the Nesson Anticline. However, the Bakken shows two sweet spots — one over the Nesson, one associated with the Parshall-Sanish field in the northeast.
SHAP values allow us to understand how the model is coming up with its forecasts. Sure, we can point out some gross regional trends, but what is actually driving those differences in performance? We can plot the SHAP values for each forecasted pad to investigate the regional trends, one step further than standard oil and gas data analytics:

For the Bakken, Clay Volume (orange bars) shows the biggest variance around the sweet spots, with Oil in Place (green bars) and Water Saturation (blue bars) also contributing significantly to the forecasts (black line). For the Three Forks, on the other hand, Water Saturation dominates, with Effective Porosity and Clay Volume also having an impact.
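To make the reading of that chart concrete, here is a small sketch of the two ideas behind it. The numbers are invented for illustration and are not the values from our profile. It relies on the standard SHAP additivity property (each forecast equals a base value plus the sum of its SHAP values) and ranks features by how much their SHAP values vary from pad to pad.

```python
from statistics import pvariance

# Hypothetical SHAP values for five pads along a profile,
# in thousand barrels of 2-year cumulative oil (made-up numbers).
shap_by_feature = {
    "clay_volume":      [-20.0, 15.0, 30.0, -10.0, 25.0],
    "oil_in_place":     [-5.0, 10.0, 12.0, 0.0, 8.0],
    "water_saturation": [-8.0, 5.0, 10.0, -2.0, 6.0],
}
base_value = 150.0  # hypothetical average model output

# Additivity: forecast for a pad = base value + sum of that pad's SHAP values
n_pads = len(next(iter(shap_by_feature.values())))
forecasts = [
    base_value + sum(values[i] for values in shap_by_feature.values())
    for i in range(n_pads)
]

# Rank features by how much their contribution swings along the profile
spread = {name: pvariance(values) for name, values in shap_by_feature.items()}
ranking = sorted(spread, key=spread.get, reverse=True)

print("forecasts:", forecasts)
print("features by SHAP variance:", ranking)
```

In this toy case clay volume comes out on top because its bars swing the most between pads, which is the same visual cue to look for in the chart.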
Production Driver: the impact of Clay Volume on Bakken production
From our SHAP values regional profile above, we saw that Clay Volume (Vclay) had the biggest explanatory impact on the model forecasts. Let’s plot the Clay Volumes along with the SHAP values to see what’s going on:

It is VERY common for us to see Clay Volume rise near the top of the geologic feature importance. This surprises some customers, but others recognize the critical relationship between rock brittleness/frackability and clay volume. Clay is much more ductile than quartz or calcite, which often make up the main non-clay mineral fractions. Consequently, more clay decreases the rock frackability. However, seeing such a large swing in SHAP values over a small % change in Clay Volume is likely due to the other information that Clay Volume provides: in this instance, depositional architecture and hydrocarbon migration. Remember, SHAP values correspond to the information that the features provide, whether or not it is a “root cause”.
Clay volume and migration
Migration into structural and stratigraphic traps plays a key role in enhancing Bakken production, which makes it a little different from other unconventional oil and gas plays (see this 2012 review by Theloy and Sonnenberg). Migration improves oil saturations but also can increase the pressure gradient by plumbing the shallower trap into the absolute pressure values present in the source kitchen.
While our model did not include pressure data for training, it may be using the clay volume as a proxy. This is less far-fetched than you might think: in the Parshall field, the Middle Bakken member was deposited in shallower, higher-energy conditions, which improved reservoir quality and enhanced the permeability available to receive migrated oils. The Middle Bakken thins slightly onto the Nesson Anticline. This paleobathymetric feature may be the cause of the lower clay volumes along the Nesson. So, in the Bakken, lower clay volumes don’t just improve production in their own right by increasing brittleness and frackability; they also signal higher-energy depositional environments that became migration foci.
Production Driver: the water saturation production driver in the Three Forks Play
We learned from the SHAP values chart that, along this regional cross-section, water saturation was the key geologic difference-maker for the Three Forks play. Let’s dig a little deeper into that concept by plotting the water saturations along with their SHAP values:

Plotted this way, we can see that production rockets up once the Three Forks water saturation drops below 45% (perhaps that threshold corresponds to a relative permeability shift). What is driving water saturation below 45%? It’s not a simple case of oil migration into the Nesson Anticline.
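We read the 45% threshold off the plot by eye, but a break like this can also be located numerically. The sketch below is one simple way to do it, assuming a single step change: it tries every cut point and keeps the one where a two-level step fit has the lowest squared error. The `best_split` function and the data points are hypothetical and were constructed to break near 45%; they are not our profile data.

```python
def sum_squared_error(values):
    """Squared error of the values around their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return the cut on x that minimizes the total squared error
    when y is fit with one constant on each side of the cut."""
    pairs = sorted(zip(xs, ys))
    best_cut, best_sse = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sse = sum_squared_error(left) + sum_squared_error(right)
        if sse < best_sse:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
            best_sse = sse
    return best_cut, best_sse


# Made-up water saturations (fraction) and their SHAP values (thousand bbl)
water_saturation = [0.30, 0.35, 0.40, 0.44, 0.46, 0.50, 0.55, 0.60]
shap_values = [40.0, 42.0, 38.0, 36.0, -10.0, -15.0, -20.0, -22.0]

cut, sse = best_split(water_saturation, shap_values)
print(f"estimated threshold: {cut:.2f}")
```

With real data the break is rarely this clean, so treat the estimated cut as a starting point for discussion rather than a hard cutoff.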

In the above image, I am plotting the true vertical depth (TVD) of the Three Forks wells in our regional profile. The basic bowl shape of the basin is broken by the Nesson Anticline, a huge structure rising in the center of the basin. However, the high oil saturations (green) extend off of the structure to the east, even with the rocks dipping up to the northeast. Clearly, it’s not a simple case of an oil-water contact sitting at a given structural depth. Generative potential of the Lower Bakken Shale may be playing a role here, or there could be more complex thermal maturity or hydrodynamic trapping mechanisms. Let me know what your theory is — we’d love to mix up our training data to investigate the cause here.
Conclusions: Improving unconventional oil and gas developments
In this post, we walked through a use case of machine learning software for a regional study of production drivers for the Bakken and Three Forks plays in North Dakota.
- The combination of a powerful machine learning model, explainability datasets like SHAP, and software for configuring scenario analysis enabled us to cut through the complexity of unconventional oil & gas to generate this “regional production drivers” cross-section through the play.
- For the Bakken, clay volume, oil in place, and water saturation are the biggest drivers for discriminating the high-producing areas of the play from low-producing areas. The clay volume likely rose to the top because of the additional information it provides around basin architecture.
- For the Three Forks, water saturation by far dominates the production trends. The cutoff of 45% water saturation appears to be the key threshold to discriminate strong Three Forks producers from poor ones along this trend.
- Understanding regional production drivers is key to optimizing development — you have to learn from all the data you can, not just what’s going on around your acreage. As operators exhaust the core, understanding production drivers for Tier 2 and Tier 3 rock will be essential.
Thanks to Wood Mackenzie for providing the model training data and permission to publish this work.
Want to see how much our machine-learning models can help you to forecast crude oil production?
At Novi, we want to help you answer your forecasting questions, generate novel insights, and meet the challenges of the oil and gas industry. That’s why we focus on giving you the best data across subsurface, production, and completions variables, all of which affect unconventional oil production. Great data and great models make for powerful learnings and optimizations.
Terms & meanings
Unconventional oil production
While unconventional oil production can refer to any type of source that is difficult to produce with traditional methods, such as oil sands, we use it here to mean tight reservoirs produced with horizontal drilling and hydraulic fracturing (often referred to as “fracking”).
Geological characterization
Geologists use well logs, core data and seismic data (along with a few other data types) to characterize the subsurface. This typically involves mapping, interpreting rock properties, and then generating grids or models to show the distribution of those properties around the area of interest.
These rock properties are treated as variables, or features, by machine learning models. Obviously, the geology has a huge impact on well performance, and the models can learn these relationships for optimization questions.
Explainability in AI
In machine learning, the capacity to describe what occurs in your model from input to output is known as explainability. It helps address the black box issue and makes models more transparent.
It is common to use the terms “explainability” and “interpretability” interchangeably. This is still a developing field, and explainability will only get more important as models have increasing use in our industry.
SHAP Values
SHAP Values (an acronym from SHapley Additive exPlanations) break down a prediction to show the impact of each feature. They are based on Shapley values from cooperative game theory, which are named after the Nobel Prize winner Lloyd Shapley.
SHAP values are a powerful tool for understanding how much each feature contributed to the model prediction. They’re available for all our customers in Novi Cloud, for use with model transparency, sensitivity studies, and general performance research/insights.
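For readers who want to see the game theory in action, here is a minimal sketch that computes exact Shapley values for a tiny toy model by averaging each feature's marginal contribution over every ordering of the features. Everything in it is an assumption made for illustration: `toy_model`, its coefficients, and the feature values are invented, and "absent" features are simply set to a single baseline value. Production SHAP tooling uses faster approximations and is not implemented this way.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features not yet 'added' in an ordering keep their baseline value.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]          # switch this feature to its real value
            value = model(current)
            totals[name] += value - previous  # marginal contribution in this ordering
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(f):
    """Hypothetical production model: clay and water hurt, porosity helps."""
    return (300.0 - 400.0 * f["vclay"] - 200.0 * f["sw"]
            + 1000.0 * f["porosity"] * (1.0 - f["sw"]))


baseline = {"vclay": 0.25, "sw": 0.55, "porosity": 0.06}  # made-up regional average
well = {"vclay": 0.15, "sw": 0.40, "porosity": 0.08}      # made-up location

contributions = shapley_values(toy_model, well, baseline)
for name, value in contributions.items():
    print(f"{name}: {value:+.2f}")

# Additivity check: baseline prediction + contributions = prediction for the well
reconstructed = toy_model(baseline) + sum(contributions.values())
print(abs(reconstructed - toy_model(well)) < 1e-9)
```

The final check is the property that makes charts like the ones in this post possible: the per-feature bars always add up to the gap between the forecast and the baseline.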