[URTeC 2021 Paper] Use of Machine Learning Production Driver Cross-Sections for Regional Geologic Insights in the Bakken-Three Forks Play

Machine learning explainability tools such as SHAP values offer a powerful way to understand how geologic performance drivers vary across a play.

Talk Details:
- Tuesday, July 27th at 11:15 AM | Room 361
- Theme 3: Emerging Geological Evaluations, Tools & Workflows: Data Driven Methods
- AUTHORS: T. Cross, K. Sathaye, J. Chaplin (Novi Labs)
Production driver cross-sections covering the Bakken play in North Dakota. In the north, structural depth makes a strong negative contribution to production. In the play core, low water saturations and low clay volumes provide a strong positive contribution. In the southwest, lower TOC values for the Lower and Upper Bakken source rock shales negatively impact production.


Objectives/Scope: Petroleum geologists have long employed cross-sections as tools to understand subsurface structures, show variation in log properties, and provide regional context for local analysis. In this study we use cross-sections as our technique for visualizing a novel dataset generated with machine learning models: geologic production drivers. These “machine learning cross-sections” show not just regional variation in geologic properties but what the model has learned about the changing impact of those properties on hydrocarbon production along the cross-section.

Methods/Procedures/Process: We trained a decision-tree-based model that uses completions, geology, and spacing data to predict oil, gas, and water production in the Bakken-Three Forks play of North Dakota. The subsurface training variables included oil in place, net to gross, porosity, water saturation, and clay volume. We trained a surrogate model to produce an explanation dataset for the predictions using Shapley values, a method that is becoming increasingly widespread within the machine learning community. Finally, we generated synthetic multiwell unit developments along cross-sections stretching from the southwestern to the northeastern edges of the play, so that Shapley values could be extracted at evenly spaced points along each cross-section.
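To make the idea of extracting Shapley values at evenly spaced points concrete, here is a minimal sketch in Python. It is not the workflow used in the paper: `toy_production` is a hypothetical stand-in for a trained model, the endpoint property values are invented for illustration, and Shapley values are computed by brute-force enumeration of feature subsets against a single baseline point. Tree-based models in practice usually rely on specialized, faster algorithms.

```python
from itertools import combinations
from math import factorial

# Assumed feature names, loosely following the subsurface variables in the text.
FEATURES = ["water_saturation", "clay_volume", "porosity"]


def toy_production(x):
    """Hypothetical stand-in for a trained production model (illustration only)."""
    rate = 100.0 + 800.0 * x["porosity"]
    if x["water_saturation"] < 0.45:
        rate += 40.0
    if x["clay_volume"] < 0.145:
        rate += 30.0
    return rate


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    contributions = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Standard Shapley weight for a coalition of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        contributions[f] = total
    return contributions


def cross_section(start, end, n_points):
    """Evenly spaced synthetic locations, linearly interpolated between two endpoints."""
    points = []
    for i in range(n_points):
        t = i / (n_points - 1)
        points.append({f: start[f] + t * (end[f] - start[f]) for f in FEATURES})
    return points


# Hypothetical endpoint properties (not data from the study).
southwest = {"water_saturation": 0.55, "clay_volume": 0.17, "porosity": 0.05}
northeast = {"water_saturation": 0.35, "clay_volume": 0.13, "porosity": 0.08}
baseline = {"water_saturation": 0.50, "clay_volume": 0.16, "porosity": 0.06}

section = cross_section(southwest, northeast, 9)
section_shap = [shapley_values(toy_production, p, baseline) for p in section]

for i, contrib in enumerate(section_shap):
    print(i, {f: round(v, 2) for f, v in contrib.items()})
```

At every location the contributions sum to the difference between the prediction at that point and the prediction at the baseline, which is the property that lets per-feature contributions be plotted side by side along a section.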

Results/Observations/Conclusions: The cross-sections identify differing production drivers for the Bakken and Three Forks sweet spots. For the Three Forks, the key driver of outperformance in the Nesson area is water saturation, with a critical threshold at 45% water saturation. For the Bakken, we see a more diverse set of production drivers, but clay volume ranks highest both on the Nesson and in the Parshall-Sanish field. Though clay volume varies only between 13% and 17% along the cross-section, the Shapley dataset shows a large increase when clay drops below 14.5%. We speculate that this reflects not only greater frackability but potentially paleobathymetric highs that have persisted and now sit as foci of migration, contributing to overpressure.
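One simple way to locate a threshold like the ones described above is to sort the sampled locations by a feature and look for the largest step in that feature's Shapley contribution. The sketch below assumes this heuristic. It is not the method used in the paper, `find_threshold` is a hypothetical helper, and the numbers are synthetic values chosen only to show the mechanics.

```python
def find_threshold(feature_values, shap_values):
    """Return (split, jump): the midpoint between the two adjacent feature values
    with the largest absolute change in Shapley contribution, and that change."""
    pairs = sorted(zip(feature_values, shap_values))
    best_split, best_jump = None, 0.0
    for (x0, s0), (x1, s1) in zip(pairs, pairs[1:]):
        jump = abs(s1 - s0)
        if jump > best_jump:
            best_jump = jump
            best_split = (x0 + x1) / 2.0
    return best_split, best_jump


# Synthetic illustration only: clay volume fractions and made-up contributions.
clay = [0.130, 0.135, 0.140, 0.150, 0.155, 0.160, 0.170]
clay_shap = [30.0, 29.0, 31.0, -5.0, -6.0, -4.0, -5.0]

split, jump = find_threshold(clay, clay_shap)
print(f"largest step near clay volume {split:.3f} (change of {jump:.1f})")
```

With sparse or noisy samples this heuristic can pick a spurious step, so a denser cross-section or some smoothing of the contributions would likely be needed before reading a threshold off real data.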

Applications/Significance/Novelty: While the focus of this study is the Bakken, the workflow presented here generalizes to other basins and other purposes. Shapley values can provide a machine learning-based perspective on the importance of geoscientific data to the “target” of interest. Incorporation of explainability datasets with traditional methods like cross-sections can provide powerful context for understanding subsurface variability and its impact.

Interdisciplinary Components: This study incorporates work from geoscience & data science.

An early version of this study is available on our ML in Oil & Gas Blog.
