Machine learning explainability tools such as SHAP values offer a powerful way to understand how geologic performance drivers vary across a play.
Talk Details:
- Tuesday, July 27th at 11:15 AM | Room 361
- Theme 3: Emerging Geological Evaluations, Tools & Workflows: Data Driven Methods
- AUTHORS: T. Cross, K. Sathaye, J. Chaplin (Novi Labs)

Abstract:
Objectives/Scope: Petroleum geologists have long employed cross-sections as tools to understand subsurface structures, show variation in log properties, and provide regional context for local analysis. In this study we use cross-sections as our technique for visualizing a novel dataset generated with machine learning models: geologic production drivers. These “machine learning cross-sections” show not just regional variation in geologic properties but what the model has learned about the changing impact of those properties on hydrocarbon production along the cross-section.
Methods/Procedures/Process: We trained a decision-tree-based model that uses completions, geology, and spacing data to predict oil, gas, and water production in the Bakken-Three Forks play of North Dakota. The subsurface training variables included oil in place, net to gross, porosity, water saturation, and clay volume. We trained a surrogate model to produce an explanation dataset for the predictions using Shapley values, a method that is becoming increasingly widespread within the machine learning community. Finally, we generated synthetic multiwell unit developments along cross-sections stretching from the southwestern to the northeastern edges of the play, so that Shapley values could be extracted at evenly spaced points along each cross-section.
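As a rough illustration of the idea in the Methods paragraph, the sketch below computes exact Shapley values for a toy production function at evenly spaced points between two end members of a transect. Everything in it is an assumption made for illustration: `toy_model`, the feature names, the baseline, and all numeric values are hypothetical, and linear interpolation between two end points stands in for sampling real geologic maps. It is not the authors' model or surrogate.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical stand-in for a trained production model (not the authors' model)."""
    sw, vclay, phi = x["water_saturation"], x["clay_volume"], x["porosity"]
    # Made-up response: output rises with porosity, falls with water saturation
    # and clay volume, with a porosity/water-saturation interaction.
    return 100.0 * phi * (1.0 - sw) - 50.0 * vclay


def shapley_values(model, instance, baseline):
    """Exact Shapley values for one instance, measured against a single baseline point."""
    features = list(instance)
    n = len(features)

    def value(subset):
        # Features in `subset` take the instance's values; the rest stay at baseline.
        x = {f: (instance[f] if f in subset else baseline[f]) for f in features}
        return model(x)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def cross_section(start, end, n_points):
    """Evenly spaced points between two hypothetical end members of a transect."""
    return [
        {f: start[f] + (end[f] - start[f]) * i / (n_points - 1) for f in start}
        for i in range(n_points)
    ]


# Hypothetical values, chosen only to make the example run.
baseline = {"water_saturation": 0.50, "clay_volume": 0.15, "porosity": 0.06}
sw_end = {"water_saturation": 0.60, "clay_volume": 0.17, "porosity": 0.05}
ne_end = {"water_saturation": 0.35, "clay_volume": 0.13, "porosity": 0.08}

section = cross_section(sw_end, ne_end, 5)
section_shap = [shapley_values(toy_model, p, baseline) for p in section]

for point, phi in zip(section, section_shap):
    print({name: round(val, 3) for name, val in phi.items()})
```

At each point the Shapley values sum to the difference between the model output there and the output at the baseline, which is what allows them to be read as per-feature contributions along the section. Exact enumeration is only practical for a handful of features; tree-based models are usually explained with faster approximations.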
Results/Observations/Conclusions: The cross-sections identify differing production drivers for the Bakken and Three Forks sweet spots. For the Three Forks, the key driver of outperformance in the Nesson area is water saturation, with a critical threshold at 45%. For the Bakken, we see a more diverse set of production drivers, but clay volume ranks highest both on the Nesson and in the Parshall-Sanish field. Though clay volume varies only between 13% and 17% along the cross-section, the Shapley values for clay volume increase sharply when clay volume drops below 14.5%. We speculate that this reflects not only greater frackability but potentially paleobathymetric highs that have persisted and now sit as foci of migration, contributing to overpressure.
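One simple way to locate a threshold of the kind described above is to fit a two-level step function to the Shapley values as a function of the feature and take the best split point. The sketch below does this with a single-split search. The helper `best_split` is hypothetical, and the `clay` and `shap_clay` lists are synthetic numbers constructed to show a step; they are not data from the study.

```python
def best_split(xs, ys):
    """Threshold on xs that minimises squared error of a two-level step fit to ys."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        sse = sum((y - mean_left) ** 2 for y in left)
        sse += sum((y - mean_right) ** 2 for y in right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        if best is None or sse < best[0]:
            best = (sse, threshold)
    return None if best is None else best[1]


# Synthetic, illustrative values only (clay volume as a fraction, Shapley value in
# arbitrary units), built so that the contribution steps up at low clay volume.
clay = [0.130, 0.135, 0.140, 0.150, 0.155, 0.160, 0.165, 0.170]
shap_clay = [4.1, 3.9, 4.0, -0.9, -1.1, -1.0, -1.2, -0.8]

threshold = best_split(clay, shap_clay)
print(f"Estimated clay-volume threshold: {threshold:.3f}")
```

A single split is the crudest possible fit; on real Shapley datasets a dependence plot would normally be inspected first to confirm that a step, rather than a gradual trend, is a reasonable description.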
Applications/Significance/Novelty: While the focus of this study is the Bakken, the workflow presented here generalizes to other basins and other purposes. Shapley values can provide a machine learning-based perspective on the importance of geoscientific data to the “target” of interest. Incorporation of explainability datasets with traditional methods like cross-sections can provide powerful context for understanding subsurface variability and its impact.
Interdisciplinary Components: This study incorporates work from geoscience and data science.
You can find an early version of this study on our ML in Oil & Gas Blog here.
