Machine learning explainability methods such as SHAP values are a powerful way to understand how
geologic performance drivers vary across a play.
Tuesday, July 27th at 11:15 AM | Room 361 | Theme 3: Emerging Geological Evaluations, Tools & Workflows: Data Driven Methods
T. Cross, K. Sathaye, J. Chaplin (Novi Labs)
Objectives/Scope: Petroleum geologists have long employed cross-sections as tools to understand subsurface structures, show variation in log properties, and provide regional context for local analysis. In this study we use cross-sections as our technique for visualizing a novel dataset generated with machine learning models: geologic production drivers. These “machine learning cross-sections” show not just regional variation in geologic properties but what the model has learned about the changing impact of those properties on hydrocarbon production along the cross-section.
Methods/Procedures/Process: We trained a decision-tree-based model that uses completions, geology, and spacing data to predict oil, gas, and water production in the Bakken-Three Forks play of North Dakota. The subsurface training variables included oil in place, net to gross, porosity, water saturation, and clay volume. We trained a surrogate model to produce an explanation dataset for the predictions using Shapley values, a method that is becoming increasingly widespread within the machine learning community. Finally, we generated synthetic multiwell unit developments along cross-sections stretching from the southwestern to the northeastern edges of the play, so that the Shapley values could be extracted at evenly spaced points along each cross-section.
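The Shapley step and the cross-section sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_model`, its coefficients, `sample_geology`, and the transect coordinates are all hypothetical, and the brute-force exact Shapley calculation against a single baseline stands in for whatever explainer the study used (tree-based models are usually explained with faster, tree-specific algorithms).

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical stand-in for a trained production model (not the study's model)."""
    # Illustrative only: output falls with water saturation and clay, rises with porosity.
    return (100.0 - 80.0 * x["water_saturation"]
            - 200.0 * x["clay_volume"] + 300.0 * x["porosity"])


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature in x, relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    names = list(x)
    n = len(names)
    result = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = {f: x[f] if (f in subset or f == name) else baseline[f]
                          for f in names}
                without_f = {f: x[f] if f in subset else baseline[f]
                             for f in names}
                total += weight * (model(with_f) - model(without_f))
        result[name] = total
    return result


def transect(start, end, n_points):
    """Evenly spaced (x, y) points on a straight line from start to end."""
    return [(start[0] + (end[0] - start[0]) * i / (n_points - 1),
             start[1] + (end[1] - start[1]) * i / (n_points - 1))
            for i in range(n_points)]


def sample_geology(point):
    """Hypothetical geologic properties at a map location (made-up linear trends)."""
    px, _ = point
    return {"water_saturation": 0.55 - 0.01 * px,
            "clay_volume": 0.17 - 0.004 * px,
            "porosity": 0.06}


# Hypothetical baseline: properties at the start of the line.
points = transect((0.0, 0.0), (10.0, 20.0), 5)
baseline = sample_geology(points[0])
for p in points:
    print(p, shapley_values(toy_model, sample_geology(p), baseline))
```

Each row of output is one location on the line with one Shapley value per geologic variable, which is the kind of table a machine learning cross-section would plot. The values for a location sum to the difference between the model's prediction there and its prediction at the baseline.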
Results/Observations/Conclusions: The cross-sections identify differing production drivers for the Bakken and Three Forks sweet spots. For the Three Forks, the key driver of outperformance in the Nesson area is water saturation, with a critical threshold at 45% water saturation. For the Bakken, we see a more diverse set of production drivers, but clay volume ranks highest both on the Nesson and in the Parshall-Sanish field. Though clay volume varies only between 13% and 17% along the cross-section, the Shapley dataset shows a large increase in its contribution to predicted production when clay drops below 14.5%. We speculate that this reflects not only greater frackability but potentially paleobathymetric highs that have persisted and now sit as foci of migration, contributing to overpressure.
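One simple way to locate a threshold like those reported above is to look for the single split in a geologic variable that best separates high from low Shapley values. The sketch below is an assumption about how this could be done, not the procedure used in the study; `best_split` is a hypothetical helper, and the clay and Shapley numbers are synthetic values invented for illustration.

```python
def best_split(feature_values, shap_values):
    """Find the feature threshold that best splits Shapley values into two groups.

    Tries every midpoint between consecutive distinct feature values and keeps
    the one with the lowest within-group sum of squared errors.
    Returns (threshold, mean Shapley value below, mean Shapley value above).
    """
    pairs = sorted(zip(feature_values, shap_values))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical feature values
        left = [s for _, s in pairs[:i]]
        right = [s for _, s in pairs[i:]]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        sse = (sum((s - mean_left) ** 2 for s in left)
               + sum((s - mean_right) ** 2 for s in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, mean_left, mean_right)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]


# Synthetic example data (invented, not results from the study).
clay = [0.130, 0.135, 0.140, 0.144, 0.146, 0.150, 0.160, 0.170]
shap = [5.1, 4.9, 5.0, 4.8, 0.2, -0.1, 0.1, 0.0]
threshold, below, above = best_split(clay, shap)
print(threshold, below, above)
```

With real data the relationship is rarely a clean step, so a threshold found this way is best checked against a scatter plot of Shapley value versus the variable.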
Applications/Significance/Novelty: While the focus of this study is the Bakken, the workflow presented here generalizes to other basins and other purposes. Shapley values can provide a machine learning-based perspective on the importance of geoscientific data to the “target” of interest. Incorporation of explainability datasets with traditional methods like cross-sections can provide powerful context for understanding subsurface variability and its impact.
Interdisciplinary Components: This study incorporates work from geoscience & data science.
An early version of this study is available on our ML in Oil & Gas Blog.