[URTeC 2020] GeoSHAP: A Novel Method of Deriving Rock Quality Index from Machine Learning Models and Principal Components Analysis (ID 2743)

Since Novi was founded, customers, partners, and prospects have been asking us whether machine learning could be used to extract a subsurface model directly from logs. With this work, which combines statistical extraction with dimensionality reduction via Principal Component Analysis, we have answered that question using tens of thousands of electric logs in the Williston Basin. Adding SHAP values gives us a powerful machine learning measure of rock quality. We wrote a blog post about the workflow, or you can read the abstract and request a copy of the paper below.


Talk Details:

Monday, 4:20 PM. Theme 3: Reservoir Characterization and Well Placement Using Modern Tools and Workflows

SHAP values for proppant, fluid loading, stage length, geology, and spacing, for tightly spaced wells with >1,000 lb/ft proppant in the Williston Basin. Proppant shows the most impact from IP 120 to IP 210, with steadily decreasing importance over time.



Although machine learning models can provide tremendous value to the unconventional oil and gas industry, interpreting their inner workings and outputs can be a laborious, time-consuming process. Here we present a novel method for extracting an overall rock quality index from a machine learning model trained on well logs. This rock quality index (RQI), which we term geoSHAP, can be used for performance benchmarking, completions tailoring, and acreage evaluation workflows. We trained a decision-tree-based model on a regional Williston Basin dataset. The model predicts oil, gas, and water production at 30-day increments out to IP 720, using completions design, petrophysical grids, and spacing/stacking parameters as training features. We started with over 400 petrophysical grids and reduced them to five principal components using a Gaussian kernel Principal Component Analysis. We then computed SHAP values (SHapley Additive exPlanations), which reflect how much each individual feature contributed to the model prediction. To extract our RQI, we sum the SHAP values of the principal geologic components for each well at each IP day. These summed geoSHAP values reflect the overall rock quality around the basin, identifying sweet spots and low-performing areas. The model identifies high-performing areas on the Nesson Anticline, Antelope Anticline, Fort Berthold area, and Parshall/Sanish. We also show how geoSHAP trends with overall operator performance and how it can be used to benchmark performance relative to expectation. This method is repeatable across tree-based machine learning algorithms. It removes the need to construct partial dependence plots or to take the time-consuming step of running synthetic pads across the entire basin. Additionally, it simplifies the selection of petrophysical grids and removes the multicollinearity issues that can debilitate machine learning models.
GeoSHAP provides a purely empirical perspective on rock quality that can be compared with more prescriptive, assumption-laden traditional methods, such as combining Archie’s equation with recovery factors. It also offers a generalizable method applicable to models built with simpler, easier-to-obtain data such as formation tops and isopachs.
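The summation step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several things in it are assumptions made purely for illustration: `toy_model` is a hypothetical stand-in for the trained tree-based production model, the feature names and values are invented, only two geologic components are used instead of five, and the Shapley values are computed by brute-force enumeration against a single all-zero baseline rather than with a tree-specific SHAP algorithm.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names: three completions/spacing inputs plus two
# geologic principal components (the abstract describes five).
FEATURES = ["proppant", "fluid", "spacing", "geo_pc1", "geo_pc2"]
GEO_FEATURES = ["geo_pc1", "geo_pc2"]


def toy_model(x):
    """Illustrative stand-in for a trained production model (an assumption)."""
    return (
        2.0 * x["proppant"]
        + 1.0 * x["fluid"]
        - 0.5 * x["spacing"]
        + 3.0 * x["geo_pc1"]
        + 1.5 * x["geo_pc2"]
        + 0.5 * x["proppant"] * x["geo_pc1"]  # completions/rock interaction
    )


def shapley_values(model, x, baseline):
    """Exact Shapley values of x relative to one baseline point."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Coalition members take the well's values; the rest stay at baseline.
                without = {g: (x[g] if g in subset else baseline[g]) for g in FEATURES}
                with_f = dict(without, **{f: x[f]})
                total += weight * (model(with_f) - model(without))
        phi[f] = total
    return phi


def geoshap(phi):
    """Rock quality index: the sum of Shapley values over the geologic features."""
    return sum(phi[f] for f in GEO_FEATURES)


# One invented well at one IP day, explained relative to an all-zero baseline.
well = {"proppant": 2.0, "fluid": 1.0, "spacing": 2.0, "geo_pc1": 1.0, "geo_pc2": -1.0}
baseline = {f: 0.0 for f in FEATURES}

phi = shapley_values(toy_model, well, baseline)
rqi = geoshap(phi)
print(phi)
print("geoSHAP:", rqi)
```

Because Shapley values are additive, the per-feature contributions sum to the difference between the model's prediction for the well and its prediction at the baseline, so the geologic subtotal can be read as the share of predicted production attributed to rock rather than to completions or spacing. In the workflow the abstract describes, this sum would be repeated for each well at each IP day.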

Enter your name and email below. We will email the technical paper to you right away.
