“What do you mean the model doesn’t use **INSERT PET GEOLOGIC VARIABLE HERE**?!” Anyone who has built enough machine learning models and reviewed enough SHAP (SHapley Additive exPlanations) values will instantly recognize the question above. The algorithm dropped some geologic variable during feature selection due to high correlation with another input geologic variable, and someone in the review has now dismissed your results.
Whether we want to admit it or not, many geologic variables, especially in resource plays, are highly correlated with each other. This can cause pet variables to be dropped during feature selection, counterintuitive feature-importance results, or even degraded model performance.
To tackle these issues, we’ve developed a subsurface workflow that starts with raw log data (or any set of geology input data) and ends with a machine learning rock quality index.
This is the subject of our upcoming URTeC 2020 paper, GeoSHAP: A Novel Method of Deriving Rock Quality Index from Machine Learning Models and Principal Components Analysis.
In This Post:
- From 300 to 5 Geo Variables with Principal Components Analysis: Machine Learning Model Development
- Understanding Feature Impact with SHAP values
- Machine Learning Model Deployment – Rock Quality: geoSHAP
From 300 to 5 Geo Variables with Principal Components Analysis: Machine Learning Model Development
To begin, we generated 300 geology features directly from electric logs. We started with an algorithm that auto-picked the tops of the Lower, Middle, and Upper Bakken and Three Forks Formations (from the Williston Basin of North Dakota). The outputs included values like P90 resistivity of the Upper Bakken, mean neutron porosity of the Middle Bakken, and P5 sonic for the Lower Bakken. Additionally, we incorporated a variety of structural tops from above and below in the column, plus mudlog gas measurements. That left us with over 300 measurements, quite a lot! Click the video below for an overview of the entire process:
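To make the feature-generation step concrete, here is a minimal sketch of deriving per-formation summary statistics (P90, mean) from log curves. The column names, distributions, and values are invented for illustration and are not Novi’s actual schema or algorithm.

```python
import numpy as np
import pandas as pd

# Toy log data: curve samples tagged with the formation each depth falls in
# (columns and distributions are illustrative, not real Bakken logs).
rng = np.random.default_rng(0)
logs = pd.DataFrame({
    "formation": ["Upper Bakken"] * 50 + ["Middle Bakken"] * 50,
    "resistivity": rng.lognormal(3.0, 0.5, 100),
    "neutron_porosity": rng.uniform(0.02, 0.12, 100),
})

# One row of features per formation: P90 resistivity, mean neutron porosity.
features = logs.groupby("formation").agg(
    res_p90=("resistivity", lambda s: np.percentile(s, 90)),
    nphi_mean=("neutron_porosity", "mean"),
)
print(features)
```

Repeating this pattern over many curves, statistics, and zones is how a feature count quickly climbs into the hundreds.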
To reduce the number of variables, we employed principal components analysis (PCA). PCA attempts to express as much of the variation in the inputs as possible with as few derived variables as possible. For instance, you can describe a set of points that is nearly linear in x-y space with their position along x alone. Or, you can express a ring of points tracing a circle with their radius from the center.
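As a rough sketch of this reduction step (with synthetic data standing in for real logs), scikit-learn’s PCA condenses hundreds of correlated columns into a handful of components; the shapes and the five-signal setup below are assumptions for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in: 300 correlated "geo features" that are really just
# mixtures of 5 underlying geologic signals plus a little noise.
rng = np.random.default_rng(0)
signals = rng.normal(size=(200, 5))           # 5 hidden geologic drivers
mixing = rng.normal(size=(5, 300))
X = signals @ mixing + 0.01 * rng.normal(size=(200, 300))

pca = PCA(n_components=5)
geo_pcs = pca.fit_transform(X)                # columns play the role of geoPCA0..geoPCA4
print(geo_pcs.shape)                          # (200, 5)
print(pca.explained_variance_ratio_.sum())    # close to 1: little variance lost
```

The loadings in `pca.components_` are what let you trace each component back to the original variables that feed it.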
To maintain explainability, we can reconstruct which variables went into each PCA feature. It’s also useful to see that different geologic variables do correlate. Below you can see the depth-neutron porosity example (the geoPCA0 feature).
What are SHAP values in machine learning?
Machine learning models are commonly used to forecast outcomes, but SHAP values also let you extract insights from an ML model by showing the impact of each feature value on the forecast.
SHAP values are based on the cooperative game theory mathematics of Shapley values. In this case, the “game” is the outcome of the model and the “players” are the features in the model. Shapley values quantify the contribution that each player brings to the game, while SHAP values quantify the contribution that each feature brings to the prediction made by the ML model.
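The game-theory idea can be made concrete with a tiny toy game. The players and barrel payoffs below are invented for illustration: each player’s Shapley value is its marginal contribution averaged over every possible ordering of the players.

```python
import math
from itertools import permutations

def shapley(players, payoff):
    """Exact Shapley values: average each player's marginal contribution
    over every ordering of the players (tractable only for small games)."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            phi[p] += payoff(coalition | {p}) - payoff(coalition)
            coalition = coalition | {p}
    n_fact = math.factorial(len(players))
    return {p: total / n_fact for p, total in phi.items()}

# Invented toy payoffs: "geo" alone is worth 10, "completion" alone 5,
# and playing together adds a 2-unit interaction bonus.
def payoff(coalition):
    value = 10.0 * ("geo" in coalition) + 5.0 * ("completion" in coalition)
    if {"geo", "completion"} <= coalition:
        value += 2.0
    return value

phi = shapley(["geo", "completion"], payoff)
print(phi)  # the interaction credit is split evenly: geo 11.0, completion 6.0
```

Note the efficiency property: the values sum exactly to the full coalition’s payoff, which is why per-feature SHAP values can later be added up meaningfully.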
Because of this, SHAP values can increase the transparency and interpretability of nearly any ML model — a linear regression, gradient boosting, a neural network, a tree-based model, or any other algorithm that takes features as input and produces predictions as output. They do so by calculating the impact of a feature taking its actual value, compared to the prediction we’d make if that feature took some baseline value.
This makes SHAP a powerful tool to “reverse engineer” the forecast from an ML model, in contrast to feature importance scores that only aggregate results across the whole dataset. Because it sits at the level of an individual prediction-feature pair, a SHAP value carries both the direction and the magnitude of the explanation: positive SHAP values indicate a positive impact on the prediction, and negative values the opposite.
Understanding Feature Impact with SHAP values
Subsequently, we train a model and generate SHAP values. SHAP values are a powerful tool for understanding how much each feature contributed to the model prediction. They’re available for all our customers in Novi Cloud, for model transparency, sensitivity studies, and general performance research and insights. We’d like to highlight a few key properties of SHAP values.
First, every feature has a SHAP value for each IP day and fluid stream. Second, they come in units of barrels (or mcf). Third, they can be positive or negative. Fourth, they represent how much that feature pushed the prediction above or below the average well in the model. Below, I have plotted the SHAP values for geoPCA0 and geoPCA1 against their feature values. You can read it as: low values of geoPCA0 (wells in the deep part of the basin) receive a positive push.
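One way to see the “push above or below the average well” interpretation is the special case of a linear model with independent features, where each SHAP value reduces to the exact closed form `coef_i * (x_i - mean_i)`. The data below is synthetic and the setup is a sketch, not Novi’s model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic wells: 3 features with known linear effects plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=500)
model = LinearRegression().fit(X, y)

# For a linear model, the SHAP value of feature i for one well is
# coef_i * (x_i - mean_i): its signed push relative to the average well.
shap_values = model.coef_ * (X - X.mean(axis=0))

# Each well's SHAP values sum to its deviation from the average prediction.
deviations = model.predict(X) - model.predict(X).mean()
print(np.allclose(shap_values.sum(axis=1), deviations))  # True
```

Tree ensembles need a more involved computation (e.g. TreeSHAP), but the same additivity property holds.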
Machine Learning Model Deployment – Rock Quality: geoSHAP
Because the SHAP values represent how much each feature contributed to the model prediction, we can sum up the SHAP values for the geo features to get the total impact of the geology. We call this sum geoSHAP. Essentially, this is the model’s rock quality index. The model is able to identify high-producing areas like the Nesson Anticline, Ft. Berthold sub-play, and Parshall-Sanish field, despite being fed no explicit geographic information nor any interpreted products (sorry Archie’s!).
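The geoSHAP aggregation itself is just a row-sum over the geology columns of the per-well SHAP matrix. A minimal sketch, with made-up feature names and values:

```python
import numpy as np

# Hypothetical (wells x features) SHAP matrix in barrels; names and
# numbers are invented for illustration.
feature_names = ["geoPCA0", "geoPCA1", "proppant_lb_ft", "spacing_ft"]
shap_values = np.array([
    [ 1200.0,  -300.0,  2500.0,  -400.0],   # well A
    [ -800.0,   150.0,  1000.0,   200.0],   # well B
])

# geoSHAP: sum the SHAP contributions of the geology features only.
geo_cols = [i for i, name in enumerate(feature_names) if name.startswith("geoPCA")]
geo_shap = shap_values[:, geo_cols].sum(axis=1)
print(geo_shap)  # 900 barrels for well A, -650 for well B
```

Because SHAP values are additive and share units with the prediction, this sum is directly interpretable as geology’s total push on each well’s forecast.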
Our customers use geoSHAP for a variety of use cases, including inventory ranking, looking for completions-geology-spacing interactions, and performance benchmarking. It also provides a concrete grounding for those not familiar with PCA: displaying the geoSHAP map can reassure experts that the machine learning model was able to identify the sweet spots. And don’t forget: because our models remove inherent biases, measure completion design changes, and predict gas, oil/condensate, and water, geoSHAP is available for each of those streams!
You can see in detail how we build this workflow in the video below.
Key takeaways:
- Principal components analysis and SHAP values are powerful tools for handling a huge range of potential input geo data, even working just from raw logs.
- GeoSHAP is a machine-learning-based rock quality index. It provides a useful anchor to help nonexperts assess model behavior, and supports a variety of other use cases.
- GeoSHAP is available for any Novi model through Novi Cloud.
Understand how rock quality varies in your area of interest with the help of machine learning
With Novi’s cloud platform, you can generate GeoSHAP values to estimate rock quality across a region, basin, or play using a machine learning model. The method is automated and purely data-driven, which means you can get reliable rock quality values without spending endless time fine-tuning your STOIIP maps. Save your team time and effort!
You can request a free demo here: novilabs.com/demo
Paper Details:
GeoSHAP: A Novel Method of Deriving Rock Quality Index from Machine Learning Based Models and Principal Components Analysis, Control ID 2743
URTeC 2020, Monday, July 20, Afternoon Session, Theme 3: Reservoir Characterization and Well Placement Using Modern Tools and Workflows.