“What do you mean the model doesn’t use **INSERT PET GEOLOGIC VARIABLE HERE**?!” Anyone who’s built and reviewed enough machine learning models and SHAP values will instantly recognize the question above. The algorithm dropped some geologic variable in the feature selection process due to high correlation with another input geologic variable, and someone in the review has now dismissed your results.
Whether we want to admit it or not, many geologic variables, especially in resource plays, are highly correlated with each other. This can cause pet variables to be dropped in feature selection, produce counterintuitive feature importance results, or even degrade model performance.
To tackle these issues, we’ve developed a subsurface workflow that starts with raw log data (or any set of geology input data) and ends with a machine learning rock quality index.
This is the subject of our upcoming URTeC 2020 paper, GeoSHAP: A Novel Method of Deriving Rock Quality Index from Machine Learning Models and Principal Components Analysis.
From 300 to 5 Geo Variables with Principal Components Analysis: Machine Learning Model Development
To begin, we generated 300 geology features directly from electric logs. We started with an algorithm that auto-picked the tops of the Lower, Middle, and Upper Bakken and Three Forks Formations (from the Williston Basin of North Dakota). The outputs included values like P90 resistivity of the Upper Bakken, mean neutron porosity of the Middle Bakken, and P5 sonic for the Lower Bakken. Additionally, we incorporated a variety of structural tops above and below in the column, as well as mudlog gas measurements. This left us with over 300 measurements, which is quite a lot!
To reduce the number of variables, we employed principal components analysis (PCA). PCA attempts to express as much of the variation in the inputs as possible with as few derived variables as possible. For instance, a set of points that is nearly linear in x-y space can be described by x alone. Similarly, a cloud of points lying along a radius of a circle can be described by a single value: distance from the center.
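The reduction step can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the workflow from the paper: the well count, the number of hidden drivers, the noise level, and the 95% variance threshold are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for log-derived features: 3 hidden drivers generate
# 12 correlated measurements (hypothetical values, not real Bakken data).
n_wells, n_hidden, n_features = 500, 3, 12
hidden = rng.normal(size=(n_wells, n_hidden))
mixing = rng.normal(size=(n_hidden, n_features))
X = hidden @ mixing + 0.1 * rng.normal(size=(n_wells, n_features))

# Standardize so no feature dominates purely because of its units.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via singular value decomposition of the standardized matrix.
U, S, Vt = np.linalg.svd(Xz, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)

# Keep the smallest number of components explaining 95% of the variance
# (the threshold is an assumption for this sketch).
n_keep = int(np.searchsorted(np.cumsum(explained_ratio), 0.95) + 1)
geo_pcs = Xz @ Vt[:n_keep].T  # reduced features, one column per component

print(n_keep, geo_pcs.shape)
```

Because the twelve synthetic columns are driven by only three underlying signals, a handful of components captures nearly all of the variation, which is the same effect that lets hundreds of correlated log statistics collapse into a few PCA features.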
To maintain explainability, we can reconstruct which variables went into each PCA feature. It's also useful to see that different geologic variables do correlate. Below you can see the depth-neutron porosity example (geoPCA0 feature).
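One way to reconstruct what went into a component is to inspect its loadings, the weights each original variable receives. The sketch below assumes three made-up features, with depth and neutron porosity constructed to be strongly correlated; the feature names, the coefficients, and the helper `top_loadings` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_wells = 400

# Hypothetical features: depth and neutron porosity are constructed to be
# strongly correlated, resistivity is independent. Illustrative values only.
depth = rng.normal(size=n_wells)
features = {
    "tvd_middle_bakken": depth,
    "mean_nphi_middle_bakken": -0.9 * depth + 0.3 * rng.normal(size=n_wells),
    "p90_res_upper_bakken": rng.normal(size=n_wells),
}
names = list(features)
X = np.column_stack([features[n] for n in names])
Xz = (X - X.mean(axis=0)) / X.std(axis=0)

# Rows of Vt are the principal axes; each entry is the loading (weight)
# of one original variable on that component.
_, _, Vt = np.linalg.svd(Xz, full_matrices=False)

def top_loadings(component, k=2):
    """Return the k original variables with the largest absolute loading."""
    order = np.argsort(-np.abs(Vt[component]))[:k]
    return [(names[i], float(Vt[component, i])) for i in order]

pc0 = top_loadings(0)
for name, weight in pc0:
    print(f"{name}: {weight:+.2f}")
```

In this toy setup the first component is dominated by the two correlated variables, so a reviewer can still see that their favorite variable is represented in the model even though it does not appear as a standalone input.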
Understanding Feature Impact with SHAP values
Next, we trained a model and generated SHAP values. SHAP values are a powerful tool for understanding how much each feature contributed to the model prediction. They're available for all our customers in Novi Cloud, for use with model transparency, sensitivity studies, and general performance research and insights. We'd like to highlight a few key features of SHAP values.
First, every feature has a SHAP value for each IP day and fluid stream. Second, they come in units of barrels (or mcf). Third, they can be positive or negative. Fourth, they represent how much that feature pushed the prediction above or below the average well in the model. Below, we have plotted the SHAP values for geoPCA0 and geoPCA1 against their feature values. You can read the plot as follows: low values of geoPCA0 (wells in the deep part of the basin) receive a positive force, meaning the feature pushes the prediction above the average well.
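These properties are easiest to see on a toy model. The sketch below uses a linear model, for which the SHAP value of a feature (treating features as independent) is its coefficient times the feature's deviation from the background mean. The coefficients, intercept, and the feature name `proppant_per_ft` are invented for illustration and do not come from any Novi model.

```python
import numpy as np

rng = np.random.default_rng(2)
n_wells = 200
feature_names = ["geoPCA0", "geoPCA1", "proppant_per_ft"]  # hypothetical inputs
X = rng.normal(size=(n_wells, 3))

# Toy linear "model" predicting cumulative oil in barrels (made-up numbers).
weights = np.array([-8000.0, 3000.0, 5000.0])
intercept = 100_000.0
predictions = intercept + X @ weights

# For a linear model with features treated as independent, the SHAP value of
# feature j for a well is weight_j * (x_j - mean of x_j over the background).
base_value = predictions.mean()                # the "average well"
shap_values = (X - X.mean(axis=0)) * weights   # (n_wells, n_features), barrels

# Additivity: base value plus a well's SHAP values recovers its prediction.
reconstructed = base_value + shap_values.sum(axis=1)
print(np.allclose(reconstructed, predictions))

# Low geoPCA0 values receive positive SHAP values because the weight is negative.
low_pc0 = X[:, 0] < X[:, 0].mean()
print(bool((shap_values[low_pc0, 0] > 0).all()))
```

The values share the units of the prediction, take either sign, and sum with the base value to the prediction for each well. Tree-based models need a dedicated explainer to compute SHAP values, but the same additivity holds.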
Machine Learning Model Deployment – Rock Quality: geoSHAP
Because the SHAP values represent how much each feature contributed to the model prediction, we can sum up the SHAP values for the geo features to get the total impact of the geology. We call this sum geoSHAP. Essentially, this is the model's rock quality index. The model is able to identify high-producing areas like the Nesson Anticline, Ft. Berthold sub-play, and Parshall-Sanish field, despite being fed no explicit geographic information or interpreted products (sorry, Archie's!).
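The summation itself is simple. A minimal sketch, assuming the SHAP values are already available as a wells-by-features array and that geology features can be identified by a `geoPCA` name prefix; the numbers and the non-geology feature names are hypothetical.

```python
import numpy as np

# Hypothetical SHAP values in barrels: rows are wells, columns are features.
feature_names = ["geoPCA0", "geoPCA1", "geoPCA2", "proppant_per_ft", "well_spacing"]
shap_values = np.array([
    [12000.0,  3000.0, -1000.0,  8000.0, -2000.0],
    [-9000.0, -2500.0,   500.0,  6000.0,  1000.0],
    [ 4000.0, -1000.0,  2000.0, -3000.0,     0.0],
])

# Select the geology columns (identified here by a naming prefix, an assumption).
geo_cols = [i for i, name in enumerate(feature_names) if name.startswith("geoPCA")]

# geoSHAP: total contribution of geology to each well's prediction, in barrels.
geo_shap = shap_values[:, geo_cols].sum(axis=1)

# Rank wells from best to worst rock quality according to the model.
ranking = np.argsort(-geo_shap)
print(geo_shap, ranking)
```

Since each well carries a location, the per-well geoSHAP values can then be gridded or mapped to produce the kind of rock quality map described above.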
Our customers use geoSHAP for a variety of purposes, including ranking inventory, looking for completions-geology-spacing interactions, and benchmarking performance. It also provides a concrete grounding for those not familiar with PCA. Displaying the geoSHAP map can reassure experts that the machine learning model was able to identify the sweet spots. And don't forget: because our models predict gas, oil/condensate, and water (while also removing inherent biases and measuring completion design changes), geoSHAP is available for each of those streams!
- Principal components analysis and SHAP values are powerful tools for handling a huge range of potential input geo data, even when working just from raw logs.
- GeoSHAP is a machine-learning-based rock quality index. It provides a useful anchor to help nonexperts assess the model's behavior, and it supports a variety of other use cases.
- GeoSHAP is available for any Novi model through Novi Cloud.
Paper Details
GeoSHAP: A Novel Method of Deriving Rock Quality Index from Machine Learning Based Models and Principal Components Analysis, Control ID 2743
URTeC 2020, Monday, July 20, Afternoon Session, Theme 3: Reservoir Characterization and Well Placement Using Modern Tools and Workflows.