“What do you mean the model doesn’t use **INSERT PET GEOLOGIC VARIABLE HERE**?!”
Anyone who’s built and reviewed enough machine learning models will instantly recognize the question above. Some geologic variable (carbonate %, hydrocarbon pore volume, permeability, anything) gets dropped during feature selection because it is highly correlated with another geologic input, and someone in the review has now dismissed your results.
Whether we want to admit it or not, many geologic variables, especially in resource plays, are highly correlated with each other. This can cause pet variables to be dropped in feature selection, produce counterintuitive feature importance results, or even degrade model performance.
To tackle these issues, we’ve developed a subsurface workflow that starts with raw log data (or any set of geology input data) and ends with a machine learning rock quality index.
This is the subject of our upcoming URTeC 2020 paper, GeoSHAP: A Novel Method of Deriving Rock Quality Index from Machine Learning Models and Principal Components Analysis.
From 300 to 5 Geo Variables with Principal Components Analysis
We generated 300 geology features directly from electric logs, using an algorithm that auto-picked the tops of the Lower, Middle, and Upper Bakken and Three Forks Formations (in the Williston Basin of North Dakota). The outputs included values like P90 resistivity of the Upper Bakken, mean neutron porosity of the Middle Bakken, and P5 sonic for the Lower Bakken. We also incorporated a variety of structural tops above and below in the column, as well as mudlog gas measurements. This left us with over 300 measurements, which is quite a lot!
To reduce the number of variables, we employed principal components analysis (PCA). PCA attempts to express as much of the variation in the inputs as possible with as few derived variables as possible. For instance, a set of points lying nearly on a line in x-y space can be described by a single coordinate along that line, and a cloud of points strung along one radius of a circle can be described by distance from the center.
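To make the idea concrete, here is a minimal sketch of PCA on synthetic data. It is not our production pipeline: the six "measurements", the `mixing` matrix, and the noise level are all made up for illustration, and the PCA is done with a plain NumPy singular value decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
n_wells = 500

# Two hidden drivers generate six correlated measurements (synthetic data).
latent = rng.normal(size=(n_wells, 2))
mixing = np.array([
    [1.0, 0.8, -0.6, 0.2, 0.0, 0.5],
    [0.0, 0.3, 0.5, 1.0, -0.9, 0.6],
])  # made-up weights, chosen only for illustration
X = latent @ mixing + 0.05 * rng.normal(size=(n_wells, 6))

# Standardize each column so no measurement dominates because of its units.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via singular value decomposition of the standardized matrix.
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
explained = S**2 / np.sum(S**2)  # share of total variance per component

# Keep the first two components as the reduced feature set.
scores = Xs @ Vt[:2].T

print("explained variance ratio:", np.round(explained, 3))
print("reduced shape:", scores.shape)
```

Because the six columns were built from only two underlying drivers, the first two components should carry nearly all of the variance, which is the same effect that lets a few components stand in for hundreds of correlated log measurements.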
Because each PCA feature is a weighted combination of the original inputs, we can reconstruct which variables went into it, which maintains explainability. It’s also useful to see that different geologic variables are indeed correlated, like depth and neutron porosity in the geopca0 feature below.
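One way to see which inputs went into a component is to look at its loadings, the weights PCA assigns to each original variable. The sketch below uses three synthetic columns with illustrative names; the negative link between `depth` and `neutron_porosity` is built into the fake data purely so that the two variables land in the same component, and it is not a claim about the Bakken.

```python
import numpy as np

names = ["depth", "neutron_porosity", "resistivity"]  # illustrative names

rng = np.random.default_rng(1)
n_wells = 400
depth = rng.normal(size=n_wells)
# Synthetic assumption: neutron porosity is constructed to track depth closely.
neutron = -0.9 * depth + 0.2 * rng.normal(size=n_wells)
resistivity = rng.normal(size=n_wells)  # independent of the other two

X = np.column_stack([depth, neutron, resistivity])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Rows of Vt are the principal directions; each entry is one variable's loading.
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
loadings = [dict(zip(names, row)) for row in Vt]

for i, comp in enumerate(loadings):
    # List variables from largest to smallest absolute loading.
    ranked = sorted(comp.items(), key=lambda kv: -abs(kv[1]))
    print(f"geopca{i}:", ", ".join(f"{k}={v:+.2f}" for k, v in ranked))
```

In this toy setup the first component should be dominated by depth and neutron porosity with opposite signs, while resistivity mostly gets a component of its own. The overall sign of a component is arbitrary, so only the relative signs and magnitudes carry meaning.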
Understanding Feature Impact with SHAP values
SHAP values are a powerful tool in the Novi arsenal for understanding how much each feature contributed to the model prediction. They’re available for all our customers in Novi Cloud, for use with model transparency, sensitivity studies, and general performance research/insights. Every feature has a SHAP value for each IP day and fluid stream. They come in units of barrels (or mcf), and can be positive or negative, representing how much that feature pushed the prediction above or below the average well in the model. Below, I have plotted the SHAP values for geopca0 and geopca1 against their feature values. You can read the plot as follows: low values of geopca0 (wells in the deep part of the basin) receive a positive force.
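For intuition about the units and the sign convention, here is a small sketch that uses a linear model as a stand-in. That is an assumption made for simplicity, not a description of our models. For a linear model with independent features, the SHAP value of a feature is its coefficient times the feature's deviation from its mean, so everything can be checked by hand. The coefficients, the intercept, and the `proppant_per_ft` feature are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
n_wells = 200
feature_names = ["geopca0", "geopca1", "proppant_per_ft"]  # third name is hypothetical

X = rng.normal(size=(n_wells, 3))
weights = np.array([-40.0, 15.0, 25.0])  # made-up coefficients, barrels per unit
intercept = 500.0                        # made-up value, barrels

# Stand-in "model": a linear prediction in barrels.
predictions = intercept + X @ weights

# Base value: the average prediction over the wells in the model.
base_value = predictions.mean()

# Exact SHAP values for a linear model with independent features.
shap_values = (X - X.mean(axis=0)) * weights

# Additivity: base value plus a well's SHAP values recovers its prediction.
reconstructed = base_value + shap_values.sum(axis=1)
print("max reconstruction error:", np.abs(reconstructed - predictions).max())

# With a negative coefficient, below-average geopca0 gets a positive push.
low_geopca0 = X[:, 0] < X[:, 0].mean()
print("all low-geopca0 wells pushed up:", bool(np.all(shap_values[low_geopca0, 0] > 0)))
```

Each SHAP value is in the same units as the prediction, and a well's SHAP values add up to the gap between its prediction and the average well.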
Machine Learning Rock Quality: geoSHAP
Because the SHAP values represent how much each feature contributed to the model prediction, we can sum up the SHAP values for the geo features to get the total impact of the geology, which we term geoSHAP. Essentially, this is the model’s rock quality index. The model is able to identify high-producing areas like the Nesson Anticline, Ft. Berthold sub-play, and Parshall-Sanish field, despite being fed neither explicit geographic information nor any interpreted products (sorry Archie’s!).
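The summation itself is simple. The sketch below assumes you already have a table of SHAP values with one row per well and one column per feature. The numbers, the non-geo feature names, and the convention of picking geo columns by the `geopca` prefix are all illustrative assumptions.

```python
import numpy as np

# Hypothetical feature layout: three geo components plus two non-geo features.
feature_names = ["geopca0", "geopca1", "geopca2", "proppant_per_ft", "well_spacing"]

# Illustrative SHAP values in barrels, one row per well (made-up numbers).
shap_values = np.array([
    [ 30.0,  10.0, -5.0,  20.0, -8.0],
    [-25.0,  -5.0,  2.0,  15.0,  4.0],
    [ 12.0, -20.0,  1.0, -10.0,  6.0],
])

# Select the geology columns (assumed here to share the "geopca" prefix).
geo_cols = [i for i, name in enumerate(feature_names) if name.startswith("geopca")]

# geoSHAP: total contribution of the geology features for each well.
geoshap = shap_values[:, geo_cols].sum(axis=1)

# Rank wells from best to worst rock according to the model.
ranking = np.argsort(-geoshap)

print("geoSHAP per well (bbl):", geoshap)
print("wells ranked by geoSHAP:", ranking)
```

In this made-up table the first well has the highest geoSHAP, so the model credits its geology with the largest uplift relative to the average well, independent of how it was completed or spaced.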
Our customers use geoSHAP for inventory ranking, for finding completions-geology-spacing interactions, and for performance benchmarking. It also provides a concrete grounding for those not familiar with PCA: displaying the geoSHAP map can reassure experts in a review that the model was able to learn where the good rock is, even if the process is unfamiliar to many. And don’t forget that because our models predict gas, oil/condensate, and water, geoSHAP is available for each of those streams!
- Principal components analysis and SHAP values are powerful tools for handling a huge range of potential input geo data, even when working just from raw logs.
- GeoSHAP is a machine learning rock quality index that gives nonexperts a useful anchor for assessing model behavior and supports a variety of other use cases.
- GeoSHAP is available for any Novi model through Novi Cloud.
Paper Details
GeoSHAP: A Novel Method of Deriving Rock Quality Index from Machine Learning Models and Principal Components Analysis, Control ID 2743
URTeC 2020, Monday, July 20, Afternoon Session, Theme 3: Reservoir Characterization and Well Placement Using Modern Tools and Workflows.