Novi Labs

Novi geoSHAP – estimating rock quality using SHAP values in machine learning models: URTeC 2020 Novi Paper Summary

July 14, 2020

About the author

Ted Cross

Ted Cross is a Technical Advisor at Novi, where he applies his background in geology to ensure Novi's predictive analytics align with customer development planning problems. Prior to Novi, Ted was a Senior Geologist with ConocoPhillips.

“What do you mean the model doesn’t use **INSERT PET GEOLOGIC VARIABLE HERE**?!” Anyone who’s built and reviewed enough machine learning models and SHAP values will instantly recognize the question above. The algorithm dropped some geologic variable in the feature selection process due to high correlation with another input geologic variable, and someone in the review has now dismissed your results.

Whether we want to admit it or not, many geologic variables, especially in resource plays, are highly correlated with each other. This can cause pet variables to be dropped in feature selection, produce counterintuitive feature importance results, or even degrade model performance.

To tackle these issues, we’ve developed a subsurface workflow that starts with raw log data (or any set of geology input data) and ends with a machine learning rock quality index.

This is the subject of our upcoming URTeC 2020 paper, GeoSHAP: A Novel Method of Deriving Rock Quality Index from Machine Learning Models and Principal Components Analysis.

In This Post

  1. From 300 to 5 Geo Variables with Principal Components Analysis: Machine Learning Model Development
  2. Understanding Feature Impact with SHAP values
  3. Machine Learning Model Deployment – Rock Quality: geoSHAP
  4. Conclusions

          From 300 to 5 Geo Variables with Principal Components Analysis: Machine Learning Model Development

          To begin, we generated 300 geology features directly from electric logs. We started with an algorithm that auto-picked the tops of the Lower, Middle, and Upper Bakken and the Three Forks Formation (from the Williston Basin of North Dakota). The outputs included values like P90 resistivity of the Upper Bakken, mean neutron porosity of the Middle Bakken, and P5 sonic for the Lower Bakken. Additionally, we incorporated a variety of structural tops above and below in the column, as well as mudlog gas measurements. This left us with over 300 measurements, which is quite a lot!
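The interval statistics described above can be sketched as follows. This is a minimal illustration on synthetic data, not the workflow from the paper: the function name `interval_features`, the curve names, and the depths are all hypothetical, and "P90" and "P5" are assumed here to mean the 90th and 5th percentiles.

```python
import numpy as np

def interval_features(depth, curves, tops, stats):
    """Summarize each log curve within each formation interval.

    depth  : 1-D array of depths, increasing
    curves : dict of curve name -> 1-D array sampled at `depth`
    tops   : list of (formation name, top depth, base depth)
    stats  : dict of statistic name -> function reducing a 1-D array to a scalar
    """
    features = {}
    for formation, top, base in tops:
        in_zone = (depth >= top) & (depth < base)
        for curve_name, values in curves.items():
            zone_values = values[in_zone]
            zone_values = zone_values[~np.isnan(zone_values)]  # drop null samples
            if zone_values.size == 0:
                continue  # no valid samples in this interval
            for stat_name, func in stats.items():
                key = f"{formation}_{curve_name}_{stat_name}"
                features[key] = float(func(zone_values))
    return features

# Assumption: P5 / P90 are taken as plain 5th / 90th percentiles.
STATS = {
    "mean": np.mean,
    "p5": lambda v: np.percentile(v, 5),
    "p90": lambda v: np.percentile(v, 90),
}

# Synthetic well: depths, curves, and tops are made up for illustration.
rng = np.random.default_rng(0)
depth = np.arange(10000.0, 10100.0, 0.5)
curves = {
    "resistivity": rng.lognormal(3.0, 0.5, depth.size),
    "neutron_porosity": rng.uniform(0.05, 0.25, depth.size),
    "sonic": rng.normal(70.0, 5.0, depth.size),
}
tops = [
    ("upper_bakken", 10000.0, 10020.0),
    ("middle_bakken", 10020.0, 10060.0),
    ("lower_bakken", 10060.0, 10100.0),
]

features = interval_features(depth, curves, tops, STATS)
print(len(features), "features for one well")
```

Three formations, three curves, and three statistics already give 27 features per well, which shows how quickly the count climbs toward the hundreds once more curves, statistics, and structural tops are added.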

          To reduce the number of variables, we used principal components analysis (PCA). PCA attempts to express as much of the variation in the inputs as possible with as few derived variables as possible. For instance, a set of points that fall nearly on a line in x-y space can be described by a single coordinate along that line. Similarly, a cloud of points tracing a circle can be described by its distance from the center, although capturing that kind of nonlinear structure requires a kernel variant of PCA.

          An example of kernel PCA with both a Gaussian and a polynomial kernel, from http://proc-x.com/2010/02/sas-implementation-of-kernel-pca/
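To make the reduction step concrete, here is a minimal sketch of linear PCA using NumPy's singular value decomposition. The data are synthetic: 300 correlated features are generated from 5 hidden drivers, which is an assumption chosen to mirror the "300 to 5" framing rather than a description of the real log data. The helper name `pca_reduce` is hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Standardize X, then project it onto its leading principal components.

    Returns (scores, components, explained_variance_ratio).
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    Z = (X - mean) / std
    # Rows of Vt are the principal directions; S holds the singular values.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    components = Vt[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, components, explained[:n_components]

# Synthetic stand-in for the geology table: 200 wells, 300 correlated features
# driven by 5 hidden factors plus a little noise.
rng = np.random.default_rng(42)
n_wells, n_latent, n_features = 200, 5, 300
latent = rng.normal(size=(n_wells, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
X = latent @ mixing + 0.1 * rng.normal(size=(n_wells, n_features))

scores, components, explained = pca_reduce(X, n_components=5)
print("reduced shape:", scores.shape)
print("variance explained by 5 components:", round(float(explained.sum()), 3))
```

Standardizing first matters when inputs carry different units (ohm-m, porosity fractions, feet), since otherwise the feature with the largest numeric range dominates the components.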

          To maintain explainability, we can reconstruct which variables went into each PCA feature. This also makes it clear that different geologic variables do correlate. Below you can see the depth and neutron porosity example (the geopca0 feature).

          An example of using machine learning models for forecasting
          Maps and inputs for geopca0 and geopca1 variables. Geopca0 tracks closely with depth, and geopca1 tracks closely with sonic, picking up on the overpressured eastern part of the play and part of the Nesson anticline.
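One common way to see which inputs feed a PCA feature is to inspect the component weights (loadings). The sketch below does this on a small synthetic table. The feature names are hypothetical, and the data are built so that one group of inputs follows a depth-like signal and another follows a sonic-like signal, loosely echoing the geopca0 and geopca1 description above.

```python
import numpy as np

def top_loadings(components, feature_names, k=3):
    """For each principal component, list the k inputs with the largest absolute weights."""
    result = []
    for weights in components:
        order = np.argsort(-np.abs(weights))[:k]
        result.append([(feature_names[i], float(weights[i])) for i in order])
    return result

rng = np.random.default_rng(1)
n_wells = 300
depth_signal = rng.normal(size=n_wells)  # hypothetical burial-depth driver
sonic_signal = rng.normal(size=n_wells)  # hypothetical sonic / pressure driver

def noise():
    return 0.2 * rng.normal(size=n_wells)

# Hypothetical feature names; neutron porosity is made to fall as depth rises.
feature_names = [
    "tvd_middle_bakken",
    "neutron_porosity_mean",
    "structure_top",
    "sonic_p5_lower_bakken",
    "sonic_mean_middle_bakken",
]
X = np.column_stack([
    depth_signal + noise(),
    -depth_signal + noise(),
    depth_signal + noise(),
    sonic_signal + noise(),
    sonic_signal + noise(),
])

Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)

loadings = top_loadings(Vt[:2], feature_names, k=3)
for pc, entries in enumerate(loadings):
    print(f"component {pc}:", [(name, round(w, 2)) for name, w in entries])
```

The sign of an individual component is arbitrary, so only the relative signs within a component are meaningful: here depth and neutron porosity load with opposite signs on the first component.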

          Understanding Feature Impact with SHAP values

          Next, we train a model and generate SHAP values. SHAP values are a powerful tool for understanding how much each feature contributed to the model prediction. They’re available to all our customers in Novi Cloud for model transparency, sensitivity studies, and general performance research and insights. We’d like to highlight a few key features of SHAP values.

          First, every feature has a SHAP value for each IP day and fluid stream. Second, they come in units of barrels (or mcf). Third, they can be positive or negative. Fourth, they represent how much that feature pushed the prediction above or below the average well in the model. Below, I have plotted the SHAP values for geopca0 and geopca1 against their feature values. You can read the plot as follows: low values of geopca0 (wells in the deep part of the basin) receive a positive SHAP value, which pushes the prediction up.

          SHAP value comparisons in the machine learning model
          Maps of geopca0 and geopca1, along with their SHAP values, presented in terms of impact on the 720-day cumulative oil (bbl/ft) prediction.
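The properties listed above (same units as the prediction, signed, measured relative to the average) can be illustrated with a brute-force Shapley calculation on a toy model. This is a sketch under stated assumptions, not how Novi computes SHAP values: the `predict` function and its coefficients are invented, the feature order (geopca0, geopca1, proppant) is hypothetical, and production tools use much faster algorithms than enumerating every feature subset.

```python
from itertools import combinations
from math import factorial

import numpy as np

def shapley_values(predict, x, background):
    """Exact Shapley values for one sample by enumerating all feature subsets.

    Features outside a subset are filled in from the background rows, so the
    values are measured relative to the average prediction over `background`.
    """
    n = x.size

    def value(subset):
        mixed = background.copy()
        for j in subset:
            mixed[:, j] = x[j]  # pin the chosen features to this well's values
        return predict(mixed).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi

def predict(X):
    """Toy production model in bbl/ft; columns are geopca0, geopca1, proppant."""
    return 50.0 - 8.0 * X[:, 0] + 5.0 * X[:, 1] + 2.0 * X[:, 1] * X[:, 2]

rng = np.random.default_rng(7)
background = rng.normal(size=(200, 3))  # stand-in for the training wells
x = np.array([-1.5, 0.5, 1.0])          # one well with a low geopca0 value

phi = shapley_values(predict, x, background)
baseline = predict(background).mean()
print("SHAP values:", np.round(phi, 2))
print("baseline + sum of SHAP:", round(float(baseline + phi.sum()), 2))
print("model prediction:", round(float(predict(x[None, :])[0]), 2))
```

The last two printed numbers agree, which is the additivity property that makes SHAP values readable in barrels: the average prediction plus every feature's contribution equals the prediction for that well. In this toy model the low geopca0 value receives a positive contribution because the model's geopca0 coefficient is negative.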

          Machine Learning Model Deployment – Rock Quality: geoSHAP

          Because the SHAP values represent how much each feature contributed to the model prediction, we can sum the SHAP values for the geo features to get the total impact of the geology. We call this sum geoSHAP. Essentially, this is the model’s rock quality index. The model is able to identify high-producing areas like the Nesson Anticline, Ft. Berthold sub-play, and Parshall-Sanish field, despite being fed neither explicit geographic information nor any interpreted products (sorry Archie’s!).

          SHAP value comparison for Bakken and Three Forks.
          GeoSHAP is derived by summing the SHAP values for the input geologic features, in this instance, geopca0-4. On the right, geoSHAP is mapped for the Bakken and Three Forks formations. Dark green represents highest rock quality, dark red lowest.
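The summation itself is simple enough to sketch in a few lines. The SHAP matrix below is made-up data for three wells, and the non-geology column names and the `geoshap` helper are hypothetical. Only the geopca0 through geopca4 naming comes from the text above.

```python
import numpy as np

def geoshap(shap_matrix, feature_names, geo_prefix="geopca"):
    """Sum the SHAP values of the geology features for each well (row)."""
    geo_cols = [i for i, name in enumerate(feature_names) if name.startswith(geo_prefix)]
    return shap_matrix[:, geo_cols].sum(axis=1)

feature_names = [
    "geopca0", "geopca1", "geopca2", "geopca3", "geopca4",
    "proppant_per_ft", "well_spacing",  # hypothetical non-geology features
]

# Made-up SHAP values (bbl/ft), one row per well, one column per feature.
shap_matrix = np.array([
    [ 4.0,  1.5, -0.5,  0.2,  0.1,  3.0, -1.0],
    [-3.0, -1.0,  0.5,  0.0, -0.2,  5.0,  0.5],
    [ 0.5,  2.0,  0.0, -0.1,  0.3, -2.0,  1.0],
])

rock_quality = geoshap(shap_matrix, feature_names)
ranking = np.argsort(-rock_quality)  # best rock first

print("geoSHAP per well:", rock_quality)
print("wells ranked by rock quality:", ranking.tolist())
```

Note that the second well has the largest completion contribution but the lowest geoSHAP, which is the kind of separation between rock and design that the index is meant to expose.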

          Our customers use geoSHAP for a variety of purposes, including inventory ranking, looking for completions-geology-spacing interactions, and performance benchmarking. It also provides concrete grounding for those not familiar with PCA. Displaying the geoSHAP map can reassure experts that the machine learning model was able to identify the sweet spots. And don’t forget: our models remove inherent biases, measure the effect of completion design changes, and predict gas, oil/condensate, and water, so geoSHAP is available for each of those streams!

          Conclusions

          • Principal components analysis and SHAP values are powerful tools for handling a huge range of potential input geo data, even when working just from raw logs.
          • GeoSHAP is a machine learning based rock quality index. It provides a useful anchor to help nonexperts assess model behavior, and it supports a variety of other use cases.
          • GeoSHAP is available for any Novi model through Novi Cloud.

          Paper Details

          GeoSHAP: A Novel Method of Deriving Rock Quality Index from Machine Learning Based Models and Principal Components Analysis, Control ID 2743

          URTeC 2020, Monday, July 20, Afternoon Session, Theme 3: Reservoir Characterization and Well Placement Using Modern Tools and Workflows.

          Filed Under: Big Data Management, Machine Learning in Oil and Gas Blog, Conference Presentations


          Contact

          1905 Aldrich Street, Suite 220
          Austin, Texas 78723

          intro@novilabs.com
          512.368.9042


          Copyright © 2021 Novi Labs
          All rights reserved.