Rejoice geologists, geophysicists and petrophysicists! We provide a novel approach to answer the question: Is it worth it for the industry to spend hundreds of millions of dollars each year to collect and interpret subsurface data? In short, you bet!
This blog post is a summary of our URTeC 2022 paper: “How Much Better Could We Have Done? Using a Time Machine Method to Quantify the Impact of Incremental Geologic Data on Machine Learning Forecast Accuracy Control”
The method: using a “time machine” to quantify value of information
Subsurface practitioners have always known how important our work is in guiding upstream decisions; however, it has always been difficult to quantify the value of our contributions. Novi’s self-service machine learning platform and its time machine method (yes, it’s as cool as it sounds) make it possible to demonstrate the ROI of data collection and interpretation investments as a function of the actual NPV of wells and pads.

The Novi time machine simply simulates how a model would have worked in the past, by withholding all data from after a certain date and training the model on data from before that date. This gives our users powerful counterfactual abilities to ask questions like “how would the advanced subsurface interpretation have impacted our forecasts back in 2019?”
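The core of that idea is a split by date rather than a random split. The sketch below is a minimal illustration under stated assumptions, not Novi's implementation: the `Well` record, its field names, the `time_machine_split` helper and the sample wells are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Well:
    # Hypothetical record layout, for illustration only.
    well_id: str
    first_production: date
    features: dict
    first_year_oil_bbl: float


def time_machine_split(wells, cutoff):
    """Split wells into (train, holdout) around a cutoff date.

    Wells that came online before the cutoff are visible to the model;
    wells on or after the cutoff are withheld, as if they did not exist yet.
    """
    train = [w for w in wells if w.first_production < cutoff]
    holdout = [w for w in wells if w.first_production >= cutoff]
    return train, holdout


# Made-up wells, not data from the study.
wells = [
    Well("A", date(2017, 3, 1), {"porosity": 0.08}, 110_000.0),
    Well("B", date(2018, 9, 15), {"porosity": 0.09}, 125_000.0),
    Well("C", date(2019, 6, 1), {"porosity": 0.10}, 140_000.0),
    Well("D", date(2020, 2, 20), {"porosity": 0.07}, 95_000.0),
]

train, holdout = time_machine_split(wells, cutoff=date(2019, 1, 1))
print([w.well_id for w in train])    # wells the model may learn from
print([w.well_id for w in holdout])  # wells used only to score the forecast
```

A faithful version would presumably also truncate the production history of the training wells at the cutoff, since a well that came online shortly before that date would not yet have a full first year of production on record.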

The experiment: “base” subsurface vs. “rich” subsurface
Our evaluation focused on Howard County, a highly productive region of the Midland Basin that emerged as a hotspot much later than the rest of the Wolfberry play, making it an ideal candidate for counterfactual analysis. Would the addition of advanced subsurface data have enabled the model to identify this highly productive area?
Only subsurface features were varied for this analysis. The table below describes the differences between the features in “base” vs. “rich” models. Engineering features, including spacing, completions and production, were held constant across both models in the comparison.
The results: how accurate was each model?
The 694 Howard County wells held out from the models by time machine produced a total of ~85 million barrels over their first year. The base model underpredicted this total by over 7%, while the enriched model nailed the production, getting within 1% of 85 million barrels.
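To put those percentages in barrels, the arithmetic can be sketched as follows. The 85 million barrel total and the 7% and 1% figures come from the text above; because the text gives them as "over 7%" and "within 1%", the results are bounds rather than the exact model outputs. The function name is hypothetical.

```python
def aggregate_error_pct(predicted_total, actual_total):
    """Signed error of a summed forecast, as a percent of the actual total."""
    return 100.0 * (predicted_total - actual_total) / actual_total


ACTUAL_FIRST_YEAR_BBL = 85e6  # approximate first-year total of the held-out wells

# Bounds implied by the text, not the exact forecasts:
base_shortfall_bbl = 0.07 * ACTUAL_FIRST_YEAR_BBL  # base model missed low by more than this
rich_max_miss_bbl = 0.01 * ACTUAL_FIRST_YEAR_BBL   # rich model missed by less than this

print(f"Base model shortfall: more than {base_shortfall_bbl / 1e6:.2f} MM bbl")
print(f"Rich model miss: less than {rich_max_miss_bbl / 1e6:.2f} MM bbl")

# Round trip: a forecast that is low by the base shortfall is a -7% error.
print(aggregate_error_pct(ACTUAL_FIRST_YEAR_BBL - base_shortfall_bbl,
                          ACTUAL_FIRST_YEAR_BBL))
```

In other words, the base model left roughly 6 million barrels of first-year production unaccounted for, while the rich model's miss was under 1 million barrels.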

It is important to note that although the enriched model included all available subsurface features, its accuracy could be improved further by iterating on which specific subsurface features are included or excluded. The Novi Model Engine platform provides two means of feature selection. The first lets the user choose which features the model will use. The second lets the machine learning pipelines auto-select the features that provide the most signal for the model output (i.e., production in this case).
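The two selection modes can be sketched in a few lines. This is a simplified stand-in, not the Novi Model Engine's pipeline: it ranks features by absolute Pearson correlation with production, whereas a production pipeline would more likely use model-based importance. The function names, feature names and values are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def choose_features(columns, target, user_selected=None, top_k=2):
    """Manual mode if the user names features, otherwise automatic ranking."""
    if user_selected is not None:
        # Option 1: the user decides which features the model sees.
        return [name for name in user_selected if name in columns]
    # Option 2: keep the top_k features with the strongest signal.
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    return ranked[:top_k]


# Made-up values for five wells, for illustration only.
columns = {
    "porosity": [0.06, 0.08, 0.09, 0.11, 0.12],
    "vclay": [0.30, 0.22, 0.35, 0.18, 0.25],
    "youngs_modulus": [5.0, 5.0, 5.0, 5.0, 5.0],  # constant, so no signal
}
first_year_oil = [90.0, 110.0, 120.0, 150.0, 160.0]

print(choose_features(columns, first_year_oil))                           # automatic
print(choose_features(columns, first_year_oil, user_selected=["vclay"]))  # manual
```

Correlation only captures linear, one-feature-at-a-time relationships, so treat this as a way to picture the two modes rather than a recommended selection method.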
The value: implications for NPV and leasing decisions
The enriched subsurface provides a more accurate forecast of production, but how does that translate to the assessment of acreage value? In this example, five pads were randomly selected from the study to compare forecasted NPVs for each model. The rich model predicted approximately $32MM additional NPV when compared to the base model. If decisions had been made using the base model, its lower NPV estimates might have led to a decision not to drill and complete (or lease!) because economic hurdles were not met. If the rich model had driven the decision-making process, drilling, completions and production would have proceeded with greater confidence because of the robust project NPV.
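A small discounted cash flow sketch shows how a forecast difference can flip a decision. Every number below is hypothetical (the capital cost, the yearly cash flows, the 10% discount rate and the zero-NPV hurdle); none of it comes from the paper. It only illustrates the mechanism described above.

```python
def npv(cash_flows, annual_rate):
    """Net present value; cash_flows[0] is at time zero, cash_flows[t] at end of year t."""
    return sum(cf / (1 + annual_rate) ** t for t, cf in enumerate(cash_flows))


DISCOUNT_RATE = 0.10  # hypothetical
CAPEX_MM = -8.0       # hypothetical drill and complete cost, in $MM

# Hypothetical yearly net cash flows ($MM) implied by each production forecast.
base_forecast = [CAPEX_MM, 4.0, 2.5, 1.5, 1.0]
rich_forecast = [CAPEX_MM, 5.0, 3.0, 2.0, 1.2]

base_npv = npv(base_forecast, DISCOUNT_RATE)
rich_npv = npv(rich_forecast, DISCOUNT_RATE)

for label, value in (("base", base_npv), ("rich", rich_npv)):
    decision = "proceed" if value > 0 else "do not drill"
    print(f"{label} model NPV: {value:+.2f} $MM -> {decision}")
```

With these made-up inputs, the same well fails the hurdle under the lower forecast and clears it under the higher one, which is the kind of decision flip an underpredicting model can cause.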

This approach can influence a subsurface team’s priorities and focus additional subsurface data collection spend (e.g., which logs or core to acquire). For example, if we find that geomechanical properties are key drivers of model accuracy and are important to explainability, then the team may want to collect and process additional sonic logs and/or invert seismic volumes for geomechanics. Alternatively, if certain reservoir quality parameters (e.g., Vclay, Sw, porosity) do not improve the model or contribute to explainability, then perhaps petrophysicists could divert time away from refining those reservoir quality models and focus on other areas.
The time machine workflow is valuable for much more than estimating the value of subsurface information. Other uses include:
- Building confidence for decision makers in use of machine learning models.
- Providing context for well reviews. How good were my AFE curves relative to my machine learning prediction and actuals?
- Addressing questions such as: How does my model accuracy change when I introduce new spacing and/or engineering features?
Conclusions
In this analysis we:
- Demonstrated that high quality subsurface features can increase predictive model accuracy.
- Quantified how increased model accuracy supports improved financial decisions.
- Explained how an improved understanding of the value of subsurface interpretation can help prioritize team workloads and data collection spend.
Paper Details
How Much Better Could We Have Done? Using a Time Machine Method to Quantify the Impact of Incremental Geologic Data on Machine Learning Forecast Accuracy Control. ID: 3723907
This paper was prepared for presentation at the Unconventional Resources Technology Conference held in Houston, Texas, USA, 20-22 June 2022.
What is subsurface geology?
Subsurface geology is the study of rock formations and other features beneath the land or sea-floor surface. Within oil and gas, pretty much all the geology we deal with is subsurface, though of course it can be informed by outcrops.
Why do we need geologic data?
Geologic data helps geologists and engineers understand the characteristics of the reservoirs and source rocks in the subsurface. This is the basis for all oil and gas exploration and development.
Why is subsurface data important for the oil and gas industry?
Subsurface data is required to determine the rock quality at any given location. This can be used to evaluate new plays, guide development decisions, or optimize field production.
Subsurface data is used within machine learning models for such tasks as identifying sweet spots, customizing completions designs, and choosing a spacing/stacking pattern. This provides a data-driven approach to these critical upstream workflows.
What is meant by seismic data?
Seismic data, in the oil and gas space, refers to the data collected by sending sound waves into the subsurface and recording the sound reflections back to the surface. The seismic data can be interpreted to understand subsurface structures, map faults, study depositional patterns, and more. 3D seismic data is rarely available on the regional scale for unconventional oil and gas plays, so operators more commonly rely on interpreted well log data, which is the basis for Novi’s subsurface.
Paying to interpret subsurface data is still significantly less expensive than working from an inaccurate predictive model.
Get access to Novi’s self-service machine learning platform and its time machine method to make better investment decisions.
You can request a free demo here: novilabs.com/demo