How Much Better Could We Have Done? Using a Time Machine Method to Quantify the Impact of Incremental Geologic Data on Machine Learning Forecast Accuracy
Tuesday, July 21st at 2:15 PM | Exhibition Hall - Station - Theme 3: What's in Your Rock: Geological Integration that Impacted Go-Forward Business Decision | Authors: J. Reed, C. Macalla*
Objectives/Scope: Operators spend millions of dollars each year gathering and interpreting subsurface data to improve their field development strategies. While most companies are committed to this expenditure, it is difficult to quantify its importance for development planning and forecasting. This study uses a novel machine learning approach to quantify the value of robust reservoir characterization and basin-wide mapping. We employed a “time machine” (aka backtesting) approach that trains machine learning models only on data from before a certain cutoff date. This allows us to interrogate the impact of enriched subsurface data on the models’ pre-drill forecast accuracy. Howard County, Texas (Midland Basin) is the primary area of interest and represents some of the more recent development in the basin.
Methods/Procedures/Process: We used a backtesting approach to compare predicted vs. actual well performance. We built two machine learning models to predict oil production in the Midland Basin. Both models were trained on wells that were put online before 01/01/2019 and tested on wells brought online after 01/01/2019. Both models used publicly available completions data, proprietary interwell spacing measurements, and a base set of geologic features: structure and gross thickness. The second model was additionally trained on a set of enriched subsurface features: porosity, vclay, TOC, brittleness, BVI, Sw, permeability, STOOIP, and WMR. We evaluated the accuracy of the model predictions for wells online after 01/01/2019 against the actual production data.
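The date-based split described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout, the feature column names, and the function name `time_machine_split` are all hypothetical, and assigning wells brought online exactly on the cutoff date to the test set is an assumption the abstract does not address.

```python
from datetime import date

CUTOFF = date(2019, 1, 1)  # cutoff date stated in the abstract

# Hypothetical column labels: the abstract names the features, not how they are stored.
ENGINEERING = ["completion_design", "interwell_spacing"]
BASE_GEOLOGY = ["structure", "gross_thickness"]
ENRICHED_GEOLOGY = [
    "porosity", "vclay", "toc", "brittleness", "bvi",
    "sw", "permeability", "stooip", "wmr",
]

# Model 1 sees the base feature set; model 2 sees the same set plus the enriched features.
BASE_FEATURES = ENGINEERING + BASE_GEOLOGY
ENRICHED_FEATURES = BASE_FEATURES + ENRICHED_GEOLOGY


def time_machine_split(wells, cutoff=CUTOFF):
    """Split wells by online date so training never sees post-cutoff wells."""
    train = [w for w in wells if w["online_date"] < cutoff]
    # Assumption: a well online exactly on the cutoff is treated as a test well.
    test = [w for w in wells if w["online_date"] >= cutoff]
    return train, test


# Made-up records, for illustration only.
wells = [
    {"well_id": "A", "online_date": date(2017, 6, 1)},
    {"well_id": "B", "online_date": date(2018, 12, 31)},
    {"well_id": "C", "online_date": date(2019, 3, 15)},
]

train, test = time_machine_split(wells)
print("train:", [w["well_id"] for w in train])
print("test:", [w["well_id"] for w in test])
```

Both models would be fit on `train` and scored on `test`; only the feature list differs between them, which is what isolates the contribution of the enriched subsurface data.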
Results/Observations/Conclusions: The enriched subsurface data model improved cumulative (cum) 360-day accuracy by more than 7 percentage points compared to the structure and gross thickness model. Accuracy is measured as the absolute percentage error between predicted and actual cum production. Our test set comprised 519 Howard Co. wells with at least 360 days of actual production. The actual cum 360-day oil production for these wells was 85,500,000 bbls (i.e., ~165,000 bbls/well). The base subsurface model underpredicted the cum 360-day oil production by > 7%, while the enriched subsurface model overpredicted by < 1%. For the sample wells, the one-year oil forecast from the enriched subsurface features model accurately predicted 6,000,000 bbls (i.e., ~11,600 bbls/well) more than the base model did.
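The arithmetic behind these figures can be checked with a short sketch. Only the well count and the two barrel totals come from the abstract; the function name `signed_cum_error_pct` and the sign convention (negative for underprediction) are assumptions made for illustration, and the example inputs to that function are made up because the abstract does not report the models' predicted totals.

```python
N_WELLS = 519                    # test wells reported in the abstract
ACTUAL_CUM_BBL = 85_500_000      # actual cum 360-day oil, all test wells
IMPROVEMENT_BBL = 6_000_000      # additional barrels captured by the enriched model


def signed_cum_error_pct(predicted, actual):
    """Cumulative error as a percentage of actual; negative means underprediction."""
    return 100.0 * (predicted - actual) / actual


actual_per_well = ACTUAL_CUM_BBL / N_WELLS
improvement_per_well = IMPROVEMENT_BBL / N_WELLS
improvement_pct_of_actual = 100.0 * IMPROVEMENT_BBL / ACTUAL_CUM_BBL

print(f"actual per well:      {actual_per_well:,.0f} bbl")
print(f"improvement per well: {improvement_per_well:,.0f} bbl")
print(f"improvement share:    {improvement_pct_of_actual:.2f}% of actual")

# Made-up inputs showing the sign convention only (not reported results).
print(signed_cum_error_pct(predicted=90.0, actual=100.0))
```

The 6,000,000 bbl improvement is about 7% of the 85,500,000 bbl actual total, which is consistent with the reported accuracy gain of more than 7 percentage points.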
Applications/Significance/Novelty: We have demonstrated the value of applying a “time machine” approach to evaluate how enriched subsurface data improves model accuracy in an emerging area of the Permian. This novel approach quantifies the value and importance of robust reservoir characterization and basin-wide geologic mapping. With so many decisions derived from production forecasts, it’s essential for operators to generate accurate predictions. Underpredicting the production of a new development pad could lead to underdevelopment and millions of dollars in opportunity cost. Our results support the need to continue to invest in subsurface data collection and interpretation.