With thousands or tens of thousands of wells in every major shale play, operators and financial institutions know that the data is out there to help them make better decisions. But implementing reliable artificial intelligence and machine learning methods can be technically challenging and time-consuming, even for experts. So companies often abandon their efforts to implement a data-driven approach after a year or two of trial and error.
Since our founding in 2014, Novi has built thousands of machine learning models in every major basin stretching from Canada to Argentina. In working with our customers, we have learned firsthand what it takes to build a successful data-driven program. We put all that learning into our self-service platform, which, with the release of Model Engine, is now complete. What have we learned? Read on to find out.
In this post:
- Data Science is 80% Data and 20% Science
- You cannot succeed without self-service capabilities
- Transparency is critical
- Building an accurate model is only the beginning
- Conclusions
Data Science is 80% Data and 20% Science
Although shiny machine learning algorithms get most of the attention, data preparation takes the most work. Erroneous completions data must be fixed or removed. Messy formation names need to be cleaned up. You’ll need to blend your proprietary, daily production data with monthly data from a public source. Header, production, and subsurface data all need to be joined together – to say nothing of spacing calculations!
Beyond all that, this needs to be in a sustainable, evergreen package. What happens if you trade for some dailies or open a VDR? Better hope that all your data prep wasn’t manual, because now you need to do it all again.
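To make that concrete, here is a minimal pandas sketch of the kind of joining and cleaning described above. The file names, column names, and formation mapping are all hypothetical stand-ins; the point is that these steps live in a repeatable script rather than a one-off spreadsheet exercise.

```python
import pandas as pd

# Hypothetical inputs: well headers, public monthly production, proprietary
# daily production, and a subsurface table, all keyed on a 14-digit API number.
headers = pd.read_csv("well_headers.csv")            # api14, formation, lateral_length, ...
monthly = pd.read_csv("public_monthly_prod.csv")     # api14, month, oil_bbl, gas_mcf
daily = pd.read_csv("proprietary_daily_prod.csv")    # api14, date, oil_bbl, gas_mcf
geology = pd.read_csv("subsurface_grid.csv")         # api14, porosity, toc, thickness

# Clean up messy formation names before anything gets joined.
formation_map = {"WFMP A": "Wolfcamp A", "WOLFCAMP-A": "Wolfcamp A", "WOLFCAMP A": "Wolfcamp A"}
headers["formation"] = headers["formation"].str.strip().str.upper().replace(formation_map)

# Roll the proprietary dailies up to months so they can be blended with the
# public monthly data, preferring the dailies wherever both sources exist.
daily["month"] = pd.to_datetime(daily["date"]).dt.to_period("M").dt.to_timestamp()
daily_monthly = daily.groupby(["api14", "month"], as_index=False)[["oil_bbl", "gas_mcf"]].sum()
production = (pd.concat([daily_monthly, monthly])
                .drop_duplicates(subset=["api14", "month"], keep="first"))

# Join header, production, and subsurface data into one analytics-ready table.
dataset = (headers
           .merge(production, on="api14", how="inner")
           .merge(geology, on="api14", how="left"))
```

Even this toy version breaks the moment a new data source shows up with different column names or units.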

We know just how hard this can be, which is why we spent so much time building Data Engine, giving our customers the tools to build high-quality, analytics-ready datasets specific to oil and gas. It’s also why we acquired ShaleProfile, so our customers could start on day one with the best public dataset in the market. A year after Data Engine’s release, we have seen the results: customers building accurate models and making data-driven decisions in days rather than months, years, or never at all.
You cannot succeed without self-service capabilities
Let’s say you’re an operator in the Haynesville. You’re intrigued by the improved recovery of upsized completions jobs, but concerned about rising sand costs. You hire an impressively credentialed data scientist to crunch the numbers. They gather your data, retreat to their Bay Area ivory tower, and come back nine weeks later with an answer: 2,950 lbs/ft.
Who cares if they are right? Are your engineers going to accept that number? This is a lesson we learned the hard way at Novi: it is not enough to have the best algorithms in the industry, the most accurate forecasts, or even a team of intimidatingly intelligent PhDs. Engineers and analysts need to have control over each step of the process. Only then can they confidently take ownership of the results.
We call this self-service: every decision to be made, every knob to be turned, and every field to be mapped is put in the hands of our users. Which data source would you assign the highest confidence for total proppant volumes? Which variables should the models use to predict production? Which undrilled locations should we forecast? All of these choices, and many more, can be made by our users in our self-service platform without typing a single line of code. Ultimately, this means data-driven decisions made confidently, with the involvement of your whole team.
Transparency is critical
Self-service provides insight into the process of building datasets, training models, and configuring forecasts, but true transparency requires shining light onto the model itself. Everyone reading this post has surely heard machine learning pejoratively referred to as a “black box.”
We were the first in the oil and gas industry to leverage the machine learning innovation of “SHAP Values”, which we also call “production drivers”. These unparalleled insights allow engineers and management teams to understand why the model forecasted what it did. Our customers have seen tremendous benefit from these production driver datasets, using them for such workflows as mapping rock quality around a basin, investigating record-setting producers, and understanding why wells turn gassy.
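For readers who want to see what this looks like in code, here is a rough sketch using the open-source shap package with a scikit-learn tree model. The file name, feature names, and target are hypothetical stand-ins, not Novi’s actual implementation.

```python
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical analytics-ready table: completion, spacing, and geology features
# plus a production target such as 12-month cumulative oil.
wells = pd.read_csv("analytics_ready_wells.csv")
features = ["proppant_lbs_ft", "fluid_bbl_ft", "lateral_length", "well_spacing", "porosity", "toc"]
X, y = wells[features], wells["cum_oil_12mo"]

model = GradientBoostingRegressor().fit(X, y)

# TreeExplainer produces one additive contribution per feature per well, so each
# forecast decomposes into a base value plus per-feature "production drivers".
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Which variables push forecasts up or down, and by how much, across the basin.
shap.summary_plot(shap_values, X)
```

Because the contributions are additive, summing a well’s drivers recovers its forecast, which is what makes them useful for explaining individual wells as well as basin-wide trends.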

Furthermore, we employ algorithms that are far less opaque than deep learning models or the other neural-net architectures that have received so much attention over the last few years. Besides being the most accurate for oil and gas forecasting, our algorithms also mirror the traditional, area-based type curve thought process. At their core, they group similar wells together and take a weighted average, just done at computational scale and speed across thousands of decision trees, with minimal human bias.
Besides being inherently less opaque, these algorithms can tell you which “analog wells” were used for every forecast! Our customers use these analog well datasets to understand the forecasts, identify new analogs they hadn’t considered, and even refine their own type curve processes.
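Novi’s forecasting algorithms are proprietary, but the general idea of pulling analog wells out of a tree ensemble can be sketched with scikit-learn: training wells that repeatedly land in the same leaves as a planned well, across many trees, are its closest analogs. Everything below (file, column, and well-attribute values) is a hypothetical illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training data: producing wells with an api14 identifier,
# input features, and a 12-month cumulative oil target.
wells = pd.read_csv("analytics_ready_wells.csv")
features = ["proppant_lbs_ft", "lateral_length", "well_spacing", "porosity", "toc"]
X, y = wells[features], wells["cum_oil_12mo"]

forest = RandomForestRegressor(n_estimators=500).fit(X, y)

# A planned well, described with the same features as the training set.
planned = pd.DataFrame([{"proppant_lbs_ft": 2950, "lateral_length": 10000,
                         "well_spacing": 880, "porosity": 0.08, "toc": 3.5}])

# apply() returns the leaf each well falls into, one column per tree.
train_leaves = forest.apply(X)          # shape: (n_training_wells, n_trees)
planned_leaves = forest.apply(planned)  # shape: (1, n_trees)

# The fraction of trees in which a training well shares a leaf with the planned
# well is a natural analog weight; the top-weighted wells are its analogs.
analog_weight = (train_leaves == planned_leaves).mean(axis=1)
analogs = wells.assign(analog_weight=analog_weight).nlargest(10, "analog_weight")
print(analogs[["api14", "analog_weight"]])
```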
Finally, we also generate quantified confidence intervals for every forecast. For each one, users can see how every variable influenced the prediction, which analog wells informed it, and how confident the model is in it. Ultimately, this transparency builds the confidence our customers need to make data-driven, multi-million-dollar investment decisions.
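One common way to produce such intervals with tree-based models is quantile regression; the sketch below fits separate scikit-learn models for the 10th, 50th, and 90th percentiles (again with hypothetical file and column names, not Novi’s actual method).

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

wells = pd.read_csv("analytics_ready_wells.csv")  # hypothetical analytics-ready table
features = ["proppant_lbs_ft", "lateral_length", "well_spacing", "porosity", "toc"]
X, y = wells[features], wells["cum_oil_12mo"]

# One model per quantile. By industry convention the 10th percentile outcome is
# the P90 (low) case and the 90th percentile outcome is the P10 (high) case.
quantile_models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
    for q in (0.10, 0.50, 0.90)
}

planned = X.iloc[[0]]  # stand-in for a planned well's feature row
p90, p50, p10 = (quantile_models[q].predict(planned)[0] for q in (0.10, 0.50, 0.90))
print(f"P90 {p90:,.0f} / P50 {p50:,.0f} / P10 {p10:,.0f} bbl (12-month cum oil)")
```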
Building an accurate model is only the beginning
The model review went great. The engineers were impressed with the low error scores, and management is ready to run it on a pad you have coming up later this year. Now what?
The dirty secret of machine learning is that using a model is harder than building it. Models are very sensitive creatures: all the input data for your forecast needs to be in exactly the same structure and format as your training data. Now you need to extract the subsurface data at each well location, calculate parent-child relationships, and format your completions design for every hypothetical way to develop this upcoming pad. Better hope everything you did for data prep can be easily repurposed for planned wells!
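A small amount of defensive code goes a long way here. As a hedged illustration (the function and the variables named in the comment are generic examples, not Novi-specific), a pre-forecast check can confirm that planned-well inputs match the training schema before anything reaches the model:

```python
import pandas as pd

def check_forecast_inputs(planned: pd.DataFrame, training: pd.DataFrame) -> list[str]:
    """Flag structural mismatches between planned-well inputs and the training data."""
    problems = []
    missing = set(training.columns) - set(planned.columns)
    extra = set(planned.columns) - set(training.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected columns: {sorted(extra)}")
    for col in training.columns.intersection(planned.columns):
        if planned[col].dtype != training[col].dtype:
            problems.append(f"{col}: dtype {planned[col].dtype} vs training {training[col].dtype}")
        if planned[col].isna().any():
            problems.append(f"{col}: planned wells are missing values")
    return problems

# Run before every forecast so renamed columns or unit mix-ups fail loudly, e.g.
# assert not check_forecast_inputs(planned_pad_features, training_features)
```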

This is why Novi built a software platform that handles each component of the data -> model -> forecasting workflow. No one in the industry offers anything like what we have built here at Novi with our self-service platform. When it’s time to make a forecast, just drop your upcoming pad into our map-based GUI, and Forecast Engine will take care of all the data prep and formatting for you. It’s a seamless process that means not just accurate models, but accurate models you can actually use.
If you put in the hard work and build your own models in-house, the maintenance burden of ongoing usage will catch up with you. The bright engineer who spearheaded the successful analytics pilot gets hired away, the model gets more out of date every month, and eventually you’re left with just one analyst who understands how that Python script actually works. What happens when that analyst moves to another company?
You would be amazed by the number of operators who reach out to us after trying and failing to build a sustainable analytics program. Executives consistently underestimate the magnitude of commitment required to build one of these programs: you’re looking at multiple FTEs just to build and maintain the code. Do you really want to manage a team of software engineers? Better to partner with a tech company if you want a successful, affordable implementation of data-to-decision workflows.
Conclusions
Does that sound like a lot of work to build a successful data-driven program? It is! It took us multiple years with a large team of the best engineers and data scientists to build our self-service machine learning platform, but now, we offer that program to you. Here are our key takeaways, from what we have learned building our platform and from working with dozens of operators:
- Data preparation is most of the work. You must plan for this and resource it accordingly.
- Self-service tools are needed to build accurate models and get buy-in from your engineering teams. You want the people who know your assets and investments best to be driving here.
- Transparency is KEY. Without understanding how the model works, you will not trust it.
- Building a data-driven program requires quite the commitment. Ongoing usage and maintenance will be surprisingly labor-intensive.
Data-driven workflows
What makes data-driven workflows better?
At a high level, the idea is to develop workflows that make full use of the available data. This enables engineering teams to develop and implement workflows with reduced bias, greater repeatability, and the ability to improve over time.
How does it help your business?
Data-driven workflows improve returns on development decisions and streamline operations. A data-driven approach helps identify, quantify, and measure the areas that need the most improvement and the opportunities worth pursuing. Data-driven workflows also increase confidence by quantifying risk.
Improve outcomes using data analytics: better data, better decisions.
At Novi Labs, we build machine-learning models that will optimize and improve your processes.
Don’t miss out. Request a free demo here: novilabs.com/demo