What Does Data Quality Look Like – Cohorts
When you consider an Artificial Intelligence and Digital Oilfield project, it is important to understand that not all of the wells may be candidates for building and training the solution. We call the set of wells that will be used for building our machine learning models a cohort. While it is possible to have different cohorts for different diagnostic, failure, or optimization conditions, it is normally far better to have a single cohort. Teams will spend substantial amounts of time coming to understand the design, implementation, and operation of this set of wells. A clear, consistent set of known wells is critical to this methodology.
For example, the chart above is the output of a recent data quality assessment performed by OspreyData. It shows the number of wells initially considered for this project and the wells removed through a series of evaluation steps. In this case, we started with 174 wells for consideration and ended with a single cohort of only 100 wells.
The project teams worked collaboratively to establish the criteria, or dimensions of evaluation, for this assessment. Not every data quality assessment uses the same guidelines, or guidelines this strict, but it is normal to see a 30 to 40% reduction in the number of wells during the selection process.
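The selection process described above can be sketched as a sequence of filters, each removing the wells that fail one criterion. This is only an illustration: the `Well` fields, the two criteria, and their thresholds are hypothetical stand-ins, since the actual criteria are agreed by each project team.

```python
from dataclasses import dataclass


@dataclass
class Well:
    name: str
    history_days: int        # days of sensor history available (hypothetical field)
    missing_fraction: float  # share of expected readings that are missing (hypothetical field)


# Hypothetical criteria and thresholds, applied in order.
CRITERIA = [
    ("enough history", lambda w: w.history_days >= 180),
    ("few data gaps", lambda w: w.missing_fraction <= 0.10),
]


def build_cohort(wells):
    """Apply each criterion in turn; return the surviving wells and a per-step count log."""
    remaining = list(wells)
    log = [("initial", len(remaining))]
    for label, keep in CRITERIA:
        remaining = [w for w in remaining if keep(w)]
        log.append((label, len(remaining)))
    return remaining, log


def reduction_pct(initial, final):
    """Percentage of wells removed between the initial set and the final cohort."""
    return 100.0 * (initial - final) / initial


# Made-up wells, purely for illustration.
wells = [
    Well("W-1", 400, 0.02),
    Well("W-2", 90, 0.01),
    Well("W-3", 365, 0.35),
    Well("W-4", 200, 0.08),
]

cohort, log = build_cohort(wells)
for label, count in log:
    print(f"{label}: {count} wells")
print(f"reduction: {reduction_pct(len(wells), len(cohort)):.1f}%")
```

Applied to the figures from the assessment above, `reduction_pct(174, 100)` comes to roughly 42.5%. Keeping the per-step log is what makes a funnel chart of the selection process possible.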
To clarify, the cohort is used only during the construction of the machine learning models. Once a set of models is built, they can be applied to all of the available wells. Some wells may still be unable to participate, typically because a sensor the model depends on is missing on that specific well. For instance, if we are attempting to detect “holes in tubing” for a well, then “casing pressure” and “tubing pressure” are required.
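As a minimal sketch of that check, a trained model can be paired with the set of sensors it requires, and each well tested against that set before the model is applied. The “holes in tubing” requirement comes from the example above; the lookup structure, the function names, the well records, and the extra "motor current" sensor are assumptions made for illustration.

```python
# Sensors each model needs. Only the "holes in tubing" entry is taken from the text;
# the dictionary layout itself is an assumed, illustrative structure.
REQUIRED_SENSORS = {
    "holes in tubing": {"casing pressure", "tubing pressure"},
}


def missing_sensors(model, well_sensors):
    """Return the required sensors that the well does not report."""
    return REQUIRED_SENSORS[model] - set(well_sensors)


def can_run(model, well_sensors):
    """A well is eligible for a model only if no required sensor is missing."""
    return not missing_sensors(model, well_sensors)


# Hypothetical wells and the sensors installed on each.
wells = {
    "W-1": ["casing pressure", "tubing pressure", "motor current"],
    "W-2": ["tubing pressure"],
}

for name, sensors in wells.items():
    gaps = missing_sensors("holes in tubing", sensors)
    status = "eligible" if not gaps else f"missing {sorted(gaps)}"
    print(f"{name}: {status}")
```

Reporting which sensors are missing, rather than a bare yes or no, shows what instrumentation a well would need before the model could cover it.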