
Communications of the ACM

Arab World special section: Hot topics

Data Science for the Oil and Gas Industry in the Arab Region



Oil and gas (O&G) sources will still supply around 50% of global energy demand by 2040. In this article, we make the case that the Arab region is well positioned to build world-class data science teams that help fill the shortage of data professionals,5 especially in the O&G field, which is critical to the region's economy. The article presents challenges facing O&G industry players, such as governments, regulatory bodies, operators, and investors, and shows how Raisa Energy (with its Egypt-based data science team) is solving them efficiently and effectively. These challenges center on assessing the economic viability of an O&G asset, which depends on several factors (as shown in the accompanying figure), such as estimated well production, O&G prices, and the risks associated with uncertainty in the inputs. It is worth emphasizing that the challenges presented here are global in nature, yet they are tackled by a team formed entirely from the region and working at a world-class research and development level. We hope this article will motivate academics and practitioners to tackle challenges within O&G using data science technologies.

Figure. A Zohr gas field in Egypt.

The Arab region is well suited for building data science teams that serve a global market, especially for the O&G industry:

  • There is recent interest from governments in the region to offer data science-related programs and degrees.
  • The region can supply a talented, well-trained workforce at a relatively low cost.
  • The O&G industry is key in the region; hence data is readily available in large quantities. Data at this scale is what powers any modern artificial intelligence system.

For example, Raisa Energy, a U.S.-based O&G investment company, has its entire software and data science teams in Egypt, building capacity in the important energy domain and offering a unique edge for the region. Though junior talent is generally available, senior talent remains hard to find, as professionals typically move into managerial roles early for career growth. Our answer to this challenge is twofold: create opportunities for juniors to grow technically by working on challenging problems of a global nature, and complement them with experienced expats returning to the region.

We next detail some of the technical challenges that Raisa Energy faces and the novel approaches it uses to solve them, which have resulted in several academic publications and U.S. patents.1,2,3,4

Well production forecasting. Forecasting the production of an O&G well is a time series forecasting problem. Well features include geological maps, engineering parameters used in drilling and completing the wells (such as the amount of water and chemicals), location information, well lateral length, and the production of neighboring wells. Raisa introduced a state-of-the-art production forecasting system that uses an ensemble of a random forest and a sequential deep learning model trained on 50,000 wells1,2 and has been granted a U.S. patent on the topic. Raisa's approach is the first of its kind to utilize a comprehensive set of features, leverage data from thousands of wells, and apply advanced machine learning methods arrived at through experimentation. Our models are tested on independent test sets both offline (on older held-out sets) and online, in the day-to-day work of reservoir engineers. When measured on held-out sets, our models achieve close to 90% prediction accuracy. To cope with the ever-growing volume of well data, models are continuously retrained using our in-house continuous training platform. Whenever a retrained model scores higher than the existing models on the test sets, it is pushed to production.
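The ensemble and the retrain-then-promote rule described above can be sketched in a few lines. This is a minimal illustration, not Raisa's implementation: the equal blending weight, the accuracy definition (1 minus mean absolute percentage error), and all function names are assumptions made for the example, since the article does not specify them.

```python
from statistics import mean
from typing import List, Sequence

Forecast = Sequence[float]  # monthly production volumes for one well


def ensemble_forecast(forest: Forecast, sequence_model: Forecast,
                      weight: float = 0.5) -> List[float]:
    """Blend two forecasts month by month; `weight` applies to the forest."""
    return [weight * f + (1.0 - weight) * s
            for f, s in zip(forest, sequence_model)]


def accuracy(predicted: Forecast, actual: Forecast) -> float:
    """Assumed score: 1 minus mean absolute percentage error."""
    errors = [abs(p - a) / a for p, a in zip(predicted, actual) if a > 0]
    return 1.0 - mean(errors)


def should_promote(candidate_score: float, production_score: float) -> bool:
    """Promote the retrained model only if it beats the one in production."""
    return candidate_score > production_score


# Toy usage with made-up numbers.
actual = [100.0, 80.0, 64.0]
blend = ensemble_forecast([110.0, 84.0, 60.0], [90.0, 80.0, 68.0])
candidate_score = accuracy(blend, actual)
promote = should_promote(candidate_score, production_score=0.95)
```

In practice the comparison would be made on the same held-out test sets for both models, so that a higher score reflects the model and not a change in the data.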


In addition to estimating a single-point production curve, it is important to produce a probabilistic forecast that models variability in the input features. We estimate different percentile curves by adapting the machine learning loss function to include a quantile loss.3 Finally, we use the production forecasting models to answer what-if questions, such as what happens if we increase the amount of a chemical. Such a capability is crucial for optimizing well operation, for instance.
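For readers unfamiliar with quantile loss, the following sketch shows the standard pinball form of it. It is a generic illustration and not necessarily the exact loss used in the cited work; the function name and the toy values are assumptions.

```python
from typing import Sequence


def quantile_loss(actual: Sequence[float], predicted: Sequence[float],
                  q: float) -> float:
    """Mean pinball loss for quantile level q, with 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        diff = y - y_hat
        # Under-prediction costs q per unit; over-prediction costs (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(actual)


# For a high quantile such as 0.9, predicting too low is penalized more.
too_low = quantile_loss([10.0], [8.0], q=0.9)    # 0.9 * 2
too_high = quantile_loss([10.0], [12.0], q=0.9)  # 0.1 * 2
```

Training one model (or one output head) per quantile level with this loss yields the percentile curves; the asymmetry of the penalty is what pushes each prediction toward its target quantile.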

Timing indicators for well life cycle events. Investors in O&G need to estimate when well production begins, which signals the start of cash flow. An investor does not, however, control the drilling schedule of lands owned by an operator, and hence machine learning can be used to reverse engineer the operator's logic in selecting the lands to be drilled next. This problem is unique to Raisa as an investor in the O&G industry, and there is very little prior work on this technically challenging problem. Raisa has been making strides in tackling it, which has led to a pending U.S. patent. The main idea of our approach is to use a comprehensive set of features to model operator rig movement from location to location and to rely on a machine learning model to optimize the mapping from these features to the next best locations to move to. Features include production volume estimates at the target site, distance to pipeline infrastructure, automatic detection of drilling pads using satellite imagery,2 and estimated oil price. Operator movement is modeled using publicly available historical data gathered from GPS-equipped drilling rigs. For example, we have researched and implemented a deep learning image segmentation method based on U-Nets to detect drilling pads, which are clear indicators of imminent drilling at a site.2 Our work on semantic image segmentation was among the top four competing methods in the INRIA challenge.2

Figure. Economic evaluation of oil and gas assets.

Operator movement from location to location is then modeled as a sequential decision problem, and our current solution relies on optimizing a step-by-step classifier to build the optimal operator trajectory. Once we build this classifier, we test various operator models using a subset of the already drilled locations, achieving around 80% classification accuracy on next-location prediction for operators with enough historical samples. We have also experimented with more elaborate approaches such as reinforcement learning, modeling the problem as a Markov decision process (MDP). Our initial experimental results are very encouraging.
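One simple way to turn a step-by-step classifier into a trajectory is a greedy rollout: at each step, pick the candidate location the classifier rates highest given the history so far. The sketch below illustrates that idea only; the `Scorer` interface, the function names, and the toy distance-based scorer are hypothetical stand-ins for a trained model, and a greedy rollout is not guaranteed to be globally optimal.

```python
from typing import Callable, Hashable, List, Sequence, Set

Location = Hashable
# Hypothetical scorer: how likely `candidate` is to be drilled next,
# given the locations already visited (most recent last).
Scorer = Callable[[Sequence[Location], Location], float]


def build_trajectory(start: Location, candidates: Set[Location],
                     score: Scorer, steps: int) -> List[Location]:
    """Greedily extend a rig trajectory one location at a time."""
    trajectory = [start]
    remaining = set(candidates) - {start}
    for _ in range(steps):
        if not remaining:
            break
        # Pick the candidate the step-wise scorer rates highest.
        best = max(remaining, key=lambda loc: score(trajectory, loc))
        trajectory.append(best)
        remaining.remove(best)
    return trajectory


def nearest_first(history: Sequence[int], candidate: int) -> float:
    """Toy scorer on a 1-D line: closer to the last location is better."""
    return -abs(candidate - history[-1])


path = build_trajectory(0, {1, 2, 5}, nearest_first, steps=3)
# path == [0, 1, 2, 5]
```

A real scorer would consume the features listed earlier (production estimates, distance to pipelines, detected drilling pads, oil price) rather than distance alone.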

Risk analysis. The O&G industry is a risky business (whether in exploration or production), as there is large uncertainty in the values of input factors such as geology, oil prices, and the effect of neighboring wells. To mitigate such risks, several scenarios governing the input parameters are studied, along with the correlations between inputs. It is therefore of paramount importance to have probabilistic predictions for the key variables of interest: we estimate probability distributions for well production, geological parameters, and oil prices, as well as the correlations among these variables. The goal is to build a multivariate probability distribution of the inputs, with the dollar value of the O&G asset as the output. We use Monte Carlo-based methods to sample from the input distributions (typically 2,000 samples are used) and then run the samples through our economic model to estimate output cash flows.
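The sampling loop can be illustrated with a deliberately small example. Everything numeric here is an assumption made for the sketch: two lognormal inputs (volume and price), a correlation of 0.3 between them, a fixed cost, and an economic model that is just revenue minus cost. Only the sample count of 2,000 comes from the text.

```python
import math
import random
from typing import List


def simulate_asset_value(n_samples: int = 2000, rho: float = 0.3,
                         seed: int = 0) -> List[float]:
    """Sample correlated inputs and push them through a toy economic model."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        # Two correlated standard normals via a 2x2 Cholesky factor.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Assumed lognormal inputs: volume (barrels) and price ($/barrel).
        volume = 100_000.0 * math.exp(0.25 * z1)
        price = 60.0 * math.exp(0.20 * z2)
        cost = 4_000_000.0  # assumed fixed drilling and operating cost
        values.append(volume * price - cost)
    return values


def percentile(sorted_values: List[float], p: float) -> float:
    """Simple rank-based percentile of a sorted list, p in [0, 100]."""
    index = min(len(sorted_values) - 1, int(p / 100.0 * len(sorted_values)))
    return sorted_values[index]


values = sorted(simulate_asset_value())
p10, p50, p90 = (percentile(values, p) for p in (10, 50, 90))
prob_loss = sum(v < 0 for v in values) / len(values)
```

The spread between the P10 and P90 values, together with the estimated probability of a loss, gives a first view of the risk carried by the asset.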


With such an output distribution at hand, one can estimate the level of risk associated with an O&G asset. It is important to note that Monte Carlo simulations are computationally expensive and hence usually require efficient sampling strategies (such as Latin hypercube sampling) and the use of parallelization or distributed processing. Another challenge closely related to risk estimation is finding an optimal risk-reward portfolio of assets owned by a single investment company, for instance. The goal is to build a risk-reward profile of all assets owned by Raisa and then decide on new opportunities based on their economic value as well as where they fit with already owned assets in terms of risk-reward profile. Through careful modeling of input distributions and a highly optimized parallelized implementation, Raisa's risk simulation solution is on par with expensive simulation software, giving Raisa a unique competitive edge in the market.
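The stratified sampling strategy mentioned above, commonly known as Latin hypercube sampling, can be sketched with the standard library alone. This is a textbook version under assumed names, not Raisa's optimized implementation: each dimension is cut into as many equal strata as there are samples, one point is drawn per stratum, and the strata are shuffled independently per dimension.

```python
import random
from typing import List


def latin_hypercube(n_samples: int, n_dims: int,
                    seed: int = 0) -> List[List[float]]:
    """Return n_samples points in the unit cube, one per stratum per dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_samples equal-width strata.
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(column)  # decouple the strata across dimensions
        columns.append(column)
    # Transpose the per-dimension columns into sample points.
    return [list(point) for point in zip(*columns)]


points = latin_hypercube(n_samples=10, n_dims=3)
```

The points lie in the unit cube; mapping each coordinate through the inverse cumulative distribution of the corresponding input turns them into samples of that input. Because every stratum is covered exactly once, fewer samples are typically needed than with plain random sampling to reach a comparable estimate.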

Document information extraction. Much O&G data comes in the form of text: documents containing well cost information, production reports, and legal documents describing drilling restrictions. Automatic information extraction technology can be a huge time saver in these scenarios. For example, at Raisa we automatically extract cost information from well documents using semi-Markov conditional random field (semi-CRF) models, achieving highly accurate results (typically more than 85% F1 on fields of interest) on a challenging problem with little labeled data and high variability in document structure and templates. Over the years, we have built a full text processing pipeline that handles various document formats, including those requiring optical character recognition (OCR). Typical processing blocks include page layout extraction, table extraction, and slot filling. Once information is automatically extracted, it can be stored in databases, enabling fast search and retrieval.
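As a reminder of what an F1 figure on extracted fields measures, the sketch below scores one document's extracted fields against hand-labeled ones. The exact-match criterion, the function name, and the field names are assumptions for illustration; the article does not describe how its F1 is computed.

```python
from typing import Dict, Tuple

Fields = Dict[str, str]  # field name -> extracted or labeled value


def field_f1(predicted: Fields, gold: Fields) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over (field, value) pairs for one document."""
    true_positives = sum(1 for name, value in predicted.items()
                         if gold.get(name) == value)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical fields from a well cost document.
predicted = {"drilling_cost": "1200000", "completion_cost": "950000",
             "operator": "ACME"}
gold = {"drilling_cost": "1200000", "completion_cost": "900000",
        "operator": "ACME", "afe_number": "17-042"}
precision, recall, f1 = field_f1(predicted, gold)
# precision = 2/3, recall = 1/2, f1 = 4/7 (about 0.571)
```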

Figure. An oil and gas processing plant in Egypt.


Other Challenges

Besides the challenges mentioned here, there are other technical challenges in the O&G field where data science can provide effective solutions. We list a sample here for the sake of completeness.

  • Oil price estimation. Oil price is key in estimating the cash flow generated by an O&G asset. This is a time series forecasting problem for which deep learning methods, such as gated recurrent unit (GRU) architectures, have been used. In other work, researchers have analyzed news articles with deep networks to estimate oil price movement.
  • Model interpretability. O&G decision makers seek transparent ML models that let them understand the effect of different features on a prediction. Hence it is crucial to couple accurate, and often complex, prediction models with model interpretability approaches.
  • Regulating oil prices. Regulatory bodies such as OPEC monitor the production of member countries to control supply. One recent use of machine learning in this area is the automatic estimation of oil tank fill levels from satellite imagery.



References

1. Amr, S., El Ashhab, H., El Saban, M., Caile, C., Schietinger, P., Kaheel, A. and Rodriguez, L. A large-scale study for a multi-basin machine learning model predicting horizontal well production. In Proceedings of the SPE Annual Tech. Conf. and Exhibition (Dallas, TX, USA, Sept. 24–26, 2018).

2. Khalil, A. et al. Large-scale semantic classification: outcome of the first year of INRIA aerial image labelling benchmark. IGARSS 2018.

3. Mostafa, H., El Mahdy, M., El Saban, M. and Abo El Kheir, M. Probabilistic time series forecasting for unconventional oil and gas producing wells. In Proceedings of the IEEE Conf. Novel Intelligent and Leading Emerging Sciences, 2000.

4. Systems and methods for optimizing production of unconventional horizontal wells, United States Patent Application 20190284910.

5. State of AI Report, 2020.



Author

Motaz El Saban is Director of Data Science at Raisa Energy and an associate professor in the Faculty of Computers and Artificial Intelligence at Cairo University, Egypt.




©2021 ACM  0001-0782/21/4

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from or fax (212) 869-0481.


