Analyzing data is now essential to success in education, employment, and other areas of activity in the knowledge society. Even though several frameworks describe the competencies and skills needed to meet current and future challenges, no data analytics competency framework exists to describe the importance of specific skills to succeed in data analytics assignments. In this article, we explore which competencies are required for effective data analytics by applying the Delphi technique and exploring the opinions of data analytics experts.
Our results present a list of cognitive, intrapersonal, and interpersonal competencies agreed upon by a consensus panel of experts in the field. Focusing on the three categories of essential competencies will help to better prepare students and employees for existing and future roles as citizens, employees, managers, parents, and volunteers. We urge policymakers, academic institutions, and educators to establish programs and reassess existing curricula and materials to support the development of these essential competencies.
Analyzing data has become mandatory for making successful decisions and accomplishing tasks, not only in the workplace but also across many contexts related to health, education, leisure, citizenship, and more.28 The demand for professionals in the rapidly growing field of data analytics keeps rising.2,22 Hence, identifying, evaluating, and teaching data analytics competencies is an important goal for academic institutions and organizations in every sector. However, we must understand what those competencies are, and how organizations can identify and measure the competencies of candidates as they pertain to data analytics. A review of the literature reveals discrepancies between contemporary conceptualizations of the skills and capabilities required for data analytics.29
The importance of data. Technological developments, as well as the diffusion of smart devices, have resulted in large volumes of data that are generated at an unprecedented rate. The formats of data and the way humans interact with it are rapidly changing, requiring people to become "data literate." They need to become active data explorers who can plan for, acquire, manage, analyze, and infer insights from data.10 At the same time, technological developments also enable organizations to efficiently manage huge data sets and to store them for long periods. Hence, data becomes a vital factor for both individual and organizational success. The term "big data" was coined by Roger Magoulas from O'Reilly Media in 2005. It refers to a wide range of large datasets almost impossible to manage and process using traditional data management tools—due to their complexity as well as their size.13 Data can exist in the form of documents, email, text messages, audio, images, video, graphics data, and more.12,30
Data analytics is the process of examining data to draw conclusions and make better decisions.26,29 Organizations, therefore, use data analytics techniques to make better organizational decisions. This process might, for example, boost business performance by increasing revenue, improving operational efficiency, or optimizing marketing campaigns. Integrated data can support not only academic research and organizational decisions, but also the analytics requirements of government itself. Data that government agencies collect and use to run their programs provides policymakers with new sources of facts for benchmarking goals and measuring the successes and shortcomings of existing and future programs.15
The profession of data analytics. In the past, data analytics tasks were performed by a limited number of professionals: scientists, researchers, economists, etc. During the last decade, as developments in science and technology have become widely applied in various sectors of the economy, employer demand for data analytics expertise rose and often resulted in new occupations being defined.23 Analysts in the organization who work with the data are key to success in decision-support data-management tasks.36 Data scientists and other professionals skilled in working with large quantities of data are critical to successfully using big data.24 Thus, the demand for professionals who can support the work keeps rising.1,11,22 The Occupational Information Network (O*NET) database of occupation descriptions is a program sponsored by the U.S. Department of Labor.23 The O*NET-SOC 2019 taxonomy, which includes the most up-to-date O*NET list, contains descriptions of 1,016 occupations, covering the entire U.S. economy. Many of the new data occupations—for example, data scientists and digital forensics analysts—were added to the taxonomy only recently, in 2018 and 2019,32 further evidence of the field's dynamic nature and uncertainty.
The data analytics process comprises many tasks,20 which are usually performed by an expert. The extract, transform, and load (ETL) process starts with collecting data from many sources and combining them to transform the different types of data into a common format. The next stage is cleansing the data. In this process, duplicate and irrelevant records are deleted, missing values are filled, and the data is checked for errors. When the data is ready for analysis, it is uploaded into the analytics system, where it can serve as the basis for a model that can be trained to forecast future information. The data can also be presented in data visualization systems such as dashboards, graphs, and charts.
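The transform and cleansing steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the workflow of any particular tool: the two sources, their field names, the rule that a record without a city is irrelevant, the mean-fill strategy, and the plausible-age range are all hypothetical choices made for the example.

```python
from statistics import mean

# Hypothetical records from two sources that use different field names.
crm_rows = [
    {"id": 1, "age": "34", "city": "Haifa"},
    {"id": 2, "age": "", "city": "Leeds"},
    {"id": 2, "age": "", "city": "Leeds"},      # duplicate record
]
web_rows = [
    {"user_id": 3, "years": 40, "town": "Austin"},
    {"user_id": 4, "years": 29, "town": None},  # treated as irrelevant below
]


def to_common(row):
    """Transform: map source-specific fields onto one shared schema."""
    raw_age = row.get("age", row.get("years"))
    return {
        "id": row.get("id", row.get("user_id")),
        "age": int(raw_age) if raw_age not in ("", None) else None,
        "city": row.get("city", row.get("town")),
    }


def cleanse(rows):
    """Cleanse: drop duplicates and irrelevant rows, fill gaps, check errors."""
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:               # keep the first copy of each id
            seen.add(row["id"])
            unique.append(row)
    # Assumption: a record without a city is irrelevant to the analysis.
    relevant = [row for row in unique if row["city"] is not None]
    # Assumption: missing ages are filled with the rounded mean of known ages.
    fill = round(mean(r["age"] for r in relevant if r["age"] is not None))
    for row in relevant:
        if row["age"] is None:
            row["age"] = fill
        if not 0 < row["age"] < 120:            # simple plausibility check
            raise ValueError(f"implausible age in record {row['id']}")
    return relevant


clean_rows = cleanse([to_common(r) for r in crm_rows + web_rows])
```

In this sketch, `clean_rows` is what would be loaded into the analytics system; a real pipeline would typically choose its fill and relevance rules per field and per domain.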
Professional competencies. What is the difference between a skill and a competency? A skill is a specific, learnable ability that one needs to perform a given job well.27 A competency is more than just knowledge and skills. It involves the ability to meet complex demands by drawing on and mobilizing psychosocial resources, including skills and attitudes, in a particular context.27 In this research, we used the term "competencies" rather than "skills" to reflect our view that skills and knowledge are intertwined, as suggested by Pellegrino and Hilton,28 based on OECD terminology.27
Professional competencies are skills, knowledge, and attributes that refer to the ability to perform effectively within a professional role. They are usually the competencies one must present during a job interview. In today's economy, however, workers must be more collaborative and creative in their problem-solving techniques. While "hard" technical skills associated with programming—for instance, programming languages and databases—remain a prerequisite for new hires, the industry also wants workers who can demonstrate a range of so-called "soft" skills, as well as the resiliency and flexibility to work on a range of tasks. Thus, the existing STEM education gap, which is relevant to the data analytics profession, represents a deficit not only in the hard cognitive skills associated with science and engineering, but also in the soft interpersonal and intrapersonal skills linked with effective communication, collaboration, and adaptability.4
Existing competency frameworks. A competency framework consists of a set of specific competencies, bound together in an integrated approach.27 A competency model is a descriptive tool that identifies the skills, knowledge, personal characteristics, and behaviors needed to effectively perform a role in the organization.21 Since there is no specific data analytics competency framework, policymakers and educators often refer to other sets of skills such as "21st-century skills," "deeper-learning skills," or "higher-order-thinking skills." Competency frameworks are used for hiring new employees, training them, evaluating their performance, promoting them, and developing their careers, as well as to support organizational change.5
Data analytics competencies are an individual's personal characteristics that may influence how that person approaches data analytics tasks and acquires data-relevant knowledge and skills. Though several frameworks are called on to describe the competencies and skills that children and adults require to meet the world's current and future complex challenges, to the best of our knowledge, no specific data analytics competency framework exists. In this study, we take one more step toward closing this gap by compiling an expert consensus around the required competencies for data analytics practitioners. The study is based on a profession-related framework34 and on frameworks related to education and other life contexts.10,28 By understanding the ways that organizations use competency frameworks, higher education institutions can develop curricula to enhance student development.21
AIS IS 2010 Curriculum Guidelines.34 Information systems (IS) are complex systems requiring both technical and organizational expertise for design, development, and management. The availability of curriculum models enables local academic units to maintain academic programs that are consistent with regional, national, or global employment needs and with the common body of knowledge of the IS field. The curriculum is designed to educate and prepare graduates to enter the workforce by equipping them with the knowledge and skills specified in three categories: IS-specific knowledge and skills, foundational knowledge and skills, and domain fundamentals.
The "IS 2010" report, a collaborative effort between the Association for Computing Machinery (ACM) and the Association for Information Systems (AIS), is the latest output from model curriculum work for information systems that began in the early 1970s. It is grounded in the expected requirements of the industry, represents the views of organizations employing graduates, and is supported by other IS-related organizations. The report identifies prerequisite skills needed by all students in basic personal-productivity software (word processing, email, Web browsing, spreadsheet modeling, etc.) and lists skills that IS professionals must possess (critical thinking, collaboration, effective communication, persistence, flexibility, curiosity, creativity, etc.).
National Research Council (NRC) report—Education for life and work: Developing transferable knowledge and skills in the 21st century.28 The National Research Council (NRC) is the principal operating agency for both the National Academy of Sciences and the National Academy of Engineering, providing services to the government, the public, and the scientific and engineering communities. The council connects the broad science and technology community with the academies' purposes of furthering knowledge.
In its report, the committee addressed questions about the terms "21st-century skills" and "deeper learning," as well as about their educational and social implications. The committee examined the evidence for the importance of various types of competencies for success in education, work, health, and other life contexts, and drew on a large research base in cognitive, developmental, educational, organizational, and social psychology and economics.
To organize the various terms for 21st-century skills and to provide a starting point for further research as to their meaning and value, the committee identified three broad domains of competence (cognitive, interpersonal, and intrapersonal), which represent distinct facets of human thinking. The cognitive domain involves reasoning and memory, and it includes competencies such as analysis, problem solving, scientific literacy, and creativity. The intrapersonal domain deals with the capacity to manage one's behavior and emotions to achieve one's goals. It includes competencies such as perseverance, adaptability, flexibility, self-direction, and the ability to cope with uncertainty. The interpersonal domain includes competencies such as communication and collaboration,28 which are used to express ideas and information to others, to interpret others' messages (both verbal and nonverbal), and to respond appropriately.
Innovating Pedagogy report.10 This report is the sixth in a series of annual reports on innovations in teaching, learning, and assessment. While it is not the most up-to-date report, it is the most relevant for the current research, since it introduces 10 pedagogies that either already influence educational practice or offer opportunities for the future. The report is the result of a collaboration between researchers at the Institute of Educational Technology in The Open University, U.K., and the Learning In a NetworKed Society (LINKS) Israeli Center of Research Excellence (I-CORE).
The 2017 report lists crucial skills that are required to become proficient learners and citizens: problem solving, evidence evaluation, and making sense of complex information from various sources. The report also highlights STEM topics that can develop these skills and addresses current demands for STEM-skilled employees across job sectors. Therefore, one of the recommendations for today's students, who need to be prepared for a data-driven society, is to learn to work and think with data.
One of the ways to close knowledge gaps is to ask experts for their opinions, predictions, and estimations about the future impact of current technological changes. The Delphi technique is a known structured process that uses a series of questionnaire-based iterations to gather information from a chosen panel of respondents within their domain of expertise until group consensus is achieved.8 This technique applies to goal setting, policy investigation, and predicting future events.16 Thus, the Delphi technique is used for acquiring knowledge about new and emerging fields in education,6 work,18 health,14 and information systems,35 as well as other life contexts (for example, Baumer et al.3). In the current research, this technique is used to identify which competencies are essential to meeting future challenges of the big data era, that is, data analytics competencies.
Mapping the competencies. The first stage of the Delphi study was a literature review of contemporary skillsets. We mapped the competencies found in the described literature and combined the lists (see a schematic, high-level view of the Delphi study in Figure 1). Next, we integrated this list with a list of competencies appearing in recruitment messages for data analytics-related jobs posted on the career-oriented LinkedIn website from January 2018 through June 2018. The recruitment messages we analyzed contained phrases such as "data analytics," "data analyst," "data scientist," "data engineer," and "data mining."
The analysis was performed iteratively by date until no new competencies were found. We then manually classified the resulting list of competencies into five categories: cognitive competencies, intrapersonal competencies, interpersonal competencies, education, and technologies. We compared this new list to the original competencies list that was prepared based on the described literature, and added any new competencies found in the LinkedIn recruitment messages to the original list. The consolidated list includes 65 competencies (See Appendix at https://dl.acm.org/doi/10.1145/3467018).
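The iterate-until-nothing-new procedure described above can be sketched as a simple saturation loop. This is only an illustration: the classification in the study was manual, and the batch structure, the sample postings, the competency labels, and the rule of stopping at the first batch that yields nothing new are all assumptions made for the example. Only the search phrases come from the text.

```python
# Search phrases listed in the article.
PHRASES = ("data analytics", "data analyst", "data scientist",
           "data engineer", "data mining")


def is_relevant(text):
    """True if a posting mentions one of the search phrases."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in PHRASES)


def harvest(batches):
    """Walk date-ordered batches until one adds no new competency.

    Assumption: saturation is declared at the first batch with nothing new.
    """
    found = set()
    for _date, postings in batches:
        fresh = set()
        for text, competencies in postings:
            if is_relevant(text):
                fresh |= set(competencies) - found
        if not fresh:
            break                      # no new competencies: stop iterating
        found |= fresh
    return found


# Hypothetical postings: (title, competencies coded by a human reader).
batches = [
    ("2018-01-02", [("Data Analyst", {"critical thinking", "SQL"}),
                    ("Line cook", {"cooking"})]),
    ("2018-01-03", [("Senior data scientist", {"critical thinking", "teamwork"})]),
    ("2018-01-04", [("Data engineer", {"SQL"})]),           # nothing new
    ("2018-01-05", [("Data mining lead", {"curiosity"})]),  # never reached
]

literature = {"critical thinking", "creativity"}   # hypothetical literature list
consolidated = literature | harvest(batches)       # merge the two lists
```

A stricter stopping rule, such as requiring several consecutive batches without new items, would reduce the chance of stopping too early.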
Developing a survey of competencies. The second stage of the study was the development of a survey, which included the consolidated list and our three competency categories: cognitive, intrapersonal, and interpersonal. Each competency in the survey was accompanied by a definition based on several resources: The Oxford English Dictionary, the Merriam-Webster dictionary, and Wikipedia. These definitions are also presented in the online Appendix. The survey asked participants to classify the importance of each competency for working in data analytics, with the following possible responses: "Essential competency," "Important competency," "Less important competency," "Not an important competency," "I do not know the importance," and "I do not understand this item." The items were followed by open text questions, in which the experts were able to elaborate on their responses, as well as add their opinions and suggestions.
Looking for consensus. Following the survey development, 200 experts from industry, government and non-government organizations, and academia from around the world were invited to participate in the study. They were chosen according to predetermined criteria (see Table 1) to ensure their expertise in data analytics. The criteria included the type of employer they work for, their field of work, the international nature of their work, gender, education level, professional experience, domain expertise, and their areas of data analytics expertise, according to the CRISP-DM model phases.31 The experts were given one six-week interval and then two three-week intervals to complete each survey iteration. Three weeks separated each of the survey time frames, allowing the research team to analyze results and create the next survey. As is typical in this type of study, participation requirements increased over time, and so did attrition rates. Sixty-five experts accepted the invitation and responded to the first survey, and 35 ultimately completed the third survey.
After analyzing replies to the first survey, we used the interquartile range (IQR) method to set the consensus level according to the relatively strict level of 75% of replies.16,19 Since participant feedback on interim findings is one of the most important characteristics of the Delphi technique,35 the second survey's main goal was to reflect the opinions identified in the first survey and allow the experts to respond to these initial findings. Accordingly, the second survey was developed based on the consensus found in the first survey. It included two parts. In the first, we presented two updated definitions of competencies that were not clear in the first survey, as well as a list of 32 competencies that didn't reach the consensus level. At least 15% of the participants in the first survey were unclear about the definitions of "Executive function," "Self-regulation type 1," and "Self-regulation type 2." After consideration, we replaced these three competencies with one, "Self-regulation," and added a new competency: "Perceived self-efficacy." We asked the experts to reconsider the importance of these two competencies in addition to the 32 competencies that did not achieve consensus. Participants were asked to classify the importance of the competencies for working in data analytics, with the same possible responses from the first survey.
In the second part of this survey, we presented the experts with the 30 competencies that achieved consensus level, defined as more than 75% of the experts agreeing that a particular competency is important for data analytics, or more than 75% agreeing that it is not. We then offered the experts a chance to share their thoughts and suggestions on this list. Forty-three experts responded to the second survey, with many of them expressing their satisfaction with the results of the first iteration and the relevance of the findings (see Table 2). With the help of the experts' comments, we were able to ensure they followed instructions and responded to the questionnaire in the context of data analytics. Following the second survey, the consensus calculation was repeated. Two additional competencies reached a consensus, leading to a minor adjustment of the findings and the creation of a third survey, which described the final list of competencies that reached consensus and again asked the experts for their opinion about the list.
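The consensus rule defined above can be expressed as a short function. This is a sketch rather than the authors' actual calculation, and it rests on several assumptions that the text does not settle: that "Essential" and "Important" ratings count toward the important side, that "Less important" and "Not an important" ratings count toward the other side, that "I do not know" and "I do not understand" replies stay in the denominator, and that the 75% threshold is strict.

```python
from collections import Counter

# Assumed grouping of the survey's response options into two sides.
IMPORTANT = {"Essential competency", "Important competency"}
NOT_IMPORTANT = {"Less important competency", "Not an important competency"}


def consensus(responses, threshold=0.75):
    """Return 'important', 'not important', or None if no side exceeds the threshold."""
    counts = Counter(responses)
    total = len(responses)              # assumption: all replies are counted
    if total == 0:
        return None
    important_share = sum(counts[r] for r in IMPORTANT) / total
    not_important_share = sum(counts[r] for r in NOT_IMPORTANT) / total
    if important_share > threshold:
        return "important"
    if not_important_share > threshold:
        return "not important"
    return None


# Hypothetical replies for one competency from eight experts.
replies = (["Essential competency"] * 4
           + ["Important competency"] * 3
           + ["Less important competency"])
verdict = consensus(replies)            # 7 of 8 replies fall on the important side
```

Items for which `consensus` returns `None` are the ones that would be sent back to the panel in the next survey round.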
The results of the first and the second surveys revealed 34 competencies that achieved a consensus of at least 75%. In the first iteration, experts agreed that 28 competencies are important (see Table 3) and four are not (see Table 4). The second survey reflected the opinions revealed in the first survey. It described the competencies that reached a consensus and asked the experts to reconsider the list of competencies that did not reach a consensus. Two additional competencies reached a consensus in the second survey on top of the first one (see Table 3: Oral communication, 77%; Proactive, 81%).
The list of non-important competencies for data analytics (see Table 4) contains only four competencies, all intrapersonal (four out of 18, or 22%). The list of important competencies for data analytics (see Table 3) includes 10 out of 19 (53%) cognitive competencies, 12 out of 18 (67%) intrapersonal competencies, and eight out of 18 (44%) interpersonal competencies. Measured by both count and share, the experts most often chose intrapersonal competencies as important for data analytics, followed by cognitive competencies.
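The shares quoted above can be recomputed directly from the counts given in the text, which also serves as a consistency check against the total of 34 consensus competencies. The snippet below is only arithmetic; the variable names are illustrative.

```python
# (selected, total in category), as reported in the text.
important = {
    "cognitive": (10, 19),
    "intrapersonal": (12, 18),
    "interpersonal": (8, 18),
}
not_important = {"intrapersonal": (4, 18)}


def share(selected, total):
    """Percentage of a category's competencies, rounded to a whole number."""
    return round(100 * selected / total)


important_shares = {domain: share(*counts) for domain, counts in important.items()}
not_important_shares = {domain: share(*counts) for domain, counts in not_important.items()}

# Important plus non-important competencies that reached consensus.
total_consensus = (sum(n for n, _ in important.values())
                   + sum(n for n, _ in not_important.values()))
```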
The research team expected the list of competencies that did reach consensus to include all competencies that are part of more than one framework on which the study was based. Surprisingly, the list did not include the "empathy" or "negotiation" competencies, which are part of all three frameworks. Neither were mentioned on LinkedIn, which might explain their absence from the most important competencies list. Nor does the list include many competencies that exist in two frameworks, such as "creativity" and "flexibility." Not only do these competencies exist in two frameworks, but they were also listed on LinkedIn. This gap between experts' expectations and actual survey results emphasizes the diversity of opinions on this topic and the need for a specific data-analytics competency framework to ensure the focus is right.
One of the most surprising findings of this study is that data analytics experts did not choose "ethics" as one of the most important competencies. Nevertheless, the results did include the "integrity" competency. Ethical issues are constantly raised in new industrial, educational, and governmental data science and artificial intelligence (AI) projects,25 and though only integrity was included in the final list, it makes sense to emphasize the importance of the possibly more comprehensive competency of ethics. Researchers and organizations are facing new challenges that often lie outside their training and comfort zones.37 In addition, our results reflect the consequences of existing and past academic curricula, where the ethics course is not as important as, for instance, programming.7 Finally, they also reveal the industry's attitude toward ethics questions, as reflected in events such as Google's firing of Timnit Gebru, a former Google ethics engineer who co-authored a paper raising ethical and privacy concerns about the company.17
To explain these findings and to explore the reason why some competencies were absent from the final list, the research team analyzed the results of the three surveys according to the participants' affiliation. Affiliation was considered based on a variety of characteristics: national affiliation (place of work and research), field of work, gender, expertise level, years of expertise, managerial experience, domain of expertise, and characterization of data analytics work (CRISP-DM model phase) (Table 1). Differences between the groups were analyzed using cluster analysis and factor analysis methods. A Mann-Whitney U test was run—without correction for familywise error rate due to multiple comparisons—to determine if there were differences in the competency importance scores between the groups.
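For readers unfamiliar with the test, the Mann-Whitney U statistic compares two groups by ranking their pooled scores. The sketch below shows the textbook computation with average ranks for ties. It is not the authors' analysis code: the numeric coding of the importance ratings (1 to 4) and the two groups of scores are hypothetical, and the p-value step is omitted because it is normally delegated to a statistics library.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1      # mean of 1-based ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent samples."""
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:len(group_a)])
    u_a = rank_sum_a - len(group_a) * (len(group_a) + 1) / 2
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical importance scores (4 = essential ... 1 = not important).
academia = [4, 4, 3, 4]
industry = [2, 3, 1, 3]
u_academia, u_industry = mann_whitney_u(academia, industry)
```

A large imbalance between the two U values suggests that one group tends to give higher scores; whether the imbalance is statistically significant depends on the sample sizes and on any correction for multiple comparisons.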
After correction for familywise error rate due to multiple comparisons, the analysis did not reveal any statistically significant differences, probably due to the large number of comparisons and the relatively small number of participants. Nevertheless, the differences that were detected (presented in Table 5) are interesting and suggest directions for future quantitative research. These results indicated that experts working in academia possibly find competencies such as "critical thinking," "interpretation," "social responsibility," and "ethics" more important than experts working in the industry. Conversely, experts working in the industry possibly find competencies such as "career orientation" and "handling multiple tasks" more important than experts working in academia. When compared to male experts, female experts possibly find the competencies "information literacy" and "being a proficient learner" to be more important. Experts with international experience possibly find that "scientific literacy" is more important compared to experts without international experience, while experts without managerial experience possibly find that "grit" is more important compared to experts with managerial experience.
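The effect of a familywise correction can be illustrated with a small example. The text does not name the correction method, so the Bonferroni adjustment below is an assumption chosen for its simplicity, and the p-values are invented for illustration rather than taken from Table 5.

```python
def bonferroni(p_values, alpha=0.05):
    """Multiply each p-value by the number of tests (capped at 1.0).

    Returns (adjusted p-value, significant?) pairs in the original order.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]


# Hypothetical uncorrected p-values from four group comparisons.
raw_p = [0.01, 0.03, 0.04, 0.20]

significant_before = sum(p < 0.05 for p in raw_p)
significant_after = sum(flag for _, flag in bonferroni(raw_p))
```

With only four tests, three of the raw p-values fall below 0.05 but a single one survives the adjustment. With dozens of competencies compared across many groupings, the adjusted threshold becomes far stricter, which is consistent with the explanation given above for the lack of significant differences.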
To summarize, for most of the competencies, distributions of the competency importance scores were similar. Hence, for most of the competencies, there were no significant differences between the different groups.
The starting point of this research was several frameworks which list important general "21st-century competencies." The output of this research is a list of the most important competencies to succeed in data analytics assignments, based on a carefully confirmed consensus between data analytics experts. This consensus was strong, and no statistically significant differences were identified between the classifications of different groups of experts.
Interestingly, there was no consensus among the experts that "ethics" is one of the most important competencies for success in data analytics assignments. There was a preliminary indication that opinions on this matter differ in academia versus industry, suggesting further exploration of this issue. Scholars, funders, and regulators should seek opportunities for collaborative, integrative, and innovative approaches to ensure that big data analytics in science and industry are responsive to foundational ethical concerns.25 We also propose taking a closer look at competencies that were represented in more than one framework but still did not reach a consensus between the experts.
As concepts such as data science, AI, machine learning, deep learning, and distributed and quantum computing gradually enter the mainstream, knowledge workers who originally focused mainly on technology and programming will require a deeper set of competencies and a deeper understanding of the challenges associated with these concepts.33 Identifying the essential data analytics competencies is an important step towards overcoming this challenge. Yet, as expressed by expert #12 (Table 2), "…this is a good wish list," and there is still extensive work to be done to develop a workforce that possesses these competencies since, as suggested by expert #42 (Table 2): "…the problem with this list is that many of the competencies are not very trainable. Many are shaped by genetics and early life experience. By relying on these competencies, you are selecting for upper middle-class, well-socialized people who, by definition, are part of dominant culture that has defined the competencies," and expanding the workforce beyond this group is a key challenge for the educational system, academia, and industry.
This question of trainability should be the focus of future work, but assuming that many of these skills are trainable, we urge policymakers, academic institutions, and educators to establish programs and reassess existing curricula and materials to support the acquisition of these essential competencies by students and employees. Training programs should facilitate the systemic development, implementation, and evaluation of cognitive, intrapersonal, and interpersonal competencies. Focusing on all three categories of essential competencies will help institutions better prepare students and employees for existing and future roles as citizens, employees, managers, and educators.
This work was supported by the Open Media and Information Lab at The Open University of Israel (Grant Number 20184), and by two scholarships: one by the Ministry of Science & Technology of Israel (collaboration between academy and industry scholarship, 2020), and one by the office of The President of Israel (Scientific excellence and innovation scholarship, 2020).
1. August LinkedIn Workforce Report: Data science skills are in high demand across industries. LinkedIn (2018), https://news.linkedin.com/2018/8/linkedin-workforce-report-august-2018.
2. Basu, N. Analytics accelerates into the mainstream, Forbes Insights (2017), https://i.forbesimg.com/forbesinsights/d&b_enterprise_analytics/Analytics_Accelerates_Into_Mainstream.pdf.
3. Baumer, E.P., Xu, X., Chu, C., Guha, S., and Gay, G.K. When subjects interpret the data: Social media non-use as a case for adapting the Delphi Method to CSCW. In Proceedings of the 2017 ACM Conf. on Computer Supported Cooperative Work and Social Computing, 1527–1543.
6. Cateté, V. and Barnes, T. Application of the Delphi method in computer science principles rubric creation. In Proceedings of the 2017 ACM Conf. on Innovation and Technology in Computer Science Education, 164–169.
10. Ferguson, R., Barzilai, S., Ben-Zvi, D., Chinn, C.A., Herodotou, C., Hod, Y., Kali, Y., Kukulska-Hulme, A., Kupermintz, H., McAndrews, P., Rienties, B., Sagy, O., Scanlon, E., Sharples, M., Weller, M., and Whitelock, D. Innovating pedagogy. The Open University (2017), https://www.learntechlib.org/p/182004.
11. Flowers, A. Data scientist: A hot job that pays well. Indeed Hiring Lab (Jan. 17, 2019), https://www.hiringlab.org/2019/01/17/data-scientist-job-outlook.
17. Jo, E.S. and Gebru, T. Lessons from archives: Strategies for collecting sociocultural data in machine learning. In Proceedings of the 2020 Conf. on Fairness, Accountability, and Transparency, 306–316.
18. Jones, W., Capra, R., Diekema, A., Teevan, J., Pérez-Quiñones, M., Dinneen, J.D., and Hemminger, B. 'For telling' the present: Using the Delphi method to understand personal information management practices. In Proceedings of the 33rd Annual Conf. on Human Factors in Computing Systems (2015), 3513–3522.
20. Key findings from Gartner Marketing Analytics Survey 2018. Gartner (May 16, 2018), https://www.gartner.com/en/marketing/insights/articles/key-findings-from-gartner-marketing-analytics-survey-2018.
22. Marr, B. The 6 top data jobs in 2018. Forbes Magazine (2018), https://www.forbes.com/sites/bernardmarr/2018/05/09/the-6-top-data-jobs-in-2018/#2ce81210430d.
25. Metcalf, J., Keller, E.F., and boyd, d. Perspectives on big data, ethics, and society. Council for Big Data, Ethics, and Society (2020), https://bdes.datasociety.net/council-output/perspectives-on-big-data-ethics-and-society/.
27. The definition and selection of key competencies: Executive summary. Organization for Economic Co-operation and Development; http://www.oecd.org/pisa/35070367.pdf.
28. Pellegrino, J.W. and Hilton, M.L. Education for life and work: Developing transferable knowledge and skills in the 21st century. National Research Council Division on Behavioral and Social Sciences and Education. Washington, D.C. (2012).
29. Radovilsky, Z., Hegde, V., Acharya, A., and Uma, U. Skills requirements of business data analytics and data science jobs: A comparative analysis. J. of Supply Chain and Operations Management 16, 1 (2018), 82.
32. The O*NET-SOC taxonomy. O*NET Resource Center (2019), https://www.onetcenter.org/research.html?c=Taxonomy.
34. Topi, H., Valacich, J.S., Wright, R.T., Kaiser, K., Nunamaker Jr., J.F., Sipior, J.C., and de Vreede, G.J. IS 2010: Curriculum guidelines for undergraduate degree programs in information systems. Communications of the Association for Information Systems 26, 1 (2010), 18. https://eduglopedia.org/ais-global-is-education-report-2018-syllabus.
37. Zook, M., Barocas, S., Boyd, D., Crawford, K., Keller, E., Gangadharan, S.P., Goodman, A., Hollander, R., Koenig, B.A., Metcalf, J., Narayanan, A., Nelson, A., and Pasquale, F. Ten simple rules for responsible big data research. PLoS Computational Biology 13, 3 (2017).
More online: An online-only appendix for this article can be found at https://dl.acm.org/doi/10.1145/3467018.
Copyright held by author(s)/owner(s). Publication rights licensed to ACM.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2022 ACM, Inc.