Validity and Reliability in Data Science: An Interdisciplinary Perspective

In the context of research, the terms validity and reliability refer to the level of accuracy and truthfulness of the data collection tools, the data analysis, and the findings (Brink, 1993). The main concern of validity is whether the research tools indeed measure what they are intended to measure, and whether the data analysis results and findings represent the real world from which the data were collected. The main concern of reliability is whether the research tool measurements, the data analysis results, and the research findings are consistent, that is, whether they persist when the measurement or analysis is repeated.

Validity, reliability, and data science

Interdisciplinarity is one of the characteristics of data science whose implementation in data science education we study. Specifically, we examine, from the educational perspective, the essence of the components of data science (i.e., statistics, computer science, and the application domain), as well as their interrelations. As part of this examination, we realized that each component interprets the terms validity and reliability differently, as presented in Table 1 and explained below.

Table 1: Validity and reliability in different disciplines and research paradigms

Discipline	Validity	Reliability
Statistics	Bias	Variance
Computer science (Machine learning)	Training error	Test error
Qualitative research	Internal validity	External validity

Statistics: In statistical modeling, the terms bias and variance measure the accuracy of an estimator. Bias is the difference between the expected value of the estimator and the true value of the estimated parameter, and so it may be seen as analogous to validity. Variance measures the spread of the estimator's values around its expected value, and so it can be seen as analogous to reliability.
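To make the bias and variance analogy concrete, the following is a minimal simulation sketch, not part of the original discussion. It compares two estimators of a population variance: one that divides by n and one that divides by n - 1. The Gaussian population, its standard deviation of 2, the sample size of 10, and the helper name simulate_estimator are all illustrative assumptions.

```python
import random
import statistics


def simulate_estimator(estimator, true_value, sample_size=10, trials=10000, seed=0):
    """Empirically approximate the bias and variance of `estimator`.

    Assumption (illustrative only): samples are drawn from a Gaussian
    population with mean 0 and standard deviation 2.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 2.0) for _ in range(sample_size)]
        estimates.append(estimator(sample))
    mean_estimate = statistics.fmean(estimates)
    # Bias: expected value of the estimator minus the true parameter value.
    bias = mean_estimate - true_value
    # Variance: spread of the estimates around their own mean.
    variance = statistics.pvariance(estimates, mu=mean_estimate)
    return bias, variance


TRUE_VARIANCE = 4.0  # population variance for a standard deviation of 2

# statistics.pvariance divides by n; statistics.variance divides by n - 1.
bias_biased, var_biased = simulate_estimator(statistics.pvariance, TRUE_VARIANCE)
bias_unbiased, var_unbiased = simulate_estimator(statistics.variance, TRUE_VARIANCE)

print(f"divide by n:     bias={bias_biased:.3f}, variance={var_biased:.3f}")
print(f"divide by n - 1: bias={bias_unbiased:.3f}, variance={var_unbiased:.3f}")
```

Under these assumptions, the estimator that divides by n should show a clearly negative bias, while the estimator that divides by n - 1 should show a bias close to zero. The first estimator also has the smaller variance of the two, which illustrates that validity and reliability, in this analogy, are separate properties that do not have to improve together.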

Computer science: In machine learning, which was developed mainly in the context of computer science, the terms training error and test error refer to the prediction error of the model constructed by a machine-learning algorithm. The training error is the prediction error on the samples that the algorithm was trained on, and it reflects how accurately the model represents the training data. Because the training data are labeled (that is, known), the training error can be viewed as analogous to the validity of the learned model. The test error is the prediction error on samples that the model was not trained on. It indicates whether the model's accuracy persists on new data, and so it can be viewed as analogous to research reliability.
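The distinction between training error and test error can be sketched with a small example. The code below is an illustration under stated assumptions rather than a method taken from the discussion above: it uses a one-nearest-neighbor regressor, synthetic data generated from a sine curve with Gaussian noise, and hypothetical helper names (make_data, predict_1nn, mean_squared_error).

```python
import math
import random


def make_data(n, rng):
    """Generate n noisy samples of y = sin(x) (synthetic, illustrative data)."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def predict_1nn(x, train_xs, train_ys):
    """Predict the label of the single closest training sample."""
    nearest = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
    return train_ys[nearest]


def mean_squared_error(xs, ys, train_xs, train_ys):
    """Mean squared prediction error of the 1-NN model on (xs, ys)."""
    errors = [(predict_1nn(x, train_xs, train_ys) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


rng = random.Random(0)
train_xs, train_ys = make_data(200, rng)
test_xs, test_ys = make_data(200, rng)  # samples the model never sees in training

# Error on the samples the model was built from.
training_error = mean_squared_error(train_xs, train_ys, train_xs, train_ys)
# Error on unseen samples drawn from the same process.
test_error = mean_squared_error(test_xs, test_ys, train_xs, train_ys)

print(f"training error: {training_error:.4f}")
print(f"test error:     {test_error:.4f}")
```

A one-nearest-neighbor model memorizes its training samples, so its training error is zero as long as the training inputs are distinct, whereas its test error stays well above zero because of the noise in the labels. The example was chosen because it makes the gap between the two errors easy to see; other models would show a smaller or larger gap.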

Application domain: Data science is relevant for a variety of application domains and research paradigms, including qualitative research, in the context of which we examine the interpretation of validity and reliability. In such research, validity is based on the diversity of data collection tools and on the number and diversity of the research populations and research fields. Research reliability is based on the number of research cycles carried out during the research period to check the persistence of what is being measured and to fine-tune it if needed. In some cases, findings derived from qualitative data analysis are followed by a quantitative examination of the results. Although different terms are sometimes used to interpret the scientific merit of qualitative research (e.g., credibility, trustworthiness, truth, value, applicability, consistency, and confirmability (Brink, 1993)), we use the more common terms internal validity and external validity (Denzin, 2017) to determine research quality (both quantitative and qualitative). Specifically, in the context of qualitative research, internal validity refers to the extent to which research findings represent reality, and so this measure is similar to validity. External validity refers to the degree to which the representation of reality is applicable across groups, so it is similar to reliability.

Validity and reliability from the data science education perspective

This blog post further highlights the interdisciplinarity of data science and the diversity of approaches that contributed to its creation as an interdisciplinary field. We believe that the interdisciplinarity of data science should be highlighted as much as possible in any data science education framework. We also believe that such an approach, which increases awareness of the interdisciplinarity of data science, has the potential to draw learners' attention to ethical considerations while they perform different activities related to the development and use of data science. In more advanced frameworks of data science education, the interdisciplinary perspective on validity and reliability presented in this blog post can be discussed further to increase learners' awareness of the wide implications of the interdisciplinary facet of data science.

References

Brink, H. I. (1993). Validity and reliability in qualitative research. Curationis, 16(2), 35–38.

Denzin, N. K. (2017). The research act: A theoretical introduction to sociological methods. Routledge.

Koby Mike is a Ph.D. student at the Technion's Department of Education in Science and Technology; his research focuses on data science education. Orit Hazzan is a professor in the Technion's Department of Education in Science and Technology; her research focuses on computer science, software engineering, and data science education.