Education

An Open Data Platform to Advance Gender Equality in STEM in Latin America

Focusing an open data platform on women in STEM gives researchers, policymakers, and decision makers access to reliable information.

By Cristiano Maciel, Indira R. Guzman, Rita Cristina Galarraga Berardi, Nadia Rodriguez-Rodriguez, Luciana Salgado, Luciana Bolan Frigo, Boris Branisa, and Elizabeth Jiménez

Posted Jul 16 2024


Expanding the participation of women in Science, Technology, Engineering, and Mathematics (STEM) across Latin America is crucial for economic advancement, social equity, and global competitiveness, yet these efforts have proven challenging. Women in the region are underrepresented in STEM¹⁰ and even more so in leadership positions.¹⁷,¹⁸ Up-to-date information is scarce and reliable data is hard to obtain, which makes it difficult to implement policies that reduce the gender gap in STEM. Researchers, organizations, and policymakers working to reduce the gap need access to dependable data to understand the root causes of gender disparities, promote evidence-based interventions, and increase accountability and transparency.

In the quest for solutions to these challenges, an international research network spanning Bolivia, Brazil, and Peru, “Equality in Leadership for Latin America STEM” (ELLAS), emerged in 2022.⁶ This network, formed by eight Latin American universities and one from the U.S., runs the research project “Latin American Open Data for Gender Equality Policies Focusing on Leadership in STEM,” funded by the International Development Research Centre (Project ID #109798).ᵃ

The project’s objective is to build, and promote the use of, an open data platform that allows cross-country comparison of gender disparity within STEM in the participating countries,¹³ with a focus on leadership.¹⁴ To that end, it is essential to define an architecture that can handle the complete data curation process.

In this article, we present an innovative architecture that supports the curation of different data sources, from raw data through to consumption by individual users such as researchers, policymakers, and decision makers working on STEM and gender issues. Today, information about gender policies, initiatives, and contextual factors is scattered across various formats, vocabularies, and sources. The architecture consolidates it into a single source, making trustworthy information easier for users to locate and access.

The Open Data ELLAS Platform Architecture is composed of three layers, as presented in the accompanying figure. The data layer (the bottom layer) organizes two types of data sources. “Primary data” comprises mostly unstructured data: PDF documents (that is, academic papers), data from social media, and data collected via a survey. For these sources, data fields have been identified for contextual factors, initiatives, and policies related to gender representation and leadership. “Secondary data” comprises semi-structured data about women in STEM in Latin America from various websites of national and international organizations.³,¹²,¹⁵,¹⁶ This layer relies on the collaboration of multidisciplinary teams to curate the data, ensuring its readiness for integration into the subsequent layer.

The processing layer begins with the collection of structured comma-separated values (CSV) files, which feed the ontology modeling process. The ontology represents knowledge about policies, factors, and initiatives in three languages (Portuguese, English, and Spanish); it is modeled with the Protégé tool and expressed in the Web Ontology Language (OWL). The next process, semantic mapping, materializes the knowledge graph:⁷ primary and secondary data structured in CSV files are instantiated into the OWL ontology and become Resource Description Framework (RDF) data through mapping technologies such as the Ontotext Refine tool. This process generates a mapping file in JavaScript Object Notation (JSON) format that can be reused to update the graph as new data is generated. These three processes form one complex pipeline, orchestrated and integrated with Pentaho and Python. The processing layer also includes knowledge graph integration, which involves triplification: specific knowledge graphs from different data sources are brought together and stored in the GraphDB triplestore. This layer depends on the work of platform developers, such as app and ontology developers.
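As a rough illustration of what semantic mapping produces, the sketch below turns rows of a curated CSV file into RDF triples serialized as N-Triples, using only the Python standard library. It is not the ELLAS pipeline, which relies on Ontotext Refine, Pentaho, and Python as described above. The namespace `http://example.org/ellas#`, the class and property names, and the sample rows are all hypothetical.

```python
import csv
import io

# Hypothetical namespace; the real ELLAS ontology IRIs are not given in the article.
BASE = "http://example.org/ellas#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical curated CSV: one initiative per row (placeholder values).
SAMPLE_CSV = """id,label,lang,country
init-001,Example initiative A,en,Brazil
init-002,Iniciativa de ejemplo B,es,Peru
"""


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def row_to_triples(row: dict) -> list[str]:
    """Map one CSV row to N-Triples lines (an illustrative mapping only)."""
    subject = f"<{BASE}{row['id']}>"
    return [
        # The row becomes an instance of a (hypothetical) ontology class.
        f"{subject} <{RDF_TYPE}> <{BASE}Initiative> .",
        # Language-tagged label, in the spirit of the multilingual ontology.
        f'{subject} <{RDFS_LABEL}> "{escape_literal(row["label"])}"@{row["lang"]} .',
        f'{subject} <{BASE}country> "{escape_literal(row["country"])}" .',
    ]


def csv_to_ntriples(csv_text: str) -> list[str]:
    """Read CSV text and return every generated triple as a list of lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        triples.extend(row_to_triples(row))
    return triples


triples = csv_to_ntriples(SAMPLE_CSV)
```

Each CSV row yields three triples here, and the resulting lines could be loaded into any triplestore that accepts N-Triples. A reusable mapping file, as the article describes, plays the role that the hard-coded `row_to_triples` function plays in this sketch.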

Finally, the application layer allows users to search, understand, and use the data. It mediates access to the data through an interface designed for end users who have no technical background but are interested in gender equality in STEM. Technical users can also access the knowledge graph in GraphDB directly and query it through an application programming interface (API), for example with SPARQL queries, or with a non-specific language. The development of this layer follows human-centered design approaches, such as value-sensitive design⁸ and feminist theories.¹ All processes in the ELLAS platform use cloud services.
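For technical users, a query against the knowledge graph might look like the sketch below. It only builds the HTTP request for a SPARQL query and does not send it. The host, the repository name `ellas`, and the `ex:` vocabulary are hypothetical placeholders. The `/repositories/<id>` path follows the RDF4J-style REST convention that GraphDB exposes, but the actual ELLAS endpoint should be taken from the project website.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and repository name; not taken from the article.
ENDPOINT = "http://localhost:7200/repositories/ellas"

# Hypothetical vocabulary (ex:Initiative, ex:country) used for illustration.
QUERY = """
PREFIX ex: <http://example.org/ellas#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?initiative ?label WHERE {
  ?initiative a ex:Initiative ;
              rdfs:label ?label ;
              ex:country "Peru" .
  FILTER (lang(?label) = "es")
}
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a GET request that asks for SPARQL JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
# To run it against a live endpoint: urllib.request.urlopen(request)
```

The query asks for up to ten initiatives recorded for one country, with Spanish-language labels, which is the kind of cross-country, multilingual lookup the platform is meant to support.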

We actively engage stakeholders such as policymakers and researchers, through quantitative and qualitative user studies, to identify requirements for our platform and to take part in potential interaction scenarios.⁴

Data Layer Curation

To integrate an adequate volume of data into the processing layer, we defined a rigorous and replicable data curation methodology, which includes identifying, collecting, and organizing primary and secondary data.² Here, we present the resulting instantiation of the data layer.

As shown in the accompanying table, data sources and appropriate collection techniques were defined for each kind of data. Each collection was analyzed to select data that is reliable and relevant to our context. The table also shows the number of instances in each data source.

All the selected data about policies,¹¹ initiatives,⁹ and contextual factors⁵ had been transformed into a knowledge graph with more than 295,000 triples by the end of 2023.

Table. Data Layer Curation Results

Kind of data	Data source	Collection technique	Analyzed data (number of instances)
Primary Data	Survey Data	Survey Design	10,000+ responses
	Academic Papers	Systematic Literature Review	352 about Latin American policies, 231 about international policies, 259 about contextual factors, 775 about initiatives, 74 about women leadership in STEM
	Social Media	Systematic Gray Literature Review	300+ profiles
	Gray literature (Governmental websites, official reports, and more)	Systematic Gray Literature Review	26
Secondary Data	Open Data websites	Web scraping	8

For access to the ELLAS platform and to learn more about the project, visit the ELLAS website.⁶

Final Remarks

In this article, we described the three-layer architecture of the open data platform and the resulting instantiation of the data layer. An open data platform focused on women in STEM, curated from different data sources, gives users such as researchers, policymakers, and decision makers access to reliable information. Once the platform is finalized and published on the ELLAS website, a significant challenge will be engaging stakeholders to use it. Scientific contributions from the project have been disseminated in more than 30 academic papers and conference presentations,⁶ but this outreach is insufficient. We have therefore begun seeking public endorsements from interested groups such as universities and international organizations, with the aim of raising awareness of the platform and encouraging its use. Ultimately, the platform has the potential to promote informed decision-making, transparency, and active public engagement in the development of gender equality policies for leadership in STEM. Although the project began with three countries in Latin America, our aim is to expand to other countries in the region.

DOI: 10.1145/3653294

August 2024 Issue, Vol. 67 No. 8, Pages: 90-92
