Latin America Regional Special Section
Education

An Open Data Platform to Advance Gender Equality in STEM in Latin America

Focusing an open data platform on women in STEM gives researchers, policymakers, and decision makers access to reliable information.

Expanding the involvement of women in Science, Technology, Engineering, and Mathematics (STEM) across Latin America is crucial for economic advancement, social equity, and global competitiveness; however, these efforts have proven challenging. Women in the region are underrepresented in STEM10 and even more so in leadership positions.17,18 Current information is scarce and reliable data is hard to obtain, which makes it difficult to implement policies that reduce the gender gap in STEM. Researchers, organizations, and policymakers working to reduce the gender gap need access to dependable data to understand the root causes of gender disparities, promote evidence-based interventions, and increase accountability and transparency.

In the quest for solutions to these challenges, an international research network spanning Bolivia, Brazil, and Peru, “Equality in Leadership for Latin America STEM” (ELLAS), emerged in 2022.6 This network, formed by eight Latin American universities and one from the U.S., runs the research project entitled “Latin American Open Data for Gender Equality Policies Focusing on Leadership in STEM,” funded by the International Development Research Centre (Project ID #109798).

The project’s objective is to generate and promote the use of a cross-country comparable open data platform on gender disparity within STEM in the participating countries,13 with a focus on leadership.14 To this end, it is essential to define an architecture that can handle the complete process of data curation.

In this article, we present an innovative architecture that supports the curation of different data sources, from raw data to data consumption by individual users such as researchers, policymakers, and decision makers working on STEM and gender issues. By consolidating information into a single source, this architecture makes it easier for users to locate and access trustworthy information concerning gender policies, initiatives, and contextual factors. This contrasts with the current situation, in which such information is scattered across various formats, vocabularies, and sources.

The Open Data ELLAS Platform Architecture is composed of three layers, as presented in the accompanying figure. The data layer (from the bottom up) organizes two different types of data sources: “primary data,” which comprises mostly unstructured data in PDF format (that is, academic papers), data from social media, and data collected via a survey, and for which data fields have been identified about contextual factors, initiatives, and policies related to gender representation and leadership; and “secondary data,” which comprises semi-structured data about women in STEM in Latin America from various websites of national and international organizations.3,12,15,16 This layer relies on the collaboration of multidisciplinary teams to curate the data, ensuring its readiness for integration into the subsequent layer.

Figure. Open Data ELLAS platform architecture.

The processing layer begins with the collection of structured comma-separated values (CSV) files, which feed the ontology modeling process that represents the knowledge around policies, factors, and initiatives in three languages (Portuguese, English, and Spanish). The Protégé tool is used to model the ontology, which is created in Web Ontology Language (OWL). The next process is semantic mapping, which materializes the knowledge graph,7 where primary and secondary data structured in CSV files are instantiated into the OWL ontology and become Resource Description Framework (RDF) data through mapping technologies such as the Ontotext Refine tool. This process generates a mapping file in JavaScript Object Notation (JSON) format that can be reused to update the graph as new data is generated. These three processes form one complex pipeline orchestrated and integrated by Pentaho and Python technologies. This layer depends on the work of platform developers, such as app and ontology developers. The processing layer also includes knowledge graph integration, which involves triplification: specific knowledge graphs from different data sources come together and are stored in the GraphDB triplestore.
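To make the semantic mapping step more concrete, the following is a minimal sketch of how rows of a CSV file could be turned into RDF triples with language-tagged labels, serialized as N-Triples. It is not the project's Ontotext Refine mapping: the namespace `https://example.org/ellas/`, the column names, the class names, and the sample rows are all hypothetical and were chosen only for illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace; the real ELLAS ontology IRIs are not given in the article.
BASE = "https://example.org/ellas/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Made-up sample rows with assumed column names (id, kind, label_en, label_pt, label_es).
SAMPLE_CSV = """id,kind,label_en,label_pt,label_es
p1,Policy,Scholarship quota,Cota de bolsas,Cuota de becas
i1,Initiative,Coding club,Clube de programação,Club de programación
"""


def escape_literal(text):
    """Escape the characters that must be escaped inside an N-Triples string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_triples(row):
    """Map one CSV row to a type triple plus one label triple per available language."""
    subject = f"<{BASE}resource/{quote(row['id'])}>"
    triples = [f"{subject} <{RDF_TYPE}> <{BASE}ontology#{quote(row['kind'])}> ."]
    for lang in ("en", "pt", "es"):
        label = (row.get(f"label_{lang}") or "").strip()
        if label:
            triples.append(
                f'{subject} <{RDFS_LABEL}> "{escape_literal(label)}"@{lang} .'
            )
    return triples


def csv_to_ntriples(csv_text):
    """Convert a whole CSV document into a list of N-Triples lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        lines.extend(row_to_triples(row))
    return lines


if __name__ == "__main__":
    for line in csv_to_ntriples(SAMPLE_CSV):
        print(line)
```

Each row yields one typing triple and up to three label triples, one per language, which mirrors the trilingual modeling described above. A mapping tool adds much more (reusable mapping files, datatype handling, validation), so this should be read only as an illustration of the CSV-to-RDF idea.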

Finally, the application layer allows users to search, understand, and use data. This layer mediates access to the data through an interface aimed at end users who have no technical knowledge but are interested in gender equality in STEM. Technical users can also access the knowledge graph in GraphDB and query the data through an application programming interface (API), for example with SPARQL queries, or with a non-specific language. The development of this layer follows human-centered design approaches, such as value-sensitive design8 and feminist theories.1 All processes in the ELLAS platform utilize cloud services.
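As an illustration of the technical access path, the sketch below shows how a client might send a SPARQL query to a GraphDB repository over HTTP using only the Python standard library. The endpoint URL (a local GraphDB instance on its default port 7200 with a repository named `ellas`), the `ex:` namespace, and the `ex:Policy` class are assumptions made for the example; the article does not publish the platform's actual endpoint or vocabulary.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical repository URL; replace with a real SPARQL endpoint before running a query.
ENDPOINT = "http://localhost:7200/repositories/ellas"

# Hypothetical vocabulary: ex:Policy is an assumed class name, not the project's ontology.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <https://example.org/ellas/ontology#>
SELECT ?policy ?label WHERE {
  ?policy a ex:Policy ;
          rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol POST request with a URL-encoded 'query' parameter."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def extract_bindings(results_json):
    """Flatten a SPARQL JSON result document into a list of {variable: value} dicts."""
    parsed = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in parsed["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30):
    """Send the query and return the flattened bindings (requires a reachable endpoint)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return extract_bindings(resp.read().decode("utf-8"))
```

Calling `run_query(ENDPOINT, QUERY)` against a live repository would return one dictionary per result row. The language filter in the query reflects the trilingual labels described for the ontology; switching `"en"` to `"pt"` or `"es"` would request the Portuguese or Spanish labels instead.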

We actively engage stakeholders such as policymakers and researchers to identify requirements for our platform and participate in potential interaction scenarios via quantitative and qualitative user studies.4

Data Layer Curation

In order to have the right amount of data integrated in the processing layer, we defined a rigorous and replicable methodology for data curation which includes identifying, collecting, and organizing primary and secondary data.2 Here, we present the resulting instantiation of the data layer.

As shown in the accompanying table, for each kind of data, data sources were defined, as well as the appropriate collection techniques. Each collection of data was analyzed to select reliable and relevant data for our context. In addition, the table shows the number of instances in each data source.

All the selected data about policies,11 initiatives,9 and contextual factors5 was transformed into a knowledge graph with more than 295,000 triples by the end of 2023.

Table. Data layer curation results.

| Kind of data | Data source | Collection technique | Analyzed data |
|---|---|---|---|
| Primary data | Survey data | Survey design | 10,000+ responses |
| Primary data | Academic papers | Systematic literature review | 352 about Latin American policies, 231 about international policies, 259 about contextual factors, 775 about initiatives, 74 about women leadership in STEM |
| Primary data | Social media | Systematic gray literature review | 300+ profiles |
| Primary data | Gray literature (governmental websites, official reports, and more) | Systematic gray literature review | 26 |
| Secondary data | Open data websites | Web scraping | 8 |
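The academic-paper counts reported in the table can be tallied with a few lines of code. The sketch below sums the five topic counts and prints each topic's share. It assumes the counts can be added as topic assignments; the article does not say whether a paper can appear under more than one topic, so the total should not be read as a number of unique papers.

```python
# Counts of analyzed academic papers per topic, copied from the table.
paper_counts = {
    "Latin American policies": 352,
    "International policies": 231,
    "Contextual factors": 259,
    "Initiatives": 775,
    "Women leadership in STEM": 74,
}

# Total number of topic assignments (not necessarily unique papers; see the note above).
total_assignments = sum(paper_counts.values())

# Share of each topic relative to the total, as a fraction between 0 and 1.
shares = {topic: count / total_assignments for topic, count in paper_counts.items()}

if __name__ == "__main__":
    print(f"Total topic assignments: {total_assignments}")
    for topic, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
        print(f"{topic}: {share:.1%}")
```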

For access to the ELLAS platform and to learn more about the project, visit the ELLAS website.6

Final Remarks

In this article, we described the three-layer architecture of the open data platform and the resulting instantiation of the data layer. An open data platform focused on women in STEM, curated from different data sources, gives users such as researchers, policymakers, and decision makers access to reliable information. Once the platform is finalized and published on the ELLAS website, a significant challenge will be engaging stakeholders to use it. Although scientific contributions from the project have been disseminated in more than 30 academic papers and conference presentations,6 this outreach is insufficient. Hence, we have initiated efforts to secure public endorsements from interested groups such as universities and international organizations. This strategy aims to raise awareness of the platform and encourage its use. Ultimately, use of the platform has the potential to promote informed decision-making, transparency, and active public engagement in the development of gender equality policies for leadership in STEM. While this project began with three countries in Latin America, our aim is to expand to other countries in the region.

    • 1. Bardzell, S. and Bardzell, J. Towards a feminist HCI methodology: Social science, feminism, and HCI. In Proceedings of the SIGCHI Conf. Human Factors in Computing Systems. ACM, New York, NY, USA, 675–684; 10.1145/1978942.1979041
    • 2. Berardi, R. et al. ELLAS: Uma plataforma de dados abertos com foco em lideranças femininas em STEM no contexto da América Latina. Anais do XVII Women in Information Technology. Sociedade Brasileira de Computação, 2023, 124–135; https://sol.sbc.org.br/index.php/wit/article/view/25016
    • 3. CEUB. Comité Ejecutivo de la Universidad Boliviana, 2023; https://ceub.edu.bo/
    • 4. Creswell, J. and Creswell, J.D. Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. Sage Publications, 2022; https://us.sagepub.com/en-us/nam/research-design/book270550
    • 5. Drummond, B. et al. Mapping Contextual Aspects that Influences Women in Computing in Latin America. Interfases 018, (2023), 19–30; 10.26439/interfases2023.n018.6610
    • 6. ELLAS NETWORK. Equality in Leadership for Latin America STEM, 2023; https://ellas.ufmt.br/
    • 7. Fensel, D. et al. Introduction: What is a knowledge graph? Knowledge Graphs: Methodology, Tools and Selected Use Cases, 2020, 1–10
    • 8. Friedman, B., Kahn, P., Borning, A., and Huldtgren, A. Value sensitive design and information systems. Early engagement and new technologies: Opening up the laboratory. Philosophy of Engineering and Technology 16, 2013. Doorn, N., Schuurbiers, D., van de Poel, I., and Gorman, M. (Eds.). Springer, Dordrecht; 10.1007/978-94-007-7844-3_4
    • 9. Frigo, L.B. et al. Mapping women STEM initiatives in Latin American countries: Bolivia, Brazil, and Peru. Information Technology and Systems. Rocha, A., Ferrás, C., Hochstetter Diez, J., and Diéguez Rebolledo, M. (Eds.). Springer Nature Switzerland, Cham, 2024, 401–409; 10.1007/978-3-031-54256-5_38
    • 10. Guzman, I.R. et al. Gender gap in IT in Latin America. AMCIS 2020 Proceedings; https://aisel.aisnet.org/amcis2020/panels/panels/4
    • 11. Guzman, I.R. et al. Gender equality policies in STEM in Latin America—A systematic literature review. Information Technology and Systems. Rocha, A., Ferrás, C., Hochstetter Diez, J., and Diéguez Rebolledo, M. (Eds.). Springer Nature Switzerland, Cham, 2024, 410–419; https://link.springer.com/chapter/10.1007/978-3-031-54256-5_39
    • 12. INEP. Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira, 2023; www.inep.gov.br
    • 13. Keserű, J. and Kin-Sing Chan, J. The social impact of open data. In Proceedings of the 3rd Intern. Open Data Conf. (Ottawa, Canada, May 28–29, 2015); https://www.researchgate.net/publication/298646716_The_Social_Impact_of_Open_Data
    • 14. Maciel, C. et al. Open data platform to promote gender equality policies in STEM. In Proceedings of the Western Decision Sciences Institute (Portland, OR, USA, Apr. 2023); https://wdsinet.org/Annual_Meetings/2023_Proceedings/papers/198..pdf
    • 15. SIES. Sistema de Información de Educación Superior, 2023; https://www.gob.pe/minedu
    • 16. UNESCO. Core Data Portal, 2023; https://core.unesco.org/en/home
    • 17. Wang, M.-T. and Degol, J.L. Gender gap in science, technology, engineering, and mathematics (STEM): Current knowledge, implications for practice, policy, and future directions. Educ Psychol Rev 29, (2017), 119–140; 10.1007/s10648-015-9355-x
    • 18. World Economic Forum. Global Gender Gap Report 2023; https://www.weforum.org/publications/global-gender-gap-report-2023
