Quality attributes of software systems,13 also known as system qualities, such as performance, security,2 and scalability, continue to grow in importance in industrial practice. Evaluating quality attributes is critical to software development, since optimizing a software system's core attributes can provide a marketing advantage and set a product apart from its competitors. Many studies of unsuccessful development projects report that a lack of quality attribute evaluation is often a contributing factor in project failure.12,15 Therefore, continuous quality attribute evaluation throughout the development process is needed to ensure that customers' expectations and demands6 are met.
Manual evaluation of software attributes is common in many software development companies, but it has proven to be insufficient in meeting the demands of rapid releases and high-quality expectations from customers.7 Automated practices have therefore gained widespread popularity as a solution to enhance efficiency,15 reduce costs,8 and increase accuracy15 compared to manual evaluation.
One way to automate the evaluation is using continuous integration (CI)17 environments. The CI environment provides several benefits, such as fast feedback on code quality, early detection of quality defects, and visualization of system quality trends.18 As such, these environments inherently offer organizations the opportunity to continuously monitor the quality of their software systems.19 However, an immature automation process can result in negative outcomes,19 such as cost and schedule overruns,8 slow feedback loops, and delayed releases.8
To improve the evaluation process,8 prior studies have investigated different key areas, including knowledge,20 processes,7 tools,12 and metrics.3 While leveraging these areas can have a positive impact on quality evaluation, to the best of our knowledge, there is a lack of frameworks that link CI environment knowledge, metrics, and evolution together.
In this article, we aim to fill this gap by presenting the state of practice of using CI environments for the evaluation of quality attributes. This is achieved through an industrial study at four partner companies. The study results show that metrics acquired from CI components have a positive effect on evaluating quality requirements. Based on an analysis of these results, we propose a model with guidelines that organizations can use to mature their existing CI environments for quality improvements.
As such, we claim the following contributions of this study:
A generic model of how CI environments contribute to quality attribute evaluation.
Empirical evidence that demonstrates how CI components can be used to produce data supporting the evaluation of quality attributes with metrics.
A model, derived from the study results, which provides decision support to evolve software quality evaluation through CI environments over time.
Study Design
This study aims to gain insights into how CI environments are utilized to evaluate quality attributes in industry, specifically investigating which attributes are commonly evaluated, the contributions of CI environments, and how they can be evolved to suit different evaluation needs.
Case companies: Our case companies are Qvantel Sweden, Ericsson, Fortnox, and Company Alpha.a Ericsson is a multinational company that produces software and services for telecommunications. The CI environment steers practices for global teams and provides a shared platform with processes and practices for automated verification. Qvantel Sweden is a fast-growing company that provides cloud-native business support services. The CI environment enables faster time to market and the flexibility to adapt quality attribute evaluation to changing business needs. Fortnox offers a cloud-based platform that helps small businesses and accounting agencies manage their finances efficiently. The CI environment helps developers catch security issues earlier through automated CI jobs. Company Alpha is a Web service provider. The project studied at this company is a legacy system that serves businesses using insurance services through Web interfaces. The CI environment increases transparency between software development teams and covers automated performance evaluation.
Research methodology: We designed a qualitative study based on semi-structured interviews. We selected five industrial projects from the case companies; the selection criteria included business domain, number of employees, and type of CI environment, aiming to make the study representative of as many software development companies as possible. The business domains cover telecommunications systems, financial services, websites or mobile applications, and customer business support. The projects varied in the number of employees, ranging from 50 to 2,500. Five types of CI environments were adopted in these projects to support continuous quality attribute evaluation.
We adopted convenience sampling,21 selecting participants from the projects based on their availability and willingness to participate in this study. Our final sample consisted of 22 participants with varying backgrounds in software engineering roles, including managers, software architects, product owners, developers, and testers. However, the distribution across these roles depended on convenience and availability rather than controlled selection. At the time of this study, 86% of selected participants had more than five years of work experience, which could be an indicator of the participants’ ability to answer our interview questions.
Data was collected from 22 interviews that were carried out with the selected participants. A questionnaire was used as a guide for the interviews, since they had a broad scope, including questions about detailed aspects of CI components, tools, and quality attributes. The evidence was gathered primarily from three of the interview questions:
What does the existing CI environment look like in your project, based on your perception?
How do you use CI environments for quality attribute evaluation?
Do you see any challenges in the current processes, techniques, tools, or practices for the evaluation?
Additionally, participants provided related information as part of their responses to other questions, which were also included in the analysis.
Data analysis followed a four-step thematic analysis approach:
Step 1: We coded descriptive information, for example the interview transcripts, using thematic coding.4
Step 2: We grouped codes into different categories based on semantic equivalence or a shared contextual relationship.
Step 3: We synthesized the categories into higher-level themes.4
Step 4: We drew conclusions using the synthesized themes. The results were validated through member checking with participants.
How Can CI Environments Contribute to Quality Attribute Evaluation?
To adequately use any technology or practice, it is essential to have a base framework of understanding.20 Figure 1 presents such a framework, connecting the concept of practitioner knowledge to metrics and data for quality attribute evaluation.
Participants in this study highlighted that CI environments contain various data produced from different sources, such as source code, CI tools, and artifacts (see Figure 1). This data can be extracted through CI jobs and used for continuous evaluation of quality attributes. For instance, an automated vulnerability assessment can be performed by a CI server on source code changes, resulting in measurements of the number of detected vulnerabilities, which in turn supports security evaluation. A deeper understanding of the system's attributes, supported by the CI process, can help quality managers make more informed decisions regarding release candidates, which in turn strengthens decision making from a security perspective. Additionally, leveraging CI data to inform software release decisions has the potential to optimize the efficiency of the quality attribute evaluation process at the organizational level. Consequently, this can build up an organization's experience in collecting data from CI environments. Figure 1 provides a visual representation of this entire process.
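To make this concrete, the following is a minimal sketch of such a CI job step in Python, assuming a hypothetical scanner that writes its findings as a JSON list with a severity field; the report format, field names, and the high-severity gating rule are illustrative assumptions rather than any specific tool's output.

```python
import json
import sys

# Minimal sketch of a CI job step that turns a scanner report into a metric.
# The report format (a JSON list of findings with a "severity" field) is a
# hypothetical example, not the output of any specific tool.

def count_vulnerabilities(report_path: str) -> dict:
    """Aggregate the number of detected vulnerabilities per severity."""
    with open(report_path) as f:
        findings = json.load(f)
    counts = {}
    for finding in findings:
        severity = finding.get("severity", "unknown")
        counts[severity] = counts.get(severity, 0) + 1
    counts["total"] = len(findings)
    return counts

if __name__ == "__main__":
    metrics = count_vulnerabilities(sys.argv[1])
    print(metrics)  # e.g. {'high': 2, 'low': 5, 'total': 7}
    # Fail the CI job if any high-severity vulnerability is detected,
    # so the release candidate is blocked until it is addressed.
    sys.exit(1 if metrics.get("high", 0) > 0 else 0)
```

The printed counts can then be stored per build, giving the trend data that supports security evaluation over time.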
Some 73% of the participants in our study reported that they added new tools to generate more data, even though much of the data could have been extracted from their existing environments, albeit with additional effort, for example, by using scripts. This highlights the importance for practitioners of being aware of their environments' potential for data generation; otherwise, they risk implementing more and more tools over time, along with new data extraction mechanisms that overlap with existing ones.
Our study also found that knowledge of the environments' capabilities was beneficial when making decisions about improving the environment. For example, to improve the detection of security vulnerabilities, 77% of participants mentioned that static code analysis can be adopted as part of their CI process.18 However, this requires knowledge of how to add such a CI component to the environment and how to effectively use the newly generated data to draw valid conclusions. Therefore, there is a need for general knowledge about CI environments, their components' capabilities, how to instantiate them using tools, what data can be acquired from the tools, and how to use the data effectively.
In summary, 86% of the participants in our study reported that data produced by CI environments could be used to support quality attribute evaluation. To do so effectively, knowledge is required on multiple levels of abstraction including CI tools, data, artifacts, metrics, and experience.
What is Current State-of-Practice for Using CI Environments for Quality Attribute Evaluation?
We explored the practices of how organizations use CI components for quality attribute evaluation in the selected companies.
Study participants utilized a diverse set of metrics to evaluate Maintainability, Security, Performance, Reliability, Scalability, and Traceability during software development, testing, and release phases. For example, Performance was evaluated by collecting metric data, such as “response time” and “CPU/memory resource usage,” from CI components. This form of evaluation through CI-generated data supports a more automated process, which is less error-prone and time-consuming10 than the manual collection of similar metrics.
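As an illustration of how such metric data can be collected, the following is a minimal sketch in Python, assuming the psutil library is available in the test environment; the sampling interval, duration, and CSV output format are illustrative choices rather than the setup used by the studied companies.

```python
import csv
import time

import psutil  # assumed to be available in the test environment

# Minimal sketch: sample CPU and memory usage while a performance test runs
# and write the samples to a CSV file that a CI job can archive or parse.

def sample_resource_usage(duration_s: int, interval_s: float, out_path: str) -> None:
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "cpu_percent", "memory_percent"])
        end = time.time() + duration_s
        while time.time() < end:
            writer.writerow([
                time.time(),
                psutil.cpu_percent(interval=interval_s),  # blocks for interval_s
                psutil.virtual_memory().percent,
            ])

if __name__ == "__main__":
    sample_resource_usage(duration_s=60, interval_s=1.0, out_path="resource_usage.csv")
```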
As shown in the accompanying table, we identified seven quality attributes and 10 metrics, together with their associated data and data sources. References to relevant literature that defines the identified metrics are also provided. To support the validity of our results through data triangulation, we have listed the percentage of study participants who provided information for each metric.
To gain insights into the relationship between metrics and CI components, we focused on gathering base metrics, which are the generic data types generated by the CI components, and their associated data sources. The rationale for focusing on base metrics is that derived metrics can be calculated by combining base metrics (see the metric definitions in Table 1). For example, the metric "lines of code over time" can be referred to as "code churn."3 As shown in Figure 2, we identified 14 base metrics.
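To illustrate this relationship, the following is a minimal sketch of deriving "code churn" from base data produced by the source code management component, assuming a Git repository; the time window and the churn definition used here (lines added plus lines deleted) are simplifying assumptions for illustration.

```python
import subprocess

# Minimal sketch of deriving a metric from base metric data: "code churn" is
# computed here as lines added plus lines deleted over a time period, using
# data from the source code management component (Git). The time period and
# repository path are illustrative parameters.

def code_churn(repo_path: str, since: str = "1 week ago") -> int:
    """Return the total number of added and deleted lines since `since`."""
    output = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}", "--numstat", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    churn = 0
    for line in output.splitlines():
        # --numstat lines look like "<added>\t<deleted>\t<file>"; binary files use "-".
        parts = line.split("\t")
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            churn += int(parts[0]) + int(parts[1])
    return churn

if __name__ == "__main__":
    print(code_churn(".", since="1 week ago"))
```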
| Quality attribute | Metric(s) | Metric data | Data source – CI component | Participants |
|---|---|---|---|---|
| Maintainability | Percentage of test coverage1 | Lines of code | Source code management; static code analysis | 50% |
| | | Lines of code covered by tests | CI server | |
| | Average cyclomatic complexity of source files11 | Cyclomatic complexity number | Static code analysis | 55% |
| | | Number of source files | Source code management | |
| Security | Total number of vulnerabilities2 | Number of vulnerabilities | CI server | 77% |
| | | Vulnerability tickets | Issue tracking | |
| | Average number of security defects in source code14 | Lines of code | Static code analysis | 59% |
| | | Number of defects | Static code analysis | |
| Performance | Mean response time (MRT):9 MRT = (A1 + A2 + … + An)/n, where Ai is the response time of a request and n is the number of requests | Total response time of a service API request | CI server | 82% |
| | | Number of service requests | Test environment | |
| | Average CPU/memory usage18 | CPU/memory usage; timestamp | Test environment | 73% |
| Reliability | Rate of failure for building release candidates3 | Number of failed release versions | CI server | 18% |
| | | Total number of release versions | Artifacts management | |
| Scalability | Mean recovery time (MRET):9 MRET = (A1 + A2 + … + An)/n, where Ai is the time to recover from a service failure and n is the number of failures | Total time spent fixing failed services | CI server | 41% |
| | | Number of service failures | Test environment | |
| | | Mean recovery time | Cloud platform | |
| Stability | Average transactions per second5 | Number of transactions | Test environment | 14% |
| Traceability | Requirement traceability percentage16 | Number of tickets | Issue tracking | 9% |
| | | Total number of tickets | Source code management | |
The data sources used to collect these metrics, as reported by most participants, reflect three main types: built-in interfaces, test outputs, and command-line interfaces. Built-in interfaces, in this case, refer to APIs or other technical interfaces in the tools used for a specific CI component, which enable data to be extracted directly from the tool for further analysis. Test outputs refer to log files or other artifacts that generally require parsing before further analysis. Finally, command-line interfaces refer to text-based user interfaces that enable developers to execute automated commands and scripts from a terminal or command prompt to extract information.
For example, one company in our study utilized nightly CI jobs to continuously evaluate system performance for each release candidate, and the job took approximately four hours. Through this job, developers were able to monitor and compare the system performance (for example, the mean response time of a service request) against code changes (for example, the number of commits) between release candidates. To extract the necessary metrics, they utilized JMeter'sb built-in interfaces to collect performance results through REST APIs, and Gitc command-line interfaces, specifically the git log --since=<time-period> --oneline command, to get the number of commits during a specific period. When the performance output in a CI job showed a significant drop, developers were able to either revert their code commits or perform quality analysis to determine the cause of the issue. This approach enabled development teams to identify and address quality issues in a timely and efficient manner.
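As a minimal sketch of these two extraction styles, the Python snippet below counts commits through the Git command-line interface and computes the mean response time from a JMeter results file, assuming the results are exported as a CSV-format JTL file containing an elapsed column (response time in milliseconds); the file paths and time window are illustrative, not the studied company's configuration.

```python
import csv
import subprocess

# Minimal sketch of two extraction styles: a command-line interface (git log)
# to count commits, and a test output file (a JMeter JTL results file in CSV
# format, assumed to contain an "elapsed" column in milliseconds) to compute
# the mean response time (MRT).

def commit_count(repo_path: str, since: str) -> int:
    log = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}", "--oneline"],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(log.splitlines())

def mean_response_time(jtl_csv_path: str) -> float:
    with open(jtl_csv_path, newline="") as f:
        elapsed = [int(row["elapsed"]) for row in csv.DictReader(f)]
    return sum(elapsed) / len(elapsed) if elapsed else 0.0

if __name__ == "__main__":
    print("commits:", commit_count(".", since="1 day ago"))
    print("MRT (ms):", mean_response_time("results.jtl"))
```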
After understanding which base metrics were available and which data sources could be used to acquire them, we also looked at the use of derived metrics, using a practical example taken from one of the companies in our study as a demonstrator; the example is visualized in Figure 3.
In the example, two types of CI components that produce metric data are introduced. In the first example, the SCM component, instantiated with the tool Gerrit, utilizes a metric exporter plugin (for example, JMX) to collect data (for example, lines of code and source files) and transmits the data through its built-in interface to a data analyzer tool (for example, Prometheus), which is then used to visualize the results in dashboards (for example, in Grafana). In the second example, the test environment (TE) component (for example, tests run in a virtual machine) generates test results in files and logs that are parsed by an in-house developed solution (for example, a parser) to extract data in a format that the data analyzer tool (for example, Prometheus) can read, after which it is visualized in the dashboard (for example, in Grafana). These examples highlight that some CI components can take advantage of existing metric exporter tools for data collection, while others may require additional development effort, for example, specialized data extraction scripts.
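The following is a minimal sketch of such an in-house parser in Python, assuming the prometheus_client library and a hypothetical log line format; the metric name and the textfile-based export are illustrative choices, not the studied company's actual implementation.

```python
import re

from prometheus_client import CollectorRegistry, Gauge, write_to_textfile

# Minimal sketch of an in-house parser: read a test environment log, extract a
# metric value, and write it in the Prometheus text exposition format so the
# data analyzer can pick it up (for example, via a textfile collector).
# The log line format and metric name are hypothetical examples.

def export_mean_response_time(log_path: str, out_path: str) -> None:
    # Assume log lines such as: "request /api/orders completed in 123 ms"
    pattern = re.compile(r"completed in (\d+) ms")
    with open(log_path) as f:
        samples = [int(m.group(1)) for m in (pattern.search(line) for line in f) if m]
    registry = CollectorRegistry()
    gauge = Gauge("test_mean_response_time_ms",
                  "Mean response time measured in the test environment",
                  registry=registry)
    gauge.set(sum(samples) / len(samples) if samples else 0)
    write_to_textfile(out_path, registry)

if __name__ == "__main__":
    export_mean_response_time("test_environment.log", "test_metrics.prom")
```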
The example also provides a practical process for aggregating metric data into centralized dashboards for quality evaluation. When using centralized dashboards, organizations can gain the following benefits:
A single, coherent overview. Quality attribute results can be monitored in graphs with trends, providing decision-makers with valuable insights for identifying potential quality-improvement activities or process changes.
Flexible monitoring dashboards. The process supported by CI components enables the creation of various types of monitoring dashboards.
Comparative results. Quality trends can be monitored to see how development/change of one attribute affects the other. For example, improving security through encryption can have a negative impact on system performance.
However, it is important to note that a centralized quality dashboard might bring some challenges that may be considered drawbacks:
Quality graphs overflow. As the number of graphs and dashboards grows, important graphs may be buried in the myriad of graphs, and the beneficial overview is lost. This can result in inaccurate representations of the data and can make it difficult to identify trends or patterns.
Insufficient data. If every metric is visualized, including less accurate ones, the result can be incomplete or inaccurate graphs. This makes it difficult to draw meaningful conclusions from the data, which leads to misinformed decisions.
Lack of knowledge about metrics. If the metric data is misused in the analysis, intentionally or unintentionally, or used to derive misleading metrics, it can result in misleading or inaccurate conclusions.
In summary, metrics help organizations assess software quality requirements, potentially providing more objective decision support to drive continuous improvements. To achieve this benefit, it is recommended that practitioners define their metrics and acquire a clear understanding of the underlying metric data to mitigate the reported challenges.
How Can Practitioners Evolve CI Environments to Meet the Needs of Quality Attribute Evaluation?
The metrics and CI components used to evaluate quality attributes can vary, depending on the size and maturity7 of an organization. For example, a small startup may have different needs and resources for quality evaluation compared to a large enterprise organization. Additionally, our study results suggest that there is an evolution from less to more mature CI environments, and this evolution seems to correlate with company growth.
To assist decision-makers in this evolution, we created a maturity model based on our study results. The model was developed through the following steps (a sketch after the steps illustrates the resulting mapping):
Identification of the relevant metrics for each quality attribute: We identified a set of metrics relevant for each quality attribute based on the results of our study. For example, for security, we may use metrics like the number of vulnerabilities found and the severity of vulnerabilities.
Mapping of the metrics to the relevant CI components: Once we identified the relevant metrics, we mapped them to the CI components that can produce data to support those metrics. For example, the number of vulnerabilities can be measured using security scanning tools integrated with the CI environment.
Mapping the CI components to the quality attributes: We first identified the various CI components and their associated metrics. While some metrics produced by a given component may suggest its suitability for evaluating a certain quality attribute, we acknowledge that such a correlation is not always definitive. Therefore, we supplemented our analysis by conducting interviews and making direct observations to gain a deeper understanding of how specific components were used to evaluate specific quality attributes. By triangulating our findings from both the metrics and the qualitative data, we were able to establish a more accurate and reliable mapping between CI components and particular quality attributes for evaluation.
Definition of the maturity levels: We defined what each maturity level represents after we uncovered the connection between CI components and quality attributes. For example, a basic level of CI maturity for security may mean that security scanning tools are integrated into the CI environment, but vulnerabilities are not being actively addressed. An advanced level of maturity may mean that vulnerabilities are being actively addressed, and security is being continuously improved.
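To make these steps concrete, the following is a minimal sketch in Python of the resulting mapping from quality attributes to CI components (derived from the accompanying table) and a simplified maturity classification; the component sets and level thresholds are illustrative simplifications, not a definitive implementation of the model.

```python
# Minimal sketch of the mapping steps: which quality attributes can be
# evaluated given the CI components in place, and a rough maturity level.
# The mapping is a simplified illustration based on the table in this article;
# the level thresholds are assumptions, not part of the model's definition.

# Quality attribute -> CI components that produce supporting metric data.
ATTRIBUTE_TO_COMPONENTS = {
    "maintainability": {"source code management", "static code analysis", "CI server"},
    "security": {"CI server", "issue tracking", "static code analysis"},
    "performance": {"CI server", "test environment"},
    "reliability": {"CI server", "artifacts management"},
    "scalability": {"CI server", "test environment", "cloud platform"},
    "stability": {"test environment"},
    "traceability": {"issue tracking", "source code management"},
}

def evaluable_attributes(available_components: set) -> list:
    """Return the quality attributes fully supported by the available components."""
    return [attr for attr, needed in ATTRIBUTE_TO_COMPONENTS.items()
            if needed <= available_components]

def maturity_level(available_components: set) -> str:
    covered = len(evaluable_attributes(available_components))
    if covered <= 1:
        return "elementary"
    if covered <= 3:
        return "intermediate"
    return "advanced"

if __name__ == "__main__":
    components = {"source code management", "static code analysis", "CI server"}
    print(evaluable_attributes(components))  # e.g. ['maintainability']
    print(maturity_level(components))        # 'elementary'
```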
Our proposed model in Figure 4 displays quality evaluation capabilities using CI components in three levels: elementary, intermediate, and advanced. The x-axis represents the increasing needs of assessing quality attributes, while the y-axis represents the evaluation levels.
As shown in the figure, an elementary maturity level can be achieved through employing three CI components to deal with source files’ maintainability (see A). When there is a demand for more evaluation capabilities, new CI components (for example, static code analysis) can be added on top of the elementary level to reach an intermediate level (see B). Finally, an advanced maturity level can measure more quality requirements by using additional CI components (see C).
When applying the model, there is no strict order for adding or using individual components. This flexibility is highlighted with dashed-line boxes (for example, the SCA, AM, TE, and CP components) in Figure 4. The selection of CI components can be optimized based on the testing needs of the organization, and our model provides the flexibility to select them to support stepwise improvements in quality evaluation. For instance, if the reliability evaluation of a product is prioritized, the AM component can be added to upgrade an environment from the elementary maturity level to the next level. Additionally, the model includes a normalized maturity growth path (see the dashed, curved connectors in Figure 4), which was identified as common to how the studied companies have matured their CI environments.
In summary, the level of quality evaluation maturity in a CI environment is associated with both the number of components employed and the extent of quality attribute coverage. Our model offers decision-makers a roadmap to guide the evolving process of their CI environment based on specific quality requirements. It illustrates the trend of how companies, typically, mature in terms of the need to evaluate certain quality attributes and what CI components support such evaluation.
Study Validity
In this study, we acknowledge the relatively small sample size of companies compared to the broader community. As a result, the findings of this study may primarily apply to the specific contexts and company sizes represented in the sample. However, it is crucial to emphasize that the goal of this study is not to achieve extensive generalization. Instead, the focus is on generating valuable evidence that can be integrated with prior research to contribute to a deeper understanding of quality attribute evaluation within the context of CI environments.
Furthermore, given that the study is based on qualitative evidence, several threats may have impacted the results, for instance, the way in which the interviews were executed or the possibility that some participants did not share certain tacit information. To mitigate these threats, a data triangulation process was used.
Conclusion
Our research focused on evaluating the use of CI environments for metrics, specifically looking at how CI components can generate data to support the evaluation of specific quality attributes. Key insights identified through this study are summarized here:
Insight 1. Some 86% of participants reported that CI environments produce data that can aid companies in evaluating specific quality attributes, and that the use of metrics can standardize data collection and analysis in CI environments. This highlights the prevalence of utilizing CI environment data for quality attribute evaluation through metrics.
Insight 2. Across the five industrial projects, the number of quality attributes evaluated through CI environments ranged from 2 to 5, with an average of 3.4 attributes per project. This indicates there is room to expand CI usage for evaluating additional quality attributes.
Insight 3. Almost 73% of study participants stated that centralized dashboards are valuable for visualizing CI metrics data. Reliance on dashboards also underscores their importance for enabling data-driven quality analysis.
Insight 4. On average, study participants had over seven years of experience working with CI environments. This level of expertise is essential for leveraging CI data effectively and highlights the need for knowledgeable practitioners.
Insight 5. This study proposes a maturity model that can aid organizations in assessing their current level of maturity in utilizing CI environments and identifying potential improvements through a roadmap. Some 60% of the selected industrial projects are currently at the intermediate maturity level, which indicates room for improvement and standardization of practices in the industry.
In summary, the study results revealed trends in how CI environments can be used for quality attribute evaluation across different industrial contexts. Additionally, as more metrics are collected through CI environments, data-driven analysis becomes increasingly applicable for enhancing the assessment of quality attributes.