Despite the introduction of UML 2.0, UML 1.X remains the workhorse of many object-oriented development efforts. UML 1.X consists of nine distinct diagramming techniques that support OO systems development: Use Case, Class, Activity, Statechart, Collaboration, Sequence, Object, Component, and Deployment Diagrams. Each of these diagrams necessarily possesses a large number of constructs that give the diagram its identity and that differentiate it from, or connect it to, the other diagrams. Despite the standardization of UML by the Object Management Group, researchers and practitioners have often criticized UML’s complexity and the ambiguity of its constructs [10]. A set of complexity metrics developed by Rossi and Brinkkemper [8] was used in [11] to analyze the nine diagramming techniques in UML and compare them to other modeling methods.
In addition, the recent and ongoing proposals to revise and enhance the Unified Modeling Language (such as UML 2.0) can be seen, at least in part, as yet another attempt to convince programmers, developers, clients, and educators to pursue the goal of executable modeling. Executable model capability means developers would, with the push of a button, transform models developed during the Systems Analysis and Design portion of the systems development process into working applications; this has been a highly desirable goal for some developers for at least the last 20 years. Whether or not executable models are the real end goal of the recent revisions to UML, one has only to look at the new UML to observe one fairly evident characteristic: UML is larger and more complex than other OO modeling techniques, and, we argue here, more difficult to learn and use.
The Setting
It is becoming increasingly difficult to develop useful and secure applications and systems. While UML is simply used as an example here, the ideas we propose can be applied to other modeling languages, programming languages, analysis techniques, or perhaps even other fields and disciplines. Before we can discuss levels of complexity, we must consider a few ideas that might shed some light on the core issues facing system developers (or anyone else dealing with varying levels of complexity in business).
Systems are becoming more complex, at least partially because of such influencing factors as required and enhanced functionality (for example, Web interfaces), interoperability (running on different platforms), and security, as well as a variety of other reasons. Other trends that affect the size and complexity of applications include systems such as enterprise resource planning, supply chain management, and customer relationship management. These types of systems are extremely large and complex, and require not only close cooperation within each implementing organization, but also external cooperation and connection with business partners up and down the supply chain, all the way from end customers to raw materials suppliers. Even applications such as operating systems have become larger and increasingly complex.
Systems development has traditionally been driven by the expediencies inherent to development itself, including such forces as time and money. This situation has led to a wide variety of development methods, some aimed at specific application types or sizes, and others of a more general nature. The mere existence of such a large number and variety of development methods implies that one size does not fit all, or, conversely, that a development method claiming to fit all (or even more than a few) application types must necessarily be very large and complex in order to do what it claims.
Therefore it should not seem unreasonable to assume that systems development has become commensurately more complex simply to keep pace with the increased complexity of the applications being developed, and moreover, if the development process has not become more complex, we might want to question why. Kim, Hahn, and Hahn [5] indicated it is generally the case that as systems become more complex, so do the diagrams (models) that represent those systems in the development process.
Between 1989 and the mid-1990s, up to 50 different OO analysis and design methods appeared. While some of these methods were intended for specific and limited applications or system types, a number of them purported to be adequate for a wider variety of applications. If companies are to choose and utilize the best method for their organization in terms of systems development, they need a means to compare these methods [12]. In order to make sense of the huge number of available methods, Rossi and Brinkkemper [8] developed a set of metrics that would measure the different methods, and provide a means of comparison, at least in the area of complexity. The relative complexity of a development method assumes importance when considering usage and development costs.
Theoretical Complexity
Rossi and Brinkkemper’s [8] metrics are based on metamodeling techniques and purport to measure the complexity of the method under analysis. According to [8], complexity is critical to measure because researchers believe it is closely related to how easy a specific method is to use and to learn. One of Rossi and Brinkkemper’s more crucial caveats is that measured complexity does not by itself determine quality: a less complex method is not automatically superior to a more complex one (that is, a method could be “good” or “bad” regardless of how complex it is). Table 1 shows eight of the metrics and their composition [8].
Siau and Cao’s [11] research applied Rossi and Brinkkemper’s complexity metrics to UML and compared UML’s complexity with 36 modeling techniques from 14 methods, as well as each of the 14 methods in aggregate, finding that UML is from two to 11 times more complex than other modeling approaches. Table 2 shows a number of Siau and Cao’s results.
We can make (at least) two observations about the complexity metrics developed by Rossi and Brinkkemper [8] and adopted by Siau and Cao [11]. First, Rossi and Brinkkemper developed, used, and presented what we define as the theoretical complexity of the modeling techniques. Second, theoretical complexity is the maximum value complexity can assume under those definitions, because the metrics draw on all of the defined constructs of the modeling technique being measured. Theoretical complexity therefore represents the upper limit of complexity: the metrics were formulated from the total number of object, relationship, and property types defined in the modeling technique. In other words, every metric is mathematically related to the total number of constructs defined in the modeling technique.
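To make this concrete, the following sketch shows how a count-based value of this kind is derived. It is our own simplified illustration, not the published metric definitions of [8] (those appear in Table 1), and the construct lists are invented rather than taken from the official UML metamodel; the aggregate used here, a vector length over the three counts, is merely modeled loosely on the per-technique aggregate in [8]. Because every defined construct type contributes, the result is necessarily an upper bound.

```python
from math import sqrt

# Illustrative construct lists (not the official UML metamodel):
# the defined construct types of one diagramming technique,
# grouped as object, relationship, and property types.
CLASS_DIAGRAM = {
    "object_types": {"class", "interface", "package", "note", "constraint"},
    "relationship_types": {"association", "aggregation", "composition",
                           "generalization", "dependency"},
    "property_types": {"name", "attribute", "operation",
                       "multiplicity", "visibility"},
}

def theoretical_complexity(technique: dict) -> float:
    """Count every *defined* construct type, so the value is an upper
    bound; the vector-length aggregate is an assumption modeled loosely
    on the per-technique measure described in [8]."""
    n_obj = len(technique["object_types"])
    n_rel = len(technique["relationship_types"])
    n_prop = len(technique["property_types"])
    return sqrt(n_obj**2 + n_rel**2 + n_prop**2)

print(theoretical_complexity(CLASS_DIAGRAM))  # uses all constructs -> maximal value
```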
Practical Complexity
We propose practical complexity as a subset of theoretical complexity: when people use a modeling language, they rarely (if ever) use all of its available constructs. This is analogous to the observation that while the English language consists of hundreds of thousands of words, most English speakers use only a small fraction of them in day-to-day discourse. Similarly, Microsoft Word provides a huge number of functions, but most people use only a small number of them when writing and formatting documents. We therefore propose equating practical complexity with a use-based core (kernel) of the language or modeling system in use; we return to this idea later.
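As a rough sketch of this subset relationship (again our own illustration, with hypothetical "defined" and "used" construct lists rather than real usage data), practical complexity can be computed with the same count-based aggregate, restricted to the constructs actually observed in use:

```python
from math import sqrt

def complexity(object_types, relationship_types, property_types):
    # Same count-based aggregate as in the previous sketch:
    # a vector length over the three construct-type counts.
    return sqrt(len(object_types)**2 + len(relationship_types)**2
                + len(property_types)**2)

# All constructs the notation defines (illustrative lists).
defined_obj = {"class", "interface", "package", "note", "constraint"}
defined_rel = {"association", "aggregation", "composition",
               "generalization", "dependency"}
defined_prop = {"name", "attribute", "operation", "multiplicity", "visibility"}

# Constructs actually observed in a hypothetical corpus of models.
used = {"class", "association", "generalization",
        "name", "attribute", "operation"}

theoretical = complexity(defined_obj, defined_rel, defined_prop)
practical = complexity(defined_obj & used, defined_rel & used,
                       defined_prop & used)
assert practical <= theoretical  # practical complexity is a subset measure
print(round(theoretical, 2), round(practical, 2))
```

By construction, the practical value can never exceed the theoretical one; the interesting empirical question is how large the gap is for real projects.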
Why Practical Complexity?
There are several possible reasons why theoretical complexity is inadequate for estimating complexity in practice. Although Siau and Cao’s [11] complexity indices indicated that class diagrams are about 2.5 times more complex than use case diagrams, the analysis was based on all object, relationship, and property types, as formulated by Rossi and Brinkkemper [8]. In practice, not all constructs in each diagram are used all the time. For example, the class diagram can contain many relationship types (association, aggregation, composition, generalization, dependency) and objects (abstract class, notes, constraints, packages, subsystems, interface), but a typical class diagram uses only a subset of these. Because it counts all constructs whether or not they are used, theoretical complexity may not be the best measure of the complexity a user encounters in practice. Figure 1 shows the relationship between theoretical and practical complexity.
Another typical reason that complexity poses problems for people is the limited nature of short-term memory. Miller [7] argued that the primary bottleneck in human cognition is our limited ability to hold seven-plus-or-minus-two chunks of information in short-term memory. Although short-term memory is a concern, decomposing a complex problem into sub-problems can help alleviate the limitation [9]. For example, when a group of users was asked to understand a class diagram, which can contain many object, relationship, and property types, the users would not attempt to understand every element of the diagram simultaneously. Instead, they would decompose the diagram into manageable sub-diagrams and understand each sub-diagram in turn, partially overcoming the short-term memory limitation [2]. Current complexity metrics do not take this into account. We argue that our ability to decompose complex problems into sub-problems should be factored into the formulation of complexity metrics.
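The following sketch illustrates the decomposition idea on a small, hypothetical class diagram represented as an adjacency list; the greedy chunking strategy and the size limit are assumptions chosen only to mirror the seven-plus-or-minus-two guideline, not a prescribed algorithm from the cited work.

```python
from collections import deque

# Hypothetical class diagram as an adjacency list:
# which classes are directly related to which.
DIAGRAM = {
    "Order": ["Customer", "OrderLine", "Invoice"],
    "Customer": ["Order", "Address"],
    "OrderLine": ["Order", "Product"],
    "Invoice": ["Order", "Payment"],
    "Address": ["Customer"],
    "Product": ["OrderLine"],
    "Payment": ["Invoice"],
}

def chunk_diagram(diagram: dict, limit: int = 7) -> list:
    """Walk the diagram breadth-first and start a new sub-diagram
    whenever the current one reaches `limit` elements, mimicking how
    a reader keeps each chunk within roughly 7 +/- 2 items."""
    seen, chunks, current = set(), [], []
    queue = deque(diagram)
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        current.append(node)
        queue.extend(n for n in diagram[node] if n not in seen)
        if len(current) == limit:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks

print(chunk_diagram(DIAGRAM, limit=4))  # two sub-diagrams instead of one large one
```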
An additional reason theoretical complexity falls short is that different constructs may need different weights. For example, a construct that is more likely to strain short-term memory should carry more weight in the complexity metrics than one that is less likely to do so. With respect to UML, we would argue that objects are less likely than relationships to strain short-term memory because they are more or less “independent.” Relationships, on the other hand, must be interpreted together with their associated objects to make sense. Hence, one relationship consumes more short-term memory resources than one object [2].
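A minimal sketch of such a weighting follows; the weights and counts are pure assumptions for illustration, since no empirical weights are proposed in the sources cited here.

```python
# Hypothetical weights: relationships load short-term memory more than
# objects because they must be interpreted together with the objects
# they connect, so they receive a larger weight. The numbers are
# assumptions, not empirically derived values.
WEIGHTS = {"object": 1.0, "relationship": 2.0, "property": 0.5}

# Construct counts for an illustrative class diagram.
COUNTS = {"object": 5, "relationship": 5, "property": 5}

def weighted_complexity(counts: dict, weights: dict) -> float:
    """A weighted sum instead of a plain count: constructs that strain
    short-term memory more contribute more to the score."""
    return sum(weights[kind] * n for kind, n in counts.items())

print(weighted_complexity(COUNTS, WEIGHTS))  # 5*1.0 + 5*2.0 + 5*0.5 = 17.5
```

Calibrating such weights empirically, for instance from comprehension experiments, remains an open question.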
Finally, and perhaps most importantly, we may also argue from an 80/20 perspective that practical complexity is more relevant to systems development than theoretical complexity. In essence, the 80/20 rule of thumb [6] says that 80% of common software solutions (software development projects) can usually be completely specified using only 20% of the language constructs. If that is true, then the most commonly used constructs account for the majority of software development efforts, and (approximately) 20% of the language should therefore define practical complexity. Moreover, if many of the constructs are rarely or never used, it would not be necessary to learn the complete syntax of the language in order to develop the majority of systems.
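One way to operationalize this 80/20 reading (our sketch, with invented usage frequencies rather than measured data) is to rank constructs by how often they appear in a corpus of models and keep the smallest set that covers roughly 80% of all occurrences:

```python
# Hypothetical usage frequencies: how often each construct appears
# across a corpus of class diagrams (invented numbers for illustration).
USAGE = {
    "class": 500, "association": 420, "attribute": 400, "operation": 380,
    "generalization": 150, "aggregation": 60, "composition": 40,
    "interface": 30, "dependency": 15, "package": 10, "note": 4,
    "constraint": 1,
}

def kernel_by_coverage(usage: dict, coverage: float = 0.8) -> list:
    """Rank constructs by frequency and keep the smallest prefix that
    accounts for `coverage` of all construct occurrences: the 80/20
    reading that a few constructs do most of the work."""
    total = sum(usage.values())
    kernel, running = [], 0
    for construct, count in sorted(usage.items(), key=lambda kv: -kv[1]):
        kernel.append(construct)
        running += count
        if running / total >= coverage:
            break
    return kernel

print(kernel_by_coverage(USAGE))  # a handful of constructs covers ~80% of usage
```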
The proliferation of different types of systems also affects complexity. For example, using UML, real-time systems might rely more heavily on constructs dealing with timers, clocks, and state changes, such as those presented in Statechart diagrams, while enterprise systems might depend more on the more abstract, higher-level Activity diagrams, and Web-based systems might lean toward some combination of the two. Figure 2 depicts the basic idea relating complexity to different system types.
Related Research
A Delphi study conducted by the authors in 2004 investigated practical complexity and provides the following details. The study assembled a panel of 29 globally diverse UML expert practitioners who were asked to identify the most important and most useful UML diagrams and, within each diagram, the most important constructs. They were also asked to identify a use-based kernel of UML, which can be equated with practical complexity. The experts clearly identified a UML kernel and established a basis for the ideas proposed here. The results for the nine UML diagrams are summarized in Table 3. Individual diagram results are also available, but are not shown here due to space limitations. The mean is the arithmetic average of the participant ratings on a 1-to-5 scale, the standard deviation is a measure of dispersion, and the percentage Yes for Kernel is a measure of the agreement or consensus among the participants after three rounds of the Delphi.
After three rounds, the Delphi participants identified the most important diagrams as Class, Use Case, Sequence, and Statechart (see Table 3). In addition, at least 90% of the assembled experts agreed that those four diagrams should comprise a UML kernel. Applying the Rossi and Brinkkemper metrics to the kernel naturally results in a lower complexity assessment, which is not at all surprising. However, if the kernel diagrams truly represent the portion of UML that most people commonly use, then the reduction in measured complexity can be considered as one possible surrogate for practical complexity, perhaps a more realistic complexity than other metrical approaches.
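For illustration, the kind of per-diagram summary reported in Table 3 could be computed as in the sketch below; the responses, the 1-to-5 ratings, and the 90% consensus threshold are hypothetical stand-ins, not the actual Delphi data.

```python
from statistics import mean, stdev

# Hypothetical third-round responses for one diagram: each tuple is
# (importance rating on a 1-to-5 scale, voted "Yes" for the kernel).
RESPONSES = [(5, True), (4, True), (5, True), (4, True), (3, False),
             (5, True), (4, True), (5, True), (4, True), (5, True)]

def summarize(responses, consensus=0.90):
    """Produce the kind of summary shown in Table 3: mean rating,
    dispersion, and the share of experts voting the diagram into the
    kernel, compared against an assumed consensus threshold."""
    ratings = [r for r, _ in responses]
    yes_share = sum(1 for _, v in responses if v) / len(responses)
    return {
        "mean": round(mean(ratings), 2),
        "std_dev": round(stdev(ratings), 2),
        "pct_yes_for_kernel": round(100 * yes_share, 1),
        "in_kernel": yes_share >= consensus,
    }

print(summarize(RESPONSES))
```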
Conclusion
We have argued here that theoretical complexity might not accurately predict complexity in practice. We have completed research that uses an existing metric set to estimate the complexity in practice of modeling methods. Although formulating measures of practical complexity is difficult, and practical complexity depends on many circumstances (project domains, structured/semi-structured/unstructured), determining an estimation of practical complexity is possible and useful. For example, the Function Point Analysis [1], Constructive Cost Model [3], and Activity-Based Costing model [4] are illustrations of the usefulness of estimation—even rough estimation.
From a practical perspective, it seems relatively certain that if increased complexity is a characteristic of new systems and systems development methods, then even more expertise will be required of the developers and the organizations and companies for whom the systems are being developed. However, does the increase in size and number of constructs of a modeling method really affect its complexity in practice? Do we use all the functions in Microsoft Excel for every spreadsheet analysis? Obviously we do not, but those functions are there if we need them, and as we gain proficiency with the basic functions, it usually becomes easier for us, through automaticity and task decomposition, to learn other functions (or new software releases) as we find it necessary, and work around our cognitive limitations.
In measuring the complexity of a systems development method, it is incomplete and misleading if we compute the complexity based on all possible object types and relationship types in the method. We do not use all the functions provided by SPSS for each statistical analysis. Similarly, we do not use all the constructs provided in UML for all modeling tasks. To provide a more complete picture and a more accurate account of how developers use modeling methods in practice, we must take into account the actual usage and practice of the modeling method. In other words, in addition to theoretical complexity, it should be possible to provide an estimation of the practical complexity of a modeling method based on the typical usage of the modeling method. A realistic estimation of the complexity in practice of a modeling language can provide and suggest better ways of learning and using the various and sundry development methods that are currently in use or under development.