Research and Advances
What UML should be

Make Models Be Assets

Needed first is a layered UML with a small, well-defined, executable, translatable kernel that enables and supports development of an extensible tool chain.

The Unified Modeling Language needs to be executable and translatable so developers’ models are interchangeable across the entire tool chain. UML models that are freely interchangeable, executable, and translatable represent the foundation of the Object Management Group’s Model Driven Architecture (MDA), making possible a world in which software expressed as UML models, not only as code, is an asset.

In a limited sense, UML models are already executable, translatable, and interchangeable. However, though programmers add code, translate the models into code headers, and use the XML Metadata Interchange (XMI) format for interchange, models are intermediate products, drawn as sketches or blueprints maintained separately from the final product, which is code. The models themselves have no lasting value.

One reason for this sad state of affairs is that UML specifies the abstract syntax of each diagram separately. This is like specifying the syntax of a Java class and the syntax of a method without saying how (or even whether) they relate to one another, or what each means in execution. A collection of unrelated syntaxes is no substitute for a single underlying language specification with defined execution rules, which is what programmers need in order to know what a UML model really means.

Any specification of UML must separate out the meaning of a model from its representation. Model elements meeting a purely graphical need should be distinct from model elements representing the modeled system. UML should specify a presentation layer, allowing developers to represent the same meaning in different ways. For example, one developer could build a statechart diagram using only transition actions, another could build one using only entry actions, but both could still interchange models and edit them in their own favorite styles; the two styles are mathematically equivalent. Each of UML’s myriad diagrams is then merely a projection of the underlying language specification. The presence of one diagrammatic element implies certain facts about other diagrams. For example, a statechart diagram for an object that sends a signal X necessarily implies the same signal X on the collaboration diagram, and vice versa.
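The two statechart styles mentioned above can be compared with a minimal sketch. The lamp example, the table names, and the two runner functions are hypothetical and not part of UML. The example is deliberately chosen so that every transition into a given state carries the same action; in general, converting between the styles may require splitting states.

```python
from typing import Dict, List, Tuple

# Transition-action style: the action is attached to each transition.
TRANSITION_STYLE: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("Off", "press"): ("On", "lamp_on"),
    ("On", "press"): ("Off", "lamp_off"),
}

# Entry-action style: transitions carry no action; each state has an entry action.
ENTRY_STYLE_TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Off", "press"): "On",
    ("On", "press"): "Off",
}
ENTRY_ACTIONS: Dict[str, str] = {"On": "lamp_on", "Off": "lamp_off"}


def run_transition_style(start: str, events: List[str]) -> List[str]:
    """Return the actions performed when actions live on transitions."""
    state, trace = start, []
    for event in events:
        state, action = TRANSITION_STYLE[(state, event)]
        trace.append(action)
    return trace


def run_entry_style(start: str, events: List[str]) -> List[str]:
    """Return the actions performed when actions live on state entry."""
    state, trace = start, []
    for event in events:
        state = ENTRY_STYLE_TRANSITIONS[(state, event)]
        trace.append(ENTRY_ACTIONS[state])
    return trace


events = ["press", "press", "press"]
print(run_transition_style("Off", events) == run_entry_style("Off", events))
```

Both runners produce the same sequence of actions for the same events, which is the sense in which the two presentations carry one meaning.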

UML is supposed to be a family of languages, but one of the proposed UML2 standards—the one from the UML2 Partners (U2P)—is presently more like a Tower of Babel. To be a family, the meaning must include layering based on a limited number of first-class concepts and composition rules, so the resulting UML is known (and shown) to be coherent and orthogonal. An explicit mechanism must merge the diverse elements automatically. It is indefensible for anyone, including the specifiers of UML, to have to build the metamodel by hand. It is just too easy for exceptions and errors to creep in, and once orthogonality is lost, more and more complexity is required to fix the inconsistencies so introduced, as we see in UML1, as well as in some of the proposals for UML2.

Developers also need first-class access to the same reliable method for merging model fragments, so patterns and idioms can be combined into complete models. For example, the concept of containment (a switch contains circuits, a bank contains branches) appears repeatedly in multiple applications. It should be possible to model this concept once and apply it in each situation in which it appears.
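As a rough illustration of modeling containment once and applying it repeatedly, the sketch below treats a model fragment as a plain dictionary of classes and associations. The `containment` and `merge` functions and the fragment layout are assumptions made for this example, not a mechanism defined by UML.

```python
from typing import Dict, Set


def containment(container: str, part: str) -> Dict[str, Set]:
    """Instantiate the (hypothetical) containment pattern for two roles."""
    return {
        "classes": {container, part},
        "associations": {(container, "contains", part)},
    }


def merge(*fragments: Dict[str, Set]) -> Dict[str, Set]:
    """Merge model fragments by taking the union of their elements."""
    model: Dict[str, Set] = {"classes": set(), "associations": set()}
    for fragment in fragments:
        model["classes"] |= fragment["classes"]
        model["associations"] |= fragment["associations"]
    return model


model = merge(containment("Switch", "Circuit"), containment("Bank", "Branch"))
print(sorted(model["associations"]))
```

A real merge mechanism would also have to detect conflicts between fragments; a set union is only the simplest possible stand-in.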

For UML to be a family of languages, semantic variation must be strictly controlled in the specification. Each additional layer of meaning must conform to the previous layer in a manner analogous to the Liskov-Wing Substitution Principle. New meaning can be substituted only if it is consistent with the meaning of the primitive layers. For example, the language could specify a primitive, say, Execution, executing only when triggered, when it has all its data and control inputs, and when some predicate over a set of condition variables is true. A method invocation is then specified as a subtype of Execution, in terms of a default trigger (the call), data inputs based on parameters and no control inputs or condition variables. Similarly, a transition in a state machine is specified as a subtype of Execution in terms of the triggering event, with no additional control inputs. The condition variables are derived from the state of the state machine, and the data inputs travel with the event. A "semantic variation substitution principle" is at work here; method invocation and state transitions can be substituted for the Execution specified as a primitive.
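The layering described above might be sketched as follows. The class names follow the text, but the fields, the `can_fire` method, and the `call:` trigger naming are assumptions chosen for illustration, not the OMG's definitions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Iterable


def _always(conditions: Dict[str, Any]) -> bool:
    """Default predicate: no condition variables constrain the execution."""
    return True


@dataclass
class Execution:
    """Primitive: fires only when triggered, with all inputs present and predicate true."""
    trigger: str
    data_inputs: FrozenSet[str] = frozenset()
    control_inputs: FrozenSet[str] = frozenset()
    predicate: Callable[[Dict[str, Any]], bool] = _always

    def can_fire(self, trigger: str, data: Dict[str, Any],
                 controls: Iterable[str], conditions: Dict[str, Any]) -> bool:
        return (
            trigger == self.trigger
            and self.data_inputs <= set(data)
            and self.control_inputs <= set(controls)
            and self.predicate(conditions)
        )


class MethodInvocation(Execution):
    """Subtype: triggered by the call, data inputs are the parameters, nothing else."""
    def __init__(self, name: str, parameters: Iterable[str]):
        super().__init__(trigger=f"call:{name}", data_inputs=frozenset(parameters))


class Transition(Execution):
    """Subtype: triggered by an event; the condition is derived from the current state."""
    def __init__(self, event: str, source_state: str, event_data: Iterable[str] = ()):
        super().__init__(
            trigger=event,
            data_inputs=frozenset(event_data),
            predicate=lambda conditions: conditions.get("state") == source_state,
        )


deposit = MethodInvocation("deposit", ["amount"])
arrive = Transition("door_closed", "Waiting", ["floor"])
print(deposit.can_fire("call:deposit", {"amount": 10}, [], {}))
print(arrive.can_fire("door_closed", {"floor": 3}, [], {"state": "Moving"}))
```

Any code written against `Execution.can_fire` works unchanged with either subtype, which is the substitution property the text asks of each new layer of meaning.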

The kernel for any version of UML must be specified so that an executable UML model is also translatable to multiple targets. Multiple translations require that functional computations be defined so they do not rely on a particular decision about data structure. Independence from data structure allows an implementation’s data structure to be reorganized without restating the meaning of the action. Actions must be specified so they execute concurrently, except where constrained by data and control flow, which eases the reorganization of control structures in distributed environments. Data flow also enables flow analysis for generating test cases and checking boundary conditions and state spaces. While the OMG’s adoption (September 2001) of semantics for actions includes these critical properties, simple-minded adherence to today’s language concepts and implementations could nevertheless threaten data flow in UML2.
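To illustrate how data flow alone can determine which actions may run concurrently, here is a small sketch using Python's standard `graphlib` module. The action names and the `concurrent_stages` function are invented for the example; each action maps to the set of actions whose outputs it consumes.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical actions, each mapped to the actions that produce its data inputs.
DATA_FLOW: Dict[str, Set[str]] = {
    "compute_new_balance": {"read_balance", "read_amount"},
    "write_balance": {"compute_new_balance"},
    "log_request": {"read_amount"},
}


def concurrent_stages(data_flow: Dict[str, Set[str]]) -> List[List[str]]:
    """Group actions into stages; actions in one stage share no data dependency."""
    sorter = TopologicalSorter(data_flow)
    sorter.prepare()
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every action whose inputs are available
        stages.append(ready)
        sorter.done(*ready)
    return stages


for stage in concurrent_stages(DATA_FLOW):
    print(stage)
```

Nothing in `DATA_FLOW` says how the balance is stored or in what order independent actions run; those decisions are left to whatever translates the model.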

The executable and translatable UML kernel allows for the definition of the meaning of the developer’s abstract solution to a problem; it does not define software structure. The kernel need not specify how a signal is sent in execution, only the abstract fact that it is to be sent when the send-signal action executes—only the abstract fact that something happens first, then that something else happens later. (A UML signal is just one way of representing desired cause and effect.) Separate decisions about software structure then determine the available mechanisms (such as remote procedure call, inline execution, and function calls) and which one applies to a particular type of model element in context.

A program can then automatically map a translatable UML model onto the chosen target software platform using a model compiler. Each model compiler targets a specific software platform, translating each UML model element into its counterpart as a desired component in the target. In the parlance of the OMG Model Driven Architecture (MDA), the platform-independent model (PIM) is mapped by a model compiler into a platform-specific model (PSM). The model compiler houses the software structure decisions determining the structure of the PSM.
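A toy version of this separation might look like the sketch below, in which the model states only that a signal is sent and each model compiler decides the mechanism. The `SendSignal` element, the platform names, and the emitted code strings (including the `rpc.invoke` call) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SendSignal:
    """Platform-independent model element: the abstract fact that a signal is sent."""
    signal: str
    target: str


def emit_inline_call(element: SendSignal) -> str:
    """Software structure decision: deliver the signal as a direct function call."""
    return f"{element.target}.on_{element.signal}()"


def emit_remote_call(element: SendSignal) -> str:
    """Software structure decision: deliver the signal as a remote procedure call."""
    return f'rpc.invoke("{element.target}", "{element.signal}")'


# Each (hypothetical) model compiler targets one software platform.
MODEL_COMPILERS: Dict[str, Callable[[SendSignal], str]] = {
    "single_process": emit_inline_call,
    "distributed": emit_remote_call,
}


def compile_model(elements: List[SendSignal], platform: str) -> List[str]:
    """Map every model element onto its counterpart for the chosen platform."""
    emit = MODEL_COMPILERS[platform]
    return [emit(element) for element in elements]


pim = [SendSignal(signal="open", target="door")]
print(compile_model(pim, "single_process"))
print(compile_model(pim, "distributed"))
```

The same `pim` list is compiled twice without modification; only the choice of model compiler changes the generated structure.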

An appropriately specified kernel (neither too small to be usable, too large to be implemented, nor so specific as to be confining) will enable an increasingly powerful tool chain containing:

Model builders. Providing graphical input of models;

Model verifiers. Interpreting a model with real values so users can determine whether the behavior is correct;

Model compilers. Compiling models onto diverse platforms;

Model debuggers. Executing compiled code so developers see the code compiled from the model in action;

Model analyzers. Finding paths through the models and unreachable states; and

Model testers. Generating and running test cases for models.
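As one example of what such a tool could do, a model analyzer that reports unreachable states can be sketched as a breadth-first search over a state machine. The representation of the state machine as a dictionary keyed by (state, event) pairs, and the function name, are assumptions made for this example.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def unreachable_states(states: Iterable[str],
                       transitions: Dict[Tuple[str, str], str],
                       initial: str) -> Set[str]:
    """Return the states that no sequence of events can reach from the initial state."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        current = queue.popleft()
        for (source, _event), destination in transitions.items():
            if source == current and destination not in seen:
                seen.add(destination)
                queue.append(destination)
    return set(states) - seen


STATES = ["Idle", "Running", "Done", "Orphan"]
TRANSITIONS = {
    ("Idle", "start"): "Running",
    ("Running", "finish"): "Done",
    ("Orphan", "reset"): "Idle",
}
print(unreachable_states(STATES, TRANSITIONS, "Idle"))
```

Because the analysis works on the meaning of the model (states and transitions) rather than on a drawing, it would apply equally to any tool's model that shares the same underlying definition.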


These tools are available today, but because each one employs its own subset of UML, true interchange is impossible. Moreover, each vendor has to build an adapter to give other vendors’ tools access to its capabilities. A vendor’s effort to build an adapter means added cost and risk; more important and worse for the developer, it means a truncated tool chain.

Interchange must therefore be specified by the UML standard in terms of meaning, not diagrams. But today, interchange is specified based on the abstract syntax of diagrams. Not only is this foolish, but vendors also cannot claim compliance unless they "implement" all the abstract syntax. This makes no sense, except for the pedestrian tools that only draw diagrams.

A layered UML with a small, well-defined, executable, and translatable kernel enables and supports development of a tool chain so developers can build, verify, and compile models to multiple targets as technology changes, and can employ other yet-to-be-invented tools to ensure model correctness and completeness. None of this is possible unless UML is layered and its specification is built from a small set of components automatically merged together to ensure correctness. Only then can UML models be truly executable, translatable, and interchangeable assets, fully realizing the vision of MDA.
