BLOG@CACM
Computing Applications

Coping With Linux Distro Fragmentation (Visualized in One Giant Diagram)

Posted by Philip Guo

Problem: Fragmentation hurts software portability

People now use thousands of different releases of hundreds of Linux distros. Each individual distro release contains a different set of pre-installed software, libraries, and kernel configurations. A ubiquitous problem is that software created on one distro often fails to run on other distros, and even on other releases of the same distro, due to incompatibilities in software, library, and kernel versions.

For example, if you are a scientist writing a piece of research software on a Linux-based OS, it’s often difficult for your colleagues to install, configure, and run that software on their computers unless they are using the exact same release of the same Linux distro that you’re using. This lack of software portability makes it harder for colleagues to reproduce and extend your work, thus hindering scientific progress.
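One way to see where two machines differ is to record the distro, kernel, and C library versions on each and compare them. The following is a minimal sketch using only Python's standard `platform` module; the function name `environment_fingerprint` and the choice of fields are illustrative assumptions, not part of CDE or of any tool discussed in this post.

```python
import platform


def environment_fingerprint():
    """Collect distro, kernel, and C library details that often differ between machines."""
    try:
        # Reads /etc/os-release; available in Python 3.10 and later.
        os_release = platform.freedesktop_os_release()
    except (AttributeError, OSError):
        # Older Python, or a system without an os-release file.
        os_release = {}

    libc_name, libc_version = platform.libc_ver()
    libc = f"{libc_name} {libc_version}".strip()

    return {
        "distro": os_release.get("ID", "unknown"),
        "distro_version": os_release.get("VERSION_ID", "unknown"),
        "kernel": platform.release(),
        "libc": libc or "unknown",
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    for key, value in environment_fingerprint().items():
        print(f"{key}: {value}")
```

Running this on both your machine and a colleague's gives a quick, if incomplete, picture of the mismatches described above. It does not capture installed packages or kernel configuration.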

The diagram below shows a family tree of Linux distros, with time laid out horizontally. Note that this diagram shows only time and ancestry relationships and doesn’t illustrate each individual release; an ideal diagram would render all releases of all distros as individual points along the lines.

[Figure: family tree of Linux distros]

Now imagine two distro releases X and Y as points on this giant tree. The likelihood that a piece of software you create on distro X will work on someone else’s distro Y decreases as X and Y get farther apart in the diagram:

  • The farther apart X and Y are horizontally, the less likely that your software will work. For example, software created on a 2012-era Ubuntu distro will probably not work on a 2004-era Ubuntu due to library and kernel incompatibilities.
  • The farther apart X and Y are in terms of “familial relationship” in the tree, the less likely that your software will work. For example, software created on Ubuntu is less likely to run on Fedora since they are from different branches (Debian and Red Hat, respectively).
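The two kinds of distance above can be made concrete with a toy model. This is only a sketch: the `Release` class, the `lineage` tuples, and the step-counting rule are assumptions made for illustration, and the resulting numbers are not a measured probability that software will run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    distro: str
    lineage: tuple  # ancestry from the root branch down to the distro
    year: int       # approximate release year


def tree_distance(a, b):
    """Count lineage steps between two distros via their nearest common ancestor."""
    shared = 0
    for x, y in zip(a.lineage, b.lineage):
        if x != y:
            break
        shared += 1
    return (len(a.lineage) - shared) + (len(b.lineage) - shared)


def separation(a, b):
    """Return (horizontal gap in years, familial distance in the tree)."""
    return abs(a.year - b.year), tree_distance(a, b)


# The two examples from the list above, with hypothetical lineage encodings.
ubuntu_2012 = Release("ubuntu", ("debian", "ubuntu"), 2012)
ubuntu_2004 = Release("ubuntu", ("debian", "ubuntu"), 2004)
fedora_2012 = Release("fedora", ("redhat", "fedora"), 2012)

print(separation(ubuntu_2012, ubuntu_2004))  # (8, 0): same branch, far apart in time
print(separation(ubuntu_2012, fedora_2012))  # (0, 4): same era, different branches
```

Larger values in either position suggest a lower chance that software built on one release will run unmodified on the other.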

One Potential Solution: CDE

Three years ago, as part of my Ph.D. dissertation, I created a piece of software called CDE that alleviates such cross-distro software incompatibility problems.

You can use CDE to package up your software and all of its dependencies in a portable format that runs on just about any Linux distro released within approximately five years of the distro you’re using. (I describe caveats and limitations in Section 2.3 of this research paper.)
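As a rough sketch of what that workflow could look like when scripted, the helpers below build the command lines for packaging and for running a packaged program. They assume, based on the CDE project's documentation rather than anything stated in this post, that the `cde` and `cde-exec` executables are installed and that a program is packaged by prepending `cde` to its normal command line. The helper names and `analysis.py` are hypothetical.

```python
import shutil
import subprocess


def cde_package_command(command):
    """Build the argv that runs `command` under cde so its dependencies get recorded."""
    # Assumption: packaging is done by prepending `cde` to the usual command.
    return ["cde", *command]


def cde_exec_command(command):
    """Build the argv that runs `command` from inside an existing CDE package."""
    # Assumption: `cde-exec` is invoked from within the package directory.
    return ["cde-exec", *command]


def run_if_available(argv):
    """Run argv only if its executable is on PATH; return the exit code, else None."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    # `analysis.py` stands in for whatever program you want to package.
    packaging = cde_package_command(["python", "analysis.py"])
    print("would run:", " ".join(packaging))
    print("exit code:", run_if_available(packaging))
```

On the target machine, the same program would then be started with the `cde_exec_command` form from inside the copied package directory. Check the CDE documentation for the exact directory layout before relying on this.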

Let’s visualize the benefits of CDE by zooming in on the Linux distro family tree. The (super-long!) diagram below is a slice of the tree showing distros released from 2006 to 2011.

If you use CDE to package up your software on any distro in this diagram, you can instantly run that packaged software on any other distro in the diagram. In short, CDE makes your software portable across thousands of Linux distro releases over a five-year period.

 

CDE is free and open-source software, hosted on GitHub. Although it was created by a single grad student (me!) as a research project, I’m excited that thousands of people have used CDE in the past three years to package up their Linux software, including:

  • research scientists at NASA,
  • scientists deploying experiments to the European Grid computing infrastructure,
  • engineers prototyping experimental code at software companies,
  • creators of open-source software packages,
  • CS researchers who want others to reproduce their experiments,
  • and CS teaching assistants packaging up class programming assignments for distribution.

Unfortunately, I no longer have time to maintain CDE, but I would love for this project to stay alive in some form. Visit the CDE website and read the research paper to learn more.
