Revamping Python for an AI World

Python is one of the most popular programming languages in existence. Easy to learn and easy to use, it has been around for years, so there is a large community of Python developers to support each other. It has built up an ecosystem of libraries that allow users to drop in the functionalities they need. It does, however, come with downsides: its programs tend to run slowly, and because it is inefficient at running processes in parallel, it is not well suited to some of the latest Artificial Intelligence (AI) programming.

Hoping to overcome those difficulties, computer scientist Chris Lattner set out to create a new language, Mojo, which offers the ease of use of Python, but the performance of more complex languages such as C++ or Rust. He teamed up with Tim Davis, whom he had met when they both worked for Google, to form Modular in January 2022. The company, where Lattner is chief executive officer and Davis chief product officer, provides support for companies working on AI and is developing Mojo.

A modern AI programming stack generally has Python on top, Lattner says, but because that is an inefficient language, it has C++ underneath to handle the implementation. The C++ then has to communicate with performance accelerators or graphics processing units (GPUs), so developers add a platform such as Compute Unified Device Architecture (CUDA) to make efficient use of those GPUs. “Mojo came from the need to unify these three different parts of the stack so that we could build a unified solution that can scale up and down,” Lattner says.

The result is a language that has the same syntax as Python, so people used to programming in Python can adopt it with little difficulty, but which, by some measures, can run up to 35,000 times as fast. For AI, Mojo is especially fast at performing matrix multiplications, which are used in many neural networks, because it compiles the multiplication code to run directly on the GPU, bypassing CUDA.

Lattner is no stranger to developing programming languages. For his master’s thesis at the University of Illinois at Urbana-Champaign, he and some colleagues created LLVM, a set of compiler and programming tools to optimize other programs. He also came up with the Swift programming language for Apple, which allows developers to write their own apps for Apple’s iOS operating system.

Jeremy Howard, an honorary professor of computer science at the University of Queensland, Australia, and a co-founder of fast.ai, a company that provides free coding courses and a software library for deep learning applications, says something better than Python is needed for implementing neural networks, which handle a lot of data and therefore need to run fast. Generally speaking, programmers write such programs in languages such as C, C++ or Rust, which then run 100,000 or 1 million times faster than Python, says Howard, who is also an advisor to Modular. “Trouble is that now you’ve got to do a whole lot of things other than just thinking about how to implement your neural network. You have to think about things like allocating memory and freeing it again and dealing with string termination,” he says. “If I want to write something in C, it’s going to take maybe 10 times, maybe 100 times longer than writing in Python.”

Additionally, GPUs and Tensor Processing Units (TPUs) can run C-based programs much faster than a Central Processing Unit (CPU) can. However, Howard says, it is harder to write C for a GPU or TPU than for a CPU. “So now we’re talking another couple of orders of magnitude slower development time.” While libraries can provide code to speed the development along, they are limited to operations other people already have created, which can stifle innovation, Howard argues.

Those are challenges enough for computer programmers, he says, but there needs to be a language that is usable by the general public, like Python. “Increasingly, code is not being written by computer programmers. It’s being written by doctors and journalists and chemists and gamers,” Howard says. “All data scientists write code, but very few data scientists would consider themselves professional computer programmers.”

Mojo attempts to fill that need by being a superset of Python. A program written in Python can be copied into Mojo and will automatically run faster, the company says. The speedup comes from a variety of factors. For instance, Mojo, like other modern languages, enables threads, small tasks that can be run simultaneously rather than in sequence. Instead of using an interpreter to execute code as Python does, Mojo uses a compiler to turn the code into assembly language. Mojo also gives developers the option of using static typing, which fixes the type of each data element ahead of time and reduces the number of errors.

One of the factors that slows down Python is its Global Interpreter Lock, which allows only one thread to be executed at a time. That made sense when Python was created in the early 1990s, Howard says, because most people had only one CPU core with which to work. While it is possible to create some parallel processes in Python, doing so is cumbersome, and because Python cannot use multiple threads efficiently, it cannot take full advantage of the available hardware. “A phone will have eight CPU cores in it. A modern desktop will have maybe 16. If you can only use one of those, that means you’re getting 1/16 of the compute power of the system,” Lattner says.
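
The effect of the lock can be observed with a small experiment. The sketch below, in plain Python, runs the same CPU-bound function first sequentially and then on a thread pool. The function name count_primes and the workload sizes are illustrative choices, not anything taken from Modular or Howard. On a standard CPython build with the lock enabled, the threaded run typically takes about as long as the sequential one, although exact timings depend on the machine and the interpreter version.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_sequential(limits):
    """Process each workload one after another on a single thread."""
    return [count_primes(limit) for limit in limits]


def run_threaded(limits, workers=4):
    """Hand the same workloads to a pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


def timed(fn, *args):
    """Return the function's result together with its wall-clock time."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    limits = [20_000] * 4  # four identical jobs (assumed size)
    seq_result, seq_time = timed(run_sequential, limits)
    thr_result, thr_time = timed(run_threaded, limits)
    print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

Both versions return the same counts; only the elapsed time is of interest, and no particular ratio is guaranteed.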

Additionally, he says, “using a compiler instead of an interpreter gets a whole level of overhead out of the way.” That alone causes a program to run 10 to 20 times faster with no changes to the code. Other changes allow programs to run hundreds or thousands of times faster than they do in Python. The company used Mojo to create a Mandelbrot set, a fractal shape that has the same geometry at different scales. That is not a practical application, but it represents a benchmark, and Mojo was able to create the set 35,000 times as fast as Python.
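
For context, the kind of kernel such a benchmark exercises looks roughly like the following in pure Python. This is a minimal sketch of the standard escape-time algorithm, not Modular's benchmark code; the function names, the region of the complex plane, and the iteration limit are assumptions chosen for illustration.

```python
def escape_time(c, max_iter=200):
    """Return the index of the first iteration at which |z| exceeds 2,
    or max_iter if z = z*z + c stays bounded for every iteration."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def mandelbrot_grid(width, height, max_iter=200):
    """Evaluate escape_time over a width x height grid (both at least 2)
    covering real values -2..1 and imaginary values -1.5..1.5."""
    rows = []
    for row in range(height):
        y = -1.5 + 3.0 * row / (height - 1)
        line = []
        for col in range(width):
            x = -2.0 + 3.0 * col / (width - 1)
            line.append(escape_time(complex(x, y), max_iter))
        rows.append(line)
    return rows


if __name__ == "__main__":
    # Crude text rendering: '#' marks points that never escaped.
    for line in mandelbrot_grid(60, 24, max_iter=50):
        print("".join("#" if n == 50 else "." for n in line))
```

Every point is computed independently in a tight arithmetic loop, which is why this kind of workload rewards compilation, static types, and parallel execution.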

Optional Types

Because Python is dynamically typed, types are checked at runtime rather than when the code is compiled, which makes the program slower. Mojo allows developers to continue using dynamic typing if they want to, but it also provides the option of static typing. “Static behavior is good because it leads to performance. Static behavior is also good because it leads to more correctness and safety guarantees,” Lattner says.
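
The runtime nature of Python's type checks can be seen in Python itself. In the sketch below, the function name scale is hypothetical, and the type annotations are hints only: the standard interpreter does not enforce them, so a mismatched call fails, if at all, only when the offending line actually runs.

```python
def scale(value: float, factor: float) -> float:
    """Annotations document intent, but CPython does not enforce them."""
    return value * factor


# Called as annotated, with two floats.
print(scale(2.0, 3.0))

# Also runs without complaint: str * int repeats the string.
print(scale("ab", 3))

try:
    # str * float is not defined, but the error surfaces only at runtime.
    scale("ab", 2.5)
except TypeError as exc:
    print("runtime error:", exc)
```

A static type checker, or a statically typed language, would flag the last two calls before the program ever ran, which is the correctness benefit Lattner describes.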

One innovation he added is autotuning, in which the programmer provides a range of values for various aspects of the program. They might, for example, specify that a tile could have a size of 2, 4, 8, or 16, or that a particular function could be implemented with any of a variety of methods. The compiler then implements all the different combinations of those variables and runs them to see which one is fastest. That way, the program can be optimized automatically for the particular hardware on which it is to run.
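
The idea can be imitated at run time in ordinary Python. The sketch below is not Mojo's autotuning facility, which works inside the compiler; it is a hand-written analogue under stated assumptions. The function names matmul_tiled and autotune, the matrix size, and the repeat count are all hypothetical. It times a tiled matrix multiplication for each candidate tile size and keeps the fastest.

```python
import time


def matmul_tiled(a, b, tile):
    """Multiply two square matrices (lists of lists) in tile x tile blocks."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, n, tile):
            for j0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(k0, min(k0 + tile, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(j0, min(j0 + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def autotune(candidates, a, b, repeats=3):
    """Time every candidate tile size; return (fastest_tile, timings)."""
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            matmul_tiled(a, b, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best  # keep the best of several runs to reduce noise
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    n = 48  # assumed problem size, small enough for pure Python
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best_tile, timings = autotune([2, 4, 8, 16], a, b)
    print("fastest tile size on this machine:", best_tile)
```

The winning tile size may differ from one machine to the next, which is the point: the choice is made by measurement on the target hardware rather than fixed by the programmer.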

Guido van Rossum, the programmer who created Python and who was known as the language’s “benevolent dictator for life” until he stepped back from that role in 2018, says he is interested to watch how Mojo develops and whether it can hit the lofty goals Lattner is setting for it. “If you hear Chris talk about it, Mojo is slated to become a complete superset of Python, where whenever you write just Python code, it will execute in Mojo exactly the same way as it executes in Python, but much faster.” He is not yet sure that Mojo can achieve that, but he emphasizes that the language is in its early stages and, as of July 2023, Mojo had not yet been made available for download.

He thinks Mojo might prove more useful for experienced developers who already know how to write efficient code in C++ or Rust. “Someone who is a beginning Python user is not suddenly going to be able to write the type of Mojo code that executes much faster than it would in Python,” van Rossum says.

Modular made Mojo accessible to some users in a Jupyter notebook, an interactive development environment that allows people to play around with the code. The company said it expected to allow downloads sometime in the fall of 2023, with a full release perhaps in the summer of 2024.

Lattner says there may be pieces of Python that do not work in Mojo, but they will be insignificant. He says Mojo relates to Python in the same way that C++ relates to C, with additions such as classes and templates that turned C into a higher-level language. “There are programs you can write in C that do not work the same way or don’t even compile in C++, but they’re so minuscule that it doesn’t matter. The same thing is true in Mojo,” he says. “Our goal is to be as compatible as possible in all the cases that matter and make sure that we work with the existing ecosystem because we don’t want to break Python, we want to make Python better.”

Doug Meil, a software architect who has written about the proliferation of new programming languages, says Mojo is essentially Python++ for AI. “He’s trying very hard to support Python and meet people where they are, which is I think remarkably pragmatic,” Meil says. “They’re not coming up with an entirely new syntax, and it’s going to be way faster and scale across multiple hardware platforms. So that’s really cool.”