Michael is a distributed systems and fault tolerance expert, having worked with AT&T, E*Trade, Nokia and others.
Legacy code is everywhere. And as the rate at which code proliferates continues to increase exponentially, more and more of that code is being relegated to legacy status. In many large organizations, maintenance of legacy systems consumes more than 90% of information systems resources.
The need to modernize legacy code and systems to meet current performance and processing demands is widespread. This post provides a case study of the use of the Erlang programming language, and the Erlang-based CloudI Service Oriented Architecture (SOA), to adapt legacy code – in particular, a decades-old collection of C source code – to the 21st century.
Years ago, I was a big fan of text-based multiplayer online games known as Multi-User Dungeons (MUDs). But they were always riddled with performance problems. I decided to dive back into a decades-old pile of C source code and see how we could modernize this legacy code and push these early online games to their limits. At a high level, this project was a great example of using Erlang to adapt legacy software to meet 21st century requirements.
A brief summary:
All Massively Multiplayer Online Role Playing Games (MMORPGs) – like World of Warcraft and EverQuest – have developed features whose early origins can be traced back to older text-based multiplayer online games known as Multi-User Dungeons (MUDs).
The first MUD was Roy Trubshaw’s Essex MUD (or MUD1), which was originally developed in 1978 in MACRO-10 assembly language on a DEC PDP-10 and was later converted to BCPL, a predecessor of the C programming language. It kept running until 1987. (As you can see, these things are older than most programmers.)
MUDs gradually gained popularity during the late 1980s and early 1990s with various MUD codebases written in C. The DikuMUD codebase, for example, is known as the root of one of the largest trees of derived MUD source code, with at least 51 unique variants all based on the same DikuMUD source code. (During this timeframe, incidentally, MUDs also became known as the “Multi-Undergraduate Destroyer” because of the number of college undergraduates who failed out of school over their obsession with them.)
Historical C MUD source code (including DikuMUD and its variants) is riddled with performance problems that stem from the limitations in place when it was written.
Back then, there was no easily accessible threading library. Moreover, threading would have made the source code more difficult to maintain and modify. As a result, these MUDs were all single-threaded.
During a single “tick” (an increment of the internal clock that tracks the progression of all game events), the MUD source code has to process every game event for every connected socket. In other words: every piece of code slows down the processing of a single tick. And if any computation forces the processing to span longer than a single tick, the MUD lags, impacting every connected player.
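The cost model described above can be sketched in a few lines of C. This is only an illustration, not code from SillyMUD: the function name and the per-handler cost array are hypothetical, and the 250 ms tick length is the figure the article gives later for SillyMUD's game tick.

```c
#include <stddef.h>

/* Tick length in milliseconds (the article cites 250 ms for SillyMUD). */
#define TICK_MS 250L

/* Hypothetical helper: in a single-threaded server every handler runs
 * back to back, so their costs simply add up. Returns how many
 * milliseconds the work runs past the tick boundary (0 = on time). */
long tick_overrun_ms(const long *handler_cost_ms, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; ++i)
        total += handler_cost_ms[i];
    return total > TICK_MS ? total - TICK_MS : 0;
}
```

With handler costs of 100, 100, and 40 ms the tick finishes on time; raise the last one to 100 ms and every connected player waits an extra 50 ms, regardless of which handler was slow.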
With this lag, the game immediately becomes less engaging. Players look on helplessly as their characters die, with their own commands remaining unprocessed.
For the purposes of this legacy application modernization experiment, I chose SillyMUD, a historical derivative of DikuMUD that has influenced modern MMORPGs and the performance problems that they share. During the 1990s, I played a MUD that was derived from the SillyMUD codebase, so I knew the source code would be an interesting and somewhat familiar starting point.
The SillyMUD source code is similar to that of other historical C MUDs in that it is limited to roughly 50 concurrent players (64, to be precise, based on the source code).
However, I noticed that the source code had been modified for performance reasons (i.e., to push its concurrent player limitation). Specifically:
CloudI was previously discussed as a solution for polyglot development due to the fault-tolerance and scalability it provides.
CloudI provides a service abstraction, in the style of a Service-Oriented Architecture (SOA), for Erlang, C/C++, Java, Python, and Ruby, while keeping software faults isolated within the CloudI framework. Fault tolerance comes from CloudI’s Erlang implementation, which relies on Erlang’s fault-tolerant features and its implementation of the Actor Model. This matters because all software contains bugs.
CloudI also provides an application server to control the lifetime of service execution and the creation of service processes (either as operating system processes for non-Erlang programming languages or as Erlang processes for services implemented in Erlang) so that service execution occurs without external state impacting reliability. For more, see my previous post.
The historical C MUD source code provides an interesting opportunity for CloudI integration given its reliability problems:
With CloudI integration, server stability bugs can still be fixed normally, but their impact is limited so that the game server’s operation is not always impacted when a previously undiscovered bug causes an internal game system to fail. This provides a great example of the use of Erlang to enforce fault-tolerance in a legacy codebase.
The original codebase was written to be both single-threaded and highly dependent on global variables. My goal was to preserve the legacy source code functionality while modernizing it for present day usage.
With CloudI, I was able to keep the source code single-threaded while still providing socket connection scalability.
Let’s review the necessary changes:
The buffering of SillyMUD console output (a terminal display, often connected with Telnet) was already in place, but some direct file descriptor usage did require buffering (so that the console output could become the response to a CloudI service request).
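As a rough illustration of that change, the sketch below replaces a direct write to a file descriptor with an append to an in-memory buffer that can later be handed back as a response payload. The struct, function names, and buffer size are all hypothetical and are not taken from the SillyMUD or CloudI sources.

```c
#include <stddef.h>
#include <string.h>

#define OUT_BUF_SIZE 4096  /* hypothetical capacity */

struct out_buf {
    char data[OUT_BUF_SIZE];
    size_t len;
};

/* Append text to the buffer instead of writing it to a descriptor.
 * Returns the number of bytes stored (input is truncated when full). */
size_t out_buf_append(struct out_buf *b, const char *text)
{
    size_t n = strlen(text);
    size_t room = OUT_BUF_SIZE - b->len;
    if (n > room)
        n = room;
    memcpy(b->data + b->len, text, n);
    b->len += n;
    return n;
}

/* Expose the accumulated bytes (for use as a response payload) and
 * reset the buffer. Returns the payload length. */
size_t out_buf_take(struct out_buf *b, const char **payload)
{
    size_t n = b->len;
    *payload = b->data;
    b->len = 0;
    return n;
}
```

The payload pointer stays valid only until the next append, so a caller would send or copy it before processing further output.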
Socket handling in the original source code relied on a select() function call to detect input, errors, and the chance for output, as well as to pause for a game tick of 250 milliseconds before handling pending game events.
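For readers unfamiliar with the pattern, here is a minimal POSIX sketch of using select() both to detect input and to act as the tick timer. It is a simplification under stated assumptions: it watches a single descriptor for readability only (the original also checked for errors and writability across all player sockets), and the function name is hypothetical.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable, then read at most
 * len bytes. Returns the byte count (> 0), 0 if the timeout elapsed
 * with no input, or -1 on error or end of file. */
ssize_t read_or_tick(int fd, long timeout_ms, char *buf, size_t len)
{
    fd_set readable;
    struct timeval tv;

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ready = select(fd + 1, &readable, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;               /* tick elapsed, nothing to read */

    ssize_t n = read(fd, buf, len);
    return n > 0 ? n : -1;
}
```

Calling it with a timeout of 250 reproduces the shape of the loop described above: either a command arrives early, or the call returns 0 and the server moves on to pending game events.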
The CloudI SillyMUD integration relies on incoming service requests for input while pausing with the C CloudI API’s cloudi_poll function (for the 250 milliseconds before handling the same pending game events). The SillyMUD source code easily ran within CloudI as a CloudI service after being integrated with the C CloudI API (although CloudI provides both C and C++ APIs, using the C API better facilitated integration with SillyMUD’s C source code).
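The resulting control flow can be sketched without the CloudI headers. In the code below, `wait_fn` is a hypothetical stand-in for a blocking poll call that dispatches incoming requests until a timeout expires; it is not the CloudI API signature, and the demo callbacks exist only to exercise the loop.

```c
#include <stddef.h>

/* Hypothetical stand-in for a blocking poll: handles incoming requests
 * for up to timeout_ms, returns 0 on timeout or nonzero to stop. */
typedef int (*wait_fn)(void *ctx, int timeout_ms);
/* Runs the pending game events for one tick. */
typedef void (*tick_fn)(void *ctx);

/* Alternate between waiting for requests and running game events.
 * Returns the number of ticks that were processed. */
int run_game_loop(wait_fn wait, tick_fn on_tick, void *ctx,
                  int tick_ms, int max_ticks)
{
    int ticks = 0;
    while (ticks < max_ticks) {
        if (wait(ctx, tick_ms) != 0)
            break;              /* shutdown or error */
        on_tick(ctx);
        ++ticks;
    }
    return ticks;
}

/* Demo callbacks (assumptions, for illustration only). */
struct demo { int polls; int events; int stop_after; };

int demo_wait(void *ctx, int timeout_ms)
{
    struct demo *d = (struct demo *)ctx;
    (void)timeout_ms;           /* a real wait would block here */
    return ++d->polls > d->stop_after;
}

void demo_tick(void *ctx)
{
    ((struct demo *)ctx)->events++;
}
```

The point of the shape is that input handling moves out of the game code entirely: the loop only sees "time is up" and runs the same event processing it always did.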
The CloudI integration subscribes to three main service name patterns to handle connect, disconnect, and gameplay events. These name patterns come from the C CloudI API calling subscribe in the integration source code. Accordingly, either WebSocket connections or Telnet connections have service name destinations for sending service requests when connections are established.
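To make the three-way split concrete, the sketch below maps a request's service name to one of the three event kinds. The service names are invented for illustration, since the article does not list the real patterns, and the lookup table is an assumption: in an actual integration each subscription would typically register its own callback rather than go through a router like this.

```c
#include <stddef.h>
#include <string.h>

enum mud_event { MUD_CONNECT, MUD_DISCONNECT, MUD_INPUT, MUD_UNKNOWN };

/* Hypothetical service names; the real patterns are not given in the
 * article. */
static const struct {
    const char *name;
    enum mud_event event;
} routes[] = {
    { "/sillymud/connect",    MUD_CONNECT },
    { "/sillymud/disconnect", MUD_DISCONNECT },
    { "/sillymud/input",      MUD_INPUT },
};

/* Return the event kind for an exact service name match, or
 * MUD_UNKNOWN when no route matches. */
enum mud_event route_request(const char *name)
{
    for (size_t i = 0; i < sizeof routes / sizeof routes[0]; ++i) {
        if (strcmp(name, routes[i].name) == 0)
            return routes[i].event;
    }
    return MUD_UNKNOWN;
}
```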
The WebSocket and Telnet support in CloudI is provided by internal CloudI services (cloudi_service_http_cowboy for WebSocket support and cloudi_service_tcp for Telnet support). Since internal CloudI services are written in Erlang, they are able to leverage Erlang’s extreme scalability, while at the same time using the CloudI service abstraction that provides the CloudI API functions.
By avoiding socket handling, less processing occurred on socket errors or situations like link death (in which users are disconnected from the server). Thus, removing low-level socket handling addressed the primary scalability problem.
But scalability problems remain. For example, the MUD uses the filesystem as a local database for both static and dynamic gameplay elements (i.e., players and their progress, along with the world zones, objects and monsters). Refactoring the legacy code of the MUD to instead rely on a CloudI service for a database would provide further fault-tolerance. If we used a database rather than a filesystem, multiple SillyMUD CloudI service processes could be used concurrently as separate game servers, keeping users isolated from runtime errors and reducing downtime.
There were three primary areas of improvement:
So, with simple CloudI integration, the number of connections scaled by three orders of magnitude while providing fault-tolerance and increasing the efficiency of the same legacy gameplay.
Erlang has provided 99.9999999% uptime (less than 31.536 milliseconds of downtime per year) for production systems. With CloudI, we bring this same reliability to other programming languages and systems.
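The downtime figure follows directly from the availability percentage. As a quick check, the hypothetical helper below converts a "number of nines" into allowed downtime per year, assuming a 365-day year.

```c
/* Milliseconds in a 365-day year (assumption: no leap day). */
#define YEAR_MS (365.0 * 24.0 * 60.0 * 60.0 * 1000.0)

/* Allowed downtime per year, in milliseconds, for an availability
 * written with `nines` nines (3 means 99.9%, 9 means 99.9999999%). */
double downtime_ms_per_year(int nines)
{
    double divisor = 1.0;
    for (int i = 0; i < nines; ++i)
        divisor *= 10.0;
    return YEAR_MS / divisor;
}
```

Nine nines gives 31,536,000,000 ms divided by 10^9, or 31.536 ms per year, which matches the figure quoted above. For comparison, three nines (99.9%) allows about 8.76 hours.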
Beyond proving the viability of this approach for improving stagnant legacy game server source code (SillyMUD was last modified over 20 years ago in 1993!), this project demonstrates on a broader level how Erlang and CloudI can be leveraged to modernize legacy applications and provide fault-tolerance, improved performance, and high availability in general. These results hold promising potential for adapting legacy code to the 21st century without requiring a major software overhaul.
Seattle, WA, United States
Member since April 4, 2016