By Jeff Luszcz | April 5, 2021
What can we learn from two major, unexpected failures?

On March 23, the bulbous bow of the massive container ship Ever Given lodged in the sandy bank of the Suez Canal in Egypt. The ship blocked cargo traffic for six days and cost the global economy an estimated six billion dollars. The situation inspired millions of memes, largely about futility and ludicrous scale.

And then Ruby on Rails broke. The popular web application framework was taken down by an open-source license violation in one of its dependencies.

Both of these events made this very clear: the interdependence of our infrastructure is fragile. But only one of these events got the attention (and the memes) it deserved. Until now.

Why did Ruby on Rails break?

When Ruby on Rails went down, companies around the world that attempted to install or update their instances of the framework were completely blocked by an error caused by one of its smallest dependencies. This stopped many companies from releasing their own products or supplying required updates and features. These companies were essentially powerless, because resolving the problem was largely out of their hands. To fix the situation, they were at the mercy of the software equivalent of a couple of people with a bulldozer in Egypt.

Ruby on Rails, like most modern software systems, is built on hundreds of dependencies. In the Ruby ecosystem these dependencies are called gems: small, independently written open-source packages that each provide a bite-sized piece of functionality. Gems allow Rails to load JPEG files, animate a button, or (in this case) figure out what type of file the system is loading.
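
To make the "one small dependency blocks everything" effect concrete, here is a minimal Python sketch that inverts a dependency graph and walks it upward to find every package blocked when one leaf disappears. The helper `affected_by` is hypothetical and not part of any Ruby tooling. The graph is heavily trimmed and illustrative: the rails -> activestorage -> marcel -> mimemagic path mirrors how Rails picked up mimemagic at the time, and `my_app` is a made-up stand-in for a company's own application.

```python
from collections import deque

# Heavily trimmed, illustrative graph: package -> direct dependencies.
# The rails -> activestorage -> marcel -> mimemagic path mirrors the 2021
# incident; "my_app" is a hypothetical application built on Rails.
DEPENDS_ON: dict[str, list[str]] = {
    "my_app": ["rails"],
    "rails": ["actionpack", "activestorage"],
    "actionpack": ["rack"],
    "activestorage": ["marcel"],
    "marcel": ["mimemagic"],
    "rack": [],
    "mimemagic": [],
}


def affected_by(removed: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every package that depends on `removed`, directly or transitively."""
    # Invert the edges so we can walk from a dependency up to its dependents.
    dependents: dict[str, list[str]] = {name: [] for name in graph}
    for package, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(package)

    # Breadth-first search upward from the removed package.
    blocked: set[str] = set()
    queue = deque([removed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in blocked:
                blocked.add(parent)
                queue.append(parent)
    return blocked


if __name__ == "__main__":
    # Prints ['activestorage', 'marcel', 'my_app', 'rails']
    print(sorted(affected_by("mimemagic", DEPENDS_ON)))
```

Removing the one leaf gem marks the whole chain above it, up to the application, as blocked, while unrelated branches such as `rack` are untouched.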

Rails used a third-party library, mimemagic, to identify file and media types, expressed as MIME types (for example, image/jpeg or audio/mpeg). Using a database of file signatures plus some matching code, a system can look at a file, identify its type, and decide how to handle it. For example, a JPEG file might be shown on screen, while an MP3 file might be sent to a music player.
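
The idea can be sketched in a few lines of Python. This is a minimal illustration, not mimemagic's API: mimemagic drew on the much larger shared-mime-info database, while the tiny signature table and the `detect_mime_type` and `route` helpers here are hypothetical. Each rule checks the first bytes of a file (its "magic number") and maps a match to a MIME type, which the application then uses to pick a handler.

```python
# Hypothetical signature table: (leading "magic" bytes, MIME type).
# Real databases such as shared-mime-info hold far more rules.
SIGNATURES: list[tuple[bytes, str]] = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"ID3", "audio/mpeg"),  # MP3 files that begin with an ID3v2 tag
]

# Hypothetical mapping from MIME type to what the application does with it.
HANDLERS: dict[str, str] = {
    "image/jpeg": "show on screen",
    "image/png": "show on screen",
    "audio/mpeg": "send to music player",
}


def detect_mime_type(data: bytes) -> str:
    """Return the MIME type of the first matching signature, else a generic binary type."""
    for magic, mime_type in SIGNATURES:
        if data.startswith(magic):
            return mime_type
    return "application/octet-stream"


def route(data: bytes) -> str:
    """Pick an action for the file based on its detected MIME type."""
    return HANDLERS.get(detect_mime_type(data), "offer as download")


if __name__ == "__main__":
    print(route(b"\xff\xd8\xff\xe0" + b"\x00" * 16))  # show on screen
    print(route(b"ID3\x04\x00"))  # send to music player
    print(route(b"%PDF-1.7"))  # offer as download
```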

Licensing matters

The author of mimemagic was alerted by the author of another open-source library, shared-mime-info, that mimemagic contained material taken from shared-mime-info and used contrary to its open-source license. The shared-mime-info library is licensed under the terms of the GNU General Public License (GPL), a strong copyleft license that requires software which incorporates or links to GPL code to be distributed under the GPL as well, with its complete source code. Mimemagic, on the other hand, was licensed under the terms of the MIT license, a permissive, attribution-style license that primarily requires credit to be given when the code is used.

The author of mimemagic quickly pulled the existing releases of the library, in an apparent attempt to stop potentially infringing on the GPL-licensed code. This caused a ripple effect on every project that depended on mimemagic: when users tried to install or update those projects, the process failed because an essential library was missing.

Rails, which depended on mimemagic, is licensed under the terms of the MIT license and forbids copyleft-style licenses in its dependencies. Very quickly, automated systems discovered the missing dependency and opened a bug ticket for the “system down” situation.
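
A rule like "no copyleft licenses in dependencies" is usually enforced by tooling rather than by hand. The following Python sketch assumes a hypothetical manifest format that maps each package to its SPDX license identifier and direct dependencies, plus an allowlist of permissive licenses chosen for illustration. It walks the tree from the application and reports every disallowed license together with the path that pulled it in. All package names are made up.

```python
# Hypothetical allowlist of permissive SPDX license identifiers.
ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}

# Hypothetical manifest: package -> (SPDX license identifier, direct dependencies).
PACKAGES: dict[str, tuple[str, list[str]]] = {
    "my_app": ("MIT", ["web_framework"]),
    "web_framework": ("MIT", ["file_typer", "router"]),
    "router": ("MIT", []),
    "file_typer": ("GPL-2.0-or-later", []),
}


def license_violations(
    root: str,
    packages: dict[str, tuple[str, list[str]]],
    allowed: set[str] = ALLOWED_LICENSES,
) -> list[tuple[str, str]]:
    """Return (dependency path, license) for each reachable package whose license is not allowed."""
    violations: list[tuple[str, str]] = []
    seen: set[str] = set()

    def visit(name: str, path: list[str]) -> None:
        if name in seen:  # report each package once, via the first path found
            return
        seen.add(name)
        license_id, deps = packages[name]
        path = path + [name]
        if license_id not in allowed:
            violations.append((" -> ".join(path), license_id))
        for dep in deps:
            visit(dep, path)

    visit(root, [])
    return violations


if __name__ == "__main__":
    for path, license_id in license_violations("my_app", PACKAGES):
        print(f"disallowed license {license_id}: {path}")
```

Reporting the full path matters because the offending package is often several levels below anything the project lists directly.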

Resolving the Rails blockage

Much like the container ship blocking the canal, this blocked all business that depended on Rails being installed or updated, with no clear expectation of when it would be resolved.

Build and test automation in the Ruby on Rails project gave the maintainers a head start: it told them early that they had a “broken build” and pointed them to the location of the error. It also alerted them to the follow-up licensing problem created when mimemagic was relicensed under the GPL.
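
One check a build pipeline can run is whether every version pinned in a lockfile can still be downloaded. The Python sketch below assumes a hypothetical snapshot of what a registry currently serves; the package names and version numbers are invented, and `unavailable_pins` is a hypothetical helper. A pin that points at a release that has been pulled shows up as an error, which is roughly the kind of "broken build" signal automation surfaces.

```python
# Hypothetical registry snapshot: package -> versions it still serves.
# Names and versions are invented for illustration.
REGISTRY: dict[str, set[str]] = {
    "web_framework": {"6.1.2", "6.1.3"},
    "file_typer": {"0.4.0"},  # older releases were pulled
}

# Hypothetical lockfile: package -> exact pinned version.
LOCKED: dict[str, str] = {
    "web_framework": "6.1.3",
    "file_typer": "0.3.5",
}


def unavailable_pins(locked: dict[str, str], registry: dict[str, set[str]]) -> list[str]:
    """Return one message per pinned version the registry no longer serves."""
    errors: list[str] = []
    for name, version in sorted(locked.items()):
        available = registry.get(name, set())
        if version not in available:
            # sorted() here is plain string order, not semantic-version order.
            offered = ", ".join(sorted(available)) or "nothing"
            errors.append(f"{name} {version} is not available (registry offers: {offered})")
    return errors


if __name__ == "__main__":
    # A CI job would fail the build if this list is non-empty.
    for message in unavailable_pins(LOCKED, REGISTRY):
        print(message)
```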

Humans then worked to understand and resolve the situation. The author of mimemagic released updated versions under the GPL, but this was not an acceptable fix for Rails, a project that cannot use copyleft-licensed code.

In cases like this, downstream projects usually look first to the original upstream open-source project to learn how (and whether) it will fix the license or security problem. The original author generally understands the problem space and the code best, and is often the right person to attempt a first fix. This can take time, since the original author may be in a different time zone, may have a “day job” that demands their focus, or may no longer be interested in working on the project.

Contributors came together to rewrite the mimemagic library without the GPL-licensed code and released an updated version under the MIT license. The Ruby on Rails project is working to confirm that this new version works as expected and is updating its dependency configuration files to pull in the correctly licensed version.

New documentation and test cases are being written and run to confirm that the new code behaves like the old code. The project maintainers are also telling users what actions they may need to take to use the new code on their platforms.
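
One common way to check that a rewrite behaves like the original is differential testing: run the old and new implementations on the same inputs and report every disagreement. This Python sketch uses two hypothetical, table-driven detectors as stand-ins for the old and rewritten libraries; the new one deliberately lacks the PNG rule so the comparison has something to find. The helpers `make_detector` and `find_regressions` are illustrative, not part of mimemagic.

```python
from typing import Callable

Detector = Callable[[bytes], str]


def make_detector(signatures: dict[bytes, str]) -> Detector:
    """Build a toy detector that matches leading bytes against a signature table."""
    def detect(data: bytes) -> str:
        for magic, mime_type in signatures.items():
            if data.startswith(magic):
                return mime_type
        return "application/octet-stream"
    return detect


# Hypothetical stand-ins: the "new" detector is missing the PNG rule.
old_detect = make_detector({
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF-": "application/pdf",
})
new_detect = make_detector({
    b"\xff\xd8\xff": "image/jpeg",
    b"%PDF-": "application/pdf",
})


def find_regressions(old: Detector, new: Detector, samples: list[bytes]) -> list[tuple[bytes, str, str]]:
    """Return (sample, old result, new result) wherever the two implementations disagree."""
    return [(s, old(s), new(s)) for s in samples if old(s) != new(s)]


SAMPLES = [
    b"\xff\xd8\xff\xe0 jpeg body",
    b"\x89PNG\r\n\x1a\n png body",
    b"%PDF-1.4 pdf body",
    b"plain text",
]

if __name__ == "__main__":
    for sample, before, after in find_regressions(old_detect, new_detect, SAMPLES):
        print(f"{sample[:8]!r}: old={before} new={after}")
```

In practice the sample corpus would be a large collection of real files, so that any behavioral drift in the rewrite surfaces before users hit it.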

4 takeaways

Since the old library contained an apparent GPL violation, many companies are reviewing their installations and deciding whether they need to push an update or refresh. They are getting advice from their legal and compliance teams to understand the impact, and are likely starting to field questions from their customers. For now, it remains unclear whether the author of the original GPL-licensed component will press for the code to be cleaned out of existing Ruby on Rails installations. Here are a few things to keep in mind going forward:

  1. Expect unexpected failure modes. Modern software systems are complex, and the components they rely on may unexpectedly disappear for hours or days for a range of reasons. Even systems that are “generally recognized as safe” may have unexpected failure modes. We have seen this before with denial-of-service (DoS) attacks against hosting providers like GitHub, and with security events like Heartbleed. It’s critical to have contingency plans in place to address them.
  2. Examine outside dependencies. Assess how your business processes may be affected by losing access to externally hosted systems and repositories. For example, discover and monitor your use of external component registries like Maven, npm, PyPI and Docker Hub, and of large, complex frameworks or operating systems that are typically loaded into containers.
  3. Be vigilant about the licensing of the systems you use and build. How might these licenses affect downstream users? Staying compliant with license obligations is essential to your system’s reliability, and lapses can have significant ripple effects if not managed properly.
  4. Expect to see more supply chain problems caused by license compliance and security issues. Recall the recent SolarWinds attack, in which Russian hackers compromised the infrastructure of SolarWinds, a company that produces a network and application monitoring platform, and then used that platform to distribute trojanized updates to its users. Any license compliance or security issue adds infrastructure risk that can dramatically affect the supply chain.
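
As a starting point for the second takeaway, a short script can inventory which external registries a codebase pulls from by looking for well-known manifest files. This Python sketch is illustrative: the file-name-to-registry mapping is an assumption about each tool's default behavior (a project can point any of them at a private mirror instead), and `classify_manifests` and `registry_inventory` are hypothetical helpers.

```python
from pathlib import Path, PurePosixPath

# Assumed defaults: the registry each manifest type usually pulls from.
# Projects can redirect any of these to a private mirror.
MANIFEST_REGISTRIES: dict[str, str] = {
    "pom.xml": "Maven Central",
    "package.json": "npm",
    "requirements.txt": "PyPI",
    "pyproject.toml": "PyPI",
    "Gemfile": "RubyGems",
    "Dockerfile": "Docker Hub",
}


def classify_manifests(paths: list[str]) -> dict[str, list[str]]:
    """Group POSIX-style relative paths by the registry their manifest implies."""
    inventory: dict[str, list[str]] = {}
    for p in sorted(paths):
        registry = MANIFEST_REGISTRIES.get(PurePosixPath(p).name)
        if registry is not None:
            inventory.setdefault(registry, []).append(p)
    return inventory


def registry_inventory(root: Path) -> dict[str, list[str]]:
    """Scan a directory tree and classify every manifest file found."""
    files = [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]
    return classify_manifests(files)
```

Run against a repository whose only manifests are a top-level `Gemfile` and `web/package.json`, `registry_inventory(Path("."))` would return `{"RubyGems": ["Gemfile"], "npm": ["web/package.json"]}`, a list of outside services whose outage would affect your builds.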

By understanding our use of open source and external systems, and by creating contingency plans, we can reduce the business impact of events such as these. And maybe keep this from happening again:

[Meme image. Source: Imgflip.com]

Jeff Luszcz, Director of Open Source, PEAK6