By Zach Bullough on 8/30/2021
At OMG, many of our backend microservices are written in the Go Programming Language. Among a few other things, Go is primarily famous for having extremely efficient, first class concurrency support.
Go’s primary form of concurrency is called a goroutine. Goroutines, like coroutines in other languages, allow for asynchronous programming. At a low level, Go’s runtime multiplexes these coroutines onto OS threads, providing parallelism in addition to concurrency. While many other languages have something similar, Go’s goroutines are primitives, built into the base language, and the Go runtime manages them extremely efficiently.
Parallel processing in mainstream computing boils down to splitting a task between either threads or processes. Threads are substantially more lightweight and are generally more desired than processes, especially when you need a lot of them. However, threads present a new problem: unlike separate processes, which all have their own, isolated memory space, threads all share the same memory space. A number of best practices and models can be utilized to avoid dangerous runtime race conditions, crashes or unexpected behavior.
While Go supports numerous ways to safely share memory, the preferred way of sharing information in Go is to use the channel primitives that Go provides. Channels actually eliminate the need to mutate shared memory across threads at all, and simply send information from one thread to another, safely. To demonstrate how channels work, we’re going to implement the Chudnovsky algorithm for calculating Pi.
If you would like to try this program for yourself and don’t have go installed on your system, you can run a slightly modified version on the Go Playground.
Executing this program with an input of 20 (or so) will yield an answer close to Pi: 3.141593
This is pretty straightforward: we build a pool of workers, which can process 8 tasks at a time. We do this by spawning a goroutine 8 times. The goroutine is going to do nothing but take input from a channel, call the provided function, then send the return value of that function to the output channel.
Here is a snippet of the most critical components, with comments for added clarity:
The huge plus to Go’s channel system is that there is actually no shared memory being mutated here. The “main” thread simply tells the worker threads what to do by writing to the input channel, and then reads from the output channel, waiting if necessary for the workers to send a value. This can be scaled to substantially more complicated implementations, selecting over multiple channels, sending complicated message bodies to implement an Actor model, and numerous other concurrency patterns.
However, while Go provides powerful primitives in the form of goroutines and channels, it is still easy to write code that will fail spectacularly at runtime. As an example, if I mistake the number of executions and have my output loop iterate one too many times, it will wait forever for another message that will never arrive, deadlock, and panic:
In addition, while using channels is indeed perfectly safe, in most cases Go won’t know or care if you also mutate shared state using normal variables. The responsibility is on the user and their tools to catch those mistakes and to avoid sharing references across goroutines, and strictly use channels for communication.
There are many different methods and patterns for sharing memory safely, but one that is particularly interesting is Software Transactional Memory. In short, Transactional Memory as a concept enforces database-like transactions for memory operations, effectively eliminating concerns about memory safety. Software Transactional Memory simply means that the transaction is implemented in software, as opposed to hardware (in the memory or memory controller itself).
Possibly the coolest implementation of STM is in the Clojure language, which is a functional, JVM-based Lisp dialect. It’s unique among mainstream languages for having STM built into the core language itself instead of requiring libraries or other external tools. Due to Clojure being a functional language, and thus mutating state is only permitted in strict, controlled situations, Clojure can prevent unsafe mutation of shared memory at compile time in most situations.
Here is the Chudnovsky algorithm again, in Clojure:
Here is a gist of both this source, and the package configuration for running locally. You will need to install Clojure and the Leiningen package manager on your system to run it.
While clojure can certainly be a bit more challenging to read unless you’re familiar with Lisp or functional patterns, the bulk of our parallelism logic is in these few lines:
The main advantage of STM over Go’s channel system is that it is far more similar to regular, synchronous programming. If you have a working synchronous version, you can typically convert it to an asynchronous version with fewer changes to the code’s structure than with channels. Clojure’s functional nature combined with native STM support also makes it more difficult to accidentally mutate shared memory in an unsafe way.
However, STM is comparatively slow. The entire object in memory (in this case, sum) is locked and all other, contending threads must wait for the transaction to commit or rollback before they can access that memory. In this case, the operation being performed is extremely fast, but if we had also calculated the itervalue inside the transaction instead of before it, our implementation would be much slower, and possibly slower than a synchronous implementation. So, in this case there is essentially no way for us to unsafely manipulate shared memory, but we could very easily write a program that is worse off for being parallel.
Zach Bullough – OMG’er #76