Flavors of shared state in Cats Effect
In this post, we will explore the various ways to share state in a Cats Effect application.
Introduction: why do we need shared state?
One may think that Functional Programming eliminates the need for shared state altogether - however, sooner or later, we have to interact with the so-called "Real World". Turns out, the world is ~~a JoJo reference~~ a stateful thing, so interacting with it requires some form of effects.
As if that wasn't enough, sometimes we want to keep some state in our own application: even something like a "request counter", which keeps a running total of the requests handled by the service, requires some form of shared in-memory state.
Then we have more complex concepts such as connections, resource pools, queues, rate limiters and so on, all of which are by nature stateful as well. Let's take Cats Effect for a spin and see what tools we can use to build such mechanisms.
Working example: simple counter
Following the idea of a request counter, let's use this concept for our example.
Our counter will have two operations:
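- increment, which bumps the count by one
- get, which returns the current value

In code, that's the following trait (the same one we'll extend towards the end of the post):

```scala
import cats.effect.IO

trait Counter {
  def increment: IO[Unit]
  def get: IO[Int]
}
```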
Starting simple: Ref
First, we'll define a helper to reduce the boilerplate in our implementations:
def makeCounter(inc: IO[Unit], retrieve: IO[Int]): Counter =
  new Counter { val increment = inc; val get = retrieve }
Now let's try and implement the counter with our good old friend, cats.effect.kernel.Ref.
// cats.effect.Ref is a convenient alias for cats.effect.kernel.Ref
val refCounter: IO[Counter] =
  Ref.of[IO, Int](0).map(ref => makeCounter(ref.update(_ + 1), ref.get))
We can now instantiate such a counter and use it, including in concurrent scenarios:
val useCounter = for {
counter <- refCounter
_ <- counter.increment.parReplicateA(2)
v <- counter.get
} yield v
Let's see what that gives us:
useCounter.unsafeRunSync()
// res0: Int = 2
Clearly, some sharing happened: the value was updated twice!
What are the characteristics of this counter?
- It can be atomically updated from multiple fibers
- The count is shared everywhere within the scope where the counter is visible, i.e. within the for comprehension.
That makes it suitable for a counter of all requests an application has received while it's up. However, let's imagine we need some more isolation: let's try to count all client requests that we've made while handling a server request.
We'll make the simplest client call possible:
def sampleRequest(client: Client[IO]): IO[Unit] = client.run(Request()).use_
and our request handler may look like this (with http4s):
def routes(client: Client[IO]): HttpRoutes[IO] = HttpRoutes.of
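Filled in, a handler might look something like this - the route path and the pair of client calls are assumptions for illustration, using the sampleRequest helper from above:

```scala
import cats.effect.IO
import cats.syntax.all._
import org.http4s._, org.http4s.dsl.io._
import org.http4s.client.Client

def routes(client: Client[IO]): HttpRoutes[IO] = HttpRoutes.of[IO] {
  // the path is made up for this example
  case GET -> Root / "run" =>
    sampleRequest(client) *> sampleRequest(client) *> Ok()
}
```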
Can we use refCounter for this purpose? Sure, why not. We'll increment before the request, to avoid dealing with error handling. We'll also return the final value in the response:
def routes(client: Client[IO]): HttpRoutes[IO] = HttpRoutes.of
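Spelled out, that could look roughly like this (same assumed path as before; the important parts are the refCounter.flatMap and the increments around the client calls):

```scala
def routes(client: Client[IO]): HttpRoutes[IO] = HttpRoutes.of[IO] {
  case GET -> Root / "run" =>
    refCounter.flatMap { counter =>
      counter.increment *> sampleRequest(client) *>
        counter.increment *> sampleRequest(client) *>
        counter.get.flatMap(count => Ok(count.toString))
    }
}
```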
Let's try it out:
def testRoute(route: Client[IO] => IO[HttpRoutes[IO]]): IO[List[String]] =
// validate results
.flatTap
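The harness is mostly plumbing, so here's just a sketch of it - the stubbed client, the request path and the expected value are all assumptions for this post's examples:

```scala
import cats.effect.IO
import cats.syntax.all._
import org.http4s._, org.http4s.implicits._
import org.http4s.client.Client

def testRoute(route: Client[IO] => IO[HttpRoutes[IO]]): IO[List[String]] = {
  // a client that always answers with an empty 200 - stands in for the downstream service
  val stubClient: Client[IO] = Client.fromHttpApp(HttpApp[IO](_ => IO.pure(Response[IO]())))

  route(stubClient)
    .flatMap { routes =>
      routes.orNotFound
        .run(Request[IO](uri = uri"/run"))
        .flatMap(_.as[String])
        .parReplicateA(10) // ten concurrent "server requests"
    }
    // validate results
    .flatTap(results => IO.println(if (results.forall(_ == "2")) "Success!" else "Failure!"))
}
```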
testRoute(client => IO.pure(routes(client))).unsafeRunSync()
// Success!
// res2: List[String] = List("2", "2", "2", "2", "2", "2", "2", "2", "2", "2")
It... does work. Notably, we can't put the refCounter.flatMap part outside of our router, as that'd make the counter shared across all requests: remember, we wanted to isolate the counts of each request we handle.
Let's hide the increments in a client middleware, to declutter the code a bit:
def withCount(client: Client[IO], counter: Counter): Client[IO] =
  Client(req => Resource.eval(counter.increment) *> client.run(req))
Now here's the updated routing code:
def routes(client: Client[IO]): HttpRoutes[IO] = HttpRoutes.of
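For instance (sketch, reusing the assumed path):

```scala
def routes(client: Client[IO]): HttpRoutes[IO] = HttpRoutes.of[IO] {
  case GET -> Root / "run" =>
    refCounter.flatMap { counter =>
      val countingClient = withCount(client, counter)

      sampleRequest(countingClient) *> sampleRequest(countingClient) *>
        counter.get.flatMap(count => Ok(count.toString))
    }
}
```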
It still feels cluttered, doesn't it? That's probably the most obvious problem with our example - but there's an even more serious one that's likely to bite us when we try to build something real. Let's discuss that.
In our example, the client calls are happening directly in the route (the request handler). However, in a real application it's very likely to be wrapped in at least one layer of abstraction: this could be a UserService, which would in turn use a UserClient, which would be the one actually using Client.
It might look a little like this:
def routes(userService: UserService[IO]): HttpRoutes[IO] = HttpRoutes.of
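For illustration, the shape could be something like this - UserService and its method are hypothetical, not taken from any library:

```scala
trait UserService[F[_]] {
  def getUser(id: String): F[String] // simplified: returns the user's name
}

def routes(userService: UserService[IO]): HttpRoutes[IO] = HttpRoutes.of[IO] {
  case GET -> Root / "users" / id =>
    userService.getUser(id).flatMap(Ok(_))
}
```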
In such a design, the http4s Client is nowhere to be seen - it's encapsulated in the definition of UserService (and possibly even further). How do we tell it about the counter, then? Do we add a Counter parameter to all the methods of UserService and its dependencies, all the way until we have access to the http4s Client?
Well, that'd work, but it'd certainly go against the point of all this abstraction. Surely we can find something else.
All we need is to propagate the Counter from our request handler to the client. If this sounds to you like a Reader monad, that's certainly one tool to achieve this! However, we're not always going to be able to use it:
- it implies that UserService's methods have a reader monad in their return types (see the sketch below)
  - this means that we either use polymorphic effects (AKA Tagless Final), or hardcode the exact reader monad with the exact type of context that we need (Counter)
- this approach is still "viral", i.e. it infects the interfaces of not only UserService, but also its peers (the UserClient mentioned earlier)
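Here's roughly what that would mean, using the hypothetical UserService from before (just a sketch):

```scala
import cats.data.Kleisli
import cats.effect.IO

// every method now returns "an IO that first needs a Counter"
type WithCounter[A] = Kleisli[IO, Counter, A]

// UserService[F[_]] instantiated at F = WithCounter...
def service: UserService[WithCounter] = ???
// ...and everything it calls (UserClient, the underlying http4s client wrapper, ...)
// now has to speak WithCounter too - hence "viral".
```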
So let's not do that here. If we're not going to pass the counter as parameters (or a reader monad), could we inject the counter into the UserService at construction time, then?
def routes(mkUserService: Counter => UserService[IO]): HttpRoutes[IO] = HttpRoutes.of
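Spelled out, the idea is roughly this (sketch; same hypothetical UserService as above):

```scala
def routes(mkUserService: Counter => UserService[IO]): HttpRoutes[IO] = HttpRoutes.of[IO] {
  case GET -> Root / "users" / id =>
    refCounter.flatMap { counter =>
      val userService = mkUserService(counter) // rebuilt on every single request

      userService.getUser(id) *> counter.get.flatMap(count => Ok(count.toString))
    }
}
```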
Looks like we can. However, this is a bit of a code smell: the fact that UserService depends (indirectly) on the counter is now visible to our routing. In addition, whenever we receive a request, we have to construct not only a UserService, but also every component that carries the Counter dependency. We'd have to measure the performance impact of such allocations on the hot path, and it'd certainly have severe implications if any of these components are stateful themselves.
We won't be doing that, then. So what are our demands?
- The counter dependency should be hidden from intermediate layers of abstraction (only the client and router need to know about it)
- The counter's state should be isolated between requests, including concurrent ones.
Readers with Java experience may recognize this as something similar to ThreadLocal. However, in Cats Effect we can't simply use a ThreadLocal: due to its fiber-based concurrency model, a single request may be processed on any number of threads (and on non-JVM platforms like Scala.js, we might only have one thread to begin with).
What we need is more of a... "fiber" local.
Isolated state with IOLocal
Cats Effect provides an IOLocal. It does exactly what we want! Let's implement the counter with it:
val localCounter: IO[Counter] =
  IOLocal(0).map(local => makeCounter(local.update(_ + 1), local.get))
Now, we can create one IOLocal-backed Counter for our entire application, and share it.
As a good practice we'll reset each counter after each server request finishes processing (just in case any fibers are recycled between serial requests), but in reality it's likely that we'll just get a new fiber for each request.
For this, we'll need to enhance our definition of localCounter with the ability to reset it after we're done. I'm choosing to do this by composition, although inheritance could also be a reasonable choice:
case class CounterWithReset(c: Counter, withFreshCounter: IO ~> IO)
val localCounterR: IO[CounterWithReset] = IOLocal(0).map
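In full, a sketch of that definition - the exact reset strategy is an assumption, but IOLocal does expose a reset operation that goes back to the default value:

```scala
import cats.~>
import cats.effect.{IO, IOLocal}

val localCounterR: IO[CounterWithReset] =
  IOLocal(0).map { local =>
    CounterWithReset(
      makeCounter(local.update(_ + 1), local.get),
      new (IO ~> IO) {
        // run the wrapped action, then put the local back to its default value
        def apply[A](io: IO[A]): IO[A] = io.guarantee(local.reset)
      }
    )
  }
```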
We'll also need to enhance our router so that it actually resets the local. Let's make another middleware:
def withCountReset(r: HttpRoutes[IO], c: CounterWithReset): HttpRoutes[IO] = Kleisli
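Written out with Kleisli, the longer version would go something like this (it does the same thing as the concise one below):

```scala
import cats.data.{Kleisli, OptionT}

def withCountReset(r: HttpRoutes[IO], c: CounterWithReset): HttpRoutes[IO] =
  Kleisli { req =>
    // wrap the entire request-handling effect in withFreshCounter
    OptionT(c.withFreshCounter(r.run(req).value))
  }
```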
If you want to be more concise:
def withCountReset(r: HttpRoutes[IO], c: CounterWithReset): HttpRoutes[IO] =
r.mapF(_.mapK(c.withFreshCounter))
Now we're armed and we can get to the action:
def appR(rawClient: Client[IO]): IO[HttpRoutes[IO]] =
// 1
localCounterR.map
def routes(client: Client[IO], c: Counter): HttpRoutes[IO] = HttpRoutes.of
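Putting it together, here's a sketch of that wiring - the numbered comments line up with the summary below, and the route path is the same assumed one as before:

```scala
def appR(rawClient: Client[IO]): IO[HttpRoutes[IO]] =
  localCounterR.map { counterR =>                          // 1
    val countingClient = withCount(rawClient, counterR.c)  // 2
    val countedRoutes = routes(countingClient, counterR.c) // 3

    withCountReset(countedRoutes, counterR)                // 4
  }

def routes(client: Client[IO], c: Counter): HttpRoutes[IO] = HttpRoutes.of[IO] {
  case GET -> Root / "run" =>
    sampleRequest(client) *> sampleRequest(client) *>
      c.get.flatMap(count => Ok(count.toString))
}
```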
And we've achieved our desired goal! To sum up, here's what's happening:
1. We create an IOLocal-based counter and open a scope of sharing by mapping on it. The resultant IO can be flatMapped on directly in your IOApp, as soon as you have a Client. If you only flatMap once, you'll have a globally-shared IOLocal (which is likely what you want).
2. We wrap the "real" http4s Client with middleware that increments the count before sending any request.
3. We pass the wrapped client to the routes. In the UserService example above, at this point we could deal with the construction of any UserService dependencies, and it'd only happen once, instead of on every request.
4. We wrap the routes in middleware that resets the counter after processing each request.
Let's test that as well:
testRoute(appR).unsafeRunSync()
// Success!
// res7: List[String] = List("2", "2", "2", "2", "2", "2", "2", "2", "2", "2")
So... is that it? Well, sort of.
Child fibers
In our desire to share the Counter instance across the entire app, we've forgotten something important: we actually want some isolation. And yes, we do have isolation between the requests processed by the application, but what if the request itself is split into multiple fibers?
What if the route actually looks like this?
def routes(client: Client[IO], c: Counter): HttpRoutes[IO] = HttpRoutes.of
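Concretely, something like this (sketch; the only change is that the two client calls now run in parallel):

```scala
def routes(client: Client[IO], c: Counter): HttpRoutes[IO] = HttpRoutes.of[IO] {
  case GET -> Root / "run" =>
    (sampleRequest(client), sampleRequest(client)).parTupled *>
      c.get.flatMap(count => Ok(count.toString))
}
```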
It's a critical difference: the two requests will now be executed in parallel. Because IOLocal doesn't propagate changes from a child fiber back to its parent, the request's counter will be unaffected by anything that happened on the forked fibers! This could be a problem. And it is!
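You can see this behavior in isolation with a tiny, made-up example:

```scala
import cats.effect.{IO, IOLocal}
import cats.syntax.all._

val demo: IO[Int] = IOLocal(0).flatMap { local =>
  // update the local on a child fiber, wait for it, then read the local on the parent
  local.update(_ + 1).start.flatMap(_.join) *> local.get
}
// demo yields 0: the child fiber's update never makes it back to the parent
```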
Some boilerplate to prepare for that test run
def appRWithParallel(rawClient: Client[IO]): IO[HttpRoutes[IO]] =
localCounterR.map
def routesWithParallel(client: Client[IO], c: Counter): HttpRoutes[IO] = HttpRoutes.of
testRoute(appRWithParallel).unsafeRunSync()
// Failure!
// res9: List[String] = List("0", "0", "0", "0", "0", "0", "0", "0", "0", "0")
Here's the thing: Ref didn't care about the fibers' family drama. If only we could pick and choose which parts of IOLocal and Ref we get...
Well, why don't we? Why don't we put a Ref inside IOLocal? Let's ponder:
IOLocal[A] ensures that updates of A are not propagated to parent/child fibers. This can be done without inspecting the internals of the A value, by relying on its immutability to avoid sharing changes.
If A contains mutable state (which Ref pretty much is), you could mutate that state without ever calling update on the IOLocal... and it'll be visible in all offspring of the fiber that inserted that value.
Here's what this trick would look like: we can use CounterWithReset to encapsulate the "swapping" of the Ref for each request:
val localRefCounter: IO[CounterWithReset] =
// 1
Ref.of(0).flatMap(IOLocal(_)).map
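Here's a sketch of the whole thing, with the numbered comments lining up with the steps described below (imports as in the earlier snippets):

```scala
val localRefCounter: IO[CounterWithReset] =
  // 1
  Ref.of[IO, Int](0).flatMap(IOLocal(_)).map { local =>
    // 2
    val counter = makeCounter(
      local.get.flatMap(_.update(_ + 1)),
      local.get.flatMap(_.get)
    )

    // 3
    val withFreshCounter = new (IO ~> IO) {
      def apply[A](io: IO[A]): IO[A] =
        (local.get, Ref.of[IO, Int](0)).flatMapN { (previous, fresh) =>
          local.set(fresh) *> io.guarantee(local.set(previous))
        }
    }

    // 4
    CounterWithReset(counter, withFreshCounter)
  }
```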
This may be a lot to take in, so let's go through the steps one by one:
1. We create a Ref as an initial value for the IOLocal. You could avoid this by making the IOLocal store an Option[Ref], but I figured this would make it easier to implement the other parts. That Ref is immediately used to create the IOLocal.
2. We make a Counter instance based on the composed IOLocal and Ref. Neither of the methods of the counter actually updates the Local: they both read from it and then behave just like a normal Ref-based counter would.
3. Before each request, we'll create a fresh Ref and put it into the Local. Afterwards, we'll reset the Local to its previous state, for good measure.
4. Finally, we return the counter with its reset button.
The usage of localRefCounter is identical to that of localCounterR.
Some boilerplate to prepare for that test run
def appRWithParallelAndRef(rawClient: Client[IO]): IO[HttpRoutes[IO]] =
localRefCounter.map
def routesWithParallelAndRef(client: Client[IO], c: Counter): HttpRoutes[IO] = HttpRoutes.of
Does it resolve the issue?
testRoute(appRWithParallelAndRef).unsafeRunSync()
// Success!
// res10: List[String] = List("2", "2", "2", "2", "2", "2", "2", "2", "2", "2")
Apparently, it does!
What about the opposite?
Perhaps you were wondering if we can flip the order of "state carriers" around and have Ref[IO, IOLocal[A]]. I would be inclined to say no, because the standard Ref implementation relies on a CAS (compare and set) loop for its updates, and that might not play well with how IOLocal is implemented. However, you're welcome to try... in a controlled environment :)
Can we go deeper?
We've established that with the composition of IOLocal and Ref we have more control over when we "fork" the scope of our state than if we were using either of them separately.
Just how much control are we talking, exactly?
Turns out, we can freely "fork" whenever we want - just like we do in withFreshCounter - it's just a matter of exposing such a feature through our counter's API.
trait Counter {
def increment: IO[Unit]
def get: IO[Int]
+ def fork: IO ~> IO
}
It's worth noting that at this point we cannot really implement a valid Counter based on just a Ref - by giving our API more power, we've constrained the number of valid implementations - constraints liberate, liberties constrain.
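For reference, here's one way fork could be implemented on top of the IOLocal + Ref combination (a sketch following the earlier snippets, not the only possible design):

```scala
val forkingCounter: IO[Counter] =
  Ref.of[IO, Int](0).flatMap(IOLocal(_)).map { local =>
    new Counter {
      val increment: IO[Unit] = local.get.flatMap(_.update(_ + 1))
      val get: IO[Int] = local.get.flatMap(_.get)

      // everything run through `fork` sees (and mutates) its own fresh Ref,
      // while the surrounding scope keeps counting in the previous one
      val fork: IO ~> IO = new (IO ~> IO) {
        def apply[A](io: IO[A]): IO[A] =
          (local.get, Ref.of[IO, Int](0)).flatMapN { (previous, fresh) =>
            local.set(fresh) *> io.guarantee(local.set(previous))
          }
      }
    }
  }
```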
Of course, if you want to control the value that forked counters will start with, you'll need to further adjust the API to accommodate that feature. This is left as an exercise for the reader.
Trivia: Natchez's Trace algebra
The Natchez library, which is used for distributed tracing, uses a similar idea in its Trace algebra:
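The core of it looks roughly like this (abridged - see the Natchez documentation for the full, exact definition):

```scala
trait Trace[F[_]] {
  def put(fields: (String, TraceValue)*): F[Unit]
  def span[A](name: String)(k: F[A]): F[A]
  // ...other methods omitted
}
```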
Its two main implementations (based on Kleisli or IOLocal) both carry a Span, which is backed by mutable state - depending on the backend it's sometimes a Ref, sometimes a plain mutable object like an OpenTracing Span.
Summary
We've explored a bunch of options for sharing state, with various degrees of isolation. Which one should you use?
It depends on the problem you're trying to solve. If your goal is to share the state for the entire application (for example, for a client's rate limiter), Ref will do just fine. If you want to prioritize isolation and avoid any race conditions, IOLocal may work well for you. However, if you want more precise control and the ability to "disconnect" from a more "global" state, it's likely that you'll want to wrap some mutable state in an IOLocal - such as a Ref.
Personally, I believe for most use cases in HTTP applications that desire "reader monad" semantics of state, IOLocal + Ref should be preferred over just IOLocal. Using the latter is likely to lead to subtle bugs, such as changes not being propagated upwards when a library down the call stack starts to involve concurrency.
Keep in mind: Cats Effect has other building blocks for concurrent applications, and some of them might be better suited for solving certain types of problems. I encourage you to check out the documentation to learn more about them.
I hope this article helps you make an informed decision. Thanks for reading!
Links
Here are the links from this post: