Cassandra Read-Side Support

This page is specifically about Lagom’s support for Cassandra read-sides. Before reading this, you should familiarize yourself with Lagom’s general read-side support.

§Query the Read-Side Database

Let us first look at how a service implementation can retrieve data from Cassandra.

import scala.concurrent.Future
import akka.NotUsed
import akka.stream.scaladsl.Source
import com.lightbend.lagom.scaladsl.api.Service
import com.lightbend.lagom.scaladsl.api.ServiceCall
import com.lightbend.lagom.scaladsl.persistence.cassandra.CassandraSession

class BlogServiceImpl(cassandraSession: CassandraSession) extends BlogService {

  override def getPostSummaries() = ServiceCall { request =>
    val response: Source[PostSummary, NotUsed] =
      cassandraSession.select("SELECT id, title FROM blogsummary")
        .map(row => PostSummary(row.getString("id"), row.getString("title")))
    Future.successful(response)
  }
}

Note that the CassandraSession is injected in the constructor. CassandraSession provides several methods in different flavors for executing queries. The one used in the above example returns a Source, i.e. a streamed response. There are also methods for retrieving a list of rows, which can be useful when you know that the result set is small, e.g. when you have included a LIMIT clause.

All methods in CassandraSession are non-blocking and they return a Future or a Source. The statements are expressed in Cassandra Query Language (CQL) syntax. See Querying tables for information about CQL queries.

§Update the Read-Side

We need to transform the events generated by the Persistent Entities into database tables that can be queried as illustrated in the previous section. For that we will implement a ReadSideProcessor with assistance from the CassandraReadSide support component. It will consume events produced by persistent entities and update one or more tables in Cassandra that are optimized for queries.

This is what a ReadSideProcessor class looks like before filling in the implementation details:

import scala.concurrent.ExecutionContext
import scala.concurrent.Future
import akka.Done
import com.datastax.driver.core.BoundStatement
import com.datastax.driver.core.PreparedStatement
import com.lightbend.lagom.scaladsl.persistence.AggregateEventTag
import com.lightbend.lagom.scaladsl.persistence.EventStreamElement
import com.lightbend.lagom.scaladsl.persistence.ReadSideProcessor
import com.lightbend.lagom.scaladsl.persistence.cassandra.CassandraReadSide
import com.lightbend.lagom.scaladsl.persistence.cassandra.CassandraSession
import scala.concurrent.Promise

class BlogEventProcessor(session: CassandraSession, readSide: CassandraReadSide)(implicit ec: ExecutionContext)
  extends ReadSideProcessor[BlogEvent] {

  override def buildHandler(): ReadSideProcessor.ReadSideHandler[BlogEvent] = {
    // TODO build read side handler
    ???
  }

  override def aggregateTags: Set[AggregateEventTag[BlogEvent]] = {
    // TODO return the tag for the events
    ???
  }
}

You can see that we have injected the Cassandra session and the Cassandra read-side support; both will be needed later.

You should already have implemented tagging for your events as described in the Read-Side documentation, so first we’ll implement the aggregateTags method in our read-side processor stub, like so:

override def aggregateTags: Set[AggregateEventTag[BlogEvent]] =
  BlogEvent.Tag.allTags

§Building the read-side handler

The other method on the ReadSideProcessor is buildHandler. It is responsible for creating the ReadSideHandler that will handle events. It also gives you the opportunity to register two callbacks: a global prepare callback and a regular prepare callback.

CassandraReadSide has a builder method for creating a builder for these handlers. The handler it produces automatically manages read-side offsets for you. The builder can be created like so:

val builder = readSide.builder[BlogEvent]("blogsummaryoffset")

The argument passed to this method is the ID of the event processor that Lagom will use when it persists offsets to its offset store. The offset store is a Cassandra table, which will be created for you if it doesn’t exist. You can also create this table yourself if you wish; the DDL is as follows:

CREATE TABLE IF NOT EXISTS offsetStore (
    eventProcessorId text,
    tag text,
    timeUuidOffset timeuuid,
    sequenceOffset bigint,
    PRIMARY KEY (eventProcessorId, tag)
)

§Global prepare

The global prepare callback runs at least once across the whole cluster. It is intended for doing things like creating tables and preparing any data that needs to be available before read side processing starts. Read side processors may be sharded across many nodes, and so tasks like creating tables should usually only be done from one node.

The global prepare callback is run from an Akka cluster singleton. It may be run multiple times - every time a new node becomes the new singleton, the callback will be run. Consequently, the task must be idempotent. If it fails, it will be run again using an exponential backoff, and the read side processing of the whole cluster will not start until it has run successfully.

Of course, setting a global prepare callback is completely optional. You may prefer to manage Cassandra tables manually, but for development and test environments it is very convenient to let this callback create them for you.

Below is an example method that we’ve implemented to create tables:

private def createTable(): Future[Done] =
  session.executeCreateTable("CREATE TABLE IF NOT EXISTS blogsummary ( " +
    "id TEXT, title TEXT, PRIMARY KEY (id))")

It can then be registered as the global prepare callback in the buildHandler method:

builder.setGlobalPrepare(() => createTable())
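Because the global prepare callback may run more than once and is retried after failures, it can help to see idempotency and backoff in isolation. The following is a minimal sketch in plain Scala using only the standard library; it is not Lagom's implementation. The names GlobalPrepareSketch, createTableIfNotExists, backoffDelay and runUntilSuccess are hypothetical, the in-memory set merely stands in for a keyspace, and the delay values are illustrative rather than Lagom's configured defaults.

```scala
import scala.collection.mutable
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object GlobalPrepareSketch {
  // Hypothetical in-memory stand-in for the tables of a keyspace.
  private val tables = mutable.Set.empty[String]

  // Idempotent, like CREATE TABLE IF NOT EXISTS: repeating it changes nothing.
  def createTableIfNotExists(name: String): Unit = { tables += name; () }

  def tableCount: Int = tables.size

  // Illustrative backoff: min, 2 * min, 4 * min, ... capped at max.
  def backoffDelay(attempt: Int, min: FiniteDuration, max: FiniteDuration): FiniteDuration = {
    val factor = 1L << math.min(attempt, 20) // bound the shift to avoid overflow
    val raw = min * factor
    if (raw > max) max else raw
  }

  // Runs the task until it succeeds and returns the delays a scheduler
  // would have waited between attempts (this sketch does not actually sleep).
  def runUntilSuccess(task: () => Unit, maxAttempts: Int): List[FiniteDuration] = {
    @scala.annotation.tailrec
    def loop(attempt: Int, delays: List[FiniteDuration]): List[FiniteDuration] =
      Try(task()) match {
        case Success(_) => delays.reverse
        case Failure(e) if attempt + 1 >= maxAttempts => throw e
        case Failure(_) =>
          loop(attempt + 1, backoffDelay(attempt, 1.second, 30.seconds) :: delays)
      }
    loop(0, Nil)
  }

  def main(args: Array[String]): Unit = {
    var failuresLeft = 2 // simulate two transient failures
    val flaky = () => {
      if (failuresLeft > 0) { failuresLeft -= 1; throw new RuntimeException("transient") }
      createTableIfNotExists("blogsummary")
    }
    println(runUntilSuccess(flaky, maxAttempts = 5)) // delays recorded before success
    runUntilSuccess(flaky, maxAttempts = 5)          // a later singleton runs it again
    println(tableCount)                              // still a single table
  }
}
```

The second run models a new node becoming the singleton: since the task is idempotent, repeating it leaves the schema unchanged.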

§Prepare

In addition to the global prepare callback, there is also a prepare callback. This will be executed once per shard, when the read side processor starts up. It can be used for preparing statements in order to optimize Cassandra’s handling of them.

Again, this callback is optional. Here is an example of how to prepare a statement for updating the table:

private val writeTitlePromise = Promise[PreparedStatement] // initialized in prepare
private def writeTitle: Future[PreparedStatement] = writeTitlePromise.future

private def prepareWriteTitle(): Future[Done] = {
  val f = session.prepare("INSERT INTO blogsummary (id, title) VALUES (?, ?)")
  writeTitlePromise.completeWith(f)
  f.map(_ => Done)
}

It can then be registered as the prepare callback in the buildHandler method:

builder.setPrepare(tag => prepareWriteTitle())
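The promise used in prepareWriteTitle lets event handlers refer to the prepared statement as a Future even before the prepare callback has completed. The sketch below isolates that pattern using only scala.concurrent. FakeStatement and prepareAsync are hypothetical stand-ins for PreparedStatement and session.prepare, introduced so the example runs without a Cassandra connection.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PromiseHolderSketch {
  // Hypothetical stand-in for PreparedStatement; not a driver type.
  final case class FakeStatement(cql: String)

  // Created empty when the processor is constructed, completed later by prepare().
  private val statementPromise = Promise[FakeStatement]()
  def statement: Future[FakeStatement] = statementPromise.future

  // Hypothetical stand-in for session.prepare(...): completes asynchronously.
  private def prepareAsync(cql: String): Future[FakeStatement] =
    Future(FakeStatement(cql))

  // Mirrors prepareWriteTitle: link the promise to the asynchronous result.
  def prepare(): Future[Unit] = {
    val f = prepareAsync("INSERT INTO blogsummary (id, title) VALUES (?, ?)")
    statementPromise.completeWith(f)
    f.map(_ => ())
  }

  def main(args: Array[String]): Unit = {
    // A handler can map over the future before prepare() has been called.
    val handlerResult: Future[String] = statement.map(ps => s"bound: ${ps.cql}")
    assert(!handlerResult.isCompleted)
    Await.result(prepare(), 5.seconds)
    println(Await.result(handlerResult, 5.seconds))
  }
}
```

The design choice here is that handlers never hold a null or uninitialized statement: they compose on the Future, and their work proceeds once the statement is available.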

§Registering your read-side processor

Once you’ve created your read-side processor, you need to register it with Lagom. This is done using the ReadSide component:

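class BlogServiceImpl(
  persistentEntityRegistry: PersistentEntityRegistry,
  readSide: ReadSide,
  cassandraSession: CassandraSession,
  cassandraReadSide: CassandraReadSide)(implicit ec: ExecutionContext) extends BlogService {

  // Pass the same dependencies that BlogEventProcessor declares in its constructor.
  readSide.register[BlogEvent](new BlogEventProcessor(cassandraSession, cassandraReadSide))

  // Service call implementations omitted.
}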

§Event handlers

The event handlers take an event and return a list of bound statements. Rather than executing updates in the handler itself, it is recommended that you return the statements you want to execute to Lagom. This allows Lagom to batch those statements with the offset table update statement and execute them as a logged batch, which Cassandra applies atomically. By doing this you can ensure exactly-once processing of all events; otherwise processing may be at-least-once.
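To see why committing the view update together with the offset matters, consider a simplified model in plain Scala. This is only a sketch of the idea, not how Lagom or Cassandra implement it: Store, Event, applyBatch and process are hypothetical names, and a counter is used as the view because, unlike an INSERT, incrementing it twice would be visible as an error.

```scala
object AtomicOffsetSketch {
  // Hypothetical in-memory read side: a counter view plus the stored offset.
  final case class Store(postCount: Int, offset: Long)

  // Hypothetical event carrying its position in the stream.
  final case class Event(offset: Long, postId: String)

  // The view update and the offset update happen in one step, modelling the
  // logged batch: either both are visible or neither is.
  def applyBatch(store: Store, event: Event): Store =
    Store(store.postCount + 1, event.offset)

  // On (re)start, processing resumes after the stored offset, so events that
  // are already reflected in the view are skipped.
  def process(store: Store, events: Seq[Event]): Store =
    events.filter(_.offset > store.offset).foldLeft(store)(applyBatch)

  def main(args: Array[String]): Unit = {
    val events = (1L to 3L).map(o => Event(o, s"post-$o"))
    val beforeCrash = process(Store(0, 0L), events.take(2)) // stopped after two events
    val afterRestart = process(beforeCrash, events)         // full stream offered again
    assert(afterRestart == Store(3, 3L))                    // each event counted once
    println(afterRestart)
  }
}
```

If the counter and the offset were written separately, a crash between the two writes could leave an event applied but not recorded, and it would be applied again after restart.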

Here’s an example callback for handling the PostAdded event:

private def processPostAdded(eventElement: EventStreamElement[PostAdded]): Future[List[BoundStatement]] = {
  writeTitle.map { ps =>
    val bindWriteTitle = ps.bind()
    bindWriteTitle.setString("id", eventElement.event.postId)
    bindWriteTitle.setString("title", eventElement.event.content.title)
    List(bindWriteTitle)
  }
}

This can then be registered with the builder using setEventHandler:

builder.setEventHandler[PostAdded](processPostAdded)

Once you have finished registering all your event handlers, you can invoke the build method and return the built handler:

builder.build()

§Underlying implementation

The CassandraSession uses the DataStax Java Driver for Apache Cassandra.

Each ReadSideProcessor instance is executed by an Actor that is managed by Akka Cluster Sharding. The processor consumes a stream of persistent events delivered by the eventsByTag Persistence Query implemented by akka-persistence-cassandra. The tag corresponds to the tag defined by the AggregateEventTag.
