akka-docs/src/main/paradox/serialization.md
@@@note The Akka dependencies are available from Akka’s secure library repository. To access them you need to use a secure, tokenized URL as specified at https://account.akka.io/token. @@@
To use Serialization, you must add the following dependency in your project:
@@dependency[sbt,Maven,Gradle] { bomGroup=com.typesafe.akka bomArtifact=akka-bom_$scala.binary.version$ bomVersionSymbols=AkkaVersion symbol1=AkkaVersion value1="$akka.version$" group="com.typesafe.akka" artifact="akka-actor_$scala.binary.version$" version=AkkaVersion }
The messages that Akka actors send to each other are JVM objects @scala[(e.g. instances of Scala case classes)]. Message passing between actors that live on the same JVM is straightforward. It is done via reference passing. However, messages that have to escape the JVM to reach an actor running on a different host have to undergo some form of serialization (i.e. the objects have to be converted to and from byte arrays).
The serialization mechanism in Akka allows you to write custom serializers and to define which serializer to use for what.
@ref:Serialization with Jackson is a good choice in many cases and our recommendation if you don't have other preference.
Google Protocol Buffers is good if you want more control over the schema evolution of your messages, but it requires more work to develop and maintain the mapping between serialized representation and domain representation.
Akka itself uses Protocol Buffers to serialize internal messages (for example cluster gossip messages).
For Akka to know which Serializer to use for what, you need to edit your configuration:
in the akka.actor.serializers-section, you bind names to implementations of the @apidocakka.serialization.Serializer
you wish to use, like this:
@@snip SerializationDocSpec.scala { #serialize-serializers-config }
After you've bound names to different implementations of Serializer you need to wire which classes
should be serialized using which Serializer, this is done in the akka.actor.serialization-bindings-section:
@@snip SerializationDocSpec.scala { #serialization-bindings-config }
You only need to specify the name of @scala[a trait]@java[an interface] or abstract base class of the messages. In case of ambiguity, i.e. the message implements several of the configured classes, the most specific configured class will be used, i.e. the one of which all other candidates are superclasses. If this condition cannot be met, because e.g. two marker @scala[traits]@java[interfaces] that have been configured for serialization both apply and neither is a subtype of the other, a warning will be issued.
@@@ note
If @java[you are using Scala for your message protocol and] your messages are contained inside of a Scala object, then in order to
reference those messages, you will need to use the fully qualified Java class name. For a message
named Message contained inside the @java[Scala] object named Wrapper
you would need to reference it as Wrapper$Message instead of Wrapper.Message.
@@@
Akka provides serializers for several primitive types and protobuf @javadoccom.google.protobuf.GeneratedMessage (protobuf2) and @javadoccom.google.protobuf.GeneratedMessageV3 (protobuf3) by default (the latter only if depending on the akka-remote module), so normally you don't need to add configuration for that if you send raw protobuf messages as actor messages.
If you want to programmatically serialize/deserialize using Akka Serialization, here are some examples:
Scala : @@snip SerializationDocSpec.scala { #imports }
Java : @@snip SerializationDocTest.java { #imports }
Scala : @@snip SerializationDocSpec.scala { #programmatic }
Java : @@snip SerializationDocTest.java { #programmatic }
The manifest is a type hint so that the same serializer can be used for different classes.
Note that when deserializing from bytes the manifest and the identifier of the serializer are needed.
It is important to use the serializer identifier in this way to support rolling updates, where the
serialization-bindings for a class may have changed from one serializer to another. Therefore the three parts
consisting of the bytes, the serializer id, and the manifest should always be transferred or stored together so that
they can be deserialized with different serialization-bindings configuration.
The @apidoc[SerializationExtension$] is a Classic @apidoc[akka.actor.Extension], but it can be used with an @apidocakka.actor.typed.ActorSystem like this:
Scala : @@snip SerializationDocSpec.scala { #programmatic-typed }
Java : @@snip SerializationDocTest.java { #programmatic-typed }
The first code snippet on this page contains a configuration file that references a custom serializer docs.serialization.MyOwnSerializer. How would we go about creating such a custom serializer?
A custom Serializer has to inherit from @scala[@apidocakka.serialization.Serializer]@java[@apidocakka.serialization.JSerializer] and can be defined like the following:
Scala : @@snip SerializationDocSpec.scala { #imports }
Java : @@snip SerializationDocTest.java { #imports }
Scala : @@snip SerializationDocSpec.scala { #my-own-serializer }
Java : @@snip SerializationDocTest.java { #my-own-serializer }
The identifier must be unique. The identifier is used when selecting which serializer to use for deserialization.
If you have accidentally configured several serializers with the same identifier that will be detected and prevent
the @apidoc[actor.ActorSystem] from being started. It can be a hardcoded value because it must remain the same value to support
rolling updates.
@@@ div { .group-scala }
If you prefer to define the identifier in cofiguration that is supported by the @apidoc[akka.serialization.BaseSerializer] trait, which
implements the def identifier by reading it from configuration based on the serializer's class name:
Scala : @@snip SerializationDocSpec.scala { #serialization-identifiers-config }
@@@
The manifest is a type hint so that the same serializer can be used for different
classes. The manifest parameter in @scala[@scaladocfromBinary]@java[@javadocfromBinaryJava] is the class of the object that
was serialized. In @scala[fromBinary]@java[fromBinaryJava] you can match on the class and deserialize the
bytes to different objects.
Then you only need to fill in the blanks, bind it to a name in your configuration and list which classes should be deserialized with it.
The serializers are initialized eagerly by the @apidoc[SerializationExtension$] when the @apidoc[actor.ActorSystem] is started and
therefore a serializer itself must not access the SerializationExtension from its constructor. Instead, it
should access the SerializationExtension lazily.
<a id="string-manifest-serializer"></a>
The Serializer illustrated above supports a class based manifest (type hint).
For serialization of data that need to evolve over time the @apidoc[SerializerWithStringManifest]
is recommended instead of Serializer because the manifest (type hint) is a @javadocString
instead of a @javadocClass. That means that the class can be moved/removed and the serializer
can still deserialize old data by matching on the String. This is especially useful
for @ref:Persistence.
The manifest string can also encode a version number that can be used in @scala[@scaladocfromBinary]@java[@javadocfromBinaryJava] to deserialize in different ways to migrate old data to new domain objects.
If the data was originally serialized with Serializer and in a later version of the
system you change to SerializerWithStringManifest then the manifest string will be the full
class name if you used includeManifest=true, otherwise it will be the empty string.
This is how a SerializerWithStringManifest looks like:
Scala : @@snip SerializationDocSpec.scala { #my-own-serializer2 }
Java : @@snip SerializationDocTest.java { #my-own-serializer2 }
You must also bind it to a name in your configuration and then list which classes should be serialized by it.
It's recommended to throw @javadocIllegalArgumentException or @javadocNotSerializableException in
fromBinary if the manifest is unknown. This makes it possible to introduce new message types and
send them to nodes that don't know about them. This is typically needed when performing
rolling updates, i.e. running a cluster with mixed versions for a while.
Those exceptions are treated as a transient problem in the classic remoting
layer. The problem will be logged and the message dropped. Other exceptions will tear down
the TCP connection because it can be an indication of corrupt bytes from the underlying
transport. Artery TCP handles all deserialization exceptions as transient problems.
Actor references are typically included in the messages. All ActorRefs are serializable when using @ref:Serialization with Jackson, but in case you are writing your own serializer, you might want to know how to serialize and deserialize them properly.
To serialize actor references to/from string representation you would use the @apidoc[akka.actor.typed.ActorRefResolver].
For example here's how a serializer could look for Ping and Pong messages:
Scala : @@snip PingSerializer.scala { #serializer }
Java : @@snip PingSerializerExampleTest.java { #serializer }
Serialization of Classic @apidoc[akka.actor.ActorRef] is described in @ref:Classic Serialization. Classic and Typed actor references have the same serialization format so they can be interchanged.
The recommended approach to do deep serialization of internal actor state is to use Akka @ref:Persistence.
Akka is using a Protobuf 3 for serialization of messages defined by Akka. This dependency is
shaded in the akka-protobuf-v3 artifact so that applications can use another version of Protobuf.
Applications should use standard Protobuf dependency and not akka-protobuf-v3.
Java serialization is known to be slow and prone to attacks of various kinds - it never was designed for high throughput messaging after all. One may think that network bandwidth and latency limit the performance of remote messaging, but serialization is a more typical bottleneck.
@@@ note
Akka serialization with Java serialization is disabled by default and Akka itself doesn't use Java serialization for any of its internal messages. It is highly discouraged to enable Java serialization in production.
The log messages emitted by the disabled Java serializer in production SHOULD be treated as potential attacks which the serializer prevented, as they MAY indicate an external operator attempting to send malicious messages intending to use java serialization as attack vector. The attempts are logged with the SECURITY marker.
@@@
However, for early prototyping it is very convenient to use. For that reason and for compatibility with older systems that rely on Java serialization it can be enabled with the following configuration:
akka.actor.allow-java-serialization = on
Akka will still log warning when Java serialization is used and to silent that you may add:
akka.actor.warn-about-java-serializer-usage = off
It is not safe to mix major Scala versions when using the Java serialization as Scala does not guarantee compatibility and this could lead to very surprising errors.
A serialized remote message (or persistent event) consists of serializer-id, the manifest, and the binary payload.
When deserializing it is only looking at the serializer-id to pick which @scala[Serializer]@java[JSerializer] to use for @scala[@scaladocfromBinary]@java[@javadocfromBinaryJava].
The message class (the bindings) is not used for deserialization. The manifest is only used within the
@scala[Serializer]@java[JSerializer] to decide how to deserialize the payload, so one @scala[Serializer]@java[JSerializer] can handle many classes.
That means that it is possible to change serialization for a message by performing two rolling update steps to switch to the new serializer.
Add the @scala[Serializer]@java[JSerializer] class and define it in akka.actor.serializers config section, but not in
akka.actor.serialization-bindings. Perform a rolling update for this change. This means that the
serializer class exists on all nodes and is registered, but it is still not used for serializing any
messages. That is important because during the rolling update the old nodes still don't know about
the new serializer and would not be able to deserialize messages with that format.
The second change is to register that the serializer is to be used for certain classes by defining
those in the akka.actor.serialization-bindings config section. Perform a rolling update for this
change. This means that new nodes will use the new serializer when sending messages and old nodes will
be able to deserialize the new format. Old nodes will continue to use the old serializer when sending
messages and new nodes will be able to deserialize the old format.
As an optional third step the old serializer can be completely removed if it was not used for persistent events. It must still be possible to deserialize the events that were stored with the old serializer.
Normally, messages sent between local actors (i.e. same JVM) do not undergo serialization. For testing, sometimes, it may be desirable to force serialization on all messages (both remote and local). If you want to do this in order to verify that your messages are serializable you can enable the following config option:
@@snip SerializationDocSpec.scala { #serialize-messages-config }
Certain messages can be excluded from verification by extending the marker @scala[trait]@java[interface]
@apidocakka.actor.NoSerializationVerificationNeeded or define a class name prefix in configuration
akka.actor.no-serialization-verification-needed-class-prefix.
If you want to verify that your @apidoc[akka.actor.Props] are serializable you can enable the following config option:
@@snip SerializationDocSpec.scala { #serialize-creators-config }
@@@ warning
We recommend having these config options turned on only when you're running tests. Turning these options on in production is pointless, as it would negatively impact the performance of local message passing without giving any gain.
@@@