Dual Writes often happens in a microservices based architecture. Whenever a piece of data changes in one place, we need to persist or react on it on multiple places.
Imagine a user bought a product on our website and we need to save order in the database and inform a delivery service that they should prepare the order, and because we keep the count of products left in stock de-normalized in a cache, we want to decrease the number of available products. The delivery service is a separate application and communication happens using a queue (it may be Apache Kafka), cache is distributed and it may be Redis.
After the entry is saved and in the database, caches are updated and before message is persisted in the queue our system can fail, application may restart, OS can crash, someone accidentally can unplug the wrong computer from power…
After the fail occurs, we are in an inconsistent state, we have a order but the Delivery department was not informed that they should prepare that order, the bad thing is that we may not even observe this until custom will complain about it.
Another similar problem that can occur with dual writes is when two clients, concurrently, are trying to update the same key:
First App is sending a request to the first DB to update count value to 1, then it is sending a request to the second database in the meanwhile, the second app is updating the same member but with another value and it’s doing this in another order, you can observe that after updates are applied data is not consistent in both data stores.
These are real problems that can occur in real world applications and problems that are caused are not easy to debug. You may argue that these problems were solved decades ago by transactions “A” from ACID, this is true but it doesn’t work for distributed applications. Some systems support distributed transactions but most of them don’t support it.
How do we make sure that all data end up consistent in all the right places?
Dual writes isn’t the solution because it can introduce inconsistency because of partial failure or of race conditions.
A possible simple solution is to keep all writes in a fixed order and apply them in that order in all the places they need to go.
If you keep this queue append only and reduce any concurrency you have removed the potential for race conditions.
So the proposed solution is to keep a log when whenever something changes, you append it to the log and then there are consumers that are looking to get changes from the queue and apply to their data store.
You can use Apache Kafka as the queue store (it also supports transactions in order to guarantee exactly once delivery) and then apps are connecting using different consumer-groups to the topic, consume messages and apply changes, they will commit the message to Kafka once they are sure that data was updated to their data store.