Cosmos DB has a lot to offer, which makes it a preferred choice in many scenarios. Recently I was working on an audit log that is expected to be reasonably immutable (theoretically, any information on digital media is mutable). After looking at a few choices, we chose Cosmos DB along with Azure Service Bus and Azure Key Vault (AKV).
Other Options Considered:
Blockchain platforms like Ethereum could have been a good option, but they would have been overkill for our requirements. Unlike blockchain’s proof-of-work, the scenario I was working on was more like “proof-of-access”.
I am not sure if this is a known term, pattern or concept, but it fits our requirements quite well. It relies on the audit system being the only system that has access to the secret used to sign the content and its hash.
So we created an application that uses Cosmos DB as its data store in the background. Any system that needs to write to this immutable data store has to send a command message. This message is stored in two forms:
Signed Document: To ensure that the content of any individual document is not altered.
Hashed Block List: To ensure that any individual document is not deleted from the list.
This is done in two phases:
In the first phase, a subscriber listens for the commands sent by the various systems and signs the content of each command with a secret from AKV. It then adds the signature to the document before storing it in a Cosmos DB collection; let’s call it “AuditLogs”.
I found this blog post quite a good read on how to use AKV to sign and verify content.
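The signing step can be sketched as follows. This is only an illustration, not the code from the project: it swaps in HMAC-SHA256 with a locally held secret in place of a call to Key Vault, and the names `SIGNING_SECRET`, `sign_document` and `verify_document` are made up for the example. The content is serialised with sorted keys so that the same content always produces the same signature.

```python
import hashlib
import hmac
import json

# Hypothetical stand-in for the secret that the real system keeps in AKV.
SIGNING_SECRET = b"demo-secret-not-for-production"


def canonical_bytes(content: dict) -> bytes:
    """Serialise content deterministically so equal content signs identically."""
    return json.dumps(content, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_document(content: dict, secret: bytes = SIGNING_SECRET) -> dict:
    """Return a new document carrying the content plus its signature."""
    signature = hmac.new(secret, canonical_bytes(content), hashlib.sha256).hexdigest()
    return {"content": content, "signature": signature}


def verify_document(document: dict, secret: bytes = SIGNING_SECRET) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(
        secret, canonical_bytes(document["content"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, document["signature"])
```

Signing `{"user": "alice", "action": "login"}` and then changing `action` on the stored document makes `verify_document` return `False`, which is the property the “AuditLogs” collection depends on.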
The second phase relies on Cosmos DB triggers and the change feed. A change feed listener polls every 2 seconds for documents added to the “AuditLogs” collection and creates a block for every 10 documents. For each block:
- It verifies the signature of each document in the block.
- It calculates the hash of the block and adds a signed copy of that hash to the block itself.
These blocks are then stored in the collection called “HashBlockList”.
The following picture illustrates this process:
Hashed Block Chain
Taking inspiration from blockchain platforms, this collection is a flavour of the linked list data structure. Every block stores the signed hash of the previous block, which is then used in calculating the hash of that block.
Block Hash = Hash(Documents + Hash of the Previous Block)
- Block 1 (Documents)
- Block 2 (Block 1 Hash + Documents)
- Block 3 (Block 2 Hash + Documents)
- and so on
Now if you alter any block, e.g. Block 2, its hash will no longer be the same. Therefore, Block 3’s copy of the hash of Block 2 will not match it. And if you then change Block 3’s copy of the Block 2 hash, Block 3’s own hash changes, so Block 4 will no longer match. This means you would need to rewrite the entire rest of the list. Furthermore, the hash is signed with a secret that is accessible only to this application.
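To make the chaining argument concrete, here is a small sketch of building such a list and finding where it breaks. It leaves out the signing of each hash to keep the focus on the links. The field names `previousHash` and `hash`, and the use of an empty string as the predecessor of the first block, are assumptions made for the example.

```python
import hashlib
import json

GENESIS = ""  # assumption: the first block chains from an empty previous hash


def block_hash(documents: list, previous_hash: str) -> str:
    """Hash the previous block's hash together with this block's documents."""
    payload = json.dumps(documents, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_hash + payload).encode("utf-8")).hexdigest()


def append_block(chain: list, documents: list) -> None:
    previous_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "documents": documents,
        "previousHash": previous_hash,
        "hash": block_hash(documents, previous_hash),
    })


def first_broken_block(chain: list):
    """Return the index of the first block that fails verification, or None."""
    previous_hash = GENESIS
    for index, block in enumerate(chain):
        if block["previousHash"] != previous_hash:
            return index  # link to the predecessor no longer matches
        if block["hash"] != block_hash(block["documents"], previous_hash):
            return index  # contents were changed after hashing
        previous_hash = block["hash"]
    return None
```

Editing the documents of the second block makes verification fail at that block. Recomputing that block's hash to hide the edit only moves the failure to the next block, so an attacker has to rewrite every later block as well.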
One of the challenges was to ensure that this linked list does not split into more than one branch. Cosmos DB triggers came to the rescue here. Using a trigger, I could implement a strategy to append documents sequentially. I used a document to store meta information about the list (i.e. the first and last block IDs); this document helped to ensure that only one block can be appended at a time.
NOTE: Microsoft has just released an immutable data store. Had it been available when we were developing the above solution, it could have saved us some time and effort. But in hindsight, developing this solution was an interesting exercise and quite enlightening in many ways.
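The actual trigger is not shown here, and Cosmos DB triggers are written in server-side JavaScript, so the following is only an in-memory Python model of the kind of check such a trigger might perform. The class, the `previousBlockId` field and the meta field names are all hypothetical. The idea is that an append is accepted only if the new block points at the block that the meta document currently records as the last one.

```python
class BranchError(Exception):
    """Raised when an append would fork the list."""


class HashBlockList:
    """In-memory stand-in for the collection plus its meta document."""

    def __init__(self):
        self.blocks = {}
        self.meta = {"firstBlockId": None, "lastBlockId": None}

    def append(self, block_id: str, previous_block_id, payload: dict) -> None:
        # The new block must extend the current tail; anything else is a fork.
        if previous_block_id != self.meta["lastBlockId"]:
            raise BranchError(
                f"expected previous block {self.meta['lastBlockId']!r}, "
                f"got {previous_block_id!r}"
            )
        if block_id in self.blocks:
            raise BranchError(f"block {block_id!r} already exists")
        self.blocks[block_id] = {
            "id": block_id,
            "previousBlockId": previous_block_id,
            **payload,
        }
        # Update the meta document together with the insert.
        if self.meta["firstBlockId"] is None:
            self.meta["firstBlockId"] = block_id
        self.meta["lastBlockId"] = block_id
```

If two writers both try to append after the same block, the first one moves `lastBlockId` forward and the second is rejected, so it has to re-read the meta document and retry. In the real store the check and the meta update would need to run atomically, which is what running them inside a trigger is meant to provide.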