ComplianceDB is a DevOps Compliance Journal for storing a record of compliance controls. It helps financial institutions, medical device manufacturers, automotive and other mission-critical development teams to prove conformance to their software process.
A key concept in ComplianceDB is the immutable, append-only journal, which provides a tamper-proof audit trail. This tech blog explains how we came to design the durable storage technology in ComplianceDB.
Persisting Compliance Data
As a tool for recording software process data automatically, ComplianceDB works by providing a REST API to DevOps pipelines. These pipelines can then communicate with ComplianceDB via scripting languages, curl commands, and more.
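To make the pipeline side concrete, here is a minimal sketch in Python of what a pipeline step might prepare. The base URL, the endpoint path, the payload fields, and the token are all hypothetical and chosen for illustration; they are not ComplianceDB's documented API.

```python
import json
import urllib.request

# Hypothetical values for illustration only; not ComplianceDB's real API.
BASE_URL = "https://compliancedb.example.com"
EVIDENCE_PATH = "/api/v1/projects/{project}/evidence"


def build_evidence_request(project: str, evidence: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST request carrying one piece of evidence."""
    url = BASE_URL + EVIDENCE_PATH.format(project=project)
    body = json.dumps(evidence, sort_keys=True).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


request = build_evidence_request(
    "payments-service",
    {"control": "unit_tests", "passed": True, "commit": "abc1234"},
    token="dummy-token",
)
# A real pipeline step would send it with urllib.request.urlopen(request).
```

The request is only built here, not sent, so the sketch runs without a server. The same call could equally be made from a shell step with curl.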
At its core, ComplianceDB is a database, and the data must be stored in a durable manner. Given the mission-critical nature of the domain, trust in the integrity of the data is a key part of the value proposition.
It is tempting to think the unique requirements of audit would lead us to implement a custom persistence solution. But writing databases is hard, so first we surveyed the proven persistence technologies that might fit our purpose:
- Relational databases
- Document databases
When adding a database to an application, the natural default is to use a relational database such as PostgreSQL or MySQL. These tried-and-trusted technologies are very well suited to the record-based data required by many applications.
However, compliance data has two unique requirements that are hard to implement in relational databases.
First, the purpose of an audit trail is to prove that a process is followed by providing evidence. We want the proof to be immutable, i.e. it shouldn’t be possible to change or tamper with evidence without a record of what changed and when.
Second, as a compliance database, we don’t know in advance the structure of the evidence users will want to record as proof. Shoehorning user data into table structures is not a good fit.
So the strengths of an RDBMS, namely table-based, mutable data, are actually weaknesses in this domain.
Moving closer to the evidence needs, there is an alternative database approach that might be more appropriate: the Document-oriented database. Popular examples are MongoDB and Amazon DocumentDB.
Document-oriented databases are great for storing the kind of semi-structured data we use for compliance, but they still lack the immutability that we would like. Adding versioning and immutability on top of a document database is complex to implement and hard to secure.
Going further down the rabbit hole, a new technology which solves the problem of immutable journaling has arrived: blockchain.
In fact, the most popular applications of blockchain are to record transactions of different types in a distributed, decentralized public ledger.
Its strength is in proving transactions in a zero-trust model. However, in the case of compliance data, what we would like is a centralized record of truth for evidence rather than transactions.
What we’d really like is a centralized, efficient ledger designed for versioning documents...
Enter git. Git provides the same append-only cryptographic basis for a journal as blockchain, but also the document-oriented foundation for the schemaless data that our domain generates.
Sidenote: Similar to blockchain, git uses Merkle trees of cryptographic hashes to secure an immutable, append-only journal (the data is hashed, not encrypted). However, we represent each project in ComplianceDB as a git repository with a single branch, so technically we are using hash chains rather than Merkle trees, but hey-ho ¯\_(ツ)_/¯
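For readers who have not met hash chains before, the idea can be sketched in a few lines of Python. This is an illustration of the general technique only: the entry layout, the helper names, and the use of SHA-256 are assumptions made for the example, and it does not reproduce git's actual object format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder parent hash for the very first entry


def entry_hash(parent: str, payload: dict) -> str:
    """Hash a payload together with the hash of the entry before it."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(parent.encode("ascii") + body).hexdigest()


def append(journal: list, payload: dict) -> None:
    """Add a new entry that points at the current tip of the journal."""
    parent = journal[-1]["hash"] if journal else GENESIS
    journal.append({"parent": parent, "payload": payload, "hash": entry_hash(parent, payload)})


def verify(journal: list) -> bool:
    """Recompute every hash from the start and check each parent link."""
    parent = GENESIS
    for entry in journal:
        if entry["parent"] != parent or entry["hash"] != entry_hash(parent, entry["payload"]):
            return False
        parent = entry["hash"]
    return True


journal = []
append(journal, {"control": "code_review", "approved": True})
append(journal, {"control": "unit_tests", "passed": True})
intact = verify(journal)

# Editing an earlier entry in place no longer matches its stored hash.
journal[0]["payload"]["approved"] = False
tampered_detected = not verify(journal)
```

Because every entry's hash covers its parent's hash, rewriting history means recomputing every later hash as well, which is what makes a quiet edit detectable.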
In addition to this, git is an open standard that integrates well with other development tools and leads to interesting possibilities for further integrations. As an added bonus, developers understand git. As a development tool it fits neatly into everyone’s existing knowledge.
By mapping our domain model onto filesystem and git operations, we can get an immutable, append-only journal, together with versioned updates, all in an open standard format.
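As a rough sketch of what such a mapping could look like, the Python below writes one evidence document to a file and commits it. The directory layout, the file naming, the commit message, and the `record_evidence` helper are all assumptions made for illustration; they do not describe ComplianceDB's actual on-disk format. The `run` parameter exists only so the git calls can be swapped out, for example in a test.

```python
import json
import subprocess
from pathlib import Path


def record_evidence(repo: Path, name: str, evidence: dict, run=subprocess.run) -> Path:
    """Write one evidence document as a JSON file and commit it.

    Hypothetical layout: one JSON file per evidence item under evidence/.
    """
    path = repo / "evidence" / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(evidence, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    relative = path.relative_to(repo).as_posix()
    # Stage and commit the file; each commit becomes one journal entry.
    run(["git", "add", relative], cwd=repo, check=True)
    run(["git", "commit", "-m", f"Record evidence: {name}"], cwd=repo, check=True)
    return path
```

With the default `run`, this expects `repo` to be an already initialised git repository with a committer identity configured. Updating the same evidence item later overwrites the file and adds a new commit, so earlier versions stay reachable through the repository history.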
Persistence technologies all have a context in which they provide the best solution. Each of the database options we looked at is best-in-class for a given problem domain. For our application, however, git was the best fit.
Since developing ComplianceDB, we have found git to be an extremely secure and flexible foundation for our solution.