Your data artifacts should be immutable

Anna Nixon
April 4th, 2024

At Hashboard, we're constantly exploring ways to improve data tooling for end users and data engineers alike. Recently, we've been building versioning features that are flexible and friendly to users from a range of technical backgrounds. To pull it off, we realized we needed to go back to the drawing board with some of our core data structures.

Relationships between data artifacts

In Hashboard, we conceptualize data artifacts as a graph. At the core lies the data model, a lightweight semantic layer that defines attributes and aggregations (measures) for users to explore. Models connect to one another through join relationships. Users can derive key metrics and data visualizations (saved explorations) from these models. Finally, dashboards sit at the leaves of the project graph: each is a collection of metrics, views, and additional context for the data.

In the past, these resources were all stored as a flat list of mutable data structures, with no tracking of changes over time. This caused a problem: if you changed an upstream resource in a way that affected downstream resources, the system had no way to track that impact or surface it to the user.

So, we decided to change the underlying data structures to better represent the relationships between resources. In the new design, a project is stored as an immutable graph containing pointers to the most up-to-date version of each resource. Every change is tracked in our persistence layer, so resource history can be traced for everyone without the user having to lift a finger.

How does it work?

Each time a resource is created, deleted, or updated, we create what we call a ‘Change Set’: essentially, the diff applied to the project state to make the update. That diff can then be applied to the project, which is now just a dictionary mapping each resource ID to a pointer to the most current version of that resource.
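As a rough illustration of the idea, the Python sketch below models a project as a read-only mapping from resource ID to version ID, and a change set as a diff that produces a new project state. Every name here (`ChangeSet`, `apply_change_set`, the ID formats) is hypothetical and chosen for the example; it is not Hashboard's actual implementation.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Optional

# Hypothetical types for illustration only; not Hashboard's real schema.
Project = Mapping[str, str]  # resource ID -> ID of its current version


@dataclass(frozen=True)
class ChangeSet:
    """A diff against project state: resource ID -> new version ID, or None to delete."""
    changes: Mapping[str, Optional[str]]
    description: str = ""


def apply_change_set(project: Project, change_set: ChangeSet) -> Project:
    """Return a new project state; the input project is never mutated."""
    updated = dict(project)
    for resource_id, version_id in change_set.changes.items():
        if version_id is None:
            updated.pop(resource_id, None)      # delete
        else:
            updated[resource_id] = version_id   # create or update
    return MappingProxyType(updated)            # read-only view of the new state


v1: Project = MappingProxyType({"model:orders": "v1", "dashboard:sales": "v1"})
v2 = apply_change_set(
    v1,
    ChangeSet({"model:orders": "v2", "dashboard:sales": None}, "rename a column"),
)
```

Because `apply_change_set` returns a fresh mapping rather than editing the old one, both `v1` and `v2` remain available, which is what makes history traceable in a design like this.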

Storing the project itself as a graph unlocks a bunch of technical wins for us as engineers. It provides stronger guarantees on the correctness of a project and lets us ensure that dead resources get cleaned up so that your project isn’t littered with unusable views and dashboards.

Updating a model column that is used by dashboards and views downstream? We can surface what resources will be affected. When a change is applied, those resources are cleaned up instead of being left unusable. Change sets now allow us to batch these updates together, making updates in Hashboard more efficient and more transparent.
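One way to picture how affected resources could be found is a breadth-first walk over the dependency graph. The sketch below is a simplification that works at the resource level rather than the column level, and the `DEPENDENTS` table and `downstream_of` function are hypothetical names made up for the example.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical edges: resource ID -> resources that depend on it directly.
DEPENDENTS: Dict[str, List[str]] = {
    "model:orders": ["metric:revenue", "view:orders_by_region"],
    "metric:revenue": ["dashboard:sales"],
    "view:orders_by_region": ["dashboard:sales"],
    "dashboard:sales": [],
}


def downstream_of(resource_id: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk collecting every resource reachable downstream."""
    affected: Set[str] = set()
    queue = deque([resource_id])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in affected:   # visit shared dependents only once
                affected.add(child)
                queue.append(child)
    return affected


affected = downstream_of("model:orders", DEPENDENTS)
```

The resulting set is the kind of information that could be shown to a user before a change is applied, and the upstream edit plus its follow-up changes could then be bundled into a single change set.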

What does this mean for our users?

With our new change management architecture, users get an in-depth look into their project history like never before. They can easily track every change made to their resources and see any bulk updates that occurred. This feature makes it a breeze to pinpoint any changes that might have caused unexpected issues downstream and to rebuild resources if needed.

Now Hashboard users can see line-by-line changes to all the resources in their project and can also revert to previous configurations directly from the web application.
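To make the idea concrete, here is a minimal Python sketch of a line-by-line diff and a revert over an append-only version history, using the standard library's `difflib`. The configuration text, the `History` type, and the `line_diff` and `revert` helpers are all assumptions made for illustration; the source does not describe how Hashboard computes diffs or stores versions.

```python
import difflib
from typing import List, Tuple

# Hypothetical append-only history of one resource's configuration text.
History = Tuple[str, ...]


def line_diff(old: str, new: str) -> List[str]:
    """Line-by-line unified diff between two configuration versions."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


def revert(history: History, index: int) -> History:
    """Revert by appending a copy of an older version; nothing is overwritten."""
    return history + (history[index],)


history: History = (
    "name: Sales\nmeasure: sum(amount)",
    "name: Sales\nmeasure: sum(net_amount)",
)
changes = line_diff(history[0], history[1])
restored = revert(history, 0)
```

Modelling a revert as a new version, rather than as a deletion of later ones, keeps the history itself immutable, so the revert can be inspected or undone like any other change.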

Users can also see grouped updates when changes to a single resource have cascading downstream effects, allowing for better visibility into the origin of changes.

What's more, this revamped architecture offers a new level of control and visibility into a project’s history. Hashboard still supports Git-based version control, and now also provides a lower-overhead alternative for users who prefer not to work in Git. Users can easily track changes that have cascading effects, giving them a comprehensive view of how a project has evolved.

Where are we going next?

These architectural changes opened up a huge range of possibilities for us to build out features such as restoring resources in the UI and undoing destructive changes without ever having to interact with Git. Power users can still use familiar Git workflows to maintain core resources, but even non-technical users will now have an easy way to get insight into the effects of their changes and travel back in time to view their project history.

These architectural changes will also play a core part in bringing performance improvements to our existing CLI workflows, enabling safe automated project cleanup, and introducing new workflows that will allow users to draft and preview changes within the UI. This change represents a significant advancement for us, and we’re excited to continue bringing powerful features to your fingertips in the future.