Get started with SQLMesh

Various tools have introduced software engineering practices to data engineering, but gaps remain, particularly in areas like testing and workflows. SQLMesh aims to break new ground in these areas, addressing challenges where competing products like dbt have yet to offer robust solutions.

In this article, I’ll provide a quick introduction to SQLMesh, sharing insights into its capabilities and potential impact.

The setup

I’ve created an example SQLMesh project available on GitHub, which you can use to follow along with the examples in this article.

git clone https://github.com/data-max-hq/sqlmesh_example
cd sqlmesh_example

python3 -m venv .venv
source .venv/bin/activate

# SQLMesh uses extras to install specific modules 
# that are used in the project. 
# This allows you to quickly extend SQLMesh 
# with the desired capabilities 
# and at the same time ensure a lean footprint.
pip install "sqlmesh[postgres,web]"

In the README.md of the Github project you will also find a more comprehensive setup guide to help you set up the project and it’s dependencies.

The scaffold

If you have prior experience with dbt, the SQLMesh project scaffold will feel familiar. The developers have ensured that SQLMesh is backward-compatible with dbt, allowing for a seamless migration of existing dbt projects.

config.yml — At the root level of your project, you’ll find a straightforward YAML-based SQLMesh configuration file. This file is concise and primarily focuses on setting up the gateway(s) and defining the default SQL dialect for models.
Audits — Validating data is an essential capability that should give both the producer and the consumer of the data the confidence that the data fulfills the expectations. A capability that dbt also offers under the name “data test”.
Macros — Another feature now considered standard is the ability to enhance static SQL statements with programming-like functions, such as if statements, for loops, and filter expressions. This functionality is typically achieved through a macro system, as seen in dbt with Jinja, a popular Python-based templating engine. SQLMesh also supports Jinja but goes a step further by offering its own macro system. Unlike simple string replacement, it interprets SQL semantics, making it a more powerful and flexible tool. In my opinion, this added capability is worthwhile not just for its elegant syntax but also for its robust functionality.
Models — What sets SQLMesh apart in defining data models is how metadata is managed. Instead of using a separate YAML file, the metadata is defined directly within the SQL code. During my testing, I found this approach particularly intuitive and developer-friendly, making the modeling process smoother and more efficient.
Seeds — It might not be a frequent occurrence, but we all know it’s bound to happen eventually: integrating CSV or JSON-based datasets. To address this, SQLMesh, like dbt, includes support for seeds. In SQLMesh, the dataset file is stored in the designated folder, and the corresponding model is then defined in the Models folder.
Tests — I wholeheartedly appreciate SQLMesh’s approach to implementing unit tests for data modeling. Rather than traditional auditing, it focuses on a straightforward input-versus-output check that can be executed quickly and easily without significant dependencies. This method not only accelerates the feedback process but also equips data engineers with a practical tool to adopt concepts like test-first development in the context of data. Since version 1.8 dbt also offers “unit-tests”.

The workflow

This is where another strength of SQLMesh truly shines. The developers have tackled a critical challenge in dataset development: managing multiple versions of a data model.

SQLMesh introduces the concept of Virtual Data Environments, which operates on two levels: physical and virtual.

(source: https://tobikodata.com/virtual-data-environments.html)

At the physical level, when a new project begins and the plan is executed for the first time, a snapshot (version 1) is created for each model. After all tests and audits are successfully completed, the virtual level comes into play. A view is created for the respective environment, pointing to the latest snapshot. This approach ensures effective version management and seamless updates.

# sqlmesh plan [environment name]
# if you provide no name it will 
# by default use "prod" as environment.
sqlmesh plan

When explicitly specifying an environment, such as “dev”, the process will create the virtual layer for dev and will point the view on the latest snapshot version of the model.

sqlmesh plan dev

When a change is made to a model, SQLMesh employs a sophisticated fingerprinting system that not only analyzes the model itself but also the entire dependency tree. This ensures that all affected models are validated. If the change is first applied in the ‘dev’ environment, a new snapshot version is created after successful testing and auditing, and the view pointer in the ‘dev’ environment is updated to the new version.

In the event of an operational error, a rollback to the previous snapshot version can be easily performed with a single command, without the need to reprocess data. That’s ingenious!

sqlmesh rollback

If you’re wondering what happens to old snapshots that are no longer in use, SQLMesh has a solution: a garbage collector called Janitor. This process runs automatically with every plan execution or can be triggered manually via a CLI command when needed.

sqlmesh janitor

Conclusion

To enhance the quality of data engineering, it’s essential to adopt established software engineering practices and principles. SQLMesh is making significant strides in this direction, introducing innovative concepts like the Virtual Data Environment. I also appreciate its focus on improving testing efficiency through its unit test approach. In practice, I found it particularly convenient to define mocks in YAML format and validate the logic of SQL models quickly and effectively, without heavy dependencies.

However, like any new contender entering an established market, SQLMesh must prove itself against the industry leader, dbt. While I believe SQLMesh has strong potential to carve out a solid position, it remains to be seen whether dbt users with extensive usage will transition to SQLMesh. That said, I recommend taking a closer look at SQLMesh’s concepts and evaluating the value they could bring to your organization.

At Data Max, we specialize in the development and operation of advanced data and AI systems. We’re here to support you in evaluating new tools and concepts, helping businesses of all sizes enhance their data literacy and drive continuous improvement.

Get started with SQLMesh

The setup

The scaffold

The workflow

Conclusion

Recent Posts

Comments

Services

Sectors

Topics