![]() ![]() TL DR on primary key support across warehouses īigQuery + Databricks don’t support primary keys, Redshift + Snowflake support primary keys but don’t enforce them fully, and Postgres fully supports + enforces primary keys. Let’s walk through primary key support + access across the major cloud data warehouse platforms. Does your warehouse support primary keys? ĭoes your warehouse even support primary keys at all? If it does, how can you actually find out if a table has a primary key set, and what that primary key is? ![]() Having tests configured and running in production using the dbt test command unlocks your ability to do things like send Slack alerts on test failures, so you’ll be the first to know when PK issues arise. It really is as simple as adding those two tests to the primary keys of all of your tables, and then you have a built-in safeguard against bad data in your primary keys. Put together, a not_null + unique test would look like so: yml configuration files, so you can define a batch of tests across models from one file. These tests are specified inline in your models’. Not surprisingly, these two tests correspond to the two most common errors found on your primary keys, and are usually the first tests that teams testing data with dbt implement: Today, you can add two simple dbt tests onto your primary keys and feel secure that you are going to catch the vast majority of problems in your data. This has lead to a lot of unnecessary angst and loss of trust in data. In the days before testing your data was commonplace, you often found out that you had issues in your primary keys by you (or worse, your boss) noticing that a report was off. There are rows where the primary key is not unique (duplicate values)Īs you’ll see below in the “warehouse support for PKs” section, while some warehouses allow you to define primary keys, but will not enforce either not-nullness or uniqueness on values.There are rows where the primary key is null.See, most of the time that primary keys get messed up, they do so because: This is, of course, what makes your primary keys so powerful. It’s one of the most common causes of time-sensitive headaches in analytic-land. Invalid data getting into your primary keys is one of the biggest data problems - it can lead to rows getting dropped or miscounted and for all kinds of weird results to pop into your data. ![]() Why do we test primary keys? īut what happens when blank or duplicate data finds its way into your primary keys? As I mentioned up top, it can create quite a firedrill. There is just about no more ironclad law in the land of analytics than you must have a primary key on every table. Without a primary key, you’ll find yourself constantly struggling to identify duplicate rows and define the expected grain of your tables. Primary keys are critical to data modeling. A primary key is a column in your database that exists to uniquely identify a single row. ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |