Referential Integrity
- Chris Barber

- Oct 12
Referential integrity issues are a common problem encountered when data modelling, and they can impact understandability, performance, reliability, scalability, extendibility and security. This article covers:
What is referential integrity
How referential integrity affects our non-functional requirements
How to identify referential integrity issues
What is Referential Integrity
Referential integrity is defined by Kimball and Ross as:
When all the keys in the fact table correctly match their respective primary keys in the corresponding dimension tables, the tables satisfy referential integrity.
The Data Warehouse Toolkit, 3rd edition
As such, when keys do not match, the model is said to have referential integrity issues. These issues fall into two types:
Missing surrogate keys
Missing foreign keys
In both cases, Power BI doesn't stop you from building calculations or visuals; instead, it compensates by showing any values that have no matching surrogate key or foreign key against a blank line. This means that totals still add up, but some figures cannot be explained by the model. The Hospital example below illustrates both issues:
Hospital Example: Referential Integrity

Issue 1: Missing Surrogate Keys
The Hospital dimension is missing a surrogate key: the key 100010 appears in the "Hospital ID" foreign key of the fact table, but not in the "Hospital ID" surrogate key of the dimension. Therefore, when analysing any measure in the fact by any hospital attribute, there will be unexplained values. For instance, if evaluating by "Hospital Name", some values will show against "Stratford" and the remainder will show against a blank line.
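The blank-line behaviour for Issue 1 can be sketched in Python. This is a simplified model of the grouping, not Power BI's engine: it assumes the dimension is a dict keyed by "Hospital ID" and the fact table is a list of dicts. Apart from the key 100010 and the name "Stratford", the keys and admission counts are hypothetical.

```python
from collections import defaultdict

BLANK = "(Blank)"  # stands in for the blank line (Power BI's blank row)

def totals_by_attribute(fact_rows, dimension, key_col, attr, measure):
    """Sum `measure` per dimension attribute using left-outer-join semantics.

    Fact rows whose key has no match in `dimension` are kept and grouped
    under BLANK, so the grand total still equals the fact total.
    """
    totals = defaultdict(int)
    for row in fact_rows:
        member = dimension.get(row[key_col])  # None when the key is missing
        label = member[attr] if member is not None else BLANK
        totals[label] += row[measure]
    return dict(totals)

# Hypothetical data: 100001 is an assumed key for "Stratford";
# 100010 appears in the fact but not in the dimension (Issue 1).
hospital_dim = {100001: {"Hospital Name": "Stratford"}}
admissions_fact = [
    {"Hospital ID": 100001, "Admissions": 5},
    {"Hospital ID": 100010, "Admissions": 3},
    {"Hospital ID": 100001, "Admissions": 2},
]

report = totals_by_attribute(admissions_fact, hospital_dim,
                             "Hospital ID", "Hospital Name", "Admissions")
print(report)  # {'Stratford': 7, '(Blank)': 3}
```

The grand total (10) still matches the fact table, but 3 admissions cannot be attributed to any hospital. Adding a row for "Hospital ID" 100010 to the dimension would move them from the blank line to that hospital.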
Issue 2: Missing Foreign Keys
The Hospital fact table is missing foreign keys: the Department dimension is complete, but some fact rows have a blank "Department ID" foreign key, so they cannot be matched to any department. Therefore, when analysing any measure in the fact by any department attribute, there will be unexplained values. For instance, if evaluating by "Department Name", some values will show against "Maternity" and "Dermatology", but the remainder will show against a blank line.
How Referential Integrity Affects our Non-Functional Requirements
Understandability
Business users see figures in their reports that may be incomplete, and they will not understand what the blank values mean or why the total doesn't equal the sum of the known dimension members.
Performance
Power BI will not perform as well when there are referential integrity issues. The queries the engine generates use left outer joins rather than inner joins, and outer joins are less performant.
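An inner join is only safe when every fact key has a match; otherwise it silently drops the unmatched fact rows, which is why the slower outer join is needed. A minimal Python sketch of the two join semantics, using hypothetical rows (only the missing key 100010 comes from the Hospital example):

```python
def join_totals(fact_rows, dimension_keys, key_col, measure):
    """Return (inner_join_total, left_outer_join_total) for one measure."""
    keys = set(dimension_keys)
    # Inner join: only fact rows with a matching dimension key survive.
    inner = sum(row[measure] for row in fact_rows if row[key_col] in keys)
    # Left outer join: every fact row survives, matched or not.
    outer = sum(row[measure] for row in fact_rows)
    return inner, outer

# Hypothetical data: 100010 is missing from the dimension.
fact = [
    {"Hospital ID": 100001, "Admissions": 5},
    {"Hospital ID": 100010, "Admissions": 3},
]
print(join_totals(fact, [100001], "Hospital ID", "Admissions"))  # (5, 8)
```

The two totals only agree once every fact key has a match, which is the condition under which an inner join would return the same answer.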
Reliability
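Figures may disappear from the blank line and reappear against a dimension member when missing keys are added, so the same report can show different figures from one refresh to the next.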
Scalability and Extendibility
Any attempt to scale or extend a model with referential integrity issues exacerbates the situation. For instance, adding more rows to the fact table when existing rows already have issues, or adding columns to dimensions that already have referential integrity gaps, just leads to more integrity issues and more confusion for end users.
Security
From a security perspective, if your security table has referential integrity issues, users won't be able to access their figures. It is therefore critical that there are no issues on the tables you use to propagate security throughout your model.
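A minimal sketch of why gaps on a security path hide data. It assumes a common pattern where a security table maps users to "Hospital ID" values and filters the Hospital dimension, which in turn filters the fact through its relationship. The user names, keys and figures are hypothetical, and this models the filter propagation only, not Power BI's row-level security implementation.

```python
def visible_rows(user, security, dimension_keys, fact_rows, key_col):
    """Return the fact rows a user can see when security filters via the dimension.

    A fact row is reachable only if its key is granted to the user AND exists
    in the dimension; a granted key missing from the dimension cannot pass
    the filter on to the fact.
    """
    granted = {key for who, key in security if who == user}
    reachable = granted & set(dimension_keys)
    return [row for row in fact_rows if row[key_col] in reachable]

# Hypothetical security table: (user, Hospital ID) pairs.
security = [("alice", 100001), ("bob", 100010)]
hospital_keys = [100001]  # 100010 is missing from the dimension
fact = [
    {"Hospital ID": 100001, "Admissions": 5},
    {"Hospital ID": 100010, "Admissions": 3},
]

print(len(visible_rows("alice", security, hospital_keys, fact, "Hospital ID")))  # 1
print(len(visible_rows("bob", security, hospital_keys, fact, "Hospital ID")))    # 0
```

In this sketch "bob" is granted hospital 100010 yet sees nothing, because the missing dimension key breaks the path the security filter travels along.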
How to spot referential integrity issues
DAX Studio, a free third-party external tool for Power BI, can analyse the model, and its "View Metrics" feature flags referential integrity issues. It is strongly recommended that you run this check before releasing any model to users.
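A similar check can also be run against the source data before it reaches Power BI. The sketch below is not how DAX Studio computes its metrics; it assumes each relationship is described as a fact key column plus the dimension's surrogate keys, with None standing in for a blank key, and the data is hypothetical.

```python
def ri_report(fact_rows, relationships):
    """Count referential integrity issues per relationship.

    relationships: {fact key column: iterable of dimension surrogate keys}
    Returns {column: {"missing_surrogate_keys": [...], "blank_foreign_keys": n}}.
    """
    report = {}
    for key_col, dim_keys in relationships.items():
        dim = set(dim_keys)
        values = [row.get(key_col) for row in fact_rows]
        report[key_col] = {
            # Issue 1: fact keys with no matching dimension row.
            "missing_surrogate_keys": sorted(
                {v for v in values if v is not None and v not in dim}
            ),
            # Issue 2: fact rows with no foreign key at all.
            "blank_foreign_keys": sum(1 for v in values if v is None),
        }
    return report

fact = [
    {"Hospital ID": 100001, "Department ID": 1},
    {"Hospital ID": 100010, "Department ID": None},
    {"Hospital ID": 100001, "Department ID": 2},
]
issues = ri_report(fact, {"Hospital ID": [100001], "Department ID": [1, 2]})

# A simple release gate: the model is clean only if no relationship has issues.
clean = all(not r["missing_surrogate_keys"] and r["blank_foreign_keys"] == 0
            for r in issues.values())
print(clean)  # False: 100010 is unmatched and one Department ID is blank
```

Running a check like this in the load pipeline catches both issue types before a refresh, while DAX Studio confirms the state of the model itself.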
Conclusion
Failing to ensure referential integrity is one of the biggest mistakes data modellers, particularly beginners, make when building Power BI reports. It is best to tackle these issues head on and resolve them before going into a production environment.