Disclaimer!
This blog post contains my thoughts surrounding using prime numbers for performance optimisation. I am not advocating you do this, nor am I claiming that I have all the answers. I have also not done any extensive testing outside the examples shown in this article.
My thought process
Lower cardinality (number of unique rows) and value encoding can help improve the performance of our DAX measures
Natural numbers can be expressed as a product of a prime number, i.e., 1 = 2⁰ or 2 = 2¹ or 8 = 2³ or 12 = 3² + 3¹
Therefore, instead of a fact table with a single "value" column of natural numbers, a fact table could be created with 2 columns: 1 containing prime numbers, and 1 containing the power
The cardinality of the prime numbers column in the fact table will typically be lower than the cardinality of the single "value" column, i.e., the values 1, 2, 3, 4, 8, 9, and 16, can all be achieved using just prime numbers 2 and 3 plus the required power
The cardinality of the power column will typically be low, i.e., 2²⁰ = 1,048,576 and 3²⁰ = 3,486,784,401, so the number 20 already gets us above values of 1m when the lowest prime number is used
The number of rows in the fact table with the prime numbers will never be lower than the fact table with a single "value" column (and typically will be higher).
Therefore, where we have natural numbers which we only want to perform sums on, we can trade off increasing rows in our fact table for typically reducing cardinality and increasing the probability of value encoding
Benefit
The below figure shows the cardinality of a fact table with a single "value" and the cardinality of a fact table with the single "value" broken into the prime number and power:
The prime number table is significantly smaller (8% of the original value fact table in total size) and the prime number column is value encoded.
Challenges
1) Converting results to prime numbers
The "value" has to be converted into the prime number and power. For simple examples this is easy, i.e., 4 = 2². However, this quickly gets more complicated, i.e., 28 = 3³ + 2⁰. There are also many ways of calculating certain numbers, i.e., 28 = 2⁴ + 3² + 3¹.
2) The number of rows will likely increase
In the above example, 28 can be contained in a single row in the "value" fact table. The most efficient conversation of 28 from a number of rows perspective is 3³ + 2⁰; this results in 2 rows in the prime number fact table.
3) Callback
The DAX using the prime numbers method creates a callback:
However, the DAX using multiplication does not create a callback:
Conclusion
This approach could optimise sum calculations for natural numbers; cardinality is likely to be lower and value encoding is more likely. On the downside, the number of rows will likely increase.
There are, however, some fundamental hurdles:
An optimised process (Fabric notebook?) for converting from the "value" to the prime numbers and power
Storage engine to accept power calculations of a column as well as multiplication with other columns without requiring a callback
Potentially applications include using credits and debits as natural numbers, these:
do not contain negatives as a negative credit is a debit
are sum calculations
There are also potential applications for when we need a high level of precision, i.e., 20 decimal places. In these cases, decimals could be converted to natural numbers using multiplication and back again using division.
As mentioned at the beginning, I don't have all the answers. I am, however, interested in any comments you may have!
Comments