Power Query: Intelligent Folding
26 May 2021
Welcome to our Power Query blog. This week, I look at how Power BI Desktop uses query folding in the Power Query engine.
Last time, I looked at what query folding in Power Query is and how it can make queries more efficient. This time, I wish to consider some of the circumstances where Power BI Desktop needs query folding to take place. The requirements for query folding depend upon the storage mode for the table linked to the query.
The storage mode describes how Power BI caches the data. It can be viewed or set from the Model view in Power BI Desktop.
For example, in the image below, I have the table ‘Customers Access’, which I have created by importing it from an Access database. Whilst I can view the current storage mode by hovering over the table, the option to potentially change it is hidden in the Properties pane under the Advanced options.
I can see that the options for Storage are Import, DirectQuery and Dual:
Microsoft describes the three options as follows:
- Import: imported tables with this setting are cached. Queries submitted to the Power BI dataset that return data from Import tables can be fulfilled only from cached data
- DirectQuery: tables with this setting aren't cached. Queries that you submit to the Power BI dataset (for example, DAX queries) and that return data from DirectQuery tables can be fulfilled only by executing on-demand queries to the data source. Queries that you submit to the data source use the query language for that data source, for example, SQL
- Dual: tables with this setting can act as either cached or not cached, depending upon the context of the query that's submitted to the Power BI dataset. In some cases, you fulfil queries from cached data. In other cases, you fulfil queries by executing an on-demand query to the data source.
The reason that the other options appear greyed out for the ‘Customers Access’ query is because once set to Import, a table cannot be changed to DirectQuery or Dual. DirectQuery and Dual are only available for databases where Power Query supports the native database query. MS Access is supported for query folding, but you are unable to access the database using a native query.
If I have a table that is DirectQuery or Dual, then the associated Power Query query must achieve query folding. This means that the whole query can be defined by one SQL statement.
Queries are most commonly in Import mode, and the recommendations from last week’s blog apply here. Ideally the whole query should be foldable.
If there is a step which is preventing query folding, then there are a couple of approaches that may be taken. It may be possible to transform the data in the original data source so that the step is not needed. If this is not possible, that step could be moved to a native query. There are some limitations to this according to Microsoft:
- for a DirectQuery model table, the query must be a SELECT statement, and it can't use Common Table Expressions (CTEs) or a stored procedure
- incremental refresh cannot use a native SQL query; it would force the Power Query mashup engine to retrieve all source rows, and then apply filters to determine incremental changes.
Native SQL queries have the potential to change or delete data. In order to prevent this, the account running the query should only have read access.
In summary, when using relational databases as a source for Power BI then query folding is desired, but for DirectQuery and Dual mode, it is required.
Come back next time for more ways to use Power Query!