Power Query: Merging Matters
17 March 2021
Welcome to our Power Query blog. This week, I look at solving a problem with a nested join and the M function Table.Buffer().
It’s time for that evergreen tent data:
I want to get the descriptions in the same table as the charges data. In order to do this, I will be merging my data in a couple of different ways, and then I will look at a way to make this merge more efficient.
I start by extracting my tables to Power Query. I will use the charges data as an example, but the process is the same for descriptions.
I start by using ‘From Table / Range’ on the ‘Get & Transform’ section of the Data tab. I accept the usual defaults to create a new query.
To simplify the connection between my tables, I create an index column which I will call Table_Key.
I name my table Charges, and only create the connection for my table for now, so that I avoid loading data until I am ready.
I can now repeat the process for my Descriptions table.
The most obvious way to bring the data together is to merge the tables. The merge option is in the Home tab, and I choose to ‘Merge Queries as New’ as I wish to keep my queries.
I can then choose how to merge my tables.
This will create a new table with links from the Charges table to the matching row in the Descriptions table.
I can then expand the Description column to get the data I need.
This will give me the result I want, but I would like to investigate using a column to get the same result instead.
If I go back to my Charges query, I create a duplicate of the query and create a custom column to pull in the data I need from Descriptions.
I click ‘OK’ to view the results.
This custom column tries to point to the Descriptions table, but I am getting errors because although the syntax is fine, Power Query needs more help to get the field from the correct table. I need to go into the Advanced Editor and view the code to see what is happening.
The M code generated is:
Let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Order_Key", Int64.Type}, {"Order_Line_Number", Int64.Type}, {"Charge_Line_Number", Int64.Type}, {"Amount", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom",
each Table.SelectRows(Descriptions,
Value.Equals(Descriptions[Table_Key],[Table_Key])))
in
#"Added Custom"
I have separated the lines
each Table.SelectRows(Descriptions,
Value.Equals(Descriptions[Table_Key],[Table_Key])))
because I am going to add some more code to this.
The part that is confusing Power Query is ‘each’. In this context, each can only mean Charges, but I need it to consider Description as well. I will create two variables, eachCharge and eachDescription. I am going to create a nested join.
(eachCharge) =>
Table.SelectRows(Descriptions,
(eachDescription) =>
Value.Equals(eachDescription[Table_Key], eachCharge[Table_Key])))
The whole M code now looks like this:
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Order_Key", Int64.Type}, {"Order_Line_Number", Int64.Type}, {"Charge_Line_Number", Int64.Type}, {"Amount", Int64.Type}}),
#"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Changed Type", {{"Order_Line_Number", type text}, {"Order_Key", type text}, {"Charge_Line_Number", type text}}, "en-GB"),{"Order_Line_Number", "Order_Key", "Charge_Line_Number"},Combiner.CombineTextByDelimiter(":", QuoteStyle.None),"Table_Key"),
#"Added Custom" = Table.AddColumn(#"Merged Columns", "Description",
(eachCharge) =>
Table.SelectRows(Descriptions,
(eachDescription) =>
Value.Equals(eachDescription[Table_Key], eachCharge[Table_Key])))
in
#"Added Custom"
If I now apply these changes,
I have a column of tables which can be expanded to give me the description.
However, this is not evaluating quickly enough for my purposes. I have a large dataset, and this method is proving less efficient than the simple merge. Since I am working with flat files, and not a relational database, I can use Table.Buffer() as I have no query folding.
Table.Buffer(table as table) as table
This buffers a table in memory, isolating it from external changes during evaluation.
I am going to hold the contents of Descriptions in RAM, so that it is not affected by any external changes. To do this, I need to add a step, and I will do this in the Advanced Editor. The line I will add is before the changes I just made to the Added Custom step:
BufferedTable = Table.Buffer(Descriptions)
I need to change the reference to Descriptions to use BufferedTable instead:
BufferedTable = Table.Buffer(Descriptions),
#"Added Custom" = Table.AddColumn(#"Merged Columns", "Description",
(eachCharge) =>
Table.SelectRows(BufferedTable,
(eachDescription) =>
Value.Equals(eachDescription[Table_Key], eachCharge[Table_Key]))),
When I apply these changes,
the merge step evaluates more quickly. However, Table.Buffer() should be used with caution, especially with large datasets. The buffering uses RAM, and the amount of RAM that is held by Power Query is limited, which means that if the limits are exceeded, the RAM has to come from the hard drive, which could seriously slow things down. Therefore, Table.Buffer() should only be applied where necessary, and models using it should be tested before full scale implementation.
Come back next time for more ways to use Power Query!