Power Query: Combining Forces
24 July 2019
Welcome to our Power Query blog. Today, I look at a way to reduce overheads when combining numerous files.
I looked at combining files from a folder in Power Query: One Folder, One Query. In that query, I used the Graphical User Interface (GUI) functions available to combine files.
To get to this point, I used the ‘From Folder’ option which is in the dropdown from ‘From File’, located in the ‘New Query’ dropdown in the ‘Get & Transform’ section of the ‘Data’ tab in Excel.
However, there is another path I can take if I am concerned about the overhead of combining lots of files from a folder. Instead of combining the files, I can choose to ‘Transform Data’.
I don’t need the columns of my data at this point; all I want is to keep the Content column. I select this column and right click to remove the other columns. Now I have a single column, I choose to ‘Drill Down'.
This gives me a list of files:
This is useful to me because it means I can use an alternative method to combine my files, namely Table.Combine():
Table.Combine(tables as list, optional columns as any) as table
This returns a table that is the result of merging a list of tables. The resulting table will have a row type structure defined by columns or by a union of the input types if columns is not specified.
Before I combine my files, there is another step I can take, using the M function List.Transform():
List.Transform(list as list, transformation as function) as list
This performs the transformation on each item in the list and returns the new list.
The particular transformation I want to use is Binary.Buffer:
Binary.Buffer(binary as nullable binary) as nullable binary
This buffers the binary value in memory. The result of this call is a stable binary value, which means it will have a deterministic length and order of bytes.
What I am doing is to make the storage of my binary files more efficient. I can apply the required M code to the formula bar.
This has no impact visually, since the function is working on the way the binaries are stored in memory.
I now have a list of binaries which are stored efficiently. In order to use Table.Combine(), I need a list of tables. The easiest way to achieve this is to create a transformation that List.Transform() can use. I do this by creating a function. I save my current query as ‘Combining_Files’ and create a new blank query.
My function is a version of Csv.Document() which receives a parameter:
Csv.Document(source as any, optional columns as any, optional delimiter as any, optional extraValues as nullable number, optional encoding as nullable number) as table
The M code I have created is:
=(salesfile as binary) => Csv.Document(salesfile, [Encoding=1252])
The “1252” encoding is explicitly requiring single-byte characters that are the defaults for English and other Western languages’ versions of Microsoft Windows.
Having created my function, I can apply it to Combining_Files.
Once I click to execute this step, my column is transformed:
I now have a list of tables, and I can use Table.Combine().
When I execute this step, my files are combined.
I can now transform my data. I can now apply the usual transformations to remove blank rows / headers and fill down where necessary.
Come back next time for more ways to use Power Query!