Power Query: Combining Forces
24 July 2019
Welcome to our Power Query blog. Today, I look at a way to reduce overheads when combining numerous files.
I looked at combining files from a folder in Power Query: One Folder, One Query. In that query, I used the Graphical User Interface (GUI) functions available to combine files.
data:image/s3,"s3://crabby-images/7cc06/7cc069f24a77bbda717fa6d8bd52bd3c7eadd982" alt=""
To get to this point, I used the ‘From Folder’ option which is in the dropdown from ‘From File’, located in the ‘New Query’ dropdown in the ‘Get & Transform’ section of the ‘Data’ tab in Excel.
However, there is another path I can take if I am concerned about the overhead of combining lots of files from a folder. Instead of combining the files, I can choose to ‘Transform Data’.
data:image/s3,"s3://crabby-images/f9225/f9225f0373d736c707778f297f29a2cee180c53e" alt=""
I don’t need the columns of my data at this point; all I want is to keep the Content column. I select this column and right click to remove the other columns. Now I have a single column, I choose to ‘Drill Down'.
data:image/s3,"s3://crabby-images/26a16/26a1636f4e5ed16c9622451cbcc4158be967068c" alt=""
This gives me a list of files:
data:image/s3,"s3://crabby-images/6cc76/6cc7656b99ce58ca2fa6aa23aa25908be2c6e8aa" alt=""
This is useful to me because it means I can use an alternative method to combine my files, namely Table.Combine():
Table.Combine(tables as list, optional columns as any) as table
This returns a table that is the result of merging a list of tables. The resulting table will have a row type structure defined by columns or by a union of the input types if columns is not specified.
Before I combine my files, there is another step I can take, using the M function List.Transform():
List.Transform(list as list, transformation as function) as list
This performs the transformation on each item in the list and returns the new list.
The particular transformation I want to use is Binary.Buffer:
Binary.Buffer(binary as nullable binary) as nullable binary
This buffers the binary value in memory. The result of this call is a stable binary value, which means it will have a deterministic length and order of bytes.
What I am doing is to make the storage of my binary files more efficient. I can apply the required M code to the formula bar.
data:image/s3,"s3://crabby-images/976d9/976d9560e24dd806f224918b89651ec7c818399d" alt=""
This has no impact visually, since the function is working on the way the binaries are stored in memory.
I now have a list of binaries which are stored efficiently. In order to use Table.Combine(), I need a list of tables. The easiest way to achieve this is to create a transformation that List.Transform() can use. I do this by creating a function. I save my current query as ‘Combining_Files’ and create a new blank query.
My function is a version of Csv.Document() which receives a parameter:
Csv.Document(source as any, optional columns as any, optional delimiter as any, optional extraValues as nullable number, optional encoding as nullable number) as table
data:image/s3,"s3://crabby-images/445dd/445dd4e46be05848ff051e39150d4e4570b9f875" alt=""
The M code I have created is:
=(salesfile as binary) => Csv.Document(salesfile, [Encoding=1252])
The “1252” encoding is explicitly requiring single-byte characters that are the defaults for English and other Western languages’ versions of Microsoft Windows.
Having created my function, I can apply it to Combining_Files.
data:image/s3,"s3://crabby-images/631c7/631c7103881a4aaf59a2e81e9b3ea6b5f2c2ad67" alt=""
Once I click to execute this step, my column is transformed:
data:image/s3,"s3://crabby-images/ef0fc/ef0fcd26d9bc0c0879888d90c7b6de060d91e886" alt=""
I now have a list of tables, and I can use Table.Combine().
data:image/s3,"s3://crabby-images/46783/46783a6edd3ff8723a9917cd7e6d78e9925f5373" alt=""
When I execute this step, my files are combined.
data:image/s3,"s3://crabby-images/cfc2b/cfc2b36ab32c7975e419cfb087461146320bb9b5" alt=""
I can now transform my data. I can now apply the usual transformations to remove blank rows / headers and fill down where necessary.
Come back next time for more ways to use Power Query!