Power Query: PDF Pandemonium – Part 5
13 October 2021
Welcome to our Power Query blog. This week, I continue transforming some data that is coming in from a PDF file, turning my attention this week to function development.
You are possibly over the tent business, but it continues to do well, and the UK division maintain their plans to expand the workforce. I have a PDF file, and it contains tables for 10 stores. Last week, I created a query which I am planning to convert to a function to transform the Stores data.
I am ready to create the transformations required and convert this query. I start by getting rid of any empty rows. I can do this by using ‘Remove Blank Rows’ from the Home tab:
This simplifies my table: for this store it is one row, but for other stores there may be more.
I need the store information to be a separate column. I can do this by employing the ‘Unpivot Column’ feature from the Transform tab.
I now have a column with the store data, and one for the pay scales and percentage increase of workforce. I will rename these columns later.
I need to keep in mind that this function needs to cope with the data for all the stores. This means I need to carry out some replacements:
- ‘All’ needs to be replaced by ‘A,B,C’
- ‘ and ‘ needs to be replaced by ‘,’
I can then use comma (,) as my delimiter to split up the pay scales. I begin with ‘Replace Values’ on the ‘Transform’ tab, or else by right-clicking:
First I replace ‘All’:
Then I replace ‘ and ‘:
My data is now ready to be split:
I am going to split the column into pay scales and increase percentage. I can do this by right-clicking, and choosing to ‘Split Column’ ‘By Non-Digit to Digit’:
This gives me an extra column.
Now I can split up the pay scales: I will do this using Comma as a Delimiter, and I will choose to ‘Split into Rows’ in the advanced options.
This gives me more rows:
I keep the automated ‘Changed Type’ step and rename the columns.
My query is ready to be converted to a function. I need to parameterise, so I view the M code in the Advanced Editor, available from the Home tab.
The only line I need to change is the Source step. I am going to introduce a parameter p_store, which will receive any of the store columns as a list. The M code before the ‘let’ statement will be:
(p_store as list) =>
and the Source step will change from:
Source = #"Store 1",
to:
Source = p_store,
As soon as I click ‘Done’, Power Query recognises that my query is now a function.
It is prompting me for a column (field). This means that this function will now create a table from any of the store columns in the Stores query.
I can test the function on Column1:
Invoking this query will give me a table:
This looks good – next time I will apply this to all the columns with store data in the Stores query.
Come back next time for more ways to use Power Query!