Power Query: PDF Pandemonium – Part 2
22 September 2021
Welcome to our Power Query blog. This week, I start to transform some data from a PDF file.
The tent business is doing well, and the UK division have plans to expand the workforce. I have a PDF file, and it contains tables for 10 stores. Last week, I imported my data, viz.
data:image/s3,"s3://crabby-images/c32e6/c32e68104054ac73fccf5e02ad8cd88f9f7ea0a5" alt=""
The key to making my transformations as immune to change as possible is to keep the data I need rather than delete the data I don’t. Looking at the columns, the easiest way to see if there is any useful data in there is to use the filter icon; Column1 is clearly very useful.
data:image/s3,"s3://crabby-images/06b63/06b636e348bfb1d2914e9f36ea2d7f7d6abbdfa4" alt=""
However, Column8 is not:
data:image/s3,"s3://crabby-images/89d14/89d14d22a741a8bb43ef6741a6339804dc384f4c" alt=""
However, rather than delete Column8, I should keep what I need. On the Home tab, there is an option to ‘Choose Columns’:
data:image/s3,"s3://crabby-images/a8bad/a8bad5afa90c054d9824d22d20703540e7958f69" alt=""
I can use this to specify columns I want to keep. It’s much easier than selecting them all for large tables!
data:image/s3,"s3://crabby-images/2da8d/2da8d0dc763c7b7047c67f3081093f5172994209" alt=""
I choose to select the first seven [7] columns.
data:image/s3,"s3://crabby-images/3d020/3d0206353693140a0dae702f9d2e88a3ee42b77b" alt=""
I can see that the heading data from the tables is in Column1, which suggests that transposing my data would be useful. I can do this from the Transform tab.
data:image/s3,"s3://crabby-images/c6eff/c6effd6622688b6551f7e3646cea3f8175f4fad9" alt=""
This swaps the rows and the columns and is much closer to the format I want to see.
data:image/s3,"s3://crabby-images/4b9d3/4b9d330dbefa854955ce8393d1b66bd1d915779f" alt=""
I can check the data in my columns again to see which ones I want to keep. However, it is clear that this time the column names will change with the extra text that is present in my source data.
data:image/s3,"s3://crabby-images/d635d/d635de9316e0335d6cbf3ca969dcb82854cedc2f" alt=""
Before I decide which columns to keep, I need some way of identifying them. I will promote the first column to the column headings, which I can do from the Transform Tab.
data:image/s3,"s3://crabby-images/0ec48/0ec4880942c942d633da3fb3a85963663f041f0d" alt=""
I choose ‘Use First Row as Headers’:
data:image/s3,"s3://crabby-images/54be4/54be44da1a917b7910528510cc286f38a22d918b" alt=""
Power Query has created a ‘Changed Type’ step, but this references column names, so I delete it. I can pick the columns I want to keep in the same way as I did earlier.
data:image/s3,"s3://crabby-images/af07b/af07bd42942c0d4dbf9bd6d557df2546934b9389" alt=""
I have the data I want to keep, but there are two tables in here: the store data and the pay scales.
data:image/s3,"s3://crabby-images/4a72e/4a72e3cff4715640bb52506c751633e0ba59a3f9" alt=""
I can keep this query, which I will call All Data, and make Reference queries: one for the store table and one for the pay scales table. I can create reference queries from the ‘Home’ tab.
data:image/s3,"s3://crabby-images/f2942/f294243afd3f55868efe32b741d78b7a97bb8d32" alt=""
I described the benefits of using reference queries in the blog Reliable References. I call this Reference Query Pay Scales.
data:image/s3,"s3://crabby-images/e88bc/e88bc745d5b2ad08431dd7435dd2e00ecfe2062c" alt=""
I also create another Reference Query, Stores. I will transform Pay Scales next time…
Come back next time for more ways to use Power Query!