Power Query: Project Population – Part 5
17 May 2023
Welcome to our Power Query blog. This week, I complete the first query that I downloaded from a public source.
I have found some information on population growth provided by The World Bank, which I am using as an example of how to transform real-life data.
I have been transforming the data, and last week, I created and tidied the Latest population census column.
I also kept the Population census notes, which I need to tidy up. To begin, I move the column after Latest population census. I can do this by selecting the column and dragging it to the end, or by right-clicking and choosing to Move ‘To End’:
I examine the contents of the column by looking at the values on the Filter dialog:
I need to remove the full stops (.) from the beginning of the some of the values. I can do this by using ‘Replace Values’ from the right-click menu:
When I click OK, my data looks better, but I need to remove the leading spaces:
I can do this by using ‘Trim’ from the Format dropdown on the Transform tab:
My data looks better. However, the recent steps are a little ambiguous:
Renaming them will make it easier to see what I have done:
Moving on to the National accounts base year column, a quick look at the data might suggest that I use the same approach that I used for Latest population census, where I converted the column to a number to easily identify any values that are not years in order to remove them.
However, if I view the values on the Filter dialog, I see that there are combined year values, which cannot be expressed as a number:
In order to preserve this data, I am going to use a different approach. If the first four [4] characters are not numeric, then I want to replace the current value with null. I go to the ‘Add Column’ tab, where I choose to Extract ‘First Characters’:
This takes me to a dialog where I can select the first four [4] characters. Note I cannot choose the name of the new column, but as it will be temporary, this doesn’t matter here.
This gives me a new column First Characters containing the first year, which I can change to data type ‘Whole number’:
This has identified the values that did not contain a year as errors. I can right-click and choose to ‘Replace Errors’. I shall replace them with null:
Now I can add a ‘Conditional Column’ from the ‘Add Column’ tab:
In the new column name, I have capitalised ‘Accounts’ so that I am not trying to use the same name as the existing column. If the temporary column First Characters is null (meaning that National accounts base year did not have a year in the value), then the new column is set to null, otherwise it will have the same value as National accounts base year.
This gives me a new column, and I can delete National accounts base year and First Characters.
I tidy up by changing the new column to data type Text, and rename the steps.
Now that all the columns in this query have consistent data, next time, I will look at the data that I can add from the other queries I imported.
Come back next time for more ways to use Power Query!