Power Query: Project Population – Part 7
7 June 2023
Welcome to our Power Query blog. This week, I continue to prepare my second query from a public data source.
I have found some information on population growth provided by The World Bank, which I am using as an example of how to transform real-life data.
I have been transforming the data, and last time, I started to transform the Country-Series query. This query gives me information about the source of the population data in the Country query, and I want to have a single description of the source for each country. I have removed any duplicate entries:
However, there are still multiple descriptions for some countries. Last time, I decided to use a trick to combine the text. I used ‘Group By’ which is available on the Home tab, and here, on the Transform tab:
I wanted to ‘Group By’ CountryCode, and concatenate the DESCRIPTION rows. However, since I had both columns selected, the dialog was in ‘Advanced’ mode, and prompting me to ‘Group By’ CountryCode and DESCRIPTION. I will sort this out later.
I created a new column, Description, where I summed the DESCRIPTION rows:
Unsurprisingly, Power Query was not happy about summing a text value:
Let’s start by having a look at the M code that has been generated:
= Table.Group(#"Removed Duplicates",
{"CountryCode", "DESCRIPTION"},
{{"Description.1", each List.Sum([DESCRIPTION]), type nullable
text}})
The part of this code that is trying to ‘sum’ the text is:
List.Sum([DESCRIPTION])
I need to replace this with a way of concatenating the text instead.
We can see how Power Query would do this by combining two [2] columns of text on the ‘Add Column’ tab:
This generates a step:
If I look at the M code for this step, I see the function I need to use:
= Table.AddColumn(#"Grouped Rows", "Merged",
each Text.Combine({[CountryCode], [DESCRIPTION]}, ""), type text)
The section of the code that combines the text is:
Text.Combine({[CountryCode],
[DESCRIPTION]}, "")
This means that I need to use the function Text.Combine() instead of List.Sum() in the ‘Grouped Rows’ step. I can also delete the ‘Inserted Merged Column’ step. I change the Grouped Rows’ step to:
= Table.Group(#"Removed Duplicates",
{"CountryCode", "DESCRIPTION"},
{{"Description.1", each Text.Combine([DESCRIPTION]), type nullable
text}})
This gives me the desired result:
Now I have shown that the substitution works, I can amend the step to remove the DESCRIPTION column and rename Description.1 to Description:
This doesn’t quite look right; I need a delimiter. If I look at the Text.Combine() M code generated when I created the merged column:
Text.Combine({[CountryCode],
[DESCRIPTION]}, "")
I can see that I can pass another parameter to this function to define the delimiter. I change ‘Grouped Rows’ step from
= Table.Group(#"Removed Duplicates", {"CountryCode"},
{{"Description", each Text.Combine([DESCRIPTION]), type nullable
text}})
to
= Table.Group(#"Removed Duplicates",
{"CountryCode"}, {{"Description", each
Text.Combine([DESCRIPTION],", "), type nullable text}})
This will put a comma and a space ( , ) between my data sources.
The content of Description is too long. I change the name of the column created in the ‘Grouped Rows’ step to Data Sources and I replace the phrase ‘Data source:’ with an empty space by right-clicking on Data Sources and choosing ‘Replace Values’.
Now I am happy with the Country-Series query. Next time, I will link to this information from the Country query.
Come back next time for more ways to use Power Query!