Power Query: Fuzzy Fixing Part 5
18 December 2024
Welcome to our Power Query blog. This week, I head to Power Query Online to see the fuzzy options available in the user interface.
Having looked at the options available for fuzzy column matching in the last few blogs, this week, I will look at what is available in Power Query Online. My last blog on Power Query Online was last year and some new functionality has been added since then. I am accessing the query Country Data, which I am going to use to show the fuzzy functionality available. On the ‘Add Column’ tab, there is a new icon:
I am going to investigate how I may use ‘Cluster values’. Clicking on the button invokes a dialog:
The information icon at the top of the screen takes me to the Microsoft help pages. I am required to ‘Specify the column to create the clusters from.’. I will use Country Code. I am then prompted to enter the name of the new column is I wish, and there is a dropdown for me to specify the ‘Fuzzy cluster options’:
These options are familiar; they are the same ones I encountered when using Table.AddFuzzyClusterColumn(). However, here I may choose the options via the user interface. I call the new column FuzzyCountry and accept the default values.
I have dragged the new column next to Country Code for comparison. The help for ‘Cluster values’ is to ‘Create a new column that contains canonical values clustering similar values in the selected column’. The results are not what I might expect, but let’s have a look at the M code generated:
Table.AddFuzzyClusterColumn(#"Changed column type 2", "Country Code", "FuzzyCountry", [IgnoreCase = true, IgnoreSpace = true])
It is very similar to the M code I started with in Part 1. However, it is much easier to change the settings. I click on the cog (gear) icon next to the ‘Clustered values’ step:
I have opted to ‘Show similarity scores’ and I’ve raised the ‘Similarity threshold’ to 0.9:
I have dragged the new column across again. Not having the option to name the similarity data column has led to the catchy default name Country Code_FuzzyCountry_Similarity!
Let’s look again at the M code generated:
Table.AddFuzzyClusterColumn(#"Changed column type 2", "Country Code", "FuzzyCountry", [IgnoreCase = true, IgnoreSpace = true, SimilarityColumnName = "Country Code_FuzzyCountry_Similarity", Threshold = 0.9])
This is an easier way to create the code: hopefully, it will be added to the desktop version of Power Query soon.
Next time, I will look at more fuzzy functionality that is available in Power Query Online.
Come back next time for more ways to use Power Query!