Power Query: Blog List – Part 4
20 December 2023
Welcome to our Power Query blog. I would like to get a list of all the Power Query blogs I have posted. Today, I improve the query to extract data from all pages by extracting the last page number from the data.
I started writing this blog series way back in 2016:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image1-1699896183.png/e1645165afec7bc1dc2f103f40c1a81d.jpg)
There have been many developments since then, some of which I have reported upon. Since I am going to revisit more of the areas that have been improved, I am going to start with a reminder of what we have covered so far. In order to do this, I am going to use Power Query in Power BI, since there are some functions that will help me to access web data, which are not yet recognised in Power Query (Get & Transform) in Excel.
In Part 1, I noted that to get a full list, I am going to need several pages:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image2-1699896241.png/7202f0632f528ebd0cb326adfd57aef5.jpg)
I extracted and tidied up the data for the first page:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image3-1699896257.png/744556b4f2eaf5c62fc173ddc506d835.jpg)
In Part 2, I changed the Source step of my query to accept a parameter P_PageNumber, which allows me to get the data from a specific page. I used the following M code:
= Web.BrowserContents("https://www.sumproduct.com/blog/power-query-blogs?page="&Number.ToText(P_PageNumber))
I checked that this still gave the same results for the query:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image4-1699896368.png/6a6bb9d09e109cfac76aa6dc9b2741d7.jpg)
Last week, I converted this query into a function, and generated a list of page numbers to concatenate my data:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image5-1699896445.png/b5c3bc6bbd76ec49e6f871b327356acd.jpg)
This time, I start by going back to the function behind the query to review the step that I used to remove repetitive data from Description:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image6-1699896463.png/bbf36c89d5a03f9bb7658f949efa687b.jpg)
I remove the final step to see what is happening to the data in Description on the earlier rows in my combined query:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image7-1699896480.png/114992672da51d97ee965ef7ddfbba22.jpg)
Clearly, I haven’t always been as consistent! A better approach would be to replace any repetitive text with empty space. I can do this in the combined query, currently called Query1:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image8-1699896504.png/8d8666ca3475a4dd7192c969eca70566.jpg)
I could improve my query further by removing the hard coded page number from my list generation and instead, extract the number of pages from the data. To do this, I begin by making a Duplicate of Table1:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image9-1699896520.png/9213f8f6768e1f2e7a24dc1c41aa34e5.jpg)
I (optimistically) call my new query P_LastPage. I only want to keep the Source step:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image10-1699896538.png/99ae3d5387c202a1419afc9b1f0bdd1c.jpg)
I have highlighted the text I need, and I need the total number of pages from this:
<p>Page 1 of 37.</p>
Now, I need a way of extracting the number I want. My options at the moment appear to be limited:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image11-1699896630.png/cba87c1d30857c998feed6afda097155.jpg)
I want to look for text before the delimiter “<p>Page 1 of”, and after the delimiter “.</p>”. To access this functionality, I need to convert the text to a Table:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image12-1699896649.png/74b09ad546405ed43a71e47bdffb561b.jpg)
This makes the data harder to see, but I know the delimiters I need to use:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image13-1699896668.png/a607f050d0cbddc62f2313dac2d3b218.jpg)
On the Transform tab, I choose to Extract ‘Text Between Delimiters’:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image14-1699896685.png/07bca7f4c2e53edb0f4c230d8f3e83d9.jpg)
I enter the required delimiters in the dialog:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image15-1699896721.png/a3289d42eb0e8053b7b4f4e4ab2d5208.jpg)
This gives me the data I need, and I change the datatype to ‘Whole Number’ to allow me to use it in the list generation.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image16-1699896743.png/a7afea1ae20f2702b471984bb3108d99.jpg)
Now, I can right-click on and ‘Drill Down’ into this value:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image17-1699896758.png/fd57f3d33f7a86afeb73a4067caf96ea.jpg)
This gives me my parameter P_LastPage:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image18-1699896775.png/2d81b5195cd3e0f04be116aac78d90f1.jpg)
I have renamed my main query to Extract Blogs. I need to change the current Source step:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image19-1699896790.png/e04dfde6fdf842ed2c77361345b3a285.jpg)
Instead of the hard-coded value 40, I use P_LastPage:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image20-1699896806.png/1d57c4afb90f91f577033fa78aa55b0e.jpg)
Now I have the exact number of pages, I no longer need the ‘Filtered Rows’ step which was removing empty rows; therefore, I delete it and ignore warnings about deleting an intermediate step:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image21-1699896831.png/7ea2f88ed3857117c1269cbf62977e9c.jpg)
My query is ready to load, so I select all my data and ‘Detect Datatypes’ from the Transform tab:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image22-1699896855.png/7d011864db9c936864608bd609caee7d.jpg)
Now the list is complete:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/./image23-1699896874.png/df9f0ecff5d4068defe4a9b89c4537de.jpg)
Next time, I will look at why this cannot be achieved in Power Query for Excel.
Come back next time for more ways to use Power Query!