Power Query: A Tangled Web Part 3
12 March 2025
Welcome to our Power Query blog. This week, I continue looking at the web data I extracted to the Power Query editor.
When extracting data from the web Power Query utilises M code to access the data in a form that can be used in reports. Over the next few blogs, I am planning to examine the M code produced. The data I am going to use to investigate web data is our very own website:

This page contains a table of upcoming courses, along with various buttons and text:

In Part 1, I used the web connector to examine the data in the Navigator dialog.

I also created more tables using the ‘Add Table Using Examples’ functionality.

I found that I cannot combine data from inside a table and data outside of that table in the same custom table. I therefore added two [2] custom tables:

The training contact data has been captured in Table 2 but not in Table 1.
Last time, I loaded all the queries into the Power Query editor:

I examined the M code in the HTML Code and Displayed Text queries. I found that HTML code used the M function Web.BrowserContents(), and Displayed Text began in the same way, but went on to use the Html.Table() function too.
This time, I will look at the remaining queries: Upcoming Courses, Table 1 and Table 2. Let’s start with Upcoming Courses:

This is the query for the table that was extracted automatically into the Navigator dialog from the webpage. The Source step is the same as the Source step for HTML Code and Displayed Text:

The second step ‘Extracted Table From Html’ is similar to the step with the same name in Displayed Text:

For Upcoming Courses, the M code is more complicated, so let’s unpack it:
= Html.Table(Source, {{"Column1", "TABLE.table > * > TR > :nth-child(1)"}, {"Column2", "TABLE.table > * > TR > :nth-child(2)"}, {"Column3", "TABLE.table > * > TR > :nth-child(3)"}, {"Column4", "TABLE.table > * > TR > :nth-child(4)"}, {"Column5", "TABLE.table > * > TR > :nth-child(5)"}, {"Column6", "TABLE.table > * > TR > :nth-child(6)"}}, [RowSelector="TABLE.table > * > TR"])
I looked at the syntax for Html.Table() last time:
Html.Table(html as any, columnNameSelectorPairs as list, optional options as nullable record) as table
This means that for this example, I have the specified CSS selectors columnNameSelectorPairs:
{{"Column1", "TABLE.table > * > TR > :nth-child(1)"}, {"Column2", "TABLE.table > * > TR > :nth-child(2)"}, {"Column3", "TABLE.table > * > TR > :nth-child(3)"}, {"Column4", "TABLE.table > * > TR > :nth-child(4)"}, {"Column5", "TABLE.table > * > TR > :nth-child(5)"}, {"Column6", "TABLE.table > * > TR > :nth-child(6)"}}
I also have options specified:
[RowSelector="TABLE.table > * > TR"]
This step has extracted the information from the section of the html code that defines the table:
<table class="table"> <thead> <tr> <th>Location</th> <th>Course</th> <th>Course Date</th> <th>Local Time</th> <th>UTC</th> <th class="td-last">Duration</th> </tr> </thead> <tbody class="list"><tr> <td>Sydney Australia</td> <td class="title"><a href="https://www.sumproduct.com/courses/excel-tips-tricks">Excel Tips and Tricks</a></td> <td>10 March 2025</td> <td>09:00 - 17:00 AEDT</td> <td>09 March 2025 22:00 UTC -<br>10 March 2025 06:00 UTC</td> <td class="td-last"> 1 Day </td> </tr><tr> <td>Virtual (Australia)</td> <td class="title"><a href="https://www.sumproduct.com/courses/chatgpt">ChatGPT Part 1</a></td> <td>24 Mar 2025</td> <td>09:00 - 12:30 AEDT</td> <td>23 Mar 2025 22:00 UTC -<br>24 Mar 2025 01:30 UTC</td> <td class="td-last"> 1 Day </td> </tr><tr> <td>Sydney Australia</td> <td class="title"><a href="https://www.sumproduct.com/courses/financial-modelling">Financial Modelling</a></td> <td>24 March 2025 - 25 March 2025</td> <td>09:00 - 17:00 AEDT</td> <td>23 March 2025 22:00 UTC -<br>25 March 2025 06:00 UTC</td> <td class="td-last"> 2 Days </td> </tr><tr> <td>Virtual (Australia)</td> <td class="title"><a href="https://www.sumproduct.com/courses/chatgpt">ChatGPT Part 2</a></td> <td>7 Apr 2025</td> <td>09:00 - 12:30 AEST</td> <td>6 Apr 2025 23:00 UTC -<br>7 Apr 2025 02:30 UTC</td> <td class="td-last"> 1 Day </td> </tr><tr> <td>Sydney Australia</td> <td class="title"><a href="https://www.sumproduct.com/courses/power-pivot-power-query-and-power-bi">Power Pivot, Power Query and Power BI</a></td> <td>10 April 2025 - 11 April 2025</td> <td>09:00 - 17:00 AEDT</td> <td>9 April 2025 22:00 UTC -<br>11 April 2025 06:00 UTC</td> <td class="td-last"> 2 Days </td> </tr></tbody> </table>
Since the query is already called Upcoming Courses, there has already been some extracting before this point to recognise that a table is being extracted under this heading information:
<h2>Upcoming Courses</h2>
We can see that <tr>, <td> and <th> appear frequently in the html section. <tr> indicates the table row containers, which contain the table data cells <td> and table header cells <th>. The RowSelector in the M code indicates that the function should be extracting the rows of TABLE.table, which is the table identified in this query.
The columnNameSelectorsPairs is extracting the html data into six [6] columns. This also implies that some processing has occurred prior to the steps in this query to determine how many columns there are. The algorithms also determine how the data is accessed: in the original table on the webpage, we may access a URL via a hyperlink to see more data about the course, but that URL is not extracted here.

We were able to choose to include the hyperlink destination URL when we created a custom table Table 1 in Part 1, so this could have been included if the algorithms were programmed to do so.
The final step of the Upcoming Courses query simply detects the data type for each column. Next time, I will compare the code for this query with the tables I created using ‘Add Table Using Examples’.
Come back next time for more ways to use Power Query!