Power BI Blog: Get Data from PDFs
6 December 2018
Welcome to this week’s Power BI blog. Today, we’re going to look at the new preview feature, Get Data from PDFs.
PDFs have long been a source of pain for analysts. Copying and pasting data out of a PDF is highly unreliable: numbers and values arrive unformatted or, worse, incorrectly formatted, and they generally need a lot of massaging before they are usable. This feature should change all of that.
First things first: make sure you enable it in the “Preview features” options menu.

Now, when you Get Data, under File, you can see the PDF option in Beta:

Conveniently, our monthly newsletter came out earlier this week, so we’re going to connect to that as our data source. If you don’t have a copy, you can scroll down to the bottom of the page and sign up to receive it!
This is what we see once we connect through to it:

Interestingly, there are far more tables than I would have envisaged. Clicking through them, I can see that it appears to have converted all of the bullet points into tables of their own.

As it happens, I know there is a list of training course dates near the back of the PDF, so I’m going to jump to page 40 to find it:

Editing the data brings it into the Power Query editor, allowing me to promote the first row to be the column headings and bring it into my dataset.

It all looks pretty straightforward! However, if you go to the Navigation step, you’ll see how it refers to this table in the file:

“Table030” isn’t a great way to reference it, especially if this PDF were to be different next month. So far, this looks like a good way to bring an ad-hoc table into Power BI, but it doesn’t quite work for long-term monthly reporting. Oh well, still better than nothing!
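If you want something a little less brittle, one possible workaround is to pick the table by what it contains rather than by its Id. The query below is only a rough sketch, not what Power BI generates for you: the file path and the “Course” header text are made-up placeholders, and it assumes the table you want is the first one whose top row contains a cell with exactly that text.

```powerquery
let
    // Hypothetical file path; point this at your own copy of the PDF.
    Source = Pdf.Tables(File.Contents("C:\Reports\Newsletter.pdf")),

    // Keep only the detected tables (the navigator also lists whole pages).
    TablesOnly = Table.SelectRows(Source, each [Kind] = "Table"),

    // Hypothetical marker: assumes the first row of the wanted table
    // contains a cell with exactly this text.
    HeaderMarker = "Course",

    // Find tables whose first row contains the marker, whatever their Id.
    // Table.First falls back to an empty record for tables with no rows.
    Matches = Table.SelectRows(
        TablesOnly,
        each List.Contains(Record.FieldValues(Table.First([Data], [])), HeaderMarker)
    ),

    // Take the first match and promote its first row to column headings.
    Courses = Table.PromoteHeaders(Matches{0}[Data], [PromoteAllScalars = true])
in
    Courses
```

If no table matches, Matches{0} raises an error on refresh. For a monthly report, that is arguably better than quietly loading whichever table happens to be numbered 030 that month.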
Join us next week for more tips and tricks in Power BI!