Previous entries on our blog have shown how easy it is to get data from APIs into Excel using Gridarrow. But what happens if the data you need isn’t easily accessible via an API, but is available as html tables. That’s where web scraping comes in!
Excel does have some useful features for reading html tables from the internet using PowerQuery, however particularly in cases where the data you need is spread across multiple pages, or needs tidying up in other ways, using Python is far more powerful.
What is Gridarrow?Gridarrow is the missing server for Excel. It connects just about any data source to Excel with simple Python scripts.
Why is it awesome?
- It's real-time, so as soon as the data is updated in the Gridarrow App it's updated in Excel, no need to poll or refresh.
- It's multi-user so your colleagues can connect to the same data at the same time.
- It simplifies your spreadsheets by moving data retrieval and pre-processing into Python.
There are lots of tutorials on the internet for learning web scraping with Python, but in this blog post we’re going to use the Pandas library, simply because it’s the easiest way to go from web data to Excel with the minimum of fuss, and the maximum flexibility.
The code below will parse the big table of economic indicators from the Trading Economics website and make it available for all the users in your Gridarrow workspace to connect to.
Simply assigning the Dataframe
.values to our summary
grid is enough to make the data flow to Excel.
The previous example already allowed us to get some useful data into Excel, with the latest value for 11 economic indicators across the 50 biggest economies in the world. However, the trading economics website has more than 100 indicators that we can use to get deeper with our analysis.
Here’s an example dashboard where we’ve used Gridarrow and Pandas to get these detailed indicators into Excel, showing the recent change in the indicator, and where the indicator is in its historic range.
You can download this spreadsheet, and add the Trading Economics app to your workspace from the Gridarrow catalog, so you can check it out right away.
Don’t forget to update the name of the workspace in cell
D2 to reflect your own workspace.