STORY

Python Pandas or PowerBI?

Joep Sanders, director/consultant at SkyClear shares his experience with BI tools, which he prefers to use and why:

In recent years I have been fortunate enough to have a taste of a wide range of BI tools. From transformation in Alteryx to dashboards in Qlik, SQL queries, Cognos models, reports in Business Objects and of course the better tinkering in MS Access and Excel. Two tools stand out in my toolbox: MS PowerBI and Python Pandas. If given a choice (and luckily it often is), I’ll use one of these two… or both.

Of course, every BI solution has its advantages. If you set them up properly, they will all help you run your company in a smarter way. If you want to get started with creating a dashboard or performing a data analysis, you have to make a choice in the jungle of tools.

Microsoft PowerBI and Python Pandas are my favorites. Of course it’s comparing apples with oranges, but I still want to make an effort to list when I use which solution and why.

Visualization

Although there are many Python data visualization libraries available, I prefer to use PowerBI for visualizations. By default, PowerBI offers the most important charts, tables and KPI views that you need to build a beautiful dashboard. If that’s not enough for you, you can always look for a ‘visual’ in AppSource, or get started with a ‘custom visual’ yourself.

A major advantage of PowerBI is the ease with which you can create a chart. With a few clicks you can try out different visualizations and choose the most suitable one for the information you want to convey. An absolute plus is the interactive nature of dashboards in PowerBI. By clicking on a visualization you can zoom in on the information relevant to you and filter other visualizations. Of course, standard filters and slicers are also super fast and accessible in PowerBI.

Transformation

With DAX you can build beautiful formulas and metrics in PowerBI that you can filter and merge super fast. To download resources and perform transformations, switch to the query editor and Power Query ‘M’. The beauty here is the graphical interface that allows you to perform extensive transformations without having to write a line of code. Many different types of data can be easily retrieved, such as text files, Excel, SQL Server, APIs. And if that is not enough, you can still get started with a Custom Connector that you can also write in “M”.

You can’t think of it that crazy or you can build it in PowerBI, believe me: I’ve tried it :). Retrieve data from a Rest API with OAUTH authentication flow, a beautiful dimensional data model with transformations and complex calculations in “M”. All can be realized, but you run into the limits of the application. And you notice that in the stability and performance of your dashboard. In that case, refreshing the data and running the queries will take a very long time, and unfortunately occasionally crashes. In addition, PowerBI does not offer such a nice development environment if you delve deeper into the code.

So for the heavier transformation work I switch to Python Pandas. Not as ‘entry-ready’ as PowerBI, but with an online course, a nice environment such as Jupyter and the better Google work you will come a long way. If you are somewhat handy with Python, the possibilities are limitless: you can read in all types of data, work with large data sets, perform complex operations and all of this is super stable and lightning fast. Because Python is very popular among data engineers and scientists, there is a lot of documentation and community content available online. Other people have already struggled with most issues and there are several solutions available.

Analysis

Depending on the type of analysis I want to perform, I prefer one of the two solutions. PowerBI is a good way to visually zoom in, compare and filter data. In this way, an analysis is also easy to share with other people.

The big advantage of data analysis using Python is the speed. With a simple script you dig through a large mountain of data, make groupings and merge data. No ‘refresh’ data, no switching between query editor, data model and metrics. You can run your script directly and get immediate results. If you work a bit in a structured way, you can reuse your code the next time you have the same or a similar question. This is especially useful if you perform some more advanced operations such as text analysis or complex joins. Unfortunately, working with python pandas does require some practice before it really saves time.

Accessibility

PowerBI is by far the best when it comes to accessibility. The application is fully designed for the use of the graphical interface. If you then create a DAX formula, you will be helped in the formula editor with the correct syntax and tips. The basis of PowerBI are two tools that were already available in Excel: PowerQuery and PowerPivot, and the functions connect to the office applications. It is a self-service BI tool par excellence and you don’t even need to have the application installed: you can come a long way with Dashboards published in the cloud. Of course you can make it complicated with advanced DAX or transformations in M, but you don’t have to do that to make a nice dashboard.

As for Python accessibility, you can make it a bit easier on yourself when setting up your environment using a platform like Anaconda. As a development environment, I personally prefer Jupyter Lab when it comes to simplicity. Jupyter Lab offers you a logical and clear interface, cleanly formatted output, auto-complete and function explanation. However you look at it, using Python for data analysis requires basic knowledge and a well-designed development environment and is therefore much less accessible than PowerBI.

The best of both worlds

If I had to choose, I wouldn’t :). And I don’t think it is necessary: it is possible to add Python scripts to PowerBI. At the moment I haven’t come across a use case yet, but who knows, it might become a useful application in a future project.

Do you need context and overview?

Read more about our approach here.