Analysts and developers, proficient in R, have been able to incorporate R visuals in their Power BI projects for quite some time now. It made sense that Microsoft started with R implementation in Power BI, as they acquired Revolution Analytics and its implementation of the R language back in 2015. However, Python has been on the rise, even bypassing R in the realm of data science.
So it makes sense that Microsoft would eventually implement Python into the Power BI platform, and we need not wait any longer. It’s here! As of August 2018, you can now incorporate Python visuals in your Power BI projects.
What Makes Python So Great?
As a data analyst who has spent quite a bit of time cleaning, transforming, analyzing, and visualizing data in Python, not to mention conducting some machine learning experiments, one of the things I love about Python is its clear, readable syntax. This makes it easier to learn and progress; even the less-technical have a better chance of understanding what’s going on in the code.
Another thing I love about Python is the Pandas library. Pandas allows you to work with dataframes and streamlines the data manipulation and analysis process.
Lastly, the ability to produce beautiful looking charts with relatively few code is high on my list. Matplotlib and Seaborn (which is based on Matplotlib) are two libraries that can be utilized to develop professional looking charts with very little code. When you’re just exploring the data for insights and not too concerned with formatting, a single line of code is usually all it takes.
There’s certainly more great things about Python but I think you get the point. Let’s get on to the demos!
Setting Up Python
There are two prerequisites you will have to ensure are in place before you can start working with Python in Power BI. The first is to ensure that you already have Python installed on your computer. There’s different approaches to doing this; however, I recommend installing the Anaconda Distribution (choosing the latest version of Python). With this single download you pretty much get everything you’ll need for data analysis in Python.
Since Python is currently a “preview feature” in Power BI, the second prerequisite is ensuring that you have Python support checked in the preview features section of Power BI Desktop. Within the Power BI Desktop, go to File > Options and settings > Options.
Python Visuals in Power BI
Now that you have Python installed and enabled, you need to click on the Python visual icon under Visualizations.
The data you place in the Values area will automatically be converted into a dataframe – essentially a table or 2-dimensional data structure with columns and rows. The gray area at the bottom, where it says “Python script editor,” is where you’ll write or copy/paste your Python script/code. Once you’re done writing your script, or if you want to test your script, click on the “Run script” icon.
Following are three data visualizations developed with Python in Power BI, along with the code used. Don’t worry if you don’t understand what’s going on in the code. The point here is not to teach you Python but to demonstrate how you can use Python visuals to produce more advanced reports in Power BI. One of the things you’ll notice is that it doesn’t take much code to produce these visuals. These are just three examples of virtually endless opportunities for visualizations. The first visual uses the Seaborn library, which is a popular data visualization library built on Matplotlib. I used the seaborn.pairplot() function to create a pairplot of project cost, work, and duration. The scatterplots show the correlations between the variables. Since a variable perfectly correlates with itself, a histogram is used across the diagonal, allowing you to see the distribution of that variable.
Python Code:
# setup environment import matplotlib.pyplot as plt import seaborn as sns # rename fields dataset.columns = ['Project','Cost','Work','Duration'] # generate pairplot to show correlations and distributions sns.pairplot(dataset, diag_kws={'color':'darkgray','edgecolor':'white','lw':0.25}, plot_kws={'color':'#00688b','lw':0,'alpha':0.5}) # adjust plot plt.tight_layout() # show plot plt.show()
This next plot is known as a dotplot or a stripplot. This uses Seaborn’s stripplot() function. It shows the distribution of projects by project owner and project cost. It is color formatted based on cost variance. If the cost variance is positive then the project is over budget. If the cost variance is negative then the project is under budget.
Python Code:
# setup environment import matplotlib.pyplot as plt import seaborn as sns # rename fields dataset.columns = ['Owner','Project','Cost','Cost Variance'] # define hue format dataset['cv_format'] = dataset['Cost Variance'].apply(lambda x: 'Over Budget' if x > 0 else 'Under Budget') # plot dotplot/stripplot sns.stripplot(x='Owner', y='Cost', hue='cv_format', size=8, data=dataset, palette=['red','#00688b'], jitter=True, alpha=0.6) sns.despine() plt.legend(frameon=False) plt.tick_params(labelsize=12) # adjust layout plt.tight_layout() # show plot plt.show()
This last example is one of my favorites. I love small multiples. This visual is a little harder to produce, as there is currently no small multiples function. In order to produce this visual, I had to essentially loop through the various categories for each subplot (small plot within the overall plot) and plot their values. This is a great visual for showing comparisons. Showing the other category values grayed-out makes it even easier to see the comparisons.
Python Code:
# setup environment import matplotlib.pyplot as plt # set style plt.style.use('seaborn-dark') # initiate figure fig = plt.subplots(3, 3, sharex=True, sharey=True) # define variables num = 0 categories = dataset['Category'].unique().tolist() months = len(dataset['Date'].unique()) min_y = min(dataset['Value']) - 25 max_y = max(dataset['Value']) +25 # loop thru categories and generate plots for c in categories: # increment num by 1 for subplots num += 1 # select subplot to plot on plt.subplot(3, 3, num) # create axes variables x = range(1, months + 1, 1) y = dataset['Value'][dataset['Category']==c] # plot other categories, grayed out for v in categories: plt.plot(x, dataset['Value'][dataset['Category']==v], linewidth=0.5, color='gray', alpha=0.3) # plot main category plt.plot(x, y, color='#00688b', linewidth=1.5, alpha=0.9) plt.title(c, loc='left') # set y limits and ticks plt.ylim(min_y, max_y) plt.yticks(range(100, 300, 50)) # set x ticks plt.xticks(range(1, 9,1)) # only keep ticks on outer subplots if num in range(7): plt.tick_params(labelbottom='off') if num not in [1,4,7]: plt.tick_params(labelleft='off') # axes titles if num == 8: plt.xlabel('Months') if num == 4: plt.ylabel('Values') # adjust layout plt.tight_layout() # show plot plt.show()
Conclusions
As we have just seen, Python is a powerful tool for data analysis and visualization that can be utilized to extend reporting in Power BI. I certainly don’t expect Python to replace DAX, the Query Editor, or Power BI’s built-in visuals, nor would I want it to. However, I do see it becoming a popular supplement to the Power BI platform. I look forward to seeing how others utilize Python in their own reports. Here at Stratada, we’ll certainly be thinking of ways to use it in our reports and our clients’ reports.
I hope you have found this to be a useful demo of Python in Power BI. Please let us know if there is any way we can help you with your data analytics and reporting needs.