We show how to combine data sourced from SQL Server with data in Excel files and how to work with unexpected data formats. The result is a beautiful report with great analytics functionality.
In our previous article, we outlined how the data needs of medical sales reps and their organizations can quickly become complex to fulfill in an analytics solution. While the primary requirement may be to know the Rx patterns of the physicians and hospitals they visit, other needs arise: data from third-party must be blended in, and operational and organizational data feeds must be considered for compliance and targeting; and this data is often in disparate formats and does not come correlated. Moreover, for any reporting and visualization solution to be effective, information needs to be up to date, presented in actionable format, and rendered in a compelling visual style across a variety of devices. When a rep is preparing for a visit to a physician or hospital, she needs the latest information, clearly displayed and delivered seamlessly to the device she is carrying.
We propose that Power BI affords the following five key technologies to build such as solution:
1. A data mashup engine that can accept inputs from more than 50 source types (e.g., SQL Server databases, Excel files, etc.) and allows syntactic and semantic transformations on the data.
2. A data modelling functionality that permits building data models regardless of the sources of data.
3. A reporting and dashboarding component with dozens of charting and configuration options.
4. A renderization engine that allows data visualizations to be displayed beautifully and seamlessly across PCs and mobile devices.
5. Data gateways to extract and update data in real time or on a schedule.
In this article we will begin to outline a sample solution and illustrate points 1 and 2. In our next and final article, we will demonstrate the other features.
In the sample we are going to build we will combine data from a SQL Server database, namely prescriptions, physicians, and products, with data from Excel files: sales representatives, physician assignments, Rx goals, and sales (monetary) actuals and goals. This is a scenario we have experienced repeatedly.
The objective is to build a solution that will answer basic questions like:
- Who are the top prescribing physicians?
- Are sales reps meeting their Rx goals?
- Are we meeting our sales goals?
- How adherent to our brand are our patients?
- What’s the flow of incoming and outgoing patients from our brand?
Want to skip the details and see what the final report looks like? Click here.
Getting the Data from SQL Server and Excel
To import data from SQL Server we will use queries that already exist in the company’s database. First, we choose SQL Server as a data source:
After this we need to provide credentials and indicate the query to extract the data. In the case of physicians, the query is like this:
Immediately a preview of the data is shown:
Notice how all the names are in upper-case letters. To make them prettier, we will apply one of a myriad of transformations available in Power BI to shape the data before building visualizations:
In this case, we will use the “capitalize each word” text transformation:
And this is the result:
For each transformation, Power BI is building a script behind the scenes. These are the steps for this simple data import and transformation (under “APPLIED STEPS” on the right):
In this window, steps may be rearranged or eliminated. Also, the entire script itself, written in a language called M, is available for direct editing. The benefit is that when data is refreshed, the steps are automatically applied to the new data.
From SQL Server we will also extract product and Rx data in the same fashion.
Data sources are not homogeneous anymore. Power BI includes more than 50 “connectors” to bring together data from disparate sources and create seamless data models and reports.
For other types of data sources, such as Excel, the same transformation and editing functionality is available. In our example, we have two Excel files to import. The first has the names of the sales representatives and the physicians they are assigned to, in two sheets. As before, first we choose Excel as the data source. When we indicate the Excel file to import, we get the following dialog and we select both sheets present in the file for import:
Notice that the matching of reps and physicians is specified using the reps’ Ids and the physicians’ prescriber Ids. In the data model we will have to make the corresponding associations to relate them to the master tables.
What if the Data Format is Challenging?
Our next Excel file contains sales targets and actuals (in monetary terms) for the entire organization, and Rx targets (in terms of units) for each sales rep. The sales information is formatted in an unorthodox (but no uncommon) way and we will have to apply several transformation steps to shape it into a standard table structure with fixed columns. In Excel, this data looks like this:
The challenge here is that it is formatted in a pivot-table fashion with a variable number of columns (namely, for each new month two columns will be repeatedly added); we need the data in fixed-column tabular structure. Here’s the resulting shape after several transformations:
Power BI includes advanced features to mash and shape data in very sophisticated ways. It’s just a fact: in every project there’s always data that must be transformed, converted, or cleansed.
Designing the Data Model
Once we have imported the data, we can build a data model (or, to be specific, a “tabular” data model). In the model we define the relationships between the tables. These relationships allow for the automatic filtering and cross-filtering of data in the report’s functionality. For example, in the page titled “Salesperson-Physician Browser”, when you click on the bar for sales rep Ingrid Báez, the table on the left is cross-filtered to show only physicians associated with Ingrid. No coding was necessary to achieve this.
If you are familiar with SQL, notice that the effect is that no “joins” need to be written to get the data from a related record in a different table. This navigation, if you will, is done automatically based on the relationships declared in the data model. Here’s a segment of the data model:
Moving the mouse over the connecting lines causes the related fields to be highlighted:
Power BI can detect some of the relationships naturally present between tables by matching fields with the same names. Also, relationship cardinality and filter direction (given by the arrows) is also detected whenever possible. Relationships, cardinalities, and directions may all be edited.
In the final installment of this series, we will demonstrate how create data visualizations and publish the report to the Power BI Service (at PowerBI.com). Once published, we can schedule data refreshes and share the report with others. Look here for the finished report.
By: Adolfo J. Socorro, Ph.D.
SQL Server and Power BI Architect