Home
Welcome to the data-quality-report wiki! Let's start with the download. Once you have downloaded the tool, you can either run the .exe file or unzip it manually and follow the instructions in the manual guide.
Once that is done, you will find the icon in the External Tools menu inside Power BI Desktop.
To start using the tool, open a new blank Power BI Desktop file and use Get Data to load the sources you want to analyze as tables. Of course, if you already have a dataset you want to analyze, go ahead, but the analysis is probably most interesting in the early stages of a project. With your tables loaded, go to the External Tools tab and click the Data Quality Report icon.
This runs a batch process that opens a new Power BI Desktop file from a template containing all the analyses. It might take a few minutes depending on the size of your tables (rows and columns), because computing a distribution for every text column (counting values) or numeric column (calculating percentiles) is an intensive process. You shouldn't worry about the size of the Data Quality Report file, though: it loads only the results of the analysis, not the data itself.
To continue, choose the "Windows Credentials" authentication option with the currently logged-in account. Power BI needs some form of authentication, which is why it asks for it; since we are working in a local environment with an Analysis Services instance running in the background, this should be enough.
Once the report opens, it shows a welcome page followed by the analysis pages. The welcome page has a small guide in case you run into problems with privacy levels; as the guide suggests, privacy levels can be ignored. Let's take a look at the pages.
The first page gives a summary of the tables in three sections. On the left, a 100% stacked bar chart shows the distribution of data types in each table; it can be switched to a table with exact counts. In the middle, we have the number of rows per table. On the right, we get a quick look at the percentage of valid and messy data. We can also drill down on the bar chart to take a deeper look at the errors or blanks in specific columns.
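To make the valid-versus-messy split more concrete, here is a minimal Python sketch of how a column's raw values could be classified. It is not the tool's implementation (the report itself is built in Power BI), and it rests on two assumptions: a blank is a null or whitespace-only value, and an error is a value that fails to convert to the column's expected type. The function names and sample data are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional


def is_blank(value: Optional[str]) -> bool:
    """Assumption: None and empty or whitespace-only strings count as blanks."""
    return value is None or value.strip() == ""


def quality_summary(values: Iterable[Optional[str]],
                    parse: Callable[[str], object]) -> Dict[str, float]:
    """Return the percentage of valid, blank and error values in a column.

    `parse` is the conversion the column is expected to support (e.g. int);
    a value that raises ValueError when parsed is counted as an error.
    """
    counts = {"valid": 0, "blank": 0, "error": 0}
    for value in values:
        if is_blank(value):
            counts["blank"] += 1
            continue
        try:
            parse(value)
            counts["valid"] += 1
        except ValueError:
            counts["error"] += 1
    total = sum(counts.values())
    return {k: (100.0 * v / total if total else 0.0) for k, v in counts.items()}


# Hypothetical raw column, still as text before type conversion.
raw_ages = ["34", "", "41", "n/a", None, "29", "abc", "52"]
print(quality_summary(raw_ages, int))
# {'valid': 50.0, 'blank': 25.0, 'error': 25.0}
```

Passing a different converter, such as `float`, changes what counts as an error for that column.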
The distribution analysis is split in two: text columns and numeric columns. On both pages, you can start by filtering the table with the slicer at the top right. The text (and date) page shows the available values, the top 5 most repeated categories in the column, and detailed information about each column.
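The top 5 categories idea can be sketched in Python with `collections.Counter`. This illustrates the concept rather than the report's own logic; ignoring nulls and breaking ties by first appearance (which is how `Counter.most_common` orders equal counts) are assumptions, and `top_categories` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def top_categories(values: Iterable[Optional[str]], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n most repeated non-null values with their counts.

    Ties keep the order in which values first appear (Counter's behavior).
    """
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(n)


# Hypothetical text column with one null.
cities = ["Madrid", "Lima", "Madrid", "Bogota", "Lima", "Madrid",
          "Quito", "Santiago", "Lima", "Caracas", None]
for city, count in top_categories(cities):
    print(f"{city}: {count}")
```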
As the visuals suggest, we can take a deep dive into a column by clicking on it in the table.
To move to the next page, the numeric analysis, use the button at the top (Ctrl+click) or the page tabs at the bottom of Power BI Desktop. This page is similar to the previous one, with small changes. Instead of the top 5 categories, we see the distribution of the numeric values grouped into equal-width ranges as a histogram. It can be hard to see the shape of the distribution if your data is highly concentrated in close values, but don't worry, because the table helps here. The table offers a different perspective: it shows percentiles of each column, which help us make sense of a messy distribution in the chart and tell whether there are outliers.
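The two views on the numeric page can be sketched in Python: equal-width bins for the histogram and percentiles for the table. The bin count of 5, the chosen percentile points, and the `inclusive` interpolation method of `statistics.quantiles` are assumptions for illustration, not the tool's settings. The sample data is hypothetical: nine values packed between 10 and 14 plus one outlier at 500.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def equal_width_histogram(values: Sequence[float],
                          bins: int = 5) -> List[Tuple[float, float, int]]:
    """Split [min, max] into `bins` equal-width ranges and count values per range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # all values equal: use a dummy width of 1
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum lands in the last bin instead of past it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


def percentiles(values: Sequence[float],
                points: Sequence[int] = (25, 50, 75, 90, 99)) -> Dict[int, float]:
    """Return selected percentiles; quantiles(n=100) yields the cut points P1..P99."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in points}


# Hypothetical column: values concentrated between 10 and 14, plus one outlier.
amounts = [10, 11, 11, 12, 12, 12, 13, 13, 14, 500]
for start, end, count in equal_width_histogram(amounts):
    print(f"{start:g} to {end:g}: {count}")
print(percentiles(amounts))
```

With this data the histogram puts 9 of the 10 values in the first bin, so the chart says little, while the percentiles keep the median at 12 and only jump between P75 and P90, which points to an outlier.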
Here we can find everything related to the table's schema, such as data types or column sizes, to get a different understanding of the data. With a single click on a column, we see its distribution again as on the previous pages, with the difference that text columns show all of their values and not just the top 5. We can also see the proportion of valid values versus errors and cards with counts.
Because it is a friendly visual that highlights the most repeated categories in a column, we didn't want to leave it out. We can navigate through tables and columns to find the most repeated text. To improve performance, a word must appear at least 10 times in the data to be shown. Now we can easily check the most frequent words.
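The 10-occurrence threshold can be sketched in Python as a simple frequency filter. Splitting each value into lowercase words with the regular expression `\w+` is an assumption (the tool may count whole values instead), and `frequent_words` and the sample comments are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable

MIN_OCCURRENCES = 10  # threshold described on this page


def frequent_words(texts: Iterable[str], min_count: int = MIN_OCCURRENCES) -> Dict[str, int]:
    """Count lowercase words across all values and keep those seen min_count+ times."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    # most_common() sorts by frequency, so the dict is ordered most frequent first.
    return {word: c for word, c in counts.most_common() if c >= min_count}


# Hypothetical free-text column.
comments = ["Late delivery"] * 12 + ["Damaged box"] * 4 + ["late payment"] * 3
print(frequent_words(comments))
# {'late': 15, 'delivery': 12}
```

Filtering before drawing keeps the visual small: rare words never reach it, which is the performance trade-off the page describes.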
For people loading cleaner datasets, with a model rather than just individual tables, we have added this page. It contains a table that gives a better understanding of the relationships, with warnings for foreign-key (FK) errors and for bidirectional or many-to-many relationships.
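The checks behind such warnings can be sketched in Python for a single relationship. This is an assumed approach, not the tool's code: an FK error is taken to mean a foreign-key value with no match on the other table, many-to-many means neither column has unique keys, and bidirectional filtering is read from a hypothetical `cross_filter` flag, since it is a model setting rather than a property of the data. All names and sample values are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Set


def orphan_keys(fk_values: Sequence[Optional[str]], pk_values: Sequence[str]) -> Set[str]:
    """Foreign-key values with no matching key on the other table (FK errors)."""
    known = set(pk_values)
    return {k for k in fk_values if k is not None and k not in known}


def has_unique_keys(keys: Sequence[Optional[str]]) -> bool:
    """True when no non-null key appears more than once."""
    counts = Counter(k for k in keys if k is not None)
    return all(c == 1 for c in counts.values())


def relationship_warnings(fk_values: Sequence[Optional[str]],
                          pk_values: Sequence[str],
                          cross_filter: str) -> List[str]:
    """Collect warnings for one relationship.

    `cross_filter` is a hypothetical stand-in for the model's setting:
    "single" or "both" (bidirectional).
    """
    warnings = []
    orphans = orphan_keys(fk_values, pk_values)
    if orphans:
        warnings.append(f"FK error: {len(orphans)} value(s) not found: {sorted(orphans)}")
    if not has_unique_keys(fk_values) and not has_unique_keys(pk_values):
        warnings.append("Many-to-many: neither side has unique keys")
    if cross_filter == "both":
        warnings.append("Bidirectional cross-filtering")
    return warnings


# Hypothetical Orders[CustomerID] -> Customers[CustomerID] relationship.
order_customers = ["C1", "C2", "C2", "C9", None]
customer_ids = ["C1", "C2", "C3"]
print(relationship_warnings(order_customers, customer_ids, "single"))
# ["FK error: 1 value(s) not found: ['C9']"]
```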
That's the end of the road for now. We hope you enjoy the Data Quality Report and that it helps you improve your knowledge of your data, from distributions to quality issues. We hope it comes in handy for you.