3 Tools for Fast Data Profiling

Quickly analyse and summarise your data with these Python tools

Rebecca Vickery

Published in TDS Archive

6 min read · Sep 19, 2022

Data profiling is one of the first steps in any data science project. It is a form of exploratory data analysis that seeks to analyse, describe and summarise a dataset in order to understand both its quality and its fundamental characteristics.

Data profiling informs the later steps in a data science project, such as the type and extent of data cleaning required and any other preprocessing techniques that need to be applied. Real-world data is rarely ready for a task such as machine learning without at least some basic treatment first.

Many of the steps involved in data profiling are common across different datasets and projects. Data profiling typically includes tasks such as applying descriptive statistics to each column, determining the volume of missing values and understanding interactions and correlations that exist between variables.
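To make these routine steps concrete, here is a minimal sketch of doing them by hand with pandas. The toy dataset and its column names are invented purely for illustration, and this is only one reasonable way to approach each step rather than a definitive recipe.

```python
import pandas as pd

# Hypothetical toy dataset; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "age": [25, 32, None, 41, 29],
        "income": [30000, 42000, 39000, None, 35000],
        "city": ["Leeds", "York", None, "Leeds", "Bath"],
    }
)

# Descriptive statistics for every column (numeric and non-numeric).
summary = df.describe(include="all")

# Count and percentage of missing values per column.
missing = pd.DataFrame(
    {"n_missing": df.isna().sum(), "pct_missing": df.isna().mean() * 100}
)

# Pairwise Pearson correlations between the numeric columns only.
correlations = df.select_dtypes(include="number").corr()

print(summary)
print(missing)
print(correlations)
```

Even on a three-column dataset this takes several separate calls, and the outputs still need to be read side by side. That repetition is what automated profiling tools aim to remove.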

Because these tasks are fairly routine, a number of open-source Python libraries seek to automate data profiling. In this article, I will give a brief introduction, with code examples, to three Python packages that vastly simplify and speed up this initial exploratory analysis.
