The “Maybe Just a Quick One” series title is inspired by my most common reply to “Fancy a drink?”, which may or may not end up in a long night. Likewise, these posts are intended to be short, but I get carried away sometimes, so apologies in advance.
conda create -n autoviz python=3.8
conda activate autoviz
python -m pip install autoviz
# load_boston (used below) was removed in scikit-learn 1.2, so pin an older release
conda install "scikit-learn<1.2"
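Before moving on, it can help to confirm that the packages are visible from inside the new environment. The snippet below is only a small sanity check using the standard library; the helper name `installed_version` is mine, not part of AutoViz.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report what the active environment can see.
for name in ("autoviz", "scikit-learn", "pandas"):
    print(name, installed_version(name) or "not installed")
```

If any of the three prints "not installed", the environment is probably not activated or the install step did not finish.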
from autoviz.AutoViz_Class import AutoViz_Class
from sklearn.datasets import load_boston  # needs scikit-learn < 1.2
import pandas as pd

# Load the Boston housing data into a dataframe and append the target column
boston = load_boston()
df_boston = pd.DataFrame(data=boston.data, columns=boston.feature_names)
df_boston["btarget"] = boston.target

AV = AutoViz_Class()
filename = ""  # empty because the data comes from a dataframe (dfte)
sep = ","
dft = AV.AutoViz(
    filename,
    sep=sep,
    depVar="btarget",
    dfte=df_boston,
    header=0,
    verbose=2,
    lowess=False,
    chart_format="svg",
    max_rows_analyzed=150000,
    max_cols_analyzed=30,
)
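A note on the dataset: load_boston was removed in scikit-learn 1.2, so on newer releases the import above fails. Here is a minimal sketch of the same preparation step with a dataset that still ships with scikit-learn. Using load_diabetes and the column name "dtarget" is my own substitution, not something from the AutoViz docs.

```python
import pandas as pd
from sklearn.datasets import load_diabetes


def build_frame(target_name="dtarget"):
    """Build a dataframe of features plus one target column, like df_boston above."""
    data = load_diabetes()
    df = pd.DataFrame(data=data.data, columns=data.feature_names)
    df[target_name] = data.target
    return df


df_diabetes = build_frame()
print(df_diabetes.shape)
```

The resulting dataframe can then be passed as dfte, with depVar="dtarget", in the same AV.AutoViz call.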
filename - Give this as an empty string ("") if there is no file associated with the data and you want to use a dataframe; in that case, pass the dataframe through dfte. Otherwise, fill in the file name and leave dfte as an empty string. Only one of the two is needed to load the data set.
sep - The separator in the file. It can be a comma, semicolon, tab, or any other value that separates the columns in your file.
depVar - The target variable in your dataset. Leave it as an empty string if your data has no target variable.
dfte - The input dataframe, in case you want to plot charts from a pandas dataframe. In that case, leave filename as an empty string.
header - The row number of the header row in your file. If it is the first row, this must be zero.
verbose - Accepts 3 values: 0, 1 or 2. With 0, you get all charts but limited info. With 1, you get all charts and more info. With 2, you will not see any charts, but they will be quietly generated and saved in the AutoViz_Plots directory, which is created under your current directory. Delete this folder periodically; otherwise you will accumulate lots of charts if you use verbose=2 often.
lowess - Very nice for small datasets, where you can see regression lines for each pair of continuous variables against the target variable. Don't use it for large data sets (over 100,000 rows).
chart_format - Can be SVG, PNG or JPG. Charts are generated and saved in this format if you use the verbose=2 option. Very useful for generating charts and using them later.
max_rows_analyzed - Limits the maximum number of rows used to display charts. If you have a very large data set with millions of rows, use this option to limit the time it takes to generate charts. A statistically valid sample is taken.
max_cols_analyzed - Limits the number of continuous variables that can be analyzed.

Wait...that was it?
Yes. It is that simple. Using 2 as the verbose level had the charts generated in the AutoViz_Plots folder. Let's take a look at some of them:
Violin Plots
Scatter Plots
Heatmaps
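Since verbose=2 writes the charts to disk instead of showing them, it is handy to list what was produced and to clear the folder between runs. This is a rough sketch using only the standard library. The helper names are mine, and I search recursively because I am not assuming whether the files sit directly in AutoViz_Plots or in subfolders.

```python
from pathlib import Path
import shutil


def list_charts(plot_dir="AutoViz_Plots", chart_format="svg"):
    """Return the saved chart files with the given extension, sorted by path."""
    root = Path(plot_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob(f"*.{chart_format}"))


def clear_charts(plot_dir="AutoViz_Plots"):
    """Delete the whole plot folder. Returns True if something was removed."""
    root = Path(plot_dir)
    if root.is_dir():
        shutil.rmtree(root)
        return True
    return False


# Print whatever charts exist under the current directory.
for chart in list_charts():
    print(chart)
```

Calling clear_charts() after you have copied the charts you want keeps the folder from filling up over repeated runs.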
That is cool. I won't need to do any data visualisation myself anymore! (I hear you say)
Not quite. I believe that we need the best of both worlds. Having an automated data visualization tool like AutoViz to quickly generate some graphs for your data is a great first step. It can very quickly give you a good summary of it. However, you might need to dig deeper and create some plots yourself, depending on the task.

Further reading: