dataprep.eda.create_report

create_report

This module implements the create_report(df) function.

dataprep.eda.create_report.create_report(df, *, config=None, display=None, title='DataPrep Report', mode='basic', progress=True)

This function generates and renders the elements of a report and returns them as a Report object.

Parameters
  • df (DataFrame) – The DataFrame for which data are calculated.

  • config (Optional[Dict[str, Any]]) – A dictionary for configuring the visualizations, e.g. config={"hist.bins": 20}.

  • display (Optional[List[str]]) – The names of the plots to display, e.g. display=["bar", "hist"]. If not specified, the default is "auto".

  • title (Optional[str], default "DataPrep Report") – The title of the report, which will be shown on the navigation bar.

  • mode (Optional[str], default "basic") – Controls which type of report is generated. Currently only "basic" is fully implemented.

  • progress (bool) – Whether to show the progress bar.

Examples

>>> import pandas as pd
>>> from dataprep.eda import create_report
>>> df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
>>> report = create_report(df)
>>> report # show report in notebook
>>> report.save('My Fantastic Report') # save report to local disk
>>> report.show_browser() # show report in the browser

Return type

Report
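The config and display parameters above are plain Python values. The sketch below, which uses only the standard library, illustrates how dotted config keys such as "hist.bins" can be read as "<plot>.<parameter>" pairs and how a display list narrows the set of plots. Both helpers (group_config, select_plots) are hypothetical and are not dataprep's internal code; "bar.bars" is an assumed key used only for illustration, and treating display=None as "show every plot" is an assumption about what the automatic default does.

```python
from typing import Any, Dict, List, Optional


def group_config(config: Optional[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: group dotted keys like "hist.bins" by plot name.

    Illustrates the key format only; not dataprep's own config parser.
    """
    grouped: Dict[str, Dict[str, Any]] = {}
    for key, value in (config or {}).items():
        plot, _, param = key.partition(".")  # "hist.bins" -> ("hist", "bins")
        grouped.setdefault(plot, {})[param] = value
    return grouped


def select_plots(available: List[str], display: Optional[List[str]]) -> List[str]:
    """Hypothetical helper: keep only the plots named in `display`.

    Assumption: None means every available plot is shown.
    """
    if display is None:
        return list(available)
    return [name for name in available if name in display]


print(group_config({"hist.bins": 20, "bar.bars": 10}))
print(select_plots(["bar", "hist", "kde", "box"], ["bar", "hist"]))
```

With dataprep itself, the same values are simply passed as keyword arguments, e.g. create_report(df, config={"hist.bins": 20}, display=["bar", "hist"]).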

formatters

This module implements the formatting for the create_report(df) function.

dataprep.eda.create_report.formatter.basic_computations(df, cfg)

Computations for the basic version.

Parameters
  • df (EDAFrame) – The DataFrame for which data are calculated.

  • df_num – The DataFrame of numerical columns (used for correlation). It is separated from df because numerical columns with few distinct values are regarded as categorical in df and converted to str for the other plots, yet they should still be treated as numerical in df_num and used for correlation. This is a temporary fix; in the future, such low-distinct numerical columns should be treated as ordinary numerical columns in both the correlation plots and the other plots.

  • cfg (Config) – The configuration derived from the config dict passed in by the user, e.g. config={"hist.bins": 20}. If not specified, the default is "auto".

Return type

Tuple[Dict[str, Any], Optional[Dict[str, Any]]]
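The df_num note above describes keeping two views of the same data: one where low-distinct numerical columns become strings for the non-correlation plots, and one where they stay numerical for correlation. The sketch below shows that split using a dict of column lists in place of a DataFrame so it needs only the standard library. The function name split_low_cardinality and the cut-off of 3 distinct values are hypothetical; dataprep's actual threshold and type checks may differ.

```python
from typing import Dict, List, Tuple

Column = List[object]

# Hypothetical cut-off: numeric columns with at most this many distinct
# values are treated as categorical. Not dataprep's real threshold.
MAX_DISTINCT_FOR_CATEGORICAL = 3


def split_low_cardinality(
    columns: Dict[str, Column], max_distinct: int = MAX_DISTINCT_FOR_CATEGORICAL
) -> Tuple[Dict[str, Column], Dict[str, Column]]:
    """Return (df_like, df_num_like).

    df_like: low-distinct numeric columns converted to str, for the other plots.
    df_num_like: every numeric column kept numeric, for correlation.
    """
    df_like: Dict[str, Column] = {}
    df_num_like: Dict[str, Column] = {}
    for name, values in columns.items():
        # bool is a subclass of int, so exclude it explicitly.
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if is_numeric:
            df_num_like[name] = list(values)
            if len(set(values)) <= max_distinct:
                df_like[name] = [str(v) for v in values]
                continue
        df_like[name] = list(values)
    return df_like, df_num_like


cols = {"rating": [1, 2, 1, 2], "price": [9.5, 3.2, 7.1, 4.4], "city": ["a", "b", "a", "c"]}
df_like, df_num_like = split_low_cardinality(cols)
print(df_like["rating"], sorted(df_num_like))
```

Here "rating" appears as strings in the first view but remains numerical in the second, which is the situation the temporary fix addresses.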

dataprep.eda.create_report.formatter.format_basic(df, cfg)

Format the basic version of the report.

Parameters
  • df (EDAFrame) – The DataFrame for which data are calculated.

  • cfg (Config) – The configuration derived from the config dict passed in by the user, e.g. config={"hist.bins": 20}. If not specified, the default is "auto".

Returns

A dictionary containing the formatted data. It acts as the interface for passing data to the template engine.

Return type

Dict[str, Any]
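Because the returned dictionary is the only thing the template engine sees, its keys form the contract between formatting and rendering. The sketch below illustrates that idea with the standard library's string.Template standing in for whatever template engine the report actually uses; the template text and the keys title, n_rows, and n_cols are hypothetical and are not dataprep's real template variables.

```python
from string import Template
from typing import Any, Dict

# Hypothetical template; dataprep ships its own HTML templates, not this one.
PAGE = Template("<h1>$title</h1><p>Rows: $n_rows, columns: $n_cols</p>")


def render(formatted: Dict[str, Any]) -> str:
    # The template reads values only from the dict, so any key the template
    # needs must be present; substitute() raises KeyError otherwise.
    return PAGE.substitute(formatted)


formatted = {"title": "DataPrep Report", "n_rows": 150, "n_cols": 5}
print(render(formatted))  # <h1>DataPrep Report</h1><p>Rows: 150, columns: 5</p>
```

Keeping computation and presentation apart this way means the formatter can change how values are computed without touching the template, as long as the keys stay the same.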

dataprep.eda.create_report.formatter.format_report(df, cfg, mode, progress=True)

Format the data and figures needed by the report.

Parameters
  • df (Union[pandas.DataFrame, dask.dataframe.DataFrame]) – The DataFrame for which data are calculated.

  • cfg (Config) – The config instance.

  • mode (Optional[str]) – Controls which type of report is generated. Currently only "basic" is fully implemented.

  • progress (bool) – Whether to show the progress bar.

Returns

A dictionary containing the formatted data. It acts as the interface for passing data to the template engine.

Return type

Dict[str, Any]
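Since mode selects the kind of report and only "basic" is fully implemented, one way to picture format_report is as a dispatcher from mode names to formatting functions. The sketch below is hypothetical throughout: format_basic_stub, FORMATTERS, and format_report_sketch are illustrative names, raising ValueError for an unknown mode is this sketch's own choice (the documentation does not say what happens for other modes), and a print call stands in for a real progress bar.

```python
from typing import Any, Callable, Dict


def format_basic_stub(df: Any, cfg: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for format_basic; returns a placeholder dict.
    return {"mode": "basic", "n_rows": len(df)}


# Only "basic" is registered, mirroring the note that it is the only
# fully implemented mode.
FORMATTERS: Dict[str, Callable[[Any, Dict[str, Any]], Dict[str, Any]]] = {
    "basic": format_basic_stub,
}


def format_report_sketch(
    df: Any, cfg: Dict[str, Any], mode: str, progress: bool = True
) -> Dict[str, Any]:
    try:
        formatter = FORMATTERS[mode]
    except KeyError:
        raise ValueError(f"unsupported report mode: {mode!r}") from None
    if progress:
        print(f"Formatting {mode} report...")  # stand-in for a progress bar
    return formatter(df, cfg)


print(format_report_sketch([1, 2, 3], {}, "basic", progress=False))
```

A lookup table keeps each mode's formatter independent, so a new mode could be added by registering one more function.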