Dataset statistics
| Number of variables | 6 |
|---|---|
| Number of observations | 16599 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 12 |
| Duplicate rows (%) | 0.1% |
| Total size in memory | 778.2 KiB |
| Average record size in memory | 48.0 B |
Variable types
| NUM | 3 |
|---|---|
| CAT | 3 |
Reproduction
| Analysis started | 2020-08-04 23:54:40.257427 |
|---|---|
| Analysis finished | 2020-08-04 23:54:48.190619 |
| Duration | 7.93 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
Warnings
| Warning | Type |
|---|---|
| Dataset has 12 (0.1%) duplicate rows | Duplicates |
| Dates has a high cardinality: 498 distinct values | High cardinality |
| Regions is highly correlated with States | High correlation |
| States is highly correlated with Regions | High correlation |
| States is uniformly distributed | Uniform |
| Dates is uniformly distributed | Uniform |
States
Categorical
| Distinct count | 33 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 129.7 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| Nagaland | 503 | 3.0% | |
| Assam | 503 | 3.0% | |
| Pondy | 503 | 3.0% | |
| Tripura | 503 | 3.0% | |
| Gujarat | 503 | 3.0% | |
| Maharashtra | 503 | 3.0% | |
| Jharkhand | 503 | 3.0% | |
| West Bengal | 503 | 3.0% | |
| Arunachal Pradesh | 503 | 3.0% | |
| Odisha | 503 | 3.0% | |
| Other values (23) | 11569 | 69.7% |
Length
| Max length | 17 |
|---|---|
| Median length | 7 |
| Mean length | 7.363636364 |
| Min length | 2 |
Regions
Categorical
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 129.7 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| NR | 4527 | 27.3% | |
| NER | 3521 | 21.2% | |
| WR | 3018 | 18.2% | |
| SR | 3018 | 18.2% | |
| ER | 2515 | 15.2% |
Length
| Max length | 3 |
|---|---|
| Median length | 2 |
| Mean length | 2.212121212 |
| Min length | 2 |
latitude
Real number (ℝ≥0)
| Distinct count | 33 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 23.17822023487879 |
| Minimum | 8.900372741 |
| Maximum | 33.45 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 129.7 KiB |
Quantile statistics
| Minimum | 8.900372741 |
|---|---|
| 5-th percentile | 11.93499371 |
| Q1 | 19.82042971 |
| median | 23.83540428 |
| Q3 | 27.3333303 |
| 95-th percentile | 31.51997398 |
| Maximum | 33.45 |
| Range | 24.54962726 |
| Interquartile range (IQR) | 7.51290059 |
Descriptive statistics
| Standard deviation | 6.146575264 |
|---|---|
| Coefficient of variation (CV) | 0.2651875425 |
| Kurtosis | -0.4589125045 |
| Mean | 23.17822023 |
| Median Absolute Deviation (MAD) | 3.76457641 |
| Skewness | -0.5614781947 |
| Sum | 384735.2777 |
| Variance | 37.78038748 |
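Several of the latitude figures above follow from one another, which gives a quick consistency check on the report. The sketch below recomputes them from values copied out of the tables. The variable names are illustrative only, and the relationships used (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation², sum = mean × number of observations) are the standard definitions, assumed here to match what the profiler computes.

```python
# Values copied from the latitude tables in this report.
n_obs = 16599
minimum, maximum = 8.900372741, 33.45
q1, q3 = 19.82042971, 27.3333303
mean = 23.17822023487879
std = 6.146575264

# Derived statistics, using the standard textbook definitions.
value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
total = mean * n_obs              # Sum

print(f"range={value_range:.8f}")
print(f"iqr={iqr:.8f}")
print(f"cv={cv:.10f}")
print(f"variance={variance:.8f}")
print(f"sum={total:.4f}")
```

Each recomputed value should agree with the corresponding table entry up to rounding of the displayed inputs.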
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 11.93499371 | 503 | 3.0% | |
| 31.10002545 | 503 | 3.0% | |
| 14.7504291 | 503 | 3.0% | |
| 26.7499809 | 503 | 3.0% | |
| 19.25023195 | 503 | 3.0% | |
| 12.57038129 | 503 | 3.0% | |
| 27.59998069 | 503 | 3.0% | |
| 27.10039878 | 503 | 3.0% | |
| 33.45 | 503 | 3.0% | |
| 20.26657819 | 503 | 3.0% | |
| Other values (23) | 11569 | 69.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 8.900372741 | 503 | 3.0% | |
| 11.93499371 | 503 | 3.0% | |
| 12.57038129 | 503 | 3.0% | |
| 12.92038576 | 503 | 3.0% | |
| 14.7504291 | 503 | 3.0% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 33.45 | 503 | 3.0% | |
| 31.51997398 | 503 | 3.0% | |
| 31.10002545 | 503 | 3.0% | |
| 30.71999697 | 503 | 3.0% | |
| 30.32040895 | 503 | 3.0% |
longitude
Real number (ℝ≥0)
| Distinct count | 32 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 81.79453346151514 |
| Minimum | 71.1924 |
| Maximum | 94.21666744 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 129.7 KiB |
Quantile statistics
| Minimum | 71.1924 |
|---|---|
| 5-th percentile | 73.0166178 |
| Q1 | 76.56999263 |
| median | 78.57002559 |
| Q3 | 88.32994665 |
| 95-th percentile | 94.11657019 |
| Maximum | 94.21666744 |
| Range | 23.02426744 |
| Interquartile range (IQR) | 11.75995402 |
Descriptive statistics
| Standard deviation | 7.258428845 |
|---|---|
| Coefficient of variation (CV) | 0.08873977927 |
| Kurtosis | -1.202815695 |
| Mean | 81.79453346 |
| Median Absolute Deviation (MAD) | 3.93004435 |
| Skewness | 0.5198699519 |
| Sum | 1357707.461 |
| Variance | 52.6847893 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 78.05000565 | 1006 | 6.1% | |
| 79.0193 | 503 | 3.0% | |
| 88.6166475 | 503 | 3.0% | |
| 92.72001461 | 503 | 3.0% | |
| 94.11657019 | 503 | 3.0% | |
| 79.83000037 | 503 | 3.0% | |
| 75.98000281 | 503 | 3.0% | |
| 94.21666744 | 503 | 3.0% | |
| 77.16659704 | 503 | 3.0% | |
| 74.63998124 | 503 | 3.0% | |
| Other values (22) | 11066 | 66.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 71.1924 | 503 | 3.0% | |
| 73.0166178 | 503 | 3.0% | |
| 73.16017493 | 503 | 3.0% | |
| 73.81800065 | 503 | 3.0% | |
| 74.63998124 | 503 | 3.0% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 94.21666744 | 503 | 3.0% | |
| 94.11657019 | 503 | 3.0% | |
| 93.95001705 | 503 | 3.0% | |
| 93.61660071 | 503 | 3.0% | |
| 92.72001461 | 503 | 3.0% |
Dates
Categorical
| Distinct count | 498 |
|---|---|
| Unique (%) | 3.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 129.7 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 09/07/2019 00:00:00 | 66 | 0.4% | |
| 11/07/2019 00:00:00 | 66 | 0.4% | |
| 08/07/2019 00:00:00 | 66 | 0.4% | |
| 10/07/2019 00:00:00 | 66 | 0.4% | |
| 12/07/2019 00:00:00 | 66 | 0.4% | |
| 24/04/2019 00:00:00 | 33 | 0.2% | |
| 22/06/2019 00:00:00 | 33 | 0.2% | |
| 05/01/2019 00:00:00 | 33 | 0.2% | |
| 26/10/2019 00:00:00 | 33 | 0.2% | |
| 21/11/2019 00:00:00 | 33 | 0.2% | |
| Other values (488) | 16104 | 97.0% |
Length
| Max length | 19 |
|---|---|
| Median length | 19 |
| Mean length | 19 |
| Min length | 19 |
Usage
Real number (ℝ≥0)
| Distinct count | 3627 |
|---|---|
| Unique (%) | 21.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 103.00186155792517 |
| Minimum | 0.3 |
| Maximum | 522.1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 129.7 KiB |
Quantile statistics
| Minimum | 0.3 |
|---|---|
| 5-th percentile | 1.8 |
| Q1 | 6.7 |
| median | 64.4 |
| Q3 | 173.9 |
| 95-th percentile | 344.75 |
| Maximum | 522.1 |
| Range | 521.8 |
| Interquartile range (IQR) | 167.2 |
Descriptive statistics
| Standard deviation | 116.0440556 |
|---|---|
| Coefficient of variation (CV) | 1.126620955 |
| Kurtosis | 0.8018644468 |
| Mean | 103.0018616 |
| Median Absolute Deviation (MAD) | 60.8 |
| Skewness | 1.243323796 |
| Sum | 1709727.9 |
| Variance | 13466.22285 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2.2 | 328 | 2.0% | |
| 2.1 | 315 | 1.9% | |
| 2.3 | 207 | 1.2% | |
| 1.7 | 178 | 1.1% | |
| 2 | 158 | 1.0% | |
| 1.8 | 156 | 0.9% | |
| 2.4 | 151 | 0.9% | |
| 2.5 | 129 | 0.8% | |
| 1.6 | 119 | 0.7% | |
| 1.9 | 115 | 0.7% | |
| Other values (3617) | 14743 | 88.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0.3 | 1 | < 0.1% | |
| 0.4 | 1 | < 0.1% | |
| 0.5 | 5 | < 0.1% | |
| 0.6 | 6 | < 0.1% | |
| 0.7 | 9 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 522.1 | 1 | < 0.1% | |
| 516.4 | 1 | < 0.1% | |
| 515.8 | 1 | < 0.1% | |
| 513.9 | 1 | < 0.1% | |
| 513.6 | 1 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the slope does not affect r; only its sign does. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
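As a minimal sketch of that calculation, the snippet below computes r with the standard library only. The function name `pearson_r` and the sample data are illustrative and not part of the report. In pandas the same quantity is available through `Series.corr(other, method="pearson")`.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n-1) factors of the covariance and the standard deviations cancel,
    # so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
r = pearson_r(x, y)

# Location/scale invariance: shifting y by 10 and scaling it by 3 leaves r unchanged.
r_shifted = pearson_r(x, [10.0 + 3.0 * v for v in y])
print(r, r_shifted)
```

The function raises `ZeroDivisionError` when either input is constant, because the correlation is undefined in that case.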
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
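The rank-based definition can be sketched as follows: replace each value by its rank, then apply the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the sample data are illustrative. Giving tied values the average of their positions is an assumption here, though it is the usual convention.

```python
import math

def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance over the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic in x, but not linear

print(spearman_rho(x, y), pearson_r(x, y))
```

On this data the relationship is perfectly monotonic, so ρ reaches its maximum while Pearson's r stays below 1.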
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
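The pair-counting definition translates directly into code. The sketch below implements exactly the formula stated above, which is the variant usually called tau-a. The function name and sample data are illustrative. Library implementations often default to the tau-b variant, which additionally adjusts for ties, so results can differ when the data contain tied values.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        pairs += 1
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
        # Pairs tied in x or y count toward neither group.
    return (concordant - discordant) / pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six
tau = kendall_tau_a(x, y)
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant.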
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the Phik package.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | States | Regions | latitude | longitude | Dates | Usage |
|---|---|---|---|---|---|---|
| 0 | Punjab | NR | 31.519974 | 75.980003 | 02/01/2019 00:00:00 | 119.9 |
| 1 | Haryana | NR | 28.450006 | 77.019991 | 02/01/2019 00:00:00 | 130.3 |
| 2 | Rajasthan | NR | 26.449999 | 74.639981 | 02/01/2019 00:00:00 | 234.1 |
| 3 | Delhi | NR | 28.669993 | 77.230004 | 02/01/2019 00:00:00 | 85.8 |
| 4 | UP | NR | 27.599981 | 78.050006 | 02/01/2019 00:00:00 | 313.9 |
| 5 | Uttarakhand | NR | 30.320409 | 78.050006 | 02/01/2019 00:00:00 | 40.7 |
| 6 | HP | NR | 31.100025 | 77.166597 | 02/01/2019 00:00:00 | 30.0 |
| 7 | J&K | NR | 33.450000 | 76.240000 | 02/01/2019 00:00:00 | 52.5 |
| 8 | Chandigarh | NR | 30.719997 | 76.780006 | 02/01/2019 00:00:00 | 5.0 |
| 9 | Chhattisgarh | WR | 22.090420 | 82.159987 | 02/01/2019 00:00:00 | 78.7 |
Last rows
| | States | Regions | latitude | longitude | Dates | Usage |
|---|---|---|---|---|---|---|
| 16589 | Odisha | ER | 19.820430 | 85.900017 | 05/12/2020 00:00:00 | 95.1 |
| 16590 | West Bengal | ER | 22.580390 | 88.329947 | 05/12/2020 00:00:00 | 110.4 |
| 16591 | Sikkim | ER | 27.333330 | 88.616647 | 05/12/2020 00:00:00 | 1.2 |
| 16592 | Arunachal Pradesh | NER | 27.100399 | 93.616601 | 05/12/2020 00:00:00 | 2.1 |
| 16593 | Assam | NER | 26.749981 | 94.216667 | 05/12/2020 00:00:00 | 20.3 |
| 16594 | Manipur | NER | 24.799971 | 93.950017 | 05/12/2020 00:00:00 | 2.5 |
| 16595 | Meghalaya | NER | 25.570492 | 91.880014 | 05/12/2020 00:00:00 | 5.8 |
| 16596 | Mizoram | NER | 23.710399 | 92.720015 | 05/12/2020 00:00:00 | 1.6 |
| 16597 | Nagaland | NER | 25.666998 | 94.116570 | 05/12/2020 00:00:00 | 2.1 |
| 16598 | Tripura | NER | 23.835404 | 91.279999 | 05/12/2020 00:00:00 | 3.3 |
Most frequent
| | States | Regions | latitude | longitude | Dates | Usage | count |
|---|---|---|---|---|---|---|---|
| 0 | Arunachal Pradesh | NER | 27.100399 | 93.616601 | 08/07/2019 00:00:00 | 1.4 | 2 |
| 1 | Arunachal Pradesh | NER | 27.100399 | 93.616601 | 12/07/2019 00:00:00 | 2.1 | 2 |
| 2 | Meghalaya | NER | 25.570492 | 91.880014 | 10/07/2019 00:00:00 | 4.1 | 2 |
| 3 | Mizoram | NER | 23.710399 | 92.720015 | 09/07/2019 00:00:00 | 1.4 | 2 |
| 4 | Mizoram | NER | 23.710399 | 92.720015 | 10/07/2019 00:00:00 | 1.4 | 2 |
| 5 | Nagaland | NER | 25.666998 | 94.116570 | 10/07/2019 00:00:00 | 1.8 | 2 |
| 6 | Nagaland | NER | 25.666998 | 94.116570 | 12/07/2019 00:00:00 | 2.1 | 2 |
| 7 | Pondy | SR | 11.934994 | 79.830000 | 12/07/2019 00:00:00 | 7.4 | 2 |
| 8 | Sikkim | ER | 27.333330 | 88.616647 | 10/07/2019 00:00:00 | 1.5 | 2 |
| 9 | Tripura | NER | 23.835404 | 91.279999 | 10/07/2019 00:00:00 | 2.9 | 2 |
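A table like the one above can be reproduced by counting identical rows. The sketch below does this with `collections.Counter` on a handful of illustrative rows (States, Dates, Usage); the first two are identical on purpose and are not a claim about which records are duplicated in the dataset. Counting every occurrence after the first as a duplicate is one common convention and is assumed here; the profiler may count differently. With pandas, `DataFrame.duplicated()` offers a comparable check.

```python
from collections import Counter

# Illustrative rows only: (States, Dates, Usage).
rows = [
    ("Pondy", "12/07/2019 00:00:00", 7.4),
    ("Pondy", "12/07/2019 00:00:00", 7.4),
    ("Sikkim", "10/07/2019 00:00:00", 1.5),
    ("Tripura", "10/07/2019 00:00:00", 2.9),
]

counts = Counter(rows)

# Rows that occur more than once, with their occurrence count.
repeated = {row: n for row, n in counts.items() if n > 1}

# Every occurrence after the first is counted as a duplicate (assumed convention).
n_duplicates = sum(n - 1 for n in counts.values())
duplicate_pct = 100 * n_duplicates / len(rows)

for row, n in repeated.items():
    print(row, n)
print(n_duplicates, duplicate_pct)
```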