Dataset statistics
| Number of variables | 1 |
|---|---|
| Number of observations | 59079 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 31785 |
| Duplicate rows (%) | 53.8% |
| Total size in memory | 5.2 MiB |
| Average record size in memory | 91.7 B |
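The duplicate figures above are consistent with counting every repeat of a row after its first occurrence: 59079 observations minus 27294 distinct values gives 31785 duplicates. The following is a minimal sketch of that arithmetic using only the standard library. The helper name `duplicate_stats` and the toy rows are hypothetical, and this is not claimed to be how the profiling tool itself computes the numbers.

```python
from collections import Counter


def duplicate_stats(rows):
    """Return (n_rows, n_distinct, n_duplicates, duplicate_pct) for a list of rows."""
    counts = Counter(rows)
    n_rows = len(rows)
    n_distinct = len(counts)
    # Assumption: every occurrence after the first counts as a duplicate.
    n_duplicates = n_rows - n_distinct
    duplicate_pct = 100.0 * n_duplicates / n_rows if n_rows else 0.0
    return n_rows, n_distinct, n_duplicates, duplicate_pct


# Toy stand-in rows (hypothetical, not the profiled file).
rows = ["<td>0</td>", "</tr>", "<td>0</td>", "<td>US</td>", "</tr>", "<td>0</td>"]
print(duplicate_stats(rows))

# The same arithmetic applied to the figures reported above.
print(59079 - 27294, round(100.0 * 31785 / 59079, 1))
```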
Variable types
| CAT | 1 |
|---|---|
Reproduction
| Analysis started | 2020-06-04 22:28:50.050734 |
|---|---|
| Analysis finished | 2020-06-04 22:28:51.972652 |
| Duration | 1.92 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
| Distinct count | 27294 |
|---|---|
| Unique (%) | 46.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 461.7 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| <td>0</td> | 4631 | 7.8% | |
| <td>2020-06-03 02:33:13</td> | 3641 | 6.2% | |
| </tr> | 3641 | 6.2% | |
| <td>US</td> | 3036 | 5.1% | |
| <td></td> | 1668 | 2.8% | |
| <td>0.0</td> | 1329 | 2.2% | |
| <td>1</td> | 734 | 1.2% | |
| <td>2</td> | 427 | 0.7% | |
| <td>3</td> | 343 | 0.6% | |
| <td>4</td> | 257 | 0.4% | |
| <td>Texas</td> | 235 | 0.4% | |
| <td>5</td> | 204 | 0.3% | |
| <td>7</td> | 185 | 0.3% | |
| <td>6</td> | 182 | 0.3% | |
| <td>Georgia</td> | 163 | 0.3% | |
| <td>8</td> | 153 | 0.3% | |
| <td>9</td> | 144 | 0.2% | |
| <td>Virginia</td> | 133 | 0.2% | |
| <td>12</td> | 126 | 0.2% | |
| <td>Kentucky</td> | 120 | 0.2% | |
| <td>11</td> | 118 | 0.2% | |
| <td>10</td> | 116 | 0.2% | |
| <td>13</td> | 103 | 0.2% | |
| <td>Missouri</td> | 103 | 0.2% | |
| <td>Illinois</td> | 103 | 0.2% | |
| Other values (27269) | 37184 | 62.9% |
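A frequency table of this shape (top values, then one "Other values" row for the remainder) can be sketched with `collections.Counter`. The function name `common_values`, its `top` parameter, and the toy rows are assumptions made for illustration, not part of the report or of the profiling tool.

```python
from collections import Counter


def common_values(rows, top=3):
    """Return (value, count, percent) for the top values plus an 'Other values' row."""
    counts = Counter(rows)
    total = len(rows)
    head = counts.most_common(top)
    table = [(value, n, round(100.0 * n / total, 1)) for value, n in head]
    # Everything not listed individually is folded into a single summary row.
    n_other = len(counts) - len(head)
    if n_other > 0:
        rest = total - sum(n for _, n in head)
        table.append((f"Other values ({n_other})", rest, round(100.0 * rest / total, 1)))
    return table


# Toy stand-in rows (hypothetical, not the profiled file).
rows = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
for line in common_values(rows, top=2):
    print(line)
```

With the 25 values listed individually above, the same bookkeeping gives 27294 - 25 = 27269 remaining distinct values, which matches the last row of the table.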
Length
| Max length | 765 |
|---|---|
| Median length | 29 |
| Mean length | 34.66646355 |
| Min length | 1 |
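The length summary can be reproduced for any list of strings with the standard `statistics` module. This is a sketch under the assumption that length means the number of characters per row, leading whitespace included; `length_stats` and the sample rows are hypothetical.

```python
import statistics


def length_stats(rows):
    """Return max, median, mean and min character length over the given rows."""
    lengths = [len(row) for row in rows]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }


# Toy stand-in rows (hypothetical, not the profiled file).
rows = ["</tr>", "<td>US</td>", "<td></td>", "<td>0</td>"]
print(length_stats(rows))
```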
Most occurring characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| (space) | 815590 | 39.8% | |
| t | 117954 | 5.8% | |
| d | 115593 | 5.6% | |
| < | 110410 | 5.4% | |
| > | 110406 | 5.4% | |
| / | 55609 | 2.7% | |
| 0 | 48383 | 2.4% | |
| 3 | 42375 | 2.1% | |
| 2 | 39159 | 1.9% | |
| - | 38905 | 1.9% | |
| " | 38518 | 1.9% | |
| 1 | 36530 | 1.8% | |
| l | 29696 | 1.4% | |
| e | 28909 | 1.4% | |
| i | 28676 | 1.4% | |
| n | 28344 | 1.4% | |
| s | 27777 | 1.4% | |
| 6 | 25157 | 1.2% | |
| a | 24029 | 1.2% | |
| 4 | 23839 | 1.2% | |
| 5 | 23050 | 1.1% | |
| 7 | 22292 | 1.1% | |
| 9 | 21983 | 1.1% | |
| 8 | 21784 | 1.1% | |
| r | 19398 | 0.9% | |
| Other values (64) | 153694 | 7.5% |
Most occurring categories
| Value | Count | Frequency (%) | |
|---|---|---|---|
| Space Separator | 815590 | 39.8% | |
| Lowercase Letter | 503963 | 24.6% | |
| Decimal Number | 304552 | 14.9% | |
| Math Symbol | 240134 | 11.7% | |
| Other Punctuation | 117294 | 5.7% | |
| Dash Punctuation | 38905 | 1.9% | |
| Uppercase Letter | 27407 | 1.3% | |
| Connector Punctuation | 188 | < 0.1% | |
| Open Punctuation | 9 | < 0.1% | |
| Close Punctuation | 9 | < 0.1% | |
| Other Symbol | 6 | < 0.1% | |
| Modifier Symbol | 2 | < 0.1% | |
| Final Punctuation | 1 | < 0.1% |
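The category names in this table correspond to Unicode general categories. As a rough sketch, the standard-library `unicodedata` module (swapped in here for whatever Unicode helper the profiling tool uses internally) returns two-letter codes that can be mapped to the long names shown above. The mapping covers only the categories that appear in this report, and `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in the table above.
CATEGORY_NAMES = {
    "Zs": "Space Separator",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Sm": "Math Symbol",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Lu": "Uppercase Letter",
    "Pc": "Connector Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "So": "Other Symbol",
    "Sk": "Modifier Symbol",
    "Pf": "Final Punctuation",
}


def category_counts(rows):
    """Count characters per Unicode general category across all rows."""
    counts = Counter()
    for row in rows:
        for ch in row:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Toy stand-in rows (hypothetical, not the profiled file).
rows = ["<td>US</td>", "  <td>0.0</td>"]
for name, n in category_counts(rows).most_common():
    print(name, n)
```

Because `<`, `>` and `=` all fall under Math Symbol, HTML markup inflates that category, which is why it ranks so high in the table.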
Most frequent Math Symbol characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| < | 110410 | 46.0% | |
| > | 110406 | 46.0% | |
| = | 19297 | 8.0% | |
| + | 21 | < 0.1% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| t | 117954 | 23.4% | |
| d | 115593 | 22.9% | |
| l | 29696 | 5.9% | |
| e | 28909 | 5.7% | |
| i | 28676 | 5.7% | |
| n | 28344 | 5.6% | |
| s | 27777 | 5.5% | |
| a | 24029 | 4.8% | |
| r | 19398 | 3.8% | |
| b | 15684 | 3.1% | |
| u | 13005 | 2.6% | |
| m | 12614 | 2.5% | |
| c | 9746 | 1.9% | |
| o | 9744 | 1.9% | |
| j | 7616 | 1.5% | |
| f | 4474 | 0.9% | |
| h | 2244 | 0.4% | |
| g | 1740 | 0.3% | |
| p | 1608 | 0.3% | |
| k | 1226 | 0.2% | |
| y | 1054 | 0.2% | |
| v | 995 | 0.2% | |
| w | 762 | 0.2% | |
| x | 756 | 0.2% | |
| z | 255 | 0.1% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| (space) | 815590 | 100.0% | |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| / | 55609 | 47.4% | |
| " | 38518 | 32.8% | |
| . | 15701 | 13.4% | |
| : | 7351 | 6.3% | |
| ; | 30 | < 0.1% | |
| & | 25 | < 0.1% | |
| % | 25 | < 0.1% | |
| # | 13 | < 0.1% | |
| ? | 6 | < 0.1% | |
| ! | 5 | < 0.1% | |
| * | 4 | < 0.1% | |
| · | 2 | < 0.1% | |
| ' | 2 | < 0.1% | |
| … | 2 | < 0.1% | |
| @ | 1 | < 0.1% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| - | 38905 | 100.0% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 48383 | 15.9% | |
| 3 | 42375 | 13.9% | |
| 2 | 39159 | 12.9% | |
| 1 | 36530 | 12.0% | |
| 6 | 25157 | 8.3% | |
| 4 | 23839 | 7.8% | |
| 5 | 23050 | 7.6% | |
| 7 | 22292 | 7.3% | |
| 9 | 21983 | 7.2% | |
| 8 | 21784 | 7.2% |
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| L | 7700 | 28.1% | |
| C | 4692 | 17.1% | |
| S | 3793 | 13.8% | |
| U | 3212 | 11.7% | |
| M | 980 | 3.6% | |
| I | 617 | 2.3% | |
| N | 526 | 1.9% | |
| T | 511 | 1.9% | |
| G | 494 | 1.8% | |
| O | 463 | 1.7% | |
| W | 433 | 1.6% | |
| A | 428 | 1.6% | |
| D | 427 | 1.6% | |
| V | 426 | 1.6% | |
| B | 424 | 1.5% | |
| K | 376 | 1.4% | |
| H | 372 | 1.4% | |
| P | 365 | 1.3% | |
| R | 289 | 1.1% | |
| F | 260 | 0.9% | |
| J | 217 | 0.8% | |
| E | 180 | 0.7% | |
| Y | 116 | 0.4% | |
| Z | 46 | 0.2% | |
| Q | 32 | 0.1% |
Most frequent Connector Punctuation characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| _ | 188 | 100.0% |
Most frequent Modifier Symbol characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| ` | 2 | 100.0% |
Most frequent Other Symbol characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| ↵ | 6 | 100.0% |
Most frequent Open Punctuation characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| ( | 9 | 100.0% |
Most frequent Close Punctuation characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| ) | 9 | 100.0% |
Most frequent Final Punctuation characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| ’ | 1 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
|---|---|---|---|
| Common | 1516690 | 74.1% | |
| Latin | 531370 | 25.9% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| (space) | 815590 | 53.8% | |
| < | 110410 | 7.3% | |
| > | 110406 | 7.3% | |
| / | 55609 | 3.7% | |
| 0 | 48383 | 3.2% | |
| 3 | 42375 | 2.8% | |
| 2 | 39159 | 2.6% | |
| - | 38905 | 2.6% | |
| " | 38518 | 2.5% | |
| 1 | 36530 | 2.4% | |
| 6 | 25157 | 1.7% | |
| 4 | 23839 | 1.6% | |
| 5 | 23050 | 1.5% | |
| 7 | 22292 | 1.5% | |
| 9 | 21983 | 1.4% | |
| 8 | 21784 | 1.4% | |
| = | 19297 | 1.3% | |
| . | 15701 | 1.0% | |
| : | 7351 | 0.5% | |
| _ | 188 | < 0.1% | |
| ; | 30 | < 0.1% | |
| & | 25 | < 0.1% | |
| % | 25 | < 0.1% | |
| + | 21 | < 0.1% | |
| # | 13 | < 0.1% | |
| Other values (12) | 49 | < 0.1% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| t | 117954 | 22.2% | |
| d | 115593 | 21.8% | |
| l | 29696 | 5.6% | |
| e | 28909 | 5.4% | |
| i | 28676 | 5.4% | |
| n | 28344 | 5.3% | |
| s | 27777 | 5.2% | |
| a | 24029 | 4.5% | |
| r | 19398 | 3.7% | |
| b | 15684 | 3.0% | |
| u | 13005 | 2.4% | |
| m | 12614 | 2.4% | |
| c | 9746 | 1.8% | |
| o | 9744 | 1.8% | |
| L | 7700 | 1.4% | |
| j | 7616 | 1.4% | |
| C | 4692 | 0.9% | |
| f | 4474 | 0.8% | |
| S | 3793 | 0.7% | |
| U | 3212 | 0.6% | |
| h | 2244 | 0.4% | |
| g | 1740 | 0.3% | |
| p | 1608 | 0.3% | |
| k | 1226 | 0.2% | |
| y | 1054 | 0.2% | |
| Other values (27) | 10842 | 2.0% |
Most occurring blocks
| Value | Count | Frequency (%) | |
|---|---|---|---|
| ASCII | 2048049 | > 99.9% | |
| Arrows | 6 | < 0.1% | |
| Punctuation | 3 | < 0.1% | |
| None | 2 | < 0.1% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| (space) | 815590 | 39.8% | |
| t | 117954 | 5.8% | |
| d | 115593 | 5.6% | |
| < | 110410 | 5.4% | |
| > | 110406 | 5.4% | |
| / | 55609 | 2.7% | |
| 0 | 48383 | 2.4% | |
| 3 | 42375 | 2.1% | |
| 2 | 39159 | 1.9% | |
| - | 38905 | 1.9% | |
| " | 38518 | 1.9% | |
| 1 | 36530 | 1.8% | |
| l | 29696 | 1.4% | |
| e | 28909 | 1.4% | |
| i | 28676 | 1.4% | |
| n | 28344 | 1.4% | |
| s | 27777 | 1.4% | |
| 6 | 25157 | 1.2% | |
| a | 24029 | 1.2% | |
| 4 | 23839 | 1.2% | |
| 5 | 23050 | 1.1% | |
| 7 | 22292 | 1.1% | |
| 9 | 21983 | 1.1% | |
| 8 | 21784 | 1.1% | |
| r | 19398 | 0.9% | |
| Other values (60) | 153683 | 7.5% |
Most frequent None characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| · | 2 | 100.0% |
Most frequent Arrows characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| ↵ | 6 | 100.0% |
Most frequent Punctuation characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| … | 2 | 66.7% | |
| ’ | 1 | 33.3% |
First rows
| | <!DOCTYPE html> |
|---|---|
| 0 | <html lang="en"> |
| 1 | <head> |
| 2 | <meta charset="utf-8"> |
| 3 | <link rel="dns-prefetch" href="https://github.githubassets.com"> |
| 4 | <link rel="dns-prefetch" href="https://avatars0.githubusercontent.com"> |
| 5 | <link rel="dns-prefetch" href="https://avatars1.githubusercontent.com"> |
| 6 | <link rel="dns-prefetch" href="https://avatars2.githubusercontent.com"> |
| 7 | <link rel="dns-prefetch" href="https://avatars3.githubusercontent.com"> |
| 8 | <link rel="dns-prefetch" href="https://github-cloud.s3.amazonaws.com"> |
| 9 | <link rel="dns-prefetch" href="https://user-images.githubusercontent.com/"> |
Last rows
| | <!DOCTYPE html> |
|---|---|
| 59069 | <div class="octocat-spinner my-6 js-details-dialog-spinner"></div> |
| 59070 | </details-dialog> |
| 59071 | </details> |
| 59072 | </template> |
| 59073 | <div class="Popover js-hovercard-content position-absolute" style="display: none; outline: none;" tabindex="0"> |
| 59074 | <div class="Popover-message Popover-message--bottom-left Popover-message--large Box box-shadow-large" style="width:360px;"> |
| 59075 | </div> |
| 59076 | </div> |
| 59077 | </body> |
| 59078 | </html> |
Most frequent
| | <!DOCTYPE html> | count |
|---|---|---|
| 46 | <td>0</td> | 4631 |
| 399 | <td>2020-06-03 02:33:13</td> | 3641 |
| 1688 | </tr> | 3641 |
| 1623 | <td>US</td> | 3036 |
| 1028 | <td></td> | 1668 |
| 25 | <td>0.0</td> | 1329 |
| 360 | <td>1</td> | 734 |
| 526 | <td>2</td> | 427 |
| 643 | <td>3</td> | 343 |
| 738 | <td>4</td> | 257 |