Overview

Dataset statistics

Number of variables: 1
Number of observations: 69240
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 53884
Duplicate rows (%): 77.8%
Total size in memory: 5.9 MiB
Average record size in memory: 89.1 B
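The duplicate figures above are consistent with counting every repeat of a row after its first occurrence: 69240 observations minus 15356 distinct values gives 53884. The report does not state how it counts duplicates, so the sketch below assumes that definition; `duplicate_stats` and the toy rows are hypothetical names used only for illustration.

```python
from collections import Counter


def duplicate_stats(rows):
    """Return (n_rows, n_distinct, n_duplicates, duplicate_pct).

    Assumption: a "duplicate" is any repeat of a row after its first
    occurrence, so n_duplicates = n_rows - n_distinct.
    """
    counts = Counter(rows)
    n_rows = sum(counts.values())
    n_distinct = len(counts)
    n_duplicates = n_rows - n_distinct
    pct = 100.0 * n_duplicates / n_rows if n_rows else 0.0
    return n_rows, n_distinct, n_duplicates, pct


# Toy single-column data in the style of the profiled file.
rows = ["<td></td>", "</tr>", "<td></td>", "<td>weekly</td>", "<td></td>"]
print(duplicate_stats(rows))

# Consistency check against the figures reported above.
reported_rows, reported_distinct = 69240, 15356
reported_duplicates = reported_rows - reported_distinct
print(reported_duplicates, round(100 * reported_duplicates / reported_rows, 1))
```

With a pandas DataFrame, `df.duplicated().sum()` applies the same first-occurrence rule by default.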

Variable types

CAT: 1

Reproduction

Analysis started: 2020-06-09 05:15:57.002462
Analysis finished: 2020-06-09 05:15:58.473779
Duration: 1.47 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 53884 (77.8%) duplicate rows (Duplicates)
<!DOCTYPE html> has a high cardinality: 15356 distinct values (High cardinality)

Variables


Categorical

HIGH CARDINALITY

Distinct count: 15356
Unique (%): 22.2%
Missing: 0
Missing (%): 0.0%
Memory size: 541.1 KiB

Value                 Count
<td></td>             17603
</tr>                 4573
<td>weekly</td>       4179
<td>2018</td>         862
<td>2019</td>         832
Other values (15351)  41191
Value                    Count  Frequency (%)
<td></td>                17603  25.4%
</tr>                    4573   6.6%
<td>weekly</td>          4179   6.0%
<td>2018</td>            862    1.2%
<td>2019</td>            832    1.2%
<td>2017</td>            807    1.2%
<td>2016</td>            695    1.0%
<td>3</td>               651    0.9%
<td>4</td>               639    0.9%
<td>2</td>               605    0.9%
<td>United Kingdom</td>  518    0.7%
<td>France</td>          483    0.7%
<td>5</td>               479    0.7%
<td>7</td>               460    0.7%
<td>9</td>               450    0.6%
<td>10</td>              445    0.6%
<td>6</td>               436    0.6%
<td>8</td>               435    0.6%
<td>11</td>              432    0.6%
<td>Brazil</td>          425    0.6%
<td>12</td>              409    0.6%
<td>monthly</td>         393    0.6%
<td>2015</td>            389    0.6%
<td>1</td>               381    0.6%
<td>2020</td>            380    0.5%
Other values (15331)     31279  45.2%
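A frequency table of this shape (top values, counts, percentages, and an "Other values" roll-up) can be approximated with the standard library. This is a minimal sketch and not the profiler's own code; `frequency_table` and its `top_n` parameter are hypothetical names.

```python
from collections import Counter


def frequency_table(values, top_n):
    """Return (value, count, percent) rows for the top_n most common
    values, followed by one roll-up row for everything else."""
    counts = Counter(values)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    rows = [(value, count, round(100.0 * count / total, 1))
            for value, count in top]
    other_distinct = len(counts) - len(top)
    if other_distinct > 0:
        other_count = total - sum(count for _, count in top)
        label = f"Other values ({other_distinct})"
        rows.append((label, other_count, round(100.0 * other_count / total, 1)))
    return rows


# Toy data: two dominant values plus a tail of rare ones.
values = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
table = frequency_table(values, top_n=2)
for value, count, pct in table:
    print(f"{value:<20} {count:>6} {pct:>5}%")
```

The percentages are taken over all rows, which matches the table above: 17603 of 69240 rows is 25.4%.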
 

Length

Max length: 765
Median length: 27
Mean length: 32.03019931
Min length: 1
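The length summary can be reproduced for any list of strings with the `statistics` module. The sketch below assumes length means the number of characters per value, including any leading whitespace; `length_stats` is a hypothetical helper.

```python
import statistics


def length_stats(values):
    """Return max, median, mean and min character length of the values."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }


# Toy data taken from the most frequent values in this report.
stats = length_stats(["<td></td>", "</tr>", "<td>weekly</td>"])
print(stats)
```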

Overview of Unicode Properties

Unique unicode characters: 88
Unique unicode categories: 13
Unique unicode scripts: 2
Unique unicode blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
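As an illustration of one such property, Python's `unicodedata.category` returns the two-letter general category of a character (for example `Zs` for Space Separator, `Sm` for Math Symbol, `Ll` for Lowercase Letter). The sketch below tallies categories over a few lines; `category_counts` is a hypothetical helper, and the standard library does not expose the script or block properties, so those are not covered here.

```python
import unicodedata
from collections import Counter


def category_counts(lines):
    """Count characters per Unicode general category across all lines."""
    counts = Counter()
    for line in lines:
        for ch in line:
            counts[unicodedata.category(ch)] += 1
    return counts


# Toy input in the style of the profiled HTML lines.
sample = ['<td>2018</td>', '<link rel="dns-prefetch">']
counts = category_counts(sample)
for category, count in counts.most_common():
    print(category, count)
```

Angle brackets and `=` fall under Math Symbol, which is why that category ranks so high for a column of HTML source lines.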

Most occurring characters

Value              Count   Frequency (%)
(space)            951669  42.9%
t                  138236  6.2%
d                  136248  6.1%
<                  128869  5.8%
>                  128865  5.8%
/                  64823   2.9%
-                  51909   2.3%
"                  47802   2.2%
e                  44492   2.0%
l                  40539   1.8%
1                  36071   1.6%
n                  33420   1.5%
0                  33015   1.5%
i                  32856   1.5%
2                  32232   1.5%
s                  30976   1.4%
a                  25977   1.2%
=                  23942   1.1%
r                  23639   1.1%
b                  18871   0.9%
m                  16335   0.7%
u                  15685   0.7%
3                  13695   0.6%
4                  11785   0.5%
c                  11344   0.5%
Other values (63)  124476  5.6%

Most occurring categories

Value                  Count   Frequency (%)
Space Separator        951669  42.9%
Lowercase Letter       616995  27.8%
Math Symbol            281706  12.7%
Decimal Number         179608  8.1%
Other Punctuation      114171  5.1%
Dash Punctuation       51909   2.3%
Uppercase Letter       21606   1.0%
Connector Punctuation  96      < 0.1%
Other Symbol           6       < 0.1%
Modifier Symbol        2       < 0.1%
Open Punctuation       1       < 0.1%
Close Punctuation      1       < 0.1%
Final Punctuation      1       < 0.1%

Most frequent Math Symbol characters

Value  Count   Frequency (%)
<      128869  45.7%
>      128865  45.7%
=      23942   8.5%
+      30      < 0.1%

Most frequent Lowercase Letter characters

Value  Count   Frequency (%)
t      138236  22.4%
d      136248  22.1%
e      44492   7.2%
l      40539   6.6%
n      33420   5.4%
i      32856   5.3%
s      30976   5.0%
a      25977   4.2%
r      23639   3.8%
b      18871   3.1%
m      16335   2.6%
u      15685   2.5%
c      11344   1.8%
j      9462    1.5%
o      8967    1.5%
y      6054    1.0%
w      5769    0.9%
k      5328    0.9%
f      5284    0.9%
g      2129    0.3%
h      1693    0.3%
p      1211    0.2%
v      1172    0.2%
z      871     0.1%
x      412     0.1%

Most frequent Space Separator characters

Value    Count   Frequency (%)
(space)  951669  100.0%

Most frequent Other Punctuation characters

Value  Count  Frequency (%)
/      64823  56.8%
"      47802  41.9%
.      1369   1.2%
:      80     0.1%
%      30     < 0.1%
;      24     < 0.1%
&      19     < 0.1%
#      6      < 0.1%
?      6      < 0.1%
!      5      < 0.1%
·      2      < 0.1%
'      2      < 0.1%
…      2      < 0.1%
@      1      < 0.1%

Most frequent Dash Punctuation characters

Value  Count  Frequency (%)
-      51909  100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
1      36071  20.1%
0      33015  18.4%
2      32232  17.9%
3      13695  7.6%
4      11785  6.6%
9      11257  6.3%
7      10673  5.9%
8      10663  5.9%
6      10292  5.7%
5      9925   5.5%

Most frequent Uppercase Letter characters

Value  Count  Frequency (%)
L      9229   42.7%
C      4858   22.5%
S      966    4.5%
F      786    3.6%
B      741    3.4%
U      717    3.3%
N      678    3.1%
K      542    2.5%
P      435    2.0%
A      375    1.7%
D      300    1.4%
I      278    1.3%
R      266    1.2%
G      262    1.2%
J      238    1.1%
M      236    1.1%
Y      191    0.9%
H      133    0.6%
V      101    0.5%
T      80     0.4%
E      61     0.3%
Z      33     0.2%
O      27     0.1%
Q      26     0.1%
W      25     0.1%

Most frequent Connector Punctuation characters

Value  Count  Frequency (%)
_      96     100.0%

Most frequent Modifier Symbol characters

Value  Count  Frequency (%)
`      2      100.0%

Most frequent Other Symbol characters

Value  Count  Frequency (%)
↵      6      100.0%

Most frequent Open Punctuation characters

Value  Count  Frequency (%)
(      1      100.0%

Most frequent Close Punctuation characters

Value  Count  Frequency (%)
)      1      100.0%

Most frequent Final Punctuation characters

Value  Count  Frequency (%)
’      1      100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1579170  71.2%
Latin   638601   28.8%

Most frequent Common characters

Value              Count   Frequency (%)
(space)            951669  60.3%
<                  128869  8.2%
>                  128865  8.2%
/                  64823   4.1%
-                  51909   3.3%
"                  47802   3.0%
1                  36071   2.3%
0                  33015   2.1%
2                  32232   2.0%
=                  23942   1.5%
3                  13695   0.9%
4                  11785   0.7%
9                  11257   0.7%
7                  10673   0.7%
8                  10663   0.7%
6                  10292   0.7%
5                  9925    0.6%
.                  1369    0.1%
_                  96      < 0.1%
:                  80      < 0.1%
+                  30      < 0.1%
%                  30      < 0.1%
;                  24      < 0.1%
&                  19      < 0.1%
#                  6       < 0.1%
Other values (11)  29      < 0.1%

Most frequent Latin characters

Value              Count   Frequency (%)
t                  138236  21.6%
d                  136248  21.3%
e                  44492   7.0%
l                  40539   6.3%
n                  33420   5.2%
i                  32856   5.1%
s                  30976   4.9%
a                  25977   4.1%
r                  23639   3.7%
b                  18871   3.0%
m                  16335   2.6%
u                  15685   2.5%
c                  11344   1.8%
j                  9462    1.5%
L                  9229    1.4%
o                  8967    1.4%
y                  6054    0.9%
w                  5769    0.9%
k                  5328    0.8%
f                  5284    0.8%
C                  4858    0.8%
g                  2129    0.3%
h                  1693    0.3%
p                  1211    0.2%
v                  1172    0.2%
Other values (27)  8827    1.4%

Most occurring blocks

Value        Count    Frequency (%)
ASCII        2217760  > 99.9%
Arrows       6        < 0.1%
Punctuation  3        < 0.1%
None         2        < 0.1%

Most frequent ASCII characters

Value              Count   Frequency (%)
(space)            951669  42.9%
t                  138236  6.2%
d                  136248  6.1%
<                  128869  5.8%
>                  128865  5.8%
/                  64823   2.9%
-                  51909   2.3%
"                  47802   2.2%
e                  44492   2.0%
l                  40539   1.8%
1                  36071   1.6%
n                  33420   1.5%
0                  33015   1.5%
i                  32856   1.5%
2                  32232   1.5%
s                  30976   1.4%
a                  25977   1.2%
=                  23942   1.1%
r                  23639   1.1%
b                  18871   0.9%
m                  16335   0.7%
u                  15685   0.7%
3                  13695   0.6%
4                  11785   0.5%
c                  11344   0.5%
Other values (59)  124465  5.6%

Most frequent None characters

Value  Count  Frequency (%)
·      2      100.0%

Most frequent Arrows characters

Value  Count  Frequency (%)
↵      6      100.0%

Most frequent Punctuation characters

Value  Count  Frequency (%)
…      2      66.7%
’      1      33.3%

Missing values

Sample

First rows

Index  <!DOCTYPE html>
0      <html lang="en">
1      <head>
2      <meta charset="utf-8">
3      <link rel="dns-prefetch" href="https://github.githubassets.com">
4      <link rel="dns-prefetch" href="https://avatars0.githubusercontent.com">
5      <link rel="dns-prefetch" href="https://avatars1.githubusercontent.com">
6      <link rel="dns-prefetch" href="https://avatars2.githubusercontent.com">
7      <link rel="dns-prefetch" href="https://avatars3.githubusercontent.com">
8      <link rel="dns-prefetch" href="https://github-cloud.s3.amazonaws.com">
9      <link rel="dns-prefetch" href="https://user-images.githubusercontent.com/">

Last rows

Index  <!DOCTYPE html>
69230  <div class="octocat-spinner my-6 js-details-dialog-spinner"></div>
69231  </details-dialog>
69232  </details>
69233  </template>
69234  <div class="Popover js-hovercard-content position-absolute" style="display: none; outline: none;" tabindex="0">
69235  <div class="Popover-message Popover-message--bottom-left Popover-message--large Box box-shadow-large" style="width:360px;">
69236  </div>
69237  </div>
69238  </body>
69239  </html>

Duplicate rows

Most frequent

Index  <!DOCTYPE html>  count
1851   <td></td>        17603
1891   </tr>            4573
1886   <td>weekly</td>  4179
1185   <td>2018</td>    862
1336   <td>2019</td>    832
1032   <td>2017</td>    807
879    <td>2016</td>    695
1621   <td>3</td>       651
1636   <td>4</td>       639
1567   <td>2</td>       605