Text

Introduction

The function clean_text() cleans text data in a DataFrame column.

Using a default or customized pipeline, the function performs a series of cleaning operations on the data.

The following sections demonstrate the functionality of clean_text().

An example dirty dataset

[1]:
import numpy as np
import pandas as pd
pd.set_option("display.max_colwidth", None)

df = pd.DataFrame(
    {
        "text": [
            "'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.",
            "The cast played Shakespeare.<br /><br />Shakespeare lost.",
            "Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.",
            "[SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}",
            "<a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>",
            "Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.",
            "#GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer.  But does it deserve to be?",
            "Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3",
            123,
            np.nan,
            "NULL",
        ]
    }
)
df
[1]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL

1. Default clean_text()

The default pipeline for the clean_text() function is the following:

  1. fillna: Replace all null values with NaN.

  2. lowercase: Convert all characters to lowercase.

  3. remove_digits: Remove numbers.

  4. remove_html: Remove HTML tags.

  5. remove_urls: Remove URLs.

  6. remove_punctuation: Remove punctuation marks.

  7. remove_accents: Remove accent marks.

  8. remove_stopwords: Remove stopwords.

  9. remove_whitespace: Remove extra spaces, tabs, and newlines.

[2]:
from dataprep.clean import clean_text
clean_text(df, "text")
[2]:
text
0 zzzzz imdb would allow one word reviews mine would
1 cast played shakespeare shakespeare lost
2 simon desert simon del desierto film directed luis bunuel
3 spoilers think seen film bad acting script effects etc
4 cannes video essay
5 recap thread rottentomatoes excellent panel hosted erikdavis filmfatale nyc ashcrossan
6 gameofthrones season rotten tomatometer deserve
7 come join share thoughts week episode
8
9 NaN
10 NaN

By default, the stopwords removed are the set of words in NLTK’s English stopwords. To remove a different set of words, pass the set into the stopwords parameter.

[3]:
from dataprep.clean import clean_text
clean_text(df, "text", stopwords={"imdb", "film"})
[3]:
text
0 zzzzz if would allow one word reviews that s what mine would be
1 the cast played shakespeare shakespeare lost
2 simon of the desert simon del desierto is a directed by luis bunuel
3 spoilers i don t think i ve seen a this bad before acting script effects etc
4 cannes a video essay
5 recap thread for rottentomatoes excellent panel hosted by erikdavis with filmfatale nyc and ashcrossan
6 gameofthrones season is rotten at on the tomatometer but does it deserve to be
7 come join and share your thoughts on this week s episode
8
9 NaN
10 NaN

2. Custom pipeline

Users can pass in a custom pipeline to clean_text() using the pipeline parameter.

[4]:
custom_pipeline = [
    {"operator": "lowercase"},
    {"operator": "remove_digits"},
    {"operator": "remove_whitespace"},
]
clean_text(df, "text", pipeline=custom_pipeline)
[4]:
text
0 'zzzzz!' if imdb would allow one-word reviews, that's what mine would be.
1 the cast played shakespeare.<br /><br />shakespeare lost.
2 simon of the desert (simón del desierto) is a film directed by luis buñuel.
3 [spoilers] i don't think i've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes--a-video-essay'>cannes : a video essay</a>
5 recap thread for @rottentomatoes excellent panel, hosted by @erikdavis with @filmfatale_nyc and @ashcrossan.
6 #gameofthrones: season is #rotten at % on the #tomatometer. but does it deserve to be?
7 come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/fakeurl
8
9 NaN
10 null

Users can also define and pass in their own functions using the pipeline parameter.

[5]:
import re

def split(text: str) -> list:
    # Returns a list of whitespace-separated tokens, not a str.
    return str(text).split()

def replace_z(text: str, value: str) -> str:
    return re.sub(r"z", value, str(text), flags=re.I)

custom_pipeline = [
    {"operator": "lowercase"},
    {"operator": "remove_digits"},
    {"operator": split},
    {"operator": replace_z, "parameters": {"value": "*"}},
    {"operator": "remove_whitespace"},
]
clean_text(df, "text", pipeline=custom_pipeline)
[5]:
text
0 ["'*****!'", 'if', 'imdb', 'would', 'allow', 'one-word', 'reviews,', "that's", 'what', 'mine', 'would', 'be.']
1 ['the', 'cast', 'played', 'shakespeare.<br', '/><br', '/>shakespeare', 'lost.']
2 ['simon', 'of', 'the', 'desert', '(simón', 'del', 'desierto)', 'is', 'a', 'film', 'directed', 'by', 'luis', 'buñuel.']
3 ['[spoilers]', 'i', "don't", 'think', "i've", 'seen', 'a', 'film', 'this', 'bad', 'before', '{acting,', 'script,', 'effects', '(!),', 'etc...}']
4 ['<a', "href='/festivals/cannes--a-video-essay'>cannes", ':', 'a', 'video', 'essay</a>']
5 ['recap', 'thread', 'for', '@rottentomatoes', 'excellent', 'panel,', 'hosted', 'by', '@erikdavis', 'with', '@filmfatale_nyc', 'and', '@ashcrossan.']
6 ['#gameofthrones:', 'season', 'is', '#rotten', 'at', '%', 'on', 'the', '#tomatometer.', 'but', 'does', 'it', 'deserve', 'to', 'be?']
7 ['come', 'join', 'and', 'share', 'your', 'thoughts', 'on', 'this', "week's", 'episode:', 'https://twitter.com/i/spaces/fakeurl']
8 []
9 ['nan']
10 ['null']

In general, custom pipelines can be defined using the form:

[6]:
custom_pipeline = [
    {
        "operator": "<operator_name>",
        "parameters": {"<parameter_name>": "<parameter_value>"},
    }
]
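
To make this structure concrete, the following is a minimal sketch of how a list in this form could be applied to a single string, step by step. It is not DataPrep's implementation: the apply_pipeline helper, the OPERATORS registry, and replace_char are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry mapping operator names to plain string functions.
OPERATORS: Dict[str, Callable[..., str]] = {
    "lowercase": lambda text: text.lower(),
    "uppercase": lambda text: text.upper(),
}


def apply_pipeline(text: str, pipeline: List[Dict[str, Any]]) -> str:
    """Apply each step in order, feeding one step's output into the next."""
    for step in pipeline:
        operator = step["operator"]
        # A step either names a registered operator or supplies a callable.
        func = OPERATORS[operator] if isinstance(operator, str) else operator
        text = func(text, **step.get("parameters", {}))
    return text


def replace_char(text: str, old: str, new: str) -> str:
    """Hypothetical user-defined operator taking extra parameters."""
    return text.replace(old, new)


pipeline = [
    {"operator": "lowercase"},
    {"operator": replace_char, "parameters": {"old": "z", "new": "*"}},
]
result = apply_pipeline("ZZZZZ! If IMDb", pipeline)
print(result)  # *****! if imdb
```

The order of the list matters: here lowercase runs first, so the uppercase Z characters are already lowercase when replace_char sees them.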

To get the default pipeline in the form of a list, call default_text_pipeline().

This can be used as a template to build a list of cleaning operations to be passed into the pipeline parameter.

[7]:
from dataprep.clean import default_text_pipeline
default_text_pipeline()
[7]:
[{'operator': 'fillna'},
 {'operator': 'lowercase'},
 {'operator': 'remove_digits'},
 {'operator': 'remove_html'},
 {'operator': 'remove_urls'},
 {'operator': 'remove_punctuation'},
 {'operator': 'remove_accents'},
 {'operator': 'remove_stopwords', 'parameters': {'stopwords': None}},
 {'operator': 'remove_whitespace'}]
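
As a sketch of the template idea, the snippet below starts from a literal copy of the list shown above (so it runs without DataPrep installed), drops one step, and sets a parameter on another. The variable names are illustrative.

```python
# Literal copy of the list returned above; in practice this would come
# from default_text_pipeline().
default_pipeline = [
    {"operator": "fillna"},
    {"operator": "lowercase"},
    {"operator": "remove_digits"},
    {"operator": "remove_html"},
    {"operator": "remove_urls"},
    {"operator": "remove_punctuation"},
    {"operator": "remove_accents"},
    {"operator": "remove_stopwords", "parameters": {"stopwords": None}},
    {"operator": "remove_whitespace"},
]

custom_pipeline = []
for step in default_pipeline:
    if step["operator"] == "remove_digits":
        continue  # keep digits in the cleaned text
    if step["operator"] == "remove_stopwords":
        # Build a new dict so the template itself is left unchanged.
        step = {**step, "parameters": {"stopwords": {"imdb", "film"}}}
    custom_pipeline.append(step)

print([step["operator"] for step in custom_pipeline])
```

The resulting list can then be passed to clean_text() through the pipeline parameter.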

3. Built-in functions

This section demonstrates the built-in cleaning operations which can be called using the pipeline parameter.

clean_text() assumes the DataFrame column contains text data. As such, any int values will be cast to str after applying a cleaning function.

fillna

By default, fillna replaces all null values with NaN.

[8]:
custom_pipeline = [{"operator": "fillna"}]
clean_text(df, "text", pipeline=custom_pipeline)
[8]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NaN

To replace null values with a specific value, use the value parameter.

[9]:
custom_pipeline = [{"operator": "fillna", "parameters": {"value": "<NAN>"}}]
clean_text(df, "text", pipeline=custom_pipeline)
[9]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 <NAN>
10 <NAN>

lowercase

Convert all characters to lowercase.

[10]:
custom_pipeline = [{"operator": "lowercase"}]
clean_text(df, "text", pipeline=custom_pipeline)
[10]:
text
0 'zzzzz!' if imdb would allow one-word reviews, that's what mine would be.
1 the cast played shakespeare.<br /><br />shakespeare lost.
2 simon of the desert (simón del desierto) is a 1965 film directed by luis buñuel.
3 [spoilers]\ni don't think i've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>cannes 1968:\ta video essay</a>
5 recap thread for @rottentomatoes excellent panel, hosted by @erikdavis with @filmfatale_nyc and @ashcrossan.
6 #gameofthrones: season 8 is #rotten at 54% on the #tomatometer. but does it deserve to be?
7 come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2url3
8 123
9 NaN
10 null

sentence_case

Convert the first character of the string to uppercase and all remaining characters to lowercase.

[11]:
custom_pipeline = [{"operator": "sentence_case"}]
clean_text(df, "text", pipeline=custom_pipeline)
[11]:
text
0 'zzzzz!' if imdb would allow one-word reviews, that's what mine would be.
1 The cast played shakespeare.<br /><br />shakespeare lost.
2 Simon of the desert (simón del desierto) is a 1965 film directed by luis buñuel.
3 [spoilers]\ni don't think i've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>cannes 1968:\ta video essay</a>
5 Recap thread for @rottentomatoes excellent panel, hosted by @erikdavis with @filmfatale_nyc and @ashcrossan.
6 #gameofthrones: season 8 is #rotten at 54% on the #tomatometer. but does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2url3
8 123
9 NaN
10 Null

title_case

Convert the first character of each word to uppercase and the remaining characters to lowercase.

[12]:
custom_pipeline = [{"operator": "title_case"}]
clean_text(df, "text", pipeline=custom_pipeline)
[12]:
text
0 'Zzzzz!' If Imdb Would Allow One-Word Reviews, That'S What Mine Would Be.
1 The Cast Played Shakespeare.<Br /><Br />Shakespeare Lost.
2 Simon Of The Desert (Simón Del Desierto) Is A 1965 Film Directed By Luis Buñuel.
3 [Spoilers]\nI Don'T Think I'Ve Seen A Film This Bad Before {Acting, Script, Effects (!), Etc...}
4 <A Href='/Festivals/Cannes-1968-A-Video-Essay'>Cannes 1968:\tA Video Essay</A>
5 Recap Thread For @Rottentomatoes Excellent Panel, Hosted By @Erikdavis With @Filmfatale_Nyc And @Ashcrossan.
6 #Gameofthrones: Season 8 Is #Rotten At 54% On The #Tomatometer. But Does It Deserve To Be?
7 Come Join And Share Your Thoughts On This Week'S Episode: Https://Twitter.Com/I/Spaces/1Fake2Url3
8 123
9 NaN
10 Null

uppercase

Convert all characters to uppercase.

[13]:
custom_pipeline = [{"operator": "uppercase"}]
clean_text(df, "text", pipeline=custom_pipeline)
[13]:
text
0 'ZZZZZ!' IF IMDB WOULD ALLOW ONE-WORD REVIEWS, THAT'S WHAT MINE WOULD BE.
1 THE CAST PLAYED SHAKESPEARE.<BR /><BR />SHAKESPEARE LOST.
2 SIMON OF THE DESERT (SIMÓN DEL DESIERTO) IS A 1965 FILM DIRECTED BY LUIS BUÑUEL.
3 [SPOILERS]\nI DON'T THINK I'VE SEEN A FILM THIS BAD BEFORE {ACTING, SCRIPT, EFFECTS (!), ETC...}
4 <A HREF='/FESTIVALS/CANNES-1968-A-VIDEO-ESSAY'>CANNES 1968:\tA VIDEO ESSAY</A>
5 RECAP THREAD FOR @ROTTENTOMATOES EXCELLENT PANEL, HOSTED BY @ERIKDAVIS WITH @FILMFATALE_NYC AND @ASHCROSSAN.
6 #GAMEOFTHRONES: SEASON 8 IS #ROTTEN AT 54% ON THE #TOMATOMETER. BUT DOES IT DESERVE TO BE?
7 COME JOIN AND SHARE YOUR THOUGHTS ON THIS WEEK'S EPISODE: HTTPS://TWITTER.COM/I/SPACES/1FAKE2URL3
8 123
9 NaN
10 NULL
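
The outputs of the four case operators (lowercase, sentence_case, title_case, uppercase) are consistent with Python's built-in string methods. Whether clean_text() calls these methods internally is an assumption; the sketch below only shows that the methods reproduce the same patterns on a sample string.

```python
text = "that's what MINE would be."

lower = text.lower()          # that's what mine would be.
sentence = text.capitalize()  # That's what mine would be.
title = text.title()          # That'S What Mine Would Be.
upper = text.upper()          # THAT'S WHAT MINE WOULD BE.

for converted in (lower, sentence, title, upper):
    print(converted)
```

Note that str.title() treats the apostrophe as a word boundary, which would explain results such as That'S and Don'T in the title_case output.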

remove_accents

Remove accents (diacritic marks) from the text.

[14]:
custom_pipeline = [{"operator": "remove_accents"}]
clean_text(df, "text", pipeline=custom_pipeline)
[14]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simon del desierto) is a 1965 film directed by Luis Bunuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL
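
One common way to strip diacritics in Python is to decompose each character with the standard unicodedata module and drop the combining marks. The sketch below uses that technique; strip_accents is a hypothetical helper, and it is an assumption that the built-in operator works the same way.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters, then drop the combining (accent) marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


cleaned = strip_accents("Simón del desierto, Luis Buñuel")
print(cleaned)  # Simon del desierto, Luis Bunuel
```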

remove_bracketed

Remove text between brackets.

The style of the brackets can be specified using the brackets parameter:

  • “angle”: <>

  • “curly”: {}

  • “round”: ()

  • “square”: []

By default, the inclusive parameter is set to True and the brackets are removed along with the text in between.

[15]:
custom_pipeline = [
    {"operator": "remove_bracketed", "parameters": {"brackets": "round"}}
]
clean_text(df, "text", pipeline=custom_pipeline)
[15]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert is a 1965 film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects , etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL

To remove the text but keep the brackets, set inclusive to False.

[16]:
custom_pipeline = [
    {
        "operator": "remove_bracketed",
        "parameters": {"brackets": "round", "inclusive": False},
    }
]
clean_text(df, "text", pipeline=custom_pipeline)
[16]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert () is a 1965 film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL

The brackets parameter can also take in a set, which allows multiple bracket styles to be specified at a time.

[17]:
custom_pipeline = [
    {
        "operator": "remove_bracketed",
        "parameters": {"brackets": {"angle", "curly", "round", "square"}},
    }
]
clean_text(df, "text", pipeline=custom_pipeline)
[17]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.Shakespeare lost.
2 Simon of the Desert is a 1965 film directed by Luis Buñuel.
3 \nI don't think I've seen a film this bad before
4 Cannes 1968:\tA video essay
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL
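
The behavior of the brackets and inclusive parameters can be approximated with regular expressions. The following is a rough sketch, not DataPrep's code: strip_bracketed and BRACKET_PAIRS are hypothetical names, and nested brackets of the same style are not handled.

```python
import re
from typing import Set, Union

BRACKET_PAIRS = {
    "angle": ("<", ">"),
    "curly": ("{", "}"),
    "round": ("(", ")"),
    "square": ("[", "]"),
}


def strip_bracketed(
    text: str, brackets: Union[str, Set[str]], inclusive: bool = True
) -> str:
    """Remove bracketed text; keep the empty brackets if inclusive is False."""
    styles = [brackets] if isinstance(brackets, str) else sorted(brackets)
    for style in styles:
        opening, closing = BRACKET_PAIRS[style]
        # Non-greedy match from an opening bracket to the nearest closing one.
        pattern = re.escape(opening) + r".*?" + re.escape(closing)
        replacement = "" if inclusive else opening + closing
        text = re.sub(pattern, replacement, text)
    return text


sample = "Simon of the Desert (Simón del desierto) is a film."
print(strip_bracketed(sample, "round"))
print(strip_bracketed(sample, "round", inclusive=False))
```

With inclusive left at True, the space on each side of the removed span remains, so a whitespace cleanup step is usually worth running afterwards.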

remove_digits

Remove all digits.

[18]:
custom_pipeline = [{"operator": "remove_digits"}]
clean_text(df, "text", pipeline=custom_pipeline)
[18]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes--a-video-essay'>Cannes :\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season is #Rotten at % on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/fakeURL
8
9 NaN
10 NULL

remove_html

Remove HTML tags, including the non-breaking space &nbsp;.

[19]:
custom_pipeline = [{"operator": "remove_html"}]
clean_text(df, "text", pipeline=custom_pipeline)
[19]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 Cannes 1968:\tA video essay
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL
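
A simple regular expression is enough to reproduce the output above. This sketch is an approximation under stated assumptions: strip_html is a hypothetical helper, the pattern treats anything between < and > as a tag, and it deletes the &nbsp; entity outright.

```python
import re

# Anything between angle brackets, or the literal &nbsp; entity.
TAG_PATTERN = re.compile(r"<[^>]*>|&nbsp;")


def strip_html(text: str) -> str:
    """Delete tag-like spans and &nbsp; entities from the text."""
    return TAG_PATTERN.sub("", text)


print(strip_html("The cast played Shakespeare.<br /><br />Shakespeare lost."))
```

Because the tags are deleted rather than replaced with a space, the two sentences end up joined, as in row 1 of the output.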

remove_prefixed

Remove substrings that start with the prefix(es) specified in the prefix parameter.

[20]:
custom_pipeline = [{"operator": "remove_prefixed", "parameters": {"prefix": "#"}}]
clean_text(df, "text", pipeline=custom_pipeline)
[20]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 Season 8 is at 54% on the But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL

To specify multiple prefixes, pass in a set of the prefixes to the prefix parameter.

[21]:
custom_pipeline = [
    {"operator": "remove_prefixed", "parameters": {"prefix": {"#", "@"}}}
]
clean_text(df, "text", pipeline=custom_pipeline)
[21]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for excellent panel, hosted by with and
6 Season 8 is at 54% on the But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL
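
Judging by the output, a prefixed substring runs from the prefix to the next whitespace character, so trailing punctuation such as the colon in "#GameOfThrones:" is removed as well. The sketch below reproduces that behavior with a regular expression; strip_prefixed is a hypothetical helper and the exact pattern DataPrep uses is an assumption.

```python
import re
from typing import Set, Union


def strip_prefixed(text: str, prefix: Union[str, Set[str]]) -> str:
    """Remove every run of non-whitespace characters that starts with a prefix."""
    prefixes = [prefix] if isinstance(prefix, str) else sorted(prefix)
    alternation = "|".join(re.escape(p) for p in prefixes)
    return re.sub(rf"(?:{alternation})\S+", "", text)


print(strip_prefixed("#GameOfThrones: Season 8 is #Rotten at 54%", "#"))
print(strip_prefixed("hosted by @ErikDavis with #tag", {"#", "@"}))
```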

remove_punctuation

Remove all punctuation marks defined in Python’s string.punctuation.

[22]:
custom_pipeline = [{"operator": "remove_punctuation"}]
clean_text(df, "text", pipeline=custom_pipeline)
[22]:
text
0 ZZZZZ If IMDb would allow one word reviews that s what mine would be
1 The cast played Shakespeare br br Shakespeare lost
2 Simon of the Desert Simón del desierto is a 1965 film directed by Luis Buñuel
3 SPOILERS \nI don t think I ve seen a film this bad before acting script effects etc
4 a href festivals cannes 1968 a video essay Cannes 1968 \tA video essay a
5 Recap thread for RottenTomatoes excellent panel hosted by ErikDavis with FilmFatale NYC and AshCrossan
6 GameOfThrones Season 8 is Rotten at 54 on the Tomatometer But does it deserve to be
7 Come join and share your thoughts on this week s episode https twitter com i spaces 1fake2URL3
8 123
9 NaN
10 NULL
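
In the output above, each punctuation mark appears to be replaced by a space rather than deleted ("one-word" becomes "one word"). A minimal sketch of that behavior, using string.punctuation from the standard library and a hypothetical strip_punctuation helper:

```python
import re
import string

# Character class matching any single character in string.punctuation.
PUNCTUATION_PATTERN = re.compile(f"[{re.escape(string.punctuation)}]")


def strip_punctuation(text: str) -> str:
    """Replace each punctuation character with a single space."""
    return PUNCTUATION_PATTERN.sub(" ", text)


print(strip_punctuation("one-word reviews, that's"))
```

Only ASCII punctuation is covered by string.punctuation, so characters such as curly quotes would pass through unchanged in this sketch.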

remove_stopwords

Remove common words. By default, the set of stopwords to remove is NLTK’s English stopwords.

[23]:
custom_pipeline = [{"operator": "remove_stopwords"}]
clean_text(df, "text", pipeline=custom_pipeline)
[23]:
text
0 'ZZZZZ!' IMDb would allow one-word reviews, that's mine would be.
1 cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon Desert (Simón del desierto) 1965 film directed Luis Buñuel.
3 [SPOILERS] think I've seen film bad {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968: video essay</a>
5 Recap thread @RottenTomatoes excellent panel, hosted @ErikDavis @FilmFatale_NYC @AshCrossan.
6 #GameOfThrones: Season 8 #Rotten 54% #Tomatometer. deserve be?
7 Come join share thoughts week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL

To use a custom set of words, pass the set into the stopwords parameter.

[24]:
custom_pipeline = [
    {"operator": "remove_stopwords", "parameters": {"stopwords": {"imdb", "film"}}}
]
clean_text(df, "text", pipeline=custom_pipeline)
[24]:
text
0 'ZZZZZ!' If would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 directed by Luis Buñuel.
3 [SPOILERS] I don't think I've seen a this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968: A video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL

Alternatively, expand upon the default set of stopwords by importing dataprep.assets.english_stopwords and adding custom words.

[25]:
from dataprep.assets.english_stopwords import english_stopwords
custom_stopwords = english_stopwords.copy()
custom_stopwords.add("imdb")
custom_stopwords.add("film")

custom_pipeline = [
    {
        "operator": "remove_stopwords",
        "parameters": {"stopwords": custom_stopwords},
    }
]
clean_text(df, "text", pipeline=custom_pipeline)
[25]:
text
0 'ZZZZZ!' would allow one-word reviews, that's mine would be.
1 cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon Desert (Simón del desierto) 1965 directed Luis Buñuel.
3 [SPOILERS] think I've seen bad {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968: video essay</a>
5 Recap thread @RottenTomatoes excellent panel, hosted @ErikDavis @FilmFatale_NYC @AshCrossan.
6 #GameOfThrones: Season 8 #Rotten 54% #Tomatometer. deserve be?
7 Come join share thoughts week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL
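
The outputs in this section are consistent with splitting the text on whitespace, dropping tokens whose lowercase form is in the stopword set, and rejoining with single spaces. The following sketch implements that reading; strip_stopwords is a hypothetical helper and the matching rule is an assumption inferred from the output.

```python
from typing import Set


def strip_stopwords(text: str, stopwords: Set[str]) -> str:
    """Drop whitespace-separated tokens that match a stopword, ignoring case."""
    return " ".join(
        token for token in text.split() if token.lower() not in stopwords
    )


print(strip_stopwords("If IMDb would allow\tone-word reviews", {"imdb", "film"}))
```

Under this reading a token only matches when the whole token equals a stopword, so "film," with a trailing comma would be kept. Splitting and rejoining also replaces tabs and newlines with single spaces, as seen in rows 3 and 4 above.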

remove_urls

Remove URLs. Substrings that start with “http” or “www” are considered URLs.

[26]:
custom_pipeline = [{"operator": "remove_urls"}]
clean_text(df, "text", pipeline=custom_pipeline)
[26]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode:
8 123
9 NaN
10 NULL
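
A rough regular-expression equivalent of this rule is shown below. It is a sketch under assumptions: strip_urls is a hypothetical helper, and the pattern treats a URL as "http://", "https://", or "www." followed by everything up to the next whitespace character.

```python
import re

URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def strip_urls(text: str) -> str:
    """Delete substrings that look like URLs."""
    return URL_PATTERN.sub("", text)


print(strip_urls("this week's episode: https://twitter.com/i/spaces/1fake2URL3"))
```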

remove_whitespace

Remove extra spaces (two or more) along with tabs and newlines. Leading and trailing spaces are also removed.

[27]:
custom_pipeline = [{"operator": "remove_whitespace"}]
clean_text(df, "text", pipeline=custom_pipeline)
[27]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.
3 [SPOILERS] I don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968: A video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL
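
The description above matches a common two-step idiom: collapse every run of whitespace into a single space, then trim both ends. A minimal sketch with a hypothetical collapse_whitespace helper:

```python
import re


def collapse_whitespace(text: str) -> str:
    """Turn runs of spaces, tabs, and newlines into one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


print(collapse_whitespace("[SPOILERS]\nI  don't \tthink "))
```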

replace_bracketed

Replace text between brackets with the value.

The style of the brackets can be specified using the brackets parameter:

  • “angle”: <>

  • “curly”: {}

  • “round”: ()

  • “square”: []

By default, the inclusive parameter is set to True and the brackets are replaced by the value along with the text in between.

[28]:
custom_pipeline = [
    {
        "operator": "replace_bracketed",
        "parameters": {"brackets": "square", "value": "**SPOILERS**"},
    }
]
clean_text(df, "text", pipeline=custom_pipeline)
[28]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.
3 **SPOILERS**\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL

To replace the text, but keep the brackets, set inclusive to False.

[29]:
custom_pipeline = [
    {
        "operator": "replace_bracketed",
        "parameters": {
            "brackets": "square",
            "value": "**SPOILERS**",
            "inclusive": False,
        },
    }
]
clean_text(df, "text", pipeline=custom_pipeline)
[29]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.
3 [**SPOILERS**]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL

The brackets parameter can also take in a set, which allows multiple bracket styles to be specified at a time.

[30]:
custom_pipeline = [
    {
        "operator": "replace_bracketed",
        "parameters": {
            "brackets": {"angle", "curly", "round", "square"},
            "value": "<REDACTED>",
        },
    }
]
clean_text(df, "text", pipeline=custom_pipeline)
[30]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<REDACTED><REDACTED>Shakespeare lost.
2 Simon of the Desert <REDACTED> is a 1965 film directed by Luis Buñuel.
3 <REDACTED>\nI don't think I've seen a film this bad before <REDACTED>
4 <REDACTED>Cannes 1968:\tA video essay<REDACTED>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL

To assign different replacement values to different bracket styles, chain together replace_bracketed operations.

[31]:
custom_pipeline = [
    {
        "operator": "replace_bracketed",
        "parameters": {
            "brackets": "square",
            "value": "**SPOILERS**",
        },
    },
    {
        "operator": "replace_bracketed",
        "parameters": {
            "brackets": "curly",
            "value": "in every aspect.",
        },
    },
]
clean_text(df, "text", pipeline=custom_pipeline)
[31]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.
3 **SPOILERS**\nI don't think I've seen a film this bad before in every aspect.
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL

replace_digits

Replace digits with the value. By default, the block parameter is set to True and only blocks of digits, i.e. tokens composed solely of digits, are replaced.

[32]:
custom_pipeline = [{"operator": "replace_digits", "parameters": {"value": "X"}}]
clean_text(df, "text", pipeline=custom_pipeline)
[32]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a X film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-X-a-video-essay'>Cannes X:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season X is #Rotten at X% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 X
9 NaN
10 NULL

To replace all digits appearing in the text, set block to False.

[33]:
custom_pipeline = [
    {"operator": "replace_digits", "parameters": {"value": "X", "block": False}}
]
clean_text(df, "text", pipeline=custom_pipeline)
[33]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a X film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-X-a-video-essay'>Cannes X:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season X is #Rotten at X% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/XfakeXURLX
8 X
9 NaN
10 NULL
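
The difference between the two modes can be approximated with regular expressions. The following is a minimal sketch, not DataPrep's actual implementation: the helper name replace_digits_sketch and both patterns are assumptions chosen to reproduce the outputs shown above.

```python
import re


def replace_digits_sketch(text, value, block=True):
    """Hypothetical helper approximating replace_digits; not DataPrep's code."""
    # Assumed patterns: with block=True a run of digits must sit between word
    # boundaries; with block=False any run of digits is replaced.
    pattern = r"\b\d+\b" if block else r"\d+"
    # A function replacement inserts `value` literally (no backslash escapes).
    return re.sub(pattern, lambda _match: value, str(text))


url = "https://twitter.com/i/spaces/1fake2URL3"
print(replace_digits_sketch("Season 8 is #Rotten at 54%", "X"))  # Season X is #Rotten at X%
print(replace_digits_sketch(url, "X"))  # unchanged: digits are inside a longer token
print(replace_digits_sketch(url, "X", block=False))  # .../spaces/XfakeXURLX
```

Under these assumptions, a word boundary is what separates a "block" from an embedded digit: 54% qualifies because % is not a word character, whereas the 1 in 1fake2URL3 is followed directly by a letter.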

replace_prefixed

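Replace all substrings that start with the prefix(es) specified in the prefix parameter with the value. The whole whitespace-delimited token is replaced, so attached punctuation, such as the colon in #GameOfThrones:, is replaced along with it.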

[34]:
custom_pipeline = [
    {
        "operator": "replace_prefixed",
        "parameters": {"prefix": "#", "value": "<HASHTAG>"},
    }
]
clean_text(df, "text", pipeline=custom_pipeline)
[34]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 <HASHTAG> Season 8 is <HASHTAG> at 54% on the <HASHTAG> But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL

To replace substrings with different prefixes by the same value, pass a set of prefixes to the prefix parameter.

[35]:
custom_pipeline = [
    {
        "operator": "replace_prefixed",
        "parameters": {"prefix": {"#", "@"}, "value": "<TAG>"},
    }
]
clean_text(df, "text", pipeline=custom_pipeline)
[35]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for <TAG> excellent panel, hosted by <TAG> with <TAG> and <TAG>
6 <TAG> Season 8 is <TAG> at 54% on the <TAG> But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL

To replace different prefixed substrings with different values, chain together replace_prefixed operations.

[36]:
custom_pipeline = [
    {
        "operator": "replace_prefixed",
        "parameters": {"prefix": "#", "value": "<HASHTAG>"},
    },
    {
        "operator": "replace_prefixed",
        "parameters": {"prefix": "@", "value": "<MENTION>"},
    },
]
clean_text(df, "text", pipeline=custom_pipeline)
[36]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for <MENTION> excellent panel, hosted by <MENTION> with <MENTION> and <MENTION>
6 <HASHTAG> Season 8 is <HASHTAG> at 54% on the <HASHTAG> But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL
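
The prefix matching shown in this section can be imitated with a single regular expression. This is a rough sketch under stated assumptions, not DataPrep's implementation: the helper replace_prefixed_sketch is hypothetical, and the rule that a match runs from a token-initial prefix to the next whitespace character is inferred from the outputs above.

```python
import re


def replace_prefixed_sketch(text, prefix, value):
    """Hypothetical helper approximating replace_prefixed; not DataPrep's code."""
    prefixes = {prefix} if isinstance(prefix, str) else set(prefix)
    # Assumed rule: a match starts at a prefix that begins a token and runs
    # up to (but not including) the next whitespace character.
    alternatives = "|".join(re.escape(p) for p in sorted(prefixes))
    pattern = rf"(?<!\S)(?:{alternatives})\S+"
    return re.sub(pattern, lambda _match: value, str(text))


tweet = "#GameOfThrones: Season 8 is #Rotten, says @ErikDavis."

# One value for several prefixes.
tagged = replace_prefixed_sketch(tweet, {"#", "@"}, "<TAG>")

# Chaining two calls mirrors chaining two replace_prefixed pipeline steps.
chained = replace_prefixed_sketch(
    replace_prefixed_sketch(tweet, "#", "<HASHTAG>"), "@", "<MENTION>"
)

print(tagged)   # <TAG> Season 8 is <TAG> says <TAG>
print(chained)  # <HASHTAG> Season 8 is <HASHTAG> says <MENTION>
```

Because the match extends to the next whitespace, trailing punctuation is consumed together with the tag, which is consistent with the missing colon and periods in the outputs above.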

replace_punctuation

Replace all punctuation marks defined in string.punctuation with the value.

[37]:
custom_pipeline = [
    {"operator": "replace_punctuation", "parameters": {"value": "<PUNC>"}}
]
clean_text(df, "text", pipeline=custom_pipeline)
[37]:
text
0 <PUNC>ZZZZZ<PUNC><PUNC> If IMDb would allow one<PUNC>word reviews<PUNC> that<PUNC>s what mine would be<PUNC>
1 The cast played Shakespeare<PUNC><PUNC>br <PUNC><PUNC><PUNC>br <PUNC><PUNC>Shakespeare lost<PUNC>
2 Simon of the Desert <PUNC>Simón del desierto<PUNC> is a 1965 film directed by Luis Buñuel<PUNC>
3 <PUNC>SPOILERS<PUNC>\nI don<PUNC>t think I<PUNC>ve seen a film this bad before <PUNC>acting<PUNC> script<PUNC> effects <PUNC><PUNC><PUNC><PUNC> etc<PUNC><PUNC><PUNC><PUNC>
4 <PUNC>a href<PUNC><PUNC><PUNC>festivals<PUNC>cannes<PUNC>1968<PUNC>a<PUNC>video<PUNC>essay<PUNC><PUNC>Cannes 1968<PUNC>\tA video essay<PUNC><PUNC>a<PUNC>
5 Recap thread for <PUNC>RottenTomatoes excellent panel<PUNC> hosted by <PUNC>ErikDavis with <PUNC>FilmFatale<PUNC>NYC and <PUNC>AshCrossan<PUNC>
6 <PUNC>GameOfThrones<PUNC> Season 8 is <PUNC>Rotten at 54<PUNC> on the <PUNC>Tomatometer<PUNC> But does it deserve to be<PUNC>
7 Come join and share your thoughts on this week<PUNC>s episode<PUNC> https<PUNC><PUNC><PUNC>twitter<PUNC>com<PUNC>i<PUNC>spaces<PUNC>1fake2URL3
8 123
9 NaN
10 NULL
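
To see exactly which characters are affected, it can help to inspect string.punctuation directly. The sketch below is illustrative only: replace_punctuation_sketch is a hypothetical helper built on the standard library, and it is not necessarily how clean_text performs the replacement.

```python
import re
import string

# Character class built from every character in string.punctuation.
PUNCT_PATTERN = re.compile(f"[{re.escape(string.punctuation)}]")


def replace_punctuation_sketch(text, value):
    """Hypothetical helper approximating replace_punctuation; not DataPrep's code."""
    # Each punctuation character is replaced individually, so "..." yields
    # three copies of `value`.
    return PUNCT_PATTERN.sub(lambda _match: value, str(text))


print(string.punctuation)
print(replace_punctuation_sketch("that's what mine would be.", "<PUNC>"))
print(replace_punctuation_sketch("Luis Buñuel", "<PUNC>"))  # accented letters are untouched
```

string.punctuation contains only ASCII punctuation, so non-ASCII characters such as ñ, or typographic quotes, fall outside this character class.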

replace_stopwords

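Replace common words (stopwords) with the value. By default, the set of stopwords to replace is NLTK’s English stopwords. Note that in the output below, the newline in row 3 and the tab in row 4 are collapsed to single spaces.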

[38]:
custom_pipeline = [{"operator": "replace_stopwords", "parameters": {"value": "<S>"}}]
clean_text(df, "text", pipeline=custom_pipeline)
[38]:
text
0 'ZZZZZ!' <S> IMDb would allow one-word reviews, that's <S> mine would be.
1 <S> cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon <S> <S> Desert (Simón del desierto) <S> <S> 1965 film directed <S> Luis Buñuel.
3 [SPOILERS] <S> <S> think I've seen <S> film <S> bad <S> {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968: <S> video essay</a>
5 Recap thread <S> @RottenTomatoes excellent panel, hosted <S> @ErikDavis <S> @FilmFatale_NYC <S> @AshCrossan.
6 #GameOfThrones: Season 8 <S> #Rotten <S> 54% <S> <S> #Tomatometer. <S> <S> <S> deserve <S> be?
7 Come join <S> share <S> thoughts <S> <S> week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL

To use a custom set of words, pass the set to the stopwords parameter. As the output shows, matching ignores case: the stopword "imdb" matches IMDb.

[39]:
custom_pipeline = [
    {
        "operator": "replace_stopwords",
        "parameters": {"stopwords": {"imdb", "film"}, "value": "<S>"},
    }
]
clean_text(df, "text", pipeline=custom_pipeline)
[39]:
text
0 'ZZZZZ!' If <S> would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 <S> directed by Luis Buñuel.
3 [SPOILERS] I don't think I've seen a <S> this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968: A video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL

Alternatively, expand upon the default set of stopwords by importing dataprep.assets.english_stopwords and adding custom words.

[40]:
from dataprep.assets.english_stopwords import english_stopwords

# Copy the default set so the shared default is not modified in place
custom_stopwords = english_stopwords.copy()
custom_stopwords.add("imdb")
custom_stopwords.add("film")

custom_pipeline = [
    {
        "operator": "replace_stopwords",
        "parameters": {
            "stopwords": custom_stopwords,
            "value": "<S>"
        },
    }
]
clean_text(df, "text", pipeline=custom_pipeline)
[40]:
text
0 'ZZZZZ!' <S> <S> would allow one-word reviews, that's <S> mine would be.
1 <S> cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon <S> <S> Desert (Simón del desierto) <S> <S> 1965 <S> directed <S> Luis Buñuel.
3 [SPOILERS] <S> <S> think I've seen <S> <S> <S> bad <S> {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968: <S> video essay</a>
5 Recap thread <S> @RottenTomatoes excellent panel, hosted <S> @ErikDavis <S> @FilmFatale_NYC <S> @AshCrossan.
6 #GameOfThrones: Season 8 <S> #Rotten <S> 54% <S> <S> #Tomatometer. <S> <S> <S> deserve <S> be?
7 Come join <S> share <S> thoughts <S> <S> week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL
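
The token-level behaviour visible in these outputs (case-insensitive matching, collapsed whitespace, and words with attached punctuation such as "be." left alone) can be reproduced with a few lines of plain Python. This is a sketch under assumptions inferred from the outputs; replace_stopwords_sketch is a hypothetical helper, not DataPrep's code.

```python
def replace_stopwords_sketch(text, value, stopwords):
    """Hypothetical helper approximating replace_stopwords; not DataPrep's code."""
    lowered = {word.lower() for word in stopwords}
    # Assumed rule: split on whitespace, compare each lowercased token to the
    # set, and rejoin with single spaces (which also collapses \n and \t).
    tokens = str(text).split()
    return " ".join(value if token.lower() in lowered else token for token in tokens)


custom = {"imdb", "film"}
review = "[SPOILERS]\nI don't think I've seen a film this bad before"

print(replace_stopwords_sketch(review, "<S>", custom))
# [SPOILERS] I don't think I've seen a <S> this bad before

print(replace_stopwords_sketch("If IMDb likes the film.", "<S>", custom))
# If <S> likes the film.   ("film." keeps its period, so it does not match)
```

If words followed by punctuation should also be caught, one option is to run a punctuation-handling operator earlier in the pipeline.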

replace_text

Replace sequences of characters according to the mapping specified in the value parameter, where each key is the text to find and each value is its replacement. By default, block is set to True and only blocks of text, i.e. tokens composed solely of the specified sequence of characters, are replaced. Matching ignores case, so the key "imdb" matches IMDb.

[41]:
custom_pipeline = [
    {
        "operator": "replace_text",
        "parameters": {"value": {"imdb": "Netflix", "film": "movie"}},
    }
]
clean_text(df, "text", pipeline=custom_pipeline)
[41]:
text
0 'ZZZZZ!' If Netflix would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 movie directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a movie this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL

To replace the sequence of characters wherever it appears in the text, set block to False.

[42]:
custom_pipeline = [
    {
        "operator": "replace_text",
        "parameters": {"value": {"imdb": "Netflix", "film": "movie"}, "block": False},
    }
]
clean_text(df, "text", pipeline=custom_pipeline)
[42]:
text
0 'ZZZZZ!' If Netflix would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 movie directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a movie this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @movieFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: https://twitter.com/i/spaces/1fake2URL3
8 123
9 NaN
10 NULL
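
The effect of block on replace_text can be sketched with word-boundary patterns. The code below is an approximation under stated assumptions: replace_text_sketch is a hypothetical helper, and the case-insensitive, word-boundary matching is inferred from the outputs above rather than taken from DataPrep's source.

```python
import re


def replace_text_sketch(text, mapping, block=True):
    """Hypothetical helper approximating replace_text; not DataPrep's code."""
    result = str(text)
    for old, new in mapping.items():
        escaped = re.escape(old)
        # Assumed patterns: block=True requires word boundaries around the
        # sequence; block=False matches it anywhere. Both ignore case.
        pattern = rf"\b{escaped}\b" if block else escaped
        result = re.sub(
            pattern, lambda _match, new=new: new, result, flags=re.IGNORECASE
        )
    return result


mapping = {"imdb": "Netflix", "film": "movie"}
sentence = "If IMDb hosts a film by @FilmFatale_NYC"

print(replace_text_sketch(sentence, mapping))
# If Netflix hosts a movie by @FilmFatale_NYC

print(replace_text_sketch(sentence, mapping, block=False))
# If Netflix hosts a movie by @movieFatale_NYC
```

With block left at True, the handle @FilmFatale_NYC survives because "Film" is followed directly by another letter, so there is no word boundary after it.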

replace_urls

Replace URLs with the value. Substrings that start with “http” or “www” are considered URLs.

[43]:
custom_pipeline = [{"operator": "replace_urls", "parameters": {"value": "<URL>"}}]
clean_text(df, "text", pipeline=custom_pipeline)
[43]:
text
0 'ZZZZZ!' If IMDb would allow one-word reviews, that's what mine would be.
1 The cast played Shakespeare.<br /><br />Shakespeare lost.
2 Simon of the Desert (Simón del desierto) is a 1965 film directed by Luis Buñuel.
3 [SPOILERS]\nI don't think I've seen a film this bad before {acting, script, effects (!), etc...}
4 <a href='/festivals/cannes-1968-a-video-essay'>Cannes 1968:\tA video essay</a>
5 Recap thread for @RottenTomatoes excellent panel, hosted by @ErikDavis with @FilmFatale_NYC and @AshCrossan.
6 #GameOfThrones: Season 8 is #Rotten at 54% on the #Tomatometer. But does it deserve to be?
7 Come join and share your thoughts on this week's episode: <URL>
8 123
9 NaN
10 NULL
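
The URL rule described above can be approximated with one regular expression. This is a minimal sketch that takes the stated rule literally (a token starting with "http" or "www" is a URL); the helper replace_urls_sketch and the exact pattern are assumptions, and DataPrep's own pattern may differ.

```python
import re

# Assumed pattern: a token that starts with "http" or "www" and runs up to
# the next whitespace character.
URL_PATTERN = re.compile(r"\b(?:http|www)\S+")


def replace_urls_sketch(text, value):
    """Hypothetical helper approximating replace_urls; not DataPrep's code."""
    return URL_PATTERN.sub(lambda _match: value, str(text))


print(replace_urls_sketch("episode: https://twitter.com/i/spaces/1fake2URL3", "<URL>"))
# episode: <URL>

print(replace_urls_sketch("<a href='/festivals/cannes-1968-a-video-essay'>", "<URL>"))
# unchanged: a relative link has neither prefix
```

This also suggests why row 4 is unchanged in the output above: its href is a relative path and does not start with either prefix.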