Project: Wrangle and Analyze Data

Introduction

We wrangle and analyze the WeRateDogs Twitter data to find interesting insights. Part of the data comes from the WeRateDogs Twitter archive; additional data is downloaded through the Twitter API for the tweets already present in the archive.
The analysis uses only original tweets (not retweets) that have images attached.

Data Wrangling

Gather

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
import tweepy
import json
import re

pd.set_option('display.max_colwidth', None)
In [2]:
# WeRateDogs Twitter Archive

# load the WeRateDogs Twitter Archive provided
archive_df = pd.read_csv('twitter-archive-enhanced.csv')
In [3]:
# Predicted Images From Neural Network

# download Image Predictions
images_url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'
response = requests.get(images_url)

# write response to file
with open('image_predictions.tsv', mode='wb') as file:
    file.write(response.content)

# Load Predicted images into a dataframe
image_predictions_df = pd.read_csv('image_predictions.tsv', sep='\t')
In [4]:
# Download additional data from Twitter using the Twitter API
# (wait_on_rate_limit_notify and tweepy.TweepError are tweepy 3.x names)

# list of tweet ids
tweet_ids = archive_df['tweet_id'].values

# initialize the Twitter API object
consumer_key = 'HIDDEN'
consumer_secret = 'HIDDEN'
access_token = 'HIDDEN'
access_token_secret = 'HIDDEN'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth_handler=auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# download tweet data and write one JSON object per line
failed_ids = []
with open('tweet_json.txt', 'w') as file:
    for tweet_id in tweet_ids:
        try:
            tweet = api.get_status(tweet_id, tweet_mode='extended')
            json.dump(tweet._json, file)
            file.write('\n')
        except tweepy.TweepError as e:
            # record ids that could not be fetched (e.g. deleted tweets)
            print(e)
            failed_ids.append(tweet_id)

# read and load downloaded tweet data into a dataframe
tweet_data = []
with open('tweet_json.txt') as file:
    for line in file:
        tweet_data.append(json.loads(line))

tweet_df = pd.DataFrame(tweet_data)
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 179, 'message': 'Sorry, you are not authorized to see this status.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
[{'code': 144, 'message': 'No status found with that ID.'}]
Rate limit reached. Sleeping for: 101
[{'code': 144, 'message': 'No status found with that ID.'}]
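
Each line of tweet_json.txt holds one tweet as a JSON object, so the file can also be reduced to only the fields needed later instead of loading every key. The following is a minimal sketch, assuming that `id`, `retweet_count` and `favorite_count` are the fields of interest (that choice is an assumption, not something the notebook states). The helper name `parse_tweet_lines` and the sample lines are hypothetical.

```python
import json

def parse_tweet_lines(lines, fields=('id', 'retweet_count', 'favorite_count')):
    """Parse JSON-lines text and keep only the selected fields of each tweet."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        tweet = json.loads(line)
        # .get() returns None when a field is missing instead of raising
        records.append({field: tweet.get(field) for field in fields})
    return records

# Made-up sample lines standing in for the contents of tweet_json.txt
sample_lines = [
    '{"id": 1, "retweet_count": 10, "favorite_count": 25, "lang": "en"}\n',
    '\n',
    '{"id": 2, "retweet_count": 3, "favorite_count": 7, "lang": "en"}\n',
]
records = parse_tweet_lines(sample_lines)
print(records)
```

The resulting list of dictionaries can be passed to `pd.DataFrame(records)` in the same way as `tweet_data` above.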

Assess

In [5]:
archive_df.shape
Out[5]:
(2356, 17)
In [6]:
archive_df.head()
Out[6]:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls rating_numerator rating_denominator name doggo floofer pupper puppo
0 892420643555336193 NaN NaN 2017-08-01 16:23:56 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU NaN NaN NaN https://twitter.com/dog_rates/status/892420643555336193/photo/1 13 10 Phineas None None None None
1 892177421306343426 NaN NaN 2017-08-01 00:17:27 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 https://t.co/0Xxu71qeIV NaN NaN NaN https://twitter.com/dog_rates/status/892177421306343426/photo/1 13 10 Tilly None None None None
2 891815181378084864 NaN NaN 2017-07-31 00:18:03 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10 https://t.co/wUnZnhtVJB NaN NaN NaN https://twitter.com/dog_rates/status/891815181378084864/photo/1 12 10 Archie None None None None
3 891689557279858688 NaN NaN 2017-07-30 15:58:51 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us https://t.co/tD36da7qLQ NaN NaN NaN https://twitter.com/dog_rates/status/891689557279858688/photo/1 13 10 Darla None None None None
4 891327558926688256 NaN NaN 2017-07-29 16:00:24 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> This is Franklin. He would like you to stop calling him "cute." He is a very fierce shark and should be respected as such. 12/10 #BarkWeek https://t.co/AtUZn91f7f NaN NaN NaN https://twitter.com/dog_rates/status/891327558926688256/photo/1,https://twitter.com/dog_rates/status/891327558926688256/photo/1 12 10 Franklin None None None None
In [7]:
archive_df.tail()
Out[7]:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls rating_numerator rating_denominator name doggo floofer pupper puppo
2351 666049248165822465 NaN NaN 2015-11-16 00:24:50 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> Here we have a 1949 1st generation vulpix. Enjoys sweat tea and Fox News. Cannot be phased. 5/10 https://t.co/4B7cOc1EDq NaN NaN NaN https://twitter.com/dog_rates/status/666049248165822465/photo/1 5 10 None None None None None
2352 666044226329800704 NaN NaN 2015-11-16 00:04:52 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> This is a purebred Piers Morgan. Loves to Netflix and chill. Always looks like he forgot to unplug the iron. 6/10 https://t.co/DWnyCjf2mx NaN NaN NaN https://twitter.com/dog_rates/status/666044226329800704/photo/1 6 10 a None None None None
2353 666033412701032449 NaN NaN 2015-11-15 23:21:54 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> Here is a very happy pup. Big fan of well-maintained decks. Just look at that tongue. 9/10 would cuddle af https://t.co/y671yMhoiR NaN NaN NaN https://twitter.com/dog_rates/status/666033412701032449/photo/1 9 10 a None None None None
2354 666029285002620928 NaN NaN 2015-11-15 23:05:30 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> This is a western brown Mitsubishi terrier. Upset about leaf. Actually 2 dogs here. 7/10 would walk the shit out of https://t.co/r7mOb2m0UI NaN NaN NaN https://twitter.com/dog_rates/status/666029285002620928/photo/1 7 10 a None None None None
2355 666020888022790149 NaN NaN 2015-11-15 22:32:08 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> Here we have a Japanese Irish Setter. Lost eye in Vietnam (?). Big fan of relaxing on stair. 8/10 would pet https://t.co/BLDqew2Ijj NaN NaN NaN https://twitter.com/dog_rates/status/666020888022790149/photo/1 8 10 None None None None None
In [8]:
archive_df.sample(5)
Out[8]:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls rating_numerator rating_denominator name doggo floofer pupper puppo
37 885167619883638784 NaN NaN 2017-07-12 16:03:00 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> Here we have a corgi undercover as a malamute. Pawbably doing important investigative work. Zero control over tongue happenings. 13/10 https://t.co/44ItaMubBf NaN NaN NaN https://twitter.com/dog_rates/status/885167619883638784/photo/1,https://twitter.com/dog_rates/status/885167619883638784/photo/1,https://twitter.com/dog_rates/status/885167619883638784/photo/1,https://twitter.com/dog_rates/status/885167619883638784/photo/1 13 10 None None None None None
2088 670792680469889025 NaN NaN 2015-11-29 02:33:32 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> This is Antony. He's a Sheraton Tetrahedron. Skips awkwardly. Doesn't look when he crosses the road (reckless). 7/10 https://t.co/gTy4WMXu8l NaN NaN NaN https://twitter.com/dog_rates/status/670792680469889025/photo/1 7 10 Antony None None None None
1760 678708137298427904 NaN NaN 2015-12-20 22:46:44 +0000 <a href="http://vine.co" rel="nofollow">Vine - Make a Scene</a> Here we are witnessing a wild field pupper. Lost his wallet in there. Rather unfortunate. 10/10 good luck pup https://t.co/sZy9Co74Bw NaN NaN NaN https://vine.co/v/eQjxxYaQ60K 10 10 None None None pupper None
1714 680440374763077632 NaN NaN 2015-12-25 17:30:01 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> Merry Christmas. My gift to you is this tiny unicorn running into a wall in slow motion. 11/10 https://t.co/UKqIAnR3He NaN NaN NaN https://twitter.com/dog_rates/status/680440374763077632/video/1 11 10 None None None None None
1272 709225125749587968 NaN NaN 2016-03-14 03:50:21 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> This is Walker. He's a Butternut Khalifa. Appears fuzzy af. 11/10 would hug for a ridiculous amount of time https://t.co/k6fEWHSALn NaN NaN NaN https://twitter.com/dog_rates/status/709225125749587968/photo/1 11 10 Walker None None None None
In [9]:
# The source column contains 2 variables, i.e. the source_name and the source_link
archive_df.source.head(1)
Out[9]:
0    <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
Name: source, dtype: object
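
The two variables packed into `source` can be separated with a regular expression. This is a sketch only, assuming every value has the anchor-tag shape shown above. The pattern and the helper `split_source` are illustrative and not part of the original notebook.

```python
import re

# Anchor tag: capture the href value and the visible link text
SOURCE_PATTERN = re.compile(r'<a href="([^"]+)"[^>]*>([^<]+)</a>')

def split_source(source_html):
    """Return (source_link, source_name), or (None, None) if the pattern does not match."""
    match = SOURCE_PATTERN.search(source_html)
    if match is None:
        return None, None
    return match.group(1), match.group(2)

source = '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'
source_link, source_name = split_source(source)
print(source_link)
print(source_name)
```

On the full column, the same pattern could be applied with `archive_df['source'].str.extract(...)`, which returns one column per capture group.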
In [10]:
# Duplicates in expanded_urls
archive_df['expanded_urls'].sample(15)
Out[10]:
400                                                                     https://twitter.com/dog_rates/status/824775126675836928/photo/1,https://twitter.com/dog_rates/status/824775126675836928/photo/1
2155                                                                                                                                    https://twitter.com/dog_rates/status/669603084620980224/photo/1
275                                                                                                                                     https://twitter.com/dog_rates/status/840696689258311684/photo/1
2156                                                                                                                                    https://twitter.com/dog_rates/status/669597912108789760/photo/1
633                                                                                                                                     https://twitter.com/dog_rates/status/793845145112371200/photo/1
756     https://twitter.com/dog_rates/status/778650543019483137/photo/1,https://twitter.com/dog_rates/status/778650543019483137/photo/1,https://twitter.com/dog_rates/status/778650543019483137/photo/1
1492                                                                                                                                    https://twitter.com/dog_rates/status/692828166163931137/photo/1
1036    https://twitter.com/dog_rates/status/744971049620602880/photo/1,https://twitter.com/dog_rates/status/744971049620602880/photo/1,https://twitter.com/dog_rates/status/744971049620602880/photo/1
1953                                                                                                                                    https://twitter.com/dog_rates/status/673662677122719744/photo/1
2104                                                                                                                                    https://twitter.com/dog_rates/status/670668383499735048/photo/1
389                                                                                                                                     https://twitter.com/dog_rates/status/826476773533745153/photo/1
2103                                                                                                                                    https://twitter.com/dog_rates/status/670676092097810432/photo/1
2119                                                                                                                                    https://twitter.com/dog_rates/status/670417414769758208/photo/1
2122                                                                                                                                    https://twitter.com/dog_rates/status/670403879788544000/photo/1
1514                                                                                                                                    https://twitter.com/dog_rates/status/691090071332753408/photo/1
Name: expanded_urls, dtype: object
In [11]:
# Tweets with duplicates in expanded_urls
archive_df.loc[1578].expanded_urls
Out[11]:
'https://twitter.com/dog_rates/status/687317306314240000/photo/1,https://twitter.com/dog_rates/status/687317306314240000/photo/1'
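
One possible way to deal with the repeated URLs is to split the string on commas and keep each URL once, in first-seen order. This is a sketch under the assumption that the comma is always the separator. The helper `unique_urls` is hypothetical.

```python
def unique_urls(expanded_urls):
    """Split a comma-separated URL string and drop repeats, keeping first-seen order."""
    if not isinstance(expanded_urls, str):  # e.g. NaN for tweets without URLs
        return []
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(expanded_urls.split(',')))

value = ('https://twitter.com/dog_rates/status/687317306314240000/photo/1,'
         'https://twitter.com/dog_rates/status/687317306314240000/photo/1')
print(unique_urls(value))
```

Applied to the column, this could look like `archive_df['expanded_urls'].apply(unique_urls)`.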
In [12]:
# Tweets without images. All tweets with a NaN value for expanded_urls were found to be without images.
archive_df[archive_df['expanded_urls'].isnull()]
Out[12]:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls rating_numerator rating_denominator name doggo floofer pupper puppo
30 886267009285017600 8.862664e+17 2.281182e+09 2017-07-15 16:51:35 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @NonWhiteHat @MayhewMayhem omg hello tanner you are a scary good boy 12/10 would pet with extreme caution NaN NaN NaN NaN 12 10 None None None None None
55 881633300179243008 8.816070e+17 4.738443e+07 2017-07-02 21:58:53 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @roushfenway These are good dogs but 17/10 is an emotional impulse rating. More like 13/10s NaN NaN NaN NaN 17 10 None None None None None
64 879674319642796034 8.795538e+17 3.105441e+09 2017-06-27 12:14:36 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @RealKentMurphy 14/10 confirmed NaN NaN NaN NaN 14 10 None None None None None
113 870726314365509632 8.707262e+17 1.648776e+07 2017-06-02 19:38:25 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @ComplicitOwl @ShopWeRateDogs &gt;10/10 is reserved for dogs NaN NaN NaN NaN 10 10 None None None None None
148 863427515083354112 8.634256e+17 7.759620e+07 2017-05-13 16:15:35 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @Jack_Septic_Eye I'd need a few more pics to polish a full analysis, but based on the good boy content above I'm leaning towards 12/10 NaN NaN NaN NaN 12 10 None None None None None
179 857214891891077121 8.571567e+17 1.806710e+08 2017-04-26 12:48:51 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @Marc_IRL pixelated af 12/10 NaN NaN NaN NaN 12 10 None None None None None
185 856330835276025856 NaN NaN 2017-04-24 02:15:55 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> RT @Jenna_Marbles: @dog_rates Thanks for rating my cermets 14/10 wow I'm so proud I watered them so much 8.563302e+17 66699013.0 2017-04-24 02:13:14 +0000 NaN 14 10 None None None None None
186 856288084350160898 8.562860e+17 2.792810e+08 2017-04-23 23:26:03 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @xianmcguire @Jenna_Marbles Kardashians wouldn't be famous if as a society we didn't place enormous value on what they do. The dogs are very deserving of their 14/10 NaN NaN NaN NaN 14 10 None None None None None
188 855862651834028034 8.558616e+17 1.943518e+08 2017-04-22 19:15:32 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @dhmontgomery We also gave snoop dogg a 420/10 but I think that predated your research NaN NaN NaN NaN 420 10 None None None None None
189 855860136149123072 8.558585e+17 1.361572e+07 2017-04-22 19:05:32 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @s8n You tried very hard to portray this good boy as not so good, but you have ultimately failed. His goodness shines through. 666/10 NaN NaN NaN NaN 666 10 None None None None None
218 850333567704068097 8.503288e+17 2.195506e+07 2017-04-07 13:04:55 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @markhoppus MARK THAT DOG HAS SEEN AND EXPERIENCED MANY THINGS. PROBABLY LOST OTHER EAR DOING SOMETHING HEROIC. 13/10 HUG THE DOG HOPPUS NaN NaN NaN NaN 13 10 None None None None None
228 848213670039564288 8.482121e+17 4.196984e+09 2017-04-01 16:41:12 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> Jerry just apuppologized to me. He said there was no ill-intent to the slippage. I overreacted I admit. Pupgraded to an 11/10 would pet NaN NaN NaN NaN 11 10 None None None None None
234 847617282490613760 8.476062e+17 4.196984e+09 2017-03-31 01:11:22 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> .@breaannanicolee PUPDATE: Cannon has a heart on his nose. Pupgraded to a 13/10 NaN NaN NaN NaN 13 10 None None None None None
274 840698636975636481 8.406983e+17 8.405479e+17 2017-03-11 22:59:09 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @0_kelvin_0 &gt;10/10 is reserved for puppos sorry Kevin NaN NaN NaN NaN 10 10 None None None None None
290 838150277551247360 8.381455e+17 2.195506e+07 2017-03-04 22:12:52 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @markhoppus 182/10 NaN NaN NaN NaN 182 10 None None None None None
291 838085839343206401 8.380855e+17 2.894131e+09 2017-03-04 17:56:49 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @bragg6of8 @Andy_Pace_ we are still looking for the first 15/10 NaN NaN NaN NaN 15 10 None None None None None
313 835246439529840640 8.352460e+17 2.625958e+07 2017-02-24 21:54:03 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @jonnysun @Lin_Manuel ok jomny I know you're excited but 960/00 isn't a valid rating, 13/10 is tho NaN NaN NaN NaN 960 0 None None None None None
342 832088576586297345 8.320875e+17 3.058208e+07 2017-02-16 04:45:50 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @docmisterio account started on 11/15/15 NaN NaN NaN NaN 11 15 None None None None None
346 831926988323639298 8.319030e+17 2.068372e+07 2017-02-15 18:03:45 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @UNC can confirm 12/10 NaN NaN NaN NaN 12 10 None None None None None
375 828361771580813312 NaN NaN 2017-02-05 21:56:51 +0000 <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a> Beebop and Doobert should start a band 12/10 would listen NaN NaN NaN NaN 12 10 None None None None None
387 826598799820865537 8.265984e+17 4.196984e+09 2017-02-01 01:11:25 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> I was going to do 007/10, but the joke wasn't worth the &lt;10 rating NaN NaN NaN NaN 7 10 None None None None None
409 823333489516937216 8.233264e+17 1.582854e+09 2017-01-23 00:56:15 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @HistoryInPics 13/10 NaN NaN NaN NaN 13 10 None None None None None
427 821153421864615936 8.211526e+17 1.132119e+08 2017-01-17 00:33:26 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @imgur for a polar bear tho I'd say 13/10 is appropriate NaN NaN NaN NaN 13 10 None None None None None
498 813130366689148928 8.131273e+17 4.196984e+09 2016-12-25 21:12:41 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> I've been informed by multiple sources that this is actually a dog elf who's tired from helping Santa all night. Pupgraded to 12/10 NaN NaN NaN NaN 12 10 None None None None None
513 811647686436880384 8.116272e+17 4.196984e+09 2016-12-21 19:01:02 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> PUPDATE: I've been informed that Augie was actually bringing his family these flowers when he tripped. Very good boy. Pupgraded to 11/10 NaN NaN NaN NaN 11 10 None None None None None
570 801854953262350336 8.018543e+17 1.185634e+07 2016-11-24 18:28:13 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> .@NBCSports OMG THE TINY HAT I'M GOING TO HAVE TO SAY 11/10 NBC NaN NaN NaN NaN 11 10 None None None None None
576 800859414831898624 8.008580e+17 2.918590e+08 2016-11-22 00:32:18 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @SkyWilliams doggo simply protecting you from evil that which you cannot see. 11/10 would give extra pets NaN NaN NaN NaN 11 10 None doggo None None None
611 797165961484890113 7.971238e+17 2.916630e+07 2016-11-11 19:55:50 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @JODYHiGHROLLER it may be an 11/10 but what do I know 😉 NaN NaN NaN NaN 11 10 None None None None None
701 786051337297522688 7.727430e+17 7.305050e+17 2016-10-12 03:50:17 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> 13/10 for breakdancing puppo @shibbnbot NaN NaN NaN NaN 13 10 None None None None puppo
707 785515384317313025 NaN NaN 2016-10-10 16:20:36 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> Today, 10/10, should be National Dog Rates Day NaN NaN NaN NaN 10 10 None None None None None
843 766714921925144576 7.667118e+17 4.196984e+09 2016-08-19 19:14:16 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> His name is Charley and he already has a new set of wheels thanks to donations. I heard his top speed was also increased. 13/10 for Charley NaN NaN NaN NaN 13 10 None None None None None
857 763956972077010945 7.638652e+17 1.584641e+07 2016-08-12 04:35:10 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @TheEllenShow I'm not sure if you know this but that doggo right there is a 12/10 NaN NaN NaN NaN 12 10 None doggo None None None
967 750381685133418496 7.501805e+17 4.717297e+09 2016-07-05 17:31:49 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> 13/10 such a good doggo\n@spaghemily NaN NaN NaN NaN 13 10 None doggo None None None
1005 747651430853525504 7.476487e+17 4.196984e+09 2016-06-28 04:42:46 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> Other pupper asked not to have his identity shared. Probably just embarrassed about the headbutt. Also 12/10 it'll be ok mystery pup NaN NaN NaN NaN 12 10 None None None pupper None
1080 738891149612572673 7.384119e+17 3.589728e+08 2016-06-04 00:32:32 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @mount_alex3 13/10 NaN NaN NaN NaN 13 10 None None None None None
1295 707983188426153984 7.079801e+17 2.319108e+09 2016-03-10 17:35:20 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> @serial @MrRoles OH MY GOD I listened to all of season 1 during a single road trip. I love you guys! I can confirm Bernie's 12/10 rating :) NaN NaN NaN NaN 12 10 None None None None None
1345 704491224099647488 7.044857e+17 2.878549e+07 2016-03-01 02:19:31 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> 13/10 hero af\n@ABC NaN NaN NaN NaN 13 10 None None None None None
1445 696518437233913856 NaN NaN 2016-02-08 02:18:30 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> Oh my god 10/10 for every little hot dog pupper NaN NaN NaN NaN 10 10 None None None pupper None
1446 696490539101908992 6.964887e+17 4.196984e+09 2016-02-08 00:27:39 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> After reading the comments I may have overestimated this pup. Downgraded to a 1/10. Please forgive me NaN NaN NaN NaN 1 10 None None None None None
1474 693644216740769793 6.936422e+17 4.196984e+09 2016-01-31 03:57:23 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> BREAKING PUPDATE: I've just been notified that (if in U.S.) this dog appears to be operating the vehicle. Upgraded to 10/10. Skilled af NaN NaN NaN NaN 10 10 None None None None None
1479 693582294167244802 6.935722e+17 1.198989e+09 2016-01-30 23:51:19 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> Personally I'd give him an 11/10. Not sure why you think you're qualified to rate such a stellar pup.\n@CommonWhiteGirI NaN NaN NaN NaN 11 10 None None None None None
1497 692423280028966913 6.924173e+17 4.196984e+09 2016-01-27 19:05:49 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> PUPDATE: just noticed this dog has some extra legs. Very advanced. Revolutionary af. Upgraded to a 9/10 NaN NaN NaN NaN 9 10 None None None None None
1523 690607260360429569 6.903413e+17 4.670367e+08 2016-01-22 18:49:36 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> 12/10 @LightningHoltt NaN NaN NaN NaN 12 10 None None None None None
1598 686035780142297088 6.860340e+17 4.196984e+09 2016-01-10 04:04:10 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> Yes I do realize a rating of 4/20 would've been fitting. However, it would be unjust to give these cooperative pups that low of a rating NaN NaN NaN NaN 4 20 None None None None None
1605 685681090388975616 6.855479e+17 4.196984e+09 2016-01-09 04:34:45 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> Jack deserves another round of applause. If you missed this earlier today I strongly suggest reading it. Wonderful first 14/10 🐶❤️ NaN NaN NaN NaN 14 10 None None None None None
1618 684969860808454144 6.849598e+17 4.196984e+09 2016-01-07 05:28:35 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> For those who claim this is a goat, u are wrong. It is not the Greatest Of All Time. The rating of 5/10 should have made that clear. Thank u NaN NaN NaN NaN 5 10 None None None None None
1663 682808988178739200 6.827884e+17 4.196984e+09 2016-01-01 06:22:03 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> I'm aware that I could've said 20/16, but here at WeRateDogs we are very professional. An inconsistent rating scale is simply irresponsible NaN NaN NaN NaN 20 16 None None None None None
1689 681340665377193984 6.813394e+17 4.196984e+09 2015-12-28 05:07:27 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> I've been told there's a slight possibility he's checking his mirror. We'll bump to 9.5/10. Still a menace NaN NaN NaN NaN 5 10 None None None None None
1774 678023323247357953 6.780211e+17 4.196984e+09 2015-12-19 01:25:31 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> After getting lost in Reese's eyes for several minutes we're going to upgrade him to a 13/10 NaN NaN NaN NaN 13 10 None None None None None
1819 676590572941893632 6.765883e+17 4.196984e+09 2015-12-15 02:32:17 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> After some outrage from the crowd. Bubbles is being upgraded to a 7/10. That's as high as I'm going. Thank you NaN NaN NaN NaN 7 10 None None None None None
1844 675849018447167488 6.758457e+17 4.196984e+09 2015-12-13 01:25:37 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> This dog is being demoted to a 9/10 for not wearing a helmet while riding. Gotta stay safe out there. Thank you NaN NaN NaN NaN 9 10 None None None None None
1895 674742531037511680 6.747400e+17 4.196984e+09 2015-12-10 00:08:50 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> Some clarification is required. The dog is singing Cher and that is more than worthy of an 11/10. Thank you NaN NaN NaN NaN 11 10 None None None None None
1905 674606911342424069 6.744689e+17 4.196984e+09 2015-12-09 15:09:55 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> The 13/10 also takes into account this impeccable yard. Louis is great but the future dad in me can't ignore that luscious green grass NaN NaN NaN NaN 13 10 None None None None None
1914 674330906434379776 6.658147e+17 1.637468e+07 2015-12-08 20:53:11 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> 13/10\n@ABC7 NaN NaN NaN NaN 13 10 None None None None None
1940 673716320723169284 6.737159e+17 4.196984e+09 2015-12-07 04:11:02 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> The millennials have spoken and we've decided to immediately demote to a 1/10. Thank you NaN NaN NaN NaN 1 10 None None None None None
2038 671550332464455680 6.715449e+17 4.196984e+09 2015-12-01 04:44:10 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> After 22 minutes of careful deliberation this dog is being demoted to a 1/10. The longer you look at him the more terrifying he becomes NaN NaN NaN NaN 1 10 None None None None None
2149 669684865554620416 6.693544e+17 4.196984e+09 2015-11-26 01:11:28 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> After countless hours of research and hundreds of formula alterations we have concluded that Dug should be bumped to an 11/10 NaN NaN NaN NaN 11 10 None None None None None
2189 668967877119254528 6.689207e+17 2.143566e+07 2015-11-24 01:42:25 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> 12/10 good shit Bubka\n@wane15 NaN NaN NaN NaN 12 10 None None None None None
2298 667070482143944705 6.670655e+17 4.196984e+09 2015-11-18 20:02:51 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> After much debate this dog is being upgraded to 10/10. I repeat 10/10 NaN NaN NaN NaN 10 10 None None None None None
In [13]:
# Non-original tweets (retweets); case=False already makes the match case-insensitive
archive_df[archive_df['text'].str.contains('RT @', case=False)].head()
Out[13]:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls rating_numerator rating_denominator name doggo floofer pupper puppo
19 888202515573088257 NaN NaN 2017-07-21 01:02:36 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> RT @dog_rates: This is Canela. She attempted some fancy porch pics. They were unsuccessful. 13/10 someone help her https://t.co/cLyzpcUcMX 8.874740e+17 4.196984e+09 2017-07-19 00:47:34 +0000 https://twitter.com/dog_rates/status/887473957103951883/photo/1,https://twitter.com/dog_rates/status/887473957103951883/photo/1,https://twitter.com/dog_rates/status/887473957103951883/photo/1,https://twitter.com/dog_rates/status/887473957103951883/photo/1 13 10 Canela None None None None
32 886054160059072513 NaN NaN 2017-07-15 02:45:48 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> RT @Athletics: 12/10 #BATP https://t.co/WxwJmvjfxo 8.860537e+17 1.960740e+07 2017-07-15 02:44:07 +0000 https://twitter.com/dog_rates/status/886053434075471873,https://twitter.com/dog_rates/status/886053434075471873 12 10 None None None None None
36 885311592912609280 NaN NaN 2017-07-13 01:35:06 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> RT @dog_rates: This is Lilly. She just parallel barked. Kindly requests a reward now. 13/10 would pet so well https://t.co/SATN4If5H5 8.305833e+17 4.196984e+09 2017-02-12 01:04:29 +0000 https://twitter.com/dog_rates/status/830583320585068544/photo/1,https://twitter.com/dog_rates/status/830583320585068544/photo/1,https://twitter.com/dog_rates/status/830583320585068544/photo/1,https://twitter.com/dog_rates/status/830583320585068544/photo/1 13 10 Lilly None None None None
68 879130579576475649 NaN NaN 2017-06-26 00:13:58 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> RT @dog_rates: This is Emmy. She was adopted today. Massive round of pupplause for Emmy and her new family. 14/10 for all involved https://… 8.780576e+17 4.196984e+09 2017-06-23 01:10:23 +0000 https://twitter.com/dog_rates/status/878057613040115712/photo/1,https://twitter.com/dog_rates/status/878057613040115712/photo/1 14 10 Emmy None None None None
73 878404777348136964 NaN NaN 2017-06-24 00:09:53 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> RT @dog_rates: Meet Shadow. In an attempt to reach maximum zooming borkdrive, he tore his ACL. Still 13/10 tho. Help him out below\n\nhttps:/… 8.782815e+17 4.196984e+09 2017-06-23 16:00:04 +0000 https://www.gofundme.com/3yd6y1c,https://twitter.com/dog_rates/status/878281511006478336/photo/1 13 10 Shadow None None None None
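Matching on the text 'RT @' is one way to spot retweets. A possible cross-check, sketched below, is to flag rows where retweeted_status_id is populated, since the archive fills that column only for retweets. The small sample_df and the helper flag_retweets are hypothetical stand-ins for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for archive_df (illustration only).
sample_df = pd.DataFrame({
    'tweet_id': [1, 2, 3],
    'text': ['RT @dog_rates: This is Canela. 13/10',
             'This is Lilly. 13/10',
             'Thanks for the rt @someone 12/10'],
    'retweeted_status_id': [8.874740e+17, np.nan, np.nan],
})

def flag_retweets(df):
    """Return a boolean Series that is True where retweeted_status_id is set."""
    return df['retweeted_status_id'].notna()

# Column-based flag versus the text-based match used in the cell above.
is_retweet = flag_retweets(sample_df)
by_text = sample_df['text'].str.contains('RT @', case=False, regex=False)

# Rows where the two approaches disagree are worth inspecting by hand.
disagreements = sample_df[is_retweet != by_text]
print(disagreements[['tweet_id', 'text']])
```

In this sample the third row matches the text pattern but has no retweeted_status_id, so only the text-based check flags it.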
In [14]:
# Non-original tweets, PUPDATE(s)
archive_df[archive_df['text'].str.contains('PUPDATE', flags=re.IGNORECASE, case=False)]
Out[14]:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls rating_numerator rating_denominator name doggo floofer pupper puppo
101 872668790621863937 NaN NaN 2017-06-08 04:17:07 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> RT @loganamnosis: Penelope here is doing me quite a divertir. Well done, @dog_rates! Loving the pupdate. 14/10, je jouerais de nouveau. htt… 8.726576e+17 154767397.0 2017-06-08 03:32:35 +0000 https://twitter.com/loganamnosis/status/872657584259551233/photo/1 14 10 None None None None None
234 847617282490613760 8.476062e+17 4.196984e+09 2017-03-31 01:11:22 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> .@breaannanicolee PUPDATE: Cannon has a heart on his nose. Pupgraded to a 13/10 NaN NaN NaN NaN 13 10 None None None None None
251 844979544864018432 7.590995e+17 4.196984e+09 2017-03-23 18:29:57 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> PUPDATE: I'm proud to announce that Toby is 236 days sober. Pupgraded to a 13/10. We're all very proud of you, Toby https://t.co/a5OaJeRl9B NaN NaN NaN https://twitter.com/dog_rates/status/844979544864018432/photo/1,https://twitter.com/dog_rates/status/844979544864018432/photo/1,https://twitter.com/dog_rates/status/844979544864018432/photo/1 13 10 None None None None None
513 811647686436880384 8.116272e+17 4.196984e+09 2016-12-21 19:01:02 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> PUPDATE: I've been informed that Augie was actually bringing his family these flowers when he tripped. Very good boy. Pupgraded to 11/10 NaN NaN NaN NaN 11 10 None None None None None
1016 746906459439529985 7.468859e+17 4.196984e+09 2016-06-26 03:22:31 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> PUPDATE: can't see any. Even if I could, I couldn't reach them to pet. 0/10 much disappointment https://t.co/c7WXaB2nqX NaN NaN NaN https://twitter.com/dog_rates/status/746906459439529985/photo/1 0 10 None None None None None
1474 693644216740769793 6.936422e+17 4.196984e+09 2016-01-31 03:57:23 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> BREAKING PUPDATE: I've just been notified that (if in U.S.) this dog appears to be operating the vehicle. Upgraded to 10/10. Skilled af NaN NaN NaN NaN 10 10 None None None None None
1497 692423280028966913 6.924173e+17 4.196984e+09 2016-01-27 19:05:49 +0000 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> PUPDATE: just noticed this dog has some extra legs. Very advanced. Revolutionary af. Upgraded to a 9/10 NaN NaN NaN NaN 9 10 None None None None None
In [15]:
archive_df.name.value_counts()
Out[15]:
None       745
a           55
Charlie     12
Oliver      11
Lucy        11
          ... 
Gert         1
Bubba        1
Lilah        1
Raphael      1
Harvey       1
Name: name, Length: 957, dtype: int64
In [16]:
# Dogs whose names were not captured but assigned a value of 'None'
archive_df.query('name == "None"')[['tweet_id','text','name']].sample(15)
Out[16]:
tweet_id text name
1635 684222868335505415 Someone help the girl is being mugged. Several are distracting her while two steal her shoes. Clever puppers 121/110 https://t.co/1zfnTJLt55 None
1779 677716515794329600 IT'S PUPPERGEDDON. Total of 144/120 ...I think https://t.co/ZanVtAtvIq None
1506 691756958957883396 THE BRITISH ARE COMING\nTHE BRITISH ARE COMING\n10/10 https://t.co/frGWV7IP6J None
429 821107785811234820 Here's a doggo who looks like he's about to give you a list of mythical ingredients to go collect for his potion. 11/10 would obey https://t.co/8SiwKDlRcl None
1610 685532292383666176 For the last time, WE. DO. NOT. RATE. BULBASAUR. We only rate dogs. Please only send dogs. Thank you ...9/10 https://t.co/GboDG8WhJG None
180 857062103051644929 RT @AaronChewning: First time wearing my @dog_rates hat on a flight and I get DOUBLE OPEN ROWS. Really makes you think. 13/10 https://t.co/… None
1476 693629975228977152 This pupper is afraid of its own feet. 12/10 would comfort https://t.co/Tn9Mp0oPoJ None
2067 671141549288370177 Neat pup here. Enjoys lettuce. Long af ears. Short lil legs. Hops surprisingly high for dog. 9/10 still very petable https://t.co/HYR611wiA4 None
1624 684880619965411328 Here we have a basking dino pupper. Looks powerful. Occasionally shits eggs. Doesn't want the holidays to end. 5/10 https://t.co/DnNweb5eTO None
599 798682547630837760 RT @dog_rates: Here we see a rare pouched pupper. Ample storage space. Looks alert. Jumps at random. Kicked open that door. 8/10 https://t.… None
912 757596066325864448 Here's another picture without a dog in it. Idk why you guys keep sending these. 4/10 just because that's a neat rug https://t.co/mOmnL19Wsl None
1823 676533798876651520 ITSOFLUFFAYYYYY 12/10 https://t.co/bfw13CnuuZ None
1468 694206574471057408 "Martha come take a look at this. I'm so fed up with the media's unrealistic portrayal of dogs these days." 10/10 https://t.co/Sd4qAdSRqI None
1505 691793053716221953 We usually don't rate penguins but this one is in need of a confidence boost after that slide. 10/10 https://t.co/qnMJHBxPuo None
1344 704499785726889984 When you wake up from a long nap and have no idea who you are. 12/10 https://t.co/dlF93GLnDc None
In [17]:
# Dog names that were captured as 'a'
archive_df.query('name == "a" ')[['tweet_id','text','name']]
Out[17]:
tweet_id text name
56 881536004380872706 Here is a pupper approaching maximum borkdrive. Zooming at never before seen speeds. 14/10 paw-inspiring af \n(IG: puffie_the_chow) https://t.co/ghXBIIeQZF a
649 792913359805018113 Here is a perfect example of someone who has their priorities in order. 13/10 for both owner and Forrest https://t.co/LRyMrU7Wfq a
801 772581559778025472 Guys this is getting so out of hand. We only rate dogs. This is a Galapagos Speed Panda. Pls only send dogs... 10/10 https://t.co/8lpAGaZRFn a
1002 747885874273214464 This is a mighty rare blue-tailed hammer sherk. Human almost lost a limb trying to take these. Be careful guys. 8/10 https://t.co/TGenMeXreW a
1004 747816857231626240 Viewer discretion is advised. This is a terrible attack in progress. Not even in water (tragic af). 4/10 bad sherk https://t.co/L3U0j14N5R a
1017 746872823977771008 This is a carrot. We only rate dogs. Please only send in dogs. You all really should know this by now ...11/10 https://t.co/9e48aPrBm2 a
1049 743222593470234624 This is a very rare Great Alaskan Bush Pupper. Hard to stumble upon without spooking. 12/10 would pet passionately https://t.co/xOBKCdpzaa a
1193 717537687239008257 People please. This is a Deadly Mediterranean Plop T-Rex. We only rate dogs. Only send in dogs. Thanks you... 11/10 https://t.co/2ATDsgHD4n a
1207 715733265223708672 This is a taco. We only rate dogs. Please only send in dogs. Dogs are what we rate. Not tacos. Thank you... 10/10 https://t.co/cxl6xGY8B9 a
1340 704859558691414016 Here is a heartbreaking scene of an incredible pupper being laid to rest. 10/10 RIP pupper https://t.co/81mvJ0rGRu a
1351 704054845121142784 Here is a whole flock of puppers. 60/50 I'll take the lot https://t.co/9dpcw6MdWa a
1361 703079050210877440 This is a Butternut Cumberfloof. It's not windy they just look like that. 11/10 back at it again with the red socks https://t.co/hMjzhdUHaW a
1368 702539513671897089 This is a Wild Tuscan Poofwiggle. Careful not to startle. Rare tongue slip. One eye magical. 12/10 would def pet https://t.co/4EnShAQjv6 a
1382 700864154249383937 "Pupper is a present to world. Here is a bow for pupper." 12/10 precious as hell https://t.co/ItSsE92gCW a
1499 692187005137076224 This is a rare Arctic Wubberfloof. Unamused by the happenings. No longer has the appetites. 12/10 would totally hug https://t.co/krvbacIX0N a
1737 679530280114372609 Guys this really needs to stop. We've been over this way too many times. This is a giraffe. We only rate dogs.. 7/10 https://t.co/yavgkHYPOC a
1785 677644091929329666 This is a dog swinging. I really enjoyed it so I hope you all do as well. 11/10 https://t.co/Ozo9KHTRND a
1853 675706639471788032 This is a Sizzlin Menorah spaniel from Brooklyn named Wylie. Lovable eyes. Chiller as hell. 10/10 and I'm out.. poof https://t.co/7E0AiJXPmI a
1854 675534494439489536 Seriously guys?! Only send in dogs. I only rate dogs. This is a baby black bear... 11/10 https://t.co/H7kpabTfLj a
1877 675109292475830276 C'mon guys. We've been over this. We only rate dogs. This is a cow. Please only submit dogs. Thank you...... 9/10 https://t.co/WjcELNEqN2 a
1878 675047298674663426 This is a fluffy albino Bacardi Columbia mix. Excellent at the tweets. 11/10 would hug gently https://t.co/diboDRUuEI a
1923 674082852460433408 This is a Sagitariot Baklava mix. Loves her new hat. 11/10 radiant pup https://t.co/Bko5kFJYUU a
1941 673715861853720576 This is a heavily opinionated dog. Loves walls. Nobody knows how the hair works. Always ready for a kiss. 4/10 https://t.co/dFiaKZ9cDl a
1955 673636718965334016 This is a Lofted Aphrodisiac Terrier named Kip. Big fan of bed n breakfasts. Fits perfectly. 10/10 would pet firmly https://t.co/gKlLpNzIl3 a
1994 672604026190569472 This is a baby Rand Paul. Curls for days. 11/10 would cuddle the hell out of https://t.co/xHXNaPAYRe a
2034 671743150407421952 This is a Tuscaloosa Alcatraz named Jacob (Yacōb). Loves to sit in swing. Stellar tongue. 11/10 look at his feet https://t.co/2IslQ8ZSc7 a
2066 671147085991960577 This is a Helvetica Listerine named Rufus. This time Rufus will be ready for the UPS guy. He'll never expect it 9/10 https://t.co/34OhVhMkVr a
2116 670427002554466305 This is a Deciduous Trimester mix named Spork. Only 1 ear works. No seat belt. Incredibly reckless. 9/10 still cute https://t.co/CtuJoLHiDo a
2125 670361874861563904 This is a Rich Mahogany Seltzer named Cherokee. Just got destroyed by a snowball. Isn't very happy about it. 9/10 https://t.co/98ZBi6o4dj a
2128 670303360680108032 This is a Speckled Cauliflower Yosemite named Hemry. He's terrified of intruder dog. Not one bit comfortable. 9/10 https://t.co/yV3Qgjh8iN a
2146 669923323644657664 This is a spotted Lipitor Rumpelstiltskin named Alphred. He can't wait for the Turkey. 10/10 would pet really well https://t.co/6GUGO7azNX a
2153 669661792646373376 This is a brave dog. Excellent free climber. Trying to get closer to God. Not very loyal though. Doesn't bark. 5/10 https://t.co/ODnILTr4QM a
2161 669564461267722241 This is a Coriander Baton Rouge named Alfredo. Loves to cuddle with smaller well-dressed dog. 10/10 would hug lots https://t.co/eCRdwouKCl a
2191 668955713004314625 This is a Slovakian Helter Skelter Feta named Leroi. Likes to skip on roofs. Good traction. Much balance. 10/10 wow! https://t.co/Dmy2mY2Qj5 a
2198 668815180734689280 This is a wild Toblerone from Papua New Guinea. Mouth always open. Addicted to hay. Acts blind. 7/10 handsome dog https://t.co/IGmVbz07tZ a
2211 668614819948453888 Here is a horned dog. Much grace. Can jump over moons (dam!). Paws not soft. Bad at barking. 7/10 can still pet tho https://t.co/2Su7gmsnZm a
2218 668507509523615744 This is a Birmingham Quagmire named Chuk. Loves to relax and watch the game while sippin on that iced mocha. 10/10 https://t.co/HvNg9JWxFt a
2222 668466899341221888 Here is a mother dog caring for her pups. Snazzy red mohawk. Doesn't wag tail. Pups look confused. Overall 4/10 https://t.co/YOHe6lf09m a
2235 668171859951755264 This is a Trans Siberian Kellogg named Alfonso. Huge ass eyeballs. Actually Dobby from Harry Potter. 7/10 https://t.co/XpseHBlAAb a
2249 667861340749471744 This is a Shotokon Macadamia mix named Cheryl. Sophisticated af. Looks like a disappointed librarian. Shh (lol) 9/10 https://t.co/J4GnJ5Swba a
2255 667773195014021121 This is a rare Hungarian Pinot named Jessiga. She is either mid-stroke or got stuck in the washing machine. 8/10 https://t.co/ZU0i0KJyqD a
2264 667538891197542400 This is a southwest Coriander named Klint. Hat looks expensive. Still on house arrest :(\n9/10 https://t.co/IQTOMqDUIe a
2273 667470559035432960 This is a northern Wahoo named Kohl. He runs this town. Chases tumbleweeds. Draws gun wicked fast. 11/10 legendary https://t.co/J4vn2rOYFk a
2287 667177989038297088 This is a Dasani Kingfisher from Maine. His name is Daryl. Daryl doesn't like being swallowed by a panda. 8/10 https://t.co/jpaeu6LNmW a
2304 666983947667116034 This is a curly Ticonderoga named Pepe. No feet. Loves to jet ski. 11/10 would hug until forever https://t.co/cyDfaK8NBc a
2311 666781792255496192 This is a purebred Bacardi named Octaviath. Can shoot spaghetti out of mouth. 10/10 https://t.co/uEvsGLOFHa a
2314 666701168228331520 This is a golden Buckminsterfullerene named Johm. Drives trucks. Lumberjack (?). Enjoys wall. 8/10 would hug softly https://t.co/uQbZJM2DQB a
2327 666407126856765440 This is a southern Vesuvius bumblegruff. Can drive a truck (wow). Made friends with 5 other nifty dogs (neat). 7/10 https://t.co/LopTBkKa8h a
2334 666293911632134144 This is a funny dog. Weird toes. Won't come down. Loves branch. Refuses to eat his food. Hard to cuddle with. 3/10 https://t.co/IIXis0zta0 a
2347 666057090499244032 My oh my. This is a rare blond Canadian terrier on wheels. Only $8.98. Rather docile. 9/10 very rare https://t.co/yWBqbrzy8O a
2348 666055525042405380 Here is a Siberian heavily armored polar bear mix. Strong owner. 10/10 I would do unspeakable things to pet this dog https://t.co/rdivxLiqEt a
2350 666050758794694657 This is a truly beautiful English Wilson Staff retriever. Has a nice phone. Privileged. 10/10 would trade lives with https://t.co/fvIbQfHjIe a
2352 666044226329800704 This is a purebred Piers Morgan. Loves to Netflix and chill. Always looks like he forgot to unplug the iron. 6/10 https://t.co/DWnyCjf2mx a
2353 666033412701032449 Here is a very happy pup. Big fan of well-maintained decks. Just look at that tongue. 9/10 would cuddle af https://t.co/y671yMhoiR a
2354 666029285002620928 This is a western brown Mitsubishi terrier. Upset about leaf. Actually 2 dogs here. 7/10 would walk the shit out of https://t.co/r7mOb2m0UI a
In [18]:
archive_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   tweet_id                    2356 non-null   int64  
 1   in_reply_to_status_id       78 non-null     float64
 2   in_reply_to_user_id         78 non-null     float64
 3   timestamp                   2356 non-null   object 
 4   source                      2356 non-null   object 
 5   text                        2356 non-null   object 
 6   retweeted_status_id         181 non-null    float64
 7   retweeted_status_user_id    181 non-null    float64
 8   retweeted_status_timestamp  181 non-null    object 
 9   expanded_urls               2297 non-null   object 
 10  rating_numerator            2356 non-null   int64  
 11  rating_denominator          2356 non-null   int64  
 12  name                        2356 non-null   object 
 13  doggo                       2356 non-null   object 
 14  floofer                     2356 non-null   object 
 15  pupper                      2356 non-null   object 
 16  puppo                       2356 non-null   object 
dtypes: float64(4), int64(3), object(10)
memory usage: 313.0+ KB
In [19]:
archive_df.describe()
Out[19]:
tweet_id in_reply_to_status_id in_reply_to_user_id retweeted_status_id retweeted_status_user_id rating_numerator rating_denominator
count 2.356000e+03 7.800000e+01 7.800000e+01 1.810000e+02 1.810000e+02 2356.000000 2356.000000
mean 7.427716e+17 7.455079e+17 2.014171e+16 7.720400e+17 1.241698e+16 13.126486 10.455433
std 6.856705e+16 7.582492e+16 1.252797e+17 6.236928e+16 9.599254e+16 45.876648 6.745237
min 6.660209e+17 6.658147e+17 1.185634e+07 6.661041e+17 7.832140e+05 0.000000 0.000000
25% 6.783989e+17 6.757419e+17 3.086374e+08 7.186315e+17 4.196984e+09 10.000000 10.000000
50% 7.196279e+17 7.038708e+17 4.196984e+09 7.804657e+17 4.196984e+09 11.000000 10.000000
75% 7.993373e+17 8.257804e+17 4.196984e+09 8.203146e+17 4.196984e+09 12.000000 10.000000
max 8.924206e+17 8.862664e+17 8.405479e+17 8.874740e+17 7.874618e+17 1776.000000 170.000000
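The summary above shows rating_denominator ranging from 0 to 170, so some ratings are not out of 10. A minimal sketch of how such rows could be isolated and put on a common scale follows. The sample_df, the helper names and the choice to rescale to a denominator of 10 are assumptions for illustration, not steps taken in the original notebook.

```python
import pandas as pd

# Hypothetical miniature stand-in for archive_df (illustration only).
sample_df = pd.DataFrame({
    'text': ["Here is a whole flock of puppers. 60/50 I'll take the lot",
             'This is a taco. We only rate dogs. 10/10',
             "IT'S PUPPERGEDDON. Total of 144/120 ...I think"],
    'rating_numerator': [60, 10, 144],
    'rating_denominator': [50, 10, 120],
})

def unusual_denominators(df, expected=10):
    """Return the rows whose denominator differs from the expected value."""
    return df[df['rating_denominator'] != expected]

def rescale_rating(df, expected=10):
    """Rescale each rating to the expected denominator (assumes denominator > 0)."""
    return df['rating_numerator'] * expected / df['rating_denominator']

unusual = unusual_denominators(sample_df)
scaled = rescale_rating(unusual)
print(unusual[['rating_numerator', 'rating_denominator']].assign(scaled=scaled))
```

Rows with a denominator of 0 would need to be handled separately before rescaling.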
In [20]:
# doggo stage
archive_df[archive_df['text'].str.contains('doggo',flags=re.IGNORECASE, case=False)].shape[0]
Out[20]:
107
In [21]:
# pupper stage
archive_df[archive_df['text'].str.contains('pupper',flags=re.IGNORECASE, case=False)].shape[0]
Out[21]:
283
In [22]:
# puppo stage
archive_df[archive_df['text'].str.contains('puppo',flags=re.IGNORECASE, case=False)].shape[0]
Out[22]:
38
In [23]:
# floofer stage
archive_df[archive_df['text'].str.contains('floofer',flags=re.IGNORECASE, case=False)].shape[0]
Out[23]:
10
In [24]:
# mentions of 'floof' (this substring also matches 'floofer')
archive_df[archive_df['text'].str.contains('floof',flags=re.IGNORECASE, case=False)].shape[0]
Out[24]:
41
In [25]:
# mentions of 'blep'
archive_df[archive_df['text'].str.contains('blep',flags=re.IGNORECASE, case=False)].shape[0]
Out[25]:
4
In [26]:
# mentions of 'snoot'
archive_df[archive_df['text'].str.contains('snoot',flags=re.IGNORECASE, case=False)].shape[0]
Out[26]:
0
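The per-term counts above use substring matching, so a search for 'floof' also counts tweets that only contain 'floofer' or words such as 'Cumberfloof'. The sketch below counts several terms in one pass and optionally requires whole-word matches. The sample_text Series and the helper count_term are hypothetical names used for illustration only.

```python
import re
import pandas as pd

# Hypothetical sample of tweet texts (illustration only).
sample_text = pd.Series([
    'Here is a pupper approaching maximum borkdrive. 14/10',
    'This is a Butternut Cumberfloof. 11/10',
    'Such a floofer. Good doggo. 12/10',
    'This puppo is ready. 13/10',
])

def count_term(text, term, whole_word=False):
    """Count rows whose text contains term, ignoring case."""
    pattern = re.escape(term)
    if whole_word:
        # \b anchors the match to word boundaries on both sides.
        pattern = r'\b' + pattern + r'\b'
    return int(text.str.contains(pattern, case=False, regex=True).sum())

terms = ['doggo', 'pupper', 'puppo', 'floofer', 'floof']
substring_counts = {t: count_term(sample_text, t) for t in terms}
whole_word_counts = {t: count_term(sample_text, t, whole_word=True) for t in terms}
print(substring_counts)
print(whole_word_counts)
```

Whole-word matching is stricter in both directions: it would also skip plurals such as 'puppers', so which variant is appropriate depends on how the counts are used.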
In [27]:
archive_df.name.value_counts()
Out[27]:
None       745
a           55
Charlie     12
Oliver      11
Lucy        11
          ... 
Gert         1
Bubba        1
Lilah        1
Raphael      1
Harvey       1
Name: name, Length: 957, dtype: int64
In [28]:
# Tweets with a name of 'a'
a_names = archive_df.query('name == "a"')[['tweet_id','text','name']]
In [29]:
a_names.shape
Out[29]:
(55, 3)
In [30]:
# Tweets with name 'a' where the real name follows the word 'named'
a_names[a_names['text'].str.contains('named',flags=re.IGNORECASE, case=False)]
Out[30]:
tweet_id text name
1853 675706639471788032 This is a Sizzlin Menorah spaniel from Brooklyn named Wylie. Lovable eyes. Chiller as hell. 10/10 and I'm out.. poof https://t.co/7E0AiJXPmI a
1955 673636718965334016 This is a Lofted Aphrodisiac Terrier named Kip. Big fan of bed n breakfasts. Fits perfectly. 10/10 would pet firmly https://t.co/gKlLpNzIl3 a
2034 671743150407421952 This is a Tuscaloosa Alcatraz named Jacob (Yacōb). Loves to sit in swing. Stellar tongue. 11/10 look at his feet https://t.co/2IslQ8ZSc7 a
2066 671147085991960577 This is a Helvetica Listerine named Rufus. This time Rufus will be ready for the UPS guy. He'll never expect it 9/10 https://t.co/34OhVhMkVr a
2116 670427002554466305 This is a Deciduous Trimester mix named Spork. Only 1 ear works. No seat belt. Incredibly reckless. 9/10 still cute https://t.co/CtuJoLHiDo a
2125 670361874861563904 This is a Rich Mahogany Seltzer named Cherokee. Just got destroyed by a snowball. Isn't very happy about it. 9/10 https://t.co/98ZBi6o4dj a
2128 670303360680108032 This is a Speckled Cauliflower Yosemite named Hemry. He's terrified of intruder dog. Not one bit comfortable. 9/10 https://t.co/yV3Qgjh8iN a
2146 669923323644657664 This is a spotted Lipitor Rumpelstiltskin named Alphred. He can't wait for the Turkey. 10/10 would pet really well https://t.co/6GUGO7azNX a
2161 669564461267722241 This is a Coriander Baton Rouge named Alfredo. Loves to cuddle with smaller well-dressed dog. 10/10 would hug lots https://t.co/eCRdwouKCl a
2191 668955713004314625 This is a Slovakian Helter Skelter Feta named Leroi. Likes to skip on roofs. Good traction. Much balance. 10/10 wow! https://t.co/Dmy2mY2Qj5 a
2218 668507509523615744 This is a Birmingham Quagmire named Chuk. Loves to relax and watch the game while sippin on that iced mocha. 10/10 https://t.co/HvNg9JWxFt a
2235 668171859951755264 This is a Trans Siberian Kellogg named Alfonso. Huge ass eyeballs. Actually Dobby from Harry Potter. 7/10 https://t.co/XpseHBlAAb a
2249 667861340749471744 This is a Shotokon Macadamia mix named Cheryl. Sophisticated af. Looks like a disappointed librarian. Shh (lol) 9/10 https://t.co/J4GnJ5Swba a
2255 667773195014021121 This is a rare Hungarian Pinot named Jessiga. She is either mid-stroke or got stuck in the washing machine. 8/10 https://t.co/ZU0i0KJyqD a
2264 667538891197542400 This is a southwest Coriander named Klint. Hat looks expensive. Still on house arrest :(\n9/10 https://t.co/IQTOMqDUIe a
2273 667470559035432960 This is a northern Wahoo named Kohl. He runs this town. Chases tumbleweeds. Draws gun wicked fast. 11/10 legendary https://t.co/J4vn2rOYFk a
2304 666983947667116034 This is a curly Ticonderoga named Pepe. No feet. Loves to jet ski. 11/10 would hug until forever https://t.co/cyDfaK8NBc a
2311 666781792255496192 This is a purebred Bacardi named Octaviath. Can shoot spaghetti out of mouth. 10/10 https://t.co/uEvsGLOFHa a
2314 666701168228331520 This is a golden Buckminsterfullerene named Johm. Drives trucks. Lumberjack (?). Enjoys wall. 8/10 would hug softly https://t.co/uQbZJM2DQB a
In [31]:
# Tweets with name 'a' where the real name follows the phrase 'name is'
a_names[a_names['text'].str.contains('name is', case=False, flags=re.IGNORECASE)]
Out[31]:
tweet_id text name
2287 667177989038297088 This is a Dasani Kingfisher from Maine. His name is Daryl. Daryl doesn't like being swallowed by a panda. 8/10 https://t.co/jpaeu6LNmW a
In [32]:
# Tweets with name 'a' where the text contains 'meet'
a_names[a_names['text'].str.contains('meet', case=False, flags=re.IGNORECASE)]
Out[32]:
tweet_id text name
In [33]:
# Tweets with name 'a' where the text contains 'hello to'
a_names[a_names['text'].str.contains('hello to', case=False, flags=re.IGNORECASE)]
Out[33]:
tweet_id text name
In [34]:
# Tweets with a name of 'None'
none_names = archive_df.query('name == "None"')[['tweet_id','text','name']]
In [35]:
none_names.shape
Out[35]:
(745, 3)
In [36]:
# Tweets with name 'None' where the real name follows the word 'named'
none_names[none_names['text'].str.contains('named', case=False, flags=re.IGNORECASE)]
Out[36]:
tweet_id text name
603 798628517273620480 RT @dog_rates: This a Norwegian Pewterschmidt named Tickles. Ears for days. 12/10 I care deeply for Tickles https://t.co/0aDF62KVP7 None
2166 669363888236994561 Here we have a Gingivitis Pumpernickel named Zeus. Unmatched tennis ball capacity. 10/10 would highly recommend https://t.co/jPkd7hhX7m None
2227 668268907921326080 Here we have an Azerbaijani Buttermilk named Guss. He sees a demon baby Hitler behind his owner. 10/10 stays alert https://t.co/aeZykWwiJN None
2269 667509364010450944 This a Norwegian Pewterschmidt named Tickles. Ears for days. 12/10 I care deeply for Tickles https://t.co/0aDF62KVP7 None
In [37]:
# Tweets with name 'None' where the real name follows the phrase 'name is'
none_names[none_names['text'].str.contains('name is', case=False, flags=re.IGNORECASE)]
Out[37]:
tweet_id text name
35 885518971528720385 I have a new hero and his name is Howard. 14/10 https://t.co/gzLHboL7Sk None
168 859607811541651456 Sorry for the lack of posts today. I came home from school and had to spend quality time with my puppo. Her name is Zoey and she's 13/10 https://t.co/BArWupFAn0 None
843 766714921925144576 His name is Charley and he already has a new set of wheels thanks to donations. I heard his top speed was also increased. 13/10 for Charley None
1678 682047327939461121 We normally don't rate bears but this one seems nice. Her name is Thea. Appears rather fluffy. 10/10 good bear https://t.co/fZc7MixeeT None
1734 679736210798047232 This pup's name is Sabertooth (parents must be cool). Ears for days. Jumps unannounced. 9/10 would pet diligently https://t.co/iazoiNUviP None
2267 667524857454854144 Another topnotch dog. His name is Big Jumpy Rat. Massive ass feet. Superior tail. Jumps high af. 12/10 great pup https://t.co/seESNzgsdm None
In [38]:
# Tweets with name 'None' where the text contains 'meet'
none_names[none_names['text'].str.contains('meet', case=False, flags=re.IGNORECASE)]
Out[38]:
tweet_id text name
In [39]:
# Tweets with name 'None' where the text contains 'Say hello to'
none_names[none_names['text'].str.contains('Say hello to', case=False, flags=re.IGNORECASE)]
Out[39]:
tweet_id text name
In [40]:
# Tweets with name 'None' where the text contains 'This is'
none_names[none_names['text'].str.contains('This is', case=False, flags=re.IGNORECASE)]
Out[40]:
tweet_id text name
184 856526610513747968 THIS IS CHARLIE, MARK. HE DID JUST WANT TO SAY HI AFTER ALL. PUPGRADED TO A 14/10. WOULD BE AN HONOR TO FLY WITH https://t.co/p1hBHCmWnA None
190 855857698524602368 HE'S LIKE "WAIT A MINUTE I'M AN ANIMAL THIS IS AMAZING HI HUMAN I LOVE YOU AS WELL" 13/10 https://t.co/sb73bV5Y7S None
204 852936405516943360 RT @dog_rates: I usually only share these on Friday's, but this is Blue. He's a very smoochable pooch who needs your help. 13/10\n\nhttps://t… None
243 846139713627017216 SHE DID AN ICY ZOOM AND KNEW WHEN TO PUT ON THE BRAKES 13/10 CANCEL THE GAME THIS IS ALL WE NEED https://t.co/4ctgpGcqAd None
349 831650051525054464 I usually only share these on Friday's, but this is Blue. He's a very smoochable pooch who needs your help. 13/10\n\nhttps://t.co/piiX0ke8Z6 https://t.co/1UHrKcaCiO None
498 813130366689148928 I've been informed by multiple sources that this is actually a dog elf who's tired from helping Santa all night. Pupgraded to 12/10 None
589 799308762079035393 RT @dog_rates: I WAS SENT THE ACTUAL DOG IN THE PROFILE PIC BY HIS OWNER THIS IS SO WILD. 14/10 ULTIMATE LEGEND STATUS https://t.co/7oQ1wpf… None
784 775096608509886464 RT @dog_rates: After so many requests, this is Bretagne. She was the last surviving 9/11 search dog, and our second ever 14/10. RIP https:/… None
788 774314403806253056 I WAS SENT THE ACTUAL DOG IN THE PROFILE PIC BY HIS OWNER THIS IS SO WILD. 14/10 ULTIMATE LEGEND STATUS https://t.co/7oQ1wpfxIH None
841 766864461642756096 RT @dog_rates: We only rate dogs... this is a Taiwanese Guide Walrus. Im getting real heckin tired of this. Please send dogs. 10/10 https:/… None
887 759923798737051648 We only rate dogs... this is a Taiwanese Guide Walrus. Im getting real heckin tired of this. Please send dogs. 10/10 https://t.co/49hkNAsubi None
893 759446261539934208 No no no this is all wrong. The Walmart had to have run into the dog driving the car. 10/10 someone tell him it's ok\nhttps://t.co/fRaTGcj68A None
1051 742534281772302336 For anyone who's wondering, this is what happens after a doggo catches it's tail... 11/10 https://t.co/G4fNhzelDv None
1068 740373189193256964 After so many requests, this is Bretagne. She was the last surviving 9/11 search dog, and our second ever 14/10. RIP https://t.co/XAVDNDaVgQ None
1111 733482008106668032 "Ello this is dog how may I assist" ...10/10 https://t.co/jeAENpjH7L None
1356 703425003149250560 Really guys? Again? I know this is a rare Albanian Bingo Seal, but we only rate dogs. Only send in dogs... 9/10 https://t.co/6JYLpUmBrC None
1401 699434518667751424 I know this is a tad late but here's a wonderful Valentine's Day pupper 12/10 https://t.co/hTE2PEwGvi None
1618 684969860808454144 For those who claim this is a goat, u are wrong. It is not the Greatest Of All Time. The rating of 5/10 should have made that clear. Thank u None
1669 682429480204398592 I know we joke around on here, but this is getting really frustrating. We rate dogs. Not T-Rex. Thank you... 8/10 https://t.co/5aFw7SWyxU None
1841 675878199931371520 Ok, I'll admit this is a pretty adorable bunny hopping towards the ocean but please only send in dogs... 11/10 https://t.co/sfsVCGIipI None
1842 675870721063669760 &amp; this is Yoshi. Another world record contender 11/10 (what the hell is happening why are there so many contenders?) https://t.co/QG708dDNH6 None
1890 674767892831932416 This pup was carefully tossed to make it look like she's riding that horse. I have no words this is fabulous. 12/10 https://t.co/Bob33W4sfD None
2131 670086499208155136 "Hi yes this is dog. I can't help with that s- sir please... the manager isn't in right n- well that was rude"\n10/10 https://t.co/DuQXATW27f None
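The checks above suggest that many missed names follow the word 'named' or the phrase 'name is'. A minimal sketch of how those names could be recovered with a regular expression is shown below. The sample_df, the NAME_PATTERN constant and the recover_names helper are hypothetical and are not the cleaning code of the original notebook.

```python
import pandas as pd

# Hypothetical miniature stand-in for the rows inspected above (illustration only).
sample_df = pd.DataFrame({
    'text': [
        'This is a Sizzlin Menorah spaniel from Brooklyn named Wylie. Lovable eyes. 10/10',
        'This is a Dasani Kingfisher from Maine. His name is Daryl. 8/10',
        'This is a carrot. We only rate dogs. 11/10',
    ],
    'name': ['a', 'a', 'a'],
})

# Capture one capitalised word that follows 'named' or 'name is'.
NAME_PATTERN = r'(?:named|name is)\s+([A-Z][\w-]*)'

def recover_names(df):
    """Return extracted names, falling back to the existing name when nothing matches."""
    extracted = df['text'].str.extract(NAME_PATTERN, expand=False)
    return extracted.fillna(df['name'])

recovered = recover_names(sample_df)
print(recovered.tolist())
```

The pattern captures a single word, so a multi-word name such as 'Big Jumpy Rat' would be cut to its first word, and rows that still hold 'a' or 'None' afterwards need a separate decision.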
In [41]:
image_predictions_df.shape
Out[41]:
(2075, 12)
In [42]:
image_predictions_df.head()
Out[42]:
tweet_id jpg_url img_num p1 p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog
0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg 1 Welsh_springer_spaniel 0.465074 True collie 0.156665 True Shetland_sheepdog 0.061428 True
1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg 1 redbone 0.506826 True miniature_pinscher 0.074192 True Rhodesian_ridgeback 0.072010 True
2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg 1 German_shepherd 0.596461 True malinois 0.138584 True bloodhound 0.116197 True
3 666044226329800704 https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg 1 Rhodesian_ridgeback 0.408143 True redbone 0.360687 True miniature_pinscher 0.222752 True
4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg 1 miniature_pinscher 0.560311 True Rottweiler 0.243682 True Doberman 0.154629 True
In [43]:
image_predictions_df.sample(5)
Out[43]:
tweet_id jpg_url img_num p1 p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog
1345 759159934323924993 https://pbs.twimg.com/media/CU1zsMSUAAAS0qW.jpg 1 Irish_terrier 0.254856 True briard 0.227716 True soft-coated_wheaten_terrier 0.223263 True
1041 712097430750289920 https://pbs.twimg.com/media/CeHg1klW8AE4YOB.jpg 1 Labrador_retriever 0.720481 True whippet 0.048032 True Chesapeake_Bay_retriever 0.045046 True
680 683773439333797890 https://pbs.twimg.com/media/CX1AUQ2UAAAC6s-.jpg 1 miniature_pinscher 0.072885 True Labrador_retriever 0.057866 True schipperke 0.053257 True
465 675006312288268288 https://pbs.twimg.com/media/CV4aqCwWsAIi3OP.jpg 1 boxer 0.654697 True space_heater 0.043389 False beagle 0.042848 True
1452 776813020089548800 https://pbs.twimg.com/media/CsfLUDbXEAAu0VF.jpg 1 toy_poodle 0.516610 True miniature_poodle 0.255033 True standard_poodle 0.168989 True
In [44]:
image_predictions_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   tweet_id  2075 non-null   int64  
 1   jpg_url   2075 non-null   object 
 2   img_num   2075 non-null   int64  
 3   p1        2075 non-null   object 
 4   p1_conf   2075 non-null   float64
 5   p1_dog    2075 non-null   bool   
 6   p2        2075 non-null   object 
 7   p2_conf   2075 non-null   float64
 8   p2_dog    2075 non-null   bool   
 9   p3        2075 non-null   object 
 10  p3_conf   2075 non-null   float64
 11  p3_dog    2075 non-null   bool   
dtypes: bool(3), float64(3), int64(2), object(4)
memory usage: 152.1+ KB
In [45]:
image_predictions_df.describe()
Out[45]:
tweet_id img_num p1_conf p2_conf p3_conf
count 2.075000e+03 2075.000000 2075.000000 2.075000e+03 2.075000e+03
mean 7.384514e+17 1.203855 0.594548 1.345886e-01 6.032417e-02
std 6.785203e+16 0.561875 0.271174 1.006657e-01 5.090593e-02
min 6.660209e+17 1.000000 0.044333 1.011300e-08 1.740170e-10
25% 6.764835e+17 1.000000 0.364412 5.388625e-02 1.622240e-02
50% 7.119988e+17 1.000000 0.588230 1.181810e-01 4.944380e-02
75% 7.932034e+17 1.000000 0.843855 1.955655e-01 9.180755e-02
max 8.924206e+17 4.000000 1.000000 4.880140e-01 2.734190e-01
In [46]:
tweet_df.shape
Out[46]:
(2331, 32)
In [47]:
tweet_df.head()
Out[47]:
created_at id id_str full_text truncated display_text_range entities extended_entities source in_reply_to_status_id ... favorited retweeted possibly_sensitive possibly_sensitive_appealable lang retweeted_status quoted_status_id quoted_status_id_str quoted_status_permalink quoted_status
0 Tue Aug 01 16:23:56 +0000 2017 892420643555336193 892420643555336193 This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU False [0, 85] {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 892420639486877696, 'id_str': '892420639486877696', 'indices': [86, 109], 'media_url': 'http://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg', 'url': 'https://t.co/MgUWQ76dJU', 'display_url': 'pic.twitter.com/MgUWQ76dJU', 'expanded_url': 'https://twitter.com/dog_rates/status/892420643555336193/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 540, 'h': 528, 'resize': 'fit'}, 'small': {'w': 540, 'h': 528, 'resize': 'fit'}, 'large': {'w': 540, 'h': 528, 'resize': 'fit'}}}]} {'media': [{'id': 892420639486877696, 'id_str': '892420639486877696', 'indices': [86, 109], 'media_url': 'http://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg', 'url': 'https://t.co/MgUWQ76dJU', 'display_url': 'pic.twitter.com/MgUWQ76dJU', 'expanded_url': 'https://twitter.com/dog_rates/status/892420643555336193/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 540, 'h': 528, 'resize': 'fit'}, 'small': {'w': 540, 'h': 528, 'resize': 'fit'}, 'large': {'w': 540, 'h': 528, 'resize': 'fit'}}}]} <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> NaN ... False False False False en NaN NaN NaN NaN NaN
1 Tue Aug 01 00:17:27 +0000 2017 892177421306343426 892177421306343426 This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 https://t.co/0Xxu71qeIV False [0, 138] {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 892177413194625024, 'id_str': '892177413194625024', 'indices': [139, 162], 'media_url': 'http://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg', 'url': 'https://t.co/0Xxu71qeIV', 'display_url': 'pic.twitter.com/0Xxu71qeIV', 'expanded_url': 'https://twitter.com/dog_rates/status/892177421306343426/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 1055, 'h': 1200, 'resize': 'fit'}, 'small': {'w': 598, 'h': 680, 'resize': 'fit'}, 'large': {'w': 1407, 'h': 1600, 'resize': 'fit'}}}]} {'media': [{'id': 892177413194625024, 'id_str': '892177413194625024', 'indices': [139, 162], 'media_url': 'http://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg', 'url': 'https://t.co/0Xxu71qeIV', 'display_url': 'pic.twitter.com/0Xxu71qeIV', 'expanded_url': 'https://twitter.com/dog_rates/status/892177421306343426/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 1055, 'h': 1200, 'resize': 'fit'}, 'small': {'w': 598, 'h': 680, 'resize': 'fit'}, 'large': {'w': 1407, 'h': 1600, 'resize': 'fit'}}}]} <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> NaN ... False False False False en NaN NaN NaN NaN NaN
2 Mon Jul 31 00:18:03 +0000 2017 891815181378084864 891815181378084864 This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10 https://t.co/wUnZnhtVJB False [0, 121] {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 891815175371796480, 'id_str': '891815175371796480', 'indices': [122, 145], 'media_url': 'http://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg', 'url': 'https://t.co/wUnZnhtVJB', 'display_url': 'pic.twitter.com/wUnZnhtVJB', 'expanded_url': 'https://twitter.com/dog_rates/status/891815181378084864/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 510, 'h': 680, 'resize': 'fit'}, 'medium': {'w': 901, 'h': 1200, 'resize': 'fit'}, 'large': {'w': 1201, 'h': 1600, 'resize': 'fit'}}}]} {'media': [{'id': 891815175371796480, 'id_str': '891815175371796480', 'indices': [122, 145], 'media_url': 'http://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg', 'url': 'https://t.co/wUnZnhtVJB', 'display_url': 'pic.twitter.com/wUnZnhtVJB', 'expanded_url': 'https://twitter.com/dog_rates/status/891815181378084864/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 510, 'h': 680, 'resize': 'fit'}, 'medium': {'w': 901, 'h': 1200, 'resize': 'fit'}, 'large': {'w': 1201, 'h': 1600, 'resize': 'fit'}}}]} <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> NaN ... False False False False en NaN NaN NaN NaN NaN
3 Sun Jul 30 15:58:51 +0000 2017 891689557279858688 891689557279858688 This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us https://t.co/tD36da7qLQ False [0, 79] {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 891689552724799489, 'id_str': '891689552724799489', 'indices': [80, 103], 'media_url': 'http://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg', 'url': 'https://t.co/tD36da7qLQ', 'display_url': 'pic.twitter.com/tD36da7qLQ', 'expanded_url': 'https://twitter.com/dog_rates/status/891689557279858688/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 510, 'h': 680, 'resize': 'fit'}, 'medium': {'w': 901, 'h': 1200, 'resize': 'fit'}, 'large': {'w': 1201, 'h': 1600, 'resize': 'fit'}}}]} {'media': [{'id': 891689552724799489, 'id_str': '891689552724799489', 'indices': [80, 103], 'media_url': 'http://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg', 'url': 'https://t.co/tD36da7qLQ', 'display_url': 'pic.twitter.com/tD36da7qLQ', 'expanded_url': 'https://twitter.com/dog_rates/status/891689557279858688/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 510, 'h': 680, 'resize': 'fit'}, 'medium': {'w': 901, 'h': 1200, 'resize': 'fit'}, 'large': {'w': 1201, 'h': 1600, 'resize': 'fit'}}}]} <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> NaN ... False False False False en NaN NaN NaN NaN NaN
4 Sat Jul 29 16:00:24 +0000 2017 891327558926688256 891327558926688256 This is Franklin. He would like you to stop calling him "cute." He is a very fierce shark and should be respected as such. 12/10 #BarkWeek https://t.co/AtUZn91f7f False [0, 138] {'hashtags': [{'text': 'BarkWeek', 'indices': [129, 138]}], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 891327551943041024, 'id_str': '891327551943041024', 'indices': [139, 162], 'media_url': 'http://pbs.twimg.com/media/DF6hr6AVYAAZ8G8.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DF6hr6AVYAAZ8G8.jpg', 'url': 'https://t.co/AtUZn91f7f', 'display_url': 'pic.twitter.com/AtUZn91f7f', 'expanded_url': 'https://twitter.com/dog_rates/status/891327558926688256/photo/1', 'type': 'photo', 'sizes': {'medium': {'w': 720, 'h': 540, 'resize': 'fit'}, 'large': {'w': 720, 'h': 540, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 680, 'h': 510, 'resize': 'fit'}}}]} {'media': [{'id': 891327551943041024, 'id_str': '891327551943041024', 'indices': [139, 162], 'media_url': 'http://pbs.twimg.com/media/DF6hr6AVYAAZ8G8.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DF6hr6AVYAAZ8G8.jpg', 'url': 'https://t.co/AtUZn91f7f', 'display_url': 'pic.twitter.com/AtUZn91f7f', 'expanded_url': 'https://twitter.com/dog_rates/status/891327558926688256/photo/1', 'type': 'photo', 'sizes': {'medium': {'w': 720, 'h': 540, 'resize': 'fit'}, 'large': {'w': 720, 'h': 540, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 680, 'h': 510, 'resize': 'fit'}}}, {'id': 891327551947157504, 'id_str': '891327551947157504', 'indices': [139, 162], 'media_url': 'http://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg', 'url': 'https://t.co/AtUZn91f7f', 'display_url': 'pic.twitter.com/AtUZn91f7f', 'expanded_url': 'https://twitter.com/dog_rates/status/891327558926688256/photo/1', 'type': 'photo', 'sizes': {'medium': {'w': 
720, 'h': 540, 'resize': 'fit'}, 'large': {'w': 720, 'h': 540, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 680, 'h': 510, 'resize': 'fit'}}}]} <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> NaN ... False False False False en NaN NaN NaN NaN NaN

5 rows × 32 columns

In [48]:
tweet_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2331 entries, 0 to 2330
Data columns (total 32 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   created_at                     2331 non-null   object 
 1   id                             2331 non-null   int64  
 2   id_str                         2331 non-null   object 
 3   full_text                      2331 non-null   object 
 4   truncated                      2331 non-null   bool   
 5   display_text_range             2331 non-null   object 
 6   entities                       2331 non-null   object 
 7   extended_entities              2059 non-null   object 
 8   source                         2331 non-null   object 
 9   in_reply_to_status_id          77 non-null     float64
 10  in_reply_to_status_id_str      77 non-null     object 
 11  in_reply_to_user_id            77 non-null     float64
 12  in_reply_to_user_id_str        77 non-null     object 
 13  in_reply_to_screen_name        77 non-null     object 
 14  user                           2331 non-null   object 
 15  geo                            0 non-null      object 
 16  coordinates                    0 non-null      object 
 17  place                          1 non-null      object 
 18  contributors                   0 non-null      object 
 19  is_quote_status                2331 non-null   bool   
 20  retweet_count                  2331 non-null   int64  
 21  favorite_count                 2331 non-null   int64  
 22  favorited                      2331 non-null   bool   
 23  retweeted                      2331 non-null   bool   
 24  possibly_sensitive             2196 non-null   object 
 25  possibly_sensitive_appealable  2196 non-null   object 
 26  lang                           2331 non-null   object 
 27  retweeted_status               163 non-null    object 
 28  quoted_status_id               26 non-null     float64
 29  quoted_status_id_str           26 non-null     object 
 30  quoted_status_permalink        26 non-null     object 
 31  quoted_status                  24 non-null     object 
dtypes: bool(4), float64(3), int64(3), object(22)
memory usage: 519.1+ KB
In [49]:
tweet_df.describe()
Out[49]:
id in_reply_to_status_id in_reply_to_user_id retweet_count favorite_count quoted_status_id
count 2.331000e+03 7.700000e+01 7.700000e+01 2331.000000 2331.000000 2.600000e+01
mean 7.419079e+17 7.440692e+17 2.040329e+16 2706.098241 7568.487344 8.113972e+17
std 6.823170e+16 7.524295e+16 1.260797e+17 4575.914665 11747.772050 6.295843e+16
min 6.660209e+17 6.658147e+17 1.185634e+07 1.000000 0.000000 6.721083e+17
25% 6.782670e+17 6.757073e+17 3.589728e+08 548.000000 1319.500000 7.761338e+17
50% 7.182469e+17 7.032559e+17 4.196984e+09 1269.000000 3291.000000 8.281173e+17
75% 7.986692e+17 8.233264e+17 4.196984e+09 3143.500000 9266.000000 8.637581e+17
max 8.924206e+17 8.862664e+17 8.405479e+17 77867.000000 156336.000000 8.860534e+17
In [50]:
all_columns = pd.Series(list(tweet_df) + list(archive_df) + list(image_predictions_df))
all_columns[all_columns.duplicated()]
Out[50]:
33    in_reply_to_status_id
34      in_reply_to_user_id
36                   source
49                 tweet_id
dtype: object

Issues

Quality

twitter archive table

  • Retweets
  • Tweet Updates
  • Duplicates in expanded_urls
  • Tweets without images, i.e. expanded_urls is NaN
  • Full dog rating as one string
  • Names with a value of 'a'
  • Names with a value of 'None'
  • Lowercase and Capitalized names
  • Erroneous datatypes
  • Lowercase, Capitalized, Uppercase dog_stages.
  • Floofer and floof for the same dog stage

image prediction table

  • Mixed lowercase and capitalized algorithm predictions, i.e. p1, p2, p3

Tidiness

twitter archive table

  • source column holds a full HTML link tag that combines the source name and the source URL
  • Separate columns for dog stages

Downloaded tweet data tweet_df table

  • favorite_count, retweet_count from tweet_df to be added to archive_df

Image predictions dataset

  • Separate image prediction tables to be joined to twitter archive

Clean

In [51]:
archive_clean_df = archive_df.copy()
images_clean_df = image_predictions_df.copy()
tweets_clean_df = tweet_df.copy()

# Drop unneeded columns
archive_clean_df.drop(columns=['in_reply_to_status_id','in_reply_to_user_id', 'retweeted_status_id',
                               'retweeted_status_user_id', 'retweeted_status_timestamp'], inplace=True)

Tidiness

Define

Separate source column into source_name and source_url and drop the source column

Code

In [52]:
archive_clean_df['source_url'] = archive_clean_df['source'].str.extract(r'href=[\'"]?([^\'" >]+)', expand=True, flags=re.IGNORECASE)
archive_clean_df['source_name'] = archive_clean_df['source'].str.extract(r'>(.*?)<\/a>', expand=True, flags=re.IGNORECASE)
In [53]:
archive_clean_df.drop(columns=['source'], inplace=True)

Test

In [54]:
archive_clean_df['source_url'].head()
Out[54]:
0    http://twitter.com/download/iphone
1    http://twitter.com/download/iphone
2    http://twitter.com/download/iphone
3    http://twitter.com/download/iphone
4    http://twitter.com/download/iphone
Name: source_url, dtype: object
In [55]:
archive_clean_df['source_name'].head()
Out[55]:
0    Twitter for iPhone
1    Twitter for iPhone
2    Twitter for iPhone
3    Twitter for iPhone
4    Twitter for iPhone
Name: source_name, dtype: object
In [56]:
list(archive_clean_df)
Out[56]:
['tweet_id',
 'timestamp',
 'text',
 'expanded_urls',
 'rating_numerator',
 'rating_denominator',
 'name',
 'doggo',
 'floofer',
 'pupper',
 'puppo',
 'source_url',
 'source_name']
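
The two str.extract calls above could also be collapsed into a single pass. The sketch below is one possible alternative, not the method used in this notebook: it relies on named capture groups so that pandas names the resulting columns directly. The sample frame is made up for illustration, and the pattern assumes every source value is a single anchor tag with a double-quoted href.

```python
import pandas as pd

# Hypothetical sample mimicking the archive's `source` column
sample = pd.DataFrame({
    'source': [
        '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
        '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
    ]
})

# Named groups become the column names of the frame returned by str.extract
pattern = r'href="(?P<source_url>[^"]+)"[^>]*>(?P<source_name>[^<]+)</a>'
parts = sample['source'].str.extract(pattern)

# Attach both new columns at once, then drop the original column
sample = sample.join(parts).drop(columns=['source'])
print(sample)
```

A single pattern keeps the URL and the name in step with each other, at the cost of being stricter about the tag layout than the two separate patterns.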

Separate columns for dog stages

Define

Extract the dog stage from the text column into a new dog_stage column, then drop the doggo, floofer, pupper and puppo columns. Where no stage is found in the text, the value is left missing (NaN).

Code

In [57]:
# Extract dog stage
archive_clean_df['dog_stage'] = archive_clean_df['text'].str.extract(r'(doggo|pupper|puppo|blep|floofer|floof)',
                                                                   expand=True, flags=re.IGNORECASE)

# Drop columns
archive_clean_df.drop(columns=['doggo','floofer', 'pupper', 'puppo'], inplace=True)

Test

In [58]:
list(archive_clean_df)
Out[58]:
['tweet_id',
 'timestamp',
 'text',
 'expanded_urls',
 'rating_numerator',
 'rating_denominator',
 'name',
 'source_url',
 'source_name',
 'dog_stage']
In [59]:
archive_clean_df['dog_stage'].unique()
Out[59]:
array([nan, 'doggo', 'puppo', 'floof', 'pupper', 'blep', 'Puppo', 'Blep',
       'PUPPER', 'DOGGO', 'Doggo', 'Pupper', 'floofer', 'Floof',
       'Floofer'], dtype=object)
In [60]:
archive_clean_df['dog_stage'].value_counts()
Out[60]:
pupper     262
doggo       91
puppo       37
floof       20
Floof       10
Doggo        8
Pupper       8
Floofer      5
PUPPER       5
floofer      3
Blep         2
DOGGO        1
blep         1
Puppo        1
Name: dog_stage, dtype: int64
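
The counts above show the same stage under several spellings. Below is a minimal sketch, using only the standard re module, of how one lowercase label per stage could be produced at extraction time. The helper extract_stage and the STAGE_ALIASES mapping are hypothetical names, and folding floof into floofer is an assumption based on the issue list. Python tries regex alternatives from left to right, so floofer has to be listed before floof, as it is in the pattern used in this notebook.

```python
import re

# Alternatives are tried left to right, so 'floofer' must precede 'floof'
STAGE_PATTERN = re.compile(r'(doggo|pupper|puppo|blep|floofer|floof)', re.IGNORECASE)

# Assumption: 'floof' and 'floofer' denote the same stage
STAGE_ALIASES = {'floof': 'floofer'}

def extract_stage(text):
    """Return the first dog stage found in text, lowercased, or None."""
    match = STAGE_PATTERN.search(text)
    if match is None:
        return None
    stage = match.group(1).lower()
    return STAGE_ALIASES.get(stage, stage)

print(extract_stage('Here is a PUPPER in the snow'))
print(extract_stage('Such a Floof'))
print(extract_stage('No stage mentioned here'))
```

On a dataframe, the same helper could be applied with archive_clean_df['text'].apply(extract_stage).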

favorite_count, retweet_count from tweet_df to be added to archive_df

Define

Rename the id column on the data from the twitter API to tweet_id.
Merge favorite_count and retweet_count to the archive_df, joining on tweet_id

Code

In [61]:
# Rename id to tweet_id
tweets_clean_df.rename(columns={'id':'tweet_id'}, inplace=True)
In [62]:
# Merge favorite_count and retweet_count
archive_clean_df = pd.merge(archive_clean_df, tweets_clean_df[['tweet_id','favorite_count','retweet_count']],
                           on=['tweet_id'], how='left')

Test

In [63]:
list(archive_clean_df)
Out[63]:
['tweet_id',
 'timestamp',
 'text',
 'expanded_urls',
 'rating_numerator',
 'rating_denominator',
 'name',
 'source_url',
 'source_name',
 'dog_stage',
 'favorite_count',
 'retweet_count']
In [64]:
archive_clean_df.head()
Out[64]:
tweet_id timestamp text expanded_urls rating_numerator rating_denominator name source_url source_name dog_stage favorite_count retweet_count
0 892420643555336193 2017-08-01 16:23:56 +0000 This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU https://twitter.com/dog_rates/status/892420643555336193/photo/1 13 10 Phineas http://twitter.com/download/iphone Twitter for iPhone NaN 36240.0 7712.0
1 892177421306343426 2017-08-01 00:17:27 +0000 This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 https://t.co/0Xxu71qeIV https://twitter.com/dog_rates/status/892177421306343426/photo/1 13 10 Tilly http://twitter.com/download/iphone Twitter for iPhone NaN 31261.0 5700.0
2 891815181378084864 2017-07-31 00:18:03 +0000 This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10 https://t.co/wUnZnhtVJB https://twitter.com/dog_rates/status/891815181378084864/photo/1 12 10 Archie http://twitter.com/download/iphone Twitter for iPhone NaN 23535.0 3779.0
3 891689557279858688 2017-07-30 15:58:51 +0000 This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us https://t.co/tD36da7qLQ https://twitter.com/dog_rates/status/891689557279858688/photo/1 13 10 Darla http://twitter.com/download/iphone Twitter for iPhone NaN 39538.0 7877.0
4 891327558926688256 2017-07-29 16:00:24 +0000 This is Franklin. He would like you to stop calling him "cute." He is a very fierce shark and should be respected as such. 12/10 #BarkWeek https://t.co/AtUZn91f7f https://twitter.com/dog_rates/status/891327558926688256/photo/1,https://twitter.com/dog_rates/status/891327558926688256/photo/1 12 10 Franklin http://twitter.com/download/iphone Twitter for iPhone NaN 37755.0 8484.0
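
The .0 suffix on favorite_count and retweet_count in the output above is a side effect of the left join: archive rows with no match in the API data receive NaN, which forces both columns to float. The sketch below uses made-up miniature tables to show how the unmatched rows could be counted with indicator=True, and how the nullable Int64 dtype could keep whole numbers. It is an illustration under those assumptions, not a step taken in this notebook.

```python
import pandas as pd

# Hypothetical miniature versions of the archive and API tables
archive = pd.DataFrame({'tweet_id': [1, 2, 3]})
api = pd.DataFrame({'tweet_id': [1, 3],
                    'favorite_count': [100, 300],
                    'retweet_count': [10, 30]})

# indicator=True adds a _merge column telling where each row came from
merged = pd.merge(archive, api, on='tweet_id', how='left', indicator=True)

# Rows marked 'left_only' had no counterpart in the API data
unmatched = int((merged['_merge'] == 'left_only').sum())
print(unmatched)

# The nullable Int64 dtype keeps whole numbers while still allowing missing values
merged = merged.astype({'favorite_count': 'Int64', 'retweet_count': 'Int64'})
print(merged.drop(columns='_merge'))
```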

Quality

Retweets

Define

Remove tweets that contain "RT @" in their text.

Code

In [65]:
archive_clean_df = archive_clean_df[~archive_clean_df['text'].str.contains('RT @', case=False, flags=re.IGNORECASE)]

Test

In [66]:
archive_clean_df.shape
Out[66]:
(2174, 12)
In [67]:
archive_clean_df[archive_clean_df['text'].str.contains('RT @', case=False, flags=re.IGNORECASE)]
Out[67]:
tweet_id timestamp text expanded_urls rating_numerator rating_denominator name source_url source_name dog_stage favorite_count retweet_count
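
Matching on the text is one way to find retweets. The sketch below compares it with two alternatives on a made-up frame: filtering on the retweeted_status_id column of the raw archive (dropped earlier in this notebook, and assumed here to be populated only for retweets), and anchoring the text match at the start of the tweet.

```python
import pandas as pd

# Hypothetical rows: only the second one is a retweet
raw = pd.DataFrame({
    'tweet_id': [1, 2, 3],
    'text': ['This is Bo. 12/10',
             'RT @dog_rates: This is Bo. 12/10',
             'Thanks to RT @someone 11/10'],
    'retweeted_status_id': [None, 1.0, None],
})

# Metadata-based filter: keep rows with no retweeted status
by_metadata = raw[raw['retweeted_status_id'].isnull()]

# Text-based filter anchored at the start of the tweet
by_prefix = raw[~raw['text'].str.startswith('RT @')]

# Unanchored text filter: also drops the third row, which only mentions "RT @"
by_contains = raw[~raw['text'].str.contains('RT @', regex=False)]

print(len(by_metadata), len(by_prefix), len(by_contains))
```

The unanchored match is the most aggressive of the three, which may or may not be desired depending on how such mid-text mentions should be treated.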

Tweet Updates

Define

Remove tweets that contain "PUPDATE" in their text.

Code

In [68]:
archive_clean_df = archive_clean_df[~archive_clean_df['text'].str.contains('PUPDATE', case=False, flags=re.IGNORECASE)]

Test

In [69]:
archive_clean_df.shape
Out[69]:
(2168, 12)
In [70]:
archive_clean_df[archive_clean_df['text'].str.contains('PUPDATE', case=False, flags=re.IGNORECASE)]
Out[70]:
tweet_id timestamp text expanded_urls rating_numerator rating_denominator name source_url source_name dog_stage favorite_count retweet_count

Tweets without images

Define

Remove all tweets whose expanded_urls value is null.

Code

In [71]:
# Remove tweets without images
archive_clean_df = archive_clean_df[archive_clean_df['expanded_urls'].notnull()]

Test

In [72]:
# Tweets without images
archive_clean_df[archive_clean_df['expanded_urls'].isnull()]
Out[72]:
tweet_id timestamp text expanded_urls rating_numerator rating_denominator name source_url source_name dog_stage favorite_count retweet_count
In [73]:
archive_clean_df.shape
Out[73]:
(2114, 12)
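
A non-null expanded_urls value does not by itself guarantee an attached photo; the URLs sampled in the next section include a vine.co link, for example. The sketch below shows one way such rows could be flagged, assuming that photo URLs always contain /photo/ as they do in the rows displayed in this notebook. The sample series is made up.

```python
import pandas as pd

# Hypothetical sample of expanded_urls values
urls = pd.Series([
    'https://twitter.com/dog_rates/status/670417414769758208/photo/1',
    'https://vine.co/v/iXQAm5Lrgrh',
    None,
], name='expanded_urls')

# na=False treats missing URLs as "no photo" instead of propagating NaN
has_photo = urls.str.contains('/photo/', regex=False, na=False)
print(has_photo.tolist())
```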

Duplicates in expanded_urls

Define

Remove duplicate URLs within each expanded_urls value.

Code

In [74]:
# removing duplicates from expanded_urls
archive_clean_df['expanded_urls'] = archive_clean_df['expanded_urls'].apply(lambda x: ','.join(list(set(x.split(',')))))

Test

In [75]:
archive_clean_df['expanded_urls'].sample(15)
Out[75]:
2119                                                                                           https://twitter.com/dog_rates/status/670417414769758208/photo/1
1900                                                                                           https://twitter.com/dog_rates/status/674664755118911488/photo/1
1547                                                                                           https://twitter.com/dog_rates/status/689280876073582592/photo/1
1313                                                                                                                             https://vine.co/v/iXQAm5Lrgrh
1525                                                                                           https://twitter.com/dog_rates/status/690400367696297985/photo/1
102                                                                                            https://twitter.com/dog_rates/status/872620804844003328/photo/1
1423                                                                                           https://twitter.com/dog_rates/status/697995514407682048/photo/1
1100                                                                                           https://twitter.com/dog_rates/status/735648611367784448/photo/1
756                                                                                            https://twitter.com/dog_rates/status/778650543019483137/photo/1
1535                                                                                           https://twitter.com/dog_rates/status/689977555533848577/photo/1
1814                                                                                           https://twitter.com/dog_rates/status/676617503762681856/photo/1
464     https://twitter.com/dog_rates/status/817415592588222464/photo/1,https://www.gofundme.com/help-strudel-walk-again?rcid=ec2be8b6f825461f8ee0fd5dcdf43fea
2337                                                                                           https://twitter.com/dog_rates/status/666268910803644416/photo/1
1304                                                                                           https://twitter.com/dog_rates/status/707411934438625280/photo/1
632                                                                                            https://twitter.com/dog_rates/status/793962221541933056/photo/1
Name: expanded_urls, dtype: object
In [76]:
archive_clean_df.loc[1578].expanded_urls
Out[76]:
'https://twitter.com/dog_rates/status/687317306314240000/photo/1'
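
Converting to a set removes the duplicates but does not keep the URLs in their original order. If the order matters, dict.fromkeys is one option, since dictionaries preserve insertion order in Python 3.7 and later. The helper name dedupe_urls below is hypothetical.

```python
def dedupe_urls(urls):
    """Remove repeated URLs from a comma-separated string, keeping first-seen order."""
    return ','.join(dict.fromkeys(urls.split(',')))

duplicated = ('https://twitter.com/dog_rates/status/891327558926688256/photo/1,'
              'https://twitter.com/dog_rates/status/891327558926688256/photo/1')
print(dedupe_urls(duplicated))
```

It could be applied in the same way as the lambda above, for example with archive_clean_df['expanded_urls'].apply(dedupe_urls).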

Name with value of a

Define

Use a regex pattern to extract names from the text of tweets whose name is a.
The names are preceded by "name is" or "named".

Code

In [77]:
# Extract names
a_name_is = archive_clean_df.query('name == "a"')['text'].str.extract(r'[Nn]ame is (.*?)\.', expand=True, flags=re.IGNORECASE).dropna()
In [78]:
a_name_is
Out[78]:
0
2287 Daryl
In [79]:
archive_clean_df.loc[a_name_is.index, 'name'] = a_name_is[0]
In [80]:
a_named = archive_clean_df.query('name == "a"')['text'].str.extract(r'[Nn]amed (.*?)\.', expand=True, flags=re.IGNORECASE).dropna()
In [81]:
a_named
Out[81]:
0
1853 Wylie
1955 Kip
2034 Jacob (Yacōb)
2066 Rufus
2116 Spork
2125 Cherokee
2128 Hemry
2146 Alphred
2161 Alfredo
2191 Leroi
2218 Chuk
2235 Alfonso
2249 Cheryl
2255 Jessiga
2264 Klint
2273 Kohl
2304 Pepe
2311 Octaviath
2314 Johm
In [83]:
archive_clean_df.loc[a_named.index, 'name'] = a_named[0]

# Set the name of Jacob (Yacōb) to Jacob, i.e. remove the attached pronunciation
archive_clean_df.loc[2034, 'name'] = 'Jacob'

Test

In [84]:
archive_clean_df.loc[a_name_is.index]['name']
Out[84]:
2287    Daryl
Name: name, dtype: object
In [85]:
archive_clean_df.loc[a_named.index, 'name']
Out[85]:
1853        Wylie
1955          Kip
2034        Jacob
2066        Rufus
2116        Spork
2125     Cherokee
2128        Hemry
2146      Alphred
2161      Alfredo
2191        Leroi
2218         Chuk
2235      Alfonso
2249       Cheryl
2255      Jessiga
2264        Klint
2273         Kohl
2304         Pepe
2311    Octaviath
2314         Johm
Name: name, dtype: object
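
The lazy (.*?)\. capture takes everything up to the next period, which is why Jacob (Yacōb) needed a manual fix. The sketch below shows a stricter capture that accepts only capitalized words after the trigger phrase. The helper find_name and the sample sentences are hypothetical, and the pattern assumes names are written with plain ASCII letters, so it would miss names with accents or digits.

```python
import re

# (?i:...) makes only the trigger phrase case-insensitive;
# each word of the name must start with an uppercase letter
NAME_PATTERN = re.compile(
    r'\b(?i:named|name is)\s+([A-Z][a-z]+(?: [A-Z][a-z]+)*)'
)

def find_name(text):
    """Return the capitalized word(s) after 'named' or 'name is', or None."""
    match = NAME_PATTERN.search(text)
    return match.group(1) if match else None

print(find_name('Here is a pup named Jacob (Yacōb). Loves to sit.'))
print(find_name('His name is Big Jumpy Rat. 12/10'))
print(find_name("Her name is Zoey and she's 13/10"))
print(find_name('This is a dog with no name given.'))
```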

Name with a value of None

Define

Use regex patterns to extract names from the text of tweets whose name is None.
The names are preceded by "name is", "named" or "This is".
After recovering every name we can find for these tweets, set the name of any dog still named a to None.

Code

In [86]:
none_name_is = archive_clean_df.query('name == "None"')['text'].str.extract(r'[Nn]ame is (.*?)\.', expand=True, flags=re.IGNORECASE).dropna()
none_named = archive_clean_df.query('name == "None"')['text'].str.extract(r'[Nn]amed (.*?)\.', expand=True, flags=re.IGNORECASE).dropna()
none_this_is = archive_clean_df.query('name == "None"')['text'].str.extract(r'This is (.*?)\.', expand=True, flags=re.IGNORECASE).dropna()
In [87]:
none_name_is
Out[87]:
0
35 Howard
168 Zoey and she's 13/10 https://t
1678 Thea
1734 Sabertooth (parents must be cool)
2267 Big Jumpy Rat
In [88]:
archive_clean_df.loc[none_name_is.index]
Out[88]:
tweet_id timestamp text expanded_urls rating_numerator rating_denominator name source_url source_name dog_stage favorite_count retweet_count
35 885518971528720385 2017-07-13 15:19:09 +0000 I have a new hero and his name is Howard. 14/10 https://t.co/gzLHboL7Sk https://twitter.com/4bonds2carbon/status/885517367337512960 14 10 None http://twitter.com/download/iphone Twitter for iPhone NaN 19236.0 3406.0
168 859607811541651456 2017-05-03 03:17:27 +0000 Sorry for the lack of posts today. I came home from school and had to spend quality time with my puppo. Her name is Zoey and she's 13/10 https://t.co/BArWupFAn0 https://twitter.com/dog_rates/status/859607811541651456/photo/1 13 10 None http://twitter.com/download/iphone Twitter for iPhone puppo 17938.0 1487.0
1678 682047327939461121 2015-12-30 03:55:29 +0000 We normally don't rate bears but this one seems nice. Her name is Thea. Appears rather fluffy. 10/10 good bear https://t.co/fZc7MixeeT https://twitter.com/dog_rates/status/682047327939461121/photo/1 10 10 None http://twitter.com/download/iphone Twitter for iPhone NaN 3184.0 928.0
1734 679736210798047232 2015-12-23 18:51:56 +0000 This pup's name is Sabertooth (parents must be cool). Ears for days. Jumps unannounced. 9/10 would pet diligently https://t.co/iazoiNUviP https://twitter.com/dog_rates/status/679736210798047232/photo/1 9 10 None http://twitter.com/download/iphone Twitter for iPhone NaN 2081.0 781.0
2267 667524857454854144 2015-11-20 02:08:22 +0000 Another topnotch dog. His name is Big Jumpy Rat. Massive ass feet. Superior tail. Jumps high af. 12/10 great pup https://t.co/seESNzgsdm https://twitter.com/dog_rates/status/667524857454854144/photo/1 12 10 None http://twitter.com Twitter Web Client NaN 1624.0 1064.0
In [89]:
archive_clean_df.loc[none_name_is.index, 'name'] = ['Howard','Zoey','Thea','Sabertooth', 'Big Jumpy Rat']
In [90]:
none_named
Out[90]:
0
2166 Zeus
2227 Guss
2269 Tickles
In [91]:
archive_clean_df.loc[none_named.index, 'name'] = none_named[0]
In [92]:
none_this_is
Out[92]:
0
184 CHARLIE, MARK
190 AMAZING HI HUMAN I LOVE YOU AS WELL" 13/10 https://t
243 ALL WE NEED https://t
349 Blue
788 SO WILD
887 a Taiwanese Guide Walrus
893 all wrong
1051 what happens after a doggo catches it's tail
1068 Bretagne
1111 dog how may I assist"
1356 a rare Albanian Bingo Seal, but we only rate dogs
1401 a tad late but here's a wonderful Valentine's Day pupper 12/10 https://t
1669 getting really frustrating
1841 a pretty adorable bunny hopping towards the ocean but please only send in dogs
1842 Yoshi
1890 fabulous
2131 dog
In [93]:
archive_clean_df.loc[none_this_is.index][['text','name']]
Out[93]:
text name
184 THIS IS CHARLIE, MARK. HE DID JUST WANT TO SAY HI AFTER ALL. PUPGRADED TO A 14/10. WOULD BE AN HONOR TO FLY WITH https://t.co/p1hBHCmWnA None
190 HE'S LIKE "WAIT A MINUTE I'M AN ANIMAL THIS IS AMAZING HI HUMAN I LOVE YOU AS WELL" 13/10 https://t.co/sb73bV5Y7S None
243 SHE DID AN ICY ZOOM AND KNEW WHEN TO PUT ON THE BRAKES 13/10 CANCEL THE GAME THIS IS ALL WE NEED https://t.co/4ctgpGcqAd None
349 I usually only share these on Friday's, but this is Blue. He's a very smoochable pooch who needs your help. 13/10\n\nhttps://t.co/piiX0ke8Z6 https://t.co/1UHrKcaCiO None
788 I WAS SENT THE ACTUAL DOG IN THE PROFILE PIC BY HIS OWNER THIS IS SO WILD. 14/10 ULTIMATE LEGEND STATUS https://t.co/7oQ1wpfxIH None
887 We only rate dogs... this is a Taiwanese Guide Walrus. Im getting real heckin tired of this. Please send dogs. 10/10 https://t.co/49hkNAsubi None
893 No no no this is all wrong. The Walmart had to have run into the dog driving the car. 10/10 someone tell him it's ok\nhttps://t.co/fRaTGcj68A None
1051 For anyone who's wondering, this is what happens after a doggo catches it's tail... 11/10 https://t.co/G4fNhzelDv None
1068 After so many requests, this is Bretagne. She was the last surviving 9/11 search dog, and our second ever 14/10. RIP https://t.co/XAVDNDaVgQ None
1111 "Ello this is dog how may I assist" ...10/10 https://t.co/jeAENpjH7L None
1356 Really guys? Again? I know this is a rare Albanian Bingo Seal, but we only rate dogs. Only send in dogs... 9/10 https://t.co/6JYLpUmBrC None
1401 I know this is a tad late but here's a wonderful Valentine's Day pupper 12/10 https://t.co/hTE2PEwGvi None
1669 I know we joke around on here, but this is getting really frustrating. We rate dogs. Not T-Rex. Thank you... 8/10 https://t.co/5aFw7SWyxU None
1841 Ok, I'll admit this is a pretty adorable bunny hopping towards the ocean but please only send in dogs... 11/10 https://t.co/sfsVCGIipI None
1842 &amp; this is Yoshi. Another world record contender 11/10 (what the hell is happening why are there so many contenders?) https://t.co/QG708dDNH6 None
1890 This pup was carefully tossed to make it look like she's riding that horse. I have no words this is fabulous. 12/10 https://t.co/Bob33W4sfD None
2131 "Hi yes this is dog. I can't help with that s- sir please... the manager isn't in right n- well that was rude"\n10/10 https://t.co/DuQXATW27f None
In [94]:
# Assign the dog names confirmed by reading the tweet text
archive_clean_df.loc[[184,349,1068,1842], 'name'] = ['Charlie','Blue', 'Bretagne','Yoshi']
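The four names above were picked by reading each tweet. As a rough sketch of how the same check could be automated, the pattern below only accepts a capitalized word after "this is". The sample texts are shortened from the output above, and the regular expression is an assumption for illustration, not the pattern used earlier in this notebook.

```python
import pandas as pd

# Shortened sample texts taken from the tweets listed above
texts = pd.Series([
    "After so many requests, this is Bretagne. She was the last surviving 9/11 search dog",
    "We only rate dogs... this is a Taiwanese Guide Walrus.",
    "No no no this is all wrong.",
    "&amp; this is Yoshi. Another world record contender 11/10",
])

# Hypothetical pattern: a name must start with an uppercase letter,
# so phrases such as "this is a ..." or "this is all ..." do not match
candidate_names = texts.str.extract(r'this is ([A-Z][a-z]+)', expand=False)
print(candidate_names)
```

Rows without a match come back as missing values, so they can still be reviewed by hand.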
In [95]:
archive_clean_df.query('name == "None"').shape
Out[95]:
(607, 12)
In [96]:
# Dogs still named 'a' have no recoverable name in the text, so set their name to 'None'
dog_a = archive_clean_df.query('name == "a"')
archive_clean_df.loc[dog_a.index,'name'] = 'None'

Test

In [97]:
archive_clean_df.loc[none_name_is.index, 'name']
Out[97]:
35             Howard
168              Zoey
1678             Thea
1734       Sabertooth
2267    Big Jumpy Rat
Name: name, dtype: object
In [98]:
archive_clean_df.loc[none_named.index, 'name']
Out[98]:
2166       Zeus
2227       Guss
2269    Tickles
Name: name, dtype: object
In [99]:
archive_clean_df.loc[[184,349,1068,1842], 'name']
Out[99]:
184      Charlie
349         Blue
1068    Bretagne
1842       Yoshi
Name: name, dtype: object
In [100]:
archive_clean_df['name'].value_counts()
Out[100]:
None          642
Charlie        12
Lucy           11
Oliver         10
Cooper         10
             ... 
officially      1
Marty           1
by              1
Carll           1
Jay             1
Name: name, Length: 980, dtype: int64
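The value counts above still contain lowercase words such as 'officially' and 'by', which are unlikely to be real names. One possible way to flag them all at once, sketched here on toy data, is to treat any name starting with a lowercase letter as missing. The Series below is a stand-in for `archive_clean_df['name']`, and the lowercase rule is an assumption.

```python
import pandas as pd

# Toy stand-in for archive_clean_df['name']; values are illustrative only
names = pd.Series(['Charlie', 'officially', 'by', 'None', 'Lucy', 'a'])

# Assumption: a lowercase first character means the word is not a dog name
starts_lowercase = names.str.match(r'^[a-z]')

# Replace the flagged entries with the 'None' placeholder used in this notebook
cleaned_names = names.mask(starts_lowercase, 'None')
print(cleaned_names.tolist())
```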

Lowercase and Capitalized name

Define

Capitalize all the names

Code

In [101]:
archive_clean_df.name = archive_clean_df.name.str.capitalize()

Test

In [102]:
archive_clean_df['name'].value_counts()
Out[102]:
None       642
Charlie     12
Lucy        11
Cooper      10
Oliver      10
          ... 
Mya          1
Marty        1
Carll        1
Burt         1
Jay          1
Name: name, Length: 980, dtype: int64
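One caveat with `str.capitalize()`: it uppercases the first character and lowercases the rest, so a multi-word name such as 'Big Jumpy Rat' loses its inner capitals. The sketch below compares it with a variant that only touches the first character. The toy Series is illustrative.

```python
import pandas as pd

# Toy names; 'Big Jumpy Rat' is one of the names assigned earlier
names = pd.Series(['zoey', 'Big Jumpy Rat', 'None'])

# str.capitalize() lowercases everything after the first character
capitalized = names.str.capitalize()

# Alternative: uppercase only the first character and keep the rest unchanged
first_letter_upper = names.str[:1].str.upper() + names.str[1:]

print(capitalized.tolist())
print(first_letter_upper.tolist())
```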

Full Dog Rating

Define

Create a rating column of the form rating_numerator/rating_denominator (e.g. 12/10), built from the two rating columns and stored as a string.

Code

In [103]:
archive_clean_df['rating'] = archive_clean_df[['rating_numerator', 'rating_denominator']].astype(str).apply('/'.join, axis=1)

Test

In [104]:
archive_clean_df.head()
Out[104]:
tweet_id timestamp text expanded_urls rating_numerator rating_denominator name source_url source_name dog_stage favorite_count retweet_count rating
0 892420643555336193 2017-08-01 16:23:56 +0000 This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU https://twitter.com/dog_rates/status/892420643555336193/photo/1 13 10 Phineas http://twitter.com/download/iphone Twitter for iPhone NaN 36240.0 7712.0 13/10
1 892177421306343426 2017-08-01 00:17:27 +0000 This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 https://t.co/0Xxu71qeIV https://twitter.com/dog_rates/status/892177421306343426/photo/1 13 10 Tilly http://twitter.com/download/iphone Twitter for iPhone NaN 31261.0 5700.0 13/10
2 891815181378084864 2017-07-31 00:18:03 +0000 This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10 https://t.co/wUnZnhtVJB https://twitter.com/dog_rates/status/891815181378084864/photo/1 12 10 Archie http://twitter.com/download/iphone Twitter for iPhone NaN 23535.0 3779.0 12/10
3 891689557279858688 2017-07-30 15:58:51 +0000 This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us https://t.co/tD36da7qLQ https://twitter.com/dog_rates/status/891689557279858688/photo/1 13 10 Darla http://twitter.com/download/iphone Twitter for iPhone NaN 39538.0 7877.0 13/10
4 891327558926688256 2017-07-29 16:00:24 +0000 This is Franklin. He would like you to stop calling him "cute." He is a very fierce shark and should be respected as such. 12/10 #BarkWeek https://t.co/AtUZn91f7f https://twitter.com/dog_rates/status/891327558926688256/photo/1 12 10 Franklin http://twitter.com/download/iphone Twitter for iPhone NaN 37755.0 8484.0 12/10
In [105]:
type(archive_clean_df['rating'][0])
Out[105]:
str
In [106]:
archive_clean_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2114 entries, 0 to 2355
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   tweet_id            2114 non-null   int64  
 1   timestamp           2114 non-null   object 
 2   text                2114 non-null   object 
 3   expanded_urls       2114 non-null   object 
 4   rating_numerator    2114 non-null   int64  
 5   rating_denominator  2114 non-null   int64  
 6   name                2114 non-null   object 
 7   source_url          2114 non-null   object 
 8   source_name         2114 non-null   object 
 9   dog_stage           407 non-null    object 
 10  favorite_count      2107 non-null   float64
 11  retweet_count       2107 non-null   float64
 12  rating              2114 non-null   object 
dtypes: float64(2), int64(3), object(8)
memory usage: 311.2+ KB
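As an alternative to the row-wise `apply('/'.join, axis=1)` used above, the same string can be built with vectorized string concatenation. This is a minimal sketch on a toy frame that reuses the notebook's column names.

```python
import pandas as pd

# Toy frame with the same column names as archive_clean_df
ratings_df = pd.DataFrame({
    'rating_numerator': [13, 12, 1776],
    'rating_denominator': [10, 10, 10],
})

# Convert each column to str once, then concatenate element-wise
ratings_df['rating'] = (
    ratings_df['rating_numerator'].astype(str)
    + '/'
    + ratings_df['rating_denominator'].astype(str)
)
print(ratings_df['rating'].tolist())
```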

Lowercase, Capitalized, Uppercase dog_stages.

Define

Convert all dog_stage to lowercase

Code

In [107]:
archive_clean_df.dog_stage = archive_clean_df.dog_stage.str.lower()

Test

In [108]:
archive_clean_df['dog_stage'].value_counts()
Out[108]:
pupper     251
doggo       86
puppo       30
floof       29
floofer      8
blep         3
Name: dog_stage, dtype: int64

Floofer and floof for the same dog stage

Define

Set dog_stage of floofer to floof

Code

In [109]:
# Get all dogs belonging to the floofer stage
clean_floofer_stage = archive_clean_df.query('dog_stage == "floofer"')

# Set the stage of the floofer dogs to floof
archive_clean_df.loc[clean_floofer_stage.index, 'dog_stage'] = 'floof'

Test

In [110]:
archive_clean_df.loc[clean_floofer_stage.index, 'dog_stage']
Out[110]:
582     floof
774     floof
984     floof
1022    floof
1091    floof
1110    floof
1534    floof
1614    floof
Name: dog_stage, dtype: object
In [111]:
archive_clean_df.dog_stage.value_counts()
Out[111]:
pupper    251
doggo      86
floof      37
puppo      30
blep        3
Name: dog_stage, dtype: int64
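The lowercase step and the floofer-to-floof step could also be combined into one chained expression with `Series.replace`. The sketch below uses a toy Series in place of `archive_clean_df['dog_stage']`.

```python
import pandas as pd

# Toy stand-in for archive_clean_df['dog_stage'], including a missing value
stages = pd.Series(['Pupper', 'floofer', None, 'FLOOF', 'doggo'])

# Lowercase first, then map the variant spelling onto the canonical one;
# missing values pass through both steps unchanged
normalized_stages = stages.str.lower().replace({'floofer': 'floof'})
print(normalized_stages.value_counts())
```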
In [112]:
archive_clean_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2114 entries, 0 to 2355
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   tweet_id            2114 non-null   int64  
 1   timestamp           2114 non-null   object 
 2   text                2114 non-null   object 
 3   expanded_urls       2114 non-null   object 
 4   rating_numerator    2114 non-null   int64  
 5   rating_denominator  2114 non-null   int64  
 6   name                2114 non-null   object 
 7   source_url          2114 non-null   object 
 8   source_name         2114 non-null   object 
 9   dog_stage           407 non-null    object 
 10  favorite_count      2107 non-null   float64
 11  retweet_count       2107 non-null   float64
 12  rating              2114 non-null   object 
dtypes: float64(2), int64(3), object(8)
memory usage: 311.2+ KB
In [113]:
archive_clean_df.head()
Out[113]:
tweet_id timestamp text expanded_urls rating_numerator rating_denominator name source_url source_name dog_stage favorite_count retweet_count rating
0 892420643555336193 2017-08-01 16:23:56 +0000 This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU https://twitter.com/dog_rates/status/892420643555336193/photo/1 13 10 Phineas http://twitter.com/download/iphone Twitter for iPhone NaN 36240.0 7712.0 13/10
1 892177421306343426 2017-08-01 00:17:27 +0000 This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 https://t.co/0Xxu71qeIV https://twitter.com/dog_rates/status/892177421306343426/photo/1 13 10 Tilly http://twitter.com/download/iphone Twitter for iPhone NaN 31261.0 5700.0 13/10
2 891815181378084864 2017-07-31 00:18:03 +0000 This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10 https://t.co/wUnZnhtVJB https://twitter.com/dog_rates/status/891815181378084864/photo/1 12 10 Archie http://twitter.com/download/iphone Twitter for iPhone NaN 23535.0 3779.0 12/10
3 891689557279858688 2017-07-30 15:58:51 +0000 This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us https://t.co/tD36da7qLQ https://twitter.com/dog_rates/status/891689557279858688/photo/1 13 10 Darla http://twitter.com/download/iphone Twitter for iPhone NaN 39538.0 7877.0 13/10
4 891327558926688256 2017-07-29 16:00:24 +0000 This is Franklin. He would like you to stop calling him "cute." He is a very fierce shark and should be respected as such. 12/10 #BarkWeek https://t.co/AtUZn91f7f https://twitter.com/dog_rates/status/891327558926688256/photo/1 12 10 Franklin http://twitter.com/download/iphone Twitter for iPhone NaN 37755.0 8484.0 12/10

Erroneous Datatypes

Define

  • favorite_count and retweet_count to int
  • timestamp to datetime
  • dog_stage to category

Code

In [114]:
# To datetime
archive_clean_df['timestamp'] = pd.to_datetime(archive_clean_df['timestamp'])

# To category
# astype('category') keeps NaN as a missing value, so no placeholder is needed
archive_clean_df['dog_stage'] = archive_clean_df['dog_stage'].astype('category')

# To int
# int64 cannot hold NaN, so missing counts are set to -1 before converting.
# Assigning the result back avoids calling inplace=True on a column accessed
# through the DataFrame, which may not update the DataFrame in recent pandas versions.
archive_clean_df['favorite_count'] = archive_clean_df['favorite_count'].fillna(-1).astype('int64')
archive_clean_df['retweet_count'] = archive_clean_df['retweet_count'].fillna(-1).astype('int64')
In [115]:
archive_clean_df.dog_stage.value_counts()
Out[115]:
pupper    251
doggo      86
floof      37
puppo      30
blep        3
Name: dog_stage, dtype: int64

Test

In [116]:
archive_clean_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2114 entries, 0 to 2355
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype              
---  ------              --------------  -----              
 0   tweet_id            2114 non-null   int64              
 1   timestamp           2114 non-null   datetime64[ns, UTC]
 2   text                2114 non-null   object             
 3   expanded_urls       2114 non-null   object             
 4   rating_numerator    2114 non-null   int64              
 5   rating_denominator  2114 non-null   int64              
 6   name                2114 non-null   object             
 7   source_url          2114 non-null   object             
 8   source_name         2114 non-null   object             
 9   dog_stage           407 non-null    category           
 10  favorite_count      2114 non-null   int64              
 11  retweet_count       2114 non-null   int64              
 12  rating              2114 non-null   object             
dtypes: category(1), datetime64[ns, UTC](1), int64(5), object(6)
memory usage: 297.0+ KB
In [117]:
archive_clean_df.head()
Out[117]:
tweet_id timestamp text expanded_urls rating_numerator rating_denominator name source_url source_name dog_stage favorite_count retweet_count rating
0 892420643555336193 2017-08-01 16:23:56+00:00 This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU https://twitter.com/dog_rates/status/892420643555336193/photo/1 13 10 Phineas http://twitter.com/download/iphone Twitter for iPhone NaN 36240 7712 13/10
1 892177421306343426 2017-08-01 00:17:27+00:00 This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 https://t.co/0Xxu71qeIV https://twitter.com/dog_rates/status/892177421306343426/photo/1 13 10 Tilly http://twitter.com/download/iphone Twitter for iPhone NaN 31261 5700 13/10
2 891815181378084864 2017-07-31 00:18:03+00:00 This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10 https://t.co/wUnZnhtVJB https://twitter.com/dog_rates/status/891815181378084864/photo/1 12 10 Archie http://twitter.com/download/iphone Twitter for iPhone NaN 23535 3779 12/10
3 891689557279858688 2017-07-30 15:58:51+00:00 This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us https://t.co/tD36da7qLQ https://twitter.com/dog_rates/status/891689557279858688/photo/1 13 10 Darla http://twitter.com/download/iphone Twitter for iPhone NaN 39538 7877 13/10
4 891327558926688256 2017-07-29 16:00:24+00:00 This is Franklin. He would like you to stop calling him "cute." He is a very fierce shark and should be respected as such. 12/10 #BarkWeek https://t.co/AtUZn91f7f https://twitter.com/dog_rates/status/891327558926688256/photo/1 12 10 Franklin http://twitter.com/download/iphone Twitter for iPhone NaN 37755 8484 12/10
In [118]:
archive_clean_df.describe()
Out[118]:
tweet_id rating_numerator rating_denominator favorite_count retweet_count
count 2.114000e+03 2114.000000 2114.000000 2114.000000 2114.000000
mean 7.362301e+17 12.250710 10.501892 8290.580416 2504.957900
std 6.704477e+16 40.302977 7.110862 12097.238556 4399.717247
min 6.660209e+17 0.000000 2.000000 -1.000000 -1.000000
25% 6.766148e+17 10.000000 10.000000 1824.500000 554.000000
50% 7.092162e+17 11.000000 10.000000 3786.000000 1214.000000
75% 7.868996e+17 12.000000 10.000000 10310.500000 2856.000000
max 8.924206e+17 1776.000000 170.000000 156336.000000 77867.000000
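The minimum of -1 in the summary above comes from the placeholder used for the tweets without counts. As a sketch of an alternative, pandas' nullable `Int64` dtype stores integers while keeping missing values missing, so summary statistics are not affected. The toy Series stands in for `favorite_count`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for favorite_count with one missing value
counts = pd.Series([36240.0, np.nan, 7712.0])

# Approach used above: -1 placeholder, plain int64
with_placeholder = counts.fillna(-1).astype('int64')

# Alternative: nullable integer dtype (capital 'I'), missing value stays <NA>
nullable_counts = counts.astype('Int64')

print(with_placeholder.min())   # the placeholder becomes the minimum
print(nullable_counts.min())    # missing values are skipped
```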

Lowercase and Capitalized algorithm predictions (i.e. p1, p2, p3)

Define

Convert all the prediction names to lowercase

Code

In [119]:
images_clean_df.p1 = images_clean_df.p1.str.lower()
images_clean_df.p2 = images_clean_df.p2.str.lower()
images_clean_df.p3 = images_clean_df.p3.str.lower()

Test

In [120]:
images_clean_df.head()
Out[120]:
tweet_id jpg_url img_num p1 p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog
0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg 1 welsh_springer_spaniel 0.465074 True collie 0.156665 True shetland_sheepdog 0.061428 True
1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg 1 redbone 0.506826 True miniature_pinscher 0.074192 True rhodesian_ridgeback 0.072010 True
2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg 1 german_shepherd 0.596461 True malinois 0.138584 True bloodhound 0.116197 True
3 666044226329800704 https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg 1 rhodesian_ridgeback 0.408143 True redbone 0.360687 True miniature_pinscher 0.222752 True
4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg 1 miniature_pinscher 0.560311 True rottweiler 0.243682 True doberman 0.154629 True

Separate image prediction tables to be joined to twitter archive

Define

  • For each image, take the first prediction in the order p1, p2, p3 (highest confidence first) that is a dog breed. If none of the three is a dog, leave the breed missing.

Code

In [121]:
breed_predictions = []

for i in range(images_clean_df.shape[0]):
    if images_clean_df['p1_dog'][i]:
        breed_predictions.append(images_clean_df['p1'][i])
    elif images_clean_df['p2_dog'][i]:
        breed_predictions.append(images_clean_df['p2'][i])
    elif images_clean_df['p3_dog'][i]:
        breed_predictions.append(images_clean_df['p3'][i])
    else:
        breed_predictions.append(np.nan)
In [122]:
images_clean_df['predicted_breed'] = breed_predictions
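The loop above relies on the DataFrame having a default 0..n-1 index. A vectorized sketch of the same selection rule, which does not depend on the index labels, uses `Series.where` and `combine_first`. The toy frame mirrors the prediction columns, and its values are made up for illustration.

```python
import pandas as pd

# Toy frame with the same prediction columns as images_clean_df
images = pd.DataFrame({
    'p1': ['chihuahua', 'bagel', 'orange'],
    'p1_dog': [True, False, False],
    'p2': ['pug', 'beagle', 'banana'],
    'p2_dog': [True, True, False],
    'p3': ['collie', 'pug', 'paper_towel'],
    'p3_dog': [True, True, False],
})

# where() keeps a prediction only when it is flagged as a dog;
# combine_first() fills the remaining gaps from the next prediction,
# which preserves the p1 > p2 > p3 priority
images['predicted_breed'] = (
    images['p1'].where(images['p1_dog'])
    .combine_first(images['p2'].where(images['p2_dog']))
    .combine_first(images['p3'].where(images['p3_dog']))
)
print(images[['p1', 'p2', 'p3', 'predicted_breed']])
```

Rows where no prediction is a dog end up with a missing breed, matching the `np.nan` branch of the loop.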
In [123]:
# Join image_predictions to twitter archive
combined_clean = pd.merge(archive_clean_df, images_clean_df[['tweet_id','jpg_url','predicted_breed']],
                           on=['tweet_id'], how='left')

Test

In [124]:
combined_clean.head()
Out[124]:
tweet_id timestamp text expanded_urls rating_numerator rating_denominator name source_url source_name dog_stage favorite_count retweet_count rating jpg_url predicted_breed
0 892420643555336193 2017-08-01 16:23:56+00:00 This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU https://twitter.com/dog_rates/status/892420643555336193/photo/1 13 10 Phineas http://twitter.com/download/iphone Twitter for iPhone NaN 36240 7712 13/10 https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg NaN
1 892177421306343426 2017-08-01 00:17:27+00:00 This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 https://t.co/0Xxu71qeIV https://twitter.com/dog_rates/status/892177421306343426/photo/1 13 10 Tilly http://twitter.com/download/iphone Twitter for iPhone NaN 31261 5700 13/10 https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg chihuahua
2 891815181378084864 2017-07-31 00:18:03+00:00 This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10 https://t.co/wUnZnhtVJB https://twitter.com/dog_rates/status/891815181378084864/photo/1 12 10 Archie http://twitter.com/download/iphone Twitter for iPhone NaN 23535 3779 12/10 https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg chihuahua
3 891689557279858688 2017-07-30 15:58:51+00:00 This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us https://t.co/tD36da7qLQ https://twitter.com/dog_rates/status/891689557279858688/photo/1 13 10 Darla http://twitter.com/download/iphone Twitter for iPhone NaN 39538 7877 13/10 https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg labrador_retriever
4 891327558926688256 2017-07-29 16:00:24+00:00 This is Franklin. He would like you to stop calling him "cute." He is a very fierce shark and should be respected as such. 12/10 #BarkWeek https://t.co/AtUZn91f7f https://twitter.com/dog_rates/status/891327558926688256/photo/1 12 10 Franklin http://twitter.com/download/iphone Twitter for iPhone NaN 37755 8484 12/10 https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg basset
In [125]:
combined_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2114 entries, 0 to 2113
Data columns (total 15 columns):
 #   Column              Non-Null Count  Dtype              
---  ------              --------------  -----              
 0   tweet_id            2114 non-null   int64              
 1   timestamp           2114 non-null   datetime64[ns, UTC]
 2   text                2114 non-null   object             
 3   expanded_urls       2114 non-null   object             
 4   rating_numerator    2114 non-null   int64              
 5   rating_denominator  2114 non-null   int64              
 6   name                2114 non-null   object             
 7   source_url          2114 non-null   object             
 8   source_name         2114 non-null   object             
 9   dog_stage           407 non-null    category           
 10  favorite_count      2114 non-null   int64              
 11  retweet_count       2114 non-null   int64              
 12  rating              2114 non-null   object             
 13  jpg_url             1991 non-null   object             
 14  predicted_breed     1684 non-null   object             
dtypes: category(1), datetime64[ns, UTC](1), int64(5), object(8)
memory usage: 250.0+ KB
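In the summary above, jpg_url is non-null for 1991 of 2114 rows, so 123 archive tweets have no matching image prediction after the left join. A sketch of how such rows could be listed, assuming one prediction row per tweet, uses the `indicator` and `validate` options of `pd.merge`. The two toy frames and the example.com URLs are placeholders.

```python
import pandas as pd

# Toy stand-ins for archive_clean_df and images_clean_df
archive = pd.DataFrame({'tweet_id': [1, 2, 3], 'rating': ['13/10', '12/10', '11/10']})
images = pd.DataFrame({
    'tweet_id': [1, 3],
    'jpg_url': ['https://example.com/a.jpg', 'https://example.com/c.jpg'],
})

# validate raises an error if tweet_id is duplicated on either side;
# indicator adds a '_merge' column recording where each row came from
merged = pd.merge(archive, images, on='tweet_id', how='left',
                  validate='one_to_one', indicator=True)

# Rows present only in the archive, i.e. tweets without an image prediction
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched['tweet_id'].tolist())
```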

Visualization and Insights

We will answer the following questions:

  • Which dog stage is the most common?
  • Which ratings are the most common?
  • Which predicted breeds are the most common?
  • Which tweet has the most retweets?
  • Which tweet has the most favorites?
  • Which source do most tweets come from?
In [126]:
combined_clean['dog_stage'].value_counts()
Out[126]:
pupper    251
doggo      86
floof      37
puppo      30
blep        3
Name: dog_stage, dtype: int64
In [127]:
# The Stage Most of the Dogs Belong to
stage_count = combined_clean['dog_stage'].value_counts()
plt.subplots(figsize=(25,15))
plt.bar(stage_count.index, stage_count)
plt.xlabel('Dog Stage', fontsize=30)
plt.ylabel('Number of Tweets/Dogs', fontsize=30)
plt.xticks(fontsize=30)
plt.yticks(fontsize=30)
plt.title('Number of Tweets/Dogs Per The Stage of the Dogs', fontsize=30);
In [128]:
# The Most popular ratings
combined_clean['rating'].value_counts()
Out[128]:
12/10      489
10/10      436
11/10      417
13/10      294
9/10       153
8/10        98
7/10        51
14/10       39
5/10        34
6/10        32
3/10        19
4/10        15
2/10         9
1/10         4
143/130      1
165/150      1
99/90        1
1776/10      1
60/50        1
1/2          1
420/10       1
80/80        1
45/50        1
121/110      1
88/80        1
0/10         1
204/170      1
44/40        1
7/11         1
144/120      1
4/20         1
50/50        1
27/10        1
9/11         1
26/10        1
84/70        1
75/10        1
24/7         1
Name: rating, dtype: int64
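Raw counts can be easier to interpret next to their share of all tweets. In the output above, 12/10 accounts for 489 of 2114 tweets, roughly 23%. The sketch below shows how counts and shares could be put side by side with `value_counts(normalize=True)`. The toy ratings are made up.

```python
import pandas as pd

# Toy stand-in for combined_clean['rating']
ratings = pd.Series(['12/10', '12/10', '10/10', '11/10',
                     '12/10', '10/10', '13/10', '1776/10'])

rating_counts = ratings.value_counts()
rating_shares = ratings.value_counts(normalize=True)

# Align both Series on the rating label
rating_summary = pd.DataFrame({'count': rating_counts, 'share': rating_shares.round(3)})
print(rating_summary)
```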
In [129]:
top_ratings = combined_clean['rating'].value_counts()[:14]
plt.subplots(figsize=(25,15))
plt.bar(top_ratings.index, top_ratings)
plt.xlabel('Dog Rating', fontsize=30)
plt.ylabel('Number of Tweets or Dogs', fontsize=30)
plt.xticks(fontsize=25)
plt.yticks(fontsize=20)
plt.title('Number of Tweets or Dogs Per Rating', fontsize=30);
In [130]:
# The Most Popular breed from the Predictions
combined_clean['predicted_breed'].value_counts()
Out[130]:
golden_retriever      158
labrador_retriever    108
pembroke               95
chihuahua              91
pug                    62
                     ... 
entlebucher             1
irish_wolfhound         1
standard_schnauzer      1
scotch_terrier          1
clumber                 1
Name: predicted_breed, Length: 113, dtype: int64
In [131]:
top_breeds = combined_clean['predicted_breed'].value_counts()[:10]
plt.subplots(figsize=(25,15))
plt.bar(top_breeds.index, top_breeds)
plt.xlabel('Dog Breed', fontsize=30)
plt.ylabel('Number of Tweets or Dogs', fontsize=30)
plt.xticks(fontsize=25, rotation=45)
plt.yticks(fontsize=20)
plt.title('Number of Tweets or Dogs Per Breed', fontsize=30);
In [132]:
# The most favorited tweet
combined_clean.loc[combined_clean['favorite_count'].idxmax()]
Out[132]:
tweet_id                                                                                                               744234799360020481
timestamp                                                                                                       2016-06-18 18:26:18+00:00
text                  Here's a doggo realizing you can stand in a pool. 13/10 enlightened af (vid by Tina Conrad) https://t.co/7wE9LTEXC4
expanded_urls                                                             https://twitter.com/dog_rates/status/744234799360020481/video/1
rating_numerator                                                                                                                       13
rating_denominator                                                                                                                     10
name                                                                                                                                 None
source_url                                                                                             http://twitter.com/download/iphone
source_name                                                                                                            Twitter for iPhone
dog_stage                                                                                                                           doggo
favorite_count                                                                                                                     156336
retweet_count                                                                                                                       77867
rating                                                                                                                              13/10
jpg_url                                           https://pbs.twimg.com/ext_tw_video_thumb/744234667679821824/pu/img/1GaWmtJtdqzZV7jy.jpg
predicted_breed                                                                                                        labrador_retriever
Name: 826, dtype: object
In [133]:
# The most retweeted tweet
combined_clean.loc[combined_clean['retweet_count'].idxmax()]
Out[133]:
tweet_id                                                                                                               744234799360020481
timestamp                                                                                                       2016-06-18 18:26:18+00:00
text                  Here's a doggo realizing you can stand in a pool. 13/10 enlightened af (vid by Tina Conrad) https://t.co/7wE9LTEXC4
expanded_urls                                                             https://twitter.com/dog_rates/status/744234799360020481/video/1
rating_numerator                                                                                                                       13
rating_denominator                                                                                                                     10
name                                                                                                                                 None
source_url                                                                                             http://twitter.com/download/iphone
source_name                                                                                                            Twitter for iPhone
dog_stage                                                                                                                           doggo
favorite_count                                                                                                                     156336
retweet_count                                                                                                                       77867
rating                                                                                                                              13/10
jpg_url                                           https://pbs.twimg.com/ext_tw_video_thumb/744234667679821824/pu/img/1GaWmtJtdqzZV7jy.jpg
predicted_breed                                                                                                        labrador_retriever
Name: 826, dtype: object
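Both lookups above return row 826. Rather than comparing the two printed outputs by eye, the index labels returned by `idxmax()` can be compared directly. This is a small sketch on toy data with made-up counts.

```python
import pandas as pd

# Toy stand-in for combined_clean; index labels are arbitrary on purpose
tweets = pd.DataFrame(
    {'tweet_id': [101, 102, 103],
     'favorite_count': [500, 9000, 1200],
     'retweet_count': [90, 4000, 300]},
    index=[10, 20, 30],
)

most_favorited = tweets['favorite_count'].idxmax()
most_retweeted = tweets['retweet_count'].idxmax()

# True when one tweet tops both rankings
same_tweet = most_favorited == most_retweeted
print(most_favorited, most_retweeted, same_tweet)
```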
In [134]:
# source of tweet
combined_clean['source_name'].value_counts()
Out[134]:
Twitter for iPhone     1982
Vine - Make a Scene      91
Twitter Web Client       30
TweetDeck                11
Name: source_name, dtype: int64
In [135]:
tweet_source = combined_clean['source_name'].value_counts()[:10]
plt.subplots(figsize=(25,15))
plt.bar(tweet_source.index, tweet_source)
plt.xlabel('Tweet Source', fontsize=30)
plt.ylabel('Number of Tweets or Dogs', fontsize=30)
plt.xticks(fontsize=25)
plt.yticks(fontsize=20)
plt.title('Number of Tweets or Dogs By Source Name', fontsize=30);
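The four bar charts in this section repeat the same plotting calls. One way to reduce the repetition is a small helper, sketched below. The function name `plot_counts` and its parameters are hypothetical, and the toy counts stand in for a `value_counts()` result.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_counts(counts, xlabel, ylabel, title, rotation=0):
    """Draw a bar chart from a value_counts()-style Series and return the Axes."""
    fig, ax = plt.subplots(figsize=(25, 15))
    # Convert labels to str so categorical indexes plot the same way
    ax.bar(counts.index.astype(str), counts.values)
    ax.set_xlabel(xlabel, fontsize=30)
    ax.set_ylabel(ylabel, fontsize=30)
    ax.tick_params(axis='x', labelsize=25, labelrotation=rotation)
    ax.tick_params(axis='y', labelsize=20)
    ax.set_title(title, fontsize=30)
    return ax


# Toy counts standing in for combined_clean['source_name'].value_counts()
toy_counts = pd.Series({'Twitter for iPhone': 5, 'Vine - Make a Scene': 2, 'TweetDeck': 1})
ax = plot_counts(toy_counts, 'Tweet Source', 'Number of Tweets',
                 'Number of Tweets By Source Name')
```

The breed chart could then pass `rotation=45` instead of repeating the tick settings.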
In [136]:
# Save combined_clean to 'twitter_archive_master.csv'
combined_clean.to_csv('twitter_archive_master.csv', index=False)

Conclusions

  • Among the 407 tweets with a recorded dog stage, pupper was the most common (251) and blep the least common (3). None of the tweets indicated that a dog was at the snoot stage.
  • The most common rating was 12/10 (489 tweets), followed by 10/10 (436) and 11/10 (417).
  • The most frequently predicted breed was the Golden Retriever (158 tweets). The next four, in descending order, were Labrador Retriever, Pembroke, Chihuahua and Pug.
  • Interestingly, the same tweet was both the most retweeted (77,867 retweets) and the most favorited (156,336 favorites).
    No name could be found for the dog. It is a doggo, predicted to be a Labrador Retriever, rated 13/10 and tweeted from 'Twitter for iPhone'. Link to the tweet: https://twitter.com/dog_rates/status/744234799360020481/video/1
  • Most of the tweets came from Twitter for iPhone (1,982) and the fewest came from TweetDeck (11).