How to Collect Data from TikTok

How to extract videos posted or liked by users, snowball a large user list from seed accounts, and gather trending videos, all with a simple API.

TikTok has “boomed like a rocket in the U.S. as well as across the world, creating a new mobile video experience that has left Facebook and YouTube scrambling to keep up.”

Counting only U.S. users aged 18 and over, TikTok grew from 22.2 million visitors in January to 39.2 million visitors in April, according to Comscore data given to Adweek.

The platform has become more than just a cool new app: as recent events have shown, its algorithm spreads videos that can have real-world consequences. Here are some prominent examples:

  • K-pop fans used TikTok to prank Trump’s rally in Tulsa, reserving tickets and then not showing up
  • Teenagers on TikTok organized “shopping cart abandonment” on Trump’s merchandise website in an effort to hide inventory from others
  • Some users are encouraging Trump’s opponents to click on Trump ads in order to drive up the campaign’s advertising costs
  • The app has spread biased videos of Joe Biden

In brief, TikTok and its powerful algorithm have substantial real-world influence, particularly considering that a typical user spends nearly an hour every day watching videos on the platform. With that in mind, it is worth understanding what TikTok shows to millions of eyeballs every day, and to do that we need some data.

Below is code showing how to gather TikTok data in several ways. We have tried to keep it general and useful for most use cases, but you will probably need to tweak it depending on what you are doing. The remainder of the post covers how to do the following:

  • Collect videos posted by users
  • Collect videos liked by users
  • Snowball a list of users
  • Collect trending videos

(If you plan to make more than a few dozen requests, we suggest setting up a proxy. We haven’t tested it yet, but the API we demo here should integrate with proxies fairly easily. Also, if you want to track popularity over time, you will need to add timestamps to the statistics.)
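
As a rough illustration of the timestamp idea, the sketch below attaches the collection time to a record of statistics, so that repeated snapshots of the same video can be compared later. The helper name with_timestamp and the field name collected_at are our own assumptions, not part of the API output.

```python
from datetime import datetime, timezone


def with_timestamp(stats, now=None):
    """Return a copy of a stats dict with the collection time attached."""
    # `collected_at` is a hypothetical field name, not part of the API output.
    now = now or datetime.now(timezone.utc)
    stamped = dict(stats)  # copy so the original record is left untouched
    stamped['collected_at'] = now.isoformat()
    return stamped


# Made-up like and play counts for one video, for illustration only.
snapshot = with_timestamp({'video_id': '123', 'n_likes': 10, 'n_plays': 200})
print(snapshot['collected_at'])
```

Saving one such snapshot per collection run would let you chart how a video's likes or plays change between runs.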

1. Collecting Videos Posted by Users

A good place to begin is collecting videos from a given user. We will use the TikTokApi package (run pip3 install TikTokApi to get it).

To collect videos from the Washington Post’s TikTok account, you just need the following:


from TikTokApi import TikTokApi
api = TikTokApi()
n_videos = 100  # how many of the account's videos to request
username = 'washingtonpost'
# Returns a list with one dictionary per video
user_videos = api.byUsername(username, count=n_videos)

The user_videos object is now a list of 100 video dictionaries. You will probably be interested in only a few stats, which you can extract from the full dictionaries with the following function:


def simple_dict(tiktok_dict):
  to_return = {}
  to_return['user_name'] = tiktok_dict['author']['uniqueId']
  to_return['user_id'] = tiktok_dict['author']['id']
  to_return['video_id'] = tiktok_dict['id']
  to_return['video_desc'] = tiktok_dict['desc']
  to_return['video_time'] = tiktok_dict['createTime']
  to_return['video_length'] = tiktok_dict['video']['duration']
  to_return['video_link'] = 'https://www.tiktok.com/@{}/video/{}?lang=en'.format(to_return['user_name'], to_return['video_id'])
  to_return['n_likes'] = tiktok_dict['stats']['diggCount']
  to_return['n_shares'] = tiktok_dict['stats']['shareCount']
  to_return['n_comments'] = tiktok_dict['stats']['commentCount']
  to_return['n_plays'] = tiktok_dict['stats']['playCount']
  return to_return
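
The video_time value kept by simple_dict comes straight from the createTime field. Assuming that field holds a Unix timestamp in seconds (which is how it looked to us, but treat it as an assumption), a small helper like the hypothetical readable_time below turns it into a human-readable UTC date:

```python
from datetime import datetime, timezone


def readable_time(video_time):
    """Convert a Unix timestamp in seconds to an ISO 8601 string in UTC."""
    # Assumption: createTime is seconds since the epoch. int() also accepts
    # the value if it arrives as a numeric string.
    return datetime.fromtimestamp(int(video_time), tz=timezone.utc).isoformat()


print(readable_time(1593648000))  # 2020-07-02T00:00:00+00:00
```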

After that, we can go from the API-output user_videos list to a nice, clean table (i.e. a pandas DataFrame) in just three lines, plus the pandas import:


import pandas as pd

user_videos = [simple_dict(v) for v in user_videos]
user_videos_df = pd.DataFrame(user_videos)
user_videos_df.to_csv('{}_videos.csv'.format(username), index=False)

This is what the output file looks like (we have removed a few columns and rows to make it readable here):


video_length,n_likes,n_shares,n_comments,n_plays,video_desc,video_link
8,788,7,19,3317,Prove it #wearyourfacemask,https://www.tiktok.com/@washingtonpost/video/6844911748072492294?lang=en
14,2233,10,35,11300,Can’t believe it worked #PillowSwitch,https://www.tiktok.com/@washingtonpost/video/6844603462106369286?lang=en
12,4413,68,50,22200,The floor is the New York Times #FloorIsLava,https://www.tiktok.com/@washingtonpost/video/6844547929185717510?lang=en
12,5286,45,48,23600,I’m gonna win this year’s Hunger Games.,https://www.tiktok.com/@washingtonpost/video/6844217371339558149?lang=en
53,2149,29,28,10900,"Yes, “comic book reporter” is a real job #Marvel #CaptainAmerica #Falcon",https://www.tiktok.com/@washingtonpost/video/6844193684599164166?lang=en
53,6096,76,55,26400,Journalism on the go 🛴,https://www.tiktok.com/@washingtonpost/video/6843928291900984582?lang=en
13,3015,36,57,15800,Make sure to cover your nose and mouth! #facemasktutorial,https://www.tiktok.com/@washingtonpost/video/6843842040556522758?lang=en
8,10400,90,100,52500,"The U.S. set another single-day record for new #coronavirus cases on Thursday. State health departments across the country reported 39,327 new cases.",https://www.tiktok.com/@washingtonpost/video/6842741220419144965?lang=en
13,6721,122,89,32900,We’re really happy #PhotoStory,https://www.tiktok.com/@washingtonpost/video/6842682949004053765?lang=en
17,6608,45,87,27100,She’s so talented #DogsOfTikTok,https://www.tiktok.com/@washingtonpost/video/6842391490317094149?lang=en

2. Collecting Videos Liked by Users

Next, you might be interested in the videos “liked” by a given user. This is pretty straightforward to gather. Let’s see which videos the official TikTok account has liked recently:


username = 'tiktok'
n_videos = 10
liked_videos = api.userLikedbyUsername(username, count=n_videos)
liked_videos = [simple_dict(v) for v in liked_videos]
liked_videos_df = pd.DataFrame(liked_videos)
liked_videos_df.to_csv('{}_liked_videos.csv'.format(username), index=False)

The output file looks similar to the previous one, since it also saves a list of videos:


user_name,video_desc,video_length,video_link,n_likes,n_shares,n_comments,n_plays
patrickhanson17,#trickshot #quarantine #boredinthehouse #fyp #pingpong,35,https://www.tiktok.com/@patrickhanson17/video/6835637364501531909?lang=en,494700,4096,1388,10900000
watercolorartist0,10000 followers a day #watercolor #art #handdrawing #paint,22,https://www.tiktok.com/@watercolorartist0/video/6840954256544042246?lang=en,491500,4067,4560,3100000
shaunt,TRY THIS 10 times and you won’t need to workout today! 🏃🏽🏃🏽🏃🏽🏃🏽🏃🏽#workout #monday #workoutchallenge,10,https://www.tiktok.com/@shaunt/video/6820528865018989829?lang=en,231900,2701,603,7100000
jeremyscheck,enjoy!! #spicyrigatoni #vodkasauce #pennevodka #pastavodka #carbone #pasta #italian,58,https://www.tiktok.com/@jeremyscheck/video/6830166716597816581?lang=en,866300,66700,4025,4800000
mrkevanthony,,10,https://www.tiktok.com/@mrkevanthony/video/6832533673712061702?lang=en,231400,23400,4734,1700000
flexyalex,What just happend ?!  #rainonme #bath #ReplyToComments,13,https://www.tiktok.com/@flexyalex/video/6833198289949674758?lang=en,20400,163,398,290900
jordanrabjohn,reposting cos so many people asked us to put this on Spotify/Apple Music.. so we did 🥺 #duet #foryou #singing #acousticcovers,48,https://www.tiktok.com/@jordanrabjohn/video/6829748747459644678?lang=en,229800,5542,5151,1700000
makayladid,Lmao. Me. Literally 5 mins ago. #fyp #foryoupage #foryou,15,https://www.tiktok.com/@makayladid/video/6767878435327986950?lang=en,1500000,56800,6711,6200000
officialreesetiktok,,11,https://www.tiktok.com/@officialreesetiktok/video/6738164581270588678?lang=en,374200,4655,2535,3700000
brittany_broski,Me trying Kombucha for the first time #foryoupage #foryou #fyp #AllBrandNew,21,https://www.tiktok.com/@brittany_broski/video/6722234609188310277?lang=en,2300000,390600,12500,13600000

3. Snowballing a User List

Say that you want to build a larger list of users from which to collect posted and liked videos. You could use the 50 most-followed TikTok accounts, but 50 accounts might not generate a broad enough sample.

An alternative approach is to use suggested users to snowball a user list from a single user. First, we will do this for four different accounts:

  • tiktok is the app’s official account
  • washingtonpost is one of our favorite accounts
  • charlidamelio is the most-followed account on TikTok
  • chunkysdead leads a self-proclaimed “cult” on the app

This is the code we used:


seed_users = ['tiktok', 'washingtonpost', 'charlidamelio', 'chunkysdead']
seed_ids = [api.getUser(user_name)['userInfo']['user']['id'] for user_name in seed_users]
suggested = [api.getSuggestedUsersbyID(count=20, startingId=s_id) for s_id in seed_ids]

Here are the suggested users for each seed:


Seed: tiktok,Seed: washingtonpost,Seed: charlidamelio,Seed: chunkysdead
@rachaelrayofficial (Rachael Ray),@ifrc (We are humanitarians),@ifrc (We are humanitarians),@ifrc (We are humanitarians)
@tyrabanks (Tyra Banks),@unmigration (UN Migration),@unmigration (UN Migration),@unmigration (UN Migration)
@shawnmendes (Shawn),@justinbieber (Justin Bieber),@justinbieber (Justin Bieber),@justinbieber (Justin Bieber)
@halsey (Halsey),@rachaelrayofficial (Rachael Ray),@rachaelrayofficial (Rachael Ray),@rachaelrayofficial (Rachael Ray)
@dualipaofficial (Dua Lipa),@theweeknd (The Weeknd),@camilacabello (Camila Cabello),@theweeknd (The Weeknd)
@maroon5 (Maroon 5),@mariahcarey (Mariah Carey),@roddyricch (Roddy Ricch),@mariahcarey (Mariah Carey)
@lizzo (lizzo),@tyrabanks (Tyra Banks),@olivier_rousteing (OR),@tyrabanks (Tyra Banks)
@dojacat (Doja Cat),@shawnmendes (Shawn),@mileycyrus (Miley Cyrus),@shawnmendes (Shawn)
@ozuna (OZUNA),@halsey (Halsey),@samsmith (Sam Smith),@halsey (Halsey)
@diplo (Diplo),@dualipaofficial (Dua Lipa),@brunomars (Bruno Mars),@dualipaofficial (Dua Lipa)
@iamtrevordaniel (Trevor Daniel),@maroon5 (Maroon 5),@katyperry (Katy Perry),@maroon5 (Ma

Notably, the lists of recommendations for washingtonpost and chunkysdead were identical, and there is a lot of overlap among the other recommendations, so this approach might not give you what you need.
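
One way to put a number on that overlap is the Jaccard similarity between each pair of suggestion lists (shared accounts divided by all accounts in either list). The sketch below is our own illustration rather than part of the original workflow. It uses only the first five suggestions per seed, copied from the table above, and the names suggested_by_seed and jaccard are hypothetical.

```python
from itertools import combinations

# First five suggestions per seed, copied from the table above.
suggested_by_seed = {
    'tiktok': ['rachaelrayofficial', 'tyrabanks', 'shawnmendes',
               'halsey', 'dualipaofficial'],
    'washingtonpost': ['ifrc', 'unmigration', 'justinbieber',
                       'rachaelrayofficial', 'theweeknd'],
    'charlidamelio': ['ifrc', 'unmigration', 'justinbieber',
                      'rachaelrayofficial', 'camilacabello'],
    'chunkysdead': ['ifrc', 'unmigration', 'justinbieber',
                    'rachaelrayofficial', 'theweeknd'],
}


def jaccard(a, b):
    """Share of accounts that two suggestion lists have in common."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Similarity for every pair of seeds: 1.0 means identical suggestion lists.
overlap = {
    (s1, s2): jaccard(suggested_by_seed[s1], suggested_by_seed[s2])
    for s1, s2 in combinations(suggested_by_seed, 2)
}
for pair, score in overlap.items():
    print(pair, round(score, 2))
```

A score close to 1.0 for most pairs would suggest that adding more seed accounts contributes few new users.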

Another way to build a larger user list is to use getSuggestedUsersbyIDCrawler to keep the snowball rolling. To build a list of 100 suggested accounts using tiktok as the seed account, you only need the following code:


tiktok_id = api.getUser('tiktok')['userInfo']['user']['id']
suggested_100 = api.getSuggestedUsersbyIDCrawler(count=100, startingId=tiktok_id)

This produces a list that includes various celebrity accounts. Here are a few:


@lizzo (lizzo, 8900000 fans)
@wizkhalifa (Wiz Khalifa, 1800000 fans)
@capuchina114 (Capuchina❗️👸🏼, 32600 fans)
@silviastephaniev (Silvia Stephanie💓, 27600 fans)
@theweeknd (The Weeknd, 1400000 fans)
@theawesometalents (Music videos, 33400 fans)
...

From what we have observed, the getSuggestedUsersbyIDCrawler method begins to branch out and find smaller, more niche accounts that have tens of thousands of followers rather than hundreds of thousands or millions. This is good news if you want a representative dataset.

If you want to collect a broad sample of data from TikTok, we advise you to start with the suggested users crawler.
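
To make the snowball idea concrete, here is a minimal breadth-first sketch. It is not how getSuggestedUsersbyIDCrawler is implemented (we have not checked its internals); it only illustrates the general technique. The function name snowball, the get_suggested callback, and the toy graph with made-up account names are all assumptions that stand in for real API calls.

```python
from collections import deque


def snowball(seed, get_suggested, max_users=100):
    """Breadth-first crawl of suggested users, starting from one seed."""
    seen = {seed}           # users already discovered, including the seed
    queue = deque([seed])   # users whose suggestions we still need to fetch
    collected = []
    while queue and len(collected) < max_users:
        current = queue.popleft()
        for user in get_suggested(current):
            if user in seen:
                continue    # skip repeats so overlapping lists add nothing
            seen.add(user)
            collected.append(user)
            queue.append(user)
            if len(collected) >= max_users:
                break
    return collected


# Toy suggestion graph standing in for real API calls (made-up names).
toy_graph = {'seed': ['a', 'b'], 'a': ['b', 'c'], 'b': ['seed', 'd'],
             'c': [], 'd': ['e']}
print(snowball('seed', lambda u: toy_graph.get(u, []), max_users=4))
```

In practice get_suggested would wrap a call such as getSuggestedUsersbyID, and the seen set is what keeps overlapping recommendations from inflating the list.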

4. Collecting Trending Videos

Finally, maybe you just want to collect trending videos for a simple content analysis, or just to keep up 🙂. The API makes this very simple:


n_trending = 20
trending_videos = api.trending(count=n_trending)
trending_videos = [simple_dict(v) for v in trending_videos]
trending_videos_df = pd.DataFrame(trending_videos)
trending_videos_df.to_csv('trending.csv',index=False)

This is the output file of trending videos from the afternoon of Thursday, July 2nd, 2020:


user_name,video_link,n_likes,video_desc,n_shares,n_comments,n_plays
ikiaideed,https://www.tiktok.com/@ikiaideed/video/6823654596888513798?lang=en,829500,#fyp,95300,18100,14900000
scottyhubs,https://www.tiktok.com/@scottyhubs/video/6831567819159686406?lang=en,1800000,Guard dog on duty! Enter at your own risk.   #fy #fyp #tiktokpets #gaurddog,134100,12200,9800000
asher_thrasher44,https://www.tiktok.com/@asher_thrasher44/video/6832087152667512069?lang=en,392100,She hates me😂 #texaslonghorns #fyp #comedy #foryoupage,9414,5205,6000000
devinphysique,https://www.tiktok.com/@devinphysique/video/6830171152644574469?lang=en,62200,HOW UGLY WAS IT FROM 1-10? 😏 #meetmypet #walkingonadream #fyp #tiktok #comedy #foryou #viral #trending #tattoo,94,130,2600000
marianas261,https://www.tiktok.com/@marianas261/video/6830154708947078406?lang=en,148300,no quería pero lo reté 🤣😂@chinito87 #viral #parati #parat #chustoso #chistosas #viralchallenge #viralvide #duosparahacer #paratupagina,324,292,1800000
betch,https://www.tiktok.com/@betch/video/6827237609387904262?lang=en,2200000,If 2020 was a person (ig: @lourdasprec),106400,31200,9500000
eemilylovee21,https://www.tiktok.com/@eemilylovee21/video/6832081894713167109?lang=en,141200,#fyp A cup,1981,370,1600000
timmel916,https://www.tiktok.com/@timmel916/video/6823062586582453510?lang=en,462000,Fish be like ‼️ W/ @staxxkashii #ScoobDance #jumpman #ChipotleSponsorMe #may4th #mycrib #retailtherapy #tiktok #foryou #foryoupage #fyp,48400,5424,6400000
chrisbones19,https://www.tiktok.com/@chrisbones19/video/6825350742677720326?lang=en,9100,#happyandyouknowit #quarantine #quarantinelife #fyp,253,60,405700
ambarcristalmoneg,https://www.tiktok.com/@ambarcristalmoneg/video/6822281735653362949?lang=en,13900,🙋‍♀️🤭😁😁,33,109,515700
sanlypsoubannarat,https://www.tiktok.com/@sanlypsoubannarat/video/6822772580315892998?lan
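
As one example of a simple content analysis, the sketch below counts hashtags across video descriptions. It is only an illustration: count_hashtags is a hypothetical helper, and the four descriptions are copied from the trending output above rather than read from the CSV file.

```python
import re
from collections import Counter

# A few video_desc values copied from the trending output above.
descriptions = [
    '#fyp',
    'Guard dog on duty! Enter at your own risk.   #fy #fyp #tiktokpets #gaurddog',
    'She hates me😂 #texaslonghorns #fyp #comedy #foryoupage',
    '#fyp A cup',
]


def count_hashtags(texts):
    """Count how often each hashtag appears, ignoring case."""
    counts = Counter()
    for text in texts:
        # \w+ captures the letters, digits and underscores after each '#'
        counts.update(tag.lower() for tag in re.findall(r'#(\w+)', text))
    return counts


print(count_hashtags(descriptions).most_common(3))
```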

That’s all for now, thanks a lot for reading! For more details, you can contact X-Byte Enterprise Crawling or ask for a free quote!