How Web Scraping Is Used In Apple Music Streaming Data Analysis

In this article, we will analyze my personal music streaming statistics from Apple Music, the music and video streaming service from Apple Inc. The dataset used here is my own listening history on the platform.

This article covers the following topics:

  • Data requests and downloads
  • Cleaning and preparing data
  • Analyzing data and gaining interesting insights from it

Requesting and Downloading Data

Apple will provide a copy of your personal data if you ask for it. Follow these steps:

  • Go to apple.com/privacy.
  • Sign in to your account.
  • Click the option to request a copy of your data.
  • Make sure Apple Media Services Information is checked, then click Continue at the bottom.
  • Keep the default maximum file size and click Finish Request.

Data Preparation and Cleaning

  • Import the required libraries.
  • Load the dataset (CSV file).
  • Examine the dataframe's shape and columns.
  • Check for missing values.
  • Review each column's basic statistics.
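
The inspection steps in the list above can be sketched with standard pandas calls. The small frame below is a made-up stand-in for the real export (only the column names come from the article; the values are illustrative), so treat it as a sketch rather than the author's exact code. With the real data, the same calls would run on `music_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real export; values are illustrative only.
sample_df = pd.DataFrame({
    "Artist Name": ["Bazzi", "Raftaar", None],
    "Play Duration Milliseconds": [3312.0, 131214.0, 0.0],
    "Original Title": [np.nan, np.nan, np.nan],
})

# Shape: (number of rows, number of columns)
n_rows, n_cols = sample_df.shape

# Number of missing values in each column
missing_per_column = sample_df.isnull().sum()

# Basic statistics (count, mean, min, max, ...) for the numeric columns
stats = sample_df.describe()

print(n_rows, n_cols)
print(missing_per_column)
print(stats)
```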

Importing Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

pd.set_option('display.max_columns', None)
                        
Load the Dataset
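music_df = pd.read_csv('Apple Music Play Activity.csv')  # file name is an assumption; point this at the play-activity CSV in your export
music_df.head()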
Apple Id Number Apple Music Subscription Artist Name Build Version Client IP Address Content Name Content Provider Content Specific Type Device Identifier End Position In Milliseconds End Reason Type Event End Timestamp Event Reason Hint Type Event Received Timestamp Event Start Timestamp Event Type Feature Name Genre Item Type Media Duration In Milliseconds Media Type Metrics Bucket Id Metrics Client Id Milliseconds Since Play Offline Original Title Play Duration Milliseconds Provided Audio Bit Depth Provided Audio Channel Provided Audio Sample Rate Provided Bit Rate Provided Codec Provided Playback Format Source Type Start Position In Milliseconds Store Country Name Targeted Audio Bit Depth Targeted Audio Channel Targeted Audio Sample Rate Targeted Bit Rate Targeted Codec Targeted Playback Format User’s Audio Quality User’s Playback Format UTC Offset In Seconds
0 11569060994 True Bazzi Music/3.1 iOS/13.0 model/iPhone9,3 hwp/t8010 b... 106.66.247.0 Paradise The Warner Music Group Song f625ff5caca143772ec5bb7962ef7f5f9267f36a 3312.0 MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM 2019-06-28T16:46:33.382Z NOT_SPECIFIED 2019-06-28T17:21:01.944Z 2019-06-28T16:46:30.070Z PLAY_END library / downloaded_music / songs Pop ITUNES_STORE_CONTENT 169087.0 AUDIO 7044.0 3z44Gmyhz4lXz4xCzBqJzr9hlFsSr 2068562 True NaN 3312.0 NaN NaN NaN NaN NaN NaN ORIGINATING_DEVICE 0 India NaN NaN NaN NaN NaN NaN NaN NaN 19800
1 11569060994 True Raftaar Music/1.0 macOS/10.15 build/19A582a model/MacB... 117.206.166.3 Aage Chal Hungama Digital Media Entertainment Pvt. Song 1C36BB164346 131214.0 MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM 2020-06-18T15:29:01.975Z NOT_SPECIFIED 2020-06-21T04:13:37.975Z 2020-06-18T15:26:50.761Z PLAY_END library Indian Pop ITUNES_STORE_CONTENT 229800.0 AUDIO 3331.0 3z4uZCHNzEVrz4vkzAsJzGurr2k8E 218676000 False NaN 131214.0 NaN NaN NaN NaN NaN NaN ORIGINATING_DEVICE 0 India NaN NaN NaN NaN NaN NaN NaN NaN 19800
2 11569060994 True Master Rakesh, Dr Zeus Music/3.1 iOS/13.0 model/iPhone9,3 hwp/t8010 b... 117.228.170.82 Kangna (feat. Deepti & Shortie) The Orchard Enterprises Inc. Song f625ff5caca143772ec5bb7962ef7f5f9267f36a 0.0 MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM 2019-08-27T13:22:00.883Z NOT_SPECIFIED 2019-08-27T13:22:05.551Z 2019-08-27T13:22:00.883Z PLAY_END library / album_detail Asia ITUNES_STORE_CONTENT 209118.0 AUDIO 7044.0 3z44Gmyhz4lXz4xCzBqJzr9hlFsSr 4668 False NaN 0.0 NaN NaN NaN NaN NaN NaN ORIGINATING_DEVICE 0 India NaN NaN NaN NaN NaN NaN NaN NaN 19800
3 11569060994 True The Weeknd Music/3.1 iOS/11.3 model/iPhone9,3 hwp/t8010 b... 106.77.1.155 Can't Feel My Face Universal Music International Song f625ff5caca143772ec5bb7962ef7f5f9267f36a 31027.0 SCRUB_BEGIN 2018-04-03T18:15:01.395Z NOT_SPECIFIED 2018-04-03T18:15:14.359Z 2018-04-03T18:15:00.362Z PLAY_END library / downloaded_music / songs R&B/Soul ITUNES_STORE_CONTENT 213577.0 AUDIO 4877.0 3z4pGutFz1mxz4yYz9qazYSewotZt 12964 False NaN 1033.0 NaN NaN NaN NaN NaN NaN ORIGINATING_DEVICE 29994 India NaN NaN NaN NaN NaN NaN NaN NaN 19800
4 11569060994 True Nucleya, DIVINE Music/3.1 iOS/11.2 model/iPhone9,3 hwp/t8010 b... 42.106.57.192 Paintra (From "Mukkabaaz") Eros International USA Inc Song f625ff5caca143772ec5bb7962ef7f5f9267f36a 119958.0 MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM 2017-12-20T12:42:47.599Z NOT_SPECIFIED 2017-12-20T12:42:47.771Z 2017-12-20T12:41:25.722Z PLAY_END library / album_detail Bollywood ITUNES_STORE_CONTENT 232222.0 AUDIO 4877.0 3z4pGutFz1mxz4yYz9qazYSewotZt 172 False NaN 81877.0 NaN NaN NaN NaN NaN NaN ORIGINATING_DEVICE 38081 India NaN NaN NaN NaN NaN NaN NaN NaN 19800

The dataset contains 21,147 streaming records with 45 columns in total. Before we can draw insights from it, we need to remove the columns we don't need. Several columns contain nothing but null values, so we drop those first.

nans = [col for col in music_df.columns if music_df[col].isnull().all()]
print(nans)
['Original Title',
'Provided Audio Bit Depth',
'Provided Audio Channel',
'Provided Audio Sample Rate',
'Provided Bit Rate',
'Provided Codec',
'Provided Playback Format',
'Targeted Audio Bit Depth',
'Targeted Audio Channel',
'Targeted Audio Sample Rate',
'Targeted Bit Rate',
'Targeted Codec',
'Targeted Playback Format',
'User’s Audio Quality',
'User’s Playback Format']
# drop the above columns from the dataframe
music_df.drop(nans, axis=1, inplace=True)

There are a few columns like “Apple Id Number” and “Build Version” that aren’t really useful, so we’ll remove those as well.

to_delete = ['Apple Id Number', 'Build Version', 'Client IP Address', 'Device Identifier', 'Metrics Bucket Id', 'Metrics Client Id', 'UTC Offset In Seconds', 'Store Country Name']
music_df.drop(to_delete, axis=1, inplace=True)

From the original 45 columns, we now have 22 columns in our dataframe. The final step is to convert the timestamp columns, which are stored as strings (object dtype), to proper datetime values.

# The timestamps are ISO 8601 strings with fractional seconds and a trailing Z (UTC),
# e.g. 2019-06-28T16:46:33.382Z, so let pandas parse them as UTC instead of forcing a seconds-only format
for col in ['Event End Timestamp', 'Event Received Timestamp', 'Event Start Timestamp']:
    music_df[col] = pd.to_datetime(music_df[col], utc=True)
Questions and Answers
1. Who are the Top 10 Favorite Artists?
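
The plotting code for several of the questions below uses `top_10_artist`, `top_20_songs`, `top_10_labels` and `top_genre`, which the article never defines. One plausible way to build them, assuming they are simple play counts per value (the same `value_counts()` pattern the article's own helper function uses), is sketched here on made-up rows. `top_n` is a hypothetical helper name, not something from the original code.

```python
import pandas as pd

def top_n(df, column, n):
    """Return the n most frequent values in `column` with their counts."""
    return df[column].value_counts().head(n)

# Made-up rows for illustration; only the column names come from the article.
sample_df = pd.DataFrame({
    "Artist Name": ["Bazzi", "The Weeknd", "Bazzi", "Raftaar", "Bazzi", "The Weeknd"],
    "Content Name": ["Paradise", "Can't Feel My Face", "Paradise",
                     "Aage Chal", "Paradise", "Can't Feel My Face"],
})

top_artists = top_n(sample_df, "Artist Name", 2)
print(top_artists)
```

With the real data this would be, for example, `top_10_artist = top_n(music_df, 'Artist Name', 10)`. The columns 'Content Name', 'Content Provider' and 'Genre' would play the same role for `top_20_songs`, `top_10_labels` and `top_genre`.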
fig = px.bar(top_10_artist, title="Top 10 favourite artists", labels={"index":"Artists", 'value':"No. of times song played"}, color_discrete_sequence=px.colors.qualitative.Set2)
fig.show()
2. Which are the Top 20 Songs Played? (Favorite Songs)
fig = px.bar(top_20_songs, title="Top 20 favourite songs", labels={"index":"Songs", 'value':"No. of times song played"}, color_discrete_sequence=px.colors.qualitative.Bold)
fig.update_xaxes(tickangle=22)
fig.show()
3. Who are the Top 10 Favorite Content providers?
fig = px.bar(top_10_labels, title="Top 10 favourite labels", labels={"index":"Music Labels", 'value':"No. of times song label played"}, color_discrete_sequence=px.colors.qualitative.Pastel)
fig.update_xaxes(tickangle=25)
fig.show()

To check top tracks from a specific music label provider, we will create a little helper function.

def top_10_song_of_label(label):
    """
    Show the ten most-played songs from a particular label.
    """
    # keep only this label's rows, then count plays per song
    label_df = music_df[music_df['Content Provider'] == label]
    top_10_song = label_df['Content Name'].value_counts()[:10]
    print(top_10_song)
    fig = px.bar(top_10_song, labels={"index": "Song Names", "value": "No. of times song played", "variable":"Song name"}, title=f"Top songs from {label}")
    fig.show()
For example, here are the top Warner Music Group songs:
top_10_song_of_label('The Warner Music Group')
Hola (feat. Maluma)                                     82
I Don't Care                                            69
Thinking Out Loud                                       63
Attention                                               62
Perfect                                                 60
1, 2, 3 (feat. Jason Derulo & De La Ghetto)             59
Dirty Sexy Money (feat. Charli XCX & French Montana)    52
Hymn for the Weekend                                    51
Crown                                                   50
10,000 Hours                                            48
Name: Content Name, dtype: int64

Top Songs from T-Series

top_10_song_of_label('Super Cassettes Industries Pvt Limited a.k.a. T-Series')

Ishq Tera              66
Chota Sa Fasana        60
Maahi Ve               59
High Rated Gabru       50
Tu Chale               45
Tera Yaar Hoon Main    45
Befikra                41
Zindagi Do Pal Ki      40
Duniyaa                40
Chalte Chalte          40
Name: Content Name, dtype: int64

4. Which are the Top 10 Songs According to Playtime?
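
The chart for this question plots `top_longest_played`, which is not defined in the article. Judging by the axis label "Play Time (in mins)", it is presumably the total play time per song in minutes. A minimal sketch under that assumption, on made-up rows:

```python
import pandas as pd

# Made-up rows for illustration; only the column names come from the article.
sample_df = pd.DataFrame({
    "Content Name": ["Paradise", "Aage Chal", "Paradise", "Perfect"],
    "Play Duration Milliseconds": [120000.0, 60000.0, 180000.0, 30000.0],
})

# Sum the play time per song, convert milliseconds to minutes, longest first
top_longest_played = (
    sample_df.groupby("Content Name")["Play Duration Milliseconds"]
    .sum()
    .div(60_000)
    .sort_values(ascending=False)
)
print(top_longest_played)
```

Grouping by song and summing rewards songs that were played for a long time overall, which can differ from the most frequently started songs when tracks are often skipped.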

fig = px.bar(top_longest_played[:10], labels={"Content Name": "Song Names", "value": "Play Time (in mins)", "variable":"Duration"}, color_discrete_sequence=px.colors.qualitative.G10_r)
fig.show()

5. What is the Usual Reason to End the Song?
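
The article shows no code for this question. A simple way to answer it, assuming the `End Reason Type` column seen in the sample rows is the relevant one, is to count how often each reason occurs. The rows below are made up; the two reason codes are taken from the sample output earlier in the article.

```python
import pandas as pd

# Made-up rows for illustration; the reason codes appear in the article's sample output.
sample_df = pd.DataFrame({
    "End Reason Type": [
        "MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM",
        "SCRUB_BEGIN",
        "MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM",
    ],
})

# Count how often each end reason occurs, most common first
end_reasons = sample_df["End Reason Type"].value_counts()
most_common_reason = end_reasons.idxmax()

print(end_reasons)
print("Most common reason:", most_common_reason)
```

On the real data, the same counts could be drawn in the style of the article's other charts, for instance with `px.pie(music_df, names='End Reason Type')`.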


6. Which Is Your Favorite Genre?

fig = px.bar(top_genre, color_discrete_sequence=px.colors.qualitative.T10_r)
fig.show()

7. Which Media Type Do You Prefer Most on Apple Music?

fig = px.pie(music_df, names='Media Type', color_discrete_sequence=px.colors.qualitative.Dark2, title="Most preferred Media Type (e.g. Audio/Video)")
fig.show()

8. Do You Prefer Listening to Music Online or Offline?

fig = px.pie(music_df, names="Offline", title="Do you prefer listening to music Offline?")
fig.show()
                        

9. At What Time of Day Do You Prefer to Listen to Music?
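
The charts for this question and the next two plot `hours`, `months` and `years`, none of which are defined in the article. Given the axis labels, they are presumably play counts grouped by the hour, month and year of `Event Start Timestamp`. A sketch under that assumption, using made-up timestamps in the same format as the export:

```python
import pandas as pd

# Made-up timestamps in the same ISO 8601 style as the export
sample_df = pd.DataFrame({
    "Event Start Timestamp": pd.to_datetime([
        "2019-06-28T16:46:30.070Z",
        "2019-08-27T13:22:00.883Z",
        "2020-06-18T15:26:50.761Z",
        "2020-06-21T16:05:00.000Z",
    ], utc=True),
})

start = sample_df["Event Start Timestamp"]

# Count plays per hour of day, per calendar month, and per year,
# sorted by the hour/month/year rather than by the count
hours = start.dt.hour.value_counts().sort_index()
months = start.dt.month.value_counts().sort_index()
years = start.dt.year.value_counts().sort_index()

print(hours)
print(months)
print(years)
```

These hours are in UTC. The export's `UTC Offset In Seconds` column (19800 in the sample rows, that is 5.5 hours) suggests local time is later, and the article does not say whether the author adjusted for it.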

fig = px.bar(hours, title="Most active hours (24hr)", labels={"value": "count", "Event Start Timestamp":"Timings (hours)"}, color_discrete_sequence=px.colors.qualitative.Prism)
fig.update_xaxes(dtick=1)
fig.show()

10. In Which Month Have You Listened to the Most Songs?

m = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sept', 'Oct', 'Nov','Dec']
fig = px.bar(months, title="Most active Months", text=m, labels={"value": "count", "Event Start Timestamp":"Months"}, color_discrete_sequence=px.colors.qualitative.Light24)
fig.update_xaxes(dtick=1)
fig.show()

11. In Which Year Have You Listened to the Most Songs on Apple Music?

fig = px.bar(years, title="Most active year", labels={"value": "count", "Event Start Timestamp":"Year"}, color_discrete_sequence=px.colors.qualitative.Prism_r)
fig.update_xaxes(dtick=1)
fig.show()

12. Total Time Spent Listening to Music
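
The calculations in this section use `total_time` and `total_possible_time`, which are not defined in the article. The printed results imply that `total_time` is in milliseconds (it is divided by 60000 to get minutes) and that `total_possible_time` is a number of days (it is multiplied by 24 to get hours, and 31632 / 24 = 1318). A sketch of how they might be computed under those assumptions, on made-up rows:

```python
import pandas as pd

# Made-up rows for illustration; only the column names come from the article.
sample_df = pd.DataFrame({
    "Event Start Timestamp": pd.to_datetime(
        ["2020-01-01T08:00:00Z", "2020-01-05T09:30:00Z", "2020-01-11T20:00:00Z"],
        utc=True,
    ),
    "Play Duration Milliseconds": [180000.0, 240000.0, 120000.0],
})

# Total listening time in milliseconds
total_time = sample_df["Play Duration Milliseconds"].sum()

# Whole days between the first and the last recorded play
span = sample_df["Event Start Timestamp"].max() - sample_df["Event Start Timestamp"].min()
total_possible_time = span.days

print(total_time, total_possible_time)
```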

total_mins = total_time/60000
print("Total minutes spent: {:.2f} mins".format(total_mins))
total_hours = total_mins/60
print("Total hours spent: {:.2f} hours".format(total_hours))
Total minutes spent: 24568.91 mins
Total hours spent: 409.48 hours

The maximum time available for listening, from the first recorded play to the last, is:

total_possible_hours = total_possible_time * 24
print("Total possible hours from start to end: {} hours".format(total_possible_hours))
Total possible hours from start to end: 31632 hours

The important question now is how much of my total available time was spent listening to music.
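
As a quick check on the figures printed in this section, the share can be computed directly from the two totals. The numbers below are copied from the article's output; the variable `share_percent` is introduced here for illustration.

```python
# Figures printed earlier in this section
total_hours = 409.48
total_possible_hours = 31632

# Listening time as a percentage of all available hours
share_percent = total_hours / total_possible_hours * 100
print(f"Share of available time spent listening: {share_percent:.2f}%")
```

This comes to roughly 1.3% of the available time.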

# Plot listening time against the rest of the available time so the two slices add up to the whole period
hours_spent_list = np.array([total_hours, total_possible_hours - total_hours])
hours_spent_list_labels = ["Actual Hours Spent", "Remaining Hours"]

fig, ax = plt.subplots(figsize=(12,6))
ax.pie(hours_spent_list, labels=hours_spent_list_labels, autopct='%1.1f%%', explode=[0.2,0.2], startangle=180, shadow=True)
plt.title("Hours Spent Percentage")

13. Daily Average Songs Played

total_songs = music_df.shape[0]
print("Daily average of songs played: {:.2f} songs".format(total_songs/total_possible_time))
Daily average of songs played: 16.04 songs
