How Web Scraping Is Used In Apple Music Streaming Data Analysis?

In this study, we analyze my personal music streaming statistics from Apple Music, the music and video streaming service from Apple Inc. The dataset used here is my own listening history on the platform.

This article covers the following topics:

  • Data requests and downloads
  • Cleaning and preparing data
  • Analyzing data and gaining interesting insights from it

Requesting and Downloading Data

Apple provides a copy of your personal data if you ask for it. These are the steps to take:

  • Go to apple.com/privacy.
  • Sign in to your account.
  • Click the link to request a copy of your information.
  • Make sure Apple Media Services Information is checked, then click Continue at the bottom.
  • Select the default size and click Finish Request.

Data Preparation and Cleaning

  • Import any libraries that are required.
  • Load the dataset (a CSV file).
  • Examine the dataframe’s shape and columns.
  • Look for any missing values.
  • Examine the basic statistics of the columns.
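The checks in this list map onto a handful of standard pandas calls. The following is a minimal sketch rather than the code used in this study; the helper name `summarize` is hypothetical and it works on any dataframe.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    """Collect the basic checks listed above for any dataframe (hypothetical helper)."""
    return {
        "shape": df.shape,                       # (number of rows, number of columns)
        "columns": list(df.columns),             # column names
        "missing": df.isnull().sum().to_dict(),  # missing values per column
        "stats": df.describe(),                  # basic statistics of the numeric columns
    }
```

Once the dataset is loaded, something like `summarize(music_df)["missing"]` would show at a glance which columns are mostly empty.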

Importing Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
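import plotly.express as px

# Alias for the palettes referenced later as colors.<name>
# (assumption: they are the qualitative palettes from plotly)
colors = px.colors.qualitative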

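pd.set_option('display.max_columns', None)

Load the Dataset

# The file name is an assumption; point this at the play-activity CSV from your own export.
music_df = pd.read_csv('Apple Music Play Activity.csv')
music_df.head()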
(First five rows, shown transposed: one line per column, with the values for rows 0 to 4 separated by " | ".)

Column | Row 0 | Row 1 | Row 2 | Row 3 | Row 4
Apple Id Number | 11569060994 | 11569060994 | 11569060994 | 11569060994 | 11569060994
Apple Music Subscription | True | True | True | True | True
Artist Name | Bazzi | Raftaar | Master Rakesh, Dr Zeus | The Weeknd | Nucleya, DIVINE
Build Version | Music/3.1 iOS/13.0 model/iPhone9,3 hwp/t8010 b… | Music/1.0 macOS/10.15 build/19A582a model/MacB… | Music/3.1 iOS/13.0 model/iPhone9,3 hwp/t8010 b… | Music/3.1 iOS/11.3 model/iPhone9,3 hwp/t8010 b… | Music/3.1 iOS/11.2 model/iPhone9,3 hwp/t8010 b…
Client IP Address | 106.66.247.0 | 117.206.166.3 | 117.228.170.82 | 106.77.1.155 | 42.106.57.192
Content Name | Paradise | Aage Chal | Kangna (feat. Deepti & Shortie) | Can’t Feel My Face | Paintra (From “Mukkabaaz”)
Content Provider | The Warner Music Group | Hungama Digital Media Entertainment Pvt. | The Orchard Enterprises Inc. | Universal Music International | Eros International USA Inc
Content Specific Type | Song | Song | Song | Song | Song
Device Identifier | f625ff5caca143772ec5bb7962ef7f5f9267f36a | 1C36BB164346 | f625ff5caca143772ec5bb7962ef7f5f9267f36a | f625ff5caca143772ec5bb7962ef7f5f9267f36a | f625ff5caca143772ec5bb7962ef7f5f9267f36a
End Position In Milliseconds | 3312.0 | 131214.0 | 0.0 | 31027.0 | 119958.0
End Reason Type | MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM | MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM | MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM | SCRUB_BEGIN | MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM
Event End Timestamp | 2019-06-28T16:46:33.382Z | 2020-06-18T15:29:01.975Z | 2019-08-27T13:22:00.883Z | 2018-04-03T18:15:01.395Z | 2017-12-20T12:42:47.599Z
Event Reason Hint Type | NOT_SPECIFIED | NOT_SPECIFIED | NOT_SPECIFIED | NOT_SPECIFIED | NOT_SPECIFIED
Event Received Timestamp | 2019-06-28T17:21:01.944Z | 2020-06-21T04:13:37.975Z | 2019-08-27T13:22:05.551Z | 2018-04-03T18:15:14.359Z | 2017-12-20T12:42:47.771Z
Event Start Timestamp | 2019-06-28T16:46:30.070Z | 2020-06-18T15:26:50.761Z | 2019-08-27T13:22:00.883Z | 2018-04-03T18:15:00.362Z | 2017-12-20T12:41:25.722Z
Event Type | PLAY_END | PLAY_END | PLAY_END | PLAY_END | PLAY_END
Feature Name | library / downloaded_music / songs | library | library / album_detail | library / downloaded_music / songs | library / album_detail
Genre | Pop | Indian Pop | Asia | R&B/Soul | Bollywood
Item Type | ITUNES_STORE_CONTENT | ITUNES_STORE_CONTENT | ITUNES_STORE_CONTENT | ITUNES_STORE_CONTENT | ITUNES_STORE_CONTENT
Media Duration In Milliseconds | 169087.0 | 229800.0 | 209118.0 | 213577.0 | 232222.0
Media Type | AUDIO | AUDIO | AUDIO | AUDIO | AUDIO
Metrics Bucket Id | 7044.0 | 3331.0 | 7044.0 | 4877.0 | 4877.0
Metrics Client Id | 3z44Gmyhz4lXz4xCzBqJzr9hlFsSr | 3z4uZCHNzEVrz4vkzAsJzGurr2k8E | 3z44Gmyhz4lXz4xCzBqJzr9hlFsSr | 3z4pGutFz1mxz4yYz9qazYSewotZt | 3z4pGutFz1mxz4yYz9qazYSewotZt
Milliseconds Since Play | 2068562 | 218676000 | 4668 | 12964 | 172
Offline | True | False | False | False | False
Original Title | NaN | NaN | NaN | NaN | NaN
Play Duration Milliseconds | 3312.0 | 131214.0 | 0.0 | 1033.0 | 81877.0
Provided Audio Bit Depth | NaN | NaN | NaN | NaN | NaN
Provided Audio Channel | NaN | NaN | NaN | NaN | NaN
Provided Audio Sample Rate | NaN | NaN | NaN | NaN | NaN
Provided Bit Rate | NaN | NaN | NaN | NaN | NaN
Provided Codec | NaN | NaN | NaN | NaN | NaN
Provided Playback Format | NaN | NaN | NaN | NaN | NaN
Source Type | ORIGINATING_DEVICE | ORIGINATING_DEVICE | ORIGINATING_DEVICE | ORIGINATING_DEVICE | ORIGINATING_DEVICE
Start Position In Milliseconds | 0 | 0 | 0 | 29994 | 38081
Store Country Name | India | India | India | India | India
Targeted Audio Bit Depth | NaN | NaN | NaN | NaN | NaN
Targeted Audio Channel | NaN | NaN | NaN | NaN | NaN
Targeted Audio Sample Rate | NaN | NaN | NaN | NaN | NaN
Targeted Bit Rate | NaN | NaN | NaN | NaN | NaN
Targeted Codec | NaN | NaN | NaN | NaN | NaN
Targeted Playback Format | NaN | NaN | NaN | NaN | NaN
User’s Audio Quality | NaN | NaN | NaN | NaN | NaN
User’s Playback Format | NaN | NaN | NaN | NaN | NaN
UTC Offset In Seconds | 19800 | 19800 | 19800 | 19800 | 19800

The dataset contains 21,147 streaming records (rows) with 45 features (columns) in total. To gain insights from this data, our first task is to remove the columns that aren’t needed. Several columns contain nothing but NULL values, so we eliminate those first.

nans = [col for col in music_df.columns if music_df[col].isnull().all()]
print(nans)
['Original Title',
'Provided Audio Bit Depth',
'Provided Audio Channel',
'Provided Audio Sample Rate',
'Provided Bit Rate',
'Provided Codec',
'Provided Playback Format',
'Targeted Audio Bit Depth',
'Targeted Audio Channel',
'Targeted Audio Sample Rate',
'Targeted Bit Rate',
'Targeted Codec',
'Targeted Playback Format',
'User’s Audio Quality',
'User’s Playback Format']
# drop the above columns from the dataframe
music_df.drop(nans, axis=1, inplace=True)
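As an aside, pandas can drop fully empty columns in a single call. The sketch below shows that alternative; the helper name `drop_empty_columns` is hypothetical, and unlike the `inplace=True` call above it returns a new dataframe instead of modifying the original.

```python
import pandas as pd

def drop_empty_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without the columns whose values are all missing."""
    # how="all" removes a column only when every one of its values is NaN/None
    return df.dropna(axis=1, how="all")
```

Listing the columns first, as done above, has the advantage that you can see exactly what is about to be removed.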

There are a few columns like “Apple Id Number” and “Build Version” that aren’t really useful, so we’ll remove those as well.

to_delete = ['Apple Id Number', 'Build Version', 'Client IP Address', 'Device Identifier', 'Metrics Bucket Id', 'Metrics Client Id', 'UTC Offset In Seconds', 'Store Country Name']
music_df.drop(to_delete, axis=1, inplace=True)

Of the original 45 columns, 22 now remain in our dataframe. The last step is to convert the timestamp columns, which are stored as plain strings (object dtype), to proper pandas datetime values.

# The raw values look like 2019-06-28T16:46:33.382Z, so let pandas parse the
# full ISO 8601 string (fractional seconds and the trailing Z) as UTC.
for col in ['Event End Timestamp', 'Event Received Timestamp', 'Event Start Timestamp']:
    music_df[col] = pd.to_datetime(music_df[col], utc=True)

Questions and Answers

1. Who are the Top 10 Favorite Artists?

fig = px.bar(top_10_artist, title="Top 10 favourite artists", labels={"index":"Artists", 'value':"No. of times song played"}, color_discrete_sequence=px.colors.qualitative.Set2)
fig.show()

2. Which are the Top 20 Songs Played? (Favorite Songs)

fig = px.bar(top_20_songs, title="Top 20 favourite songs", labels={"index":"Songs", 'value':"No. of times song played"}, color_discrete_sequence=px.colors.qualitative.Bold)
fig.update_xaxes(tickangle=22)
fig.show()

3. Who are the Top 10 Favorite Content Providers?

fig = px.bar(top_10_labels, title="Top 10 favourite labels", labels={"index":"Music Labels", 'value':"No. of times song label played"}, color_discrete_sequence=px.colors.qualitative.Pastel)
fig.update_xaxes(tickangle=25)
fig.show()
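The charts above plot `top_10_artist`, `top_20_songs` and `top_10_labels`, but the article does not show how those series were built. A plausible reconstruction, assuming they are simple play counts per value (the same `value_counts` approach the label helper in this article uses), is sketched here; the function name `top_n` is hypothetical.

```python
import pandas as pd

def top_n(df: pd.DataFrame, column: str, n: int = 10) -> pd.Series:
    """Count how often each value of `column` occurs and keep the n most frequent."""
    # value_counts() already sorts from most to least frequent
    return df[column].value_counts().head(n)
```

Under that assumption, `top_10_artist = top_n(music_df, 'Artist Name')`, `top_20_songs = top_n(music_df, 'Content Name', 20)` and `top_10_labels = top_n(music_df, 'Content Provider')` would feed the three charts.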

 

To check top tracks from a specific music label provider, we will create a little helper function.

def top_10_song_of_label(label):
    """
    Show the 10 most-played songs from a particular label.
    """
    # keep only this label's rows, then count plays per song
    # (value_counts sorts from most to least played)
    label_df = music_df[music_df['Content Provider'] == label]
    top_10_song = label_df['Content Name'].value_counts()[:10]
    print(top_10_song)
    fig = px.bar(top_10_song, labels={"index": "Song Names", "value": "No. of times song played", "variable":"Song name"}, title=f"Top songs from {label}")
    fig.show()

For example, here are the top songs from The Warner Music Group:

top_10_song_of_label('The Warner Music Group')
Hola (feat. Maluma)                                     82
I Don't Care                                            69
Thinking Out Loud                                       63
Attention                                               62
Perfect                                                 60
1, 2, 3 (feat. Jason Derulo & De La Ghetto)             59
Dirty Sexy Money (feat. Charli XCX & French Montana)    52
Hymn for the Weekend                                    51
Crown                                                   50
10,000 Hours                                            48
Name: Content Name, dtype: int64

Top Songs from T-Series

top_10_song_of_label('Super Cassettes Industries Pvt Limited a.k.a. T-Series')

Ishq Tera              66
Chota Sa Fasana        60
Maahi Ve               59
High Rated Gabru       50
Tu Chale               45
Tera Yaar Hoon Main    45
Befikra                41
Zindagi Do Pal Ki      40
Duniyaa                40
Chalte Chalte          40
Name: Content Name, dtype: int64

4. Which are the Top 10 Songs According to Playtime?

fig = px.bar(top_longest_played[:10], labels={"Content Name": "Song Names", "value": "Play Time (in mins)", "variable":"Duration"}, color_discrete_sequence=colors.G10_r)
fig.show()
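The series `top_longest_played` is not defined in the article. One way to obtain it, assuming it holds the total play time per song in minutes, is to sum `Play Duration Milliseconds` per `Content Name`. The helper name `play_time_minutes` below is hypothetical.

```python
import pandas as pd

def play_time_minutes(df: pd.DataFrame,
                      name_col: str = "Content Name",
                      duration_col: str = "Play Duration Milliseconds") -> pd.Series:
    """Total play time per song in minutes, longest first."""
    total_ms = df.groupby(name_col)[duration_col].sum()    # milliseconds per song
    return (total_ms / 60_000).sort_values(ascending=False)  # 60,000 ms = 1 minute
```

With this sketch, `top_longest_played = play_time_minutes(music_df)` would give a series whose first ten entries can be plotted as above.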

5. What Is the Usual Reason a Song Ends?
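No code survives for this question. A minimal sketch of one way to answer it, assuming the `End Reason Type` column is the relevant field (values such as `MANUALLY_SELECTED_PLAYBACK_OF_A_DIFF_ITEM` and `SCRUB_BEGIN` appear in the sample rows), is to compute each reason's share of all plays. The helper name `end_reason_share` is hypothetical.

```python
import pandas as pd

def end_reason_share(df: pd.DataFrame, column: str = "End Reason Type") -> pd.Series:
    """Percentage of plays that ended for each reason, most common first."""
    # normalize=True turns the counts into fractions that sum to 1
    return df[column].value_counts(normalize=True) * 100
```

The same distribution could be drawn in the style of the pie charts used elsewhere in this article, for example with `px.pie(music_df, names='End Reason Type')`.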

 

6. Which Is Your Favorite Genre?

fig = px.bar(top_genre, color_discrete_sequence=colors.T10_r)
fig.show()

7. Which Media Type Do You Prefer Most on Apple Music?

fig = px.pie(music_df, names='Media Type', color_discrete_sequence=colors.Dark2, title="Most preferred media type (e.g. Audio/Video)")
fig.show()

8. Do You Prefer Listening to Music Online or Offline?

fig = px.pie(music_df, names="Offline", title="Do you prefer listening to music Offline?")
fig.show()

9. At What Time Do You Prefer to Listen to Music?

fig = px.bar(hours, title="Most active hours (24hr)", labels={"value": "count", "Event Start Timestamp":"Timings (hours)"}, color_discrete_sequence=colors.Prism)
fig.update_xaxes(dtick=1)
fig.show()
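The `hours` series plotted here (like the `months` and `years` series in the next two questions) is not defined in the article. A sketch of how such counts could be derived from `Event Start Timestamp`, assuming that column has already been converted to datetime values, follows. The helper name `plays_per_unit` is hypothetical.

```python
import pandas as pd

def plays_per_unit(timestamps: pd.Series, unit: str) -> pd.Series:
    """Count plays per 'hour', 'month' or 'year' of a datetime Series, ordered by that unit."""
    parts = getattr(timestamps.dt, unit)      # e.g. timestamps.dt.hour
    return parts.value_counts().sort_index()  # sort by hour/month/year, not by count
```

For instance, `hours = plays_per_unit(music_df['Event Start Timestamp'], 'hour')`. Note that the raw timestamps end in `Z`, so they are in UTC; the dataset's `UTC Offset In Seconds` column (19800 in the sample rows) suggests local time is 5.5 hours ahead, which matters when reading the hourly chart.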

10. In Which Month Have You Listened to the Most Songs?

m = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sept', 'Oct', 'Nov','Dec']
fig = px.bar(months, title="Most active Months", text=m, labels={"value": "count", "Event Start Timestamp":"Months"}, color_discrete_sequence=colors.Light24)
fig.update_xaxes(dtick=1)
fig.show()

11. In Which Year Have You Listened to the Most Songs on Apple Music?

fig = px.bar(years, title="Most active year", labels={"value": "count", "Event Start Timestamp":"Year"}, color_discrete_sequence=colors.Prism_r)
fig.update_xaxes(dtick=1)
fig.show()

12. Total Time Spent Listening to Music

total_mins = total_time/60000
print("Total minutes spent: {:.2f} mins".format(total_mins))
total_hours = total_mins/60
print("Total hours spent: {:.2f} hours".format(total_hours))
Total minutes spent: 24568.91 mins
Total hours spent: 409.48 hours
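The variable `total_time` is not defined above. Since it is divided by 60000 to get minutes, it is presumably a total in milliseconds, most likely the sum of the `Play Duration Milliseconds` column. A sketch under that assumption (the helper name `listening_time` is hypothetical):

```python
import pandas as pd

def listening_time(df: pd.DataFrame,
                   duration_col: str = "Play Duration Milliseconds") -> tuple:
    """Return total listening time as (minutes, hours)."""
    total_ms = df[duration_col].sum()   # sum() skips missing durations
    total_mins = total_ms / 60_000      # milliseconds -> minutes
    return total_mins, total_mins / 60  # minutes -> hours
```

As a consistency check on the printed figures, 24568.91 minutes divided by 60 is indeed about 409.48 hours.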

Between the first and the last recorded play, the maximum amount of time available for listening to music was:

total_possible_hours = total_possible_time * 24
print("Total possible hours from start to end: {} hours".format(total_possible_hours))
Total possible hours from start to end: 31632 hours
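Because `total_possible_time` is multiplied by 24 to get hours, it appears to be a number of days (31632 / 24 = 1318 days). The article does not show how it was computed; a sketch that assumes it is the span between the earliest and latest play is given here, with the hypothetical helper name `days_covered`.

```python
import pandas as pd

def days_covered(timestamps: pd.Series) -> int:
    """Whole days between the earliest and the latest timestamp."""
    # Subtracting two timestamps gives a Timedelta; .days is its whole-day part
    return (timestamps.max() - timestamps.min()).days
```

Under that assumption, `total_possible_time = days_covered(music_df['Event Start Timestamp'])`.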

The important question now is how much of my total available time was spent listening to music.

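# Compare the hours spent listening with the rest of the available hours,
# so that the two slices together add up to the whole period
hours_spent_list = np.array([total_hours, total_possible_hours - total_hours])
hours_spent_list_labels = ["Actual Hours Spent", "Remaining Hours"]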

fig, ax = plt.subplots(figsize=(12,6))
ax.pie(hours_spent_list, labels= hours_spent_list_labels, autopct='%1.1f%%',  explode=[0.2,0.2], startangle=180, shadow = True);
plt.title("Hours Spent Percentage");
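The share of time spent listening can also be checked directly from the two totals printed earlier. This is a small worked calculation, not part of the original notebook; the function name `listening_share` is hypothetical.

```python
def listening_share(hours_listened: float, hours_available: float) -> float:
    """Percentage of the available hours that were spent listening."""
    return 100 * hours_listened / hours_available

# Figures printed earlier in this article
share = listening_share(409.48, 31632)
print(f"{share:.2f}% of the available hours")
```

This works out to roughly 1.3% of the period covered by the data.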

13. Daily Average Songs Played

total_songs = music_df.shape[0]
print("Daily average of songs played: {:.2f} songs".format(total_songs/total_possible_time))
Daily average of songs played: 16.04 songs

For further queries or to request a quote, you can connect with us at X-Byte Enterprise Crawling.

Alpesh Khunt
Alpesh Khunt, CEO and Founder of X-Byte Enterprise Crawling, founded the data scraping company in 2012 to boost business growth using real-time data. With a vision for scalable solutions, he developed a trusted web scraping platform that empowers businesses with accurate insights for smarter decision-making.
