Python Setlist API

Ever think of a project you’d like to explore, but just can’t find the data for it, even after searching online for hours? This has happened to me multiple times. Recently, when I wanted to look at music data for specific artists and songs, I couldn’t find many resources that gave me what I wanted.

Check out how I used the below script to bring music data to life in this Tableau Public dashboard.

Before running these code blocks, the first thing you will need to do is follow the instructions on the Setlist FM API page so that you can request access and receive your API Key. These credentials allow you to access the Setlist API.

So, below are the code blocks I used to pull details on sets where Biffy Clyro have performed. This code can be modified for any artist you are interested in.

This first block of code imports the libraries we will need and stores the API key we use to authenticate against the Setlist API. With that in place, you will need the ID (the mbid) of the artist or artists you wish to retrieve data for.

# Import libraries
import numpy as np
import pandas as pd
import sys
import json
import time
import requests
import string
import regex as re
from flatten_json import flatten
from time import sleep

# Like most APIs, you will need to go to https://www.setlist.fm/signin, create an account and request access to the API
# It shouldn't take too long to receive your key

# Input your setlist api key below
setlist_api = 'xxxx'

# I will use Biffy Clyro as the example artist
biffy_mbid = '892500b7-09a6-4049-ba92-2d192dd70563' # taken from viewing the page source on the Setlist website

To get this mbid, I first went to Biffy Clyro’s Setlist page.

Next, you will want to right click on the page and select ‘View Page Source’. You can then use Ctrl+F and search for ‘mbid’ to find the artist’s mbid.
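As an alternative to reading the page source, the Setlist API also has an artist search endpoint. The sketch below is only a rough illustration: it builds the search URL and picks the mbid out of a response. The helper names `build_artist_search_url` and `pick_mbid` are my own, and the response shape (an `artist` list whose entries have `name` and `mbid`) is an assumption you should check against the API documentation. The sample response is hand-written, not real API output.

```python
from urllib.parse import urlencode

# Artist search endpoint of the Setlist API (check the API docs for the full parameter list)
SEARCH_URL = 'https://api.setlist.fm/rest/1.0/search/artists'

def build_artist_search_url(artist_name, page=1):
    # hypothetical helper: builds the URL you would pass to requests.get()
    # together with the same 'Accept' and 'x-api-key' headers used elsewhere in this post
    return SEARCH_URL + '?' + urlencode({'artistName': artist_name, 'p': page})

def pick_mbid(search_json, artist_name):
    # hypothetical helper: return the mbid of the first exact (case-insensitive) name match
    # assumes the response has an 'artist' list of dicts with 'name' and 'mbid' keys
    for artist in search_json.get('artist', []):
        if artist.get('name', '').lower() == artist_name.lower():
            return artist.get('mbid')
    return None

# hand-written sample in the assumed response shape
sample_response = {'artist': [{'mbid': '892500b7-09a6-4049-ba92-2d192dd70563',
                               'name': 'Biffy Clyro'}]}

print(build_artist_search_url('Biffy Clyro'))
print(pick_mbid(sample_response, 'biffy clyro'))
```

Several artists can share a name, so it is still worth comparing the result with the mbid shown on the artist’s Setlist page.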

The only thing left to do is to scroll to the bottom of the artist’s page and note how many pages of setlists the artist has.
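If you would rather not count pages by hand, the page count can probably be worked out from the first API response instead. This is a minimal sketch, assuming the setlists response carries `total` and `itemsPerPage` fields alongside the `setlist` list. The function name `pages_needed` and the numbers in the sample are made up for illustration.

```python
import math

def pages_needed(first_page_json):
    # hypothetical helper: number of pages = total setlists / setlists per page, rounded up
    # assumes the response includes 'total' and 'itemsPerPage' keys
    total = int(first_page_json['total'])
    per_page = int(first_page_json['itemsPerPage'])
    return math.ceil(total / per_page)

# made-up sample in the assumed shape of the first page of results
sample_first_page = {'total': 3285, 'itemsPerPage': 20, 'page': 1, 'setlist': []}

print(pages_needed(sample_first_page))  # 165
```

The result is the number you would pass as the page count when pulling the setlists.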

Next is the script that I used to pull in all Biffy Clyro’s setlists.

# The below website was helpful when flattening the json file
# https://stackoverflow.com/questions/52795561/flattening-nested-json-in-pandas-data-frame

# Use the code below to retrieve the setlists from the number of pages that you want.

def get_all_setlists(artist_mbid, api_key, pages):
    result = []
    # range() excludes its stop value, so add 1 to include the last page
    for page in range(1, pages + 1):

        # pull in api data from page
        r = requests.get('https://api.setlist.fm/rest/1.0/artist/' + artist_mbid + '/setlists?p=' + str(page),
                         headers={'Accept': 'application/json', 'x-api-key': api_key})

        try:
            # convert api data to json
            rjs = r.json()

            # flatten json file to pandas dataframe
            df = pd.DataFrame([flatten(x) for x in rjs['setlist']])
            df['page'] = page

            # stack dataframe onto the list of results
            result.append(df)
            print('page: ', page, ' completed')
        except (ValueError, KeyError):
            # skip pages with no usable setlist data (e.g. an error response)
            # so the previous page is not appended a second time
            print('page: ', page, ' skipped')

        # use the sleep function so calls between API pages are delayed
        sleep(3)

    # combine dataframes
    setlists = pd.concat(result)

    # convert eventDate to date variable
    setlists['eventDate'] = pd.to_datetime(setlists['eventDate'], format='%d-%m-%Y')

    return setlists

# Run Function
final = get_all_setlists(biffy_mbid, setlist_api, 165)
display(final)

# Save Dataframe to CSV
final.to_csv(r'file_location/artist_setlist_data.csv', index = False, header=True)

This code block prints each page number as it pulls the data, so you can follow its progress. As you can see from running the function, all you need to retrieve the data is the mbid of the artist, your Setlist API key and the number of pages you want to retrieve.
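The flattening step is what produces the long column names, such as sets_set_0_song_0_name, that the clean-up code relies on. To show where those names come from, here is a small stand-in written for illustration. It is not the flatten_json implementation, just a sketch of the same idea: nested keys and list positions are joined with underscores. The sample setlist and its song names are made up.

```python
def flatten_sketch(obj, parent_key='', sep='_'):
    # illustrative stand-in for flatten_json.flatten (not the library's code):
    # walk nested dicts and lists, joining keys and list indexes with sep
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        # a plain value: store it under the key path built so far
        return {parent_key: obj}
    flat = {}
    for key, value in items:
        new_key = f'{parent_key}{sep}{key}' if parent_key else str(key)
        flat.update(flatten_sketch(value, new_key, sep))
    return flat

# made-up setlist in roughly the nested shape used in this post
sample_setlist = {
    'eventDate': '01-02-2020',
    'artist': {'name': 'Biffy Clyro'},
    'sets': {'set': [
        {'song': [{'name': 'Song A'}, {'name': 'Song B'}]},
        {'encore': 1, 'song': [{'name': 'Song C'}]},
    ]},
}

for column, value in flatten_sketch(sample_setlist).items():
    print(column, '=', value)
```

Each flattened dictionary becomes one row of the dataframe, which is why a gig with more songs adds more columns rather than more rows.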

I then used the next code block to clean up the dataset by filtering to the columns I wanted, renaming the columns, reshaping the data from wide to long format (one row per song) and removing rows with missing data.

# clean and transpose data 
print(len(final.columns))
col_wanted = ['page', 'eventDate', 'artist_mbid', 'artist_name', 'artist_url',
             'venue_id','venue_name', 'venue_city_name', 'venue_city_state', 'venue_city_stateCode', 'venue_city_coords_lat', 
              'venue_city_coords_long', 'venue_city_country_code', 'venue_city_country_name', 'venue_url','tour_name', 'url',
              'sets_set_0_song_0_name', 'sets_set_0_song_1_name', 'sets_set_0_song_2_name', 'sets_set_0_song_3_name', 'sets_set_0_song_4_name', 
              'sets_set_0_song_5_name','sets_set_0_song_6_name', 'sets_set_0_song_7_name', 'sets_set_0_song_8_name', 'sets_set_0_song_9_name', 
              'sets_set_0_song_10_name','sets_set_0_song_11_name', 'sets_set_0_song_12_name', 'sets_set_0_song_13_name', 'sets_set_0_song_14_name', 
              'sets_set_0_song_15_name','sets_set_0_song_16_name', 'sets_set_0_song_17_name', 'sets_set_0_song_18_name', 'sets_set_0_song_19_name',
              'sets_set_1_song_0_name', 'sets_set_1_song_1_name', 'sets_set_1_song_2_name']

final_filter = final[col_wanted].copy()  # copy so renaming columns does not touch final
print(len(final_filter.columns))

#rename columns
final_filter.columns = ['page', 'eventDate', 'artist_mbid', 'artist', 'artist_url',
             'venue_id','venue', 'venue_city', 'venue_city_state', 'venue_city_stateCode', 'venue_city_coords_lat', 
              'venue_city_coords_long', 'venue_city_country_code', 'venue_city_country', 'venue_url','tour', 'setlist_url',
              'song_0', 'song_1', 'song_2', 'song_3', 'song_4', 
              'song_5','song_6', 'song_7', 'song_8', 'song_9', 
              'song_10','song_11', 'song_12', 'song_13', 'song_14', 
              'song_15','song_16', 'song_17', 'song_18', 'song_19',
              'encore_song_0', 'encore_song_1', 'encore_song_2']
print(final_filter.columns)

#reshape data from wide to long (one row per song)
melt = pd.melt(final_filter,
                 id_vars=['page','eventDate', 'artist_mbid', 'artist', 'artist_url',
                         'venue_id','venue', 'venue_city', 'venue_city_state', 'venue_city_stateCode', 'venue_city_coords_lat', 
                          'venue_city_coords_long', 'venue_city_country_code', 'venue_city_country', 'venue_url','tour', 'setlist_url'],
                  value_vars=['song_0', 'song_1', 'song_2', 'song_3', 'song_4', 
              'song_5','song_6', 'song_7', 'song_8', 'song_9', 
              'song_10','song_11', 'song_12', 'song_13', 'song_14', 
              'song_15','song_16', 'song_17', 'song_18', 'song_19',
              'encore_song_0', 'encore_song_1', 'encore_song_2'],
                  var_name='song_number',
                    value_name='song_name')

print("len before removing nas: ", len(melt))
print()

#drop rows with missing data
melt2 = melt[melt['song_name'].notna()]
print("len after removing nas: ", len(melt2))
print()
final_melt = melt2.drop_duplicates()
print("len after dropping duplicates: ", len(final_melt))
print()
#display(final_melt)
final_melt.to_csv(r'file_location/artist_setlist_data_clean.csv', index = False, header=True)
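The column list above is typed out by hand and stops at 20 main-set songs and 3 songs in the second set, so longer gigs would lose songs. One possible way around that, sketched here with a helper name of my own (`song_columns`), is to pick out the song columns by pattern. It assumes the flattened names follow the sets_set_N_song_M_name form seen above. The example column names are made up.

```python
import re

# matches flattened song-name columns such as 'sets_set_0_song_12_name'
SONG_COLUMN = re.compile(r'^sets_set_(\d+)_song_(\d+)_name$')

def song_columns(columns):
    # hypothetical helper: return the song-name columns ordered by set number, then song number
    matches = []
    for col in columns:
        m = SONG_COLUMN.match(col)
        if m:
            matches.append((int(m.group(1)), int(m.group(2)), col))
    # sorting on the integer pair keeps song_2 ahead of song_10
    return [col for _, _, col in sorted(matches)]

# made-up column names for illustration
example_columns = ['eventDate', 'sets_set_0_song_10_name', 'sets_set_0_song_2_name',
                   'sets_set_1_song_0_name', 'sets_set_0_song_2_info', 'venue_name']

print(song_columns(example_columns))
# ['sets_set_0_song_2_name', 'sets_set_0_song_10_name', 'sets_set_1_song_0_name']
```

You could then pass song_columns(final.columns) as the value_vars when melting, instead of listing every song column by hand.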

Hopefully, this code allows you to collect setlist data from your favourite artists!
