Introduction

Spotify’s mission statement reads … “Our mission is to unlock the potential of human creativity – by giving a million creative artists the opportunity to live off their art and billions of fans the opportunity to enjoy and be inspired by it.”

There are currently around 50 thousand professional musicians in the UK according to the Musicians’ Union. If Spotify’s mission statement is to be credible then most of these professionals should be getting, at the very least, “the opportunity” to live off their art by uploading their music to Spotify. I used R to evaluate how Spotify’s revenue generating process actually works.

Historical revenues from recorded music

The total revenue from all global music sales has been falling steadily since its high point in the late 1990s. Data from https://chart2000.com shows a fall from 35 billion dollars in 2000 to just 15 billion dollars in 2014. There may have been a slight upturn since then as a result of more effective mechanisms being found to gain revenue from on-line streaming.

A naive interpretation of this 15 billion figure in 2014 could be that 15 billion divided by a million (the number of artists Spotify aim to support) would come to 15 thousand dollars. So, if each artist were provided with this level of income from recordings then they would just be able to support themselves for a year providing that their additional income from live performances is healthy. A much less naive and more realistic interpretation would be to consider the way that any revenue is distributed. It is far from equitable.

Spotify statistics

I googled spotify, revenue and profitability to come up with the following recent on-line sources.

https://www.rollingstone.com/music/music-features/spotify-profitable-how-happen-910456/
https://qz.com/1736762/spotify-grows-monthly-active-users-and-turns-profit-shares-jump-15-percent/
https://www.digitalmusicnews.com/2018/12/25/streaming-music-services-pay-2019/

If these sources are reliable, then it is fair to state that in October 2019 Spotify reported a gross profit of $490 million on $1.92 billion in revenue. At the time it had 248 million monthly active users and 113 million subscribers. According to Rolling Stone, around 25% of Spotify’s revenue is spent on marketing, maintenance of the platform and assorted legal fees.

turnover<-1920
profit<-490
expenses <- 1920*0.25
profit_margin<-round(100*490/1920,1)
remainder <- round(turnover - profit - expenses, 2)

This all results in a profit margin of 25.5% of total income from streaming. The stats imply that the total amount returned to artists, as passed through promoters and licensing companies, is around 950 million dollars (roughly 1 billion). Given that this revenue is shared between all the artists on a totally global platform, including artists that are living and dead, new and long established, it is actually a remarkably small amount of money. The sum can be placed in the context of historical revenue from music sales, which peaked at 35 times the current Spotify returns to artists. Spotify is of course not the only provider of streaming music, but it is the largest. Total revenues from streaming, including iTunes, Amazon, Deezer, YouTube (music only) etc., are at best around three to five times the Spotify total. Artists publishing their music as streams are competing for about a third of the total sales pool that was potentially available through the traditional recorded music market. However, the reduction in overheads partially compensates for this.

Spotify streaming figures

Digital Music News reports that Spotify currently pays 0.00437 dollars per stream to the copyright holders.

To put this figure into perspective, even a completely independent track produced, mixed and mastered in a home studio would cost at least one thousand dollars to make. One thousand Spotify streams are worth around 4.37 dollars to the artist, so it takes 229 thousand streams to cover the basic cost of a track, not taking into account all the hours spent on the underlying creative process.

To add to this, according to Wikipedia, under the traditional retail model only 176 singles have exceeded a million sales in the UK. The best seller, Elton John’s “Candle in the Wind 1997”, reached around 5 million. Contemporary charts assume that 100 Spotify streams are equivalent to one sale. At the per-stream rate above, this is worth about 44 cents to the artist.
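The per-stream arithmetic in the paragraphs above can be checked with a few lines of R. This is only a sketch: the 1000 dollar production cost and the 100 streams per sale conversion are the rough figures quoted in the text, and the variable names are illustrative.

```r
pstream <- 0.00437        # dollars per stream (Digital Music News figure)
track_cost <- 1000        # assumed minimum cost of a home-produced track, in dollars
streams_per_sale <- 100   # chart convention quoted above: 100 streams = 1 sale

# Streams needed before a track has paid for its own production
break_even_streams <- ceiling(track_cost / pstream)

# Value to the artist of one chart "sale" made up of streams
sale_value <- streams_per_sale * pstream

# Streaming payout equivalent to a million-selling single
million_seller_payout <- 10^6 * sale_value

break_even_streams      # about 229 thousand streams
sale_value              # about 0.44 dollars
million_seller_payout   # about 437 thousand dollars
```

On these assumptions, even a track that matched a million-selling single in streaming-equivalent sales would return well under half a million dollars before any intermediary takes a cut.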

When an artist publishes their music on Spotify, the collection of the money earned through streams is passed on to an intermediate company in return for a fee. This fee is only around five dollars per track per year in the case of a completely independent “unsigned” artist. The licensing agent’s cut can rise substantially when traditional record companies and publishing houses become involved. These companies will also charge for promotion and other complex and intangible “services”, including a higher level of legal copyright protection in the case of the most valuable tracks.

Let’s see how much Spotify pays out through this mechanism and who it goes to ….

It is not possible to find out the total number of streams for all artists listed on Spotify directly. Spotify does, however, publish charts of the top 200 songs each day or week. The data can be scraped directly into R from the chart page. I adapted some functions found at https://rpubs.com/argdata/web_scraping and downloaded data for all global streams, USA streams, UK (GB) streams and Mexican streams.

library(dplyr)      # filter(), group_by(), summarise()
library(lubridate)  # year(), week()

data(spotify)  # Load the scraped chart data
d <- filter(spotify, country == "Global")  # Use global data

d$year <- year(d$Date)
d$week <- week(d$Date)
d <- filter(d, year > 2017)  # Ignore 2017 as data incomplete
pstream <- 0.00437   # Price per stream in dollars
users <- 248 * 10^6  # Monthly active users
d %>%
  group_by(year) %>%
  summarise(streams_billions = round(sum(Streams) / 10^9, 3),
            pay_out_millions = round(sum(Streams * pstream) / 10^6, 3),
            streams_per_user = round(sum(Streams) / users)) -> glbl

dt(glbl)

So, the total payouts to the artists that featured in the top 200 Spotify streaming charts over the 52 weeks of 2019 amounted to a quite surprising 40% of total revenue.

There are only 377 unique artist names featured in the list of those who have appeared for at least a week in the top 200 in 2019. So this 40% is shared amongst this very small number of artists, leaving 60% for all the rest, which includes all the famous artists from past years including the Beatles, The Rolling Stones, Foo Fighters etc etc… There does not look on the face of it as if there is going to be very much left over to pay to all the “million creative artists” cited in Spotify’s mission statement.

The story gets even worse when we look at the distribution of the streams. Even within this tiny Spotify “elite” the spoils are still extremely unevenly distributed.

The code below forms an ordered table with a column for the cumulative percentage of the total. The table can be ordered, searched and opened up to show more than 10 rows at a time. Investigating the results shows that 50% of the chart generated revenue is shared between only 29 artists.

d %>%
  filter(year == 2019) %>%
  group_by(Artist) %>%
  summarise(n = n(), n_tracks = length(unique(Track)),
            revenue = round(sum(Streams * pstream) / 1000000, 4)) %>%  # revenue in millions of dollars
  mutate(rank = rank(-revenue, ties.method = "first")) %>%
  arrange(rank) %>%
  mutate(percent_total = round(100 * revenue / sum(revenue), 1)) %>%
  mutate(cumulative = cumsum(percent_total)) -> rnks

aqm::dt(rnks)
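To show how a figure like “50% shared between 29 artists” can be read off a table of this kind, here is a minimal sketch on made-up numbers. The revenue vector is hypothetical and is not taken from the Spotify data; only the use of cumsum() mirrors the code above.

```r
# Hypothetical revenues (millions of dollars), already sorted from largest to smallest
toy_revenue <- c(40, 30, 15, 10, 5)

# Cumulative percentage of the total, as in the cumulative column of the table
toy_cumulative <- cumsum(100 * toy_revenue / sum(toy_revenue))

# Number of top-ranked artists needed to reach at least half of the total
n_half <- which(toy_cumulative >= 50)[1]
n_half  # 2 for this toy vector
```

Applied to the real table, the equivalent expression would be which(rnks$cumulative >= 50)[1].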

Modelling revenue distribution

The median revenue per top artist is 263 thousand dollars, which is well below the mean of 1.02 million dollars. These sorts of extremely unequal distributions of wealth and revenue are the result of processes based on underlying exponential or power laws. Such processes tend to lead to approximately linear relationships between the rank (or in some cases the logarithm of the rank) and the logarithm of income. Let’s see if this prediction holds by plotting logged income against rank and fitting a regression.

ggplot(rnks,aes(x=rank,y=log10(revenue* 1000000))) + geom_point()  + geom_smooth(method="lm") + ggtitle("Logged total artist revenue against rank") + ylab("Log10 revenue.") + xlab("Rank")

The fitted line shown on the figure corresponds to a linear regression of log10 revenue on rank.

mod<-lm(data=rnks, log10(revenue* 1000000)~rank)
coef(mod)
##  (Intercept)         rank 
##  6.679674544 -0.006468917

\(\log_{10}(\text{revenue}) = 6.68 - 0.0065 \times \text{rank}\)
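One way to read the slope is as a constant percentage drop in predicted revenue from one rank to the next. The sketch below works this out from the slope printed by coef(mod); the variable names are illustrative.

```r
slope <- -0.006468917   # rank coefficient from coef(mod) above

# Predicted revenue at rank r + 1 divided by predicted revenue at rank r
ratio_per_rank <- 10^slope

# Percentage drop in predicted revenue for each step down the ranking
drop_percent <- 100 * (1 - ratio_per_rank)

# Number of ranks over which predicted revenue halves
halving_ranks <- log10(2) / abs(slope)

round(drop_percent, 2)   # roughly 1.5 percent per rank
round(halving_ranks, 1)  # roughly 46.5 ranks
```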

The data points do indeed fall along the fitted line for most of the trajectory. However, the top earning artists still earn substantially more than the model predicts. It is not quite clear why this effect occurs. A model including some curvature would provide a better fit. However, there might not be very much to be gained from working to find a more sophisticated mathematical model, as there appears to be more than one process involved in generating the data. Furthermore, a model that is fitted only to the top 200 streams can’t be easily extrapolated to predict revenue for the artists who do not reach the chart. To see why this is true we will try doing that using the model that provides a reasonable fit to the chart data.

mod<-lm(data=rnks, log10(revenue* 1000000)~rank)
pd<-data.frame(rank=300:1000,revenue=10^predict(mod, newdata=data.frame(rank=300:1000)))
ggplot(pd,aes(x=rank,y=revenue)) + geom_line() + ggtitle("Extrapolated revenue for artists ranked from 300 to 1000")

This model clearly can’t be extrapolated in this manner to predict individual artist revenues outside the top 200, as the result is not reasonable. The predicted revenue drops to near zero after the 500th ranked artist. The top 400 artists do take a very large proportion of the revenue, but they certainly do not take it all! The integrated area under a curve extending to infinity would not add up to the remaining revenue required to make up the total.
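The point about the area under the curve can be made concrete. Because the model is log-linear in rank, predicted revenue falls by a fixed ratio per rank, so the total over all ranks beyond the chart is a geometric series with a closed-form sum. The sketch below uses the coefficients printed by coef(mod) and assumes the extrapolation starts at rank 378, just after the 377 charting artists; the starting rank and the variable names are illustrative choices.

```r
intercept <- 6.679674544   # from coef(mod) in the artist model
slope <- -0.006468917      # from coef(mod) in the artist model
first_rank <- 378          # assumed first rank outside the 377 charting artists

# Predicted revenue (dollars) of the first artist outside the chart
first_term <- 10^(intercept + slope * first_rank)

# Common ratio between consecutive ranks
ratio <- 10^slope

# Sum of the infinite geometric series first_term * (1 + ratio + ratio^2 + ...)
tail_total <- first_term / (1 - ratio)

# Numerical check by summing a long but finite run of ranks
tail_check <- sum(10^(intercept + slope * (first_rank:20000)))

round(tail_total / 10^6, 2)  # total in millions of dollars
```

Under these assumptions the whole infinite tail comes to a little over a million dollars, which is tiny next to the sums discussed above.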

So some other processes are in play here.

Analysis by track

An alternative to summing all streams for an artist is to analyse the revenue per individual track that features in the weekly top 200 charts. This produces a finer grained, more detailed, analysis.

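d %>%
  filter(year == 2019) %>%
  group_by(Track, Artist) %>%
  summarise(streams = round(sum(Streams) / 1000000, 3),              # millions of streams
            revenue = round(sum(Streams * pstream) / 1000000, 4)) %>%  # millions of dollars
  ungroup() %>%
  # ties.method must be named: the second positional argument of rank() is na.last
  mutate(rank = rank(-streams, ties.method = "min")) %>%
  arrange(rank) -> tracks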

dt(tracks)

Modelling the distribution of revenue per track

ggplot(tracks,aes(x=rank,y=log10(revenue* 1000000))) + geom_point()  + geom_smooth(method="lm") + ggtitle("Log to the base 10 revenue against rank for tracks in top 200 weekly charts") + ylab("Log10 revenue.") + xlab("Rank")

Once again the model does not provide a good fit for the highest ranked tracks. In this case there also seems to be a very clear break point at around the track ranked 750, after which the curve flattens out quite markedly. This flatter section of the curve seems to represent a different process taking place. It may be that there is a rather different audience for the less well known tracks. If (and this is a big assumption) the streaming habits of these Spotify users can be extrapolated a little further, we might be able to find the answer to the question regarding the support that Spotify might provide for artists. We can assume that most of the tracks that make it into the top 200 are new on the platform each year, although in 2019 Queen returned with a vengeance as a result of Bohemian Rhapsody.

Fitting a model to the bottom of the curve and extrapolating.

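tracks %>% filter(rank > 760) -> tr2  # Tracks below the break point
# revenue is in millions of dollars, so revenue * 1000 is in thousands of dollars
mod <- lm(data = tr2, log10(revenue * 1000) ~ rank)

pd <- data.frame(rank = 760:5000, revenue = 10^predict(mod, newdata = data.frame(rank = 760:5000)))
# The red line marks the 1000 dollar break-even threshold
ggplot(pd, aes(x = rank, y = revenue)) + geom_line() + ggtitle("Extrapolated revenue for tracks ranked from 760 to 5000") + ylab("Revenue (thousands of dollars)") + geom_point(data = tr2, aes(x = rank, y = revenue * 1000)) + geom_hline(yintercept = 1, col = "red")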

Now if the cut off point for a break even track is assumed to be around 1000 dollars the model suggests that only around 2000 tracks reach this revenue in a single year based on streams.
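The break-even rank can be read off the plot, or solved for directly by setting the fitted line equal to the threshold. The helper below is a sketch: break_even_rank() is a hypothetical name, and the demonstration data are simulated with invented coefficients purely to exercise the function. They are not taken from the Spotify charts.

```r
# Rank at which a log10-linear model of revenue (in thousands of dollars)
# crosses a given threshold, found by solving
#   log10(threshold) = intercept + slope * rank
break_even_rank <- function(model, threshold = 1) {
  cf <- unname(coef(model))
  (log10(threshold) - cf[1]) / cf[2]
}

# Simulated example with invented coefficients (NOT the fitted Spotify values)
sim <- data.frame(rank = 760:1500)
sim$revenue <- 10^(1.5 - 0.001 * sim$rank)   # revenue in thousands of dollars
sim_mod <- lm(log10(revenue) ~ rank, data = sim)

# For this invented line the crossing is at rank 1500
break_even_rank(sim_mod, threshold = 1)
```

With the model fitted to the bottom of the curve, the call would be break_even_rank(mod, threshold = 1), since that model is fitted to revenue in thousands of dollars.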

There are some very strong caveats associated with this assumption.

  1. The analysis assumes that tracks that make it into the weekly top 200 are representative of the music uploaded to Spotify. This is very far from the truth, as only a small fraction of the music falls into the popular genres that are required in order to get enough streams to qualify.
  2. The model assumes that the rank vs revenue relationship found at the base of the chart data continues without flattening off. This is unlikely, partly for the reason stated in the first caveat. Different genres will have different patterns.

However, the model is not far from the truth. Although there is no direct way of scraping streams for independent artists straight into R, I chose a fairly random subset of songs released in 2019 by independent bands. Red Rum Club’s catchy song “Would You Rather Be Lonely” got quite a lot of Radio 6 airplay and could well be considered to be in the top thousand or so of UK tracks from 2019. It still fell well beneath the cut off line. The top live band of the year, Idles, only just made enough from their best song on Spotify as a four piece to pay the rent. The streams for Blossoms and the Magic Gang have accumulated over 3 to 4 years.

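d <- read.csv("indy.csv")
# Streams is recorded in millions, so pstream * Streams * 1000 gives thousands of dollars
d$revenue_thousands <- round(pstream * d$Streams * 1000, 2)
names(d)[3] <- "Streams_millions"  # Relabel the streams column with its units
dt(d)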

Classic artists from the past

So, assuming limited blatant cheating, where has the rest of the revenue gone? The answer must be that Spotify is acting as a glorified version of “Now That’s What I Call Music”. Compilation albums of greatest hits have consistently outsold most other forms of music. Spotify now plays that role. This is particularly obvious at Christmas time. During December 2019, 25% of streams in the top 200 were for tracks with the word Christmas in the title.

Wikipedia provides a table of the number of streams of the most popular song for each year in which it was released. Songs by classic artists will all be streamed several million times per year, far outweighing the numbers for new releases by

library(dplyr)
library(tidyr)
library(ggplot2)
d <- read.csv("chart.csv")
d$month <- as.character(d$month)
# Split the combined month field into separate Month and Year columns
d %>% tidyr::separate(month, into = c("Month", "Year")) -> dd
dd$Year <- as.numeric(dd$Year)

# Diversity: unique songs as a percentage of the chart slots available in each year
dd %>% group_by(Year) %>% summarise(slots = n(), nsongs = length(unique(song)), diversity = round(100 * nsongs / slots, 1)) -> d1
ggplot(d1, aes(x = Year, y = diversity)) + geom_point() + geom_line() + scale_x_continuous(breaks = 2000:2019) + theme(axis.text.x = element_text(angle = 45, hjust = 1))

Appendix

On January 6 the independent band Wave Chase will release an EP consisting of six original songs onto Spotify. The songs are the result of at least twelve months’ work, which includes the creative process of drafting lyrics and melodies and working on guitar riffs and song structures. The actual recording took place over the space of a week. Mixing, remixing and mastering have been ongoing since the recording session. There are four members of the band, and two sound engineers were involved in the recording. So a conservative estimate of the amount of work involved would be at least one thousand person hours. All the band members have music qualifications to at least A level and all aim to make a living through music at some point. The band play in the smaller local live music venues to audiences of around 50 to 100 people.

Million selling singles

d <- read.csv("million_sellers.csv")
# Strip bracketed footnote markers such as [12] from the release dates
d$Date.released <- gsub("\\[[^\\]]*\\]", "", d$Date.released, perl = TRUE)
dt(d)