#devtools::install_github("dgolicher/spotify") ## Uncomment to install data package
library(lubridate)
library(tidyr)
library(ggplot2)
library(dplyr)
library(spotify)

Introduction

Spotify’s mission statement reads: “Our mission is to unlock the potential of human creativity – by giving a million creative artists the opportunity to live off their art and billions of fans the opportunity to enjoy and be inspired by it.”

How credible is this mission statement? I used R to evaluate how Spotify’s revenue-generating process actually works.

Historical revenues from recorded music

The total revenue from all global music sales has been falling steadily since its high point in the late 1990s. Data from https://chart2000.com shows a fall from 35 billion dollars in 2000 to just 15 billion dollars in 2014. There may have been a slight upturn since then as a result of more effective mechanisms being found to gain revenue from on-line streaming.

data(revenue)
d<-revenue
# Sum all rows for each year, then plot in billions of US$ (Value/1000), labelled to one decimal place
d %>% group_by(Year) %>% summarise(Value=sum(Value)) %>% 
ggplot(aes(x=Year,y=Value/1000,label=round(Value/1000,1))) + geom_line()+ geom_label() + scale_x_continuous(breaks=2000:2018)  + theme(axis.text.x=element_text(angle=45, hjust=1)) + ggtitle("Total global revenue from music sales (US$ Billions)") + ylab("Revenue Billions US$")

A naive interpretation of the 15 billion figure in 2014 could be that 15 billion divided by a million (the number of artists Spotify aim to support) would come to 15 thousand dollars. So, if each artist were provided with this level of income from recordings then they would just be able to support themselves for a year providing that their additional income from live performances is healthy. A much less naive and more realistic interpretation would be to consider the way that any revenue is distributed. It is far from equitable.
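The naive calculation is a single division. A minimal sketch, using variable names of my own choosing and the figures quoted above:

```r
# Naive equal split of 2014 global revenue between a million artists
global_revenue_2014 <- 15e9   # US$, figure quoted from the chart above
target_artists <- 1e6         # "a million creative artists"

per_artist <- global_revenue_2014 / target_artists
per_artist                    # dollars per artist per year under an equal split
```

The rest of the analysis is about how far the real distribution departs from this equal split.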

Spotify key statistics

I googled Spotify, revenue and profitability to find the following recent online sources.

https://www.rollingstone.com/music/music-features/spotify-profitable-how-happen-910456/
https://qz.com/1736762/spotify-grows-monthly-active-users-and-turns-profit-shares-jump-15-percent/
https://www.digitalmusicnews.com/2018/12/25/streaming-music-services-pay-2019/
https://www.businessofapps.com/data/spotify-statistics/

If these sources are reliable, then it is fair to state that in October 2019 Spotify reported a gross profit of $490 million on $1.92 billion in revenue. At the time it had 248 million monthly active users and 113 million subscribers. According to Rolling Stone, around 25% of Spotify’s revenue is spent on marketing, maintenance of the platform and assorted legal fees.

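turnover<-1920  ## Revenue, millions of US$
profit<-490  ## Gross profit, millions of US$
expenses <- turnover*0.25  ## Marketing, platform maintenance and legal fees
profit_margin<-round(100*profit/turnover,1)
remainder <- round(turnover - profit - expenses, 2)  ## What is left for rights holders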

This results in a profit margin of 25.5% of total income from streaming. The stats imply that the total amount returned to artists, passed through promoters and licensing companies, is around 950 million dollars (roughly 1 billion). Given that this revenue is shared between all the artists on a totally global platform, including artists living and dead, new and long established, it is actually a remarkably small amount of money. The sum can be placed in the context of historical revenue from music sales, which peaked at around 35 times the current Spotify returns to artists. Spotify is of course not the only provider of streaming music, but it is the largest. Total revenues from streaming including iTunes, Amazon, Deezer, YouTube (music only) etc. are at best around three to five times the Spotify total. Artists publishing their music as streams are competing for about a third of the total sales pool that was potentially available through the traditional recorded music market. However, the reduction in overheads partially compensates for this.

Spotify streaming chart numbers

Digital Music News reports that Spotify currently pay 0.00437 dollars per stream to the copyright holders.

To put this figure into perspective, even a completely independent track produced, mixed and mastered to a fully professional standard in a home studio would cost at least one thousand dollars to put together if the production expenses are fairly added up. One thousand Spotify streams are worth around 4.37 dollars to the artist, so it takes 229 thousand streams to cover the basic cost of a track, not taking into account all the hours spent on the underlying creative process.
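The break-even figure follows directly from the per-stream rate. A minimal sketch, where the one thousand dollar production cost is the assumption stated above rather than a measured value:

```r
pstream <- 0.00437        # dollars per stream, as reported by Digital Music News
track_cost <- 1000        # assumed minimum production cost in dollars

per_thousand_streams <- 1000 * pstream               # value of one thousand streams
breakeven_streams <- ceiling(track_cost / pstream)   # streams needed to cover the cost

per_thousand_streams
breakeven_streams
```

Changing `track_cost` shows how quickly the target grows: every extra thousand dollars of cost needs roughly another 229 thousand streams.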

Another perspective is to consider that, according to Wikipedia, under the traditional retail model only 176 physical singles have ever exceeded a million sales in the UK. Candle in the Wind 1997 by Elton John reached almost 5 million. Contemporary charts assume that 100 Spotify streams are equivalent to one sale. This is worth 44 cents to the artist.
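The chart equivalence can be turned into dollars in a couple of lines. This is only a sketch: the 100 streams per sale ratio is the chart convention quoted above, and the million-seller comparison is my own illustration.

```r
pstream <- 0.00437          # dollars per stream
streams_per_sale <- 100     # chart convention: 100 streams count as one sale

value_per_sale <- streams_per_sale * pstream   # dollars earned per "sale equivalent"
million_seller_value <- 1e6 * value_per_sale   # a streaming "million seller"

value_per_sale
million_seller_value
```

So a track that matched a million-selling single on the chart definition would bring in well under half a million dollars before any intermediaries take their cut.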

When an artist publishes their music on Spotify, the collection of the money earned through streams is passed on to an intermediate company in return for a fee. The fee is only around five dollars per track per year in the case of a completely independent “unsigned” artist. The licensing agent’s cut can however rise substantially when traditional record companies and publishing houses become involved. These companies will also charge for promotion and other complex and intangible “services”, including a higher level of legal copyright protection in the case of the most valuable tracks.

Let’s see how much Spotify pays out through this mechanism and who it goes to.

It is not possible to find out the total number of streams for all artists listed on Spotify directly. Spotify do publish publicly available charts of the top 200 songs each day or week. The data can be scraped directly into R from the chart page. I adapted some functions found here, https://rpubs.com/argdata/web_scraping, and downloaded data for all global streams, USA streams, UK (GB) streams and Mexican streams.

data(spotify)
d<-filter(spotify, country=="Global")  # Use global data

d$year<-year(d$Date)
d$week<-week(d$Date)
d<-filter(d,year>2017)  # Ignore 2017 as data incomplete
pstream<- 0.00437  ## Price per stream
users<- 248 * 10^6 ## total users
d %>% group_by(year) %>% summarise(streams_billions=round(sum(Streams)/10^9,3), pay_out_millions=round(sum(Streams*pstream)/10^6,3), streams_per_user=round(sum(Streams)/users)) -> glbl

dt(glbl)

So (assuming that the figure for revenue per stream is accurate and applies universally), the total payouts to the artists that featured in the top 200 Spotify streaming charts over the 52 weeks of 2019 amounted to a very surprising 40% of total revenue.
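The percentage comes from dividing the chart payout by the turnover figure. A minimal sketch of that step in base R, run on a made-up toy table (the stream counts below are invented purely for illustration and are not the chart data):

```r
pstream <- 0.00437     # dollars per stream
turnover <- 1920       # revenue in millions of dollars, as used earlier

# Made-up weekly totals standing in for the scraped chart data
toy <- data.frame(year = c(2018, 2018, 2019, 2019),
                  Streams = c(2e9, 3e9, 4e9, 6e9))

payout <- aggregate(Streams ~ year, data = toy, FUN = sum)
payout$pay_out_millions <- payout$Streams * pstream / 1e6
payout$percent_of_revenue <- round(100 * payout$pay_out_millions / turnover, 1)
payout
```

Applied to the real yearly summary, the same division of payout by turnover gives the share quoted above.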

There are only 377 unique artist names in the list of those who appeared for at least a week in the top 200 in 2019. So this 40% is shared amongst a very small number of artists, leaving 60% for all the rest. The rest includes all the famous artists from past years, including the Beatles, the Rolling Stones, Foo Fighters etc. On the face of it, there does not look as if there is going to be very much left over to pay all the “million creative artists” cited in Spotify’s mission statement.

The picture looks even worse when we look at the distribution of the streams. Even within this tiny, privileged Spotify “elite” the spoils are still extremely unevenly distributed.

The code below forms an ordered table with a column for the cumulative percentage of the total. The table can be ordered, searched and opened up to show more than 10 rows at a time. Investigating the results shows that 50% of the chart-generated revenue is shared between just 29 artists! Since the chart accounts for around 40% of total revenue, this means that 20% of the total revenue generated from Spotify streams globally goes to just 29 named artists! This is astonishing.
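Counting how many artists it takes to reach half of the chart revenue is a matter of finding the first row where the cumulative percentage passes 50. A minimal sketch on a made-up five-artist table (the names and revenues are invented for illustration):

```r
# Made-up revenues standing in for the per-artist table
toy <- data.frame(Artist = c("A", "B", "C", "D", "E"),
                  revenue = c(40, 25, 15, 12, 8))

toy <- toy[order(-toy$revenue), ]                     # highest earner first
toy$percent_total <- 100 * toy$revenue / sum(toy$revenue)
toy$cumulative <- cumsum(toy$percent_total)

n_half <- which(toy$cumulative >= 50)[1]   # artists needed to reach 50% of chart revenue

chart_share <- 0.40                         # chart payouts as a share of total revenue
half_share_of_total <- 0.5 * chart_share    # share of total revenue held by those artists

n_half
half_share_of_total
```

This sketch accumulates the unrounded percentages and would only round for display. Rounding each percentage before calling `cumsum` lets small rounding errors build up over several hundred rows.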

d %>% filter(year==2019) %>% group_by(Artist) %>% summarise(n=n(),n_tracks=length(unique(Track)),revenue=round(sum(Streams*pstream)/1000000,4)) %>% mutate(rank=rank(-revenue,ties.method = c("first"))) %>% arrange(rank) %>% mutate(percent_total= round(100*revenue/ sum(revenue),1)) %>%  mutate(cumulative= cumsum(percent_total)) ->rnks

dt(rnks)

Modelling revenue distribution

The median revenue per top artist is 263 thousand dollars, which is well below the mean of 1.02 million dollars. These sorts of extremely unequal distributions of wealth and revenue are the result of processes based on underlying exponential or power laws. Such processes tend to lead to approximately linear relationships between the rank (or in some cases the logarithm of the rank) and the logarithm of income. Let’s see if this prediction holds by plotting logged income against rank and fitting a regression.

ggplot(rnks,aes(x=rank,y=log10(revenue* 1000000))) + geom_point()  + geom_smooth(method="lm") + ggtitle("Logged total artist revenue against rank") + ylab("Log10 revenue.") + xlab("Rank")

The fitted line shown on the figure corresponds to a linear regression of log10 revenue on rank.

mod<-lm(data=rnks, log10(revenue* 1000000)~rank)
coef(mod)
##  (Intercept)         rank 
##  6.679674544 -0.006468917

\(\log_{10}(\text{revenue}) = 6.68 - 0.0065 \times \text{rank}\)

The data points do indeed fall along the fitted line for most of the trajectory. However, the top earning artists still earn substantially more than predicted from the model. It is not entirely clear why this effect occurs. A model including curvature would provide a better fit. However, there might not be very much to be gained from working hard to find a more sophisticated mathematical model, as there appears to be more than one process involved in generating the data. The top few artists receive over ten times more than predicted from the linear model, which adds to the inequality found in the distribution.
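Because the model is fitted on the log10 scale, the ratio of actual to predicted revenue is simply ten raised to the power of the residual. A minimal sketch on synthetic data, in which I have deliberately inflated the top three ranks tenfold to mimic the effect (none of these numbers come from the chart data):

```r
# Synthetic log-linear revenues with the top three artists inflated tenfold
rank <- 1:300
revenue <- 10^(6.68 - 0.0065 * rank)
revenue[1:3] <- revenue[1:3] * 10
toy <- data.frame(rank, revenue)

mod_toy <- lm(log10(revenue) ~ rank, data = toy)

# A residual of 1 on the log10 scale means ten times the predicted revenue
toy$ratio <- 10^residuals(mod_toy)
top <- head(toy[order(-toy$ratio), ], 3)
top
```

Sorting the real table by this ratio would list the artists who most exceed the fitted line.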

A model that fits to only the artists with tracks that have found their way into the top 200 chart can’t necessarily be extrapolated to predict revenue for the artists who do not reach the chart at all. However it might be interesting to try this.

mod<-lm(data=rnks, log10(revenue* 1000000)~rank)
pd<-data.frame(rank=300:1000,revenue=10^predict(mod, newdata=data.frame(rank=300:1000)))
ggplot(pd,aes(x=rank,y=revenue)) + geom_line() + ggtitle("Extrapolated revenue for artists ranked from 300 to 1000")

The predicted revenue drops to near zero after the 500th ranked artist. The top 400 artists do take a very large proportion of the revenue, but they certainly do not take it all, so there must be more going on than this. The integrated area under a curve extending to infinity would not add up to the remaining revenue required to make up the total.
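The claim about the area under the curve can be checked with a geometric series. If revenue at rank r is 10^(a + b r) with b < 0, then the total for all ranks from r0 upwards is 10^(a + b r0) / (1 - 10^b). A sketch using the coefficients printed above, assuming the first artist outside the chart would sit at rank 378:

```r
a <- 6.679674544       # intercept from coef(mod)
b <- -0.006468917      # slope from coef(mod)
first_rank <- 378      # assumption: first rank after the 377 chart artists

# Closed-form sum of the geometric series from first_rank to infinity
tail_total <- 10^(a + b * first_rank) / (1 - 10^b)

# Numerical check: the terms are negligible long before rank 20000
approx_total <- sum(10^(a + b * (first_rank:20000)))

tail_total
approx_total
```

Under this model every artist beyond the chart would share a total in the region of a million dollars between them, which is tiny next to the hundreds of millions that do not go to chart entries.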

So, other processes are also in play here.

Analysis by track

An alternative to summing all streams for an artist is to analyse the revenue per individual track that features in the weekly top 200 charts. This produces a finer grained, more detailed, analysis.

d %>% filter(year==2019)%>% group_by(Track, Artist) %>% summarise(streams=round(sum(Streams)/1000000,3), revenue=round(sum(Streams*pstream)/1000000,4)) %>% ungroup() %>%  mutate(rank=rank(-streams, ties.method="min")) %>% arrange(rank) -> tracks  # ties.method must be named, otherwise "min" is matched to na.last

dt(tracks)

Modelling the distribution of revenue per track

ggplot(tracks,aes(x=rank,y=log10(revenue* 1000000))) + geom_point()  + geom_smooth(method="lm") + ggtitle("Log to the base 10 revenue against rank for tracks in top 200 weekly charts") + ylab("Log10 revenue.") + xlab("Rank")

Once again the model does not provide a good fit for the highest ranked tracks. However, in this case there also seems to be an interesting break point at around the track ranked 760. The curve flattens out quite markedly after this. This flatter section of the curve seems to represent an alternative process taking place. It may be that there is a rather different audience for the less well known tracks. If (and this is a big assumption) the streaming habits of these Spotify users can be extrapolated a little further, we might be able to find the answer to the question regarding the support that Spotify might provide for artists.

Fitting a model to the tail of the curve and extrapolating.

tracks %>% filter(rank> 760) -> tr2
mod<-lm(data=tr2, log10(revenue*1000)~rank)  # revenue*1000 converts millions to thousands of dollars

pd<-data.frame(rank=760:5000,revenue=10^predict(mod, newdata=data.frame(rank=760:5000)))
# The red line at 1 (thousand dollars) marks the assumed break-even point
ggplot(pd,aes(x=rank,y=revenue)) + geom_line() + ggtitle("Extrapolated revenue for tracks ranked from 760 to 5000") + geom_point(data=tr2, aes(x=rank,y=revenue*1000)) + geom_hline(yintercept = 1,col="red") + ylab("Revenue (thousands of US$)")

Now if the cut off point for a break even track is assumed to be around 1000 dollars the model suggests that only around 2000 tracks reach this revenue in a single year based on streams.
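The break-even rank can also be read off the fitted coefficients rather than the plot: the line crosses one thousand dollars where log10 revenue (in thousands) equals zero, i.e. at rank = -intercept / slope. A minimal sketch on a made-up tail (the relationship below is invented so that the answer is easy to verify, it is not the fitted model):

```r
# Made-up tail: revenue in thousands of dollars falling log-linearly with rank
rank <- 761:1500
revenue_thousands <- 10^(3 - 0.002 * rank)
toy <- data.frame(rank, revenue_thousands)

mod_toy <- lm(log10(revenue_thousands) ~ rank, data = toy)
cf <- coef(mod_toy)

# Solve intercept + slope * rank = log10(1) for rank
breakeven_rank <- (log10(1) - cf[["(Intercept)"]]) / cf[["rank"]]
breakeven_rank
```

With the real tail model, the same expression gives the rank beyond which a track earns less than the assumed production cost.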

There are some strong caveats associated with this analysis.

  1. The analysis assumes that tracks that make it into the weekly top 200 are representative of the music uploaded to Spotify. This is very far from the truth, as only a small fraction of the music falls into the popular genres that are required in order to get enough streams to qualify.
  2. The model assumes that the rank vs revenue relationship found at the base of the chart data continues without flattening off. This is unlikely, partly for the reason stated in the first caveat. Different genres will have different patterns.
  3. The model assumes that all the tracks that make it into the top 200 charts are new and freshly released. This is not true, especially around Christmas when favourites come back into the chart.

The model probably underestimates the number of streams available for the lower ranked tracks. However it clearly does fit the available data. The pattern does represent the extreme inequality involved, even if the reality might be slightly better for independent artists once all the caveats are taken into account.

There is no direct way of scraping streams for independent artists straight into R. In order to test the model I chose a fairly random subset of songs released in 2019 by relatively independent bands. Red Rum Club’s catchy song “Would You Rather Be Lonely” got quite a lot of Radio 6 airplay, so could well be considered to be in the top thousand or so UK tracks from 2019. Lauren Hibberd was chosen as a representative new local artist who has received support from promoters. Both fell beneath the cut-off line of making one thousand dollars. The top live band of the year, Idles, only just made enough from their best song on Spotify as a four piece to pay the rent. The streams for Blossoms and the Magic Gang have accumulated over the last 3 to 4 years, and these bands have both been quite heavily promoted by Radio 1.
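Converting a stream count into revenue, and into a share per band member, is simple arithmetic. A sketch with hypothetical tracks: the stream counts and the `members` column are my own inventions for illustration, while the `Streams_millions` and `revenue_thousands` names follow the table used below.

```r
pstream <- 0.00437   # dollars per stream

# Hypothetical tracks, not real figures
toy <- data.frame(Track = c("Track A", "Track B", "Track C"),
                  Streams_millions = c(0.05, 0.2, 3),
                  members = c(1, 4, 4))

toy$revenue_thousands <- toy$Streams_millions * 1e6 * pstream / 1000
toy$breaks_even <- toy$revenue_thousands >= 1          # covers a 1000 dollar track?
toy$dollars_per_member <- 1000 * toy$revenue_thousands / toy$members
toy
```

Even the three million stream track in this toy table leaves each member of a four piece with only a few thousand dollars.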

data(indy)
d<-indy
d$rank<-rank(-d$Streams_millions)
dt(d)

The same underlying distribution pattern is shown in this rather esoteric and not necessarily representative sample.

ggplot(d,aes(x=rank,y=revenue_thousands)) + geom_point() + geom_smooth(se=FALSE)

So, at best Spotify offers new artists the chance for their music to be heard. It does not support new artists directly by providing a revenue stream.

Classic artists from the past

So, where has the rest of the revenue gone? The answer must be in part that Spotify is acting as a glorified version of “Now That’s What I Call Music”. Compilation albums of greatest hits have consistently outsold most other forms of music. Spotify now plays that role. This is particularly obvious at Christmas time. During December 2019, 25% of the streams in the top 200 came from tracks that included the word Christmas.
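A figure like this can be obtained by matching the word in the track names and weighting by streams. A minimal sketch on a made-up chart extract (the stream counts are invented, in millions), showing both the share of streams and the share of chart entries since the two can differ:

```r
# Made-up December chart extract; Streams in millions
toy <- data.frame(Track = c("All I Want for Christmas Is You", "Last Christmas",
                            "Song X", "Song Y"),
                  Streams = c(40, 20, 100, 40))

is_xmas <- grepl("christmas", toy$Track, ignore.case = TRUE)

pct_of_streams <- 100 * sum(toy$Streams[is_xmas]) / sum(toy$Streams)
pct_of_entries <- 100 * mean(is_xmas)

pct_of_streams
pct_of_entries
```

Matching on a single word will miss seasonal songs without it in the title, so a figure obtained this way is best read as a lower bound.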

Wikipedia provides a table of the number of streams of the most popular song for each year in which it was released. These top streaming songs alone have been streamed a total of 38.5 billion times. The total number of streams for any of the prolific popular artists with a large catalogue of popular songs, such as the Beatles or the Rolling Stones, falls in the tens of billions. The integrated result of this large number of streams over a large number of past artists far outweighs the number of streams for new releases by new artists. This explains where a large portion of the remaining 60% of revenue that falls to non-chart entries actually goes. It clearly does not go to the one million artists in Spotify’s mission statement.

dt(streams)

Conclusion

Record companies have always had a bad reputation. It is a cliche for bands to be ripped off when signing contracts. However, the traditional model did at least have some element of redistribution built into it. Record companies competed amongst each other for new acts. They were often prepared to take risks by signing bands at an early stage of their careers. The distribution of sales of physical records was much less heavily skewed than the streaming stats. So while Spotify has opened access to an audience for millions of artists, at the same time the model used to distribute revenue has led to a dramatic reduction in access to any meaningful revenue from recorded music.

Appendix

Million selling singles

For reference here is a table of the singles that have sold over one million physical copies in the UK.

data(million_sellers)
d<-million_sellers
dt(d)