Background

While looking at the statistics on mortality from Covid-19 in the United States I was struck by a rather disturbing observation. My first reaction was to attribute it to a mistake made either in the data or in the way I was aggregating the data. It appeared that the baseline per capita annual mortality differed between US states by around a factor of two. In other words, in order to estimate the excess deaths due to Covid the differences between states in te baseline number for each state that needed to be substracted from Covid deaths was far higher than differences between states in the number of deaths attributable to Covid.

To check this I looked for some independent data. I found the US census page here that had baseline data fro the years between 2011 and 2019.

https://www.census.gov/data/tables/time-series/demo/popest/2010s-state-total.html

The results of a very quick analysis does confirm a very large disparity between US states. See maps below.

It is particularly interesting to see the differences between per capita births and per capita deaths accross the USA. When a population is stable its health service will generally attend to equal numbers of births and deaths. However in some US states the numbers of deaths far outweigh the numbers of births. In others the reverse applies. Some states, such as Florida, see high rates of immigration. Others such as New York have negative rates of migration. There is a movement of the elderly population towards states such as Florida and away from the East coast. High mortality in other states is associated with a static ageing population.

Any consideration of the effectiveness of NPI measures to reduce overall mortality in the United States would need to factor in these elements.

This requires further work …

dt<-function(d) {DT::datatable(d, 
                              filter = "top",                         
                              extensions = c('Buttons'), options = list(
                                dom = 'Blfrtip',
                                buttons = c('copy', 'csv', 'excel'), colReorder = TRUE
                              ))}


# Some data wrangling to reshape the file


data("census")
census %>% mutate(RDeaths=Deaths/(Population/10000)) ->census
census %>% group_by(State) %>% summarise(mean_deaths_per_10k=round(mean(RDeaths),1)) %>% arrange(-mean_deaths_per_10k) -> rdeaths
census %>% mutate(RBirths=Births/(Population/10000)) ->census
census %>% group_by(State) %>% summarise(mean_births_per_10k=round(mean(RBirths),1)) %>% arrange(-mean_births_per_10k) -> rbirths

census %>% mutate(RMigration=Net_Migration/(Population/10000)) ->census
census %>% group_by(State) %>% summarise(mean_migration_per_10k=round(mean(RMigration),1)) %>% arrange(-mean_migration_per_10k) -> rmig

covid_deaths %>% mutate(RCovid_deaths=covid_deaths/(Population/10000)) ->rcovid

colors <- brewer.pal(6, "RdYlGn")[6:1]

Maps

Mean deaths per 10 thousand 2011 to 2019

rdeaths_geo<-merge(us,rdeaths)

mapview(rdeaths_geo, zcol="mean_deaths_per_10k", col.regions = colors, layer.name = "Deaths per 10K",
             at = round(quantile(rdeaths$mean_deaths_per_10k,c(0,0.2,0.4,0.6,0.8,1)))) 
# https://mgimond.github.io/simple_moransI_example/
s<-as(rdeaths_geo,"Spatial")
nb <- poly2nb(s, queen=TRUE)
lw <- nb2listw(nb, style="W", zero.policy=TRUE)
moran.test(s$mean_deaths_per_10k,lw, alternative="greater")
## 
##  Moran I test under randomisation
## 
## data:  s$mean_deaths_per_10k  
## weights: lw    
## 
## Moran I statistic standard deviate = 3.0153, p-value = 0.001283
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.270922318      -0.020833333       0.009361919

Covid deaths in 2020 per 10 thousand

covid_geo<-merge(us,rcovid)

mapview(covid_geo, zcol="RCovid_deaths", col.regions = colors, layer.name = "Deaths per 10K",
             at = round(quantile(rcovid$RCovid_deaths,c(0,0.2,0.4,0.6,0.8,1)))) 
s<-as(covid_geo,"Spatial")
nb <- poly2nb(s, queen=TRUE)
lw <- nb2listw(nb, style="W", zero.policy=TRUE)
moran.test(s$covid_deaths,lw, alternative="greater")
## 
##  Moran I test under randomisation
## 
## data:  s$covid_deaths  
## weights: lw    
## 
## Moran I statistic standard deviate = 1.6629, p-value = 0.04817
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.136877353      -0.020833333       0.008995178

Covid deaths in 2020 as a percentage of 2011 to 2020 normalised baseline

comb<-merge(rcovid,rdeaths)
comb %>% mutate(percent_baseline=100*RCovid_deaths/mean_deaths_per_10k) -> comb
comb_geo<-merge(us,comb)

mapview(comb_geo, zcol="percent_baseline", col.regions = colors, layer.name = "Covid deaths as percent of baseline",
             at = round(quantile(comb$percent_baseline,c(0,0.2,0.4,0.6,0.8,1)))) 
comb<-merge(comb,measures)
comb %>% ggplot(aes(x=stringency,y=percent_baseline,label=State)) +geom_point() + geom_smooth(method="lm") +geom_text()

s<-as(comb_geo,"Spatial")
nb <- poly2nb(s, queen=TRUE)
lw <- nb2listw(nb, style="W", zero.policy=TRUE)
moran.test(s$percent_baseline,lw, alternative="greater")
## 
##  Moran I test under randomisation
## 
## data:  s$percent_baseline  
## weights: lw    
## 
## Moran I statistic standard deviate = 2.7804, p-value = 0.002715
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.249266701      -0.020833333       0.009437006
s$inc.lag <- lag.listw(lw, s$percent_baseline)

s@data %>% ggplot(aes(x=inc.lag, y=percent_baseline, label=State)) + geom_point() + geom_text(size=2, check_overlap = TRUE, nudge_y=0.5) +xlab("Mean percent baseline mortality in neighbouring states") +geom_smooth(method="lm") + ylab("Percent baseline mortality")

dd<-s@data
dd<-merge(dd,measures)

mod<-lm(data=dd,percent_baseline~inc.lag+stringency+schools)
drop1(mod, test="F")
## Single term deletions
## 
## Model:
## percent_baseline ~ inc.lag + stringency + schools
##            Df Sum of Sq    RSS    AIC F value    Pr(>F)    
## <none>                  1030.7 157.26                      
## inc.lag     1   311.775 1342.5 168.21 13.6122 0.0006038 ***
## stringency  1    16.258 1046.9 156.03  0.7098 0.4039628    
## schools     1     0.495 1031.2 155.28  0.0216 0.8837809    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Mean births per 10 thousand 2011 to 2019

rbirths_geo<-merge(us,rbirths)

mapview(rbirths_geo, zcol="mean_births_per_10k", col.regions = colors, layer.name = "Births per 10K",
             at = round(quantile(rbirths$mean_births_per_10k,c(0,0.2,0.4,0.6,0.8,1)))) 

Mean migration per 10 thousand 2011 to 2019

rmig_geo<-merge(us,rmig)

mapview(rmig_geo, zcol="mean_migration_per_10k", col.regions = colors, layer.name = "Migration per 10K",
             at = quantile(rmig$mean_migration_per_10k,c(0,0.2,0.4,0.6,0.8,1))) 

Data

A searchable data table shows the results of aggregation.

Mean deaths per 10 thousand 2011 to 2019

dt(rdeaths)

Mean births per 10 thousand 2011 to 2019

dt(rbirths)

Mean net migration per 10 thousand 2011 to 2019

dt(rmig)