While looking at the statistics on mortality from Covid-19 in the United States I was struck by a rather disturbing observation. My first reaction was to attribute it to a mistake made either in the data or in the way I was aggregating the data. It appeared that the baseline per capita annual mortality differed between US states by around a factor of two. In other words, in order to estimate the excess deaths due to Covid the differences between states in te baseline number for each state that needed to be substracted from Covid deaths was far higher than differences between states in the number of deaths attributable to Covid.
To check this I looked for some independent data. I found the US census page here that had baseline data fro the years between 2011 and 2019.
https://www.census.gov/data/tables/time-series/demo/popest/2010s-state-total.html
The results of a very quick analysis does confirm a very large disparity between US states. See maps below.
It is particularly interesting to see the differences between per capita births and per capita deaths accross the USA. When a population is stable its health service will generally attend to equal numbers of births and deaths. However in some US states the numbers of deaths far outweigh the numbers of births. In others the reverse applies. Some states, such as Florida, see high rates of immigration. Others such as New York have negative rates of migration. There is a movement of the elderly population towards states such as Florida and away from the East coast. High mortality in other states is associated with a static ageing population.
Any consideration of the effectiveness of NPI measures to reduce overall mortality in the United States would need to factor in these elements.
This requires further work …
dt<-function(d) {DT::datatable(d,
filter = "top",
extensions = c('Buttons'), options = list(
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel'), colReorder = TRUE
))}
# Some data wrangling to reshape the file
data("census")
census %>% mutate(RDeaths=Deaths/(Population/10000)) ->census
census %>% group_by(State) %>% summarise(mean_deaths_per_10k=round(mean(RDeaths),1)) %>% arrange(-mean_deaths_per_10k) -> rdeaths
census %>% mutate(RBirths=Births/(Population/10000)) ->census
census %>% group_by(State) %>% summarise(mean_births_per_10k=round(mean(RBirths),1)) %>% arrange(-mean_births_per_10k) -> rbirths
census %>% mutate(RMigration=Net_Migration/(Population/10000)) ->census
census %>% group_by(State) %>% summarise(mean_migration_per_10k=round(mean(RMigration),1)) %>% arrange(-mean_migration_per_10k) -> rmig
covid_deaths %>% mutate(RCovid_deaths=covid_deaths/(Population/10000)) ->rcovid
colors <- brewer.pal(6, "RdYlGn")[6:1]
rdeaths_geo<-merge(us,rdeaths)
mapview(rdeaths_geo, zcol="mean_deaths_per_10k", col.regions = colors, layer.name = "Deaths per 10K",
at = round(quantile(rdeaths$mean_deaths_per_10k,c(0,0.2,0.4,0.6,0.8,1))))
# https://mgimond.github.io/simple_moransI_example/
s<-as(rdeaths_geo,"Spatial")
nb <- poly2nb(s, queen=TRUE)
lw <- nb2listw(nb, style="W", zero.policy=TRUE)
moran.test(s$mean_deaths_per_10k,lw, alternative="greater")
##
## Moran I test under randomisation
##
## data: s$mean_deaths_per_10k
## weights: lw
##
## Moran I statistic standard deviate = 3.0153, p-value = 0.001283
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.270922318 -0.020833333 0.009361919
covid_geo<-merge(us,rcovid)
mapview(covid_geo, zcol="RCovid_deaths", col.regions = colors, layer.name = "Deaths per 10K",
at = round(quantile(rcovid$RCovid_deaths,c(0,0.2,0.4,0.6,0.8,1))))
s<-as(covid_geo,"Spatial")
nb <- poly2nb(s, queen=TRUE)
lw <- nb2listw(nb, style="W", zero.policy=TRUE)
moran.test(s$covid_deaths,lw, alternative="greater")
##
## Moran I test under randomisation
##
## data: s$covid_deaths
## weights: lw
##
## Moran I statistic standard deviate = 1.6629, p-value = 0.04817
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.136877353 -0.020833333 0.008995178
comb<-merge(rcovid,rdeaths)
comb %>% mutate(percent_baseline=100*RCovid_deaths/mean_deaths_per_10k) -> comb
comb_geo<-merge(us,comb)
mapview(comb_geo, zcol="percent_baseline", col.regions = colors, layer.name = "Covid deaths as percent of baseline",
at = round(quantile(comb$percent_baseline,c(0,0.2,0.4,0.6,0.8,1))))
comb<-merge(comb,measures)
comb %>% ggplot(aes(x=stringency,y=percent_baseline,label=State)) +geom_point() + geom_smooth(method="lm") +geom_text()
s<-as(comb_geo,"Spatial")
nb <- poly2nb(s, queen=TRUE)
lw <- nb2listw(nb, style="W", zero.policy=TRUE)
moran.test(s$percent_baseline,lw, alternative="greater")
##
## Moran I test under randomisation
##
## data: s$percent_baseline
## weights: lw
##
## Moran I statistic standard deviate = 2.7804, p-value = 0.002715
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.249266701 -0.020833333 0.009437006
s$inc.lag <- lag.listw(lw, s$percent_baseline)
s@data %>% ggplot(aes(x=inc.lag, y=percent_baseline, label=State)) + geom_point() + geom_text(size=2, check_overlap = TRUE, nudge_y=0.5) +xlab("Mean percent baseline mortality in neighbouring states") +geom_smooth(method="lm") + ylab("Percent baseline mortality")
dd<-s@data
dd<-merge(dd,measures)
mod<-lm(data=dd,percent_baseline~inc.lag+stringency+schools)
drop1(mod, test="F")
## Single term deletions
##
## Model:
## percent_baseline ~ inc.lag + stringency + schools
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 1030.7 157.26
## inc.lag 1 311.775 1342.5 168.21 13.6122 0.0006038 ***
## stringency 1 16.258 1046.9 156.03 0.7098 0.4039628
## schools 1 0.495 1031.2 155.28 0.0216 0.8837809
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
rbirths_geo<-merge(us,rbirths)
mapview(rbirths_geo, zcol="mean_births_per_10k", col.regions = colors, layer.name = "Births per 10K",
at = round(quantile(rbirths$mean_births_per_10k,c(0,0.2,0.4,0.6,0.8,1))))
rmig_geo<-merge(us,rmig)
mapview(rmig_geo, zcol="mean_migration_per_10k", col.regions = colors, layer.name = "Migration per 10K",
at = quantile(rmig$mean_migration_per_10k,c(0,0.2,0.4,0.6,0.8,1)))
A searchable data table shows the results of aggregation.
dt(rdeaths)
dt(rbirths)
dt(rmig)