library(tidyverse)
library(RColorBrewer)
library(mapview)
library(sf)
library(COVID19)
library(spdep)
library(tmap)
library(usdata)
library(lubridate)
theme_set(theme_bw())

This is an unfinished draft analysis of a very sensitive subject. Do not cite or use any results until they have been reviewed and refined.

Introduction

One of the commonest criticisms of media reporting on the pandemic has been the use of a cumulative death count as a measure of progress of the disease. This has led to the public perception of ever increasing mortality. The number has been used without a reference denominator. At the very least the number of deaths that would be expected to occur naturally over any given period should have been included.

In the highly charged political atmosphere of the United States the raw death toll has been particularly unhelpful as a metric. There has been a tendency to use the raw number as a yardstick to measure the performance of state governors. One difficulty with this that has been widely overlooked is that the baseline mortality differs greatly between individual states.

Baseline mortality

Crude state wide mortality rates: Years 2011 to 2019

In order to establish a suitable baseline for comparison with the Covid-19 associated mortality recorded for 2020 I used data taken from the US census. The complete census only takes place once every decade, but a comprehensive series of household surveys are combined with official records and modelling to fill in the gaps (“Population and Housing Unit Estimates” 2021). Figure 1 shows the results of aggregating the death count by state and year and dividing the total by the state population for the relevant year. This produces the crude death rate (CDR) expressed as deaths per 10 thousand of the population.

The variation between states is much more notable than the weak trend. Crude death rates lie within a range of 75 to 90 deaths per 10 thousand for 50% of the states. There are some notable outliers which leads to an overall range which varies by around a factor of two. West Virginia shows consistent exceptional high mortality while Utah has notably very low intrinsic mortality.

data(census)
d<-census
d$rate<-d$Deaths/(d$Population/10000)
d %>% mutate(lb=ifelse(State %in% c("West Virginia","Utah", "California", "Florida", "New York", "North Dakota","Texas", "Oklahoma") ,State,"")) %>%
  
  ggplot(aes(x=as.factor(Year), y=rate, label=lb)) + geom_boxplot(alpha=0.2,fill="lightgrey",col="lightgrey") + geom_text(size=2) -> g1

g1 + labs(title = "", 
                     caption = "Source: USA census population estimate \n see https://www.census.gov/programs-surveys/popest.html for details",
                     x = "Year",
                     y="Crude death rate (deaths per 10K)") -> g1
g1
Fig 1: Crude death rates per 10 thousand for the  population of the USA States and Puerto Rico. The outlier above the whiskers is West Virginia

Fig 1: Crude death rates per 10 thousand for the population of the USA States and Puerto Rico. The outlier above the whiskers is West Virginia

Crude state wide mortality rates: Years 2019 to 2020

In order to triangulate against the data obtained from the national census estimates data was aggregated from the cdc weekly counts of deaths for 2019 and 2020 (“Weekly Counts of Deaths by State and Select Causes, 2019-2020 - Data.gov” 2021). There are some minor differences between the estimates for all cause deaths in 2019 between the two data sets, but the general pattern and numbers are consistent.

data("weeekly_counts")
cdc<-weekly_counts
cdc$Year<-cdc$MMWR_Year
cdc$State<-cdc$Jurisdiction_of_Occurrence

cdc %>% filter(name=="All_Cause") %>% group_by(State,pop,Year) %>% summarise(deaths=sum(value)) %>% mutate(rate=deaths/(pop/10000)) -> all_cause


all_cause %>% mutate(lb=ifelse(State %in% c("West Virginia","Utah", "California", "Florida", "New Jersey", "North Dakota","Texas") ,State,"")) %>%
  
  ggplot(aes(x=as.factor(Year), y=rate, label=lb)) + geom_boxplot(alpha=0.2,fill="lightgrey",col="lightgrey") + geom_text(size=2) -> g2

g2 + labs(title = "", 
                     caption = "Source: Accessed January 20.\n https://catalog.data.gov/dataset/weekly-counts-of-deaths-by-state-and-select-causes-2019-2020.",
                     x = "Year",
                     y="Crude death rate (deaths per 10K)") 
Fig 1a: Crude all cause death rates per 10 thousand for the  population of the USA States 2019 to 2010

Fig 1a: Crude all cause death rates per 10 thousand for the population of the USA States 2019 to 2010

d %>% group_by(State) %>% summarise(crude_death_rate=round(mean(rate),1)) -> cdr
cdr_totals<-cdr
library(RColorBrewer)
colors <- brewer.pal(7, "RdYlGn")[7:1]
cdr_geo<-merge(us,cdr)
# mapview(cdr_geo, zcol="crude_death_rate", col.regions = colors, layer.name = "Deaths per 10K",
#              at = round(quantile(cdr$crude_death_rate,c(0,0.2,0.4,0.6,0.8,1)),1)) 
# qtm(cdr_geo, fill="crude_death_rate",fill.pallete = colors, breaks=round(quantile(cdr$crude_death_rate,c(0,0.2,0.4,0.6,0.8,1)),1), legend.outside=TRUE) 
map1 <- tm_shape(cdr_geo)+ tm_style_white() +
  tm_fill("crude_death_rate", 
          breaks = round(quantile(cdr$crude_death_rate,c(0,0.1,0.2,0.4,0.6,0.8,0.9,1)),1),
          style = "fixed",
          textNA = "No data",
          colorNA = "white", 
          palette = colors )+ tm_text("State", size=0.3) +
  tm_borders() + tm_legend(outside=TRUE) 
#map1

The spatial pattern of crude death rate shown in figure 2 shows a clear pattern. States on both the North East coast and the West tend to have low CDRs, while the Appalachian and Southern States have high values.

map1
Fig 2: Crude death rate calculate as the numbers of deaths per year divided by the population of the state in the year in question. The mean is taken for the years between 2011 to to 2019. During this period the overall CDR has only shown minor change, although with some fluctuations in a few  states

Fig 2: Crude death rate calculate as the numbers of deaths per year divided by the population of the state in the year in question. The mean is taken for the years between 2011 to to 2019. During this period the overall CDR has only shown minor change, although with some fluctuations in a few states

Adjustment for age

A fully comprehensive adjustment of the CDR would require the calculation of age specific death rates for all demographic bands for each state combined with data on the number of deaths for each of the age bands. A less detailed approach that results in an adjustment that follows the same overall pattern can be achieved by regressing the crude death rate on the proportion of the population over 70 and then extracting the residual variability. The analysis is shown in figure 3. States with CDRs that fall above the fitted regression line have more deaths than would be predicted based on the proportion of elderly residents. States that fall below the line have fewer deaths than expected.

The regression itself is a good predictor of overall crude death rates with an \(R^2\) value of 0.43. The position of Florida, falling well below the trend line, suggests that the overall health of the Florida population may be better than would be expected given the age profile for the state as a whole.

cdr<-merge(cdr,acs)
cdr %>% ggplot(aes(x=percent_over_70, y=crude_death_rate, label=State)) + geom_smooth(method="lm",alpha=0.1,col="lightgrey") + geom_text(size=2) -> g2

g2<- g2 + labs(title = "", 
                     caption = "Source: USA census population estimate \n see https://www.census.gov/programs-surveys/popest.html for details",
                     x = "Percent of state population over 70",
                     y="Crude death rate (deaths per 10K)") 
g2
Fig 3: Regression of crude death rate on percent of the state population over 70. The regression is significant (p<0.0001 with an adjusted R squared of 0.432. West Virgina is clear outlier with a much higher than predicted crude mortality rate

Fig 3: Regression of crude death rate on percent of the state population over 70. The regression is significant (p<0.0001 with an adjusted R squared of 0.432. West Virgina is clear outlier with a much higher than predicted crude mortality rate

mod<-lm(data=cdr,crude_death_rate~percent_over_70)
cdr$adjusted_cdr<-residuals(mod)
cdr_geo<-merge(us,cdr)
# mapview(cdr_geo, zcol="adjusted_cdr", col.regions = colors, layer.name = "adjusted_cdr per 10K",
#              at = round(quantile(cdr$adjusted_cdr,c(0,0.2,0.4,0.6,0.8,1)),1)) 
map2 <- tm_shape(cdr_geo)+ tm_style_white() +
  tm_fill("adjusted_cdr", 
          breaks = round(quantile(cdr$adjusted_cdr,c(0,0.1,0.2,0.4,0.6,0.8,0.9,1)),1),
          style = "fixed",
          textNA = "No data",
          colorNA = "white", 
          palette = colors )+
  tm_borders() + tm_legend(outside=TRUE) 
#map2
colors2 <- brewer.pal(7, "Reds")
map2a <- tm_shape(cdr_geo)+ tm_style_white() +
  tm_fill("percent_over_70", 
          breaks = round(quantile(cdr$percent_over_70,c(0,0.1,0.2,0.4,0.6,0.8,0.9,1)),1),
          style = "fixed",
          textNA = "No data",
          colorNA = "white", 
          palette = colors2 ) + 
  tm_borders() + tm_legend(outside=TRUE) 
#map2a

Adjustment for household income

States with the highest crude death rates, such as West Virginia and Oklahoma, also have the lowest household incomes. Poverty is generally assumed to be linked to poor health outcomes. The median household income is a good proxy indicator for a wide range of socio-economic factors. Figure 4 shows the results of regressing the age adjusted crude death rate against median household income. The regression is highly significant (p<0.0001) with an \(R^2\) of 0.46.

cdr %>% filter(State !="Puerto Rico")-> cdr

cdr%>% ggplot(aes(x=household_income, y=adjusted_cdr, label=State)) + geom_point(size=0) + geom_smooth(method="lm",alpha=0.1,col="lightgrey") + geom_text(size=2) -> g3


g3<- g3 + labs(title = "", 
                     caption = "Source: USA census population estimate \n see https://www.census.gov/programs-surveys/popest.html for details",
                     x = "Median household income",
                     y="CDR adjusted for population over 70") 
g3
Fig 4: Regression of adjusted crude death rate on median household income. The regression is significant (p<0.0001 with an adjusted R squared of 0.46.

Fig 4: Regression of adjusted crude death rate on median household income. The regression is significant (p<0.0001 with an adjusted R squared of 0.46.

mod<-lm(data=cdr,adjusted_cdr~household_income)
cdr$Re_adjusted_cdr<-residuals(mod)

Explanatory power of demographics and income combined.

The analysis demonstrates that both demographic and socio-economic factors are strongly associated with crude death rates. An additive regression model can be fitted in order to provide a measure of the explanatory power of the two factors combined. The model explained 70% of the variability. After holding for differences in the state-wide demography a difference of ten thousand dollars in the mean state-wide household income is associated with a change of around 5.8 in the state-wide CDR.

mod<-lm(data=cdr,crude_death_rate~household_income+percent_over_70)
summary(mod)
## 
## Call:
## lm(formula = crude_death_rate ~ household_income + percent_over_70, 
##     data = cdr)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.949  -4.268   0.384   4.735  14.980 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       6.985e+01  1.157e+01   6.034 2.22e-07 ***
## household_income -5.859e-04  9.682e-05  -6.051 2.09e-07 ***
## percent_over_70   5.064e+00  7.559e-01   6.700 2.13e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.952 on 48 degrees of freedom
## Multiple R-squared:  0.7136, Adjusted R-squared:  0.7016 
## F-statistic: 59.79 on 2 and 48 DF,  p-value: 9.308e-14
cdr_geo<-merge(us,cdr)
# mapview(cdr_geo, zcol="re_adjusted_cdr", col.regions = colors, layer.name = "re_adjusted_cdr",
#              at = round(quantile(cdr$re_adjusted_cdr,c(0,0.2,0.4,0.6,0.8,1)),1)) 

map3 <- tm_shape(cdr_geo)+ tm_style_white() +
  tm_fill("Re_adjusted_cdr", 
          breaks = round(quantile(cdr$Re_adjusted_cdr,c(0,0.1,0.2,0.4,0.6,0.8,0.9,1)),1),
          style = "fixed",
          textNA = "No data",
          colorNA = "white", 
          palette = colors ) + 
  tm_borders() + tm_legend(outside=TRUE) 
#map3
colors2 <- brewer.pal(7, "PRGn")
map3a <- tm_shape(cdr_geo)+ tm_style_white() +
  tm_fill("household_income", 
          breaks = round(quantile(cdr$household_income,c(0,0.1,0.2,0.4,0.6,0.8,0.9,1)),1),
          style = "fixed",
          textNA = "No data",
          colorNA = "white", 
          palette = colors2 ) + 
  tm_borders() + tm_legend(outside=TRUE) 
#map3a

Mapping the results

library(grid)
grid.newpage()
page.layout <- grid.layout(nrow = 2, ncol = 1)
pushViewport(viewport(layout = page.layout))
print(map2a, vp=viewport(layout.pos.row = 1, layout.pos.col = 1))
print(map3a, vp=viewport(layout.pos.row = 2, layout.pos.col = 1))
Fig 5: Input variables used in analysis of crude death rates. The percent of the population over 70 and median hosehold incomes. Household incomes are highly correlated with a wide range of additional socio-economic indicators

Fig 5: Input variables used in analysis of crude death rates. The percent of the population over 70 and median hosehold incomes. Household incomes are highly correlated with a wide range of additional socio-economic indicators

library(grid)
grid.newpage()
page.layout <- grid.layout(nrow = 2, ncol = 1)
pushViewport(viewport(layout = page.layout))
print(map2, vp=viewport(layout.pos.row = 1, layout.pos.col = 1))
print(map3, vp=viewport(layout.pos.row = 2, layout.pos.col = 1))
Fig 6: Crude death rate adjusted for proportion of the state population over 70 and the relationship between CDR and median household income. The most apparent result of the correction for age is found in the case of Florida. Adjustment through holding for the effect of income has relatively little impact on the pattern, as income is a consistent driver of outcome

Fig 6: Crude death rate adjusted for proportion of the state population over 70 and the relationship between CDR and median household income. The most apparent result of the correction for age is found in the case of Florida. Adjustment through holding for the effect of income has relatively little impact on the pattern, as income is a consistent driver of outcome

Summary

  • Baseline crude mortality rates vary by a factor of two between US states.
  • A large fraction of this variability can be attributed directly to variability in the fraction of the population over 70 years of age.
  • Independently of the effect of demography, socio-economic differences between states as captured by media household incomes statistically explains a large proportion of the residual variability

Covid-19 mortality

Covid deaths

At the time of writing (2021-01-22), nine months have passed since the initial phase of the pandemic. The numbers of deaths attributed to Sars-Cov-2 infections in the USA continues to rise. Any analysis will be provisional and the full impact of the pandemic remains to be documented. The baseline CDR analysis used the sum total of deaths over a twelve month period. Therefore a comparable analysis of Covid-19 attributed deaths can use the current cumulative deaths to date as input.

data("States_COVID19")
d<-States_COVID19
d$population<-aqm::clean(d$population)
d$month<-month(d$date)
d$year<-year(d$date) 
d$covid_month<-d$month+(12*(d$year-2020))

d %>% group_by(State, population) %>% summarise(covid_deaths=max(deaths,na.rm=TRUE)) %>% mutate(covid_cdr=covid_deaths/(population/10000))-> cvd

d %>% group_by(State, population, covid_month) %>% summarise(covid_deaths=max(deaths,na.rm=TRUE)) %>% mutate(covid_cdr=covid_deaths/(population/10000))-> cvd2
cvd2 %>% filter(covid_month> 6) %>% mutate(lb=ifelse(State %in% c("Arizona","Utah", "California", "Florida", "New York", "North Dakota","Texas", "South Dakota", "New Jersey") ,State,"")) %>% 
ggplot(aes(as.factor(covid_month),y=covid_cdr, label=lb)) + geom_boxplot(alpha=0.2,fill="lightgrey",col="lightgrey") + geom_text(size=2) ->cg1
cg1<- cg1 + labs(title = "", 
                     caption = "Source: COVID-19 Case Surveillance Public Use Data",
                     x = "Month",
                     y="Covid-19 deaths per 10k population") 
cg1

Mapping Covid deaths

The cumulative numbers of Covid-19 deaths reported for each state divided by the state population (in 2019) is shown in figure 9

cvd_geo<-merge(us,cvd)
covid_map1 <- tm_shape(cvd_geo)+ tm_style_white() +
  tm_fill("covid_cdr", 
          breaks = round(quantile(cvd$covid_cdr,c(0,0.2,0.4,0.6,0.8,1)),1),
          style = "fixed",
          textNA = "No data",
          colorNA = "white", 
          palette = colors )+ tm_text("State", size=0.3) +
  tm_borders() + tm_legend(outside=TRUE) 
covid_map1
Fig 9: Crude cummulative covid death rates per 10k of state population January 2021.

Fig 9: Crude cummulative covid death rates per 10k of state population January 2021.

Correlates of Covid-19 mortality

In contrast to the baseline all cause population adjusted mortality the cumulative deaths attributed to Covid-19 per state show no statistically significant relationship with any of the covariates associated with baseline mortality.

In addition the cumulative NPI stringency index was calculated for the states over the period since non pharmaceutical interventions were introduced. Taking the mean stringency index as a single measure ignores the timing of interventions, but produces a single integrated measure of both intensity and duration of the measures. The duration of school closures in days was also included in the analysis.

Figure 10 shows the results of individual regressions of the crude covid related death ratio on six covariates. Non of the regressions is statistically significant.

cdr<-merge(cdr,cvd)
cdr$covid_percent_of_baseline<-100*(cdr$covid_cdr/cdr$crude_death_rate)
cdr<-merge(cdr,measures)
ggplot(cdr,aes(x=crude_death_rate,y=covid_cdr,label=State))  + geom_smooth(method="lm",alpha=0.5,col="black") +geom_point(size=1) ->gg1

ggplot(cdr,aes(x=percent_over_70,y=covid_cdr,label=State))  + geom_smooth(method="lm",alpha=0.5,col="black") +geom_point(size=1) ->gg2

ggplot(cdr,aes(x=percent_over_70,y=covid_cdr,label=State))  + geom_smooth(method="lm",alpha=0.5,col="black") +geom_point(size=1) ->gg3

ggplot(cdr,aes(x=Re_adjusted_cdr,y=covid_cdr,label=State))  + geom_smooth(method="lm",alpha=0.5,col="black") +geom_point(size=1) ->gg4

ggplot(cdr,aes(stringency,y=covid_cdr,label=State))  + geom_smooth(method="lm",alpha=0.5,col="black") +geom_point(size=1) ->gg5

ggplot(cdr,aes(schools,y=covid_cdr,label=State))  + geom_smooth(method="lm",alpha=0.5,col="black") +geom_point(size=1) ->gg6

multiplot(gg1,gg2,gg3,gg4,gg5,gg6, cols=2)
Fig 10: Relationships between crude cummulative covid death rates per 10k of state population January 2021 and covariates. Non of the regression lines are statistically significant.

Fig 10: Relationships between crude cummulative covid death rates per 10k of state population January 2021 and covariates. Non of the regression lines are statistically significant.

A multiple additive regression model including all six potential explanatory variables did not show any variable to be significantly associated with the covid crude deaths ratio.

model<- lm(data=cdr, covid_cdr~crude_death_rate+percent_over_70+percent_over_70+Re_adjusted_cdr+stringency+schools)
anova(model, test="F")
## Analysis of Variance Table
## 
## Response: covid_cdr
##                  Df Sum Sq Mean Sq F value Pr(>F)
## crude_death_rate  1   9.67  9.6741  0.4423 0.5096
## percent_over_70   1   0.34  0.3439  0.0157 0.9008
## Re_adjusted_cdr   1  25.71 25.7133  1.1756 0.2843
## stringency        1  17.76 17.7630  0.8121 0.3725
## schools           1   1.57  1.5701  0.0718 0.7900
## Residuals        43 940.48 21.8716

Spatial autocorrelation

One alternative explanation for the variability in deaths between states is spatial proximity to neighbouring states with similar death rates. The effect of neighbours can be tested by calculating Moran’s I based on a queen’s move neighbour rule.

cdr_geo<-merge(us,cdr)
s<-as(cdr_geo,"Spatial")
nb <- poly2nb(s, queen=TRUE)
lw <- nb2listw(nb, style="W", zero.policy=TRUE)
moran.test(s$covid_cdr,lw, alternative="greater")
## 
##  Moran I test under randomisation
## 
## data:  s$covid_cdr  
## weights: lw    
## 
## Moran I statistic standard deviate = 3.2181, p-value = 0.0006451
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.293109262      -0.020833333       0.009516844

A more rigorous test involves a Monte Carlo simulation.

moran.mc(s$covid_percent_of_baseline, lw, nsim=999, alternative="greater")
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  s$covid_percent_of_baseline 
## weights: lw  
## number of simulations + 1: 1000 
## 
## statistic = 0.24927, observed rank = 996, p-value = 0.004
## alternative hypothesis: greater

Moran’s I can be visualised by

cdr$mean_neighbours <- lag.listw(lw, s$covid_percent_of_baseline)

cdr %>% ggplot(aes(x=mean_neighbours, y=covid_percent_of_baseline, label=State)) +geom_smooth(method="lm", alpha=0.1,col="lightgrey") + geom_text(size=2, check_overlap = TRUE, nudge_y=0.5) +xlab("Mean percent baseline mortality in neighbouring states")  + ylab("Percent baseline mortality")
 Fig 11: Relationships between crude cummulative covid death rates per 10k of state population by January 2021 and the mean value in neighbouring states. The relationship is significant (Moran's I =0.24, p<0.05).

Fig 11: Relationships between crude cummulative covid death rates per 10k of state population by January 2021 and the mean value in neighbouring states. The relationship is significant (Moran’s I =0.24, p<0.05).

Change in clustering over time

The change the spatial autocorrelation (clustering) of the covid related crude death rate can be depicted as a change in the Moran’s I statistic calculated for the cummulative deaths atributed to covid in each month since the epidemic began in March.

Change in value of Moran’s I

Change in value of Moran’s I

Change in Moran’s I

Change in Moran’s I

Multiple regression model with spatial component

An additive multiple regression model that included the spatial component shows some statistical significance (p=0.017) as a result of inclusion of the spatial term. The only term in the model that was weakly, but statistically significantly, associated with the covid related crude death ratio was the mean crude death ratio related to covid in neighbouring states.

model<- lm(data=cdr, covid_cdr~crude_death_rate+percent_over_70+percent_over_70+Re_adjusted_cdr+stringency+schools + mean_neighbours)
anova(model, test="F")
## Analysis of Variance Table
## 
## Response: covid_cdr
##                  Df Sum Sq Mean Sq F value    Pr(>F)    
## crude_death_rate  1   9.67   9.674  0.5788 0.4510368    
## percent_over_70   1   0.34   0.344  0.0206 0.8866240    
## Re_adjusted_cdr   1  25.71  25.713  1.5384 0.2217410    
## stringency        1  17.76  17.763  1.0627 0.3084916    
## schools           1   1.57   1.570  0.0939 0.7607442    
## mean_neighbours   1 238.48 238.478 14.2679 0.0004932 ***
## Residuals        42 702.00  16.714                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model)
## 
## Call:
## lm(formula = covid_cdr ~ crude_death_rate + percent_over_70 + 
##     percent_over_70 + Re_adjusted_cdr + stringency + schools + 
##     mean_neighbours, data = cdr)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.3608  -2.2657  -0.0326   2.2015   8.1210 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       6.710945   8.347680   0.804 0.425965    
## crude_death_rate -0.024629   0.113786  -0.216 0.829688    
## percent_over_70   0.324547   0.966192   0.336 0.738616    
## Re_adjusted_cdr   0.137003   0.140883   0.972 0.336390    
## stringency       -0.106069   0.090205  -1.176 0.246270    
## schools          -0.005435   0.015113  -0.360 0.720938    
## mean_neighbours   0.814448   0.215617   3.777 0.000493 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.088 on 42 degrees of freedom
## Multiple R-squared:  0.2949, Adjusted R-squared:  0.1941 
## F-statistic: 2.927 on 6 and 42 DF,  p-value: 0.01777
data("States_COVID19")
d<-States_COVID19
d %>% group_by(State) %>% summarise(tests=max(tests, na.rm=TRUE),cases=max(confirmed,na.rm=TRUE) ,deaths=max(deaths, na.rm=TRUE), population = max(population, na.rm=TRUE)) -> d
d <-merge(measures,d)

d$tests_per<-d$tests/(d$population)
d$cases_per<-d$cases/(d$population/10000)
d$deaths_per<-d$deaths/(d$population/10000)

ggplot(d,aes(x=tests_per,y=deaths_per, label=State) ) + geom_text() + geom_smooth(method="lm") + xlab("Tests per capita") + ylab ("Total covid related deaths per 10k of the population")

Summary

  • Baseline crude mortality rates are strongly correlated with demographic and socio economic factors. In particular median household income is a effective predictor of a States CDR after holding for the percentage of the population over 70.
  • There are no statistically significant relationships between covid related crude mortality rate and the key demographic and socio economic determinants of all cause crude death rates.
  • There are no statistically significant relationships between covid related crude death rate and integrated measures of the stringency of no pharmaceutical interventions.
  • The best predictor of covid related crude death rate are the rates measured in neighbouring states.

(National Center for Health Statistics (NCHS) 2020; CDC 2020a; CDC 2020b; Force 2021, (“Population and Housing Unit Estimates” 2021), (“Weekly Counts of Deaths by State and Select Causes, 2019-2020 - Data.gov” 2021))

References and data sources

CDC. 2020a. “Conditions contributing to deaths involving coronavirus disease 2019 (COVID-19), by age group, United States.” https://data.cdc.gov/NCHS/Conditions-contributing-to-deaths-involving-corona/hk9y-quqm.

Force, CDC Case Surveillance Task. 2021. “COVID-19 Case Surveillance Public Use Data | Data | Centers for Disease Control and Prevention.” Accessed January 16. https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf.

National Center for Health Statistics (NCHS). 2020. “Provisional COVID-19 Death Counts by Week Ending Date and State.” https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Week-Ending-D/r8kw-7aab.

“Population and Housing Unit Estimates.” 2021. Accessed January 17. https://www.census.gov/programs-surveys/popest.html.

“Weekly Counts of Deaths by State and Select Causes, 2019-2020 - Data.gov.” 2021. Accessed January 20. https://catalog.data.gov/dataset/weekly-counts-of-deaths-by-state-and-select-causes-2019-2020.