Exploring Average Medicare Spending Across States

Data Description

Medical spending per beneficiary data and data description was obtained from Data.Medicare.gov. The data displayed here shows average spending levels during hospitals’ Medicare Spending per Beneficiary (MSPB) episodes. An MSPB episode includes all Medicare Part A and Part B claims paid during the period from 3 days prior to a hospital admission through 30 days after discharge. These average Medicare payment amounts have been price-standardized to remove the effect of geographic payment differences and add-on payments for indirect medical education (IME) and disproportionate share hospitals (DSH).

The data included a total of 3,065 hospitals from 50 states with 7 claim types at 3 periods. The average spending per episode was given by hospital, by state and by nation. First we import the data and rename some of the origional variables names using the dplyr package.

Objective

The aim of this post is to explore whether

- The total average spending per complete episode vary across states
- The average spending per episode vary across states for agiven claim type and period

The average spending across providers by period and claim type within states are explored in another blog.

In [2]:
# Load R li
aries 
library(dplyr)
library(ggplot2)
library(stargazer)
library(maps)
In [3]:
# Call the data and rename variable names
mdClm <- read.csv(paste("../Data/Medicare Hospital Spending by Claim.csv", sep=""))
mdClm <- rename(mdClm, AvgSpndgHsptl=Avg_Spending_Per_Episode_Hospital, AvgSpndgState=Avg_Spending_Per_Episode_State, 
                AvgSpndgNation=Avg_Spending_Per_Episode_Nation, PcntSpndgHsptl=Percent_of_Spending_Hospital,
                PctSpndgState=Percent_of_Spending_State, PctSpndgNation=Percent_of_Spending_Nation)
head(mdClm, n=3)
Hospital_NameProvider_IDStatePeriodClaim_TypeAvgSpndgHsptlAvgSpndgStateAvgSpndgNationPcntSpndgHsptlPctSpndgStatePctSpndgNationStart_DateEnd_Date
1HELEN KELLER HOSPITAL 10019 AL During Index Hospital AdmissionSkilled Nursing Facility 0 0 0 0.00% 0.00% 0.00% 01/01/2015 12/31/2015
2HELEN KELLER HOSPITAL 10019 AL During Index Hospital AdmissionDurable Medical Equipment 18 31 24 0.10% 0.16% 0.12% 01/01/2015 12/31/2015
3HELEN KELLER HOSPITAL 10019 AL During Index Hospital AdmissionCarrier 1062 1480 1540 6.01% 7.71% 7.52% 01/01/2015 12/31/2015

A - Exploring total average spending per period and claim type

To explore the total average spending per period and claim type, first agregate the data using the package/function dplyr/select. Then generate a bar plot for each period and claim type combination. Regardless of states, the highest average spending was for inpatient claim during index hospital admission followed by skilled nursing facility 1 through 30 days after discharge from index hospital admission (see the bar plot below). There was no spending for the claim types Home Health Agency, Hospice, Outpatient and Skilled Nursing Facility during hospital admission, and for the hospice claim 1 to 3 days Prior to Index Hospital Admission.

In [4]:
# Aggregate the data by Period and Claim Type to explore where the most average spending is:
periodClmType <- select(mdClm, Period, Claim_Type, AvgSpndgHsptl) %>%
        filter(Claim_Type != "Total") %>%
        group_by(Period, Claim_Type) %>%
        summarize(AvgSpndng = mean(AvgSpndgHsptl , na.rm = TRUE),
                    MedSpndng = median(AvgSpndgHsptl , na.rm = TRUE))

# bar plot of average spending by Claimy Type and Period
ggplot(periodClmType, aes(fill=Period, y=MedSpndng, x=Claim_Type)) +
        geom_bar(position="dodge", stat="identity") + 
        theme(axis.text.x=element_text(size=7, angle=45, vjust=0.5, colour="black"), 
        panel.background = element_rect(fill = "white", colour = "gray"),
        panel.grid.major = element_line(colour = "lightgray"),
        legend.position = "top", aspect.ratio=4/10) + 
        xlab("Claim Type") + ylab("Average Spending")+
        guides(fill=guide_legend(nrow=3,byrow=TRUE))

B - Exploring total average spending per complete episode across state

First call distinct state and average spending data to avoid duplicate data values; i.e. one average spending per state, using dplyr/filer and distinct function or any other appropriate package/function.

In [7]:
# Select distinct state and average spending
totalStateClm <- mdClm %>%
                    filter(Claim_Type == "Total") %>%
                    distinct(State, AvgSpndgState) %>%
                    arrange(AvgSpndgState)

Then summarize the average spending data. I used the stargazer package to generate a well formatted summary statistics. As shown in the table below, the range of total average spending was less than $4,000, and the inter-quartile range is about \$1,000. The total spending across states is shown in the US map that was filled with the total average spending.

In [ ]:
# Summary statistis (Mean, Std, Median, Quartiles, Min and Max)
totalStateClmSpndng <- data.frame(AvgSpending=totalStateClm$AvgSpndgState)
stargazer(totalStateClmSpndng, type = "text", title="Summary of Medicare Average Spending per Episode State",
          digits=0, median=TRUE, iqr=TRUE, min.max=TRUE)
Summary of Medicare Average Spending per Episode State
=====================================================================
Statistic   N   Mean  St. Dev.  Min   Pctl(25) Median Pctl(75)  Max  
---------------------------------------------------------------------
AvgSpending 50 20,233   818    18,991 19,661.5 20,062  20,707  22,432
---------------------------------------------------------------------
In [31]:
# Call the data required to plot USA map (altitude, latitude, region / state)
# And merge with medicare spending data
map <- map_data("state") 
states <- read.csv(paste("../Data/states.csv", sep=""))
dataClm <- merge(totalStateClm, states, by.x="State", by.y="Abbreviation")
dataClm$state <- tolower(dataClm$state)# Convert all state chracter into lower case
In [34]:
# Plot total average spending across USA states
m <- ggplot(dataClm, aes(fill = AvgSpndgState))
m +  geom_map(aes(map_id = state), map = map) +   
        expand_limits(x = map$long, y = map$lat) + ggtitle("Total average spending for complete episode across states")+
        theme(axis.title=element_blank(), axis.text=element_blank(), axis.ticks=element_blank(),  
              legend.position="right", legend.title=element_blank()) +  coord_fixed(ratio = 1.75)

As shown in the map the total average spending was highest in Nevada, Texas and Utah, and lowest in West Virginia, New Mexico and Iowa.

In [8]:
cat("States with highest average spending")
head(totalStateClm, n=5)
cat("States with lowest average spending")
tail(totalStateClm, n=5)
States with highest average spending
StateAvgSpndgState
1WV 18991
2NM 19007
3IA 19147
4AL 19201
5WY 19202
States with lowest average spending
StateAvgSpndgState
46CA 21141
47NJ 21733
48UT 21871
49TX 22110
50NV 22432

C - Exploring total average spending across states per claim type and period

Here I generated a bar plot for each state per period and claim type. There is a variation across states for most of period and claim types.

In [16]:
# Aggregate the data by Period and Claim Type to explore where the most average spending is:
agrgtStPrdClm <- mdClm %>%
                select(State, Period, Claim_Type, AvgSpndgHsptl) %>%
                filter(Claim_Type != "Total") %>%
                group_by(State, Period, Claim_Type) %>%
                summarize(AvgSpndng = mean(AvgSpndgHsptl , na.rm = TRUE),
                MedSpndng = median(AvgSpndgHsptl , na.rm = TRUE))

ggplot(agrgtStPrdClm, aes(x=State, y=MedSpndng)) +
        geom_bar(aes(fill = Period), position = "dodge", stat="identity") +
        facet_wrap(~Claim_Type, nrow=7, scales = "free_y") +
        theme(legend.position = "top", axis.text.x=element_text(size=6, angle=45, vjust=0.5, colour="black")) +
        xlab("States") + ylab("Median Spending") +
        guides(fill=guide_legend(nrow=3,byrow=TRUE))

Summary:

This post helps to visualize average medicare spending distribution and whether there were huge differences across states per complete episode as well as per period and claim type. As expected, there were huge differences due to period and claim type, but not across states for the complete episode.