RegressionDiscontinuity.jl

Estimation and inference for sharp regression discontinuity designs.

Walkthrough

Let us load the dataset from Lee (2008). We will reproduce analyses from Imbens and Kalyanaraman (2012).

using DataFrames, RegressionDiscontinuity, Plots
lee08 = load_rdd_data(:lee08) |> DataFrame
first(lee08, 3)
3×3 DataFrame
│ Row │ Ys      │ Ws   │ Zs      │
│     │ Float64 │ Bool │ Float64 │
├─────┼─────────┼──────┼─────────┤
│ 1   │ 0.0     │ 0    │ -1.0    │
│ 2   │ 0.0     │ 0    │ -1.0    │
│ 3   │ 0.0     │ 0    │ -1.0    │
running_var = RunningVariable(lee08.Zs, cutoff=0.0, treated=:≧);

Let us first plot the histogram of the running variable:

plot(running_var; ylim=(0,600), bins=40, background_color="#f3f6f9", size=(700,400))

Next we plot the regressogram (also known as scatterbin) of the response:

regressogram = plot(running_var, lee08.Ys; bins=40, background_color="#f3f6f9", size=(700,400), legend=:bottomright)
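
A regressogram averages the response within bins of the running variable. The following is a minimal sketch of that idea in plain Julia, assuming equal-width bins. The helper `binned_means` is hypothetical (it is not part of RegressionDiscontinuity.jl), the data are synthetic, and the package's plotting recipe may choose its bins differently.

```julia
using Statistics

# Hypothetical helper (not part of RegressionDiscontinuity.jl): average the
# response within equal-width bins of the running variable.
function binned_means(zs, ys; nbins=40)
    lo, hi = extrema(zs)
    width = (hi - lo) / nbins
    # Bin index in 1:nbins; the maximum of zs is assigned to the last bin.
    idx = clamp.(floor.(Int, (zs .- lo) ./ width) .+ 1, 1, nbins)
    centers = [lo + (b - 0.5) * width for b in 1:nbins]
    # Empty bins get NaN so that they can be skipped when plotting.
    means = [any(idx .== b) ? mean(ys[idx .== b]) : NaN for b in 1:nbins]
    return centers, means
end

# Synthetic data for illustration only (not the Lee data):
# a linear trend plus a jump of 0.1 at the cutoff 0.
zs = collect(range(-1.0, 1.0; length=200))
ys = 0.5 .* zs .+ 0.1 .* (zs .>= 0.0)
centers, means = binned_means(zs, ys; nbins=4)
```

Plotting `means` against `centers` gives the scatterbin view; the jump in the synthetic data shows up as a larger gap between the two bins adjacent to the cutoff.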

We observe a jump at the discontinuity, which we can estimate, e.g., with local linear regression. Here we use a rectangular kernel and choose the bandwidth with the Imbens-Kalyanaraman bandwidth selector:

rect_ll_rd = fit(NaiveLocalLinearRD(kernel=Rectangular(), bandwidth=ImbensKalyanaraman()),
                 running_var, lee08.Ys)
Local linear regression for regression discontinuity design
       ⋅⋅⋅⋅ Naive inference (not accounting for bias)
       ⋅⋅⋅⋅ Rectangular kernel (U[-0.5,0.5])
       ⋅⋅⋅⋅ Imbens Kalyanaraman bandwidth
       ⋅⋅⋅⋅ Eicker White Huber variance
────────────────────────────────────────────────────────────
                          h       τ̂         se         bias
────────────────────────────────────────────────────────────
Sharp RD estimand  0.462024  0.08077  0.0087317  unaccounted
────────────────────────────────────────────────────────────
plot!(regressogram, rect_ll_rd; show_local_support=true)
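
To make the estimator concrete, here is a minimal sketch of a sharp RD local linear fit in plain Julia. It is not the package's implementation: the names `rectangular`, `intercept_at_cutoff`, and `local_linear_rd` are hypothetical, the bandwidth is fixed by hand rather than selected, no standard error is computed, and the kernel is assumed to have support |u| ≤ 1 (the package output above lists U[-0.5,0.5], so its scaling convention may differ). Units with z ≥ cutoff are treated, matching `treated=:≧` above.

```julia
using LinearAlgebra

# Rectangular kernel with support [-1, 1] (scaling is an assumption).
rectangular(u) = abs(u) <= 1 ? 1.0 : 0.0

# Weighted least squares of ys on (1, zs - cutoff); returns the intercept,
# i.e. the fitted value at the cutoff.
function intercept_at_cutoff(zs, ys, ws, cutoff)
    X = hcat(ones(length(zs)), zs .- cutoff)
    W = Diagonal(ws)
    beta = (X' * W * X) \ (X' * W * ys)
    return beta[1]
end

# Hypothetical sketch: fit a separate line on each side of the cutoff using
# only points with positive kernel weight, then difference the intercepts.
function local_linear_rd(zs, ys; h, cutoff=0.0, kernel=rectangular)
    ws = kernel.((zs .- cutoff) ./ h)
    inwindow = ws .> 0
    right = inwindow .& (zs .>= cutoff)
    left = inwindow .& (zs .< cutoff)
    mu_right = intercept_at_cutoff(zs[right], ys[right], ws[right], cutoff)
    mu_left = intercept_at_cutoff(zs[left], ys[left], ws[left], cutoff)
    return mu_right - mu_left
end

# Synthetic data for illustration only: linear trend plus a jump of 0.08.
zs = collect(range(-1.0, 1.0; length=201))
ys = 0.3 .+ 0.5 .* zs .+ 0.08 .* (zs .>= 0.0)

# Recovers the jump built into the synthetic data, up to rounding.
tau_hat = local_linear_rd(zs, ys; h=0.4)
```

Because the synthetic response is exactly linear on each side, any bandwidth recovers the same jump; with real data the estimate depends on `h`, which is why a data-driven selector is used.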

Let us zoom in on the support of the local kernel and use a more refined regressogram:

local_regressogram = plot(rect_ll_rd.data_subset; bins=40, background_color="#f3f6f9", size=(700,400), legend=:bottomright)
plot!(rect_ll_rd)

Finally, we could repeat the analysis above with another kernel, e.g., the triangular kernel:

triang_ll_rd = fit(NaiveLocalLinearRD(kernel=SymTriangularDist(), bandwidth=ImbensKalyanaraman()),
                   running_var, lee08.Ys)
Local linear regression for regression discontinuity design
       ⋅⋅⋅⋅ Naive inference (not accounting for bias)
       ⋅⋅⋅⋅ Triangular kernel
       ⋅⋅⋅⋅ Imbens Kalyanaraman bandwidth
       ⋅⋅⋅⋅ Eicker White Huber variance
───────────────────────────────────────────────────────────────
                          h         τ̂          se         bias
───────────────────────────────────────────────────────────────
Sharp RD estimand  0.293907  0.0799218  0.00834476  unaccounted
───────────────────────────────────────────────────────────────
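
The two fits differ only in the kernel weights. In generic notation (the symbols below are assumptions for illustration, not taken from the package documentation), with cutoff $c$, bandwidth $h$, and kernel $K$, a sharp RD local linear estimator can be written as:

```latex
\begin{aligned}
(\hat\alpha_{+}, \hat\beta_{+}) &= \arg\min_{\alpha,\beta} \sum_{i:\, Z_i \ge c}
    K\!\left(\frac{Z_i - c}{h}\right) \bigl(Y_i - \alpha - \beta (Z_i - c)\bigr)^2, \\
(\hat\alpha_{-}, \hat\beta_{-}) &= \arg\min_{\alpha,\beta} \sum_{i:\, Z_i < c}
    K\!\left(\frac{Z_i - c}{h}\right) \bigl(Y_i - \alpha - \beta (Z_i - c)\bigr)^2, \\
\hat\tau &= \hat\alpha_{+} - \hat\alpha_{-}, \\
K_{\text{rect}}(u) &\propto \mathbf{1}\{|u| \le 1\}, \qquad
K_{\text{tri}}(u) \propto (1 - |u|)_{+}.
\end{aligned}
```

The rectangular kernel weights all points in the window equally, while the triangular kernel down-weights points far from the cutoff. Kernel scaling conventions vary, so the bandwidths $h$ reported for the two fits are not directly comparable.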

References

Publications

  • Imbens, Guido, and Karthik Kalyanaraman. "Optimal bandwidth choice for the regression discontinuity estimator." The Review of Economic Studies 79.3 (2012): 933-959.

  • Lee, David S. "Randomized experiments from non-random selection in US House elections." Journal of Econometrics 142.2 (2008): 675-697.

Related Julia packages

  • GeoRDD.jl: Package for spatial regression discontinuity designs.

Related R packages