class: center, middle, inverse, title-slide

# Week 02 - Introduction to Causality
## How do we know if X causes Y?
### Danilo Freire
### 30th January 2019

---

<style>
.remark-slide-number { position: inherit; }
.remark-slide-number .progress-bar-container { position: absolute; bottom: 0; height: 6px; display: block; left: 0; right: 0; }
.remark-slide-number .progress-bar { height: 100%; background-color: #EB811B; }
.orange { color: #EB811B; }
</style>

# Today's Agenda

.font150[
* Questions about RMarkdown?
* What is causal inference?
* Correlation vs causation
* Treatment effects and potential outcomes
* Randomised Controlled Trials
]

---

# Brief Recap

.font150[
* Last week you learned how to:
  - Load `.csv` files
  - Download datasets from the internet
  - Create and run `.R` scripts
  - Create `.Rmd` documents (different extension!)
  - Print a PDF with your results
]

---

# RMarkdown

.font150[
* Questions?
]

.center[_(figure not available)_]

---

class: inverse, center, middle

# Causal Inference

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

# Causal Inference

.font150[
* What is causal inference?
]

--

.font150[
* Causal: a relationship where one factor causes the other
]

--

.font150[
* Inference: our ability to draw conclusions from facts and observations
]

--

.font150[
* _Causal inference is an attempt to estimate a causal connection between two variables based on an observed effect_
]

---

# Causal Inference

.center[_(figure not available)_]

---

# Causal Inference

.center[_(figure not available)_]

---

# Causal Inference

.center[_(figure not available)_]

--

.font150[Which question is harder to answer?]

---

# Causal Inference

.center[_(figure not available)_]

---

# Does X cause Y?
.font150[
* Does _X_ occur at the same time as _Y_?
* If _X_ goes up or down, does _Y_ also go up or down?
* If _X_ happens, does _Y_ also happen?
]

--

.font150[* _Are X and Y causally related because of that?_]

--

.font150[**NO**]

---

# Does X cause Y?

.center[_(figure not available)_]

---

# Does X cause Y?

.center[_(figure not available)_]

---

# Causal Inference - Notation

.font150[
* Treatment variable = _T_
* Two potential outcomes = _Y_ when _T = 0_ and _Y_ when _T = 1_
]

--

.font150[
* Example
  - Treatment: getting a university degree
  - Potential outcomes: salary with a university degree _(Y when T = 1)_ versus salary without a university degree _(Y when T = 0)_
]

---

# Causal Inference - Notation

.font150[
* The causal effect of the treatment _T_ is the difference in _Y_ with and without _T_
* `\(Y(T = 1) - Y(T = 0)\)` or `\(Y_{1} - Y_{0}\)`
]

.center[_(figure not available)_]

--

.font150[
* Why is that a problem?
]

---

# Fundamental Problem

.font150[
* _We can never observe Y under both T = 1 and T = 0 for the same unit at the same time_
* We only observe _one_ outcome for each unit; that is why it is called the _potential_ outcomes framework
]

---

# Fundamental Problem

.center[_(figure not available)_]

.center[[Holland, Paul (1986), p. 947](https://www.jstor.org/stable/2289064)]

---

# Sample Average Treatment Effect

.font150[
* Solution: we estimate the _average causal effect_ by comparing the groups that received and did not receive the treatment
* We call it the _Sample Average Treatment Effect_, or SATE
]

.center[_(figure not available)_]

.font150[
* But is that enough?
]

--

.font150[
* _Not quite_
]

---

# Randomisation

.font130[
* The best solution to the problem of selection bias (treated and untreated groups differing in ways other than the treatment) is _randomisation_
* If the researcher randomises the treatment, it is, on average, unrelated to any observable characteristic
* We also assume randomised treatment is uncorrelated with any _unobservable_ characteristic
* Randomised Controlled Trials (RCTs): the researcher creates treatment and control groups
* If all other characteristics are equivalent on average between the two groups, _the difference in outcomes between them can be attributed to the treatment_
]

---

# Randomisation

.center[_(figure not available)_]

---

# Internal versus External Validity

.font150[
* RCTs have very strong internal validity, that is, there is very little chance the result is derived from causes other than the treatment
* However, they may not generalise well. Why?
]

--

.font150[
* Samples may not reflect the whole population of interest
]

---

# Homework

.font150[
* Watch these three videos (they're all short):
  - <https://youtu.be/vtSCZcKXw1w>
  - <https://youtu.be/9j_HWkrSxzI>
  - <https://youtu.be/crpuBZv6XtA>
* Start `swirl()` CAUSALITY1
]

---

class: inverse, center, middle

# See you on Wednesday!

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>
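
---

# Appendix: Simulating Randomisation

The ideas of potential outcomes, the SATE, selection bias and randomisation can be sketched with a small simulation. This is only an illustration under invented assumptions, not part of the course material: it is written in Python (the course itself uses R), and the variable `ability`, the baseline salary of 30, the ability coefficient of 4 and the true effect of 5 are all hypothetical numbers chosen for the example. In a simulation we can see both potential outcomes for every unit, which is never possible with real data.

```python
import random
import statistics


def simulate(n=10_000, true_effect=5.0, seed=2019):
    """Return (SATE, self-selected estimate, randomised estimate)."""
    rng = random.Random(seed)

    # Hypothetical data: every unit has BOTH potential outcomes.
    # Real data never reveal both for the same unit.
    units = []
    for _ in range(n):
        ability = rng.gauss(0, 1)                # unobserved characteristic
        y0 = 30 + 4 * ability + rng.gauss(0, 1)  # salary without a degree (T = 0)
        y1 = y0 + true_effect                    # salary with a degree (T = 1)
        units.append((ability, y0, y1))

    # SATE: the average of Y1 - Y0 over the whole sample.
    sate = statistics.mean(y1 - y0 for _, y0, y1 in units)

    def difference_in_means(assignment):
        # Use only the outcome each unit would actually reveal.
        treated = [y1 for (_, _, y1), t in zip(units, assignment) if t]
        control = [y0 for (_, y0, _), t in zip(units, assignment) if not t]
        return statistics.mean(treated) - statistics.mean(control)

    # Self-selection: only high-ability units take the treatment.
    self_selected = [ability > 0 for ability, _, _ in units]
    # Randomisation: a fair coin flip decides who is treated.
    randomised = [rng.random() < 0.5 for _ in units]

    return (
        sate,
        difference_in_means(self_selected),
        difference_in_means(randomised),
    )


sate, naive, rct = simulate()
print(f"SATE (needs both outcomes): {sate:.2f}")
print(f"Self-selected comparison:   {naive:.2f}")
print(f"Randomised comparison:      {rct:.2f}")
```

By construction, the self-selected comparison overstates the effect, because high-ability units would earn more even without a degree. The randomised comparison should land close to the SATE, since the coin flip is unrelated to `ability`.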