-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
190 lines (153 loc) · 6.13 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
---
title: "Japan suicide cases reported by National Police Agency"
author: "Mitsuo Shiota"
date: "2020-12-19"
output:
github_document:
toc: TRUE
editor_options:
chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = FALSE,
fig.width = 7,
fig.asp = 0.618,
out.width = "90%"
)
library(tidyverse)
library(lubridate)
library(readxl)
library(tsibble)
library(tqr)
library(scales)
```
Updated: `r Sys.Date()`
## Motivation
I was watching News Watch 9, an NHK nightly news program, on December 16, 2020. It told suicide cases have been rapidly surging since July 2020 in Japan by showing a chart, which I will replicate later. I felt the apparent rapid surge might be an exaggeration.
## Get data from National Police Agency
I make a csv file manually from the data in [National Police Agency site](https://www.npa.go.jp/publications/statistics/safetylife/jisatsu.html). I can get monthly suicide cases, not separated by gender but total, since January 2008.
```{r input, message=FALSE}
npa <- read_csv("data/npa.csv")
npa_data <- npa %>%
pivot_longer(!year, names_to = "month", values_to = "cases") %>%
mutate(
year = as.integer(year),
month = as.integer(month),
time = make_date(year, month, 01L)
) %>%
arrange(time)
head(npa_data)
```
## Replicate the chart
I watch the program again in [NHK plus site](https://plus.nhk.jp/), which requires registration, and replicate the chart below, though the original chart did not include November and December 2020. Look at the 2020 line, and find it exceeds the Mean (2017-2019) line since July.
Yes, the 2020 line shows a rapid surge up to October. But, why was it significantly below the Mean (2017-2019) line from April to June in the first place? I remember National Declaration of State of Emergency to fight Covid-19 was effective from April 15 to May 24. As "month" indicates the timing of discovery, not of commitment, in the National Police Agency data, I suspect some cases were not discovered in the emergency period, and were discovered later. This may explain some part of excess from July to October.
```{r replicate, warning=FALSE}
df_last_3_yr_avg <- npa_data %>%
filter(time >= "2017-01-01", time <= "2019-12-01") %>%
select(!time) %>%
pivot_wider(names_from = month, values_from = cases) %>%
summarize(across(!year, mean)) %>%
pivot_longer(everything(), names_to = "month", values_to = "last_3_yr_avg") %>%
mutate(month = as.integer(month))
df_10_12 <- npa_data %>%
filter(time >= "2020-10-01", time <= "2020-12-01") %>%
select(c(month, cases))
npa_data %>%
filter(time >= "2020-01-01", time <= "2020-10-01") %>%
select(c(month, cases)) %>%
right_join(df_last_3_yr_avg, by = "month") %>%
rename(
`2020` = cases,
`Mean (2017-2019)` = last_3_yr_avg
) %>%
pivot_longer(!month, names_to = "year", values_to = "cases") %>%
ggplot() +
geom_line(aes(month, cases, color = year)) +
geom_line(data = df_10_12, aes(month, cases), linetype = "dashed", color = "red") +
scale_color_manual(values = c("red", "blue")) +
scale_x_continuous(breaks = 1:12) +
scale_y_continuous(labels = comma) +
labs(
title = "Suicide cases",
y = NULL, color = NULL
)
```
## Suicide cases are bouncing back, but not as fast as the program suggests
As I read the book "Towards Evidence-Based Suicide Prevention: Perspectives from Economics and Political Science", co-authored by the original chart creator in the program, I know that the unemployed are more likely to commit suicide than the employed, and that unemployment rates are highly correlated with suicide cases.
As Japan has been officially in recession since October 2018, the unemployment rates hit the bottom in late 2019, and began to climb in 2020.
```{r recession}
recessions_df = tribble(
~Peak, ~Trough,
#----------|----------
# "1951-06-01", "1951-10-01",
# "1954-01-01", "1954-11-01",
# "1957-06-01", "1958-06-01",
# "1961-12-01", "1962-10-01",
# "1964-10-01", "1965-10-01",
# "1970-07-01", "1971-12-01",
# "1973-11-01", "1975-03-01",
# "1977-01-01", "1977-10-01",
# "1980-02-01", "1983-02-01",
# "1985-06-01", "1986-11-01",
# "1991-02-01", "1993-10-01",
# "1997-05-01", "1999-01-01",
# "2000-11-01", "2002-01-01",
"2008-02-01", "2009-03-01",
"2012-03-01", "2012-11-01",
"2018-10-01", "2020-05-01"
)
recessions_df <- recessions_df %>%
mutate(
Peak = as.Date(Peak),
Trough = as.Date(Trough)
)
```
```{r unemployment_chart, message=FALSE, results=FALSE}
tf <- tempfile(fileext = ".xlsx") # lt01-a10.xlsx
url <- "https://www.e-stat.go.jp/stat-search/file-download?statInfId=000031831358&fileKind=0"
httr::GET(url, httr::write_disk(tf))
unemployment_rate <- read_excel(tf, col_names = FALSE, skip = 9)
unemployment_rate <- unemployment_rate[, c(1:2, 20)]
names(unemployment_rate) <- c("year", "month", "percent")
unemployment_rate$time <- (as.Date("1953-01-01") + months(0:(nrow(unemployment_rate) - 1)))
unemployment_rate <- unemployment_rate %>%
drop_na(percent)
unemployment_rate %>%
select(c(time, percent)) %>%
filter(time >= "2008-01-01") %>%
ggplot(aes(time, percent)) +
geom_line() +
geom_rect(data = recessions_df, inherit.aes = FALSE,
aes(xmin = Peak, xmax = Trough, ymin = -Inf, ymax = +Inf),
fill='darkgray', alpha=0.5) +
labs(
title = "Unemployment rates (seasonally-adjusted)",
x = NULL
)
```
I draw a chart of both original and seasonally-adjusted cases since 2008. Suicide cases indeed hit the bottom in the beginning of 2020, and are bouncing back, even though I consider the fluctuations in 2020 should be smoothed out.
```{r simple_chart, fig.asp=0.7}
npa_ts <- npa_data %>%
drop_na(cases) %>%
select(c(time, cases)) %>%
mutate(
key = "original",
time = yearmonth(time)
) %>%
as_tsibble(key = key, index = time)
npa_sa <- npa_ts %>%
tq_sa() %>%
mutate(key = "seasonally adjusted")
npa_ts %>%
bind_rows(npa_sa) %>%
ggplot(aes(time, cases, color = key)) +
geom_line() +
scale_y_continuous(labels = comma) +
labs(
title = "Suicide cases",
x = NULL, y = NULL, color = NULL
) +
theme(legend.position = "bottom")
```
EOL