---
title: "Getting started with matrix kinship models in R using DemoKin"
author: "Instructor: Diego Alburez-Gutierrez (MPIDR)"
date: "The Formal Demography of Kinship: Theory and Application - Dutch Demography Day; Nov 16 2022"
output:
  github_document:
    pandoc_args: --webtex
    toc: true
    toc_depth: 1
bibliography: kinship.bib
---
<img src="DemoKin-Logo.png" align="right" width="200" />
# 1. Installation
Install the [development version from GitHub](https://github.com/IvanWilli/DemoKin) (this can take about a minute). We made changes to the `DemoKin` package ahead of this workshop. If you had already installed the package, please uninstall it and install it again.
```{r, eval=FALSE}
# remove.packages("DemoKin")
# install.packages("devtools")
devtools::install_github("IvanWilli/DemoKin", build_vignettes = TRUE)
```
Load packages:
```{r, warning=F, message=FALSE}
library(DemoKin)
library(dplyr)
library(tidyr)
library(ggplot2)
library(fields)
```
# 2. Built-in data
The `DemoKin` package includes data from Sweden as an example.
The data comes from the [Human Mortality Database](https://www.mortality.org/) and [Human Fertility Database](https://www.humanfertility.org/).
These datasets were loaded using the `DemoKin::get_HMDHFD` function.
### 2.1. `swe_px` matrix; survival probabilities by age (DemoKin's *U* argument)
This is what the data looks like:
```{r}
data("swe_px", package="DemoKin")
swe_px[1:4, 1:4]
```
And plotted over time and age:
```{r}
image.plot(
  x = as.numeric(colnames(swe_px))
  , y = 0:nrow(swe_px)
  , z = t(as.matrix(swe_px))
  , xlab = "Year"
  , ylab = "Age"
  , legend.lab = "Survival probability"
)
```
### 2.2. `swe_asfr` matrix; age-specific fertility rates (DemoKin's *f* argument)
This is what the data looks like:
```{r}
data("swe_asfr", package="DemoKin")
swe_asfr[15:20, 1:4]
```
And plotted over time and age:
```{r}
image.plot(
  x = as.numeric(colnames(swe_asfr))
  , y = 0:nrow(swe_asfr)
  , z = t(as.matrix(swe_asfr))
  , xlab = "Year"
  , ylab = "Age"
  , legend.lab = "Age-specific fertility (f)"
)
```
# 3. The function `kin()`
`DemoKin` can be used to compute the number and age distribution of Focal's relatives, both living and deceased, under a range of assumptions. The function `DemoKin::kin()` currently does most of the heavy lifting in terms of implementing matrix kinship models.
This is what it looks like in action, in this case assuming time-invariant demographic rates:
```{r}
# First, get vectors for a given year
swe_surv_2015 <- DemoKin::swe_px[,"2015"]
swe_asfr_2015 <- DemoKin::swe_asfr[,"2015"]
# Run kinship models
swe_2015 <- kin(U = swe_surv_2015, f = swe_asfr_2015, time_invariant = TRUE)
```
## 3.1. Arguments
- **U** numeric. A vector (atomic) or matrix of survival probabilities with rows as ages (and columns as years in case of matrix).
- **f** numeric. Same as U but for fertility rates.
- **time_invariant** logical. Assume time-invariant rates. Default TRUE.
- **output_kin** character. Kin types to return: "m" for mother, "d" for daughter, ...
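
The two shapes accepted for `U` and `f` can be sketched with a small made-up matrix. The object `toy_U` and its values are hypothetical (they are not `DemoKin` data); only the rows-as-ages, columns-as-years layout follows the argument description above.

```{r}
# Hypothetical survival probabilities: 3 ages (rows) by 2 years (columns)
toy_U <- matrix(
  c(0.99, 0.98, 0.97,      # first column: year 2014
    0.995, 0.985, 0.975),  # second column: year 2015
  nrow = 3,
  dimnames = list(as.character(0:2), c("2014", "2015"))
)
# Selecting a single year by column name returns a plain named numeric vector,
# the shape used above together with time_invariant = TRUE
toy_U_2015 <- toy_U[, "2015"]
is.matrix(toy_U)       # matrix input: one column per year
is.vector(toy_U_2015)  # vector input: a single set of rates
```

The same column-name subsetting is what `DemoKin::swe_px[,"2015"]` does in the call to `kin()` shown earlier.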
## 3.2. Relative types
Relatives for the `output_kin` argument are identified by a unique code.
Note that the relationship codes used in `DemoKin` differ from those in Caswell [-@caswell_formal_2019].
The equivalence between the two sets of codes is given in the following table:
```{r}
demokin_codes()
```
## 3.3. Value
`DemoKin::kin()` returns a list containing two data frames: `kin_full` and `kin_summary`.
```{r}
str(swe_2015)
```
### `kin_full`
This data frame contains expected kin counts by year (or cohort), age of Focal, and age of kin.
```{r}
head(swe_2015$kin_full)
```
### `kin_summary`
This is a 'summary' data frame derived from `kin_full`. It is obtained by summing over all ages of kin, which gives expected kin counts by year (or cohort) and age of Focal, but *not* by age of kin.
This is how the `kin_summary` object is derived:
```{r, message=F}
kin_by_age_focal <-
  swe_2015$kin_full %>%
  group_by(cohort, kin, age_focal) %>%
  summarise(count = sum(living)) %>%
  ungroup()
# Check that they are identical (for living kin only here)
kin_by_age_focal %>%
  select(cohort, kin, age_focal, count) %>%
  identical(
    swe_2015$kin_summary %>%
      select(cohort, kin, age_focal, count = count_living) %>%
      arrange(cohort, kin, age_focal)
  )
```
# 4. Example: kin counts in time-invariant populations
Following Caswell [-@caswell_formal_2019], we assume a closed female population in which everyone experiences the Swedish 2015 mortality and fertility rates at each age throughout their life.
We then ask:
> How can we characterize the kinship network of an average member of the population (call her 'Focal')?
For this exercise, we'll use the pre-loaded Swedish data.
```{r}
# First, get vectors for a given year
swe_surv_2015 <- DemoKin::swe_px[,"2015"]
swe_asfr_2015 <- DemoKin::swe_asfr[,"2015"]
# Run kinship models
swe_2015 <- kin(U = swe_surv_2015, f = swe_asfr_2015, time_invariant = TRUE)
```
## 4.1. 'Keyfitz' kinship diagram
We can visualize the implied kin counts for a Focal woman aged 35 in a time-invariant population using a network or 'Keyfitz' kinship diagram [@Keyfitz2005] with the `plot_diagram` function:
```{r, fig.height=10, fig.width=12}
swe_2015$kin_summary %>%
  filter(age_focal == 35) %>%
  select(kin, count = count_living) %>%
  plot_diagram(rounding = 2)
```
## 4.2. Living kin
Now, let's visualize how the expected number of daughters, siblings, cousins, etc., changes over the lifecourse of Focal (now, with full names to identify each relative type using the function `DemoKin::rename_kin()`).
```{r, fig.height=6, fig.width=8}
swe_2015$kin_summary %>%
  rename_kin() %>%
  ggplot() +
  geom_line(aes(age_focal, count_living)) +
  geom_vline(xintercept = 35, color = 2) +
  theme_bw() +
  labs(x = "Focal's age") +
  facet_wrap(~kin)
```
> Note that we are working in a time-invariant framework. You can think of the results as analogous to life expectancy (i.e., expected years of life for a synthetic cohort experiencing a given set of period mortality rates).
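
The life expectancy analogy can be made concrete with a toy period life table. The survival probabilities below are invented for illustration, and deaths are assumed to occur, on average, halfway through each age interval.

```{r}
# Hypothetical period survival probabilities at ages 0, 1, 2, 3
# (the final 0 means nobody survives past exact age 4)
toy_px <- c(0.9, 0.8, 0.5, 0)
# Survivors to exact ages 0..4 in a synthetic cohort exposed to these rates for life
toy_lx <- cumprod(c(1, toy_px))
# Person-years lived in each age interval, assuming deaths fall mid-interval
toy_Lx <- (head(toy_lx, -1) + tail(toy_lx, -1)) / 2
# Period life expectancy at birth
toy_e0 <- sum(toy_Lx)
toy_e0
```

Expected kin counts from a time-invariant model are read in the same way: they describe a hypothetical woman who lives her whole life under one year's rates, not an actual birth cohort.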
How does overall family size (and family composition) vary over life for an average woman who survives to each age?
```{r}
counts <-
  swe_2015$kin_summary %>%
  group_by(age_focal) %>%
  summarise(count = sum(count_living)) %>%
  ungroup()
swe_2015$kin_summary %>%
  select(age_focal, kin, count_living) %>%
  rename_kin(., consolidate_column = "count_living") %>%
  ggplot(aes(x = age_focal, y = count)) +
  geom_area(aes(fill = kin), colour = "black") +
  geom_line(data = counts, size = 2) +
  labs(x = "Focal's age", y = "Number of living female relatives") +
  coord_cartesian(ylim = c(0, 6)) +
  theme_bw() +
  theme(legend.position = "bottom")
```
## 4.3. Age distribution of living kin
How old are Focal's relatives? Using the `kin_full` data frame, we can visualize the age distribution of Focal's relatives throughout Focal's life. For example, what are the ages of her relatives when Focal is 35?
```{r, fig.height=6, fig.width=8}
swe_2015$kin_full %>%
  DemoKin::rename_kin() %>%
  filter(age_focal == 35) %>%
  ggplot() +
  geom_line(aes(age_kin, living)) +
  geom_vline(xintercept = 35, color = 2) +
  labs(y = "Expected number of living relatives") +
  theme_bw() +
  facet_wrap(~kin)
```
## 4.4. Deceased kin
We have focused on living kin, but what about relatives who have died during Focal's life?
The output of `kin` also includes information on kin deaths experienced by Focal.
We start by considering the number of kin deaths that Focal can expect to experience at each age. In other words, the non-cumulative number of deaths in the family that Focal experiences at a given age.
```{r}
loss1 <-
  swe_2015$kin_summary %>%
  filter(age_focal > 0) %>%
  group_by(age_focal) %>%
  summarise(count = sum(count_dead)) %>%
  ungroup()
swe_2015$kin_summary %>%
  filter(age_focal > 0) %>%
  group_by(age_focal, kin) %>%
  summarise(count = sum(count_dead)) %>%
  ungroup() %>%
  rename_kin(., consolidate_column = "count") %>%
  ggplot(aes(x = age_focal, y = count)) +
  geom_area(aes(fill = kin), colour = "black") +
  geom_line(data = loss1, size = 2) +
  labs(x = "Focal's age", y = "Number of kin deaths experienced at each age") +
  coord_cartesian(ylim = c(0, 0.086)) +
  theme_bw() +
  theme(legend.position = "bottom")
```
Now, we combine all kin types to show the cumulative burden of kin death for an average member of the population surviving to each age:
```{r}
loss2 <-
  swe_2015$kin_summary %>%
  group_by(age_focal) %>%
  summarise(count = sum(count_cum_dead)) %>%
  ungroup()
swe_2015$kin_summary %>%
  group_by(age_focal, kin) %>%
  summarise(count = sum(count_cum_dead)) %>%
  ungroup() %>%
  # Spell out the full argument name rather than relying on partial matching
  rename_kin(., consolidate_column = "count") %>%
  ggplot(aes(x = age_focal, y = count)) +
  geom_area(aes(fill = kin), colour = "black") +
  geom_line(data = loss2, size = 2) +
  labs(x = "Focal's age", y = "Number of kin deaths experienced (cumulative)") +
  theme_bw() +
  theme(legend.position = "bottom")
```
A member of the population aged 15, 50, and 65 will have experienced, on average, the death of `r loss2 %>% filter(age_focal %in% c(15, 50, 65)) %>% pull(count) %>% round(1) %>% paste(., collapse = ", ")` relatives, respectively.
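
The link between the per-age and the cumulative counts can be sketched with toy numbers. This assumes that the cumulative count at a given age is the running sum of the age-specific counts up to that age; the values below are invented and are not `DemoKin` output.

```{r}
# Hypothetical expected kin deaths experienced at Focal's ages 0, 1, 2, 3
toy_dead <- c(0.01, 0.02, 0.03, 0.04)
# Running total: deaths experienced by the time Focal reaches each age
toy_cum_dead <- cumsum(toy_dead)
data.frame(age_focal = 0:3, count_dead = toy_dead, count_cum_dead = toy_cum_dead)
```

Under that assumption, the cumulative curve can never decrease with Focal's age, even when the per-age counts rise and fall.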
# 5. Exercises
**For all exercises, assume time-invariant rates at the 2010 levels in Sweden and a female-only population. All exercises can be completed using datasets included in DemoKin.**
## Exercise 1. Offspring availability and loss
Use `DemoKin` to explore offspring survival and loss for mothers, under the assumptions stated above.
**Answer**: What is the expected number of surviving offspring for an average woman aged 65?
```{r}
# Write your code here
```
**Answer**: What is the cumulative number of offspring deaths experienced by an average woman who survives to age 65?
```{r}
# Write your code here
```
## Exercise 2. Mean age of kin
The output of `DemoKin::kin` includes information on the mean age of Focal's relatives and its standard deviation (in the columns `kin_summary$mean_age` and `kin_summary$sd_age`). For example, this allows us to determine the mean age of Focal's sisters over the lifecourse of Focal:
```{r}
swe_2015$kin_summary %>%
  filter(kin %in% c("os", "ys")) %>%
  rename_kin() %>%
  select(kin, age_focal, mean_age, sd_age) %>%
  pivot_longer(mean_age:sd_age) %>%
  ggplot(aes(x = age_focal, y = value, colour = kin)) +
  geom_line() +
  # One panel for the mean and one for the SD, each with its own y scale
  facet_wrap(~name, scales = "free") +
  labs(y = "Age of sister(s) (years)") +
  theme_bw()
```
**Instructions**
Compute the mean and SD of the age of Focal's sisters over the ages of Focal **by hand** (i.e., using the raw output in `kin_full`). Plot the results (1) separately for younger and older sisters, and (2) for all sisters combined.
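
As a hint, both quantities are weighted by the expected number of living kin at each age of kin. A minimal sketch with invented numbers follows; the `toy_*` objects are hypothetical and stand in for a single `age_focal` and `kin` group of `kin_full`. The SD uses the population (not sample) form, which is an assumption and may not match `kin_summary$sd_age` exactly.

```{r}
# Hypothetical ages of kin and expected numbers of living kin at those ages
toy_age_kin <- c(30, 35, 40)
toy_living <- c(0.2, 0.5, 0.3)
# Weighted mean age: each age counts in proportion to the expected kin at that age
toy_mean <- sum(toy_living * toy_age_kin) / sum(toy_living)
# Weighted SD: square root of E[age^2] - (E[age])^2 under the same weights
toy_sd <- sqrt(sum(toy_living * toy_age_kin^2) / sum(toy_living) - toy_mean^2)
c(mean_age = toy_mean, sd_age = toy_sd)
```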
First, get mean and SD of ages of sisters distinguishing between younger and older sisters:
```{r}
# Write your code here
```
Second, get ages of all sisters, irrespective of whether they are older or younger:
```{r}
# Write your code here
```
## Exercise 3. Living mother
What is the probability that Focal (an average Swedish woman) has a living mother over Focal's life?
**Instructions**
Use `DemoKin` to obtain $M_1(a)$, the probability that Focal has a living mother at age $a$ in a stable population. Conditional on Focal's survival, $M_1(a)$ can be thought of as a survival probability in a life table: it equals one when $a = 0$ (the mother is alive when she gives birth) and decreases monotonically toward zero.
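
For reference, a classical way of writing this quantity (following Goodman, Keyfitz, and Pullum) averages the mother's conditional survival over her possible ages at Focal's birth. The symbols $l(x)$ (survivorship to age $x$), $W(x)$ (the age distribution of mothers at childbirth), and the reproductive age limits $\alpha$ and $\beta$ are introduced here for illustration only and are not `DemoKin` arguments:

$$
M_1(a) = \int_{\alpha}^{\beta} \frac{l(x+a)}{l(x)} \, W(x) \, dx
$$

Setting $a = 0$ gives $M_1(0) = \int_{\alpha}^{\beta} W(x) \, dx = 1$, and because $l(x+a)$ cannot increase with $a$, neither can $M_1(a)$. Since Focal has exactly one mother, the expected number of living mothers at age $a$ is also the probability that she is alive.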
**Answer:** What is the probability that Focal has a living mother when Focal turns 70 years old?
```{r}
# Write your code here
```
## Exercise 4. Sandwich Generation
The 'Sandwich Generation' refers to persons who are squeezed between frail older parents and young dependent children and are assumed to have simultaneous care responsibilities for multiple generations, potentially limiting their ability to provide care. In demography, 'sandwichness' is a generational process that depends on the genealogical position of an individual vis-a-vis their ascendants and descendants.
For this exercise, we consider an individual to be 'sandwiched' if they have at least one child aged $15$ or younger and a parent within $5$ years of death. Alburez-Gutierrez, Mason, and Zagheni [-@alburezgutierrez_sandwich_2021] defined the probability that an average woman aged $a$ is 'sandwiched' in a stable female population as:
$$
S(a) = \underbrace{\left(1 - \prod_{x=1}^{15} \left[1 - m_{a-x}\right] \right)}_{\substack{\text{fertility risk in the}\\ \text{$15$ years preceding age 'a'}}} \times \underbrace{M_1(a)}_{\substack{\text{Prob. that mother of ego}\\ \text{is alive when ego is 'a' years old}}} \times \underbrace{\left(1- \frac{M_1(a+5)}{M_1(a)}\right)}_{\substack{\text{Prob. that mother of ego}\\ \text{would die within $5$ years}}}
$$
where
- $m_{a-x}$ is the fertility of women at age $a-x$, and
- $M_1(a)$ is the probability of having a living mother at age $a$ in a stable population.
These estimates refer to an average woman in a female population, ignoring the role of offspring mortality.
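
The arithmetic of the formula above can be sketched on invented inputs before turning to real data. Everything named `toy_*`, as well as the helper `sandwich_prob()`, is hypothetical: the fertility and maternal survival vectors are made up, are indexed by single years of age starting at 0, and are not produced by `DemoKin`.

```{r}
# Hypothetical inputs by single year of age, 0 to 100
toy_ages <- 0:100
toy_m <- ifelse(toy_ages >= 20 & toy_ages <= 39, 0.05, 0)  # flat fertility at ages 20-39
toy_M1 <- pmax(0, 1 - toy_ages / 80)                        # linear decline, reaches 0 at age 80

# S(a): had a child in the last 15 years, mother alive now, mother dies within 5 years
sandwich_prob <- function(a, m, M1, child_window = 15, death_window = 5) {
  # Age x is stored at position x + 1; ages below 0 do not exist and are dropped
  ages_back <- a - seq_len(child_window)
  ages_back <- ages_back[ages_back >= 0]
  p_child <- 1 - prod(1 - m[ages_back + 1])
  p_mother_alive <- M1[a + 1]
  # Requires M1[a + 1] > 0, which holds for the toy values at ages 15 to 75
  p_mother_dies <- 1 - M1[a + death_window + 1] / M1[a + 1]
  p_child * p_mother_alive * p_mother_dies
}

toy_S <- sapply(15:75, sandwich_prob, m = toy_m, M1 = toy_M1)
round(sandwich_prob(40, toy_m, toy_M1), 4)
```

For the exercise itself, the two inputs would instead come from the 2010 Swedish fertility rates and from the probability of having a living mother derived from the `kin()` output.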
**Instructions**
Use DemoKin to compute the probability that Focal is sandwiched, $S(a)$, between ages 15 and 75. Assume time-invariant rates at the 2010 levels and a female-only population.
**Answer:** At which age is Focal at the highest risk of finding herself sandwiched between young dependent children and frail older parents?
```{r}
# Write your code here
```
# 6. Vignette and extensions
For more details on `DemoKin`, including extensions to time-varying rates, deceased kin, and multi-state models, see `vignette("Reference", package = "DemoKin")`. If the vignette does not load, you may need to reinstall the package with `devtools::install_github("IvanWilli/DemoKin", build_vignettes = TRUE)`.
For a detailed description of extensions of the matrix kinship model, see:
- time-invariant rates [@caswell_formal_2019],
- multistate models [@caswell_formal_2020],
- time-varying rates [@caswell_formal_2021], and
- two-sex models [@caswell_formal_2022].
# 7. Appendix
## 7.1. Mean age of living kin
The output of the `DemoKin::kin()` function can also be used to determine the mean age of Focal's relatives by kin type.
For simplicity, let's focus on a Focal aged 35 and get the mean age (and standard deviation) of her relatives in our time-invariant population.
```{r}
ages_df <-
  swe_2015$kin_summary %>%
  filter(age_focal == 35) %>%
  select(kin, mean_age, sd_age)
ma <-
  ages_df %>%
  filter(kin == "m") %>%
  pull(mean_age) %>%
  round(1)
sda <-
  ages_df %>%
  filter(kin == "m") %>%
  pull(sd_age) %>%
  round(1)
print(paste0("The mother of a 35-yo Focal woman in our time-invariant population is, on average, ", ma," years old, with a standard deviation of ", sda," years."))
```
## 7.2. Visualizing living kin
Finally, let's visualize the living kin by type and mean age during Focal's life course:
```{r}
swe_2015$kin_full %>%
  filter(kin %in% c("m", "gm", "d", "gd")) %>%
  rename_kin() %>%
  group_by(age_focal, kin) %>%
  summarise(
    count = sum(living)
    , mean_age = sum(living * age_kin, na.rm = TRUE) / sum(living)
  ) %>%
  ggplot(aes(age_focal, mean_age)) +
  geom_point(aes(size = count, color = kin)) +
  geom_line(aes(color = kin)) +
  scale_x_continuous(name = "Age Focal", breaks = seq(0, 100, 10), labels = seq(0, 100, 10)) +
  scale_y_continuous(name = "Age kin", breaks = seq(0, 100, 10), labels = seq(0, 100, 10)) +
  geom_segment(x = 0, y = 0, xend = 100, yend = 100, color = 1, linetype = 2) +
  labs(color = "Relative", size = "Living") +
  coord_fixed() +
  theme_bw()
```
# 8. Session info
```{r}
sessionInfo()
```
## References