This repository has been archived by the owner on Sep 20, 2023. It is now read-only.
---
title: "Creating multi-panel plots"
---
This notebook shows how to create a multi-panel plot similar to Figure 2 of @Fuehner2021.
The data have been saved as an Arrow-format file.
```{julia}
#| code-fold: true
using Arrow
using AlgebraOfGraphics
using CairoMakie # for displaying static plots
using DataFrames
using Statistics
using StatsBase
CairoMakie.activate!(; type="svg") # use SVG (other options include PNG)
datadir = joinpath(@__DIR__, "data");
```
```{julia}
tbl = Arrow.Table(joinpath(datadir, "fggk21.arrow"))
```
```{julia}
typeof(tbl)
```
```{julia}
df = DataFrame(tbl)
typeof(df)
```
## Creating a summary data frame
The response to be plotted is the mean z-score by `Test`, `Sex`, and `age`, where `age` is rounded to the nearest 0.1 years.
The first task is to round `age` to 1 digit after the decimal point, which can be done with `select` applied to a `DataFrame`.
In some ways this is the most complicated expression in creating the plot, so we will break it down.
`select` is applied to `df`, the `DataFrame` created from the `Arrow.Table`, `tbl`.
The conversion is necessary because an `Arrow.Table` is immutable but a `DataFrame` can be modified.
The arguments after the `DataFrame` describe how to modify the contents.
The first `:` indicates that all the existing columns should be included.
The other expressions can be pairs (created with the `=>` operator) of the form `:col => function` or of the form `:col => function => :newname`.
(See the [documentation of the DataFrames package](http://juliadata.github.io/DataFrames.jl/stable/) for details.)
In this case the function is the anonymous function `x -> round(x; digits=1)`, wrapped in `ByRow` so that it is applied to each element of the column. An equivalent form uses "dot-broadcasting", as in `x -> round.(x; digits=1)`, to apply `round` to the entire column (see [this documentation](https://docs.julialang.org/en/v1/manual/functions/#man-vectorized) for details).
Before the rounding step, the `transform!` and `select!` calls below add a centered age, `a1`, and the z-score of `score` within each `Test`, `zScore`; the rounding itself appears inside the `groupby` call further below.
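As a minimal sketch of the pair syntax, the following applies the same rounding to a small, made-up table. The name `toy` and its values are hypothetical and are not taken from the data; only the `DataFrames` package is assumed. It compares the `ByRow` form with the dot-broadcast form.

```julia
using DataFrames

# hypothetical toy table standing in for the real data
toy = DataFrame(; age=[7.96, 8.04, 8.26], score=[10, 12, 11])

# `:` keeps all columns; the pair replaces `age` by its rounded value
by_row = select(toy, :, :age => ByRow(x -> round(x; digits=1)) => :age)

# the same result with dot-broadcasting over the whole column
broadcasted = select(toy, :, :age => (x -> round.(x; digits=1)) => :age)

# `select` returns a new data frame, so `toy` itself is unchanged
by_row == broadcasted
```

Because `select` (unlike `select!`) does not modify its argument, the original `age` values remain available for other uses, such as fitting regression lines to the unrounded ages.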
```{julia}
transform!(df, :age, :age => (x -> x .- 8.5) => :a1) # centered age (linear)
select!(groupby(df, :Test), :, :score => zscore => :zScore) # z-score
tlabels = [ # establish order and labels of tbl.Test
"Run" => "Endurance",
"Star_r" => "Coordination",
"S20_r" => "Speed",
"SLJ" => "PowerLOW",
"BPT" => "PowerUP",
];
```
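The `select!` call on the grouped data frame standardizes `score` separately within each `Test`. A minimal sketch of that step on hypothetical toy data (the table `toy` and its values are invented for illustration) might look like this:

```julia
using DataFrames
using StatsBase  # provides zscore

# hypothetical toy table: two tests measured on very different scales
toy = DataFrame(;
    Test=["Run", "Run", "Run", "SLJ", "SLJ", "SLJ"],
    score=[1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
)

# zscore is applied to the `score` values of each group in turn;
# the result is stored in the parent data frame in the original row order
select!(groupby(toy, :Test), :, :score => zscore => :zScore)

toy.zScore
```

After this step both tests are on a common scale, with mean 0 and standard deviation 1 within each `Test`, which is what makes the panels comparable.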
The next stage is a *group-apply-combine* operation to group the rows by `Sex`, `Test`, and the rounded `age`, then apply `mean` to `zScore` and also apply `length` to `zScore` to record the number of observations in each group.
```{julia}
df2 = combine(
groupby(
select(df, :, :age => ByRow(x -> round(x; digits=1)) => :age),
[:Sex, :Test, :age],
),
:zScore => mean => :zScore,
:zScore => length => :n,
)
```
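The same group-apply-combine pattern can be traced on a small example. This is a sketch under stated assumptions: the table `toy`, its values, and the name `agg` are hypothetical, and `Test` is omitted to keep the example short.

```julia
using DataFrames
using Statistics

# hypothetical toy table with unrounded ages
toy = DataFrame(;
    Sex=["male", "male", "male", "female", "female"],
    age=[8.04, 7.96, 8.26, 8.01, 8.02],
    zScore=[0.2, 0.4, -0.1, 0.5, 0.7],
)

agg = combine(
    groupby(
        # round age first so that nearby ages fall into the same group
        select(toy, :, :age => ByRow(x -> round(x; digits=1)) => :age),
        [:Sex, :age],
    ),
    :zScore => mean => :zScore,   # group mean
    :zScore => length => :n,      # group size
)
```

Here the five rows collapse into three groups: two males at age 8.0, one male at age 8.3, and two females at age 8.0.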
## Creating the plot
The `AlgebraOfGraphics` package uses the operators `*` and `+` to combine the results of functions such as `data` (specify the data table to be used), `mapping` (designate the roles of columns), and `visual` (type of visual presentation).
```{julia}
let
design = mapping(:age, :zScore; color=:Sex, col=:Test)
lines = design * linear()
means = design * visual(Scatter; markersize=5)
draw(data(df2) * means + data(df) * lines)
end
```
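One possible way to relabel factor levels and to set the axis range and tick marks is sketched below on hypothetical toy data. It is not necessarily the approach the authors intend. It assumes that the `renamer` helper of `AlgebraOfGraphics` accepts a vector of pairs and that `draw` passes the `axis` keyword through to the Makie axes. The level names `"male"` and `"female"` are assumptions about how `Sex` is coded.

```julia
using AlgebraOfGraphics
using CairoMakie
using DataFrames

# hypothetical toy summary table; the level names are assumed
toy = DataFrame(;
    age=repeat([8.0, 8.5, 9.0]; outer=4),
    zScore=[-0.2, 0.0, 0.3, -0.4, -0.1, 0.1, 0.1, 0.2, 0.4, -0.1, 0.0, 0.2],
    Sex=repeat(["male", "female"]; inner=3, outer=2),
    Test=repeat(["Run", "SLJ"]; inner=6),
)

# the order of the pairs sets the order of levels; the values are the labels
sexlabels = ["male" => "Boys", "female" => "Girls"]
testlabels = ["Run" => "Endurance", "SLJ" => "PowerLOW"]

plt = data(toy) *
    mapping(
        :age, :zScore;
        color=:Sex => renamer(sexlabels),
        col=:Test => renamer(testlabels),
    ) *
    visual(Scatter; markersize=5)

# x-axis range 7.8 to 9.2 with ticks at 8, 8.5 and 9; y limits left automatic
fg = draw(plt; axis=(; xticks=[8.0, 8.5, 9.0], limits=((7.8, 9.2), nothing)))
```

In the notebook itself, the `tlabels` vector defined earlier could presumably play the role of `testlabels`.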
- TBD: Relabel factor levels (Boys, Girls; fitness components for Test)
- TBD: Relevel factors; why not levels from Tables?
- TBD: Set range (7.8 to 9.2) and tick marks (8, 8.5, 9) of axes.
- TBD: Move legend in plot?
::: {#refs}
:::