-
Notifications
You must be signed in to change notification settings - Fork 9
/
EX1_Rintro.Rmd
231 lines (156 loc) · 6.1 KB
/
EX1_Rintro.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
---
title: "DEM 7273 - Example 1 - Introduction to R"
author: "Corey Sparks, PhD"
date: "August 23, 2017"
output:
html_document:
keep_md: no
html_notebook:
toc: yes
---
#Welcome to R.
R is an interpreted languages, not a compiled one. This means, you type something into R and it does it. There is no data step. There are no procs. The SAS and R book is very useful for going between the two programs.
R uses libraries to do different types of analysis, so we will need to install lots of different libraries to do different things. These need to be downloaded from the internet, using the `install.packages()` command. You only need to install a package once. E.g.
`install.packages("car")`
will install the lme4 library. To use the functions within it, type
`library(car)`
Now you have access to those functions.
I strongly recommend you install several packages prior to us beginning. I've written a short script on Github you can use it by running:
```{r, eval=FALSE}
source("https://raw.githubusercontent.com/coreysparks/Rcode/master/install_first_7223.R")
```
Below we will go through a simple R session where we introduce some concepts that are important for R.
###R is a calculator
```{r}
#addition and subtraction
3+7
3-7
#multiplication and division
3*7
3/7
#powers
3^2
3^3
#functions
log(3/7)
exp(3/7)
sin(3/7)
```
###Variables and objects
In R we assign values to objects (object-oriented programming). These can generally have any name, but some names are reserved for R. For instance you probably wouldn't want to call something 'mean' because there's a 'mean()' function already in R. For instance:
```{r}
x<-3
y<-7
x+y
x*y
log(x*y)
```
##vectors
R thinks everything is a matrix, or a vector, meaning a row or column of numbers, or characters. One of R's big selling points is that much of it is completely vectorized. Meaning, I can apply an operation along all elements of a vector without having to write a loop. For example, if I want to multiply a vector of numbers by a constant, in SAS, I could do:
#for (i in 1 to 5)
# x[i]<-y[i]*5
#end
but in R, I can just do:
```{r}
x<-c(3, 4, 5, 6, 7)
#c() makes a vector
y<-7
x*y
```
R is also very good about using vectors, let's say I wanted to find the third element of x:
```{r}
x[3]
#or if I want to test if this element is 10
x[3]==10
x[3]!=10
#of is it larger than another number:
x[3]>3
#or is any element of the whole vector greater than 3
x>3
```
If you want to see what's in an object, use `str()`, for `str`ucture
```{r}
str(x)
```
and we see that x is numeric, and has those values.
We can also see different characteristics of x
```{r}
#how long is x?
length(x)
#is x numeric?
is.numeric(x)
is.character(x)
#is any element of x missing?
is.na(x)
xc<-c("1","2")
#now i'll modify x
x<-c(x, NA) #combine x and a missing value ==NA
x
is.na(x)
```
##replacing elements of vectors
Above, we had a missing value in X, let's say we want to replace it with another value:
```{r}
x<-ifelse(test = is.na(x)==T, yes = sqrt(7.2), no = x)
x
```
Done!
##Dataframes
Traditionally, R organizes variables into data frames, these are like a spreadsheet. The columns can have names, and the dataframe itself can have data of different types. Here we make a short data frame with three columns, two numeric and one character:
```{r}
mydat<-data.frame(
x=c(1,2,3,4,5),
y=c(10, 20, 35, 57, 37),
group=c("A", "A" ,"A", "B", "B")
)
#See the size of the dataframe
dim(mydat)
length(mydat$x)
#Open the dataframe in a viewer and just print it
View(mydat)
print(mydat)
```
##Real data
Now let's open a 'real' data file. This is the [2008 World population data sheet](http://www.prb.org/Publications/Datasheets/2008/2008wpds.aspx) from the [Population Reference Bureau](http://www.prb.org). It contains summary information on many demographic and population level characteristics of nations around the world in 2008.
I've had this entered into a **Comma Separated Values** file by some poor previous GRA of mine and it lives happily on Github now for all the world to see. CSV files are a good way to store data coming out of a spreadsheet. R can read Excel files, but it digests text files easier. Save something from Excel as CSV.
I can read it from github directly by using a function in the `readr` library:
```{r, echo=F, results='hide', message=F, tidy=TRUE}
library(readr)
prb<-read_csv(file = "https://raw.githubusercontent.com/coreysparks/data/master/PRB2008_All.csv")
#names(prb) #print the column names
#View(prb) #open it in a viewer
```
That's handy. If the file lived on our computer, I could read it in like so:
*note, please make a folder on your computer so you can store things for this class in a single location!!!! Organization is Key to Success in Graduate School*
```{r}
#prb<-read_csv("C:/Users/ozd504/Google Drive/classes/dem7273/class_17/data/PRB2008_All.csv")
```
Same result.
The `haven` library can read files from other statistical packages easily, so if you have data in Stata, SAS or SPSS (barf!), you can read it into R using those functions, for example, the `read_dta()` function reads stata files.
Don't know what a function's called use ??
`??stata`
`??csv`
and Rstudio will show you a list of functions that have these strings in them.
What if you know the function name, like `read_csv()` but you want to see all the function arguments?
`?read_csv`
will open up the help file for that specific function
###Save a file
Want to save something as a R data file? Use `save()`
```{r}
#save(prb, file="C:/Users/ozd504/Google Drive/classes/dem7273/class_17/data/prb_2008.Rdata")
```
If you have an R data file, use `load()` to open it:
```{r}
#load("C:/Users/ozd504/Google Drive/classes/dem7273/class_17/data/prb_2008.Rdata")
```
Let's have a look at some descriptive information about the data:
```{r}
#Frequency Table of # of Contries by Continent
table(prb$Continent)
#basic summary statistics for the variable TFR or the total fertility rate
summary(prb$TFR)
prb$Country[is.na(prb$e0Total)]
subset(prb, is.na(prb$e0Total))
```
There is one country missing the Total fertility rate variable. The minimum is 1 and the maximum is 7.1 children per woman.
More next week!!