X-Git-Url: https://code.communitydata.science/stats_class_2020.git/blobdiff_plain/d3b88b62564f18b0127b1a0abe4706144849defa..31486cfa5c1c5a6b42e9ad55d2d8c2bd908b28e7:/r_tutorials/w03-R_tutorial.html
diff --git a/r_tutorials/w03-R_tutorial.html b/r_tutorials/w03-R_tutorial.html
index a11adec..8618cb5 100644
--- a/r_tutorials/w03-R_tutorial.html
+++ b/r_tutorials/w03-R_tutorial.html
@@ -434,7 +434,7 @@ MTS 525
-The concept of "comments" in code is pretty intuitive. A comment is just some text that the programming language interpreter ignores and repeats. Comments are generally inserted in code to make it easier for people to read in various ways. R interprets the # character and anything that comes after it as a comment. R will not try to interpret whatever comes next as a command:
+The concept of "comments" in code is potentially intuitive. A comment is just some text that the programming language interpreter skips instead of running. It stays visible in your script and in rendered output like this tutorial, but R never executes it. Comments are generally added to make code easier for people to read and understand. R treats the # character and everything after it on the same line as a comment, so R will not try to interpret that text as a command:
2+2
## [1] 4
# This is a comment. The next line is too:
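The following is a minimal sketch of the two places comments usually appear in R code: on a line by themselves, or after code on the same line. It also shows that a # inside quotation marks does not start a comment. The names x and label are just for illustration.

```r
# A full-line comment: R skips this entire line.
x <- 2 + 2  # A trailing comment: R runs the code before the #, then ignores the rest.

# "Commenting out" a line switches it off without deleting it:
# x <- 100

print(x)  # prints [1] 4, because the reassignment above never ran

# A # inside quotation marks is part of a string, not the start of a comment.
label <- "#1 result"
nchar(label)  # 9 characters, including the #
```

Commenting out a line rather than deleting it is a handy way to try variations of an analysis while keeping the earlier version around.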
@@ -443,7 +443,58 @@ MTS 525
Data import is crucial and can be a time-consuming step in quantitative/computational research (maybe especially in R). In the previous tutorial and problem set you needed to load a library/package in R. Many packages come with datasets pre-installed that you will use for assignments in the course and/or to try out example code. You will also need to learn how to import datasets from the web and locally from files stored on your computer. Here are examples of each.
+Let's find the email50 dataset that's included in the openintro package provided by the textbook authors. First, I'll load the library. Then I can use the data() command to load the dataset.
library(openintro)
+## Loading required package: airports
+## Loading required package: cherryblossom
+## Loading required package: usdata
+data(email50)
+
+## Take a look at the first few rows of the email50 dataset
+head(email50)
+## # A tibble: 6 x 21
+## spam to_multiple from cc sent_email time image attach
+## <dbl> <dbl> <dbl> <int> <dbl> <dttm> <dbl> <dbl>
+## 1 0 0 1 0 1 2012-01-04 07:19:16 0 0
+## 2 0 0 1 0 0 2012-02-16 14:10:06 0 0
+## 3 1 0 1 4 0 2012-01-04 09:36:23 0 2
+## 4 0 0 1 0 0 2012-01-04 11:49:52 0 0
+## 5 0 0 1 0 0 2012-01-27 03:34:45 0 0
+## 6 0 0 1 0 0 2012-01-17 11:31:57 0 0
+## # … with 13 more variables: dollar <dbl>, winner <fct>, inherit <dbl>,
+## # viagra <dbl>, password <dbl>, num_char <dbl>, line_breaks <int>,
+## # format <dbl>, re_subj <dbl>, exclaim_subj <dbl>, urgent_subj <dbl>,
+## # exclaim_mess <dbl>, number <fct>
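The same data() pattern works for any installed package. The sketch below runs without openintro because it uses the mtcars dataset from R's built-in datasets package. The package argument tells data() where to look. dim() and names() are two more quick ways to check what you loaded, alongside head().

```r
# data() copies a dataset that ships with an installed package into your workspace.
# mtcars comes with the "datasets" package that is part of every R installation.
data("mtcars", package = "datasets")

# A few quick ways to check what you loaded:
dim(mtcars)          # rows and columns: 32 11
names(mtcars)[1:3]   # first few variable names: "mpg" "cyl" "disp"
head(mtcars, n = 3)  # first three rows

# With only a package name, data() lists the datasets that package provides.
# It is left as a comment here because it usually opens in a separate pane or pager:
# data(package = "openintro")
```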
+Importing a dataset from the web is a bit more complicated. First you use the url() command to tell R the address of the file, and then you use a second command to actually import it. Here, I'm pointing to another dataset provided by the OpenIntro authors that contains NOAA temperature information (more information about the dataset is available on the OpenIntro website). The file has the suffix .rda, which is one of several common file extensions for R data files (another is .RData). You'll usually use the load() command to import an .rda or .RData file.
load(url("https://www.openintro.org/data/rda/climate70.rda"))
+
+## Again, check out the first few rows to see what you've got.
+head(climate70)
+## station latitude longitude dx70_1948 dx70_2018 dx90_1948 dx90_2018
+## 1 USC00203823 41.93520 -84.64110 131 147 11 16
+## 2 USC00276818 44.25800 -71.25250 80 99 1 1
+## 3 USC00186620 39.41317 -79.40025 143 150 4 1
+## 4 USC00331890 40.24030 -81.87100 156 158 18 15
+## 5 USC00235987 37.83950 -94.37400 216 175 59 51
+## 6 USC00395691 45.56550 -100.44880 138 132 39 18
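load() does not hand the data back the way most R functions do. It restores objects under the names they were saved with (here, climate70) and invisibly returns those names. The sketch below shows how to capture the names, and how to load into a separate environment so nothing already in your workspace gets overwritten. So that it runs offline, it first writes a small, made-up data frame (temps) to a temporary .rda file with save(). With a file on the web, you would pass url(...) to load() in the same way.

```r
# Write a small example object to a temporary .rda file (a stand-in for a real dataset).
temps <- data.frame(station = c("A", "B"), dx70 = c(131, 80))
rda_file <- tempfile(fileext = ".rda")
save(temps, file = rda_file)
rm(temps)  # remove it so we can watch load() bring it back

# load() returns, invisibly, the names of the objects it restored.
loaded_names <- load(rda_file)
print(loaded_names)   # "temps"

# Loading into a fresh environment keeps the objects out of your global workspace.
holder <- new.env()
load(rda_file, envir = holder)
ls(holder)            # "temps"
head(holder$temps)
```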
+Loading from local storage comes last because, ironically, it may be the least intuitive. The best practice here is to use an absolute path (the full location of the file, starting from the top of your file system) to point R to the unique place on your computer where the file is stored. In the example below, my code reflects the operating system and directory structure of my laptop. Your computer will likely (I assume/hope!) use something quite different. Nevertheless, I am including the example because you should be able to adapt it, and at the very least it gives us something concrete to talk about later on.
+load("/home/ads/Documents/Teaching/2020/stats/data/week_03/group_07.RData")
+
+ls() ## list objects in my global environment
+## [1] "climate70" "d" "email50"
+head(d) ## and inspect the first few rows of the new object
+## [1] -2452.018457 2.637751 3.241824 1.183585 15746.070789
+## [6] 65.013141
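The path in the example above only exists on the author's laptop. A slightly more defensive pattern is to build the path with file.path() and to check it with file.exists() before calling load(). In this sketch, the folder is created under tempdir() so the code runs anywhere, and the vector d is a made-up stand-in rather than the actual data. Swap in the absolute path to the file on your own computer.

```r
# Hypothetical location under R's temporary directory, so the sketch runs on any computer.
# Replace data_dir with the absolute path on yours,
# e.g. "C:/Users/yourname/Documents/stats/data" on Windows.
data_dir  <- file.path(tempdir(), "week_03")
data_file <- file.path(data_dir, "group_07.RData")

# Create a small stand-in file so there is something to load.
dir.create(data_dir, showWarnings = FALSE)
d <- c(1, 2, 3)
save(d, file = data_file)
rm(d)

# file.path() joins the pieces with "/", which R also accepts on Windows.
# normalizePath() prints the full absolute path R will actually use.
normalizePath(data_file)

# Check that the file exists first, so a typo in the path gives a clear message.
if (!file.exists(data_file)) {
  stop("Could not find the data file: ", data_file)
}
load(data_file)
ls()  # d now appears in the global environment
```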
+