X-Git-Url: https://code.communitydata.science/stats_class_2020.git/blobdiff_plain/d3b88b62564f18b0127b1a0abe4706144849defa..90b71136ea7a8ce993de3147e196a27e5ca86b87:/r_tutorials/w03-R_tutorial.html diff --git a/r_tutorials/w03-R_tutorial.html b/r_tutorials/w03-R_tutorial.html index a11adec..8618cb5 100644 --- a/r_tutorials/w03-R_tutorial.html +++ b/r_tutorials/w03-R_tutorial.html @@ -434,7 +434,7 @@ MTS 525

Adding comments to your R code chunks

-

The concept of “comments” in code is pretty intuitive. A comment is just some text that the programming language interpreter ignores. Comments are generally inserted in code to make it easier for people to read, for example by explaining what a line does or why it is there. R treats the # character and anything that comes after it on the same line as a comment, so R will not try to interpret that text as a command:

+

The concept of “comments” in code is pretty intuitive. A comment is just some text that the programming language interpreter ignores. Comments are generally inserted in code to make it easier for people to read, for example by explaining what a line does or why it is there. R treats the # character and anything that comes after it on the same line as a comment, so R will not try to interpret that text as a command:

2+2 
## [1] 4
# This is a comment. The next line is too:
@@ -443,7 +443,58 @@ MTS 525
 

Importing datasets from libraries, the web, and locally (on your computer)

-

You will want to import datasets.

+

Data import is crucial and can be a time-consuming step in quantitative/computational research (maybe especially in R). In the previous tutorial and problem set you needed to load a library/package in R. Many packages come with datasets pre-installed that you will use for assignments in the course and/or to try out example code. You will also need to learn how to import datasets from the web and locally from files stored on your computer. Here are examples of each.

+
+

Loading a dataset from an R package

+

Let’s find the email50 dataset that’s included in the openintro package provided by the textbook authors. First, I’ll load the library with library(), then I can use the data() command to load the dataset into my R session.

+
library(openintro)
+
## Loading required package: airports
+
## Loading required package: cherryblossom
+
## Loading required package: usdata
+
data(email50)
+
+## Take a look at the first few rows of the email50 dataset
+head(email50)
+
## # A tibble: 6 x 21
+##    spam to_multiple  from    cc sent_email time                image attach
+##   <dbl>       <dbl> <dbl> <int>      <dbl> <dttm>              <dbl>  <dbl>
+## 1     0           0     1     0          1 2012-01-04 07:19:16     0      0
+## 2     0           0     1     0          0 2012-02-16 14:10:06     0      0
+## 3     1           0     1     4          0 2012-01-04 09:36:23     0      2
+## 4     0           0     1     0          0 2012-01-04 11:49:52     0      0
+## 5     0           0     1     0          0 2012-01-27 03:34:45     0      0
+## 6     0           0     1     0          0 2012-01-17 11:31:57     0      0
+## # … with 13 more variables: dollar <dbl>, winner <fct>, inherit <dbl>,
+## #   viagra <dbl>, password <dbl>, num_char <dbl>, line_breaks <int>,
+## #   format <dbl>, re_subj <dbl>, exclaim_subj <dbl>, urgent_subj <dbl>,
+## #   exclaim_mess <dbl>, number <fct>
+
+
+
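If you are not sure which datasets a package ships with, data() can also list them, and its package argument lets you load a single dataset by name. The sketch below is a minimal example that assumes nothing beyond base R: it uses the iris dataset from R’s built-in datasets package in place of email50 so it runs without openintro installed, and the name datasets_info is just an illustrative choice. The same pattern should work with package = "openintro".

```r
# data() with only the package argument returns an index of that package's
# datasets; its $results matrix holds one row per dataset, named in "Item"
datasets_info <- data(package = "datasets")
head(datasets_info$results[, "Item"])

# Load one dataset by name from a specific package into the current session
data("iris", package = "datasets")

# Quick checks on what arrived
dim(iris)   # number of rows and columns
head(iris)  # first six rows
```
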

Loading a dataset from the web

+

This gets a bit more complicated because you have to use the url() command to tell R the web address of the file, then use a second command to actually import the dataset. In this case, I’m going to point to another dataset provided by the OpenIntro authors containing NOAA temperature information (more information about the dataset is available on the OpenIntro website). The file format is .rda, which is one of several common suffixes for R data files (another is .RData), and you’ll usually use the load() command to import an .rda or .RData file.

+
load(url("https://www.openintro.org/data/rda/climate70.rda"))
+
+## Again, check out the first few rows to see what you've got.
+head(climate70)
+
##       station latitude  longitude dx70_1948 dx70_2018 dx90_1948 dx90_2018
+## 1 USC00203823 41.93520  -84.64110       131       147        11        16
+## 2 USC00276818 44.25800  -71.25250        80        99         1         1
+## 3 USC00186620 39.41317  -79.40025       143       150         4         1
+## 4 USC00331890 40.24030  -81.87100       156       158        18        15
+## 5 USC00235987 37.83950  -94.37400       216       175        59        51
+## 6 USC00395691 45.56550 -100.44880       138       132        39        18
+
+
+
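An .rda file stores one or more named R objects, and load() recreates them in your environment under those saved names rather than handing the data back to you. That is why the example above calls head(climate70) after loading. The sketch below shows the same idea offline: it writes a tiny, made-up data frame to a temporary file with save() and reads it back with load(). The names example_temps, rda_path, and restored are hypothetical, and a temporary file stands in for the OpenIntro URL.

```r
# A tiny, made-up data frame standing in for a real dataset
example_temps <- data.frame(station = c("a", "b", "c"), x = 1:3)

# Write it to a temporary .rda file with save()
rda_path <- tempfile(fileext = ".rda")
save(example_temps, file = rda_path)

# Remove the object so we can confirm that load() brings it back
rm(example_temps)

# load() recreates the saved objects and invisibly returns their names,
# which is handy when you don't know what a file contains
restored <- load(rda_path)
print(restored)
head(example_temps)
```
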

Loading a dataset stored locally

+

Loading from local storage comes last because, ironically, it may be the least intuitive. The best practice here is to use an absolute path (the full path from the top of your file system) to point R to the unique location on your computer where the file is stored. In the example below, my code reflects the operating system and directory structure of my laptop. Your computer will likely (I assume/hope!) use something quite different. Nevertheless, I am including the example because you may be able to adapt it, and at minimum it gives us a concrete demonstration to talk about later on.

+
load("/home/ads/Documents/Teaching/2020/stats/data/week_03/group_07.RData")
+
+ls() ## list objects in my global environment
+
## [1] "climate70" "d"         "email50"
+
head(d) ## and inspect the first few elements of the new object
+
## [1] -2452.018457     2.637751     3.241824     1.183585 15746.070789
+## [6]    65.013141
+
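Because load() quietly adds objects to your global environment, and can overwrite ones with the same name, you often have to run ls() afterwards to find out what arrived. One option is to build the absolute path with file.path(), check that the file exists, and load into a separate environment so you can see exactly what the file contains. This is a minimal sketch: tempdir(), which returns the full path to R’s temporary folder for the session, stands in for a real folder on your computer, and the file name example_group.RData and the names data_dir, rdata_path, and incoming are hypothetical.

```r
# Build an absolute path to a data file. tempdir() stands in for a real
# folder on your computer (a hypothetical example: "/home/you/Documents/stats/data")
data_dir <- tempdir()
rdata_path <- file.path(data_dir, "example_group.RData")

# Create a small example file so there is something to load
d <- c(1.5, 2.5, 3.5)
save(d, file = rdata_path)
rm(d)

# Confirm the file really is at that path before trying to load it
file.exists(rdata_path)

# Load into a fresh environment instead of the global one, so nothing
# already in your workspace gets overwritten
incoming <- new.env()
load(rdata_path, envir = incoming)
ls(incoming)      # names of the objects stored in the file
head(incoming$d)  # inspect one of them
```

If you then want an object in your main workspace, you can copy it out explicitly, for example d <- incoming$d.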