From: aaronshaw
Date: Tue, 22 Sep 2020 17:21:12 +0000 (-0500)
Subject: updates to loading/importing data
X-Git-Url: https://code.communitydata.science/stats_class_2020.git/commitdiff_plain/1ea333b3aaa997020095c254f9154be0f7483c39?ds=inline;hp=5d5ce043b439baf34131a22ca0ea049551b7e255

updates to loading/importing data
---

diff --git a/r_tutorials/w03-R_tutorial.Rmd b/r_tutorials/w03-R_tutorial.Rmd
index 6e04b8d..f77505b 100644
--- a/r_tutorials/w03-R_tutorial.Rmd
+++ b/r_tutorials/w03-R_tutorial.Rmd
@@ -58,7 +58,7 @@ Whatever this says is where R thinks it's "doing stuff" on your machine, so, in
 
 ### Adding comments to your R code chunks
 
-The concept of "comments" in code is pretty intuitive. A comment is just some text that the programming language interpreter ignores and repeats. Comments are generally inserted in code to make it easier for people to read in various ways. R interprets the `#` character and anything that comes after it as a comment. R will not try to interpret whatever comes next as a command:
+The concept of "comments" in code is potentially intuitive. A comment is just some text that the programming language interpreter ignores and repeats. Comments are generally inserted in code to make it easier for people to read in various ways. R interprets the `#` character and anything that comes after it as a comment. R will not try to interpret whatever comes next as a command:
 
 ```{r}
 2+2
@@ -71,7 +71,45 @@ Comments are often less common in the context of R Markdown scripts and notebook
 
 ### Importing datasets from libraries, the web, and locally (on your computer)
 
-. You will want to import datasets.
+Data import is crucial and can be a time-consuming step in quantitative/computational research (maybe especially in R). In the previous tutorial and problem set you needed to load a library/package in R. Many packages come with datasets pre-installed that you will use for assignments in the course and/or to try out example code. You will also need to learn how to import datasets from the web and locally from files stored on your computer. Here are examples of each.
+
+#### Loading a dataset from an R package
+
+Let's find the `email50` dataset that's included in the `openintro` package provided by the textbook authors. First, I'll load the library, then I can use the `data()` command to call the dataset.
+
+```{r}
+library(openintro)
+data(email50)
+
+## Take a look at the first few rows of the email50 dataset
+head(email50)
+
+```
+
+#### Loading a dataset from the web
+
+This gets a bit more complicated because you have to use the `url()` command to tell R the address you want to use, then you will need to use a second command to actually import the dataset file. In this case, I'm going to point to another dataset provided by the OpenIntro authors containing NOAA temperature information ([more information about the dataset is available on the OpenIntro website](https://www.openintro.org/data/index.php?data=climate70)). The format for the file is `.rda` which is one of several common R dataset file format suffixes (another one is .rdata) and you'll usually use the `load()` command to import an .rda or .rdata file.
+
+```{r}
+load(url("https://www.openintro.org/data/rda/climate70.rda"))
+
+## Again, check out the first few rows to see what you've got.
+head(climate70)
+```
+
+#### Loading a dataset stored locally
+
+Loading from local storage is last because, ironically, it may be the least intuitive. The best practice here is to use an [absolute path](https://en.wikipedia.org/wiki/Path_%28computing%29) to point R to the unique location on your computer where the file in question is stored. In the example below, my code reflects the operating system and directory structure of my laptop. Your computer will likely (I assume/hope!) use something quite different. Nevertheless, I am providing an example because I think you may be able to work with it and it can at least provide a demonstration that we can talk about later on.
+
+```{r}
+
+load("/home/ads/Documents/Teaching/2020/stats/data/week_03/group_07.RData")
+
+ls() ## list objects in my global environment
+
+head(d) ## and inspect the first few rows of the new object
+
+```
 
 ## More (complicated) variable types