SOPP 1% subset for IL 2012-2017

[stats_class_2020.git] / r_tutorials / w03a-R_tutorial.html
diff --git a/r_tutorials/w03a-R_tutorial.html b/r_tutorials/w03a-R_tutorial.html

index 1608264cd3c02ff7a340a05974793a9bc702464d..9ee7dd2fbf7cde57873cffbd59008977c4a4fb82 100644
--- a/r_tutorials/w03a-R_tutorial.html
+++ b/r_tutorials/w03a-R_tutorial.html
@@ -1621,6 +1621,18 @@ my.mean &lt;- function(z) {
      return(out.value)
  }</code></pre>
  </div>
+<div id="specifying-variable-classes-with-data-import" class="section level1">
+<h1><span class="header-section-number">4</span> Specifying variable classes with data import</h1>
+<p>Aaron C. asked a question about whether/how you might specify variable classes when you’re importing data. Aaron S. punted at the time, so here’s a slightly more specific reply.</p>
+<p>The short answer is, “yes, R can do this.” The details depend on exactly which function you use to import the data in question (which, in turn, depends partly on the file format, etc.).</p>
+<p>The most helpful place to look for more information is the help documentation for whatever import function you are working with. For example, the <code>read.csv()</code> function that gets introduced in the next R tutorial takes an optional <code>colClasses</code> argument that lets you specify a vector of classes (e.g., <code>c(&quot;character&quot;, &quot;factor&quot;, &quot;integer&quot;, &quot;character&quot;)</code>), one for each incoming column of the data, in order, telling R which class to use for that column.</p>
+<p>Try reading <code>help(read.csv)</code> and looking at the documentation for the <code>colClasses</code> argument to learn more.</p>
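<p>As a minimal sketch of <code>colClasses</code> in action, the code below reads a tiny CSV twice: once letting R guess, and once with the example class vector above. The column names and values are made up for illustration, and the data are supplied as a string through the <code>text</code> argument (which <code>read.csv()</code> passes on to <code>read.table()</code>) so that no file is needed. The comparison shows one reason to specify classes: R guesses that the zero-padded <code>id</code> codes are integers and drops their leading zeros.</p>

```r
# Hypothetical data for illustration: a tiny CSV supplied as a string.
# read.csv() passes `text` on to read.table(), so no file is needed.
csv_text <- "id,group,count,note
007,a,3,first
012,b,5,second
101,a,2,third"

# 1. Let R guess each column's class from the values it sees.
guessed <- read.csv(text = csv_text)

# 2. Tell R the class of each column, in column order.
specified <- read.csv(
  text = csv_text,
  colClasses = c("character", "factor", "integer", "character")
)

# Compare the classes R ended up using for each column.
sapply(guessed, class)
sapply(specified, class)

guessed$id    # guessed as integer, so the leading zeros are lost (7, 12, 101)
specified$id  # kept as character: "007" "012" "101"
```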
+<div id="r-guesses-the-classes-of-variables-when-you-import-them" class="section level2">
+<h2><span class="header-section-number">4.1</span> R guesses the classes of variables when you import them</h2>
+<p>Aaron and Nick both commented on how R guesses the classes of variables when you import data. Here, too, the nature and quality of these guesses can depend on which import function you use.</p>
+<p>Most Base R import functions make guesses you might think of as somewhat brittle, because they rest on narrow assumptions (e.g., looking at just the first five values to inform the guess). In contrast, the Tidyverse data import commands usually use a larger and more random sample of values from each column to make their guesses, which are therefore much better.</p>
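<p>One quick way to see Base R’s guessing at work is to check <code>class()</code> on an imported column. In this hypothetical sketch (the data are made up), a single missing-value code that R does not recognize by default (<code>n/a</code>) keeps a column of numbers from being read as numeric; declaring that code through the <code>na.strings</code> argument fixes the guess.</p>

```r
# Hypothetical data: a column of scores with one missing value coded "n/a".
csv_text <- "score
1.5
2.0
n/a
3.25"

# By default only the string "NA" counts as missing, so "n/a" is treated as
# text and the whole column cannot be guessed as numeric.
scores <- read.csv(text = csv_text)
class(scores$score)  # "character" in R >= 4.0.0 ("factor" in older versions)

# Declaring "n/a" as a missing-value code lets the guess come out numeric.
scores_fixed <- read.csv(text = csv_text, na.strings = c("NA", "n/a"))
class(scores_fixed$score)  # "numeric"
scores_fixed$score         # the "n/a" entry is now NA
```

<p>For comparison, the readr package’s <code>read_csv()</code> takes a <code>guess_max</code> argument that controls how many rows it inspects before guessing each column’s type; see <code>help(read_csv, package = &quot;readr&quot;)</code>.</p>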
+</div>
+</div>