In this lab, you will get oriented with R and begin working with some data.

How to complete this assignment.

  • Attempt each exercise in order.

  • In each code chunk that contains “#Insert Code Here”, write code that produces the intended output. Make sure to erase “#Insert Code Here” and put your code in its place.

  • If my instructions say to “Run the code below…”, you do not need to add any code to the chunk.

  • Many exercises also require you to type text below the code chunk that interprets the output and answers the questions.

  • Please follow the Davidson Honor Code and rules from the course syllabus regarding seeking help with this assignment.

How to submit this assignment.

  • When you are finished, click the “Knit” button at the top of this panel. If there are no errors, a Word file should open after a few seconds.

  • Look over the resulting Word file. Make sure everything looks correct, your name is listed at the top, and there is no ‘junk’ code or output.

  • Save the Word file (to your local computer and/or a cloud location) as: Lab 1 “Insert Your Name”.

  • Use this link to upload your Word file to my Google Drive folder. Do not upload the original .Rmd version.

  • This assignment is due Thursday, June 2, 2022, no later than 9:30 am Eastern. Points will be deducted for late submissions.

  • TIP: Start early so that you can troubleshoot any issues with knitting to Word.

Grading Rubric

There are 6 possible points on this assignment.

Baseline (C level work)

  • Your .Rmd file knits to Word without errors.
  • You answer questions correctly but do not use complete sentences.
  • There are typos and ‘junk code’ throughout the document.
  • You do not put much thought or effort into the Reflection answers.

Average (B level work)

  • You use complete sentences to answer questions.
  • You attempt every exercise/question.

Advanced (A level work)

  • Your code is simple and concise.
  • Unnecessary messages from R are hidden so that they do not appear in the Word document.
  • Your document is typo-free.
  • At the discretion of the instructor, you give exceptionally thoughtful or insightful responses.
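In R Markdown, chunk options are the usual way to keep package-loading messages and warnings out of the knitted Word file. The sketch below assumes knitr is installed, which it is whenever you can Knit. It sets document-wide defaults and also shows the single-expression alternative, `suppressMessages()`. The function `noisy()` is hypothetical and stands in for any call that prints a message.

```r
# Document-wide defaults, typically placed in the setup chunk of the .Rmd file
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(message = FALSE, warning = FALSE)
}

# Hypothetical stand-in for a call (e.g. loading a package) that emits a message
noisy <- function() {
  message("this message would otherwise appear in the knitted output")
  42
}

# Single-expression alternative: the message is suppressed, the value is kept
value <- suppressMessages(noisy())
print(value)
```

To hide messages in just one chunk, put the same options in that chunk's header, e.g. `{r, message = FALSE, warning = FALSE}`.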

Exercise 1. (3 points)

This exercise relates to the College data set, which can be found in the ISLR2 package (i.e., ISLR2::College). It contains a number of variables for 777 different universities and colleges in the US.

  1. Load the appropriate packages, read in the data, and view the data (Note: Comment out the View() command in your answer below).

  2. Use the summary() function to produce a numerical summary of the variables in the data set. Use the pairs() function to produce a scatterplot matrix of the first ten columns (variables) of the data. Use the ggplot() function to produce side-by-side boxplots of Outstate versus Private.

  3. Create a new qualitative variable called Elite. Elite should be a factor variable that equals “Yes” if Top10perc is greater than 50 and “No” otherwise. Next, use the summary() command to find how many elite universities there are, and produce side-by-side boxplots of Outstate versus Elite. Lastly, create histograms of Apps, Top10perc, Outstate, and Books, each with a different number of bins, and display all four histograms in the same plot (Hint: Utilize grid.arrange, plot_grid, or something similar if using ggplot).
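The tasks above rely on a handful of general R idioms. This minimal sketch demonstrates them on base R's built-in mtcars data, used here as a stand-in for ISLR2::College so the code runs without extra packages. The variable Powerful and the hp > 150 cutoff are illustrative only. The boxplot uses ggplot2 when it is installed and falls back to base graphics otherwise. For the histograms, `par(mfrow = ...)` is the base-graphics counterpart of combining ggplot panels with grid.arrange or plot_grid.

```r
# Work on a copy of the built-in mtcars data (stand-in for ISLR2::College)
cars <- mtcars
# View(cars)  # opens an interactive viewer; keep it commented out when knitting

# Numerical summary of every column
print(summary(cars))

# Scatterplot matrix of the first five columns
pairs(cars[, 1:5])

# New qualitative variable: "Yes" if hp > 150, "No" otherwise, stored as a factor
cars$Powerful <- factor(ifelse(cars$hp > 150, "Yes", "No"), levels = c("No", "Yes"))
print(summary(cars$Powerful))  # summary() of a factor gives counts per level

# Side-by-side boxplots: ggplot2 if it is installed, base graphics otherwise
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(cars, ggplot2::aes(x = Powerful, y = mpg)) +
    ggplot2::geom_boxplot()
  print(p)
} else {
  boxplot(mpg ~ Powerful, data = cars, xlab = "Powerful", ylab = "mpg")
}

# Four histograms with different suggested bin counts in one 2 x 2 layout
# (hist() treats a single number for breaks as a suggestion, not an exact count)
old_par <- par(mfrow = c(2, 2))
hist(cars$mpg,  breaks = 5,  main = "mpg",  xlab = "mpg")
hist(cars$hp,   breaks = 10, main = "hp",   xlab = "hp")
hist(cars$wt,   breaks = 15, main = "wt",   xlab = "wt")
hist(cars$qsec, breaks = 20, main = "qsec", xlab = "qsec")
par(old_par)  # restore the previous plot layout
```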

#Insert Code Here


Exercise 2. (3 points)

This exercise relates to the Boston data set, which can be found in the ISLR2 package (i.e., ISLR2::Boston).

  1. Load the Boston data set. How many rows are in this data set? How many columns? What do the rows and columns represent (Hint: ?Boston)?

  2. Make some pairwise scatterplots of the predictors (columns) in this data set. Describe your findings.

  3. Are any of the predictors associated with per capita crime rate? If so, explain the relationship (Hint: Utilize cor()).

  4. How many of the census tracts in this data set bound the Charles River?

  5. What is the median pupil-teacher ratio among the towns in this data set?

  6. In this data set, how many of the census tracts average more than seven rooms per dwelling? More than eight rooms per dwelling? Comment on the census tracts that average more than eight rooms per dwelling.
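Most of these questions reduce to a few base R idioms:

  • nrow() and ncol() for the dimensions
  • pairs() for scatterplots
  • cor() for associations
  • sum() over a logical condition for counts (TRUE counts as 1)
  • median()
  • row subsetting to inspect the rows that meet a condition

This minimal sketch uses the built-in mtcars data as a stand-in for ISLR2::Boston. The columns and thresholds (mpg, am == 1, wt > 4, wt > 5) are illustrative only.

```r
# Dimensions of the data
n_rows <- nrow(mtcars)
n_cols <- ncol(mtcars)
cat("rows:", n_rows, " columns:", n_cols, "\n")

# Pairwise scatterplots of a few columns
pairs(mtcars[, c("mpg", "disp", "hp", "wt")])

# Correlation of every other column with one variable of interest (here mpg),
# sorted by absolute strength; the trivial mpg-with-itself entry is dropped
r <- cor(mtcars)[, "mpg"]
r <- r[names(r) != "mpg"]
print(round(r[order(abs(r), decreasing = TRUE)], 2))

# Counting rows that satisfy a condition
n_manual <- sum(mtcars$am == 1)

# Median of a single column
med_qsec <- median(mtcars$qsec)

# Counts above two thresholds, then inspect the rows above the higher one
n_over_4 <- sum(mtcars$wt > 4)
n_over_5 <- sum(mtcars$wt > 5)
heavy <- mtcars[mtcars$wt > 5, ]
print(heavy)
print(summary(heavy))  # compare with summary(mtcars) to see how these rows differ
```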

#Insert Code Here