In this lab, you will begin to get oriented with R and work with some data.
Attempt each exercise in order.
In each code chunk, if you see “# INSERT CODE HERE”, you are expected to add code that creates the intended output. Erase “# INSERT CODE HERE” and put your code in its place.
If my instructions say to “Run the code below…” then you do not need to add any code to the chunk.
Many exercises may require you to type some text below the code chunk, interpreting the output and answering the questions.
Please follow the Davidson Honor Code and rules from the course syllabus regarding seeking help with this assignment.
When you are finished, click the “Knit” button at the top of this panel. If there are no errors, a Word file should pop up after a few seconds.
Take a look at the resulting Word file. Make sure everything looks correct, your name is listed at the top, and there is no ‘junk’ code or output.
Save the Word file (to your local computer and/or to a cloud location) as: Lab 1 “Insert Your Name”.
Use this link to upload your Word file to my Google Drive folder. Do not upload the original .Rmd version.
This assignment is due Thursday, June 2, 2022, no later than 9:30 am Eastern. Points will be deducted for late submissions.
TIP: Start early so that you can troubleshoot any issues with knitting to Word.
There are 6 possible points on this assignment.
Baseline (C level work)
Average (B level work)
Advanced (A level work)
This exercise relates to the College data set, which can be found in the ISLR2 package (i.e., ISLR2::College). It contains a number of variables for 777 different universities and colleges in the US.
Load the appropriate packages, read in the data, and view the data (Note: Comment out the View() command in your answer below).
Use the summary() function to produce a numerical summary of the variables in the data set. Use the pairs() function to produce a scatterplot matrix of the first ten columns or variables of the data. Use the ggplot() function to produce side-by-side boxplots of Outstate versus Private.
Create a new qualitative variable, called Elite. The new variable Elite needs to be a factor variable that equals “Yes” if Top10perc is greater than 50 and “No” otherwise. Next, use the summary() command to examine how many elite universities there are. Next, produce side-by-side boxplots of Outstate versus Elite. Lastly, create histograms of the following variables: Apps, Top10perc, Outstate, Books, with differing bin numbers. Additionally, produce all four histograms in the same plot (Hint: Utilize grid.arrange, plot_grid, or something similar if using ggplot.)
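If you are unsure how these pieces fit together, the sketch below walks through the same kinds of steps on R's built-in mtcars data rather than on College, so the exercise itself is still yours to do. The names dat and Efficient, the cutoff of 25, and the bin counts are illustrative assumptions, not part of the assignment. It also assumes the ggplot2 and gridExtra packages are installed.

```r
# Illustrative sketch on the built-in mtcars data (NOT the College data).
# Assumption: ggplot2 and gridExtra are already installed.
library(ggplot2)
library(gridExtra)  # provides grid.arrange()

dat <- mtcars  # hypothetical working copy

# Numerical summary of every variable
summary(dat)

# Scatterplot matrix of the first five columns
pairs(dat[, 1:5])

# New factor variable: "Yes" if mpg is greater than 25, "No" otherwise
# (the cutoff of 25 is an arbitrary choice for illustration)
dat$Efficient <- factor(ifelse(dat$mpg > 25, "Yes", "No"))

# summary() on a factor reports the count in each level
summary(dat$Efficient)

# Side-by-side boxplots of a numeric variable, split by the new factor
ggplot(dat, aes(x = Efficient, y = hp)) +
  geom_boxplot()

# Four histograms with different bin counts, arranged in one 2 x 2 figure
h1 <- ggplot(dat, aes(x = mpg))  + geom_histogram(bins = 5)
h2 <- ggplot(dat, aes(x = hp))   + geom_histogram(bins = 8)
h3 <- ggplot(dat, aes(x = wt))   + geom_histogram(bins = 10)
h4 <- ggplot(dat, aes(x = qsec)) + geom_histogram(bins = 12)
grid.arrange(h1, h2, h3, h4, ncol = 2)
```

Adapting this to the lab mostly means swapping in the College data and the variable names from the instructions above; choose bin counts that show each distribution clearly.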
# INSERT CODE HERE
ANSWER:
This exercise relates to the Boston data set, which can be found in the ISLR2 package (i.e., ISLR2::Boston).
Load the Boston data set. How many rows are in this data set? How many columns? What do the rows and columns represent (Hint: ?Boston)?
Make some pairwise scatterplots of the predictors (columns) in this data set. Describe your findings.
Are any of the predictors associated with per capita crime rate? If so, explain the relationship (Hint: Utilize cor()).
How many of the census tracts in this data set bound the Charles River?
What is the median pupil-teacher ratio among the towns in this data set?
In this data set, how many of the census tracts average more than seven rooms per dwelling? More than eight rooms per dwelling? Comment on the census tracts that average more than eight rooms per dwelling.
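The questions above come down to a handful of base R idioms: checking dimensions, computing correlations against one column, counting rows that meet a condition, and taking a median. Here is a minimal sketch of those idioms on the built-in mtcars data rather than on Boston. The names dat, cors, n_manual, and n_heavy and the cutoff of 4 are illustrative assumptions only.

```r
# Illustrative sketch on the built-in mtcars data (NOT the Boston data).
dat <- mtcars  # hypothetical working copy

# Number of rows and columns; ?mtcars documents what they represent
dim(dat)

# Pairwise scatterplots of a few columns
pairs(dat[, c("mpg", "hp", "wt", "qsec")])

# Correlation of every column with one column of interest (here mpg)
cors <- cor(dat)[, "mpg"]
sort(cors)

# Counting rows that satisfy a condition:
# a logical vector sums with TRUE counted as 1 and FALSE as 0
n_manual <- sum(dat$am == 1)   # rows where the 0/1 indicator equals 1
n_heavy  <- sum(dat$wt > 4)    # rows above an arbitrary cutoff

# Median of a single column
median(dat$qsec)

# Look more closely at the rows that meet the condition
subset(dat, wt > 4)
```

The sum() of a logical comparison is usually the quickest way to answer a "how many" question, and subset() (or bracket indexing) lets you inspect those rows before commenting on them.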
# INSERT CODE HERE
ANSWER: