Guidelines for R at IPS
This is a short document that gives some suggestions for packages and functions to use for various tasks in R. It came about at the request of staff at the Department of Psychology (IPS). It is merely a guideline for those who want input or tips on which tools to use for the job. With R there will always be a practically unlimited set of options for how to do anything. Feel free to use whatever speaks to your soul.
Note that this list is based on the textbook used in the master’s course in statistics, 3100 - Quantitative methods (Mehmetoglu & Mittner, 2022). Hence, these are the tools that the students will be taught and should be familiar with.
Background
IPS is moving away from the proprietary software packages SPSS and Stata and towards JASP and R. JASP is an SPSS-like graphical user interface built atop R and should be familiar to those who have worked in SPSS. R is a programming language geared towards statistics. The move is in line with NTNU’s policy of developing and using open source software, and could potentially save NTNU billions of Norwegian øre in licensing costs.
The guidelines
Naming convention in these guidelines: package names are in **bold** and function names in `backticks`. Alternatively, we write `package_name::function_name()` to indicate both the function and the package it comes from. If no package is specified, the function comes from one of the standard packages loaded when you start R.¹
Working with R
We suggest working with an updated version of R, and using RStudio. It is smart to organise your files and workflow in RStudio’s projects. This makes it easier to keep track of files and scripts.
We strongly recommend turning off automatic restoration of workspace at startup. See this blog post for more details. This will save you a lot of headaches down the line.
The textbook uses the **magrittr** pipe `%>%`, which was common at the time of publication. Since then, the base R pipe `|>` has taken over in popular use. Students will be exposed to both. They are mostly interchangeable for the use cases in the course.
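As a small illustration of the two pipes, the sketch below runs the same calculation both ways. The vector `x` is made up for the example, and the base R pipe assumes R 4.1 or newer.

```r
library(magrittr)

# Made-up numbers, only for illustration
x <- c(1, 4, 9)

# magrittr pipe, as used in the textbook
res_magrittr <- x %>% sqrt() %>% sum()

# base R pipe (requires R >= 4.1)
res_base <- x |> sqrt() |> sum()

# Both are equivalent to the nested call sum(sqrt(x))
res_magrittr
res_base
```

For simple chains like this one, the two pipes give the same result.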
We use `<-` for assignment instead of `=`, i.e. `head_honcho <- c("Magne")`.
We recommend working script-based as much as possible. This way, you and any collaborators (like students) start out with identical datasets and do all your work on them via scripts that are run in each session. This ensures that collaborators can easily recreate the same state of the data to check work that others did. The alternative, where each set of collaborators starts storing different versions of the dataset with different variables, can quickly create a mess and make it difficult to know exactly what was done to the dataset. Writing a data setup script that loads the data and wrangles the dataset (recoding variables, dropping specific cases, etc.) is helpful in this regard. Then you can keep a separate script for analysis.
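One possible shape for such a setup is sketched below. The file names `data_setup.R` and `analysis.R`, the object name `penguins_clean` and the new variable `body_mass_kg` are hypothetical, and the palmerpenguins data stands in for a real project dataset. The two parts are shown in one block so that it runs as is.

```r
# ---- data_setup.R (hypothetical file name) ----
library(dplyr)
library(palmerpenguins)

# Load the raw data and do all wrangling in one place
penguins_clean <- penguins |>
  filter(!is.na(sex)) |>                      # drop specific cases
  mutate(body_mass_kg = body_mass_g / 1000)   # recode / create variables

# ---- analysis.R (hypothetical file name) ----
# In a real project this script would start with:
# source("data_setup.R")
fit <- lm(bill_length_mm ~ sex + body_mass_kg, data = penguins_clean)
summary(fit)
```

Every collaborator who runs the setup script on the same raw file ends up with the same `penguins_clean`, so nobody needs to pass around modified copies of the dataset.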
When it comes to how to write code, we recommend working with the Tidyverse style guide. This makes code more readable and thus more understandable.
Data import
- **haven** for importing files from Stata, SPSS and SAS
- **readxl** for importing Excel files
- `readr::read_csv()` for importing csv files, though `read.csv()` from base R also works fine.
Datasets should be stored as either `tibble()` or `data.frame()`.
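A minimal sketch of the import functions follows. To keep it runnable, it first writes a small csv file to a temporary location. The commented-out file names (`my_data.sav` and so on) are placeholders, not real files.

```r
# Write a small example csv to a temporary file so the sketch runs anywhere
csv_path <- tempfile(fileext = ".csv")
write.csv(head(iris), csv_path, row.names = FALSE)

# readr returns a tibble
iris_tbl <- readr::read_csv(csv_path, show_col_types = FALSE)

# base R returns a data.frame
iris_df <- read.csv(csv_path)

class(iris_tbl)
class(iris_df)

# Other formats follow the same pattern (placeholder file names):
# spss_data  <- haven::read_sav("my_data.sav")
# stata_data <- haven::read_dta("my_data.dta")
# sas_data   <- haven::read_sas("my_data.sas7bdat")
# excel_data <- readxl::read_excel("my_data.xlsx")
```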
Data wrangling
Both base R and tidyverse approaches are used. For instance, choosing specific rows or columns of a data frame can be done either in base R with brackets `[]` or with `select()` and `filter()` from **dplyr**:
# load dataset for demonstration
library(palmerpenguins)
# Select columns
penguins["bill_length_mm"]
penguins |> dplyr::select(bill_length_mm)
# Select rows
penguins[penguins$bill_length_mm > 55, ]
penguins |> dplyr::filter(bill_length_mm > 55)
- For creating new variables: `mutate()` from **dplyr**.
- For working on categorical variables: **forcats**
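The two suggestions above could be combined as in the following sketch. The new variable `bill_ratio`, the recoded label `"Torg"` and the object name `penguins2` are invented for the example.

```r
library(dplyr)
library(forcats)
library(palmerpenguins)

penguins2 <- penguins |>
  mutate(
    # new numeric variable computed from two existing ones
    bill_ratio = bill_length_mm / bill_depth_mm,
    # make "Gentoo" the first (reference) level of species
    species = fct_relevel(species, "Gentoo"),
    # rename one level of island (new name = old name)
    island = fct_recode(island, "Torg" = "Torgersen")
  )

levels(penguins2$species)
levels(penguins2$island)
```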
Summary statistics
- **summarytools**: creates summaries of datasets
- **modeest**: estimates the mode
- **psych**: contains helpful functions for correlation
- **moments**: functions that calculate skewness and kurtosis
- `dplyr::summarise()`: creates summary tables
- **stargazer**: for creating tables
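As an illustration, a grouped summary table could look like the sketch below. The choice of variable and statistics is arbitrary, and `summary_tbl` is just a name picked for the example.

```r
library(dplyr)
library(palmerpenguins)

summary_tbl <- penguins |>
  group_by(species) |>
  summarise(
    n        = n(),
    mean_bill = mean(bill_length_mm, na.rm = TRUE),
    sd_bill   = sd(bill_length_mm, na.rm = TRUE),
    # skewness and kurtosis come from the moments package
    skew_bill = moments::skewness(bill_length_mm, na.rm = TRUE),
    kurt_bill = moments::kurtosis(bill_length_mm, na.rm = TRUE)
  )

summary_tbl
```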
Graphs
The general recommendation is to use **ggplot2** for most things.
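A minimal scatter plot with one regression line per group might be sketched like this. The variables and labels are chosen only for demonstration.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins,
            aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point(na.rm = TRUE) +
  # one straight regression line per species, without confidence bands
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, na.rm = TRUE) +
  labs(x = "Body mass (g)", y = "Bill length (mm)", colour = "Species")

p
```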
Data analysis
Linear regression: `lm()` is used for fitting linear regression models. It comes from **stats**, one of the base R packages.
lm(bill_length_mm ~ sex + body_mass_g,
penguins)
- `lm.beta::lm.beta()`: for standardised regression coefficients. Alternatively, standardise all variables with `scale()` and then run a normal `lm()`
- **multcomp**: test linear combination hypotheses using `glht()`
- **car**: test joint significance of coefficients
- **relaimpo**: computes semi-partial correlations using `calc.relimp()`
- `performance::check_model()`: gives graphs to illustrate the performance of a model
- **fastDummies**: creates dummy variables
- **sandwich**: provides robust standard errors
- **interactions**: has functions for analysing interactions, like `sim_slopes()`
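The `scale()` route to standardised coefficients can be sketched as follows. To keep the example simple it uses only numeric variables, so the model differs from the one shown earlier. The object names `d`, `d_std`, `fit_raw` and `fit_std` are made up.

```r
library(palmerpenguins)

# Keep three numeric variables and drop rows with missing values
d <- na.omit(penguins[c("bill_length_mm", "flipper_length_mm", "body_mass_g")])

# Standardise every variable to mean 0 and standard deviation 1
d_std <- as.data.frame(scale(d))

fit_raw <- lm(bill_length_mm ~ flipper_length_mm + body_mass_g, data = d)
fit_std <- lm(bill_length_mm ~ flipper_length_mm + body_mass_g, data = d_std)

# Slopes from fit_std are the standardised coefficients
coef(fit_std)

# Same numbers by rescaling the raw slopes: b * sd(x) / sd(y)
coef(fit_raw)[-1] *
  sapply(d[c("flipper_length_mm", "body_mass_g")], sd) / sd(d$bill_length_mm)
```

Categorical predictors such as `sex` cannot be passed through `scale()`, which is one reason to prefer `lm.beta::lm.beta()` when the model contains factors.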
Logistic regression: We fit a model with `glm()`, specifying that `glm()`’s `family` is `binomial`, and that `binomial()`’s `link` is `"logit"`. In other words:
glm(formula = sex ~ bill_length_mm + body_mass_g,
family = binomial(link = "logit"),
data = penguins)
Since `link` defaults to `"logit"`, this can also be written like this (if we drop argument names):
glm(sex ~ bill_length_mm + body_mass_g,
binomial(),
penguins)
- **lmtest**: implements the likelihood ratio test with `lrtest()`
- `DescTools::PseudoR2()`: returns a variety of pseudo \(R^2\)s
- **visreg**: visualise regression models
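A likelihood ratio test against an intercept-only model could be sketched as below. Both models are fitted on the same complete cases so that they are comparable. The object names `d`, `fit0`, `fit1` and `lr` are invented for the example.

```r
library(palmerpenguins)

# Use the same complete cases for both models
d <- na.omit(penguins[c("sex", "bill_length_mm", "body_mass_g")])

# Intercept-only model and the full model
fit0 <- glm(sex ~ 1, family = binomial(), data = d)
fit1 <- glm(sex ~ bill_length_mm + body_mass_g, family = binomial(), data = d)

# Likelihood ratio test of the two nested models
lr <- lmtest::lrtest(fit0, fit1)
lr

# Coefficients on the odds ratio scale
exp(coef(fit1))
```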
Exploratory factor analysis. The **psych** package contains functions for doing factor analysis:
- `fa.parallel()`: for parallel analysis
- `fa()`: for factor analysis
- `principal()`: for principal component analysis (PCA)
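As a rough sketch, `fa()` and `principal()` can be run on the `Harman74.cor` correlation matrix that ships with base R. The choice of four factors and varimax rotation is an assumption made for the example, not a recommendation from the textbook.

```r
library(psych)

# Correlation matrix of 24 psychological tests (n = 145), from the datasets package
r_mat <- Harman74.cor$cov

# Exploratory factor analysis with four factors (assumed number)
efa <- fa(r_mat, nfactors = 4, n.obs = 145, rotate = "varimax", fm = "minres")
print(efa$loadings, cutoff = 0.3)

# Principal component analysis with the same number of components
pca <- principal(r_mat, nfactors = 4, rotate = "varimax", n.obs = 145)
print(pca$loadings, cutoff = 0.3)
```

In practice the number of factors would be chosen first, for instance with `fa.parallel()` on the same data.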
Structural equation modelling: The **lavaan** package is used for SEM.
- functions from **lavaan**
  - `cfa()`: for CFA
  - `sem()`: for SEM
  - `modindices()`: for modification indices
  - `standardizedSolution()`: for standardised estimates
- `astatur::relicoef()`: for reliability coefficients. Note that **astatur** has to be downloaded from GitHub, not CRAN. Run `devtools::install_github("ihrke/astatur")`.
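The lavaan functions listed above could be used together as in this sketch, which fits the three-factor CFA on the `HolzingerSwineford1939` data that ships with lavaan. The model and dataset are chosen only for illustration and are not taken from the course material.

```r
library(lavaan)

# Three latent factors, each measured by three observed indicators
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

fit <- cfa(model, data = HolzingerSwineford1939)

# Standardised estimates
std <- standardizedSolution(fit)
head(std)

# Modification indices, largest first
mi <- modindices(fit)
head(mi[order(-mi$mi), ])
```

A full SEM with regressions between latent variables uses the same model syntax with `~` for structural paths and is fitted with `sem()` instead of `cfa()`.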
References
Footnotes
¹ These are colloquially called base R, though only one of the packages is actually called **base**.