5 practical ways to improve your R project

Scottish Government R User Day

Alice Byers

Data Division, Scottish Government

6 December 2023

Disclaimer


  • Opinion based on experience

  • One size doesn’t fit all

1. RStudio Projects

Why?

  • Keeps all related files together in one folder

  • Sets working directory so file paths can be relative to project

  • Facilitates use of RStudio Git integration

How to create an RStudio project

  • In RStudio, go to File -> New Project…

  • Create a new project or add a project to an existing directory

A window in RStudio with options to create a project in a new directory, existing directory, or from a version control repository.

Opening an RStudio Project

  • Creating an RStudio Project adds an .Rproj file to your directory

    my-project/
    └── my-project.Rproj

  • Open this file to open your project in RStudio

  • Working directory automatically set to location of my-project.Rproj

  • The ‘Files’ pane in RStudio will show your project directory

Opening an RStudio Project

A screenshot of a project open in RStudio. The project name is displayed at the top-right of the screen, and the project directory is displayed in the bottom-right pane.

Folder structure

  • Where possible, store all files related to your project within your RStudio Project directory

    my-project/
        ├── code/
        ├── functions/
        ├── data/
        ├── lookups/
        ├── outputs/
        └── my-project.Rproj
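A folder structure like the one above could be created from R rather than by hand. The following is a minimal sketch using base R only; `project_root` is a temporary location chosen as an assumption so the code runs anywhere, and in a real project it would be the project directory.

```r
# Folder names taken from the structure above
folders <- c("code", "functions", "data", "lookups", "outputs")

# Assumption: a temporary location stands in for the project directory so the
# sketch can be run anywhere; in a real project this would be the project root
project_root <- file.path(tempdir(), "my-project")

# Create each folder; recursive = TRUE also creates project_root itself
for (folder in folders) {
  dir.create(
    file.path(project_root, folder),
    recursive = TRUE,
    showWarnings = FALSE
  )
}

# List the folders that now exist
list.files(project_root)
```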

Relative file paths

  • Absolute file paths are difficult to manage and are likely to break if you move your project to a different folder or ask somebody else to run your code.

    "C:/Documents/Alice's Folder/my-project/data/"

  • Use the here package to write file paths relative to your project directory.

    here::here("data")
    
    here::here("outputs", "2023")
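As a rough illustration of how a path built with here might be used, the sketch below reads a file from the data folder. It assumes the here package is installed, and `schools.csv` is a hypothetical file name, not one from the project above.

```r
library(here)

# "schools.csv" is a hypothetical file name used only for illustration
schools_path <- here("data", "schools.csv")

# Check the file exists before reading it, so a wrong path fails clearly
if (file.exists(schools_path)) {
  schools <- read.csv(schools_path)
} else {
  message("File not found: ", schools_path)
}
```

Because the path is built from the project root, the same line should work for anyone who opens the project, wherever the project folder is stored.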

2. Use a setup script

Why?

  • Loading all packages in one place makes it easy to see what is required to run the code

  • Defining all variables / parameters in one place reduces the number of places in the code you need to edit to make a change

Example - Without a setup script

# Read in data for current and previous year

schools <- readr::read_csv(
   here::here("data", "2023-03-31_schools.csv")
)

library(readr)

schools_prev <- read_csv(
   here::here("data", "2022-03-31_schools.csv")
)

Example - Setup script

00_setup.R

# Load packages ----

library(here)
library(readr)
library(lubridate)

# Set parameters ----

month_end <- ymd("2023-03-31")

prev_year <- month_end - years(1)
  

Example - Sourcing setup script

01_read-data.R

# Run setup script ----

source(here::here("code", "00_setup.R"))

# Read in data for current and previous year ----

schools <- read_csv(
   here("data", paste0(month_end, "_schools.csv"))
)

schools_prev <- read_csv(
   here("data", paste0(prev_year, "_schools.csv"))
)
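To see why keeping parameters in one place helps, the sketch below derives both file names from a single date. It uses base R only, so `as.Date()` and `seq()` stand in for the lubridate functions in the setup script; `file_current` and `file_prev` are names introduced here for illustration.

```r
# Parameter that would live in 00_setup.R
month_end <- as.Date("2023-03-31")

# Same date one year earlier, using base R in place of lubridate
prev_year <- seq(month_end, by = "-1 year", length.out = 2)[2]

# File names derived from the parameters rather than typed out by hand
file_current <- paste0(month_end, "_schools.csv")
file_prev <- paste0(prev_year, "_schools.csv")

file_current
file_prev
```

Changing `month_end` is then the only edit needed to point both reads at a different year.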
  

3. Code style

Why?

  • Makes code easier to write

  • Makes code easier to understand

  • Makes code easier to debug

Tidyverse style guide

Example

  • Bad practice:

    do_something_very_complicated(something="that",requires=many,arguments="some of which may be long")

  • Good practice:

    do_something_very_complicated(
      something = "that",
      requires = many, 
      arguments = "some of which may be long"
    )

lintr

lintr runs style checks on existing code and reports back any issues (but doesn’t change any of your code).


code.R
do_something_very_complicated(something="that",requires=many,arguments="some of which may be long")

lintr::lint("code.R")

[infix_spaces_linter] Put spaces around all infix operators.
[commas_linter] Commas should always have a space after.
[infix_spaces_linter] Put spaces around all infix operators.
[commas_linter] Commas should always have a space after.
[infix_spaces_linter] Put spaces around all infix operators.
[line_length_linter] Lines should not be more than 80 characters.
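lintr can also check code passed as text, which is handy for trying it out without creating a file. The sketch below assumes the lintr package is installed and uses its default linters; `bad_code` and `lints` are names introduced here for illustration.

```r
library(lintr)

# The badly formatted line from the example above, passed as text
bad_code <- paste0(
  "do_something_very_complicated(something=\"that\",requires=many,",
  "arguments=\"some of which may be long\")"
)

# Run the default linters; nothing in bad_code is changed
lints <- lint(text = bad_code)

# One row per issue found
as.data.frame(lints)[, c("line_number", "linter", "message")]
```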

styler

styler re-styles existing code following the tidyverse style guide

The add-in menu for styler in RStudio. The option to style selection is highlighted.

code.R
do_something_very_complicated(
  something = "that",
  requires = many,
  arguments = "some of which may be long"
)

4. Code comments

Why?

  • Help somebody else understand the code you’ve written

  • Help future you understand the code you’ve written

  • Provide context

  • Section your code to provide structure and easier navigation

How?

  • Start a comment line with a single # followed by a space

    # My first comment

  • Help other people (and yourself in the future) understand how and why the code has been written in a particular way

    # Tried solution X, but Y worked better because of Z

What to avoid

  • Repeating what is already obvious from the code

    # Calculate percentage change
    perc_change <- x / y * 100

  • Hard-coded values

    perc_change <- x / y * 100
    # 2022: 65%; 2023: 52%

  • Commenting out code to save for later or to be conditionally run

    # Uncomment and run in odd numbered years
    # perc_change <- x / y * 100
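One possible alternative to commenting code in and out is to let a parameter decide whether it runs. This is a minimal sketch rather than a recommendation for every case; `year`, `x` and `y` are made-up values for illustration.

```r
# Hypothetical parameter, e.g. defined once in a setup script
year <- 2023

# Made-up values for illustration
x <- 52
y <- 80

# Run the calculation only in odd-numbered years
if (year %% 2 == 1) {
  perc_change <- x / y * 100
}
```

The condition is now part of the code, so nobody has to remember to uncomment anything.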

Code sections

  • A comment line ending with at least four dashes (-), equal signs (=), or hash signs (#)

    # Section 1 ---------------------------

  • Navigate and fold

An open navigation menu in RStudio giving the option to navigate to Section 1, Section 2 or Section 3 in an R script.

An R script open in RStudio where Section 1 of the code is folded. The contents of the section are not visible.

5. Functions

Why?

  • Minimises repetition

  • Simplifies code

  • Ensures consistent methods across code

  • Easier to maintain

Example

  • Calculating financial year of a date

functions/fin_year.R
fin_year <- function(date) {
  m <- lubridate::month(date)
  y <- lubridate::year(date)

  ifelse(
    m <= 3,
    paste0(y - 1, "/", y),
    paste0(y, "/", y + 1)
  )
}

Using the function


lubridate::today()
[1] "2023-12-07"


fin_year(lubridate::today())
[1] "2023/2024"
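Because `ifelse()` works element by element, the function can also be applied to several dates at once. The sketch below repeats the function definition so it runs on its own, assumes the lubridate package is installed, and uses two hypothetical dates either side of the financial year boundary.

```r
# Function as defined in functions/fin_year.R
fin_year <- function(date) {
  m <- lubridate::month(date)
  y <- lubridate::year(date)

  ifelse(
    m <= 3,
    paste0(y - 1, "/", y),
    paste0(y, "/", y + 1)
  )
}

# Hypothetical dates: last day of one financial year, first day of the next
dates <- as.Date(c("2023-03-31", "2023-04-01"))

fin_year(dates)
```

This returns "2022/2023" for the first date and "2023/2024" for the second.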

Using the function in a script

code/01_read-data.R
# Run setup script ----

source(here::here("code", "00_setup.R"))

# Source functions ----

source(here::here("functions", "fin_year.R"))

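# Read in data for current and previous year ----

# Note: fin_year() returns e.g. "2022/2023"; the "/" acts as a path
# separator, so these files are read from data/2022/ and data/2021/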

schools <- read_csv(
   here("data", paste0(fin_year(month_end), "_schools.csv"))
)

schools_prev <- read_csv(
   here("data", paste0(fin_year(prev_year), "_schools.csv"))
)
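As the functions folder grows, sourcing each file by name can get repetitive. One possible approach is to source every .R file in the folder. The sketch below uses base R only; a temporary folder and a hypothetical `double_it()` function stand in for the real functions folder so that the code runs on its own.

```r
# Assumption: a temporary folder stands in for the project's functions/ folder
functions_dir <- file.path(tempdir(), "functions")
dir.create(functions_dir, showWarnings = FALSE)

# Write a small hypothetical function file so the sketch is self-contained
writeLines(
  "double_it <- function(x) x * 2",
  file.path(functions_dir, "double_it.R")
)

# Find every .R file in the folder
function_files <- list.files(
  functions_dir,
  pattern = "\\.R$",
  full.names = TRUE
)

# Source each file in turn
for (file in function_files) {
  source(file)
}

double_it(2)
```

In a project, `functions_dir` would be replaced with `here::here("functions")`.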
  

Summary

  1. RStudio Projects

  2. Use a setup script

  3. Code style

  4. Code comments

  5. Functions

Contact


Email:

GitHub: alice-hannah