RAP in Practice

About me

Reproducible Analytical Pipeline (RAP) developer
Background in statistics and data analysis

Is RAP relevant to me and my team?

Would it be a nightmare to have to go back and rerun your process from the beginning if you found a mistake?
Do you have to make a lot of manual edits to code before each run?
Is there a lot of repetition in your code?
Would a new person find it difficult to understand the process?

Features of RAP

In order to achieve the full benefits, at a minimum a RAP must:

Minimise manual steps
Be built using open-source software; e.g. R, Python
Be peer reviewed by colleagues
Be version controlled; e.g. Git
Be open to anyone; e.g. code published on GitHub
Follow good practice for quality assurance
Contain well-commented code and have documentation embedded

Organised folder structure

Keep everything you need to run your code within one repository
Plan ahead for how you’re going to organise future data submissions, outputs, etc.

A good place to start:

    my-project/
    ├── code/
    │   ├── 00_setup.R
    │   └── 01_clean_data.R
    ├── functions/
    ├── data/
    ├── lookups/
    ├── outputs/
    │   ├── 2022/
    │   └── 2023/
    └── README.md
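Numbered scripts in code/ make it possible to run the whole pipeline in order from one command. The sketch below is a minimal illustration, assuming every script to run sits in code/ and starts with a two-digit prefix such as 00_ or 01_; the helper name run_pipeline() is hypothetical.

```r
# Hypothetical helper: run every numbered script in a folder, in order.
# Assumes scripts are named like "00_setup.R", "01_clean_data.R", ...
run_pipeline <- function(code_dir = "code") {
  # Match only files starting with two digits and an underscore, ending in .R
  scripts <- list.files(code_dir, pattern = "^[0-9]{2}_.*\\.R$",
                        full.names = TRUE)
  # Zero-padded prefixes sort correctly as text: 00, 01, ..., 10, 11
  scripts <- sort(scripts)
  for (script in scripts) {
    message("Running ", basename(script))
    # Source into the global environment so objects created by one script
    # (e.g. parameters set in 00_setup.R) are available to the next
    source(script, local = globalenv())
  }
  invisible(scripts)
}
```

Calling run_pipeline() from the project root would then run 00_setup.R followed by 01_clean_data.R; files without a numeric prefix are skipped.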

Standardised naming conventions

Underscores or dashes instead of spaces
All lower case
Date stamp data and output files
Number R scripts in the order they run
Document the agreed naming convention

Examples:

2023-11-16_attendance.rds
2022_school-report.html
01_prepare-data.R
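A naming convention is easier to follow when code applies it. The function below is a hypothetical helper, not part of any package, that builds a file name in the style of the first example: an ISO date stamp, an underscore, then a lower-case description with dashes instead of spaces. It assumes day-level date stamps; a year-only stamp like 2022_school-report.html would need a different format string.

```r
# Hypothetical helper applying the naming convention:
# ISO date stamp, lower case, dashes instead of spaces
make_file_name <- function(description, extension, date = Sys.Date()) {
  # Lower case, then replace runs of spaces or underscores with one dash
  clean <- gsub("[ _]+", "-", tolower(trimws(description)))
  # "%Y-%m-%d" gives a date stamp that also sorts chronologically
  paste0(format(as.Date(date), "%Y-%m-%d"), "_", clean, ".", extension)
}

make_file_name("Attendance", "rds", date = "2023-11-16")
# returns "2023-11-16_attendance.rds"
```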

Relative file paths

Avoid hard-coding file paths
Use RStudio Projects
Use the here package to define file paths relative to your project folder
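As a sketch of the difference, assuming the here package is installed: here::here() builds a path from the project root (which it locates using markers such as an RStudio .Rproj file, falling back to the working directory), so the same line works on anyone's machine. The user folder in the commented-out line is illustrative only.

```r
# Hard-coded path: only works on one person's machine (illustrative, not run)
# attendance <- readRDS("C:/Users/someone/my-project/data/2023-11-16_attendance.rds")

# Relative path: built from the project root, whoever runs the code
data_file <- here::here("data", "2023-11-16_attendance.rds")

# attendance <- readRDS(data_file)
```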

Writing manual processes as code

Creating new folders
Updating parameters; e.g. dates, geographies
Creating outputs; e.g. data visualisation, reports, spreadsheets
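For example, creating a year's output folder and one file per geography can both be driven by a parameter instead of being done by hand. This is a minimal sketch: the data frame, its column names and the write_outputs() helper are hypothetical, and CSV files stand in for whatever outputs a real process produces.

```r
# Hypothetical example data; in practice this would be read from data/
attendance <- data.frame(
  local_authority = c("North", "North", "South"),
  school          = c("A", "B", "C"),
  pupils          = c(120, 80, 200)
)

# Hypothetical helper: create the year's folder and write one file per area
write_outputs <- function(data, report_year, base_dir = "outputs") {
  output_dir <- file.path(base_dir, report_year)
  # No error if the folder already exists from a previous run
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
  written <- character(0)
  for (la in unique(data$local_authority)) {
    la_data <- data[data$local_authority == la, ]
    path <- file.path(output_dir,
                      paste0(report_year, "_attendance-", tolower(la), ".csv"))
    write.csv(la_data, path, row.names = FALSE)
    written <- c(written, path)
  }
  invisible(written)
}

# In a real project base_dir would stay as "outputs";
# tempdir() keeps this demonstration self-contained
written <- write_outputs(attendance, report_year = 2023,
                         base_dir = file.path(tempdir(), "outputs"))
```

Next year's run then only needs report_year changed; the folder and file names follow automatically.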

Functions

Don’t repeat yourself
Use function arguments to re-use the function for ‘similar’ actions
Keep functions as simple as possible
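A small illustration of turning repeated code into a function, using hypothetical data and a hypothetical total_pupils() function: the two copy-and-paste lines differ only in the grouping column, so that column becomes an argument.

```r
# Hypothetical example data
attendance <- data.frame(
  school          = c("A", "A", "B", "C"),
  local_authority = c("North", "North", "North", "South"),
  pupils          = c(10, 20, 30, 40)
)

# Repetitive version: the same line copied with one thing changed
# by_school <- aggregate(attendance["pupils"], by = attendance["school"], FUN = sum)
# by_la     <- aggregate(attendance["pupils"], by = attendance["local_authority"], FUN = sum)

# Function version: the 'similar' part becomes an argument
total_pupils <- function(data, group_col) {
  aggregate(data["pupils"], by = data[group_col], FUN = sum)
}

by_school <- total_pupils(attendance, "school")           # A 30, B 30, C 40
by_la     <- total_pupils(attendance, "local_authority")  # North 60, South 40
```

A fix made inside total_pupils() now applies to every grouping at once, instead of having to be copied into each pasted line.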

Documentation

Code comments to provide context
Include a README file
- Description of the process
- Requirements and dependencies
- Guidance to run the process
- Contact details
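Comments are most useful when they explain why the code does something, not just what it does. The sketch below uses a hypothetical attendance_rate() function; the #' lines follow the roxygen-style convention common in R, where each argument and the return value are described directly above the function.

```r
#' Calculate percentage attendance (hypothetical example)
#'
#' @param attended Number of sessions attended.
#' @param possible Number of possible sessions.
#' @return Attendance as a percentage, rounded to 1 decimal place.
attendance_rate <- function(attended, possible) {
  # Return NA rather than Inf/NaN when there were no possible sessions,
  # so output tables show a gap instead of a misleading value
  ifelse(possible == 0, NA_real_, round(100 * attended / possible, 1))
}

attendance_rate(c(45, 0), c(50, 0))
# returns 90 and NA
```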

Version control

Alternative to saving multiple copies of files to keep version history
When a change is made to a file, create a Git ‘commit’ to record:
- what change was made,
- when the change was made,
- why the change was made, and
- who made the change.

Open code

Host a Git repository on GitHub and make it public
Increase trust by making analysis transparent
Facilitate peer review
Make it easier for others to reuse code

Summary

Organised folder structure

Functions

Standardised naming conventions

Documentation

Relative file paths

Version control

Writing manual processes as code

Open code

Where to start

Open-source software

Would it be a nightmare to have to go back and rerun your process from the beginning if you found a mistake?
- Git

Do you have to make a lot of manual edits to code before each run?
- Set parameters in a setup script
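A setup script collects every value that changes between runs in one place. The contents below are a hypothetical sketch of code/00_setup.R: the parameter names, dates and geography names are assumptions for illustration, and the checks simply stop the run early if a parameter has been mistyped.

```r
# code/00_setup.R (hypothetical contents): the one place to edit before a run

params <- list(
  publication_date = as.Date("2023-11-16"),
  report_year      = 2023,
  geographies      = c("North", "South")  # hypothetical geography names
)

# Stop straight away if a parameter is missing or mistyped,
# rather than failing part-way through a long run
stopifnot(
  inherits(params$publication_date, "Date"),
  !is.na(params$publication_date),
  is.numeric(params$report_year),
  length(params$report_year) == 1,
  length(params$geographies) > 0
)

# Values derived from the parameters, so nothing downstream is hand-edited
output_dir <- file.path("outputs", params$report_year)
data_file  <- paste0(format(params$publication_date, "%Y-%m-%d"), "_attendance.rds")
```

Later scripts can start with source("code/00_setup.R"), or an equivalent here::here() path, so each run only needs edits in this one file.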

Is there a lot of repetition in your code?
- Functions

Would a new person find it difficult to understand the process?
- Documentation

Links & Contact

Government Analysis Function RAP Resources
The Duck Book
Civil Service RAP Strategy and Scottish Government Implementation Plan
Blog: How we saved 3 analysts 6 weeks of copying and pasting
Email me – I’m always happy to talk about RAP!
- alice.hannah@gov.scot