Reproducible Analytical Pipelines

Alice Byers

Data Innovation Team, Scottish Government

11 October 2023

Is RAP relevant to me?

  • Would it be a nightmare to have to go back and rerun your process from the beginning if you found a mistake?

  • Do you have to make a lot of manual edits to code before each run?

  • Is there a lot of repetition in your code?

  • Would a new person find it difficult to understand the process?

What is RAP?

Automated statistical and analytical processes that are:

  • Reproducible

  • Auditable

  • Efficient

  • High quality

Features of RAP

In order to achieve these benefits, at a minimum a RAP must:

  • Minimise manual steps

  • Be built using open-source software; e.g. R, Python

  • Be peer reviewed by colleagues

  • Be version controlled; e.g. Git

  • Be open to anyone; e.g. code published on GitHub

  • Follow good practice for quality assurance

  • Contain well-commented code and have documentation embedded
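
One way to build quality assurance into the pipeline, rather than checking by eye, is to run automated checks on each input. The sketch below is illustrative only: the function name check_records and the column name school_id are assumptions, not part of any Scottish Government codebase.

```python
def check_records(records, key):
    """Return a list of human-readable problems found in the records.

    records: list of dicts, one per row (hypothetical input shape)
    key:     name of the column that should be present and unique
    """
    problems = []
    seen = set()
    for i, row in enumerate(records):
        value = row.get(key)
        if value is None or value == "":
            # A missing identifier would break any later linking step.
            problems.append(f"row {i}: missing {key}")
        elif value in seen:
            # A repeated identifier would double count in any totals.
            problems.append(f"row {i}: duplicate {key} {value!r}")
        else:
            seen.add(value)
    return problems
```

An empty list means the checks passed; a pipeline could stop and report the problems otherwise, so errors surface before anything is published.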

RAP Strategy

Case Study - Existing Process

  • School Information Dashboards

  • 10 data sources

  • Data cleaned, linked and analysed manually in Excel

  • Dashboards created in Tableau

  • Updated twice a year; each update took three weeks of work for three statisticians - longer if errors were found

Case Study - Planning

  • Engage with SG RAP support team

  • Define aims – what will success look like?

  • Mock ups of what dashboards would look like

  • Planning how best to structure datasets

  • Work with data providers to improve process

Case Study - RAP Principles Applied


  • Organised folder structure

  • Writing manual processes as code

  • Standardised naming conventions

  • Functions

  • Open-source software

  • Version control using Git

  • Relative file paths

  • Open code on GitHub
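
Two of the principles above, an organised folder structure and relative file paths, can be sketched together. This is a minimal example in Python using the standard pathlib module; the folder names (data/raw, data/processed, outputs) are assumptions chosen for illustration, not the layout used in the case study.

```python
from pathlib import Path


def project_paths(root: Path) -> dict[str, Path]:
    """Return the standard folders for a project, all relative to its root."""
    return {
        "raw": root / "data" / "raw",              # untouched source files
        "processed": root / "data" / "processed",  # cleaned and linked data
        "outputs": root / "outputs",               # files that feed the dashboards
    }


def ensure_folders(root: Path) -> dict[str, Path]:
    """Create any missing standard folders and return their paths."""
    paths = project_paths(root)
    for folder in paths.values():
        folder.mkdir(parents=True, exist_ok=True)
    return paths
```

Because every path is built from the project root, the same code should run unchanged on a colleague's machine, for example by calling ensure_folders(Path.cwd()) from the top of the project.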

Case Study - Result

  • Faster

    • Previously took three weeks for three statisticians (twice per year)

    • Now takes at most one day for one statistician (twice per year)

  • More accurate

  • Reduced risk

  • Well documented

  • Developed skills to apply to other projects

Where to start

  • Would it be a nightmare to have to go back and rerun your process from the beginning if you found a mistake?

    • Open-source software

    • Git

  • Do you have to make a lot of manual edits to code before each run?

    • Setup script
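
A setup script keeps everything that changes between runs in one place, so the rest of the code needs no manual edits. The sketch below shows one possible shape in Python; the class name RunSettings, its fields, the example values and the output file name pattern are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RunSettings:
    """Everything that changes between runs, kept in one place."""

    school_year: str        # hypothetical label, e.g. "2023-24"
    publication_date: date  # hypothetical publication date for this run

    def output_name(self) -> str:
        """Build the output file name from the settings."""
        return f"school_dashboard_{self.school_year}_{self.publication_date:%Y%m%d}.csv"


# Edit these values once per run; the rest of the pipeline reads them from here.
SETTINGS = RunSettings(school_year="2023-24", publication_date=date(2023, 10, 11))

print(SETTINGS.output_name())  # prints school_dashboard_2023-24_20231011.csv
```

Marking the settings as frozen means later code cannot change them by accident part way through a run.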

Where to start

  • Is there a lot of repetition in your code?

    • Functions

  • Would a new person find it difficult to understand the process?

    • Documentation
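
Functions and documentation often go together: a step that is copied and pasted for every data source can be written once, with a docstring that explains it to a new person. The example below is a sketch in Python; the function names and the column headers are made up for illustration.

```python
def clean_column_name(name: str) -> str:
    """Convert a column header to lower snake_case.

    Leading and trailing spaces are removed, letters are lower-cased and
    any run of whitespace becomes a single underscore.
    """
    return "_".join(name.strip().lower().split())


def clean_column_names(names: list[str]) -> list[str]:
    """Apply clean_column_name to every header in a list."""
    return [clean_column_name(name) for name in names]


# The same function is reused for every data source instead of repeating the logic.
print(clean_column_names(["School Name ", "Pupil  Count"]))
# prints ['school_name', 'pupil_count']
```

If the naming convention changes, only clean_column_name needs editing, and every data source picks up the change on the next run.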