Reproducible Analytical Pipelines in the Scottish Government

Alice Byers

Data Innovation Team, Scottish Government

What is RAP?

Automated statistical and analytical processes that are:

  • Reproducible

  • Auditable

  • Efficient

  • High quality

Features of RAP

At minimum, a RAP must:

  • Minimise manual steps

  • Be built using open-source software, e.g. R or Python

  • Be peer reviewed by colleagues

  • Be version controlled, e.g. with Git

  • Be open to anyone; e.g. code published on GitHub

  • Follow good practice for quality assurance

  • Contain well-commented code and have documentation embedded
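
As an illustration of the last two points, a quality assurance check can be written as a documented function rather than carried out by eye. This is a minimal sketch, not code from any Scottish Government pipeline: the function name `check_percentages` and the column name `attendance_pct` are hypothetical.

```python
def check_percentages(rows, column):
    """Return QA failures for a percentage column.

    rows   : list of dicts, one per record (values may be strings).
    column : name of the column expected to hold a value from 0 to 100.

    Each failure is a (row_number, value) tuple, with rows numbered from 1.
    Missing, non-numeric and out-of-range values all count as failures.
    """
    failures = []
    for row_number, row in enumerate(rows, start=1):
        value = row.get(column)
        try:
            pct = float(value)
        except (TypeError, ValueError):
            # Missing or non-numeric value
            failures.append((row_number, value))
            continue
        if not 0 <= pct <= 100:
            # Numeric but outside the valid range
            failures.append((row_number, value))
    return failures


# Hypothetical example data
records = [
    {"attendance_pct": "93.5"},
    {"attendance_pct": "104"},
    {"attendance_pct": ""},
]
problems = check_percentages(records, "attendance_pct")
```

Because the check is code, it runs identically at every update, and the docstring keeps the documentation next to the logic it describes.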

RAP Strategy

  • Developed by the Government Analysis Function to help analysts and analyst leaders understand why RAP is important and how to deliver it.

  • Vision: RAP is the default approach to analysis

  • Three goals

    • Tools

    • Capability

    • Culture

RAP Support

  • Data Innovation Team, Data Division

  • We can provide the following support:

    • Review existing process and plan RAP project
    • Best practice for project structure and code style
    • Code review and general feedback
    • Training to use Git and GitHub
    • Advanced topics such as R package development, data visualisation and interactive dashboard development
  • Contact us

Case Study - Existing Process

  • School Information Dashboards

  • 10 data sources

  • Data cleaned, linked and analysed manually in Excel

  • Dashboards created in Tableau

  • Updated twice a year; each update took three weeks of work for three statisticians - longer if errors were found

Case Study - Planning

  • Engage with SG RAP support team

  • Define aims – what will success look like?

  • Create mock-ups of what the dashboards will look like

  • Plan how best to structure datasets

  • Work with data providers to improve process

Case Study - RAP Principles Applied

  • Organised folder structure

  • Writing manual processes as code

  • Standardised naming conventions

  • Functions

  • Open-source software

  • Version control using Git

  • Relative file paths

  • Open code on GitHub
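
Several of these principles can be shown together in a short sketch. The code below is illustrative only and is not taken from the School Information Dashboards project: the folder `data/raw`, the file name `school_roll.csv` and all function names are hypothetical.

```python
import csv
from pathlib import Path

# Relative file path: resolved from the project folder, so the code
# runs unchanged on any colleague's machine (hypothetical layout).
RAW_DATA_DIR = Path("data") / "raw"


def raw_data_path(filename):
    """Return the relative path to a file in the raw data folder."""
    return RAW_DATA_DIR / filename


def clean_column_name(name):
    """Standardised naming: convert a header to lower snake_case."""
    return "_".join(name.strip().lower().split())


def read_clean_csv(path):
    """Read a CSV file, standardising headers and trimming whitespace.

    Replaces the manual step of tidying a sheet by hand in Excel.
    Returns a list of dicts, one per row.
    """
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return [
            {clean_column_name(key): (value or "").strip()
             for key, value in row.items()}
            for row in reader
        ]
```

A call such as `read_clean_csv(raw_data_path("school_roll.csv"))` would then perform the same cleaning on each data source at every update.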

Case Study - Result

  • Faster

    • Previously took three weeks for three statisticians (twice per year)

    • Now takes at most half a day for one statistician (twice per year)

  • More accurate

  • Reduced risk

  • Well documented

  • Developed skills to apply to other projects
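
The time saving can be worked through explicitly. The sketch below uses the figures from the slides; the five-day working week is an assumption that the slides do not state, and the function name is hypothetical.

```python
# Assumption: a working week is five days (not stated in the slides).
WORKING_DAYS_PER_WEEK = 5
UPDATES_PER_YEAR = 2


def person_days_per_year(days_per_update, statisticians):
    """Total statistician effort per year, in person-days."""
    return days_per_update * statisticians * UPDATES_PER_YEAR


# Before: three weeks of work for three statisticians, twice a year
before = person_days_per_year(3 * WORKING_DAYS_PER_WEEK, 3)

# After: at most half a day for one statistician, twice a year
after = person_days_per_year(0.5, 1)

print(f"Before: {before} person-days per year")
print(f"After: at most {after} person-days per year")
```

Under that assumption, the effort falls from 90 person-days per year to at most 1 person-day per year, before counting any extra time previously spent correcting errors.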