Introduction to Git and GitHub

Alice Hannah

Data Innovation Team, Data Division

20 August 2024

What is version control?

Version control is the practice of tracking and managing changes to files.

Does this look familiar?

├── stats-publication
│   ├── publication-analysis-code.R
│   ├── publication-analysis-code-v2.R
│   ├── publication-analysis-code-v2 NEW METHODOLOGY.R
│   ├── publication-analysis-code-v2 NEW METHODOLOGY AB-changes.R
│   ├── publication-analysis-code-final.R
│   ├── publication-analysis-code-final April 2023.R

Git

Git is free and open-source software for version control.

It allows you to:

  • Record any changes you make to files (these records are called ‘commits’)

  • Undo changes and revert to previous versions of files (if required)

  • Collaborate with others on the same project using ‘branches’
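
As a rough illustration of the first two points, the sketch below records two commits and then restores the earlier version of a file. It assumes Git is installed; the throwaway directory, the file name analysis.R, and the name and email address are hypothetical values chosen for the demonstration.

```shell
# Work in a throwaway directory so no real project is touched.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Identity for this demo repository only (hypothetical values).
git config user.name "Example Analyst"
git config user.email "analyst@example.com"

# Record a first version of a file as a commit.
echo 'x <- 1' > analysis.R
git add analysis.R
git commit --quiet -m "Add first version of analysis script"

# Change the file and record the change as a second commit.
echo 'x <- 2' > analysis.R
git commit --quiet -am "Update value of x"

# Restore the file as it was one commit earlier.
git checkout HEAD~1 -- analysis.R
cat analysis.R   # prints: x <- 1
```

Both commits stay in the history, so the newer version of the file can still be recovered later.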

Git Commits

Commits contain information on:

  • what change was made,
  • when the change was made,
  • why the change was made, and
  • who made the change.

Git tracks changes to the content of files, not just the file as a whole. This means the information above can be recorded for changes as small as one character on one line of code.
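
One way to see this information is to ask Git for the log of a commit together with its line-by-line changes. The sketch below assumes Git is installed and builds a small demo repository first; the file name, commit messages, and identity are hypothetical.

```shell
# Build a small demo repository with a one-character change.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Analyst"        # hypothetical identity
git config user.email "analyst@example.com"

echo 'x <- 1' > analysis.R
git add analysis.R
git commit --quiet -m "Add first version of analysis script"

echo 'x <- 2' > analysis.R
git commit --quiet -am "Correct value of x"

# Show who, when and why for the latest commit, followed by
# what changed (-p prints the line-by-line difference).
git --no-pager log -1 -p --format='who:  %an%nwhen: %ad%nwhy:  %s'
```

The "why" comes from the commit message, so it is only as informative as the message the author wrote.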

What is in a code repository?

A version-controlled code repository usually holds the files for one project, which can include:

  • Code (e.g. R scripts)
  • Documentation (e.g. README)
  • Configuration files
  • …but NOT DATA!

Ignoring files

  • A .gitignore file tells Git which files in the code repository it shouldn’t track changes to.

  • It can list specific files, folders or file extensions

  • Use the Scottish Government template .gitignore to get started.

  • More information on using Git safely can be found in the Duck Book.
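
A minimal sketch of how ignore rules behave is given below. The three patterns are hypothetical examples, not the contents of the Scottish Government template, and the file and folder names are invented for the demonstration.

```shell
# Set up a demo repository containing a script and some data files.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
mkdir data
echo 'x <- 1' > analysis.R
echo 'id,value' > data/raw.csv
echo 'id,value' > results.csv
echo 'password' > secrets.txt

# Hypothetical ignore rules: one file, one folder, one extension.
cat > .gitignore <<'EOF'
# a specific file
secrets.txt
# a whole folder
data/
# every file with a given extension
*.csv
EOF

# Ask Git which rule, if any, causes each path to be ignored.
git check-ignore -v secrets.txt data/raw.csv results.csv

# Only the files Git would track are listed as untracked.
git status --short
```

Here `git status` lists only .gitignore and analysis.R, because the data files and secrets.txt match the ignore rules.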

GitHub

GitHub is a web-based platform for hosting version-controlled code and can be used to:

  • Make code publicly available (although repositories can also be private)

  • Facilitate code review (using ‘pull requests’)

  • Manage projects using tools such as issue tracking

  • Navigate Git history and view previous versions of files

  • View other people’s code and collaborate

GitHub Organisation

Scottish Government Analysis GitHub organisation

Screenshot of Scottish Government Analysis GitHub organisation homepage

How to use Git and GitHub

  • Git can be used without GitHub

  • GitHub is often used as the main copy of a code repository (or ‘remote’). Analysts or developers can take a copy (or ‘clone’) of the repository from GitHub to work on locally.

  • Use Git locally to track changes and regularly ‘push’ to GitHub

  • Use GitHub to facilitate code review and merging of branches
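
The clone, commit and push cycle described above can be sketched as follows. To keep the example self-contained, a local bare repository (remote.git) stands in for the copy hosted on GitHub; this is an assumption made for illustration. In practice the clone address would be the repository's GitHub URL, and the final review step would be a pull request opened in GitHub's web interface. The branch name, file contents, and identity are hypothetical.

```shell
demo="$(mktemp -d)"
cd "$demo"

# A bare repository plays the role of the remote hosted on GitHub.
git init --quiet --bare remote.git

# Take a local copy ('clone') of the remote to work on.
# Git warns that the repository is empty, which is expected here.
git clone --quiet remote.git work
cd work
git config user.name "Example Analyst"        # hypothetical identity
git config user.email "analyst@example.com"

# Track a change locally, then push it to the remote.
echo '# stats-publication' > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main                 # name the main branch explicitly
git push --quiet -u origin main

# Make further changes on a branch and push the branch for review.
git checkout --quiet -b update-readme
echo 'Analysis code for the publication.' >> README.md
git commit --quiet -am "Describe the project in the README"
git push --quiet -u origin update-readme
```

After the last push the remote holds both branches, which is the point at which a reviewer would compare update-readme against main.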

Why use Git and GitHub?

  • Preferable to lots of copies of the same file with various names!

  • Reproducible Analytical Pipelines (RAP)

    • Reproducible: You can rerun your code as it was at any point in time.

    • Auditable: You have a record of when changes were made and why.

    • Transparent: Code can be made publicly available on GitHub for others to review or reuse.

    • Good quality: Code review is built into the GitHub workflow.

Getting started on SCOTS (1)

Getting started on SCOTS (2)

External learning

External guidance

Contact

Alice Hannah
RAP Developer
Data Innovation Team, Scottish Government