Data protection extends to every layer of the architecture that may at one time or another have access to the data. This includes the software and other digital tooling that is used to examine or otherwise interact with the data.
Over time, experienced researchers build up a digital toolbox containing a vast array of languages, applications and packages that let them interact with data in the ways that best suit their needs. Often this leads to a heterogeneous mix of tools that is difficult to make security claims about.
For this reason, we’ve chosen to specify a Software Packaging protocol that can be verified as secure, yet accommodates researchers’ preferences for the tools they want to use.
How do we do this?
We’ve developed workflows for Singularity-based images that are checked into GitHub, built and tested in a Jenkins environment, and finally published to a secure and managed instance of Singularity Hub before they can be used within the enclave.
GitHub is a web-based hosting service for version control using Git. It is mostly used for computer code. It offers all of the distributed version control and source code management functionality of Git as well as adding its own features.
Software that is to be used within the enclave at UNC Odum must be submitted to the Odum-controlled GitHub repository designated for Singularity-based enclave software. Once in GitHub, the software can be reviewed and revised until it meets security guidelines.
Singularity enables users to have full control of their containerized environment. Singularity containers can be used to package entire scientific workflows, software and libraries, and even data.
Singularity containers are compressed using SquashFS when built. SquashFS is a compressed, read-only filesystem, which makes it well suited to reliable archiving and reuse. A checksum of the resulting image can be computed at the source and again at the destination to validate that the container image was not modified in between.
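As an illustration of the checksum validation described above, here is a minimal sketch in Python. It assumes SHA-256 as the hash algorithm and assumes the expected digest is recorded somewhere at the source. Neither detail is specified by the enclave workflow, and the function names are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    Chunked reads keep memory use flat even for multi-gigabyte images.
    SHA-256 is an assumption; any agreed-upon hash would work the same way.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_is_unmodified(image_path: Path, expected_digest: str) -> bool:
    """Check a local image against the digest recorded at the source.

    The expected digest is normalized (whitespace stripped, lowercased)
    before a constant-time comparison.
    """
    actual = sha256_of_file(image_path)
    return hmac.compare_digest(actual, expected_digest.strip().lower())
```

A caller would compute the digest once where the image is built, carry it alongside the image, and call `image_is_unmodified` at the destination before the image is used.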
Singularity Hub is a registry for scientific Linux containers. What is a Linux container? A container image is an encapsulated, portable environment that is created to distribute a scientific analysis or a general function. Containers help make such content reproducible because they package software and data dependencies together with the libraries that are needed. At the core of Singularity Hub are these Singularity container images; once on Singularity Hub, they can be easily built, updated, referenced with a URL for a publication, and shared by members of the Hub.
The Hub that has been deployed for the enclave at UNC Odum is privately maintained and only accessible to the enclave for image retrieval via a private network.
Jenkins offers a simple way to set up a continuous integration or continuous delivery environment for almost any combination of languages and source code repositories using pipelines, as well as automating other routine development tasks. While Jenkins doesn’t eliminate the need to create scripts for individual steps, it does provide a faster and more robust way to integrate an entire chain of build, test, and deployment tools than would otherwise be available.
The Jenkins instance deployed for the enclave at UNC Odum has access to both the public network, to monitor GitHub, and the private network, to publish images to Singularity Hub.
Putting it all together
- Code is committed to the Odum Institute repository on GitHub.
- A Jenkins webhook detects the commit, and Jenkins builds and tests the committed code as a job.
- Jenkins reports the build / test outcome of the job back to GitHub.
- If the build / test of the job passes and the commit was to the specified deployment branch, the image is pushed to the Odum Singularity Hub for use in the enclave.
Once images are published to the Singularity Hub, they become available to the enclave to use for exploring and interacting with the data.
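The publishing rule in the steps above can be sketched as a small decision function. This is only an illustration of the logic, not the actual Jenkins job. The `JobResult` type, the function names, and the branch name `deploy` are hypothetical, and the `success` / `failure` strings are assumed to mirror the states GitHub accepts for a commit status.

```python
from dataclasses import dataclass

# Hypothetical name; the source only says "the specified deployment branch".
DEPLOYMENT_BRANCH = "deploy"


@dataclass(frozen=True)
class JobResult:
    """Outcome of one Jenkins job triggered by a commit (illustrative type)."""
    branch: str
    build_passed: bool
    tests_passed: bool


def commit_status(result: JobResult) -> str:
    """Status that would be reported back to GitHub for the commit."""
    return "success" if result.build_passed and result.tests_passed else "failure"


def should_publish(result: JobResult, deployment_branch: str = DEPLOYMENT_BRANCH) -> bool:
    """Publish only when the job passed AND the commit is on the deployment branch."""
    return commit_status(result) == "success" and result.branch == deployment_branch
```

Under this sketch, a passing job on any other branch still reports its status to GitHub but never reaches the Hub, and a failing job on the deployment branch is likewise held back.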