The bulk of the data access protocol involves converting the `.accdb` salvage database file on the remote FTP server to a local set of `.csv` files named for the tables in the database. We accomplish this in two lines of code by pulling and then running a stable Docker container that includes a set of bash scripts designed specifically for this task. The specific image used for data access is called `accessor`; it is freely available on Docker Hub and has default settings configured for the salvage database. Code for the construction of the `accessor` image is available in its repository.
For accessibility and reproducibility, we provide an up-to-date version of the salvage data as `.csv` files generated from the “current” (1993 - Present) salvage database file (`Salvage_data_FTP.accdb`). The data can be downloaded from the repository via various methods, including from the website.
Updates to the data are executed via cron
jobs on travis-ci
and pushed to GitHub as tagged Releases.
To use the current image to generate an up-to-date container with data for yourself, pull the image and then run it.
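As a sketch, those two steps might look like the following; the Docker Hub namespace `dockerhubuser` and the container name `salvage-data` are placeholders (check the repository for the exact image path):

```shell
# Pull the accessor image and run it once to generate the .csv files inside
# a named container ("dockerhubuser" and "salvage-data" are placeholders).
docker pull dockerhubuser/accessor
docker run --name salvage-data dockerhubuser/accessor
```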
An additional conversion makes the data available in `R` as a `list` of `data.frame`s that is directly analogous to the `.accdb` database of tables.
The reading into `R` is conducted via functions included in the `r_functions.R` script, which is in the `accessor` image and available in the public code repository.
Building on the list above, you can leverage the `r_script.R` script included in the image, which sources the `r_functions.R` script and loads the database in as an `R` object named `database`. Docker provides several avenues for running R within the container. For example, the `docker exec` command runs commands within the top (read/write) layer of the container, allowing arbitrary R code to be passed in as a single character input:
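For instance, a minimal sketch (the container name `salvage-data` and the script's location at the container root are assumptions):

```shell
# Run R inside the container's top layer, sourcing the bundled script and
# then inspecting the resulting `database` object
# ("salvage-data" is a placeholder container name).
docker exec salvage-data R -e 'source("r_script.R"); names(database)'
```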
You can also copy your own `R` scripts into the container and then run them from the command line in that environment:
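A minimal sketch, assuming a container named `salvage-data` and a local script `my_r_script.R` copied to the container root:

```shell
# Copy your own script into the container's top layer, then run it there
# after sourcing the main r_script.R (which loads the database object).
docker cp my_r_script.R salvage-data:/my_r_script.R
docker exec salvage-data R -e 'source("r_script.R"); source("my_r_script.R")'
```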
Note that we are still running the main `r_script.R` first here; that script does not save any files externally, so the R session in 6. is gone by the time we run 8. To simplify the command line call, it is therefore recommended that expanded uses follow 8. and use `my_r_script.R` as a hub file that directs all of your specific functions, including files saved out from R. For simplicity, saving all output files into a single folder, e.g. `output`, allows a single docker command to retrieve the results from the top layer of the container:
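For example (assuming the container name `salvage-data` and an `output` folder at the container root):

```shell
# Retrieve everything saved to the output folder in the container's
# top layer onto the host ("salvage-data" is a placeholder name).
docker cp salvage-data:/output ./output
```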
Alternatively, the functions are written in base R only, so they should work reproducibly outside the image (in an open R session). Within an instance of `R`, navigate to where you have copied the data out of the container (or to where this code repository is located) and source `r_script.R`:
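Equivalently, from a shell (a sketch, assuming `r_script.R` and the exported `.csv` files are in the working directory):

```shell
# Source the script in a fresh R session and list the tables loaded
# into the resulting database object.
Rscript -e 'source("r_script.R"); names(database)'
```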
The resulting database
object is a named list
of the database’s tables, ready for analyses.
Data preparation code is in development!
Having brought the data into R as-is, we can now prepare them for summaries and analyses. We use the functions included in the `salvage_functions.R` script, which is included within the `salvage` Docker image; the image provides a stable runtime environment for the analyses and output generation (including website rendering).
The data and output are updated daily via cron
jobs on travis-ci
with a recipe (a.k.a. job lifecycle) described by the .travis.yml
file.