A new Docker image is available which installs tools on top of the default rocker/tidyverse to help persist files over Docker containers. This image is part of the public Docker images built on top of googleComputeEngineR.
With this image, there are three ways to save files between Docker sessions:
Use googleCloudStorageR to save and read R working directories between machines, including your GitHub/SSH configurations.
Use whichever combination of the above best fits your workflow.
Files saved to the VM's disk will disappear if you delete the VM, so if they are important it is recommended to write them somewhere else as well.
If relying on this, you will probably want to create a larger VM disk than the default 10GB using the disk_size_gb argument.
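As a minimal sketch of the disk_size_gb argument mentioned above: the VM name, machine type, credentials and the 50GB size are placeholder assumptions, and launch_rstudio_vm is a hypothetical wrapper, so nothing is created until you call it with googleComputeEngineR installed and authenticated for your own project.

```r
# Hypothetical helper: launch an RStudio VM with a larger boot disk.
# Assumes googleComputeEngineR is installed and authenticated for your project.
launch_rstudio_vm <- function(name = "my-rstudio", disk_size_gb = 50) {
  googleComputeEngineR::gce_vm(
    name,
    predefined_type = "n1-standard-1",
    template = "rstudio",
    username = "mark", password = "blah",
    dynamic_image = "gcr.io/gcer-public/persistent-rstudio",
    disk_size_gb = disk_size_gb   # boot disk size in GB
  )
}

# Uncomment to actually create the VM:
# vm <- launch_rstudio_vm("my-rstudio", disk_size_gb = 50)
```

Wrapping the call keeps the disk size in one place if you later launch several VMs.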
Generally, Git is the best place for code under version control across many computers. The steps below show how to pull code to your Docker container on each restart without needing to resupply your GitHub SSH keys.
The below assumes you have started a VM using the persistent-rstudio image, which includes SSH tools:
vm <- gce_vm("vm-ssh",
predefined_type = "n1-standard-1",
template = "rstudio",
username = "mark", password = "blah",
dynamic_image = "gcr.io/gcer-public/persistent-rstudio")
Create an SSH key via Tools > Global Options > Git/SVN > Create RSA Key.
Open a shell via Tools > Shell..., and configure your GitHub email and username:
git config --global user.email "<your GitHub email>"
git config --global user.name "GitHubUserName"
Check that the configuration shows in cat .gitconfig and the SSH keys in ls .ssh; ssh -T git@github.com should then succeed.
Do the below for each new RStudio Project to download from GitHub:
On the GitHub repository page, click the green Clone or download button and copy the Clone with SSH URI. Do not copy the browser URL; it won't work.
In RStudio, paste the URI into New Project > Version Control > Git > Repository URL.
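The file checks described above (a .gitconfig and SSH keys in the home directory) can also be run from the R console. The sketch below uses only base R; check_git_setup is a hypothetical helper name, and it assumes the key was created with RStudio's default id_rsa file name.

```r
# Hypothetical helper: report whether Git/SSH configuration files exist
# in a home directory (defaults to the current user's home).
check_git_setup <- function(home = path.expand("~")) {
  ssh_dir <- file.path(home, ".ssh")
  c(
    gitconfig = file.exists(file.path(home, ".gitconfig")),
    # matches id_rsa and id_rsa.pub; FALSE if the folder is missing or empty
    ssh_key   = length(list.files(ssh_dir, pattern = "^id_rsa")) > 0
  )
}

check_git_setup()
```

This only confirms the files are present; ssh -T git@github.com is still the real test that GitHub accepts the key.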
This configuration should now persist across Docker sessions e.g. you can stop/start the VM and still have GitHub configured.
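Stop and start the VM with gce_vm_stop() and gce_vm_start().
Check that cat .gitconfig still shows your settings, that the SSH keys are still in ls .ssh, and that ssh -T git@github.com works.
The Google Cloud Storage backups described next can be combined with the above GitHub settings to persist the GitHub settings over VMs.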
Authentication for the googleCloudStorageR backups reuses the credentials you used to launch the VM.
It is not intended as a replacement for Git - it only adds files if they are not present locally. I use it to copy projects over to more powerful VMs as required.
Create a bucket, for example with googleCloudStorageR's gcs_create_bucket() function. Choose a bucket region that is closest to you and your VM for best performance.
Add the bucket name to your .Renviron as the GCS_SESSION_BUCKET argument:
GCS_SESSION_BUCKET=gcer-bucket-name
The .Renviron usually sits in your computer's home directory, see ?Startup for details.
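To confirm R can see the bucket name, here is a small base R sketch. It writes a throwaway .Renviron-style file to a temporary location (an assumption made so the example does not touch your real home directory) and loads it with readRenviron().

```r
# Write a throwaway .Renviron-style file; the real one lives in your home directory
renviron <- tempfile(fileext = ".Renviron")
writeLines("GCS_SESSION_BUCKET=gcer-bucket-name", renviron)

# readRenviron() loads the file into the current session's environment
readRenviron(renviron)

# Sys.getenv() returns an empty string when the argument is not set
bucket <- Sys.getenv("GCS_SESSION_BUCKET")
if (nzchar(bucket)) {
  message("Session bucket: ", bucket)
} else {
  message("GCS_SESSION_BUCKET is not set")
}
```

In a normal session R reads the .Renviron at startup, so only the Sys.getenv() check is needed.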
Add the gcs_first() and gcs_last() functions to your .Rprofile file like so:
.First <- function(){
  cat("\n# Welcome Mark! Today is ", date(), "\n")
  cat("\n# Loading .Rprofile from", path.expand("~"))
  googleCloudStorageR::gcs_first()
}
.Last <- function(){
  # will only upload if a _gcssave.yaml in directory with bucketname
  googleCloudStorageR::gcs_last()
  message("\nGoodbye Mark at ", date(), "\n")
}
In each folder you want to save, create a _gcssave.yaml file specifying the GCS bucket to save to. It can carry the settings shown below:
## The GCS bucket to save/load R workspace from step 1
bucket: my-bucket
## set to FALSE if you don't want to load on R session startup
load_on_startup: FALSE
## on first load, whether to look for a different directory on GCS than present getwd()
loaddir:
## regex to only save these files to GCS
pattern:
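If you set up many projects, the file can be generated rather than typed. The sketch below is one way to do that in base R; write_gcssave is a hypothetical helper, not part of googleCloudStorageR, and the bucket name is a placeholder.

```r
# Hypothetical helper: write a _gcssave.yaml into a project folder.
# Uses only base R; settings left empty are written as blank entries.
write_gcssave <- function(dir, bucket, load_on_startup = TRUE,
                          loaddir = "", pattern = "") {
  lines <- c(
    paste("bucket:", bucket),
    paste("load_on_startup:", if (load_on_startup) "TRUE" else "FALSE"),
    trimws(paste("loaddir:", loaddir)),
    trimws(paste("pattern:", pattern))
  )
  path <- file.path(dir, "_gcssave.yaml")
  writeLines(lines, path)
  invisible(path)
}

# Example: create the file in a temporary project folder and show it
project_dir <- tempfile("project")
dir.create(project_dir)
yaml_path <- write_gcssave(project_dir, bucket = "my-bucket")
cat(readLines(yaml_path), sep = "\n")
```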
When you quit the R session, you should see a message such as:
Saving data to Google Cloud Storage:
your-gcs-bucket
2017-08-18 23:25:43 -- File size detected as 1.3 Mb
When you start up that project again, the saved files should load from the bucket.
There are three files to configure:
.Renviron - environment arguments including GCS_SESSION_BUCKET=gcer-bucket-name, which is where your session files will be looked for
.Rprofile - general R startup behaviour, carrying the googleCloudStorageR::gcs_last() and googleCloudStorageR::gcs_first() functions
_gcssave.yaml - per-folder settings that specify which files to save in which folder
Now the R data is saved to GCS under the local folder name. We can load this data in an RStudio server cloud instance via:
Launch a VM with the gcr.io/gcer-public/persistent-rstudio image, which has the appropriate libraries loaded.
vm <- gce_vm("mark-rstudio",
             template = "rstudio",
             username = "mark", password = 'mypassword',
             predefined_type = "n1-standard-2",
             dynamic_image = "gcr.io/gcer-public/persistent-rstudio")
Log in to RStudio Server and create an RStudio project.
As you did on your local machine, you need to create an .Rprofile so googleCloudStorageR can load and save data. For example:
.First <- function(){
  cat("\n# Welcome Ignacio! Today is ", date(), "\n")
  ## will look for download if GCS_SESSION_BUCKET env arg set
  googleCloudStorageR::gcs_first()
}
.Last <- function(){
  # will only upload if a _gcssave.yaml in directory with bucketname
  googleCloudStorageR::gcs_last()
  message("\nGoodbye Ignacio at ", date(), "\n")
}
message("\n*** Successfully loaded .Rprofile ***\n")
Add a _gcssave.yaml file at the root of the project with the entries you need, such as bucket and loaddir.
You can also use the above in conjunction with the GitHub setup to persist over VMs.
To do so, you need to:
Specify the bucket, either via GCS_SESSION_BUCKET or in the _gcssave.yaml
Run the gcr.io/gcer-public/persistent-rstudio image
The GitHub configurations saved in the .ssh folder and the .gitconfig file in your home directory will be backed up to Google Cloud Storage.
Add a _gcssave.yaml file to your home folder that will download/upload the configurations.
## The GCS bucket to save/load R workspace from
bucket: gcer-store-my-rstudio-files
## regex to only save these files to GCS
pattern: "id_rsa|.gitconfig"
From your home folder (check that getwd() is /home/you), save the yaml file and quit the R session. You should see a message saying it's saving the home folder. Upon restart, that folder will load from the bucket.
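The pattern entry in the yaml above is a regular expression, so it helps to check what it matches before relying on it. The sketch below uses base R's grepl() on some made-up file names. Whether googleCloudStorageR applies the pattern to full paths or only to file names is not stated here, so treat the paths as an assumption.

```r
# The pattern from the _gcssave.yaml above, treated as a regular expression
pattern <- "id_rsa|.gitconfig"

# Hypothetical file names, for illustration only
files <- c(".ssh/id_rsa", ".ssh/id_rsa.pub", ".gitconfig", ".Rhistory", "notes.txt")

# Keep only the names the pattern matches
matched <- files[grepl(pattern, files)]
print(matched)
```

Note that the unescaped "." matches any single character; a stricter pattern would escape it, but the form above is what the example configuration uses.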
vm2 <- gce_vm("mark-rstudio",
template = "rstudio",
username = "mark", password = 'mypassword',
predefined_type = "n1-standard-2",
dynamic_image = "gcr.io/gcer-public/persistent-rstudio")
gce_set_metadata(list(GCS_SESSION_BUCKET = "your-session-bucket"), vm2)
You should then be able to run ssh -T git@github.com successfully.
You can now delete VMs and start up new ones using RStudio Docker, and the GitHub configurations will persist so long as you follow the steps above.
Since the compute and the data are now separated, you can become fully cloud native by running RStudio Server on App Engine. This means you don’t need to worry about servers at all. Each time you visit your RStudio Server App Engine URL, a new instance will start, loading your data from your last session. When you finish, close the browser and the VM will tear itself down.
See more at GitHub: RStudio on App Engine.
Running on App Engine has many advantages, including:
This build includes the newest versions of googleCloudStorageR and googleComputeEngineR, which have had functions added to help with the workflow above.
The functions can store data to Google’s dedicated store via googleCloudStorageR's gcs_first and gcs_last functions. This Docker build puts the functions into a custom .Rprofile file that will save a project's workspace data to its own bucket if the folder has a _gcssave.yaml file, or if the directory matches one already saved.
The .yaml tells googleCloudStorageR which bucket to save the folder to. If it is not present, the environment argument GCS_SESSION_BUCKET is used instead, as on first load when no .yaml file is present.
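The lookup order described above can be sketched in base R as follows. This is an illustration of the rule only, not googleCloudStorageR's own code; resolve_bucket is a hypothetical helper and it reads the yaml with a simple line match rather than a YAML parser.

```r
# Hypothetical helper mirroring the lookup order described above:
# a bucket entry in _gcssave.yaml first, then GCS_SESSION_BUCKET.
resolve_bucket <- function(dir = getwd()) {
  yaml_file <- file.path(dir, "_gcssave.yaml")
  if (file.exists(yaml_file)) {
    lines <- readLines(yaml_file, warn = FALSE)
    hit <- grep("^bucket:", lines, value = TRUE)
    if (length(hit) > 0) {
      return(trimws(sub("^bucket:", "", hit[1])))
    }
  }
  # Fall back to the environment argument; NA if neither source is set
  env_bucket <- Sys.getenv("GCS_SESSION_BUCKET")
  if (nzchar(env_bucket)) env_bucket else NA_character_
}

resolve_bucket()
```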
Thus, you can save an RStudio project via your local computer, then launch an RStudio server in the cloud with the loaddir: argument set to that directory name to load the files onto your cloud server. When you quit the R session, your work is saved to its own new folder; if you then stop/start a Docker container running RStudio and create a project with the same name, that folder will load automatically.
It will only download files that don’t already exist in your folder, so local changes won’t be overwritten. It is not Git; treat it more as a backup that will load if the files are not already present (such as when you relaunch a Docker container).
If you upload to GCS, make sure you load the directory and files you want. To stop backups, delete the GCS folder via gcs_delete_all().
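The "only download what is missing" rule described above amounts to a set difference between the remote and local file names. The base R sketch below illustrates that idea; files_to_download is a hypothetical helper and the file names are made up, so this is not the package's actual implementation.

```r
# Hypothetical helper: which remote files would be downloaded, given that
# files already present locally are left untouched.
files_to_download <- function(remote_files, local_dir) {
  local_files <- list.files(local_dir, all.files = TRUE, no.. = TRUE)
  setdiff(remote_files, local_files)
}

# Example: a local folder that already holds analysis.R
local_dir <- tempfile("project")
dir.create(local_dir)
file.create(file.path(local_dir, "analysis.R"))

to_get <- files_to_download(c("analysis.R", "data.csv"), local_dir)
print(to_get)
```

Because the local copy of analysis.R is skipped rather than compared, any newer version in the bucket is ignored, which is why this works as a backup and not as version control.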
Example _gcssave.yaml:
## The GCS bucket to save/load R workspace from
bucket: gcer-store-my-rstudio-files
## set to FALSE if you don't want to load on R session startup
load_on_startup: TRUE
## on first load and init, whether to look for a different directory on GCS than present getwd()
loaddir: /Users/mark/the/folder/on/local
## regex to only save these files to GCS
pattern:
An advantage of using R on a GCE instance is that you can reuse the authentication used to launch the VM for other cloud services, via googleAuthR::gar_gce_auth(), so you don’t need to supply your own auth file.
To use this, the VM needs to be supplied with a bucket name environment argument. Using a separate bucket means the same files can be transferred across Docker RStudio stop/starts and VMs. The bucket name is set in the metadata of the instance running Docker, which gets copied over to an environment argument R can see.