By Jacob Montgomery and Ryan T. Moore
This post is co-authored by Jacob Montgomery of Washington University in St. Louis and Ryan T. Moore of American University.
This post summarizes our full TPM article, available at this link.
Political methodologists increasingly develop complex computer code for data processing, statistical analysis, and data visualization — code that is intended for eventual distribution to collaborators and readers, and for storage in replication archives. This code can involve multiple functions stored in many files, which can be difficult for others to read, use, or modify.
For researchers working in R, creating a package is an attractive option for organizing and distributing complex code. A basic R package consists of a set of functions, documentation, and some metadata. Other components, such as datasets, demos, or compiled code, may also be included. Turning all of this into a formal R package makes it easy to distribute it to other scholars either via the Comprehensive R Archive Network (CRAN) or simply as a compressed folder.
However, transforming R code into a package can be a tedious process requiring the generation and organization of files, metadata, and other information in a manner that conforms to R package standards. It can be particularly difficult for users less experienced with R’s technical underpinnings.
Here, we discuss two packages designed to streamline the package development process — devtools and roxygen2.
Building an example package: squaresPack
Readers unfamiliar with the basic structure of an R package may wish to consult our full article. Here, we build a toy package called squaresPack using the code stored here.
R package development requires building a directory of files that include the R code, documentation, and two specific files containing required metadata. (The canonical source for information on package development for R is the extensive and sometimes daunting document, Writing R Extensions.)
As an example, imagine that we wish to create a simple package containing only the following two functions.
## Function 1: Sum of squares
addSquares <- function(x, y){
  return(list(square=(x^2 + y^2), x = x, y = y))
}
## Function 2: Difference of squares
subtractSquares <- function(x, y){
  return(list(square=(x^2 - y^2), x = x, y = y))
}
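As a quick illustration of what these two functions return, here is a minimal sketch that repeats the definitions so it runs on its own. The input vectors myX and myY are the ones used in the help-file example later in this post; the names sums and diffs are ours, introduced only for illustration.

```r
## Definitions repeated from the text so the snippet is self-contained
addSquares <- function(x, y){
  return(list(square=(x^2 + y^2), x = x, y = y))
}
subtractSquares <- function(x, y){
  return(list(square=(x^2 - y^2), x = x, y = y))
}

myX <- c(20, 3)
myY <- c(-2, 4.1)

sums  <- addSquares(myX, myY)       # list with elements square, x, y
diffs <- subtractSquares(myX, myY)

sums$square    # element-wise x^2 + y^2: 404 and 25.81
diffs$square   # element-wise x^2 - y^2: 396 and -7.81
```

Both functions work element by element, so x and y should have the same length (or be recyclable against one another).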

Here is an example of how the directory for a simple package should be structured.
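As a rough sketch of that layout, the following base R code creates an empty skeleton with the files discussed in this post. It builds the skeleton inside a temporary directory (our assumption, so that nothing is written to your working directory), the object name pkg_root is ours, and the file subtractSquares.Rd is assumed by analogy with addSquares.Rd.

```r
## Build an empty squaresPack skeleton in a temporary location
pkg_root <- file.path(tempdir(), "squaresPack")
dir.create(file.path(pkg_root, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_root, "man"), showWarnings = FALSE)

## Two metadata files at the top level
file.create(file.path(pkg_root, c("DESCRIPTION", "NAMESPACE")))

## One source file and one help file per function
file.create(file.path(pkg_root, "R", c("addSquares.R", "subtractSquares.R")))
file.create(file.path(pkg_root, "man", c("addSquares.Rd", "subtractSquares.Rd")))

## Show the resulting layout
list.files(pkg_root, recursive = TRUE)
```

The files are empty placeholders; the sections that follow describe what each one should contain.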
First, we store all R source code in the subdirectory R. Second, corresponding documentation should accompany all functions that users can call. This documentation is stored in the subdirectory labeled man. As an example, the file addSquares.Rd would be laid out as follows.
\name{addSquares}
\alias{addSquares}
\title{Adding squared values}
\usage{
addSquares(x, y)
}
\arguments{
\item{x}{A numeric object.}
\item{y}{A numeric object with the same dimensionality as \code{x}.}
}
\value{
A list with the elements
\item{square}{The sum of the squared values.}
\item{x}{The first object input.}
\item{y}{The second object input.}
}
\description{
Finds the sum of squared numbers.
}
\note{
This is a very simple function.
}
\examples{
myX <- c(20, 3); myY <- c(-2, 4.1)
addSquares(myX, myY)
}
\author{
Jacob M. Montgomery
}
Third, the directory must contain a file named DESCRIPTION that documents the directory in a specific way. The DESCRIPTION file contains basic information including the package name, the formal title, the current version number, the date for the version release, and the name of the author and maintainer. Here we also specify any dependencies on other R packages and list the files in the R subdirectory.
Package: squaresPack
Title: Adding and subtracting squared values
Version: 0.1
Author: Jacob M. Montgomery and Ryan T. Moore
Maintainer: Ryan T. Moore
Description: Find sum and difference of squared values
Depends: R (>= 3.1.0)
License: GPL (>= 2)
Collate:
    'addSquares.R'
    'subtractSquares.R'
Finally, the NAMESPACE file is a list of commands that are run by R when the package is loaded to make the R functions, classes, and methods defined in the package visible to R and the user. This is a much more cumbersome process when class structures and methods must be declared, as we discuss briefly below. For the present example, the NAMESPACE file is quite simple, telling R to allow the user to call our two functions.
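For a package that only exports two functions, a NAMESPACE file along the following lines should suffice. This is a sketch rather than a copy of the file from the full article: it writes the two export() directives to a temporary file (the temporary location and the object names are our assumptions) and prints them back.

```r
## The two directives that make the functions callable by users
namespace_lines <- c("export(addSquares)",
                     "export(subtractSquares)")

## Write them to a NAMESPACE file in a temporary directory and display it
namespace_path <- file.path(tempdir(), "NAMESPACE")
writeLines(namespace_lines, namespace_path)
cat(readLines(namespace_path), sep = "\n")
```

In a real package the file would sit at the top level of the package directory, next to DESCRIPTION.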
Once all of that is set up, however, several steps remain. A minimal checklist for updating a package and submitting it to CRAN might look like the following:
- Edit DESCRIPTION file
- Change R code and/or data files.
- Edit NAMESPACE file
- Update man files
- R CMD build --resave-data=no pkg
- R CMD check pkg
- R CMD INSTALL pkg
- Build Windows version to ensure compliance by submitting to: http://win-builder.r-project.org/
- Upload to CRAN (Terminal below, or use other FTP client):
> ftp cran.r-project.org
> cd incoming
> put pkg_0.1-1.tar.gz
- Email R-core team: cran@r-project.org
We have been part of writing four R packages over the course of the last six years. In order to keep track of all the manual updating steps, one of us created a 17-point checklist outlining the steps required each time a package is edited, and we expect that most authors will welcome some automation. The packages devtools and roxygen2 promise to improve upon this hands-on maintenance and allow authors to focus more on improving the functionality and documentation of their package rather than on bookkeeping.
Building with devtools and roxygen2
The devtools approach streamlines several steps: it creates and updates appropriate documentation files; it eliminates the need to leave R to build and check the package from the terminal prompt; and it submits the package to win-builder and CRAN and emails the R-core team from within R itself. After the initial directory structure is created, the only files that are edited directly by the author are contained in the R directory (with one exception — the DESCRIPTION file should be reviewed before the package is released). This is possible because devtools automates the writing of the help files, the NAMESPACE file, and updating of the DESCRIPTION file relying on information placed directly in *.R files.
We provide some examples below.
There are several advantages to developing code with devtools, but the main benefit is improved workflow. For instance, adding a new function to the package using more manual methods means creating the code in a *.R file stored in the R subdirectory, specifying the attendant documentation as a *.Rd file in the man subdirectory, and then updating the DESCRIPTION and NAMESPACE files. In contrast, developing new functions with devtools requires only editing a single *.R file, wherein the function and its documentation are written simultaneously. devtools then updates the documentation and package metadata with no further attention.
Thus, one key advantage of using devtools to develop a package is that the R files will themselves contain the information for generating help files and updating metadata files. Each function is accompanied by detailed comments that are parsed and used to update the other files. As an example, here we show the addSquares.R file as it should be written to create the same help files and NAMESPACE files shown above.
#' Adding squared values
#'
#' Finds the sum of squared numbers.
#'
#' @param x A numeric object.
#' @param y A numeric object with the same dimensionality as \code{x}.
#'
#' @return A list with the elements
#' \item{square}{The sum of the squared values.}
#' \item{x}{The first object input.}
#' \item{y}{The second object input.}
#' @author Jacob M. Montgomery
#' @note This is a very simple function.
#' @examples
#'
#' myX <- c(20, 3)
#' myY <- c(-2, 4.1)
#' addSquares(myX, myY)
#' @rdname addSquares
#' @export
addSquares <- function(x, y){
  return(list(square=(x^2 + y^2), x = x, y = y))
}
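By analogy, the companion file subtractSquares.R could be annotated in the same way. The sketch below is our own illustration rather than the file from the full article: the title, description, and example wording are assumptions, and the @seealso line is added only to show how a cross-reference to the other help page might be declared.

```r
#' Subtracting squared values
#'
#' Finds the difference of squared numbers.
#'
#' @param x A numeric object.
#' @param y A numeric object with the same dimensionality as \code{x}.
#'
#' @return A list with the elements
#' \item{square}{The difference of the squared values.}
#' \item{x}{The first object input.}
#' \item{y}{The second object input.}
#' @examples
#' subtractSquares(c(20, 3), c(-2, 4.1))
#' @seealso \code{\link{addSquares}}
#' @rdname subtractSquares
#' @export
subtractSquares <- function(x, y){
  return(list(square=(x^2 - y^2), x = x, y = y))
}
```

Because the comment lines all begin with #', R itself ignores them when the file is sourced; they only matter when the documentation is generated.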
The text following the #' symbols is processed by roxygen2 during package creation to make the *.Rd and NAMESPACE files. The @param, @return, @author, @note, @examples, and @seealso commands specify the corresponding block in the help file. The @rdname block overrides the default setting to specify the name of the associated help file, and @export instructs roxygen2 to add the necessary commands to the NAMESPACE file. We now walk through the steps required to initialize and maintain a package with devtools.
Setting up the package
Creating an R package from these augmented *.R files is straightforward. First, we must create the basic directory structure using
library(devtools)
setwd("~/Desktop/MyPackage/") ## Set the working directory
create("squaresPack")
Second, we edit the DESCRIPTION file to make sure it contains the correct version, package name, dependencies, licensing, and authorship of the package. The create() call will produce a template for you to fill in. The author will need to add something like
Author: Me
Maintainer: Me@myemail.edu
to this template DESCRIPTION file. You need not keep track of the various R files to be collated; devtools will automatically collate all R files contained in the various subdirectories. Third, place the relevant R scripts in the R directory. Finally, making sure that the working directory is correctly set, we can create and document the package using three commands:
current.code <- as.package("squaresPack")
load_all(current.code)
document(current.code)
The as.package() command will load the package and create an object representation (current.code) of the entire package in the user’s workspace. The load_all() command will load all of the R files from the package into the user’s workspace as if the package were already installed. The document() command will create the required documentation files for each function and the package, as well as update the NAMESPACE and DESCRIPTION files.
Sharing the package
Once all of this is in place, the author prepares the package for wider release from within R itself. To build the package as a compressed file in your working directory, run build(current.code, path=getwd()). The analogous build_win() command will upload your package to the win-builder website. Your package will be built in a Windows environment and an email will be sent to the address of the maintainer in the DESCRIPTION file with results in about thirty minutes. Both of these compressed files can be uploaded onto websites, sent by email, or stored in replication archives. Other users can simply download the package and install it locally.
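For a colleague who receives the compressed source file, local installation might look like the sketch below. The file name squaresPack_0.1.tar.gz is an assumption based on the package name and version in the DESCRIPTION file above, and the code only attempts the installation if that file is present in the working directory.

```r
## Assumed file name: package name, underscore, version from DESCRIPTION
tarball <- "squaresPack_0.1.tar.gz"

if (file.exists(tarball)) {
  ## repos = NULL tells R to install from the local file, not from CRAN
  install.packages(tarball, repos = NULL, type = "source")
  library(squaresPack)
  addSquares(2, 3)$square
} else {
  message("No file named ", tarball, " in ", getwd())
}
```

Installing from source in this way works for packages containing only R code; packages with compiled code additionally require the appropriate build tools on the user's machine.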
The list below provides a minimal checklist for editing and submitting an existing R package using devtools.
- Edit R code and/or data files
- Run as.package(), load_all(), and document()
- Check the code: check(current.code)
- Make a Windows build: build_win(current.code)
- Double-check the DESCRIPTION file
- Submit the package to CRAN: release(current.code, check=FALSE)
The check() command is analogous to the R CMD check from the terminal, but it also (re)builds the package. Assuming that the package passes all of the required checks, it is now ready for submission to CRAN. As a final precaution, we recommend taking a moment to visually inspect the DESCRIPTION file one last time to ensure that it contains the correct email address for the maintainer and the correct release version. Finally, the release() command will submit the package via FTP and open up the required email using the computer’s default email client.
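The routine part of this checklist can be collected into a small helper. The function refresh_package() below is hypothetical (it is not part of devtools) and is only a sketch: it assumes devtools is installed and simply calls as.package(), load_all(), document(), and check() in the order described above, leaving the Windows build and the CRAN release as deliberate manual steps.

```r
## Hypothetical convenience wrapper; not a devtools function
refresh_package <- function(path = "squaresPack") {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("This sketch assumes the devtools package is installed.")
  }
  current.code <- devtools::as.package(path)  # object representation of the package
  devtools::load_all(current.code)            # load the functions as if installed
  devtools::document(current.code)            # regenerate help files and NAMESPACE
  devtools::check(current.code)               # rebuild and run R CMD check
  invisible(current.code)
}

## Example call, with the working directory set to the folder
## that contains squaresPack:
## current.code <- refresh_package("squaresPack")
```

Keeping the release step out of the helper leaves room for the final visual inspection of the DESCRIPTION file recommended above.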
Conclusion
We have outlined the components of a simple R package and two approaches for developing and maintaining such packages. In particular, we illustrated how the devtools package can aid package authors in package maintenance by automating several steps of the process. The package allows authors to focus on only editing *.R files since both documentation and metadata files are updated automatically. The package also automates several steps such as submission to CRAN via FTP.
While we believe that the devtools approach to creating and managing R packages offers several advantages, there are potential drawbacks. We routinely use other excellent packages by Hadley Wickham, such as reshape, plyr, lubridate, and ggplot2. On one hand, each of them offers automation that greatly speeds up complex processes such as attractively displaying high-dimensional data. On the other hand, it can take time to learn a new syntax for old tricks (like specifying x and y limits for a plot). Such frustrations may make package writers hesitant to give up the full control of a more manual maintenance system. By making one’s R code conform to the requirements of the devtools workflow, one loses some degree of flexibility.
Yet, devtools makes it simpler to execute the required steps efficiently. It promises to smoothly integrate package development and checks, cut out the need to switch between R and the command line, and greatly reduce the number of files and directories that must be manually edited. Moreover, the latest release of the package contains many further refinements. It is possible, for instance, to build packages directly from GitHub repositories, create vignettes, and create clean environments for code development. Thus, while developing R packages and code in a manner consistent with devtools does require re-learning some basic techniques, we believe that it comes with significant advantages for speeding up development while reducing the degree of frustration commonly associated with transforming a batch of code into a package.