By Thomas J. Leeper
Political scientists increasingly use data visualizations to communicate statistical results. These visualizations make those results clear and intuitive, and perhaps beautiful. Yet the quality of the data story is sometimes lost in pixelated, low-resolution images. While the visualization itself might be strong, the physical rendering of that visualization is less than ideal. Though we are trained to make statistical graphics, the instructions we receive from publishers about how to make those graphics look their best in published form can be vague or confusing. This post discusses the formatting requirements for image files in journal publishing, points out some examples of high- and low-resolution graphics in a recent journal issue, and describes how to produce high-resolution image files from R, Stata, and Excel (possibly with the help of one or more simple add-on utilities).
File Formats and Resolution
Journal publishers have reached an industry consensus on how image files should be formatted for academic journals. For example, Cambridge, Elsevier, Oxford, SAGE, Taylor and Francis, and Wiley agree that all image files should be in one of three formats: TIFF (Tagged Image File Format), EPS (Encapsulated PostScript), or PDF (Portable Document Format). Other formats are generally discouraged or not allowed, often because they rely on lossy image compression (as JPEG does).
Two of the recommended file formats (EPS and PDF) are vector graphics formats, which store an image as a series of lines of specified shape, color, and size between relative points in a space. Thus the image is rendered by drawing each of those vectors from the instructions encoded in the file. Vector graphics look sharp at any size because the vectors are simply drawn larger or smaller, depending on the requested size of the image. This differs from raster graphics, like TIFF, BMP, JPEG, and GIF. Unlike vector graphics, raster graphics are stored as a matrix of colored pixels.
Due to the scalability of vector graphics, they are the logical choice for producing high resolution graphics in academic publishing, and R, Stata, and Excel all produce these formats. When a vector graphic format is not available or for some reason not accepted, graphics need to be saved as high-resolution TIFF files of an appropriate size in order to avoid the grainy or pixelated images sometimes found in journals.
Academic publishers have a clear and shared standard for raster image resolution: a minimum 600 dots per inch (DPI) for most images, and 300 DPI for “halftone” figures, such as multicolor photographs (rare in political science journals). Many computer applications output image files at a default DPI far below this, often 72-96 DPI, which is the standard resolution for internet graphics. Thus, the default file settings for saving to TIFF are almost never appropriate for academic publishing.
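To make the gap concrete, here is a minimal sketch of the arithmetic linking pixels, DPI, and printed size. The helper name print_width_in is made up for illustration, and the 480-pixel figure assumes the default width of R's tiff() device as given in its documentation.

```r
# Width (in inches) at which an image of a given pixel width prints
# when rendered at a given resolution (dots per inch).
print_width_in <- function(pixels, dpi) {
  pixels / dpi
}

# Assumption: 480 pixels is the default width of R's tiff() device.
default_px <- 480

print_width_in(default_px, dpi = 96)   # 5 inches at a typical screen resolution
print_width_in(default_px, dpi = 600)  # 0.8 inches at journal resolution
```

An image that looks comfortably large on screen is therefore less than an inch wide once it is held to the 600 DPI standard.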
Exporting Graphics from R
R offers drivers for all three of the required file formats in the pdf(), postscript(), and tiff() functions. In RGui, one can also save images to any of these formats (or other formats) using the File > Save as menu on the graphics device window.
For LaTeX users, the PDF format is probably the most familiar due to its easy incorporation into .tex files. Indeed, outputting an R plot to PDF is simple. For all of the examples below, we’ll consider the simple case of plotting a scatterplot. To produce a PDF, one simply needs to precede any plotting function(s) with a call to pdf(), with an appropriate filename specified, and then close the file with dev.off():
pdf("image.pdf")
plot(x, y)
dev.off()
Because PDF is a vector format, one need not specify height and width arguments in order for the image to have desirable resolution. That said, the plot as shown in the R graphics window will not map directly onto the plot saved to file. Specifying a larger image size (with height and/or width, both measured in inches) will yield a PDF with relatively smaller text and more space between points. These arguments can also be used to control the aspect ratio (i.e., to produce a wider or taller rectangular plotting region).
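As a rough illustration of how page size affects the result, the sketch below writes the same scatterplot to two PDFs of different dimensions. The data are simulated and the files go to temporary locations; both are stand-ins for your own data and filenames.

```r
# Simulated data standing in for a real x and y.
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)

small <- tempfile(fileext = ".pdf")
large <- tempfile(fileext = ".pdf")

# A 3 x 3 inch page: text and points fill much of the figure.
pdf(small, width = 3, height = 3)
plot(x, y)
invisible(dev.off())

# A 10 x 5 inch page: same point size for text, so it looks
# relatively smaller, and the plotting region is twice as wide as tall.
pdf(large, width = 10, height = 5)
plot(x, y)
invisible(dev.off())
```

Opening the two files side by side shows why it is worth setting the dimensions to the size at which the figure will actually be printed.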
The other approved vector format is EPS, which we can produce by replicating the above commands almost verbatim, replacing pdf() with postscript(). The R documentation recommends, however, that you first call setEPS() in order to instruct the postscript() device to render a single-page image (i.e., postscript() can produce multi-page files and using setEPS() restricts the result to a single page). We can see this code at work below:
setEPS()
postscript("image.eps")
plot(x, y)
dev.off()
Another strategy is to use the Cairo device (cairo_ps()) without the call to setEPS():
cairo_ps("image.eps")
plot(x, y)
dev.off()
Again, because EPS is a vector format, one need not specify a resolution, but both postscript() and cairo_ps() support height and width arguments for controlling aspect ratio.
Producing TIFF images with the tiff device (tiff()) is similar, but here we must explicitly specify a resolution (with the res argument), as well as height and width, in order to yield an appropriately high-resolution image at the intended output size (and for height and width to be interpretable, one should also specify units). To save space, these files can be compressed using lossless compression (i.e., with no damage to the resulting image). The preferred compression algorithm is called LZW, which can be specified with the compression argument to tiff().
Here’s an example:
tiff("image.tif", res=600, compression = "lzw", height=5, width=5, units="in")
plot(x, y)
dev.off()
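To see what LZW compression buys, one can compare file sizes with and without it. The following is only a sketch: tiff_size is a hypothetical helper, the data are simulated, and the comparison is skipped if your R build lacks TIFF support.

```r
# Simulated data standing in for a real x and y.
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)

# Write a 3 x 3 inch, 600 DPI TIFF to a temporary file using the
# requested compression, and return the file size in bytes.
tiff_size <- function(compression) {
  f <- tempfile(fileext = ".tif")
  on.exit(unlink(f))
  tiff(f, res = 600, compression = compression,
       height = 3, width = 3, units = "in")
  plot(x, y)
  dev.off()
  file.size(f)
}

if (capabilities("tiff")) {
  sizes <- c(none = tiff_size("none"), lzw = tiff_size("lzw"))
  print(sizes)
} else {
  # No TIFF device in this build of R; nothing to compare.
  sizes <- c(none = NA_real_, lzw = NA_real_)
  message("The tiff() device is not available in this R installation.")
}
```

Because the compression is lossless, the two files contain identical pixels; only the size on disk differs.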
An alternative workflow to the above methods is to use dev.print() to send a currently open plot window to an image file:
plot(x, y)
dev.print(pdf, "image.pdf")
dev.print(cairo_ps, "image.eps")
dev.print(tiff, "image.tiff", res=600, height=5, width=5, units="in")
This can be especially useful for saving to multiple formats (as above) or for separately saving layers of a multi-layer plot (e.g., to use in a presentation where parts of the figure are revealed sequentially). This is the same general approach used by the ggplot2 graphics package, which offers a ggsave() function that is called after the plotting call. For TIFF output, the dpi argument should also be specified to produce an appropriately high resolution. Here are some examples:
library(ggplot2)
qplot(x, y)
ggsave("image.pdf", height=5, width=5, units='in')
ggsave("image.eps", height=5, width=5, units='in')
ggsave("image.tiff", height=5, width=5, units='in', dpi=600)
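Since dev.print() copies from an open screen device, it is mainly suited to interactive sessions. For scripts run non-interactively, one possible pattern is to wrap the plotting code in a function and replay it on each file device. The sketch below assumes base graphics; save_figure and its arguments are names invented here, not part of R.

```r
# Replay the same plotting code on PDF, EPS, and (if available) TIFF devices.
# `draw` is a function with no arguments that produces the plot.
save_figure <- function(draw, basename, width = 5, height = 5, res = 600) {
  files <- c(pdf = paste0(basename, ".pdf"),
             eps = paste0(basename, ".eps"))

  pdf(files[["pdf"]], width = width, height = height)
  draw()
  dev.off()

  setEPS()  # single-page EPS settings for postscript()
  postscript(files[["eps"]], width = width, height = height)
  draw()
  dev.off()

  # TIFF support depends on how R was built, so check first.
  if (capabilities("tiff")) {
    files["tiff"] <- paste0(basename, ".tif")
    tiff(files[["tiff"]], res = res, compression = "lzw",
         width = width, height = height, units = "in")
    draw()
    dev.off()
  }

  invisible(files)
}

# Usage with simulated data, writing into the session's temporary directory.
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)
out <- save_figure(function() plot(x, y), file.path(tempdir(), "image"))
```

Keeping the plotting code in one function means every format is drawn from exactly the same instructions.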
Exporting Graphics from Stata
Saving graphics interactively in Stata is straightforward. From the graphics window, one can simply press the save icon or select File > Save and be presented with a simple menu to save the file. PDF, EPS, and TIFF formats are all available by default. Plots can also be saved from the console using graph export after a plotting command:
graph twoway scatter x y
graph export image.pdf
graph export image.eps
graph export image.tif
Using the menu or icon, the TIFF format prints only at 96 DPI by default. Thus we should turn to the console and use the width option in our call to graph export. Unfortunately, it is not possible to specify a resolution directly, so we need to do some hackery. You need to calculate the width of the graph (in pixels) necessary to produce the desired resolution at final printed size. In a full-page two-column journal, a one-column width is about 3 inches and two-column width is about 6.5 inches, whereas in a small-format journal, a one-column width is about 4.25 inches. Thus, if we want 600 DPI, we would need pixel widths of 1800, 3900, or 2550, respectively:
graph twoway scatter x y
graph export image1.tif, width(1800)
graph export image2.tif, width(3900)
graph export image3.tif, width(2550)
The result of each of these is, however, a very large image with 96 DPI resolution (i.e., Stata simply produces a larger image with the same low resolution). Of course, because size and resolution are related, the images are visually equivalent, but you will need to modify the file with another utility to actually see the 600 DPI resolution at the intended output size.
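The pixel widths used above are just the printed width in inches multiplied by the target resolution. As a small sketch (written in R purely as a calculator; required_px is a made-up helper name), the calculation and the matching graph export commands can be generated like this:

```r
# Pixel width needed to print at `width_in` inches and `dpi` dots per inch.
required_px <- function(width_in, dpi = 600) {
  round(width_in * dpi)
}

# Approximate column widths (in inches) mentioned in the text.
widths_in <- c(one_column = 3, two_column = 6.5, small_format = 4.25)

px <- required_px(widths_in)
px  # 1800, 3900, and 2550 pixels

# Build the corresponding Stata commands as text (filenames are illustrative).
sprintf("graph export %s.tif, width(%.0f)", names(px), px)
```

Changing the dpi argument (e.g., to 300 for halftone images) gives the widths for other resolution requirements.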
Exporting Graphics from Excel
Versions of Microsoft Excel prior to 2007 allowed users to directly save charts as image files, but this is no longer possible. Instead, one needs to use the File > Save As menu to output a chart to PDF (the only built-in file format for printing charts). To do this, select the chart (it need not be on its own tab), then follow File > Save As. In the pop-up menu, select PDF from the Save as type drop-down menu and specify an appropriate filename. Clicking the Options... button will open another small window that allows one to confirm that only the Selected chart will be output to PDF. Attempting to save a chart (as an object in a spreadsheet tab) without first selecting the chart will cause the Options... pop-up to display a different set of options, none of which include Selected chart. Another method is to move the chart to its own sheet, then under File > Save As, select Options..., and choose Active sheet(s) to save just the chart to a PDF file. If, for some reason, a publisher will not accept a PDF file, one can use any of the options described in the next section for converting the PDF to TIFF.
Image Utilities
With our files exported from R, Stata, Excel, or another software application, we may still need to make changes. For example, because Excel can only output PDF, we may need to convert our image to TIFF; or, because Stata can only output TIFF at 96 DPI, we may want to rescale the image to the appropriate size and resolution. Most publishers recommend using Adobe Photoshop to do this. Unfortunately, Adobe Photoshop is proprietary, expensive, and may not be readily accessible. Luckily several free, open-source, and easy-to-use alternatives exist.
For the most direct analogue to Photoshop, one should try GIMP (GNU Image Manipulation Program). For command line manipulation of images, GhostScript works well for PostScript (EPS) and PDF formats and ImageMagick offers diverse functionality for manipulating almost all image formats in addition to EPS and PDF. All of these programs should work on all modern operating systems.
GIMP: GNU Image Manipulation Program
GIMP allows you to easily adjust the resolution of images as well as save images into other file formats. For example, to convert a Stata graph saved at 96 DPI to 600 DPI, we can open GIMP and choose File > Open to select and import the TIFF file. With the TIFF open, we can choose Image > Print Size... and a small window will open describing the size and resolution of the file. From that menu, changing the resolution from its default to 600 makes GIMP adjust the image size accordingly.
We can then save the file using File > Export..., specifying a filename, and selecting TIFF image from the file format drop-down. (We could also save the file in any other format.) If saving to TIFF, a small pop-up window will offer the option to choose file compression such as LZW. The new file will have the intended dimensions and resolution. GIMP is a fully featured image manipulation program, so it can also be helpful for converting color to grayscale, cropping, etc.
Because GIMP can read almost any image format, if we have files in other formats, we can easily open them in GIMP and export them to any of the supported formats (e.g., to convert a TIFF to a PDF or vice versa). These features are straightforward and simply require opening the input file and using File > Export... to save in an appropriate output format. Note, however, that GIMP does not, by default, support EPS format. To convert to or from EPS, we need to have the GhostScript command line utility installed first. Then, when opening an EPS or PDF in GIMP, you can specify the size and resolution at which to render the vector image (because GIMP will coerce the vector graphics to a raster).
Command-line utilities: GhostScript and ImageMagick
File conversion operations can also be performed on the command line. To convert PDF or EPS to TIFF, one can use GhostScript, a command-line utility for working with PDF and PostScript files. To use GhostScript, it must be installed and its directory must be on the system path. On Windows, you will probably have to manually add GhostScript to the system path after it is installed. You can check your path by opening Command Prompt (run it as an administrator) and typing:
echo %PATH%
That will output a long string of semicolon-delimited directories that point to particular applications. The directory for GhostScript should be among them. For a current version of GhostScript on Windows 7, this directory is listed as: C:/Program Files/gs/gs9.10/bin. If this (or a similar directory) is not listed in the path, one can add it for the current Command Prompt session by typing:
set PATH=%PATH%;C:\Program Files\gs\gs9.10\bin
You can also update the Windows system path permanently by visiting Control Panel > System, clicking on Advanced System Settings, and pressing the Environment Variables... button. This will open a small pop-up window where you can edit the PATH variable for your user account and the Path variable for all users on the machine. You can simply select the variable, press the Edit... button, and paste the path to GhostScript (preceded by a semicolon) at the end of the current path.
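If R is already open, it can also be used to check whether these utilities are visible on the path. This is only a quick sketch: tools::find_gs_cmd() searches for a GhostScript executable under its usual names, and Sys.which() looks up an arbitrary program name; an empty string means nothing was found.

```r
# Look for a GhostScript executable (gs, or the gswin* names on Windows).
gs_path <- tools::find_gs_cmd()

if (nzchar(gs_path)) {
  message("GhostScript found at: ", gs_path)
} else {
  message("GhostScript not found; check the PATH as described above.")
}

# The same kind of lookup for ImageMagick's convert program.
# Caution: on Windows this may instead find an unrelated system tool
# that is also named convert.
convert_path <- Sys.which("convert")
```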
With GhostScript on the path, we can reopen Command Prompt (or Terminal, on a UNIX-alike) and navigate to the directory containing our images. Let’s say we want to create image.tif from image.pdf. We can simply type the following:
gs -dNOPAUSE -r600 -sDEVICE=tiffg4 -sOutputFile=image.tif image.pdf -dBATCH
The result is a high resolution TIFF file called image.tif created in our working directory. Here’s a breakdown of the command:
- gs refers to GhostScript. (On Windows, gs may instead need to be gswin64 or gswin32, or their console versions gswin64c and gswin32c, referring to the name of the actual GhostScript application.)
- -dNOPAUSE keeps GhostScript from pausing and waiting for a keypress after each page
- -r600 requests a 600 DPI resolution
- -sDEVICE=tiffg4 says what image device to use (in our case, one for monochrome TIFF files, though if working with color graphics, another device might be appropriate such as tiff24nc for 24-bit color or tiff12nc for 12-bit color)
- -sOutputFile=image.tif names the file to be created
- -dBATCH makes GhostScript exit once everything is done, rather than staying at its interactive prompt.
Replacing image.pdf with image.eps, we can do the same conversion from EPS to TIFF:
gs -dNOPAUSE -r600 -sDEVICE=tiffg4 -sOutputFile=image.tif image.eps -dBATCH
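For those who prefer to stay inside R, the same conversion can be scripted. The following is a sketch rather than a finished tool: gs_tiff_args is a hypothetical helper that assembles the GhostScript options discussed above (plus -dNOPAUSE, which suppresses the per-page prompt), and the command is only run if GhostScript and the input file are actually present.

```r
# Assemble GhostScript arguments for converting a PDF or EPS file to TIFF.
gs_tiff_args <- function(input, output, res = 600, device = "tiffg4") {
  c("-dNOPAUSE", "-dBATCH",          # no per-page prompt; exit when done
    paste0("-r", res),               # resolution in DPI
    paste0("-sDEVICE=", device),     # e.g., "tiff24nc" for 24-bit color
    paste0("-sOutputFile=", output),
    input)
}

args <- gs_tiff_args("image.pdf", "image.tif")

gs <- tools::find_gs_cmd()
if (nzchar(gs) && file.exists("image.pdf")) {
  # shQuote() protects filenames that contain spaces.
  system2(gs, shQuote(args))
} else {
  # Nothing to convert here, so just show the command that would be run.
  cat("Would run: gs", args, "\n")
}
```

Wrapping the call in a loop over several filenames is then a straightforward way to convert a whole folder of figures.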
Thus both GIMP and GhostScript can convert between relevant formats. By installing both GhostScript and ImageMagick, you should also be able to do the conversion even more easily. To use ImageMagick, it must also be on the system path (and you can follow the above directions to ensure it is available on the path).
The following ImageMagick commands do the same job as the GhostScript commands above, converting PDF to TIFF and EPS to TIFF, respectively (in ImageMagick 7 and later, the program is invoked as magick rather than convert):
convert -density 600 image.pdf image.tif
convert -density 600 image.eps image.tif
We can also reverse the process to turn an EPS or TIFF into a PDF:
convert -density 600 image.eps image.pdf
convert -density 600 image.tif image.pdf
Note that this final conversion will not improve the resolution of a TIFF or convert it to a vector image. Instead it will simply encode the original raster into a PDF file. ImageMagick can also be used to perform a large number of other image manipulations.