An Open Collection of Political Science Research with OLS Models and Cross-Sectional Data.
Department of Politics, University of Virginia
Finding accessible and interesting examples and material for exercises is a common teaching challenge in political science, especially in research methods classes. Introductory courses in political science research methods often cover ordinary least squares (OLS) models in cross-sectional data. Locating examples from published research to illustrate linear regression and give students practice with replicating and assessing published findings is difficult, however. While OLS analyses of cross-sectional data are common in older papers, many of these papers do not have public replication data. Recent publications with accessible replication data rarely use linear regression in cross-sectional data. They instead employ maximum likelihood estimation, panel data, or other techniques outside the scope of introductory courses. Therefore, locating suitable papers and replication data with cross-sectional data and an OLS estimator is difficult even for faculty and is doubly challenging for students.
To provide material for accessible examples and replication exercises in undergraduate and introductory graduate research methods courses, I created an open repository of published research in political science with cross-sectional data and an OLS model. The collection reduces the cost of searching for published work that matches the material in many introductory methods classes. As such, this resource helps teachers and students spend less time searching for suitable examples and replication data, leaving more time for teaching and learning research methods.
Replications, Extensions and Examples.
Teachers and students of political science can use this data collection for any project they imagine, but I anticipate two core uses. First, the collection facilitates assignments to replicate and diagnose published work. Because examining cross-sectional OLS models helps students develop statistical knowledge and model-checking skills, replication assignments are a common part of introductory undergraduate and graduate courses. One recent textbook guides students through replicating and diagnosing published research (Li 2018).
In a replication assignment, students read a paper, load the associated dataset into their preferred software and reproduce the estimates and inferences from the paper. After replicating the estimates, students can then diagnose the regression. Common OLS diagnostics include tests for influential observations, heteroscedasticity and other violations of the Gauss-Markov assumptions. If the students find that the model violates an assumption, they can then correct the model and see if the results change. Reproducing and diagnosing results can be done with any software package, once students identify the correct variable labels.
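The estimate-then-diagnose workflow can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended implementation. It assumes a single predictor, uses synthetic data instead of any paper in the collection, and relies only on the standard library so that it runs without extra packages. The function names are hypothetical. In practice, students would use the regression and diagnostic routines in their own software.

```python
import random
from statistics import fmean


def ols_bivariate(x, y):
    """Fit y = intercept + slope * x by OLS.

    Returns the intercept, slope, residuals, and leverage (hat) values.
    """
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return intercept, slope, residuals, leverage


def cooks_distance(residuals, leverage, n_params=2):
    """Cook's distance for each observation (influence diagnostic)."""
    n = len(residuals)
    s2 = sum(e ** 2 for e in residuals) / (n - n_params)
    return [
        (e ** 2 / (n_params * s2)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]


def breusch_pagan_lm(x, residuals):
    """Studentized Breusch-Pagan statistic: n * R^2 from regressing
    the squared residuals on x. Large values suggest heteroscedasticity."""
    squared = [e ** 2 for e in residuals]
    _, _, aux_residuals, _ = ols_bivariate(x, squared)
    mean_sq = fmean(squared)
    total_ss = sum((v - mean_sq) ** 2 for v in squared)
    resid_ss = sum(r ** 2 for r in aux_residuals)
    return len(x) * (1 - resid_ss / total_ss)


# Synthetic data (an assumption for illustration): the error variance
# grows with x, so the constant-variance assumption is violated.
random.seed(1)
x = [random.uniform(1, 10) for _ in range(100)]
y = [1 + 2 * xi + random.gauss(0, 0.5 * xi) for xi in x]

intercept, slope, residuals, leverage = ols_bivariate(x, y)
cooks = cooks_distance(residuals, leverage)
lm = breusch_pagan_lm(x, residuals)

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
flagged = [i for i, d in enumerate(cooks) if d > 4 / len(x)]
print(f"observations with Cook's distance above 4/n: {flagged}")
print(f"Breusch-Pagan LM = {lm:.2f} (5% critical value, 1 df: 3.84)")
```

The 4/n cutoff for Cook's distance is a common rule of thumb, not a formal test. With one predictor, the Breusch-Pagan statistic is compared with a chi-squared distribution with one degree of freedom.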
Replicating published scholarship gives students opportunities to “practice by doing”, which is essential for learning research methods (Adriaensen et al. 2015). Writing up a replication report reinforces common learning objectives such as interpreting regression estimates and understanding statistical inference. Moreover, replication exercises help students learn that all statistical procedures rely on assumptions, and that violating those assumptions can affect conclusions from the model. When a class emphasizes a deeper understanding of regression models, asking students to diagnose a published model also allows teachers to assess knowledge of the Gauss-Markov assumptions and the consequences of violations for regression estimates.
Reproducing and analyzing published results makes students more informed consumers of political science research. Assessing the validity of published findings encourages students to engage critically with scholarship. In the face of a “replication crisis” in science (Simmons et al. 2011), encouraging students to scrutinize published research and carefully check inferences is worthwhile.
Students can also use materials from the repository to answer their own research questions. Extension projects could include changing model specifications, adding new data, or using a dataset to answer a different question, to give three examples. Because most of these datasets are well-organized, students can implement original projects without wrangling data into a usable form.
The repository also provides a store of materials for teaching examples. As the repository expands, it will give teachers a diverse set of examples to show how regression works in practice. Using examples from published research can demonstrate to students how scholars use quantitative methods to answer important and interesting questions.
Currently, the repository contains eleven papers from international relations and comparative politics. Table 1 summarizes the authors, year of publication and title of each article. While I am confident that there are more papers in comparative and American politics scholarship that fit the collection, finding such examples and replication data is difficult, for some of the reasons I outlined in the introduction. By posting the repository publicly and inviting contributions of papers and data, I hope the collection will expand and diversify. The next section describes how to access the data and references in the collection.
[Table 1 is at the end of this essay]
How to access the repository
The replication collection is freely available online in a GitHub repository. I use GitHub to store the collection because it facilitates transparent management of a public resource for teachers and students of political science. Git and GitHub offer a transparent way to document changes in the repository over time (Jones 2013). Placing the repository on GitHub also allows other scholars to upload suitable publications themselves by opening issues or making pull requests. Finally, this repository is meant to be a public resource for the discipline of political science, and using GitHub makes it fully public.
I divided the repository into folders, with one folder for each paper. All the papers contain at least one OLS model with cross-sectional data, although some employ additional data and estimators. Each folder contains data, reference information for the article in a text file, and code if available. Each reference allows users to find the article for themselves, and the file names use the general form author-reference.txt. Most of the data is in Excel or Stata format, and most code is in Stata do-files. I have also included codebooks and online appendices whenever they are available.
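For teachers who keep a local copy of the collection, a short script can list what each paper folder contains. The sketch below is only illustrative. It assumes reference files end in "-reference.txt", following the general form described above, and it assumes a mapping from file extensions to roles. The folder and file names in the demo are hypothetical and do not correspond to actual entries in the repository.

```python
import tempfile
from pathlib import Path

# Assumed mapping from file extensions to roles; adjust it to match
# the files actually present in a local copy.
ROLE_BY_SUFFIX = {".dta": "data", ".xls": "data", ".xlsx": "data", ".do": "code"}


def classify(path):
    """Label a file as reference, data, code, or other."""
    if path.name.lower().endswith("-reference.txt"):
        return "reference"
    return ROLE_BY_SUFFIX.get(path.suffix.lower(), "other")


def summarize_collection(root):
    """Map each paper folder under root to its files, grouped by role."""
    summary = {}
    for folder in sorted(Path(root).iterdir()):
        # Skip loose files and hidden folders such as .git.
        if not folder.is_dir() or folder.name.startswith("."):
            continue
        roles = {}
        for item in sorted(folder.iterdir()):
            if item.is_file():
                roles.setdefault(classify(item), []).append(item.name)
        summary[folder.name] = roles
    return summary


# Demo on a throwaway directory with hypothetical folder and file names.
with tempfile.TemporaryDirectory() as tmp:
    paper = Path(tmp) / "example-paper"
    paper.mkdir()
    for filename in ("example-reference.txt", "replication.dta", "codebook.pdf"):
        (paper / filename).touch()
    for folder, roles in summarize_collection(tmp).items():
        missing = [r for r in ("data", "code", "reference") if r not in roles]
        print(folder, roles, "missing:", missing)
```

Listing the missing roles makes it easy to see in advance which papers lack replication code, so a teacher can plan extra guidance for those assignments.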
There are two ways to access the contents of the repository. First, teachers and students are free to download, clone, or fork the whole repository. Second, users can download individual code files or datasets from the folder for each paper. Thus, teachers have three options. They can download the whole repository and provide it to students in a different location such as a campus learning management system, give students the data and code from one paper, or direct students to the GitHub repository and ask them to download the materials themselves. After finding and reading a paper, downloading the data and loading it into their software of choice, students can begin replicating and assessing published findings.
The repository has two limitations. First, although GitHub offers a simple and accessible way to store the collection, the interface is more cluttered than a standard website. When teachers ask students to identify and download a paper themselves, an in-class walkthrough could alleviate some of this difficulty. Second, some papers do not have replication code or a codebook, which makes it harder to identify which variables go into each model. These limitations can require individual attention for students, but they are still far less onerous than searching the breadth of political science scholarship.
Finding replication materials for published work with cross-sectional data and an OLS model is difficult. Therefore, I created an online repository of such studies to facilitate teaching examples and replication exercises in introductory research methods and statistics classes. Replication exercises with data from the collection offer research methods students an opportunity to assess their understanding of key concepts. They also give students experience checking estimates from published papers, so they can critically engage with methodology in published research.
In closing, I invite other teachers of political science to use and contribute to the repository. Teaching examples and replication exercises are the most likely uses, but I am sure that teachers will find other productive uses. This replication collection is meant to be a public good. I hope that scholars will use and contribute to the collection to advance teaching research methods in political science.
Acknowledgements: Thanks to Quan Li and Wendi Kaspar for helpful comments.
Adriaensen, Johan, Bart Kerremans, and Koen Slootmaeckers. 2015. “Editors’ Introduction to the Thematic Issue: Mad about Methods? Teaching Research Methods in Political Science.” Journal of Political Science Education 11(1): 1-10.
Appel, Benjamin J. and Cyanne E. Loyle. 2012. “The economic benefits of justice: Post-conflict justice and foreign direct investment.” Journal of Peace Research 49(5): 685-99.
Braithwaite, Alex. 2006. “The Geographic Spread of Militarized Disputes.” Journal of Peace Research 43(5): 507-22.
Fuhrmann, Matthew. 2008. “Exporting Mass Destruction? The Determinants of Dual-Use Trade.” Journal of Peace Research 45(5): 633-652.
Furia, Peter A. and Russel A. Lucas. 2008. “Determinants of Arab Public Opinion on Foreign Relations.” International Studies Quarterly 50: 585-605.
Ghobarah, Hazem Adam, Paul Huth, and Bruce Russett. 2004. “Comparative Public Health: The Political Economy of Human Misery and Well-Being.” International Studies Quarterly 48(1): 73-94.
Goldsmith, Benjamin E. and Yusaku Horiuchi. 2012. “In Search of Soft Power: Does Foreign Public Opinion Matter for US Foreign Policy?” World Politics 64(3): 555-85.
Jones, Zachary M. 2013. “Git/GitHub, Transparency, and Legitimacy in Quantitative Research.” The Political Methodologist 21(1): 6-7.
Kono, Daniel Y. 2006. “Optimal Obfuscation: Democracy and Trade Policy Transparency.” American Political Science Review 100(3): 369-84.
Kono, Daniel Y. 2007. “When Do Trade Blocs Block Trade?” International Studies Quarterly 51(1): 165-181.
Leblang, David A. 1996. “Property Rights, Democracy and Economic Growth.” Political Research Quarterly 49(1): 5-26.
Li, Quan. 2018. Using R for Data Analysis in Social Sciences: A Research Project-oriented Approach. New York: Oxford University Press.
Potter, Joshua D. and Margit Tavits. 2013. “The Impact of Campaign Finance Laws on Party Competition.” British Journal of Political Science 45(1): 73-95.
Reuveny, Rafael and Quan Li. 2003. “Economic Openness, Democracy and Income Inequality: An Empirical Analysis.” Comparative Political Studies 36(5): 575-601.
Simmons, Joseph P., Leif D. Nelson, and Uri Simonsohn. 2011. “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22(11): 1359-1366.
Data wrangling, or taking raw data and organizing it for analysis, is a valuable skill. The repository provides opportunities to work with different data formats, but data wrangling and cleaning often fall beyond the scope of a research methods course.
 Examples are concentrated in these subfields due to my substantive interests.
 Alternatively, contributors can submit references, code and data by email.
If users do not have a Stata license, they can use R or SPSS to load the data and can examine the contents of do-files using any text editor, such as Notepad on Windows computers.