Title: | Reproducible Data Retrieval from the Roper Center Data Archive |
---|---|
Description: | Reproducible, programmatic retrieval of datasets from the Roper Center data archive. The Roper Center for Public Opinion Research <https://ropercenter.cornell.edu> maintains the largest archive of public opinion data in existence, but researchers using these datasets are caught in a bind. The Center's terms and conditions bar redistribution of downloaded datasets, but to ensure that one's work can be reproduced, assessed, and built upon by others, one must provide access to the raw data one employed. The `ropercenter` package cuts this knot by providing registered users with programmatic, reproducible access to Roper Center datasets from within R. |
Authors: | Frederick Solt [aut, cre, cph], Jennifer Lin [ctb], Paul Gronke [ctb], Dave Peterson [ctb] |
Maintainer: | Frederick Solt <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.2.9000 |
Built: | 2024-10-29 03:01:04 UTC |
Source: | https://github.com/fsolt/ropercenter |
read_ascii helps format ASCII data files downloaded from the Roper Center.
read_ascii( file, total_cards = 1, var_names, var_cards = 1, var_positions, var_widths, card_pattern, respondent_pattern )
file |
A path to an ASCII data file. |
total_cards |
For multicard files, the number of cards in the file. |
var_names |
A string vector of variable names. |
var_cards |
For multicard files, a numeric vector of the cards on which the variables in var_names are recorded. |
var_positions |
A numeric vector of the column positions in which the variables in var_names are recorded. |
var_widths |
A numeric vector of the widths used to record the variables in var_names. |
card_pattern |
For use when the file does not contain a line for every card for every respondent (or contains extra lines that correspond to no respondent), a regular expression that matches the file's card identifier; e.g., if the card number is stored in the last digit of each line, "\d$". |
respondent_pattern |
For use when the file does not contain a line for every card for every respondent (or contains extra lines that correspond to no respondent), a regular expression that matches the file's respondent identifier; e.g., if the respondent number is stored in the first four digits of each line, preceded by a space, "(?<=^\s)\d{4}". |
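The two pattern arguments can be tried out on a few lines before they are passed to read_ascii. The sketch below uses base R with Perl-style regular expressions on three invented lines; the file layout, the `lines` vector, and the `extract()` helper are assumptions made for illustration, and read_ascii may apply the patterns differently internally. Note that backslashes are doubled inside R strings.

```r
# Hypothetical lines from a multicard file: the respondent number occupies
# four digits after one leading space, and the card number is the last digit.
lines <- c(" 0001 23451 1",
           " 0001 99120 2",
           " 0002 11345 1")

# Illustrative helper (not part of ropercenter): return the first match of
# `pattern` in each element of `x`, using Perl-compatible syntax so that
# lookbehind assertions are available.
extract <- function(x, pattern) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

card       <- extract(lines, "\\d$")            # last digit of each line
respondent <- extract(lines, "(?<=^\\s)\\d{4}") # four digits after a leading space

print(data.frame(respondent, card))
```

If a pattern fails to match some lines, `regmatches()` silently drops them, so comparing `length(card)` with `length(lines)` is a quick way to spot lines that correspond to no respondent.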
Many older Roper Center datasets are available only in ASCII format, which is notoriously difficult to work with. The read_ascii function facilitates the process of extracting selected variables from ASCII datasets. For single-card files, one can simply identify the names, positions, and widths of the needed variables from the codebook and pass them to read_ascii's var_names, var_positions, and var_widths arguments. Multicard datasets are more complicated. In the best case, the file contains one line per card per respondent; then, the user can extract the needed variables by adding only the var_cards and total_cards arguments. When this condition is violated (there is not a line for every card for every respondent, or there are extra lines), the function will throw an error and ask the user to specify the additional arguments card_pattern and respondent_pattern.
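To make the meaning of positions and widths concrete, here is a minimal base R sketch of fixed-width extraction from single-card records. It is not the package's implementation: the `records` vector and its column layout are invented, and whether read_ascii returns such columns as character or numeric is not assumed here.

```r
# Hypothetical single-card records; the layout is invented for illustration.
records <- c("1 2034 5",
             "2 1187 3")

# Codebook-style description of the variables wanted.
var_names     <- c("weight", "id", "q1")
var_positions <- c(1, 3, 8)   # starting column of each variable
var_widths    <- c(1, 4, 1)   # number of columns each variable occupies

# For each variable, cut the substring that starts at its position and
# spans its width, across all records at once.
extracted <- mapply(
  function(start, width) substr(records, start, start + width - 1),
  var_positions, var_widths,
  SIMPLIFY = FALSE
)
names(extracted) <- var_names

df <- as.data.frame(extracted, stringsAsFactors = FALSE)
print(df)
```

Positions are 1-based column numbers, which matches how codebooks for these files typically describe them.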
A data frame containing any variables specified in the var_names argument, plus a numeric respondent identifier and as many string card variables (card1, card2, ...) as specified by the total_cards argument.
## Not run:
# a single-card file
roper_download("USAIPO1982-1197G", # Gallup Poll for June 25-28, 1982
               download_dir = tempdir()) # remember to specify a directory for your download
gallup1982 <- read_ascii(file = file.path(tempdir(), "USAIPO1982-1197G", "1197.dat"),
                         var_names = c("q09j", "weight"),
                         var_positions = c(38, 1),
                         var_widths = c(1, 1))

# a multi-card file, with extra lines that make the card_pattern and
# respondent_pattern arguments necessary
roper_download("USAIPOCNUS1996-9603008", # Gallup/CNN/USA Today Poll: Politics/1996 Election
               download_dir = tempdir()) # remember to specify a directory for your download
gallup1996 <- read_ascii(file = file.path(tempdir(), "USAIPOCNUS1996-9603008", "a9603008.dat"),
                         var_names = c("q43a", "q44", "weight"),
                         var_cards = c(6, 6, 1),
                         var_positions = c(62, 64, 13),
                         var_widths = c(1, 1, 3),
                         total_cards = 7,
                         # a digit, preceded by the start of the line and ten other characters
                         card_pattern = "(?<=^.{10})\\d",
                         # four digits, preceded by the start of the line and two whitespace characters
                         respondent_pattern = "(?<=^\\s{2})\\d{4}")
## End(Not run)
roper_download provides a programmatic and reproducible means to download datasets from the Roper Center's data archive.
roper_download( file_id, affiliation = getOption("roper_affiliation"), email = getOption("roper_email"), password = getOption("roper_password"), reset = FALSE, download_dir = "roper_data", msg = TRUE, convert = TRUE, delay = 2 )
file_id |
The unique identifier (or optionally a vector of these identifiers) for the dataset(s) to be downloaded (see details). Both new Archive Numbers and old Historical Archive Numbers, if the latter are available, may be used. |
affiliation, email, password |
Your Roper Center affiliation, email, and password (see details). |
reset |
If TRUE, you will be asked to re-enter your Roper Center affiliation, email, and password. |
download_dir |
The directory (relative to your working directory) to which files from the Roper Center will be downloaded. |
msg |
If TRUE, outputs a message showing which data set is being downloaded. |
convert |
If TRUE, converts downloaded file(s) to .RData format. |
delay |
If the speed of your connection to the Roper Center data archive is particularly slow, roper_download might encounter problems. Increasing the delay parameter may help. |
To avoid requiring others to edit your scripts to insert their own affiliation, email, and password, or forcing them to enter this information interactively, the default is to fetch it from the user's .Rprofile. Before running roper_download, then, you should be sure to add these options to your .Rprofile (usethis::edit_r_profile() is one particularly easy way), substituting your own info for the example below:
options("roper_affiliation" = "Upper Midwest University",
"roper_email" = "[email protected]",
"roper_password" = "password123!")
The function returns nothing, but has the side effect of downloading all files of the datasets identified in the file_id argument.
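Before calling roper_download in a script, it can be useful to confirm that the three options described above are actually set in the current session. The following is a minimal sketch using only base R; the `required` and `is_unset` names are illustrative and not part of the package.

```r
# The option names that roper_download reads by default.
required <- c("roper_affiliation", "roper_email", "roper_password")

# getOption() returns NULL for an option that has not been set.
is_unset <- vapply(required, function(opt) is.null(getOption(opt)), logical(1))

if (any(is_unset)) {
  message("Not set in this session: ",
          paste(required[is_unset], collapse = ", "))
} else {
  message("All Roper Center credential options are set.")
}
```

If any option is reported as unset, adding it to .Rprofile and restarting R should make it available to later calls.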
## Not run:
roper_download(file_id = c("31117412", "USPEW2015-GOVERNANCE"),
               download_dir = tempdir()) # remember to specify a directory for your download
## End(Not run)