Over the last year and a half, I have been working on a project about the scientific elite within sociology and economics. We requested and received a large dataset with all articles from sociological journals in JSTOR. Naturally, importing and analysing a dataset of this size posed quite a challenge. Since the statistical language R is where I do all my quantitative work, we decided that I should write some scripts to import and clean the data. During that process, I opted to put those utilities into an R package, which makes it easier to develop, organize and maintain the functions. At some stage we further decided to publish the package, since it might be valuable to other researchers dealing with data from DfR/JSTOR.
Finalizing and publishing the package took quite some effort, but the result is a peer-reviewed and well-documented package which helps you import the data and convert it to .csv files. The package homepage has all the relevant information: how to install the package in R, a general introduction on how to use it, and an extended case study: https://ropensci.github.io/jstor/
If you prefer watching a talk over reading an introduction, you can watch my presentation at this year’s useR! in Brisbane: https://www.youtube.com/watch?v=kNRbT-ki9tU. The slides from the presentation are at https://speakerdeck.com/tklebel/jstor-an-r-package-for-analysing-scientific-articles.
To give you a glimpse of how you can use the package, here are a few lines of code:
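install.packages("jstor")  # only needed once
library(jstor)
# choose what to extract from articles and from books
import <- jst_define_import(article = c(jst_get_article, jst_get_references, jst_get_footnotes),
                            book = jst_get_book)
# convert the archive and write the results as .csv files
jst_import_zip(zip_archive = "path_to_archive_from_DfR.zip",
               import_spec = import,
               out_file = "imported_data", out_path = "output_folder")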
What this does is:
- Install the package on your machine.
- Load the package.
- Import data directly from the .zip archive and save it as .csv files to your disk. `jst_define_import` lets you define which parts you want to import. In this case we import general metadata, references and footnotes for articles, and only general metadata for books (as far as I know, there is no data on references for books).
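Once the import has finished, you will probably want to load the resulting .csv files back into R. Here is a minimal sketch using base R only. The helper `read_imported_csvs` is hypothetical and not part of the package, and it assumes the files were written with a header row into the `output_folder` from the call above.

```r
# Hypothetical helper (not part of the jstor package): read every .csv file
# in a folder into a named list of data frames, using base R only.
# Assumes each file starts with a header row.
read_imported_csvs <- function(path) {
  files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, read.csv, stringsAsFactors = FALSE)
  names(tables) <- basename(files)
  tables
}

# Only runs if the output folder from the import step exists.
if (dir.exists("output_folder")) {
  imported <- read_imported_csvs("output_folder")
  # number of rows in each imported file
  print(vapply(imported, nrow, integer(1)))
}
```

Each element of the list is named after its file, so you can pick out a single table with `imported[["<file name>.csv"]]`.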
If you use the package in a publication, it would be great if you cited this short paper: http://joss.theoj.org/papers/ba29665c4bff35c37c0ef68cfe356e44
I hope you will find the package useful and would be glad for feedback and suggestions for improvements!