What's in this article:
- How to access JSTOR's content for text and data mining via a free program called Data for Research (DfR).
- What you can get from the self-service portion of DfR.
- How to get large data sets (over 25,000 articles).
Data for Research (DfR) is a free service from JSTOR available to computer scientists, digital humanists, independent scholars, and anyone interested in mining data to uncover new trends and patterns. DfR lets users select and interact with content data from the JSTOR archive: scholarly journal literature (more than 12 million articles), primary resources (26,000 19th-century British pamphlets), and eBooks.
What to expect from the new interface:
- You'll need a MyJSTOR account to participate.
- You can get metadata (including references) and n-gram files (unigrams, bigrams, and trigrams).
- The self-service data set limit has increased from 1,000 to 25,000 articles.
- Technical specifications, sample data sets, and detailed instructions are all in one place.
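To illustrate what "n-gram (1-3)" means, the short Python sketch below counts one-, two-, and three-word sequences in a piece of text. It is only an illustration of the concept: the tokenization used here (lowercasing and splitting on whitespace) and the function name `ngram_counts` are assumptions made for this example, not how DfR actually builds or formats its n-gram files. See the technical specifications for the real file layout.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count word n-grams of length n in text.

    Assumption for illustration: tokens are produced by lowercasing
    and splitting on whitespace. DfR's own tokenization may differ.
    """
    tokens = text.lower().split()
    # Slide a window of n tokens across the text and join each window
    # into a single space-separated n-gram string.
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)


sample = "the data for research service and the data"
for n in (1, 2, 3):
    # Show the three most frequent n-grams of each length.
    print(n, ngram_counts(sample, n).most_common(3))
```

In this sample, "the" appears twice as a unigram and "the data" appears twice as a bigram, while each of the six trigrams appears once.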
Getting OCR files for journals, reports, and pamphlets, and data sets larger than 25,000 documents: