What's in this article:
- How to access JSTOR's content for text and data mining via a free program called Data for Research.
- What you can get from the self-service portion of DfR.
- How to get large (over 25,000 articles) data sets or data sets with full-text OCR.
Data for Research (DfR) is a free JSTOR service available to anyone interested in mining data for a research project. DfR allows users to select and interact with content in the JSTOR archive, including scholarly journal literature (more than 12 million articles), primary sources (26,000 19th Century British Pamphlets), and books.
All you need to get started is an individual JSTOR account, which you can create on our registration page. Once you are logged in, you can obtain the following:
- Metadata (with references) and word counts, bi-grams, and tri-grams for up to 25,000 articles at a time.
- Technical specifications, sample datasets, and detailed instructions all in one place.
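If the terms are unfamiliar: a word count tallies single words, a bi-gram is a pair of consecutive words, and a tri-gram is a run of three. The short Python sketch below illustrates the idea on a made-up sentence. It is only an illustration; the function name `ngram_counts` and the simple whitespace tokenization are assumptions for this example and do not describe how DfR tokenizes text or formats its files.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count n-grams (runs of n consecutive words) in a text.

    Hypothetical helper for illustration: lowercases the text and
    splits on whitespace, which is simpler than real-world tokenization.
    """
    tokens = text.lower().split()
    # Zip the token list against shifted copies of itself to form n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


sample = "data for research data for mining"  # made-up sample text

word_counts = ngram_counts(sample, 1)  # single words
bigrams = ngram_counts(sample, 2)      # pairs of consecutive words
trigrams = ngram_counts(sample, 3)     # runs of three consecutive words

print(word_counts.most_common(2))
print(bigrams.most_common(1))
print(trigrams.most_common(1))
```

Here "data for" appears twice in the sample, so it is the most frequent bi-gram, while every tri-gram appears only once.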
To obtain full-text OCR files for journals, reports, or pamphlets, or to request a dataset larger than 25,000 documents, please complete the following form so our team can evaluate your request: DFR Custom Request Form.