For my first foray into studying the history of scholarly communication, I’d like to study the history of a particular journal. Since some work has already been done on it, I’m starting with the Journal of the American Chemical Society. Fortunately, the complete run of the journal is available at HathiTrust, and I’ve created a collection of all the issues from 1876 (the journal’s founding) until 1922 (the last year whose volumes are out of copyright).
I would like to:
- Do a network analysis of the Journal’s authors to see who is writing for it and what relationships exist among them.
- Do some topic modelling to see what these authors are talking about and what, if any, relationship there is between these topics and the network of authors.
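As a rough illustration of the first goal, here is a minimal sketch of how a co-authorship network could be built once author lists have been extracted. It assumes each article can be reduced to a list of author names; the function name `coauthor_edges` and the placeholder names are hypothetical and are not drawn from the Journal itself.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(articles):
    """Count how often each pair of authors appears on the same article."""
    edges = Counter()
    for authors in articles:
        # Sort and de-duplicate so (A, B) and (B, A) count as the same edge.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


# Placeholder data: one list of author names per article (not real authors).
articles = [
    ["Author A", "Author B"],
    ["Author A", "Author B", "Author C"],
    ["Author D"],
]

edges = coauthor_edges(articles)
for (a, b), weight in sorted(edges.items()):
    print(a, "--", b, weight)
```

The weighted pairs could then be loaded into whatever network tool ends up being used. Single-author articles add no edges here, so those authors would need to be added as isolated nodes separately.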
Issues to solve:
- How to get text out of HathiTrust
- Once I get text, how to deal with the Proceedings of the American Chemical Society
On issue 1:
I’ve written to HathiTrust and am trying to get rsync set up. If anyone has done this before and could help, that would be much appreciated.
On issue 2:
The journal started in 1876 as the Proceedings and became the Journal after about a year. The Proceedings continued to be published, however. How do these two differ? How should I analyze them?