First Project: Journal of the American Chemical Society

For my first foray into studying the history of scholarly communication, I’d like to study the history of a particular journal. Since there has already been some work done on this, I’m starting with the Journal of the American Chemical Society. Fortunately, the complete run of the journal is available at HathiTrust, and I’ve created a collection of all the issues from 1876 (the journal’s founding) until 1922 (the last year whose issues are out of copyright).

I would like to

  1. Do a network analysis of the authors in the Journal and see who is writing for it and what relationships exist.
  2. Do some topic modelling to see what these authors are talking about and what, if any, relationship there is between these topics and the network of authors.
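
As a rough illustration of the first step, here is a minimal sketch of counting co-authorship ties. It assumes the author names for each article have already been extracted into lists; the names in `articles` are placeholders rather than real data, and `coauthor_edges` and `article_counts` are hypothetical helper names chosen for this example.

```python
from collections import Counter
from itertools import combinations

# Placeholder data: each entry is the author list of one article.
# Real input would come from the journal's tables of contents or bylines.
articles = [
    ["Author A", "Author B"],
    ["Author A", "Author C"],
    ["Author A", "Author B", "Author C"],
    ["Author D"],
]


def coauthor_edges(author_lists):
    """Count how often each unordered pair of authors shares a byline."""
    edges = Counter()
    for authors in author_lists:
        # Deduplicate and sort so (A, B) and (B, A) count as the same edge.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


def article_counts(author_lists):
    """Count how many articles each author appears on."""
    counts = Counter()
    for authors in author_lists:
        counts.update(set(authors))
    return counts


edges = coauthor_edges(articles)
counts = article_counts(articles)

# Print the ties, most frequent first.
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight} shared article(s)")
```

The weighted edge list could then be loaded into a graph library or a tool such as Gephi. Single-author articles contribute to `counts` but add no edges, which may matter for a journal where solo authorship is common.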

Issues to solve:

  1. How to get text out of HathiTrust
  2. Once I get text, how to deal with the Proceedings of the American Chemical Society

On issue 1:

I’ve written to HathiTrust and am trying to get rsync access set up. If anyone has done this before and could help, that would be much appreciated.

On issue 2:

The journal started in 1876 as the Proceedings and became the Journal after about a year. The Proceedings continued to be published, however. How do these two differ? How should I analyze them?


2 thoughts on “First Project: Journal of the American Chemical Society”

  1. Our rsync questions are ongoing with IT. Hopefully we’ll have some forward movement there soon.

    In terms of the questions: We’ve talked briefly about the natural break that happens between the Proceedings and the Journal. If the question is how scholarly communication changes over time, and why, that break becomes an analytical data point that’s imposed by the editors themselves, a historical artifact that helps your historical analysis. I’d treat the before and after as subsets of a corpus that can be analyzed together or separately, and networked in similar sets and subsets, in order to better understand what drove the editors to change.


    • Colin actually thinks there may be a way around the rsync issue, so I may have a solution to that problem by Friday. On the Proceedings vs. Journal question, as I’ve been investigating more, it seems that the Proceedings was actually published intermittently throughout the period 1879-1922 (sometimes as a supplement to the Journal). So my first impression that publication of the Proceedings stopped when the Journal started may not be entirely accurate. For now, I think I may use the Journal as a self-contained corpus and then see how I can analyze the Proceedings as a kind of separate story that interweaves with how the Journal is working.

      In the meantime, I’m working on NER for a self-OCR’d version of volume one of the Journal. Another blog post about that soon.

