I have been working on making sense of my data about the history of the Journal of the American Chemical Society. As a first step, I did some rough categorization of topics into “expected” (meaning topics that my historiography on the American Chemical Society specifically mentions) and “unexpected” (topics that are not specifically mentioned in the historiography). Here is a graph of the expected and unexpected topics year by year.
Roughly 20% of the time (sometimes less, and occasionally more), the topics in the journal are discussing the kinds of issues the historiography describes. Thus, one might conclude that the historiography does a fairly good job of understanding the history of the field. What is interesting, however, is how widely divergent the unexpected topics are. There is tremendous variation, which opens up another question: might there be some external influence causing this variation? The historiography of the society also divides its timeline of the journal by editor. Therefore, I decided to see what the expected vs. unexpected topics looked like when viewed by editorial years (note that there is some overlap between editors).
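To make the year-by-year comparison concrete, here is a minimal sketch of the kind of tally behind such a graph. The data and labels are invented placeholders; in practice the labels would come from hand-coding each topic against the historiography.

```python
from collections import Counter

# Toy (year, label) pairs standing in for hand-coded topic labels --
# these values are illustrative, not the actual JACS data.
labels = [
    (1879, "expected"), (1879, "unexpected"), (1879, "unexpected"),
    (1880, "expected"), (1880, "unexpected"),
]

def expected_share_by_year(pairs):
    """Fraction of topics per year that the historiography anticipates."""
    totals, expected = Counter(), Counter()
    for year, label in pairs:
        totals[year] += 1
        if label == "expected":
            expected[year] += 1
    return {y: expected[y] / totals[y] for y in sorted(totals)}

print(expected_share_by_year(labels))
```

The resulting per-year shares are what a plot like the one above would chart on its vertical axis.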
Here it appears that the number of unexpected topics nearly doubles during later years. This might indicate that the journal is indeed influenced by particular editorial policies. If one breaks out the unexpected topics from these later years, the division looks something like this.
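Regrouping yearly counts by editorial era is a small aggregation step, sketched below. The editor names and tenure ranges are placeholders rather than the actual JACS editors; note that overlapping tenures mean a year can count toward two editors, as flagged above.

```python
# Placeholder editors with deliberately overlapping tenures (1900-1901).
editors = [("Editor A", 1879, 1901), ("Editor B", 1900, 1916)]

# Toy yearly counts of unexpected topics -- illustrative values only.
yearly_unexpected = {1899: 3, 1900: 4, 1901: 5, 1902: 6}

def unexpected_by_editor(editors, yearly):
    """Sum yearly unexpected-topic counts over each editor's tenure."""
    out = {}
    for name, start, end in editors:
        out[name] = sum(n for y, n in yearly.items() if start <= y <= end)
    return out

print(unexpected_by_editor(editors, yearly_unexpected))
```

Because overlap years are counted toward every editor whose tenure includes them, the per-editor totals can sum to more than the overall total; that is a deliberate choice here, not a bug.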
Aside from the appearance of one article on chemistry education, the division of unexpected topics seems to be primarily society business (e.g., who is being elected president, who the presiding officers are, or where the annual meeting should be held) and methodology (e.g., what chemistry is, what kinds of experimental procedures are acceptable). Generally speaking, when looking at the influence of the editorial board, it seems that methodology (somewhat surprisingly) takes a leading role.
I want to return, however, to my original question: Do Journals Help Understand Professionalization? (at least in this particular case of one professional society). On the one hand, I am inclined to say yes. We actually see how the writers for the journal are deciding what chemistry is, and though one might think this would be a more important issue at the start of a journal’s lifetime, the major debates seem to be happening thirty to forty years after the journal’s foundation.
On the other hand, I am also inclined to question this data. Though I am confident that I have broken out the topics that are not dealing with chemical experiments, and reasonably confident that I have been able to separate out society business and methodology topics, I am somewhat skeptical of what this really shows. I used MALLET to create topic models for each year of the journal. In essence, this is showing me not what is being discussed article by article, but rather topics, or general ideas, being discussed in the journal. Thus, my topic models are a somewhat generalized overview of the journal data itself. Furthermore, I have abstracted that data into the even higher-level categories of expected and unexpected. Finally, when I divided by editors, certain trends seemed to become even greater (like the doubling of unexpected topics).
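For readers unfamiliar with what MALLET actually outputs, here is a sketch of the kind of post-processing involved: averaging per-document topic proportions into per-year profiles. It assumes the newer MALLET doc-topics format (one tab-separated column per topic, in topic-index order) and that each document's filename encodes its year; both are assumptions about a hypothetical pipeline, not a description of my actual scripts.

```python
import re
from collections import defaultdict

def mean_topic_props_by_year(lines, num_topics):
    """Average MALLET doc-topic proportions per year.

    Assumes each line is: doc-index <tab> filename <tab> p0 <tab> p1 ...
    and that the filename contains a four-digit year (an assumption).
    """
    sums = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for line in lines:
        if line.startswith("#"):          # skip MALLET's header comment
            continue
        fields = line.split("\t")
        year = int(re.search(r"(\d{4})", fields[1]).group(1))
        props = [float(p) for p in fields[2:2 + num_topics]]
        for i, p in enumerate(props):
            sums[year][i] += p
        counts[year] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

# Toy two-topic doc-topics lines standing in for MALLET output.
sample = [
    "#doc name topic proportions",
    "0\tfile:/jacs_1912_0001.txt\t0.7\t0.3",
    "1\tfile:/jacs_1912_0002.txt\t0.5\t0.5",
]
print(mean_topic_props_by_year(sample, num_topics=2))
```

Every step like this (modeling, averaging, relabeling) moves the analysis one layer further from the articles themselves, which is exactly the source of the skepticism described above.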
What does all of this mean? I am trying to determine how a professional society defined itself by using its means of communication, the journal. There could be multiple ways of doing this, and I experimented with topic modeling. I am very happy that I was able to find some interesting trends, but I wonder how much the generalization of particular articles (and my interpretations on top of that) may be skewing things even more. As historians think about using big data for interpretation, I think this question becomes even more important. When we choose a method in which a computer models what we are seeing, rather than doing the traditional close readings that historians do, does that methodology skew our results? Furthermore, if we then apply our findings to policy or other practical ends, what are the implications of that? I don’t pretend to have the answers to these questions in a blog post, but I think we need to be asking them. We should also be thinking about ways of explaining how our data were manipulated, to make it clear both to the colleagues critiquing our work and to our readers, who may not understand the particularities of statistical topic modeling or of working with historical data sets.