Do Journals Help Understand Professionalization?

I have been working on making sense of my data about the history of the Journal of the American Chemical Society, and I have done some rough categorization of “expected” topics (meaning topics that my historiography on the American Chemical Society specifically mentions) and “unexpected” topics (which are not specifically mentioned in the historiography).  Here is a graph of the expected and unexpected topics year by year.


Roughly 20% of the time (sometimes less, occasionally more), the topics in the journal discuss the kinds of issues the historiography describes.  Thus, one might conclude that the historiography does a fairly good job of understanding the history of the field.  What is interesting, however, is how widely divergent the unexpected topics are.  This tremendous variation opens up another question: might there be some external influence causing it?  The historiography of the society also divides its timeline of the journal by editor.  Therefore, I decided to see what the expected vs. unexpected topics looked like when viewed by editorial years (note that there is some overlap between editors).
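The categorization step described above could be sketched roughly as follows. This is a toy reconstruction, not my actual workflow: the historiography term list, the function name, and the sample data are all invented for illustration.

```python
# Hypothetical sketch: count a year's topic as "expected" if its top words
# overlap a hand-built list of historiography themes, then compute each
# year's share of expected topics (the quantity graphed above).

# Invented stand-in for themes the historiography specifically mentions.
HISTORIOGRAPHY_TERMS = {"meeting", "election", "education", "officers"}

def classify_topics(topics_by_year, expected_terms=HISTORIOGRAPHY_TERMS):
    """Return {year: fraction of that year's topics counted as 'expected'}."""
    shares = {}
    for year, topics in topics_by_year.items():
        expected = sum(1 for words in topics if expected_terms & set(words))
        shares[year] = expected / len(topics)
    return shares

# Toy example: two years, three topics each (each topic = list of top words).
toy = {
    1880: [["acid", "solution"], ["meeting", "society"], ["atom", "weight"]],
    1900: [["election", "officers"], ["education", "students"], ["salt", "water"]],
}
print(classify_topics(toy))
# → {1880: 0.3333333333333333, 1900: 0.6666666666666666}
```

Plotting those yearly shares (expected vs. unexpected) gives a chart like the one above.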


Here it appears that the number of unexpected topics nearly doubles during later years.  This might indicate that the journal is indeed influenced by particular editorial policies.  If one breaks out the unexpected topics from these later years, the division looks something like this.


Aside from the appearance of one article on chemistry education, the unexpected topics divide primarily into society business (e.g., who is being elected president, who the presiding officers are, or where the annual meeting should be held) and methodology (e.g., what is chemistry, what kinds of experimental procedures are acceptable).  Generally speaking, when looking at the influence of the editorial board, methodology (somewhat surprisingly) seems to take the leading role.

I want to return to my original question, however: do journals help us understand professionalization (at least in this particular case of one professional society)?  On the one hand, I am inclined to say yes.  We actually see how the writers for the journal are deciding what chemistry is, and though one might think this would be a more important issue at the start of a journal’s lifetime, the major debates seem to be happening thirty to forty years after the journal’s foundation.

On the other hand, I am also inclined to question this data.  Though I am confident that I have separated out the topics that do not deal with chemical experiments, and reasonably confident that I have been able to distinguish society business from methodology topics, I am somewhat skeptical of what this really shows.  I used Mallet to create topic models for each year of the journal.  In essence, this shows me not what is being discussed article by article, but rather topics, or general ideas, being discussed in the journal.  Thus, my topic models are a somewhat generalized overview of the journal data itself.  Furthermore, I have abstracted that data into the even higher-level categories of expected and unexpected.  Finally, when I divided by editors, certain trends seemed to become even more pronounced (like the doubling of unexpected topics).
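For readers who have not worked with Mallet: its `--output-topic-keys` file is, as I understand the format, one line per topic, with the topic id, the Dirichlet alpha parameter, and the top words separated by tabs. A minimal sketch of turning that file into data one can categorize (the sample string is made up for illustration):

```python
# Parse MALLET-style topic-keys text into (topic_id, alpha, top_words) tuples.
# Assumed line format: <topic id> <tab> <alpha> <tab> <space-separated words>

def parse_topic_keys(text):
    """Return a list of (topic_id, alpha, [top words]) tuples."""
    topics = []
    for line in text.strip().splitlines():
        topic_id, alpha, words = line.split("\t")
        topics.append((int(topic_id), float(alpha), words.split()))
    return topics

# Invented two-topic sample in the assumed format.
sample = "0\t0.25\tacid solution water\n1\t0.25\tmeeting society journal"
print(parse_topic_keys(sample))
```

Once the yearly topic keys are in this form, the expected/unexpected tagging is just a pass over the word lists.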

What does all of this mean?  I am trying to determine how a professional society defined itself by using its means of communication, the journal.  There could be multiple ways of doing this, and I experimented with topic modeling.  I am very happy that I was able to find some interesting trends, but I wonder how much the generalization of particular articles (and my interpretations on top of that) may be skewing things even more.  As historians think about using big data for interpretation, I think this question becomes even more important.  When we choose a method by which a computer models what we’re seeing, and we’re not doing the traditional close readings that historians do, does that methodology then skew our results?  Furthermore, if we then apply our findings to policy or other practical ends, what are the implications?  I don’t pretend to have the answers to these questions in a blog post, but I think we need to be asking them, and also thinking about ways of documenting how our data were manipulated, in order to make that clear both to the colleagues critiquing our work and to readers who may not understand the particularities of statistical topic modeling or of working with historical data sets.


Computational vs. Traditional Methods

After doing some further work on my topic models for the Journal of the American Chemical Society, I began to think a bit about the methods we use for doing history and how computational analyses fit into them.  I know I’m not the first to think about these issues, and, in fact, I have already touched on them briefly in this project.  Nonetheless, at the risk of excessive navel gazing, what I found interesting is that even with extensive topic models for every year, I still had to go topic by topic and organize the topics according to my interpretation of a history of the American Chemical Society, in a long, cumbersome, and manual process.  Thus what started out as a somewhat quantitative project, based on counting words and putting them into topics, became an extremely qualitative and very subjective analysis.  Once I have had a chance to clean up my spreadsheets, I can share them on this blog.  Suffice it to say, however, that in the end I still had to resort to the same kinds of methods that historians and social scientists use when analyzing data, that is, categorizing things in a way that makes sense to an individual scholar.  When I was thinking about how to analyze this data, I considered all kinds of statistical methods and scripts that could do these tasks more objectively.  In the end, though, I could not think of a way of doing those things that would ultimately get at the question I wanted to answer.

My goal with this project has changed over time.  Originally I wanted to determine whether there was a network between editors and authors; there was no meaningful one that I could find.  My goal now is to determine what the topics I have mean, and whether they reflect conventional historiography about the history of the American Chemical Society.  I couldn’t topic model a history book in any meaningful way I could think of, and even if I did, trying to measure those topics against the journal topics would be like comparing apples to oranges.  Therefore, I decided to read the history book (or at least the bits relevant to the journal), create my own topic categories, and then manually assign each of the topics Mallet so kindly found for me to a category.

Admittedly, doing this work over such a large corpus would have been impossible without computational methods.  In the long run, though, I still resorted to the old-fashioned way of making sense of this information.  Do all computational methods in the end come down to human sense-making?  Perhaps they do, but as we think about how quantitative and qualitative methods interact, this seems to me an interesting example.  Mining a corpus of textual articles is certainly quantitative.  In the end, however, it took qualitative analysis to really attempt to understand what was going on.

Query Sampling Journals

One of the other projects I’ve been working on is query sampling the corpus of the Journal of the American Chemical Society against the Stanford Encyclopedia of Philosophy (SEP).  I’ve been using some models set up by the INPHO project here at Indiana.  The hope is to see whether, between 1879 and 1922 (the years I can analyze right now), the journal represents the philosophy of chemistry according to what the SEP thinks should be going on.  Here is a quick spreadsheet of the results.  It is organized year by year, and the articles are ordered by how closely the query samples match articles in the SEP (i.e., article 1 is closest, article 2 is next closest, and article 10 is relatively far away).  Overall, not surprisingly, the number one result is Chemistry.  That means at least that the query samples are picking up on the main topic of the journal.  What is more interesting is the articles that follow.  No real patterns stand out, and some cases are puzzling: Duhem is the second-closest article in 1889 and a bit further down the list in 1891, even though Duhem is mentioned by name three times in one article in 1891 but not at all in 1889.  To me, this seems to represent two things.  First, there is some bias in the SEP toward particular topics.  Second, at least for earlier periods, standard encyclopedias like the SEP may not be the best sources for how the field is progressing.  I’m still thinking more about that second argument, so more on that later.
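To give a rough sense of how such a ranking might work: one common approach is to represent the query text and each encyclopedia entry as word-count vectors and rank entries by cosine similarity. This is my own toy reconstruction under that assumption, not the INPHO project's actual code, and the entry texts are invented.

```python
# Toy query-sampling sketch: rank "SEP entries" by cosine similarity of
# word-count vectors against a query text. All data here is illustrative.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two Counter word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_entries(query_text, entries):
    """Rank entry texts by similarity to the query, closest first."""
    q = Counter(query_text.split())
    scored = [(title, cosine(q, Counter(text.split())))
              for title, text in entries.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)

entries = {
    "Chemistry": "acid reaction element compound reaction",
    "Duhem": "physics theory philosophy",
}
print(rank_entries("acid compound reaction theory", entries))
```

On a toy query full of chemical vocabulary, the "Chemistry" entry ranks first, which mirrors the unsurprising top result in my spreadsheet.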


Visually Representing History

I’m beginning to think now about some visual ways of representing what is happening in the Journal of the American Chemical Society.  This presents some interesting challenges.  Though there is some historiography about the society, for the most part what I have been able to find is in a history of the society written in 1952 in celebration of its 75th anniversary.  From a historian’s point of view, this history has multiple methodological problems (technological determinism, whiggism, an abundance of detail without much analysis, and I could go on).  It is, however, what I have.  Thus, one way I am thinking about representing my topic models is by using this (admittedly problematic) history to create a kind of timeline.  To put this another way, the history presents a narrative of topical progress between the founding of the society and 1952, or at least it states what certain chemists thought was topical progress.  My data show reality (at least within the flagship journal).  So, I think it would be interesting to construct some topics from the 1952 history and then see whether the journal does or doesn’t reflect them.

For those of you who are more visually oriented: as I was trying to think about how to do this, I went to Ted Underwood’s site and looked at some pages about methodology.  I think my visualization might look something like this:


The line (though it probably wouldn’t be as curved as this one) would represent the topics that the 1952 history says are happening over a set number of years (in my case 1879–1922).  The dots would represent the actual topics.

This visualization would, I think, show whether the 1952 history is accurately reflecting the topics in the Journal itself.
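The comparison behind that figure can be expressed as data even before it is drawn. A speculative sketch: the 1952 history assigns each topic a predicted year range (the line), the model gives observed (year, topic) points (the dots), and we check which dots fall inside their topic's predicted window. All names and dates below are invented for illustration.

```python
# Check which observed (year, topic) points fall inside the year range
# a (hypothetical) 1952-history timeline predicts for that topic.

def matches_history(predicted_ranges, observed):
    """Return [(year, topic, inside_predicted_range), ...]."""
    out = []
    for year, topic in observed:
        lo, hi = predicted_ranges.get(topic, (None, None))
        inside = lo is not None and lo <= year <= hi
        out.append((year, topic, inside))
    return out

# Invented timeline and observations.
predicted = {"atomic theory": (1880, 1900), "society business": (1879, 1922)}
observed = [(1895, "atomic theory"),
            (1910, "atomic theory"),
            (1905, "society business")]
print(matches_history(predicted, observed))
# → [(1895, 'atomic theory', True), (1910, 'atomic theory', False),
#    (1905, 'society business', True)]
```

The fraction of dots landing on the line would then be a crude measure of how well the 1952 history reflects the journal.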

Tied to this, I am also doing query sampling of the same data against the Stanford Encyclopedia of Philosophy (another blog post on that later).  What I hope to answer is a related question with a different method: does a standard reference work in the field reflect what is actually happening?  So far, not surprisingly, the query sampling shows that my corpus is mostly related to chemistry.  I suspect, though, that an analysis of the secondary entries will be what is most interesting.

Finding Patterns in Scientific Journals

The last topic models I showed for the Journal of the American Chemical Society covered the entire corpus I have (all issues between 1879 and 1922).  Now I have been working on seeing whether there are any patterns in the topics from year to year.  Since I ran a 20-topic model using Mallet, the list is quite long, so I created another page for those who want to look at the original data.  For now, I’ll just summarize what I think is happening.  First, a few general points.

  • The word molecule first appears within a topic in 1883 and in 1891 it appears in three different topics.  It continues to appear throughout the corpus but not regularly.
  • The word atom first appears in a topic for 1880 and seems to appear more regularly than molecule.
  • The word patent first appears in 1884 but then does not show up all that frequently (only 6 times within the topic models and only until 1892).
  • Many years, though not all, also have topics that seem to pertain to the business of the society, with words like journal, meeting, or city names.  This was also one of the topics in the overall model, but it is interesting to see how it seems to be more dominant in some years than in others.
  • The word method shows up in the topics practically every year and seems to appear more frequently in the earlier years of the journal.
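The "first appearance" observations above come from scanning the yearly topic-key word lists by hand, but the same check is easy to automate. A toy sketch with invented data:

```python
# Record the first year each tracked word appears in any topic, scanning
# years in order. The topics_by_year data here is invented for illustration.

def first_appearance(topics_by_year, words):
    """Return {word: first year it appears in any topic, or None}."""
    first = {w: None for w in words}
    for year in sorted(topics_by_year):
        year_words = {w for topic in topics_by_year[year] for w in topic}
        for w in words:
            if first[w] is None and w in year_words:
                first[w] = year
    return first

toy = {
    1880: [["atom", "weight"]],
    1883: [["molecule", "bond"]],
    1884: [["patent", "process"]],
}
print(first_appearance(toy, ["atom", "molecule", "patent", "valence"]))
# → {'atom': 1880, 'molecule': 1883, 'patent': 1884, 'valence': None}
```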

These are just some general observations from someone who is, admittedly, not trained as a chemist.  There may be other interesting issues that would be clearer to a trained eye.  For my next steps on this project, I intend to look at two sources on the history of chemistry:

There may be other sources, but I think I can at least try to show a proof of concept on these two. Hopefully, there is some way to measure what the topic models are showing against what these more general histories say is happening in the history of the society and in chemistry more generally.

Big Data – A 19th Century Problem

Recently, I was reading an article entitled “Big data problems we face today can be traced to the social ordering practices of the 19th century.”  It led me to think a bit about this history of scholarly communication project, which I think is very related to the larger issues the authors discuss.  My first reaction, at least from a historian’s point of view, is that the “Big Data” conversation is not the second time this has happened but (at least) the third.  The first would arguably be what Ann Blair has discussed in Too Much to Know, which showed how the large amount of information produced by the explosion of print led to new ways of thinking about information management.  Additionally, Peter Burke’s two-volume Social History of Knowledge traces some of the same trends over an even longer period of time.  All of that said, the link Robertson and Travaglia make that I think is unique is the connection between the explosion of data and its political implications.  For the first time, managers felt the need to tie the information society collected to things like performance, productivity, and other metrics, particularly through statistical methods of analysis.  This practice of measuring people by statistically sampling data certainly continues today, and I would agree that in some ways it seems like an extension of these earlier trends.

I wanted to comment specifically, however, on the implications of the larger issues the article discusses for scholarly communication, some of which the authors mention briefly: “In some ways growing academic specialisations created a situation in which what was gained through a narrowing of focus and growth in sub-disciplinary activity was also lost in generalisability. This distinctly Victorian problem endures to the present day despite interdisciplinary projects of various kinds.”  They then go on to suggest that nineteenth-century ideologies (including some distinctly contrary to modern notions of equality) have continued into the analyses of present-day big data issues, and that those underlying ideologies need to be changed.

One of the ideologies not specifically mentioned, but I think very relevant, is a “Whiggish” view of historical progress.  To put that another way, in the nineteenth and even into the twentieth century, there was a view that the world would continually progress into something new and better.  One of the other strands of historical argument that has played into this Whiggish notion of progress is a belief in technological determinism, which posits that technological change drives such progress.  Though such ideas are mostly anathema now, I think one can see that the discussions we currently have about the university and its purpose might be tied to these Whiggish views.  For instance, arguments that we should eliminate humanities departments and increase STEM education seem, at least to me, to fit these notions of Whiggish and technologically deterministic history.

What does this have to do with scholarly communication?  As Robertson and Travaglia suggest, disciplinarity and the ways universities developed in the nineteenth and twentieth centuries arguably play into these very notions of creating a method for continual progress.  The fields of history, history of science, and other disciplines have moved on to other philosophies of interpreting the past, and may even be using big data to show why technological determinism and Whig history are wrong.  How do we bring this discourse into the conversation, especially since policy makers even now may be using discredited Whig notions to decide the future of university education and the production of knowledge?