Query Sampling Results

I have finished my first test run of query sampling the Journal of the American Chemical Society (JACS) against the Stanford Encyclopedia of Philosophy (SEP), and I'm not sure I can say much that is meaningful yet, other than that there are some potentially interesting questions to ask once I am able to clean up the data further.

The top articles in the query sampling were:

  1. Philosophy of Chemistry
  2. Chaos
  3. Reductionism in Biology
  4. Mechanisms in Science
  5. Models in Science

Article 1 of the SEP at least shows that the query sampling recognized that the articles in JACS were about chemistry. Articles 4 and 5 may show a recognition that the JACS articles also discuss methodological issues. Articles 2 and 3 are, to me, the most mysterious. Article 3 may show that the query sampling is picking up on terminology within chemistry (the article is largely about how biology can be reduced to chemistry). Article 2 also discusses positivism and unpredictability within complex systems, so it may likewise be picking up on what are largely the experimental procedures within this data.
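
To make this more concrete, here is a rough Python sketch of one way a query-sampling run could work: sample passages from JACS, treat each as a search query against the SEP, and count which SEP articles come back on top. The TF-IDF retrieval step, the corpus layout, and the function names here are all illustrative assumptions, not a description of the actual pipeline.

```python
# A minimal sketch of one plausible reading of "query sampling": sample
# passages from the JACS corpus, use each as a search query against the SEP,
# and tally which SEP articles come back at the top. The data layout and the
# TF-IDF approach are assumptions, not the method used in this post.
import random
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# sep_articles: {title: full text}; jacs_passages: list of text chunks.
# Both are hypothetical placeholders for however the corpora are stored.
def top_sep_articles(sep_articles, jacs_passages, n_samples=500, seed=0):
    titles = list(sep_articles)
    vectorizer = TfidfVectorizer(stop_words="english")
    sep_matrix = vectorizer.fit_transform(sep_articles[t] for t in titles)

    rng = random.Random(seed)
    hits = Counter()
    for passage in rng.sample(jacs_passages, min(n_samples, len(jacs_passages))):
        query_vec = vectorizer.transform([passage])
        sims = cosine_similarity(query_vec, sep_matrix).ravel()
        hits[titles[sims.argmax()]] += 1  # credit the best-matching article

    return hits.most_common(5)
```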

I also tried to see whether I could confirm some of the trends the query sampling showed using topic modeling from the InPho Topic Explorer. For example, here is a quick visualization of the year-by-year trend for the topic "Life." A score of 10 means that "Life" is the number 1 article for that year; a score of 0 means that the article does not show up at all.
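
Only the endpoints of that scoring are fixed (rank 1 maps to 10, absence maps to 0); the linear scale over the top ten ranks in the sketch below is an assumption about the middle.

```python
# Hypothetical rank-to-score mapping: rank 1 -> 10 and absence -> 0 come
# from the description above; the linear scale in between is assumed.
def rank_to_score(rank):
    """Convert a 1-indexed yearly rank to the 0-10 score; None means absent."""
    if rank is None or rank > 10:
        return 0
    return 11 - rank

assert rank_to_score(1) == 10 and rank_to_score(2) == 9 and rank_to_score(None) == 0
```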

[Figure 1: year-by-year score for the SEP article "Life"]

So, "Life" does appear as the number 2 article for a few years, but then drops off significantly, and after 1900 or so it becomes an unimportant topic according to this data.

If we do a topic model on words like "organic" and "protein," which might signify discussion of life, we get this:

[Figure 2: prevalence of the "life" topic by year]

The top of the graph shows the years when the topic of "life" is most prevalent, in this case 1900, and this graph at least does not seem to reflect the same trends as the earlier one.
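
As a crude cross-check, one could also tally the relative frequency of a few life-related words per year directly. This is simple word counting, not the LDA-based trend the Topic Explorer computes, and the word list and data layout below are assumptions.

```python
# A rough stand-in for the second graph: relative frequency of a few
# life-related words per year. The Topic Explorer does something more
# sophisticated (LDA topics); this is plain word counting for comparison.
import re

LIFE_WORDS = {"organic", "protein", "life"}  # assumed marker vocabulary

def yearly_life_trend(docs_by_year):
    """docs_by_year: {year: [document text, ...]} (hypothetical layout)."""
    trend = {}
    for year, docs in sorted(docs_by_year.items()):
        tokens = [w for doc in docs for w in re.findall(r"[a-z]+", doc.lower())]
        hits = sum(1 for w in tokens if w in LIFE_WORDS)
        trend[year] = hits / len(tokens) if tokens else 0.0
    return trend
```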

One of the big problems here, I think, is that I only have the data broken out by year. When I am able to slice off finer chunks of data (like just the methodology articles for certain years), I think I may be able to get more interesting results. Another problem is that the SEP does not say much about chemistry, so it might also be interesting to compare this data with other subjects, like physics, that are better covered. Does physics show similar haphazard trends, or does it reflect the historiography of the field better?

In all, I think this is an interesting proof of concept, but it would be significantly more interesting with cleaner data and perhaps some comparisons between different subjects.

Do Journals Help Understand Professionalization?

I have been working on trying to make sense of my data about the history of the Journal of the American Chemical Society, and I have done some rough categorization of "expected" topics (meaning topics that my historiography of the American Chemical Society specifically mentions) and "unexpected" topics (which are not specifically mentioned in the historiography). Here is a graph of the expected and unexpected topics year by year.

[Figure 3: expected vs. unexpected topics by year]
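
For what it's worth, the tabulation behind a graph like this is simple once the topics have been hand-labeled. A sketch, assuming the labels are stored per year (the data layout is hypothetical):

```python
# One plausible way to tabulate the graph above, assuming each topic has
# been hand-labeled as expected (mentioned in the historiography) or not.
def expected_share_by_year(labels_by_year):
    """labels_by_year: {year: ["expected" | "unexpected", ...] per topic}."""
    shares = {}
    for year, labels in sorted(labels_by_year.items()):
        n_expected = sum(1 for label in labels if label == "expected")
        shares[year] = n_expected / len(labels) if labels else 0.0
    return shares

# e.g. a year whose 20 topics include 4 expected ones scores 0.20,
# matching the "roughly 20%" figure discussed below.
```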

Roughly 20% of the time (sometimes less, occasionally more), the topics in the journal discuss the kinds of issues the historiography describes. Thus, one might conclude that the historiography does a fairly good job of capturing the history of the field. What is interesting, however, is how widely divergent the unexpected topics are. There is tremendous variation, which opens up another question: might there be some external influence causing this variation? The historiography of the society also divides its timeline of the journal by editor, so I decided to see what the expected vs. unexpected topics looked like when viewed by editorial years (note that there is some overlap between editors).

[Figure 4: expected vs. unexpected topics by editor]
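
Because editorial tenures overlap, regrouping by editor means a year can be counted under more than one editor. Here is a sketch of that regrouping; the editor names and year ranges are placeholders, not the journal's actual editorial history.

```python
# A sketch of regrouping the yearly shares by editor. Tenures overlap, so a
# year may be counted under two editors; these names and ranges are
# placeholders, not the journal's actual editorial history.
EDITOR_YEARS = {
    "Editor A": range(1879, 1894),
    "Editor B": range(1893, 1902),  # note the deliberate one-year overlap
}

def unexpected_share_by_editor(unexpected_share_by_year, editor_years=EDITOR_YEARS):
    """Average the per-year unexpected share over each editor's tenure."""
    result = {}
    for editor, years in editor_years.items():
        shares = [unexpected_share_by_year[y] for y in years
                  if y in unexpected_share_by_year]
        result[editor] = sum(shares) / len(shares) if shares else 0.0
    return result
```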

Here it appears that the number of unexpected topics nearly doubles during later years. This might indicate that the journal is indeed influenced by particular editorial policies. If one breaks out the unexpected topics from these later years, the division looks something like this:

[Figure 5: breakdown of unexpected topics in the later years]

Aside from the appearance of one article on chemistry education, the unexpected topics divide primarily into society business (e.g., who is being elected president, who the presiding officers are, or where the annual meeting should be held) and methodology (e.g., what chemistry is, what kinds of experimental procedures are acceptable). Generally speaking, when looking at the influence of the editorial board, it seems that methodology (somewhat surprisingly) takes the leading role.

I want to return to my original question, however: Do journals help us understand professionalization (at least in this particular case of one professional society)? On the one hand, I am inclined to say yes. We actually see how the writers for the journal decided what chemistry is, and though one might think this would be a more important issue at the start of a journal's lifetime, the major debates seem to be happening thirty to forty years after the journal's founding.

On the other hand, I am also inclined to question this data. Though I am confident that I have separated out the topics that do not deal with chemical experiments, and reasonably confident that I have been able to distinguish society business topics from methodology topics, I am somewhat skeptical of what this really shows. I used MALLET to create topic models for each year of the journal. In essence, this shows me not what is being discussed article by article, but rather topics, the general ideas being discussed in the journal. Thus, my topic models are a somewhat generalized overview of the journal data itself. Furthermore, I have abstracted that data into the even higher-level categories of expected and unexpected. Finally, when I divided by editor, certain trends seemed to become even more pronounced (like the doubling of unexpected topics).
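
For the curious, the MALLET end of this looks roughly like the following: import a directory of per-year text files, then train a topic model and write out the topic keys and per-document topic proportions. The paths and the topic count are placeholders; the flags are standard ones from MALLET's command-line documentation, not necessarily the exact options I used.

```python
# A sketch of a MALLET run: import a directory of per-year text files,
# train a topic model, and write the topic keys and per-document topic
# proportions. Paths and the topic count are placeholder assumptions.
import subprocess

MALLET = "bin/mallet"          # path to the MALLET launcher script (assumed)
CORPUS_DIR = "jacs_by_year/"   # one plain-text file per year (assumed layout)

subprocess.run([MALLET, "import-dir",
                "--input", CORPUS_DIR,
                "--output", "jacs.mallet",
                "--keep-sequence", "--remove-stopwords"], check=True)

subprocess.run([MALLET, "train-topics",
                "--input", "jacs.mallet",
                "--num-topics", "20",
                "--output-topic-keys", "topic_keys.txt",
                "--output-doc-topics", "doc_topics.txt"], check=True)
```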

What does all of this mean? I am trying to determine how a professional society defined itself by using its means of communication, the journal. There could be multiple ways of doing this, and I experimented with topic modeling. I am very happy that I was able to find some interesting trends, but I wonder how much the generalization of particular articles (and my interpretations on top of that) may be skewing things even further.

As historians think about using big data for interpretation, I think this question becomes even more important. When we choose a certain method by which a computer models what we're seeing, and we're not doing the traditional close readings that historians do, does that methodology then skew our results? Furthermore, if we then apply our findings to policy or other practical ends, what are the implications of that? I don't pretend to have the answers to these questions in a blog post, but I think we need to be asking them, and also thinking about ways of explaining how our data were manipulated, in order to make that clear both to the colleagues critiquing our work and to our readers, who may not understand the particularities of statistical topic modeling or of working with historical data sets.