After finishing my first test run of query sampling the Journal of the American Chemical Society (JACS) against the Stanford Encyclopedia of Philosophy (SEP), I’m not sure I can say much that is meaningful, other than that there are some potentially interesting questions to ask once I am able to clean up the data more.
The top articles in the query sampling were:
Article 1 of the SEP at least shows that the query sampling recognized that the articles in JACS were about chemistry. Articles 4 & 5 of the SEP may show a recognition that the JACS articles also discuss methodological issues. Articles 2 & 3 of the SEP are, to me, the most mysterious. Article 3 of the SEP may show that the query sampling is picking up on terminology within chemistry (the article is largely about how biology can be reduced to chemistry). Article 2 of the SEP also discusses positivism and unpredictability within complex systems, so again the sampling may be picking up on the experimental procedures that make up much of this data.
Also, I tried to see if I could confirm some of the trends that the query sampling showed with some topic modelling from the InPho Topic Explorer. For example, here is a quick visualization of the year-by-year trend for the SEP article on “Life”. A score of 10 means that “Life” is the number 1 article for that year; a score of 0 means that the article does not show up at all.
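The scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text fixes the two endpoints (rank 1 scores 10, absent scores 0), and the sketch assumes a linear scale over a top-10 list in between. The function name `rank_to_score` and the sample ranks are hypothetical placeholders, not values from the actual run.

```python
def rank_to_score(rank, top_n=10):
    """Map a 1-based rank to a score.

    Assumed linear scale: rank 1 -> top_n, rank top_n -> 1,
    and anything missing or outside the top_n list -> 0.
    """
    if rank is None or rank < 1 or rank > top_n:
        return 0
    return top_n + 1 - rank


# Placeholder ranks keyed by arbitrary labels (NOT real results);
# None means the article did not appear in that year's list.
ranks_by_year = {"year_a": 1, "year_b": 2, "year_c": 7, "year_d": None}

scores_by_year = {year: rank_to_score(rank) for year, rank in ranks_by_year.items()}
```

Under this assumed scale, a number 2 article would score 9, and any article outside the top 10 is treated the same as one that never appears.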
So, “Life” does appear as the number 2 article for a few years, but then it drops off significantly, and after 1900 or so it becomes an unimportant topic according to this data.
If we do a topic model on words like “organic” and “protein”, which might signify discussion of life, we get this:
The top of the graph shows the years when the topic of “life” is most prevalent, in this case 1900, and this graph at least does not seem to reflect the same trends as the earlier graph.
One of the big problems here, I think, is that I only have the data broken out by year. When I am able to slice off finer chunks of data (like just the methodology articles for certain years), I think I may be able to get more interesting results. Another problem is that the SEP does not talk much about chemistry, so it might also be interesting to compare this data with other subjects, like physics, that are better covered. Does physics show similarly haphazard trends, or does it reflect the historiography of the field better?
In all, I think this is an interesting proof of concept, but it would be significantly more interesting with cleaner data and perhaps some comparisons across different subjects.