American Journal of Science


(Second edition of the first volume of the journal, available from Carnegie Mellon’s digital collection)

Prior to the professional scholarly journal system of today, there was only one major journal for American science: the American Journal of Science, which still exists today and now focuses on geology. In the nineteenth century, however, the journal covered every scientific topic. The table of contents for the issues of the first volume (pictured) includes:

  1. Mineralogy, Geology & Topography
  2. Botany
  3. Zoology
  4. Fossil Zoology
  5. Mathematics
  6. Miscellaneous
  7. Physics, Mechanics, & Chemistry
  8. Fine Arts
  9. Useful Arts
  10. Agriculture & Economics
  11. Intelligence

Each article is roughly two to three pages, and each issue contains an “intelligence” section, which seems to be general news. This section continues into the twentieth century, when the journal was more focused on geology, but the intelligence section still reports important findings in physics, chemistry, and other scientific areas.

The journal was founded by Benjamin Silliman and later edited by his son. There is a good overview of the foundation of the journal, and of course multiple references to it, but so far I have not been able to find any articles that take a computational approach to analyzing its contents. In particular, I think it would be a great candidate for the topic modelling and query sampling techniques I have used earlier. This journal may even be a good candidate for a network analysis, something I haven’t done much of in the past (I intended to do so for the Journal of the American Chemical Society), since it would contain a large number of scientists in the United States and could show the network as it was beginning to split into different disciplines. Fortunately, there are also over 100 years of textual data available for this journal in the public domain, making it a potentially very rich source. I am going to see whether some initial tests produce interesting results, and I’m looking forward to seeing whether this journal helps us understand the professionalization of science and the origins of the scholarly communication system in even more interesting ways than the Journal of the American Chemical Society has so far.


Query Sampling Results

After finishing my first test run of query sampling the Journal of the American Chemical Society (JACS) against the Stanford Encyclopedia of Philosophy (SEP), I’m not sure I can say much that is meaningful, other than that there are some potentially interesting questions to ask once I am able to clean up the data more.

The top articles in the query sampling were:

  1. Philosophy of Chemistry
  2. Chaos
  3. Reductionism in Biology
  4. Mechanisms in Science
  5. Models in Science

Article 1 of the SEP at least shows that the query sampling recognized that the articles in JACS were about chemistry. Articles 4 & 5 of the SEP may show a recognition that the JACS articles also discuss methodological issues. Articles 2 & 3 of the SEP are, to me, the most mysterious. Article 3 of the SEP may show that the query sampling is picking up on terminology within chemistry (the article is largely about how biology can be reduced to chemistry). Article 2 of the SEP also discusses positivism and unpredictability within complex systems, so it again may be picking up on what are largely the experimental procedures within this data.

Also, I tried to see if I could confirm some trends that the query sampling showed with some topic modelling from the InPho Topic Explorer. For example, here is a quick visualization of the trend (year by year) of the topic for Life. A score of 10 would mean that “Life” is the number 1 article for that year; a score of 0 would mean that the article does not show up at all.

Picture1

So, “Life” does appear as the number 2 article for a few years, but then it drops off significantly, and after 1900 or so it becomes an unimportant topic according to this data.
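The scoring behind that graph can be sketched roughly as follows. This is only a minimal illustration, not the code that produced the graph: the post gives just the two endpoints (number 1 gets a 10, not appearing gets a 0), so the linear mapping in between is an assumption, and `rank_to_score` and `yearly_scores` are hypothetical helper names. The rankings shown are made-up placeholders, not the real results.

```python
def rank_to_score(rank, top_n=10):
    """Map a 1-based rank to a score.

    Assumption: the scale is linear, so rank 1 -> top_n, rank top_n -> 1,
    and anything outside the top_n list (or missing) -> 0.
    """
    if rank is None or rank < 1 or rank > top_n:
        return 0
    return top_n + 1 - rank


def yearly_scores(rankings, article, top_n=10):
    """Score one SEP article for every year.

    rankings: dict mapping year -> list of article titles, closest first.
    """
    scores = {}
    for year, ranked in sorted(rankings.items()):
        top = ranked[:top_n]
        rank = top.index(article) + 1 if article in top else None
        scores[year] = rank_to_score(rank, top_n)
    return scores


# Placeholder rankings for illustration only (not the actual results).
toy_rankings = {
    1880: ["Chemistry", "Life", "Chaos"],
    1900: ["Chemistry", "Chaos", "Life"],
    1910: ["Chemistry", "Chaos"],
}
life_trend = yearly_scores(toy_rankings, "Life")
```

Under this assumed scale, a year where “Life” is the number 2 article would plot at 9, and a year where it is missing from the list would plot at 0.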

If we do a topic model on words like “organic” and “protein,” which might signify discussion of life, we get this:

Picture2

The top of the graph shows the years when the topic of “life” is most prevalent, in this case 1900, and this graph, at least, does not seem to reflect the same trends as the earlier graph.

One of the big problems here, I think, is that I only have the data broken out by year. When I am able to slice off finer chunks of data (like just the methodology articles for certain years), I think I may be able to get more interesting results. Another problem is that the SEP does not talk much about chemistry, so it might also be interesting to compare this data with other subjects, like physics, that are better covered. Does physics show similarly haphazard trends, or does it reflect the historiography of the field better?

In all, I think this is an interesting proof of concept, but it would be significantly more interesting with cleaner data and perhaps some comparisons of different subjects.

Query Sampling Journals

One of the other projects I’ve been working on is sampling the corpus of the Journal of the American Chemical Society against the Stanford Encyclopedia of Philosophy (SEP). I’ve been using some models set up by the InPho project here at Indiana. The hope here is to see whether, between 1879 and 1922 (the years I can analyze right now), the journal represents the philosophy of chemistry according to what the SEP thinks should be going on. Here is a quick spreadsheet of the results. It is organized year by year, and the articles are ordered by how closely the query sampling matches articles in the SEP (i.e. article 1 is closest, article 2 is next closest, and article 10 is relatively far away). Overall, not surprisingly, the number one result is Chemistry. That means, at least, that the query samples are picking up on the main topic of the journal. What is more interesting is the articles that follow. There are not really any patterns that stand out, and some cases are puzzling: Duhem, for example, is the second closest article in 1889 and a bit further down the list in 1891, yet Duhem is mentioned by name 3 times in one article in 1891 and not at all in 1889. To me, this seems to suggest two things. First, there is some bias in the SEP toward particular topics. Second, at least for earlier periods, standard encyclopedias like the SEP may not be the best sources for how the field is progressing. I’m still thinking more about that second argument, so more on that later.
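To make the idea of “closest article” concrete, here is a toy sketch of ranking encyclopedia articles against a sample of journal text. It is not the InPho models I actually used: as a stand-in, it compares simple word counts with cosine similarity, and the function names and the two tiny “articles” are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_articles(query_text, articles, top_n=10):
    """Return up to top_n article titles, closest to the query text first.

    articles: dict mapping title -> article text.
    """
    query = Counter(tokenize(query_text))
    scored = [
        (cosine(query, Counter(tokenize(body))), title)
        for title, body in articles.items()
    ]
    # Highest similarity first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]


# Invented stand-ins for SEP articles and a journal sample.
toy_sep = {
    "Chemistry": "chemistry of elements compounds and reactions",
    "Chaos": "unpredictability in complex dynamical systems",
}
toy_sample = "the reactions of compounds in chemistry"
ranking = rank_articles(toy_sample, toy_sep)
```

Running one such ranking per year of the journal would give a year-by-year list like the spreadsheet above. A plain word-count comparison like this one is driven entirely by shared vocabulary, which is one reason a name like Duhem could rank high in a year where he is never mentioned.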