100 Years of American Science

(Graph: share of topics by category, 1819 to 1922)

Over the past few weeks, I have been working on a project to topic model the American Journal of Science between 1819 (its first year of publication) and 1922; during much of the 19th century, this was the only specialized scientific journal in the United States. I can release data sets later, but I wanted to share some preliminary results now. Though this research is far from conclusive, it does provide a useful proof of concept for using topic modelling to determine how genres of material change over a long period of time. Moreover, tracing the evolution of topics within a single important journal in 19th-century America shows how topic models can provide a useful source of evidence to supplement more traditional historical and “close reading” methods.

The above graph shows that over the entire period of roughly 100 years, geology is the dominant category, representing roughly 35% of topics between 1819 and 1922. Interestingly, however, the “other sciences” taken together are represented equally at 35%. Yet no one of the subtopics within “other sciences” dominates. Astronomy, Botany, Engineering, Medicine, Meteorology, Physics, and Zoology each represent less than 10% of the whole. In any given year, none of these subtopics represents more than 13% of topics; the only exception is physics, which represents 17.5% of topics in 1840. Chemistry is the one major exception. As a discipline, it represents 13% of the total topics over this 100-year period and, in individual years, often represents 20% to 25% of topics. Topics related to news, another important genre of content during most of the 19th century, represent 17% of total topics and often 20% of topics in individual years. Every issue had a section called Intelligence that was dedicated to news from the field. Additionally, individual articles, particularly in the earlier years of the journal, were dedicated to translating and commenting on articles published abroad, and to publishing letters to the editor that discussed scientific endeavors both in the U.S. and abroad.

(Line graph: number of topics per category, by year)

The topic models also demonstrate some other interesting, though not particularly surprising, trends. Above is a simple line graph showing the number of topics within particular categories; it shows that geology topics increase over time, whereas other topics generally decrease. The graph also shows that until about 1871, “other sciences” topics were significantly more numerous than geology topics. From 1871, “other sciences” decline precipitously while geological topics increase and overtake them. Since the American Journal of Science is currently a journal dedicated to geology, one would expect to see this trend. It is interesting to note, however, that this shift happens in the period from 1871 to 1897. The late 19th century is a period when multiple other scientific professional societies are created, along with related scholarly journals. For instance, the Journal of the American Chemical Society was founded in 1879, and the Physical Review (the journal of the American Physical Society, the society for physicists) began in 1893. The trend line for chemistry topics also shows a decline during this period. Clearly, more detailed analysis of these topical trends is needed. Nonetheless, the trends illustrated in this line graph may be evidence of scientists leaving the more generalized American Journal of Science for more specialized journals as they were created. The decline of “other sciences” does seem to happen at exactly the right time.
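As a rough illustration of how a crossover like the one around 1871 could be located programmatically rather than by eye, here is a minimal Python sketch. The counts in `counts_by_year` are invented placeholders, not the journal's data, and the function name `first_crossover` is hypothetical; the only assumption is that each year has a count of topics per category.

```python
from typing import Dict, Optional

# Invented placeholder counts (NOT the journal's real data):
# number of topics per category for a handful of years.
counts_by_year: Dict[int, Dict[str, int]] = {
    1860: {"geology": 8, "other sciences": 14},
    1870: {"geology": 10, "other sciences": 13},
    1880: {"geology": 13, "other sciences": 9},
    1890: {"geology": 15, "other sciences": 7},
}


def first_crossover(
    counts: Dict[int, Dict[str, int]], rising: str, falling: str
) -> Optional[int]:
    """Return the first year in which `rising` exceeds `falling`
    after a year in which it was at or below it, or None."""
    previously_at_or_below = False
    for year in sorted(counts):
        r = counts[year].get(rising, 0)
        f = counts[year].get(falling, 0)
        if r > f and previously_at_or_below:
            return year
        previously_at_or_below = r <= f
    return None


crossover = first_crossover(counts_by_year, "geology", "other sciences")
print(crossover)
```

With real per-year counts in place of the placeholders, the returned year could be compared against the founding dates of the specialized journals mentioned above.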

(Graph: topics per category as a percentage of all topics, by year)

Finally, I have one more graph that shows much the same data, but it represents the topics as a percentage of the whole rather than as the raw numbers of topics shown in the line graph. This graph of percentages adds some nuance to the picture presented in the line graph. Geology topics represent less than 30% of all topics in 1819, and that share gradually increases to nearly 40% in 1922. Conversely, other sciences reach a high of nearly 60% in 1845, but decrease to a low of about 35% in 1922. Thus, one can see that other sciences still account for an important share of topics even as late as 1922. This could complicate the story about scientists departing to other journals. It is possible that many scientists, despite the appearance of alternative journals, were still choosing to publish in the American Journal of Science. Additionally, this relatively high percentage of “other science” topics could simply demonstrate that geology is a discipline that requires knowledge of other disciplines, such as physics or biology, in order to perform geological work. Again, more research and closer reading of the individual articles represented by these rather broad topics is needed to better understand how individual scientists were responding to a changing scholarly communication landscape.
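The difference between the raw counts in the line graph and the percentages in this graph comes down to a normalization step: divide each category's count by the total number of topics for that year. A minimal Python sketch of that step follows. The `topic_labels` list is invented for illustration (it is not the journal's data), and the function names are hypothetical; the assumption is simply that each topic has been hand-labelled with a year and a category.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Invented placeholder labels (NOT the journal's real data):
# one (year, category) pair per topic.
topic_labels: List[Tuple[int, str]] = [
    (1819, "geology"), (1819, "other sciences"),
    (1819, "other sciences"), (1819, "news"),
    (1922, "geology"), (1922, "geology"),
    (1922, "other sciences"), (1922, "chemistry"), (1922, "news"),
]


def counts_per_year(labels: List[Tuple[int, str]]) -> Dict[int, Counter]:
    """Raw number of topics per category for each year (the line graph)."""
    counts: Dict[int, Counter] = {}
    for year, category in labels:
        counts.setdefault(year, Counter())[category] += 1
    return counts


def shares_per_year(counts: Dict[int, Counter]) -> Dict[int, Dict[str, float]]:
    """Each category's percentage of that year's topics (the percentage graph)."""
    shares: Dict[int, Dict[str, float]] = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {cat: 100.0 * n / total for cat, n in counter.items()}
    return shares


counts = counts_per_year(topic_labels)
shares = shares_per_year(counts)
for year in sorted(shares):
    print(year, dict(counts[year]), shares[year])
```

Because the total number of topics can differ from year to year, a category's raw count can fall while its share holds steady, or the reverse, which is why the two graphs can suggest somewhat different stories.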

The gradual decline of other sciences in these graphs may demonstrate that the nature of authority within the American Journal of Science changed over time. As other societies created their own authority in competing journals like the Journal of the American Chemical Society, scientists within fields such as physics and chemistry decided to publish their work in those other venues. At the same time, many scientists, particularly geologists, continued to publish in the journal long after the death of Benjamin Silliman, the journal’s founder, in 1864. Therefore, one has to assume that the journal created a kind of authority that outlasted its founder. That authority most likely rested on the same kinds of trust-building that other journals established, such as affiliation with a professional scientific society, peer review, and reliance on authors’ credentialing within university hiring systems. Perhaps topic modelling and text analysis by themselves cannot answer the question of how authority is constructed. Topic modelling can, however, provide a useful source of evidence that identifies trends for further investigation, and it can be used to strengthen traditional historical analyses of scholarly communication.


Untangling (American) Academic Publishing

I have recently been reading Untangling Academic Publishing: A history of the relationship between commercial interests, academic prestige and the circulation of research, an excellent report which I highly recommend to anyone interested in scholarly communication, and particularly to those interested in historical perspectives on scholarly publishing. The report has also been covered by the press in the Guardian and Times Higher Education. In a very eloquent way, Aileen Fyfe and her team have distilled four hundred years’ worth of academic publishing history in Britain into a clear call for new ways of thinking about scholarly communication. I can only hope to achieve a fraction of what they have done with my own work on the history of the American academic publishing system.

Largely, I agree with everything stated in this report. I do, however, wonder how the situation in the US might differ from that in the UK, particularly in the pre-1940 period that I’ve studied more extensively. In the 19th century, there are at least three key differences between the situation in the UK and the US. First, the US had a much larger number of institutions of higher education than the UK, and in the late nineteenth century these colleges and universities ranged from small religious seminaries sponsored by a single denomination to large agricultural and mechanical universities sponsored by state governments. Second, and perhaps more important, there was always a strong emphasis on “practical” knowledge of use to industry rather than the kind of gentlemanly prestige discussed in the Untangling Academic Publishing report. This is not to suggest that there was no element of prestige capital in US academic publishing during the 19th century; far from it. It does seem, though, that the culture of US scholarly publishing emphasized industrial use even from the beginning, perhaps more than its European counterparts. Third, and this may be similar to the context in the UK, there was a strong emphasis on “professional” academics as the main market for scholarship from the very beginnings of the academic publishing system.

The Preface to the first issue of the Transactions of the American Philosophical Society (the rough equivalent of the Philosophical Transactions of the Royal Society within the American colonies), published in 1769, states that, “Knowledge is of little use when confined to mere speculation:  But when speculative truths are reduced to practice. . . are applied to the common purposes of life; and when by these agriculture is improved, trade enlarged, the arts of living made more easy and comfortable. . . .knowledge then becomes really useful.”  One could also find statements similar to this within the Philosophical Transactions of the Royal Society. Nonetheless, the emphasis on practicality seems to become more pronounced over time. In 1818, the preface of the first issue of Benjamin Silliman’s American Journal of Science (the major scientific journal in the U.S. during much of the 19th century) said that it would focus on certain scientific areas because, “the applications of these sciences are obviously as numerous as physical arts and physical wants; for one of these arts or wants can be named which is not connected with them.”

Moreover, some of the earliest professional associations were strongly tied to industry. The American Chemical Society, formed in 1876, was one of the first professional scientific societies in the United States (though unique in some respects), and many of the early leaders of science within the US were part of the chemical industry. I have mentioned the work of Andrew Abbott before and his emphasis on the ties between industry and academe, particularly in his book The Chaos of Disciplines. This linkage between practical knowledge within scientific journals and the industrial emphasis of many of the early professional associations seems to make the situation in the US different from that of the UK: in the US, the previous history of “gentlemanly” pursuits was not as strong (though still present in some ways), but professional identifications were arguably very strong.

Why is this focus on industry important? Marcel LaFollette, in “Crafting a communications infrastructure: Scientific and technical publishing in the United States,” in A History of the Book in America: Print in Motion: The Expansion of Publishing and Reading in the United States, 1880 – 1940, traces the business of scholarly publishing in the US during the late nineteenth and early twentieth centuries. LaFollette suggests that the market for academic publishing in the US was unique because the consumers and the producers were the same people. This phenomenon created an insularity that encouraged research communities to believe that they owned their content when in fact they did not. For scholars in the late nineteenth and early twentieth century U.S., universal access to knowledge meant only that professional scientists, who served as both producers and consumers of content, were able to read the scholarship within their fields.

Thus, in the United States context, these three characteristics (a different configuration of universities, a focus on “practical” and industrial knowledge, and a focus on academic publishing as part of a profession also tied to industry) somewhat differentiate the US from the UK. In particular, I am interested in whether the emphasis on “practical”/industrial knowledge does or does not separate the two academic cultures. Does “open access,” at least in the US historical context, not really mean universal access to knowledge by all citizens, but rather access by professionals who are meeting industrial needs? If so, then this characteristic has, I think, profound implications for scholarly communication. It would mean that the ideal of university research was always (at least practically) secondary to the needs of industry. Thus, the present situation of scholarship being itself a commodity seems a logical continuation of previous trends. Does this differentiate U.S. science from the U.K.? More importantly, how should the current U.S. scholarly communication system evolve to meet future needs?

American Journal of Science

(Image: second edition of the first volume of the journal, available from Carnegie Mellon’s digital collection)

Prior to the professional scholarly journal system of today, there was only one major journal for American science: the American Journal of Science, which still exists today and is focused on geology. In the nineteenth century, however, the journal covered every scientific topic. The table of contents for the issues of the first volume (pictured) includes:

  1. Mineralogy, Geology & Topography
  2. Botany
  3. Zoology
  4. Fossil Zoology
  5. Mathematics
  6. Miscellaneous
  7. Physics, Mechanics, & Chemistry
  8. Fine Arts
  9. Useful Arts
  10. Agriculture & Economics
  11. Intelligence

Each article is roughly two to three pages, and each issue contains an “intelligence” section, which seems to be general news. This section continues into the twentieth century, when the journal was more focused on geology, but the intelligence section still covers important findings in physics, chemistry, and other scientific areas.

The journal was founded by Benjamin Silliman and later edited by his son. There is a good overview of the foundation of the journal, and of course multiple references to it, but so far I have not been able to find any articles using a computational approach to analyzing its contents. In particular, I think it would be a great candidate for the topic modelling and query sampling techniques I have used earlier. This journal may even be a good candidate for a network analysis, something I haven’t done much of in the past (I intended to do so for the Journal of the American Chemical Society), since it would contain a large number of the scientists in the United States and potentially would show the network as it was beginning to split into different disciplines. Fortunately, there are also over 100 years of textual data available for this journal in the public domain, making it a potentially very rich source. I am going to see whether some initial tests produce interesting results, and I’m looking forward to seeing whether this journal helps explain the professionalization of science and the origins of the scholarly communication system in even more interesting ways than the Journal of the American Chemical Society has done so far.