19th Century Information Use

I’ve finished gathering data on Theophilus Wylie’s personal library and his work as the librarian of Indiana University. Overall, what I find interesting is a clear indication that Wylie had different ideas about what was important to his own work as a scholar and what was important for the library to maintain.

First, some visualization of his personal library.  It contains about 700 books, and thanks to the director of the Wylie House, I have a list of all of the books which are still held at the Wylie House Museum. I went through all of the titles and created some general categories to see what we might say were the most important subjects in the collection.

Religious subjects are clearly favored and form the largest category. Humanistic and scientific disciplines have nearly even coverage (with science having a slight edge), followed by books about education and a few miscellaneous items (like cookbooks). It seems that Wylie took his role as a Presbyterian minister quite seriously, and it is likely that many of the religious works helped him prepare his sermons. Wylie also taught science and languages, with science being his primary subject in his later career.

There are some additional questions, though. Did Wylie collect the same subjects for the Indiana University Library? If not, how were they different? Why? There are a few ways to answer these questions. Unfortunately, no complete catalog of the library exists from Wylie’s tenure as librarian. The library burned down twice between 1840 and 1880 and many of the records were lost. There are, however, a few hints.

The first is a catalog that Wylie created of the library in 1842, shortly after he took over as librarian. It likely does not show much of his collecting interest, but it does show what the subjects of the library were when he took over. Fortunately, there is a dissertation by Mildred Lowell on the history of the Indiana University Library which has already done some analysis on this topic. Instead of re-categorizing the thousands of books held in the library, I mapped her work onto the categories I used for Wylie’s personal library, and this is what the subject categorization looks like.


Clearly there is quite a difference. The Humanities are very dominant. The “other” category contains mostly reference works (like dictionaries and encyclopedias of various kinds), and neither science nor religion is particularly well represented. The question still remains, though, as to what influence Wylie himself may have had when he collected books for the library.

There are two lists of books Wylie procured for the library both through gift and donation, one of which is available digitally. Though this sample of just over 100 books is probably not representative, it is the best I could find to try to answer this question. Here is the visualization of that sample.

Again there are some interesting differences. The stress on the humanities seems to be the same. There is clearly more emphasis on scientific subjects, a slight increase in religious subjects, and somewhat less emphasis on “other” subjects.

In all, it seems like there are some clear differences between what Wylie felt was important for a university library to hold and what it was important for him to use personally.  I am still working through the Indiana University archives which house his papers.  Fortunately there are some existing reports on his activities as librarian and a lecture he gave on books and libraries.  Perhaps there are some hints there about his views on the difference between personal information use and the perceived information needs of the students and faculty of Indiana University.

Do Journals Help Understand Professionalization?

I have been trying to make sense of my data about the history of the Journal of the American Chemical Society. I have done some rough categorization into “expected” topics (meaning topics that my historiography on the American Chemical Society specifically mentions) and “unexpected” topics (which are not specifically mentioned in the historiography). Here is a graph of the expected and unexpected topics year by year.


Roughly 20% of the time (sometimes less, and occasionally more), the topics in the journal discuss the kinds of issues the historiography describes. Thus, one might conclude that the historiography does a fairly good job of understanding the history of the field. What is interesting, however, is how widely divergent the unexpected topics are. There is tremendous variation, which opens up another question. Might there be some external influence that is causing this variation? The historiography of the society also divides its timelines of the journal by editor. Therefore, I decided to see what the expected vs. unexpected topics looked like if you viewed them by editorial years (note that there is some overlap between editors).


Here it appears that the number of unexpected topics nearly doubles during later years. This might indicate that the journal is indeed influenced by particular editorial policies. If one breaks out the unexpected topics from these later years, the division looks something like this.
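For readers curious about the mechanics, the tallying behind the yearly and editorial views could be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually used: the labeled topics, the editor names, and the tenure years are all hypothetical placeholders, and a topic from an overlapping year is simply counted once for each editor.

```python
from collections import defaultdict

# Hypothetical placeholder data: one (year, label) pair per topic.
labeled_topics = [
    (1900, "expected"), (1900, "unexpected"), (1900, "unexpected"), (1900, "unexpected"),
    (1901, "expected"), (1901, "unexpected"), (1901, "unexpected"),
    (1901, "unexpected"), (1901, "unexpected"),
    (1902, "expected"), (1902, "unexpected"),
]

# Hypothetical editor tenures (inclusive years); 1901 overlaps both editors.
editor_years = {"Editor A": (1900, 1901), "Editor B": (1901, 1902)}


def share_expected(pairs):
    """Return the fraction of 'expected' topics for each grouping key."""
    counts = defaultdict(lambda: {"expected": 0, "unexpected": 0})
    for key, label in pairs:
        counts[key][label] += 1
    return {
        key: c["expected"] / (c["expected"] + c["unexpected"])
        for key, c in counts.items()
    }


def by_editor(pairs, tenures):
    """Re-key (year, label) pairs by editor; overlap years count for both."""
    rekeyed = []
    for year, label in pairs:
        for editor, (start, end) in tenures.items():
            if start <= year <= end:
                rekeyed.append((editor, label))
    return rekeyed


per_year = share_expected(labeled_topics)
per_editor = share_expected(by_editor(labeled_topics, editor_years))
```

Counting an overlap year for both editors is a design choice, and a different choice (say, splitting the year between them) would shift the editorial totals.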


Aside from the appearance of one article on chemistry education, the division of unexpected topics seems to be primarily society business (e.g. who is being elected president, who are presiding officers, or where the annual meeting should be held), and methodology (e.g. what is chemistry, what kinds of experimental procedures are acceptable). Generally speaking, when looking at the influence of the editorial board, it seems that methodology (somewhat surprisingly) takes a leading role.

I want to return to my original question, however: Do Journals Help Understand Professionalization? (at least in this particular case of one professional society).  On the one hand, I am inclined to say yes.  We actually see how the writers for the journal are deciding on what chemistry is, and though one might think that this would be a more important issue at the start of a journal’s lifetime, the major debates seem to be happening thirty to forty years after the journal’s foundation.

On the other hand, I am also inclined to question this data. Though I am confident that I have broken down the topics that are not dealing with chemical experiments and reasonably confident that I have been able to separate out society business and methodology topics, I am somewhat skeptical of what this really shows. I used MALLET to create topic models for years in a journal. In essence this is showing me not what is being discussed article by article, but rather topics, or general ideas, being discussed in the journal. Thus, my topic models are a somewhat generalized overview of the journal data itself. Furthermore, I have abstracted that data into the even higher-level categories of expected and unexpected. Finally, when I divided by editors, certain trends seemed to become even greater (like the doubling of unexpected topics).

What does all of this mean? I am trying to determine how a professional society defined itself by using its means of communication, the journal. There could be multiple ways of doing this, and I experimented with topic modelling. I am very happy that I was able to find some interesting trends, but I wonder how much the generalization of particular articles (and my interpretations on top of that) may be skewing things even more. As historians think about using big data for interpretation, I think this question becomes even more important. When we choose a method in which a computer models what we’re seeing, and we’re not doing the traditional close readings that historians do, does that methodology then skew our results? Furthermore, if we then apply our findings to policy or other practical ends, what are the implications of that? I don’t pretend to have the answers to these questions in a blog post, but I think we need to be asking these questions, and also thinking about ways of discussing how our data were manipulated in order to make it clear to our colleagues critiquing our work and also to our readers, who may not understand the particularities of dealing with statistical topic modeling or working with historical data sets.

Visually Representing History

I’m beginning to think now about some visual ways of representing what is happening in the Journal of the American Chemical Society.  This presents some interesting challenges.  Though there is some historiography about the society, for the most part, what I have been able to find is in a history of the society written in 1952 in celebration of its 75th anniversary.  From a historian’s point of view, this history has multiple methodological problems (technological determinism, whiggism, full of details without much analysis, and I could go on).  It is however what I have.  Thus, a way I am thinking about representing my topic models is by using this (admittedly problematic) history in order to create a kind of timeline. To put this another way, this history presents a narrative of topical progress between the founding of the society and 1952, or, at least it states what certain chemists thought was topical progress.  My data shows reality (at least within the flagship journal).  So, I think it would be interesting to construct some topics from the 1952 history and then see whether the journal does or doesn’t reflect that reality.

For those of you who are more visually oriented: as I was trying to think about how to do this, I went to Ted Underwood’s site to look at some pages about methodology. I think my visualization might look something like this:


The line (though it probably wouldn’t be as curved as this one) would represent the topics that the 1952 history says are happening over a set number of years (in my case 1879 – 1922). The dots would represent the actual topics.

This visualization would, I think, show whether the 1952 history is accurately reflecting the topics in the Journal itself.
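Before drawing anything, the comparison underneath such a chart could be sketched as a simple year-by-year check. This is a minimal sketch under stated assumptions: the periods, topic names, and journal topics below are hypothetical placeholders, not figures from the 1952 history or from my topic models, and "agreement" here just means the claimed topic shows up at all among that year's topics.

```python
# Hypothetical placeholder claims: (start_year, end_year, topic) as a
# narrative history might present them.
history_claims = [
    (1879, 1880, "analytical"),
    (1881, 1882, "organic"),
]

# Hypothetical placeholder topics observed in the journal for each year.
journal_topics = {
    1879: {"analytical", "society business"},
    1880: {"organic"},
    1881: {"organic"},
    1882: {"methodology", "organic"},
}


def claimed_topic(year, claims):
    """Return the topic the history assigns to a year, or None."""
    for start, end, topic in claims:
        if start <= year <= end:
            return topic
    return None


def agreement(claims, observed):
    """Map each covered year to whether the claimed topic was observed."""
    result = {}
    for year in sorted(observed):
        topic = claimed_topic(year, claims)
        if topic is not None:
            result[year] = topic in observed[year]
    return result


matches = agreement(history_claims, journal_topics)
rate = sum(matches.values()) / len(matches)  # share of years that agree
```

In the chart, the years where `matches` is false would be the dots that fall away from the line.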

Tied to this, I am also doing query sampling of the same data against the Stanford Encyclopedia of Philosophy (another blog post on that later). What I hope to answer is a related question with a different method. Does a standard reference work in the field reflect what is actually happening? So far, not surprisingly, the query sampling shows that my corpus is mostly related to chemistry. I suspect, though, that an analysis of the secondary entries will be what is most interesting.