After doing some further work on my topic models for the Journal of the American Chemical Society, I began to think about the methods we use for doing history and how computational analyses fit into them. I know I’m not the first to think about these issues, and, in fact, I have already touched on them briefly in this project. Nonetheless, at the risk of excessive navel-gazing, what I found interesting is that even with extensive topic models for every year, I still had to go topic by topic and organize the topics according to my interpretation of a history of the American Chemical Society, in a long, cumbersome, manual process. Thus what started out as a somewhat quantitative project, based on counting words and grouping them into topics, became an extremely qualitative and very subjective analysis. Once I have had a chance to clean up my spreadsheets, I can share them on this blog. Suffice it to say, however, that in the end I still had to resort to the same kinds of methods that historians and social scientists use when analyzing data, that is, categorizing things in a way that makes sense to an individual scholar. When I was thinking about how to analyze this data, I considered all kinds of statistical methods and scripts that could do the tasks I wanted to do more objectively. In the end, though, I could not think of a way of doing those things that would ultimately get at the question I wanted to answer.
My goal with this project has changed over time. Originally I wanted to determine whether there was a network between editors and authors. There was no meaningful one that I could find. My goal then became to determine what the topics I had found meant, and whether they reflected the conventional historiography of the American Chemical Society. I couldn’t topic model a history book in any meaningful way that I could think of, and even if I had, measuring those topics against the journal topics would have been like comparing apples to oranges. Therefore, I decided to read the history book (or at least the bits relevant to the journal), create my own topic categories, and then manually assign each of the topics Mallet so kindly found for me to a category.
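To make that last step a little more concrete, here is a minimal sketch in Python of how hand-assigned categories might be tallied once they are recorded in a spreadsheet. The rows, years, topic ids, and category names are all invented for illustration and are not data from this project. The interpretive work of choosing a category for each topic still happens by hand; the script only counts the results.

```python
from collections import Counter, defaultdict

# Hypothetical hand-coded rows: (year, topic id, manually assigned category).
# Every value here is made up for illustration; none comes from the real corpus.
assignments = [
    (1900, 0, "organic chemistry"),
    (1900, 1, "analytical methods"),
    (1900, 2, "organic chemistry"),
    (1950, 0, "physical chemistry"),
    (1950, 1, "organic chemistry"),
]


def tally_by_year(rows):
    """Count how many topics fall under each hand-assigned category per year."""
    counts = defaultdict(Counter)
    for year, _topic_id, category in rows:
        counts[year][category] += 1
    return counts


counts = tally_by_year(assignments)
for year in sorted(counts):
    print(year, dict(counts[year]))
```

A tally like this only makes the manual categories easier to compare across years. It does nothing to make the assignments themselves less subjective.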
Admittedly, doing this work over such a large corpus would have been impossible without computational methods. In the long run, though, I still resorted to the old-fashioned way of making sense of this information. Do all computational methods in the end come down to human sense-making? Perhaps they do, but as we think about how quantitative and qualitative methods interact, this seems to me an interesting example. Mining a corpus of articles is certainly quantitative. In the end, however, it took qualitative analysis to really attempt to understand what was going on.