100 Years of American Science

Untitled-3

Over the past few weeks, I have been working on a project to topic model the American Journal of Science between 1819 (its first year of publication) and 1922; during much of the 19th century, this journal was the only specialized scientific journal in the United States.  I can release data sets later, but for now I just wanted to share some preliminary results.  Though this research is far from conclusive, it does provide a useful proof of concept for using topic modelling to determine how genres of material change over a long period of time.  Moreover, tracing the evolution of topics within a single important journal in 19th century America shows how these topics can provide a useful source of evidence to supplement more traditional historical and “close reading” methods.

The above graph shows that over the entire roughly 100 year period, geology is the dominant category, representing roughly 35% of topics between 1819 and 1922.  Interestingly, however, the “other sciences” are represented equally at 35%.  Yet no one of the subtopics within “other sciences” dominates.  Astronomy, Botany, Engineering, Medicine, Meteorology, Physics, and Zoology each represent less than 10% of the whole.  In any given year, none of these topics represents more than 13%, the only exception being physics, which represents 17.5% of topics in 1840.  Chemistry is the one major exception.  As a discipline, it represents 13% of the total topics over this 100 year period, and, in individual years within the period, often represents 20% – 25% of topics.  Topics related to news, another important genre of content during most of the 19th century, represent 17% of total topics, and often represent 20% of topics for individual years.  Every issue had a section called Intelligence that was dedicated to news from the field.  Additionally, individual articles, particularly in the earlier years of the journal, were dedicated to translating and commenting on articles published abroad, and to publishing letters to the editor that discussed scientific endeavors both in the U.S. and abroad.

Untitled-1

The topic models also demonstrate some other interesting, though not particularly surprising, trends.  Above is a simple line graph showing the number of topics within particular categories; the graph shows that geology topics increase over time, whereas other topics generally decrease.  The graph also shows that until about 1871, “other sciences” were actually significantly higher than geology.  Also in 1871, “other sciences” decline precipitously and geological topics increase and overtake “other sciences.”  Since the American Journal of Science is currently a journal dedicated to geology, one would expect to see this trend.  It is interesting to note, however, that this shift happens in the period from 1871 to 1897.  These decades are a period when multiple other scientific professional societies are created, along with related scholarly journals.  For instance, the Journal of the American Chemical Society was founded in 1879 and the Physical Review (which later became the journal of the American Physical Society, the society for physicists) began in 1893.  The trend line for chemistry topics also shows a decline during this period.  Clearly more detailed analysis of these topical trends is needed.  Nonetheless, the trends illustrated in this line graph may be evidence of scientists leaving the more generalized American Journal of Science for more specialized journals when they are created.  The decline of “other sciences” does seem to happen at exactly the right period of time.

Untitled-2

Finally, I have one more graph that shows much the same data; however, it represents the topics as a percentage of the whole, rather than as raw numbers of topics as shown in the line graph.  This graph of percentages adds some nuance to the picture presented in the line graph.  Geology topics represent fewer than 30% of the entire number of topics in 1819, and that number gradually increases to nearly 40% in 1922.  Conversely, other sciences represent a high of nearly 60% in 1845, but decrease to a low of about 35% in 1922.  Thus, one can see that other sciences still make up a substantial share of topics even as late as 1922.  This could complicate the story about scientists departing to other journals.  It is possible that many scientists, despite the appearance of alternative journals, are still choosing to publish in the American Journal of Science.  Additionally, this relatively high percentage of “other science” topics could simply demonstrate that geology is a discipline that requires knowledge of other disciplines such as physics or biology in order to perform geological work.  Again, more research and closer reading of the individual articles represented by these rather broad topics is needed to better understand how individual scientists are responding to a changing scholarly communication landscape.
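
The difference between raw topic counts and percentages of the whole can be made concrete with a small sketch.  The counts below are hypothetical placeholders (the real numbers come from the topic models and are not reproduced here), and the function name is an illustrative choice, not part of any library.

```python
# Hypothetical topic counts per category for two years.
# These are illustrative numbers only, not the journal's actual data.
counts_by_year = {
    1845: {"geology": 5, "other sciences": 11, "chemistry": 2, "news": 2},
    1922: {"geology": 8, "other sciences": 7, "chemistry": 2, "news": 3},
}


def to_percentages(counts):
    """Convert one year's raw topic counts into percentages of that year's total."""
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Normalise every year separately so that years with different totals are comparable.
shares_by_year = {year: to_percentages(c) for year, c in counts_by_year.items()}

for year, shares in sorted(shares_by_year.items()):
    print(year, {category: round(p, 1) for category, p in shares.items()})
```

Normalising year by year is what lets a category fall in raw numbers while still holding a large share of that year's topics.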

The gradual decline of other sciences in these graphs may demonstrate that the nature of authority within the American Journal of Science changed over time.  As other societies created their own authority in competing journals like the Journal of the American Chemical Society, scientists within fields such as physics and chemistry decided to publish their work in those other venues.  At the same time, many scientists, particularly geologists, continued to publish in the journal long after the death of Benjamin Silliman, the journal’s founder, in 1864.  Therefore, one has to assume that the journal created a kind of authority that outlasted its founder.  That authority, most likely, rested on the same kinds of trust-building that other journals established, such as affiliation with a professional scientific society, peer-review, and reliance on authors’ credentialing within university hiring systems.  Perhaps the method of topic modelling and text analysis by itself cannot answer the question of how authority is constructed.  Topic modelling can, however, provide a useful source of evidence that identifies trends for further investigation and can be used to further strengthen traditional historical analyses of the history of scholarly communication.

Professionalization and Combining Methods

In an earlier post I discussed some topic modeling I did on the Journal of the American Chemical Society (JACS).  That research showed that post 1892 (about 11 years after the journal begins publishing in 1879), there appeared to be a significant increase in discussion of methodology, society business, and other topics not directly associated with chemistry experiments.  Though I thought this was an interesting finding, at the same time I thought that it was best not to make too much out of this result.

Why should I not treat the results of this topic model as significant?  Topic modeling is, after all, an abstraction of the data.  I had the full text of all material from  JACS, and I then asked a computer to find which words had a statistically significant probability of appearing next to each other.  After doing that, I then categorized the data into “unexpected” topics (or topics on methods, society business, etc.) and “expected” topics (chemistry experiments of various kinds).  So, in essence I was dealing with an abstraction of an abstraction.  Thus, it seemed best not to say that this was a significant result when in reality it could have just been an artifact of my categorization of topic models.

I am beginning to change my mind on my earlier instinct, however.  Why?  I recently completed some additional statistical tests.  I created an additional data set comprising a sample of words from these topic models.  It contained 74 words which I thought might best signify discussion of “unexpected”/non-chemistry topics.  I included words such as president, committee, and election, which would likely only show up in discussions of society business.  I also included a few words like method, which admittedly could appear both in chemistry articles and in articles about the methodology of chemistry.  I then created a word frequency list for all of these words and subdivided it into two groups.  One group contained the 11 years prior to 1892 (from the journal’s beginning in 1879).  The other group contained the 11 years from 1892 to 1903.  My hope was to see if there was any kind of significant difference in these word frequencies right around the year (1892) when my earlier graph showed that “unexpected” topics were increasing.
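
A word frequency list split into two periods could be built roughly as follows.  This is only a sketch: the marker words are a four-word stand-in for the 74-word list, the texts are invented, and the year boundaries passed to the hypothetical `period_totals` helper are assumptions about how the two groups were cut.

```python
import re
from collections import Counter

# Stand-in for the 74 marker words; only these four are named in the post.
marker_words = {"president", "committee", "election", "method"}

# Invented one-line "volumes" keyed by year, purely for illustration.
texts_by_year = {
    1885: "The method of analysis gave a precipitate. The method was repeated.",
    1895: "The committee met. The president called an election; "
          "the committee reported on method.",
}


def marker_counts(text, markers):
    """Count how often each marker word occurs in one text (case-insensitive)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in markers)


def period_totals(texts, markers, start, end):
    """Sum marker-word counts over all years with start <= year < end."""
    total = Counter()
    for year, text in texts.items():
        if start <= year < end:
            total += marker_counts(text, markers)
    return total


early = period_totals(texts_by_year, marker_words, 1879, 1892)
late = period_totals(texts_by_year, marker_words, 1892, 1904)

for word in sorted(marker_words):
    print(f"{word:10s} early={early[word]} late={late[word]}")
```

Raw counts like these are sensitive to how much text each period contains, so dividing by the total number of tokens per period may be a fairer basis for comparison.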

Using SPSS, I compared these two groups using a dependent t-test.  My t-critical value (the number that determines whether the test was statistically significant) was 1.6.  My t-calculated (the number that measures whether the means of the two groups are statistically different from each other) was 7.6 with an effect size (measure of magnitude between two means) of 0.89.  Therefore I can say that there is actually quite a significant difference between the word frequencies of these two groups.  Word frequencies for words about society business and methods increase significantly post 1892.
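
For readers without SPSS, the arithmetic behind a dependent (paired) t-test is short enough to sketch in plain Python.  The numbers below are made up, and the effect size shown is Cohen's d for paired samples, which is only one common definition; the post does not say which measure SPSS reported.  Under that definition d equals t divided by the square root of the number of pairs, and if the 74 words were the paired units, 7.6 / sqrt(74) is about 0.88, close to the 0.89 quoted above.

```python
import math
from statistics import mean, stdev


def paired_t_test(before, after):
    """Return (t statistic, Cohen's d for paired samples, degrees of freedom).

    Each position in `before` and `after` must describe the same unit,
    for example the frequency of one word in the earlier and later period.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the differences
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    cohens_d = mean_diff / sd_diff
    return t_stat, cohens_d, n - 1


# Made-up frequencies for four words in the two periods.
before = [1, 2, 3, 4]
after = [2, 4, 5, 8]

t_stat, cohens_d, df = paired_t_test(before, after)
print(f"t = {t_stat:.3f}, d = {cohens_d:.3f}, df = {df}")
```

The calculated t is then compared with the critical value for the chosen significance level and degrees of freedom, which is the comparison described above.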

What does all of this statistical work really do for me?  First, I think that these statistical tests show that the topic models (and my categorizations) actually did show that something important was happening in the journal.  Indeed it seems that the journal is publishing more about methods and society business after 1892.  Furthermore, I think that combining methods like topic modeling and statistical methods can prove quite useful.  Nonetheless, I think that traditional humanistic methods can also be important.  My next step will be to go back to the articles where these words appear and see what they are talking about.  So, these other computational and quantitative methods helped me to discover a pattern in the journals that otherwise I would likely never have noticed.  I look forward to seeing where this research goes.

American Journal of Science

untitled-1

(Second Edition of the first volume of the journal, available from Carnegie Mellon’s digital collection)

Prior to the professional scholarly journal system of today, there was only one major journal for American science, the American Journal of Science, which still exists today and is focused on geology.  In the nineteenth century, however, the journal focused on every scientific topic.  The table of contents for the issues of the first volume (pictured) includes:

  1. Mineralogy, Geology & Topography
  2. Botany
  3. Zoology
  4. Fossil Zoology
  5. Mathematics
  6. Miscellaneous
  7. Physics, Mechanics, & Chemistry
  8. Fine Arts
  9. Useful Arts
  10. Agriculture & Economics
  11. Intelligence

Each article is roughly two to three pages, and each issue contains an “intelligence” section, which seems to be general news.  This section continues into the twentieth century, when the journal was more focused on geology, but the intelligence section still covers important findings in Physics & Chemistry and other scientific areas.

The journal was founded by Benjamin Silliman and later edited by his son. There is a good overview of the foundation of the journal, and of course multiple references to it, but so far I have not been able to find any articles using a computational approach to analyzing its contents.  In particular, I think it would be a great candidate for the topic modelling and query sampling techniques I have used earlier.  I haven’t done much of this in the past (I intended to do so for the Journal of the American Chemical Society), but this journal may even be a good candidate for a network analysis since it would contain a large number of scientists in the United States and potentially would show the network as it was beginning to split into different disciplines.  Fortunately, there is also over 100 years of textual data available for this journal in the public domain, making it a potentially very rich source.  I am going to see if some initial tests may get some interesting results, and I’m looking forward to seeing whether this journal helps understand the professionalization of science and the origins of the scholarly communication system in even more interesting ways than the Journal of the American Chemical Society has done so far.

Do Journals Help Understand Professionalization?

I have been trying to make sense of my data about the history of the Journal of the American Chemical Society, and have done some rough categorization of topics into “expected” (meaning topics that my historiography on the American Chemical Society specifically mentions) and “unexpected” (topics which are not specifically mentioned in the historiography).  Here is a graph of the expected and unexpected topics year by year.

expected_vs_unexpected_yearly2

Roughly 20% of the time (sometimes less, and occasionally more), the topics in the journal are discussing the kinds of issues the historiography describes.  Thus, one might conclude that the historiography does a fairly good job of understanding the history of the field.  What is interesting, however, is how widely divergent the unexpected topics are.  There is tremendous variation, which opens up another question.  Might there be some external influence that is causing this variation?  The historiography of the society also divides its timelines of the journal by editor.  Therefore, I decided to see what the expected vs. unexpected topics looked like if you viewed them by editorial years (note that there is some overlap between editors).

expected_vs_unexpected_editor3

Here it appears that the number of unexpected topics nearly doubles during later years.  This might indicate that the journal is indeed influenced by particular editorial policies.  If one breaks out the unexpected topics from these later years, the division looks something like this.

unexpected_breakdown2

Aside from the appearance of one article on chemistry education, the division of unexpected topics seems to be primarily society business (e.g. who is being elected president, who are presiding officers, or where the annual meeting should be held), and methodology (e.g. what is chemistry, what kinds of experimental procedures are acceptable).  Generally speaking, when looking at the influence of the editorial board, it seems that methodology (somewhat surprisingly) takes a leading role.

I want to return to my original question, however: Do Journals Help Understand Professionalization? (at least in this particular case of one professional society).  On the one hand, I am inclined to say yes.  We actually see how the writers for the journal are deciding on what chemistry is, and though one might think that this would be a more important issue at the start of a journal’s lifetime, the major debates seem to be happening thirty to forty years after the journal’s foundation.

On the other hand, I am also inclined to question this data.  Though I am confident that I have broken down the topics that are not dealing with chemical experiments and reasonably confident that I have been able to separate out society business and methodology topics, I am somewhat skeptical of what this really shows.  I used Mallet to create topic models for each year of the journal.  In essence this is showing me not article by article what is being discussed, but rather topics, or general ideas being discussed in the journal.  Thus, my topic models are a somewhat generalized overview of the journal data itself.  Furthermore, I have abstracted out that data into even higher topics of expected and unexpected.  Finally, when I divided by editors, certain trends seemed to become even greater (like the doubling of unexpected topics).

What does all of this mean?  I am trying to determine how a professional society defined itself by using its means of communication, the journal.  There could be multiple ways of doing this, and I experimented with topic modelling.  I am very happy that I was able to find some interesting trends, but I wonder how much the generalization of particular articles (and my interpretations on top of that) may be skewing things even more.  As historians think about using big data for interpretation, I think this question becomes even more important.  When we choose a method in which a computer models what we’re seeing, and we’re not doing the traditional close readings that historians do, does that methodology then skew our results?  Furthermore, if we then apply our findings to policy or other practical ends, what are the implications of that?  I don’t pretend to have the answers to these questions in a blog post, but I think we need to be asking these questions, and also thinking about ways of discussing how our data were manipulated in order to make it clear to our colleagues critiquing our work and also to our readers, who may not understand the particularities of dealing with statistical topic modeling or working with historical data sets.

Computational vs. Traditional Methods

After doing some further work on my topic models for the Journal of the American Chemical Society I began to think a bit about the methods we use for doing history and how computational analyses fit into this.  I know I’m not the first to think about these issues, and, in fact, I have already thought about them briefly just in this project.  Nonetheless, at the risk of excessive navel gazing, what I found interesting is that even with extensive topic models for every year, I still had to go topic by topic and do some organization of the topics according to my interpretation of a history of the American Chemical Society in a long, cumbersome, and manual process.  Thus what started out as a somewhat quantitative project based on numbers of words and putting them into topics became an extremely qualitative and very subjective analysis.  Once I have had a chance to clean up my spreadsheets, I can share them on this blog.  Suffice it to say, however, that in the end I still had to resort to the same kinds of methods that historians and social scientists use when analyzing data, that is categorizing things in a way that makes sense to an individual scholar.  When I was thinking about how to analyze this data, I thought about all kinds of ways of trying to come up with statistical methods or writing scripts that could do the tasks I wanted to do more objectively.  In the end though I could not think about a way of doing those things that ultimately would get at the question I wanted to answer.

My goal with this project has changed over time.  Originally I wanted to determine if there was a network between editors and authors.  There was no meaningful one that I could find.  My goal now was to determine what these topics I had meant, and whether they reflected conventional historiography about the history of the American Chemical Society.  I couldn’t topic model a history book in any meaningful way I could think of and even if I did, trying to measure those topics against the journal topics would kind of be like measuring apples against oranges.  Therefore, I decided to read the history book (or at least the bits relevant to the journal), create my own topic categories, and then manually assign each of the topics Mallet so kindly found for me to a category.
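
The assignment itself stays a manual, interpretive step, but once it exists the bookkeeping can be scripted.  A minimal sketch, assuming the manual assignments are recorded as a mapping from (year, topic number) to a category label; the years, topic numbers, labels, and the `tally_by_year` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical hand-made assignments: (year, topic number) -> category label.
# In practice these would be typed in or loaded from a spreadsheet export.
assignments = {
    (1890, 1): "expected",
    (1890, 2): "expected",
    (1890, 3): "unexpected: methodology",
    (1900, 1): "unexpected: society business",
    (1900, 2): "expected",
    (1900, 3): "unexpected: methodology",
}


def tally_by_year(assignments):
    """Count expected and unexpected topics for each year."""
    tallies = {}
    for (year, _topic_id), category in assignments.items():
        # Collapse the finer "unexpected: ..." labels into one group.
        group = "unexpected" if category.startswith("unexpected") else "expected"
        tallies.setdefault(year, Counter())[group] += 1
    return tallies


tallies = tally_by_year(assignments)
for year in sorted(tallies):
    print(year, dict(tallies[year]))
```

Keeping the finer labels in the mapping means the same data can later be broken out by sub-category without redoing the manual work.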

Admittedly, doing this work over such a large corpus would have been impossible without computational methods.  In the long run, though, I still resorted to the old fashioned way of making sense out of this information.  Do all computational methods in the end come down to human sense-making?  Perhaps they do, but as we think about how quantitative and qualitative methods interact, this seems to me an interesting example.  Mining a corpus of textual articles is certainly quantitative.  In the end, however, it took qualitative analysis to really attempt to understand what was going on.

Finding Patterns in Scientific Journals

The last topic models I showed for the Journal of the American Chemical Society showed topics across the entire corpus I have (all issues between 1879 and 1922). Now, I have been working on seeing if there are any patterns in the topics from year to year.  Since I ran a 20 topic model using Mallet, the list is quite long, so I created another page for those who want to look at the original data.  For now, I’ll just summarize what I think is happening.  First a few general points.

  • The word molecule first appears within a topic in 1883 and in 1891 it appears in three different topics.  It continues to appear throughout the corpus but not regularly.
  • The word atom first appears in a topic for 1880 and seems to appear more regularly than molecule.
  • The word patent first appears in 1884 but then does not show up all that frequently (only 6 times within the topic models and only until 1892).
  • Many years, though not all, also have topics that seem to pertain to the business of the society with words like journal, meeting, or city names.  Interestingly this was also one of the topics in the overall model, but it is interesting to see how the topic seems to be more dominant in some years than in others.
  • The word method shows up in the topics practically every year and seems to appear more frequently in the earlier years of the journal.
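
Observations like these (when a word first shows up, and in how many topics per year) can be checked mechanically once the yearly topics are in one structure.  A small sketch with toy data: the three years and their word lists are invented to mirror the dates mentioned above, and the function names are illustrative.

```python
# Toy stand-in for yearly topic models: year -> list of topics,
# each topic being a list of its top words. Not the real Mallet output.
topics_by_year = {
    1880: [["atom", "weight", "acid"], ["method", "solution", "water"]],
    1883: [["molecule", "atom", "theory"], ["method", "iron", "steel"]],
    1884: [["patent", "process", "furnace"], ["acid", "solution", "method"]],
}


def first_appearance(topics_by_year, word):
    """Return the earliest year in which `word` is among any topic's top words."""
    for year in sorted(topics_by_year):
        if any(word in topic for topic in topics_by_year[year]):
            return year
    return None


def topics_containing(topics_by_year, word):
    """Return, for each year, how many topics list `word` among their top words."""
    return {
        year: sum(word in topic for topic in topics)
        for year, topics in sorted(topics_by_year.items())
    }


print(first_appearance(topics_by_year, "molecule"))
print(topics_containing(topics_by_year, "method"))
```

A word that never reaches a topic's top words returns `None`, which is different from the word being absent from the journal itself.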

These are just some general observations from admittedly someone who is not trained as a chemist.  There may be other interesting issues that might be clearer to a trained eye.  For my next steps on this project I intend to look at two sources on the history of Chemistry:

There may be other sources, but I think I can at least try to show a proof of concept on these two. Hopefully, there is some way to measure what the topic models are showing against what these more general histories say is happening in the history of the society and in chemistry more generally.

Analyzing Some Preliminary Topic Modelling

I’ve started some work on topic modelling, using at the moment two programs: the InPho Topic Explorer (http://inphodata.cogs.indiana.edu/) from Indiana University’s Cognitive Science Program and Mallet (http://mallet.cs.umass.edu/index.php).  I’m also open to other suggestions for potentially trying out different methods, but this gets me started.  Right now, I’m still using some sample files, but look forward to trying these out on the entire corpus of the Journal of the American Chemical Society once I have the data.

From InPho, the topics I get are

Topic 0 per, found, precipitate, sulphuric, weight, substance, soluble, liquid, thus, total, nitric, hydrochloric, sulphate, value, gas, described, second, grams, oxygen, follows
Topic 1 per, new, chemical, iron, copper, ore, action, author, cent, review, germany, oil, see, assignor, research, richards, york, parts, zinc, analyses
Topic 2 gram, cent, per, sulphur, gave, method, methods, ten, hydrochloric, determinations, steel, tungsten, proteid, ammonium, cement, weighed, matter, oil, sulphide, aluminum
Topic 3 calc, sulfate, yield, secretary, sec, since, ave, held, electrons, prepared, system, boiling, atoms, product, curve, series, benzene, sulfuric, conductivity, lewis
Topic 4 sugar, action, etc, author, c.c, upon, heated, acids, oil, chem, grms, abstracts, obtained, liquid, air, oxygen, gas, ozone, lime, soda
Topic 5 new, form, table, per, weight, meeting, cent, slightly, der, book, tube, hydrogen, conductivity, west, sulphate, sec, sulphuric, estimation, oxygen, magnesium
Topic 6 section, city, ave, american, mass, chicago, william, water, charles, sodium, john, report, society, chemistry, washington, steel, philadelphia, members, sulphur, university
Topic 7 hydrogen, section, equation, concentration, ion, measurements, mercury, therefore, university, content, ann, curves, increase, theory, concentrations, points, derivatives, velocity, ions, room
Topic 8 cent, per, found, made, first, grams, soluble, sulphate, added, precipitate, small, containing, weight, much, sulphuric, gas, ether, would, form, substance
Topic 9 found, liquid, substance, ether, soluble, hydrochloric, thus, precipitate, gas, described, alcohol, value, second, alkali, glass, oxygen, acid, ethyl, material, specific
Topic 10 cent, much, hydrogen, grams, containing, ether, experiment, dried, fact, even, upon, form, place, separated, treated, various, following, soil, shall, magnesium
Topic 11 made, first, would, small, added, shown, pure, could, must, use, well, due, part, dioxide, order, table, gives, organic, precipitated, concentrated
Topic 12 acid, solution, water, one, results, method, may, two, chloride, used, potassium, alcohol, also, sodium, amount, temperature, time, present, experiments, salt
Topic 13 obtained, given, upon, nitrogen, following, mixture, chemical, crystals, chemistry, copper, conditions, heating, preparation, dilute, oxide, dry, case, error, known, iron
Topic 14 form, value, chair, constant, sulfuric, fig, table, chloride, temperature, experimental, solid, mixture, sulfide, subs, values, heat, carbon, hydroxide, slightly, journal
Topic 15 weight, values, reaction, solutions, point, much, concentration, pressure, fact, even, containing, temperature, equilibrium, sodium, experiment, dried, acids, melting, case, iodide
Topic 16 per, acids, heated, calculated, laboratory, journal, hydroxide, society, true, book, contain, cause, values, received, satisfactory, showing, special, year, manganese, show
Topic 17 form, table, constant, sulfuric, two, solution, slightly, value, cell, grams, surface, gave, solid, meeting, negative, phase, experimental, fig, hydroxide, however
Topic 18 grams, cent, gram, book, value, per, der, calculated, general, values, physical, und, sulfate, constant, inorganic, normal, die, ion, hydrazine, salts
Topic 19 water, new, milk, meeting, chemical, process, one, iron, apparatus, york, carbon, society, mass, read, fat, analysis, lime, secretary, furnace, use

From Mallet I get

topicId words..
1 chemical chemistry society american journal dr book work committee general
2 solutions concentration solution salts salt ion solubility ions conductivity cell
3 acid ch acids nh ester ii methyl ethyl obtained acetic
4 sugar experiments time action effect reaction starch amount rate power
5 theory atoms number surface oxygen molecules form hydrogen energy case
6 color oil red obtained mixture liquid yellow white water small
7 compounds compound reaction action group bromine chloride carbon derivatives formed
8 cc solution acid water added precipitate hydrochloric dissolved filtered excess
9 cent nitrogen gram results grams total amount weight sample found
10 alcohol water ether soluble solution found salt crystals melting acid
11 chem milk fat oil extract protein oils ash composition soc
12 method results made methods determination error obtained standard determinations found
13 work present paper fact great part study made question view
14 chloride potassium sodium silver solution acid ammonia ammonium nitrate oxide
15 analysis determination water soil organic matter plant chemical analyses methods
16 temperature values table pressure point heat equation constant data vapor
17 weight io atomic oo ii lead arsenic separation series oxide
18 st section meeting city mass pa secretary ave university york
19 iron process der steel copper coal gas gold ore furnace
20 tube air gas apparatus glass water temperature platinum liquid mercury
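
To work with a listing like this programmatically, it first has to be read into a data structure.  The sketch below parses the layout as printed here (a topic number followed by space-separated words); Mallet's own output files may be laid out differently, so the parser is an assumption tied to this listing rather than to Mallet itself.

```python
# Three rows copied from the listing above, plus its header line.
listing = """topicId words..
1 chemical chemistry society american journal dr book work committee general
13 work present paper fact great part study made question view
18 st section meeting city mass pa secretary ave university york
"""


def parse_topic_listing(text):
    """Parse lines of the form '<topic number> <word> <word> ...' into a dict."""
    topics = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the header, which does not start with a number.
        if not parts or not parts[0].isdigit():
            continue
        topics[int(parts[0])] = parts[1:]
    return topics


topics = parse_topic_listing(listing)
for topic_id, words in sorted(topics.items()):
    print(topic_id, words[:5])
```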

As I said in a previous post, many of these results are not surprising.  There are, however, some topics that merit some further analysis.

In InPho some names come up in topic 1 and 3 (richards and lewis), which could signal some important people in the field; it would be worth seeing if they show up in other, more focused contexts.  Also topic 6, which has the words “section, city, ave, american, mass, chicago, william, water, charles, sodium, john, report, society, chemistry, washington, steel, philadelphia, members, sulphur, university” has very few words that have much to do with chemistry (aside from sodium, chemistry, and sulphur).  My guess is that these have to do with meetings or with business of the society.  Unhelpfully, the names in this topic seem to be first names, which might just indicate that there are lots of people named Charles and John.  I also wonder whether this might just be a catchall topic.  For instance the words American, Society, and Report would, I think, come up in almost any issue of the journal.

In Mallet, the topics that come up are actually quite different, though substantively similar I think.  The first topic is, I think, a catchall with words that probably show up in every journal article, like “American Chemical Society.”  Nonetheless, I also think that “committee” is an interesting one in that topic, probably reflecting the reports of various committees that show up in the sections of the journal that report the activities of the society.  Topic 13 is also interesting with “work present paper fact great part study made question view” that do not seem to be discussing chemistry directly but rather seem to be talking about the work of doing chemistry (like publishing).
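
One rough way to flag candidate “society business” topics is to score each topic by the share of its words that fall in a small business vocabulary.  The topic words below are copied from the lists above; the vocabulary itself is a hypothetical hand-picked list, so the scores depend entirely on that choice and are a starting point for reading, not a finding.

```python
# Hypothetical hand-picked vocabulary for society business.
business_words = {
    "society", "committee", "meeting", "secretary", "section",
    "city", "members", "report", "journal",
}

# Topic words copied from the InPho and Mallet listings above.
topics = {
    "InPho 6": "section city ave american mass chicago william water charles "
               "sodium john report society chemistry washington steel "
               "philadelphia members sulphur university".split(),
    "InPho 12": "acid solution water one results method may two chloride used "
                "potassium alcohol also sodium amount temperature time present "
                "experiments salt".split(),
    "Mallet 1": "chemical chemistry society american journal dr book work "
                "committee general".split(),
    "Mallet 18": "st section meeting city mass pa secretary ave university "
                 "york".split(),
}


def business_share(words, vocabulary):
    """Fraction of a topic's words that appear in the given vocabulary."""
    return sum(w in vocabulary for w in words) / len(words)


scores = {name: business_share(words, business_words) for name, words in topics.items()}
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:10s} {score:.2f}")
```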

In addition to thinking about the network of people and how they are institutionalizing the journal, I also think it might be interesting to look at these topics as a kind of window into the philosophy of chemistry.  In other words, what are chemists talking about, and more importantly, does this match up with what historians and philosophers of science say was going on in the nineteenth century?  It might be interesting to see if I can match up concepts from the Stanford Encyclopedia of Philosophy (http://plato.stanford.edu/entries/chemistry/) with the topic models I’m getting.  More on that in a later post.

In all, I think that these are some exciting preliminary results, and I look forward to doing some more in-depth topic modelling to see whether things change over time, see if I can understand more about the individual authors (and who is publishing on certain topics), and also see whether historians of chemistry are accurately understanding the topics of the day (at least in terms of what the flagship journal seems to think important).

Processing (Continued) and Moving Toward Topic Modelling

I’m still working on the issues of processing the corpus of text that I have, and will hopefully be able to finish that sometime next week, which moves me on to what will be my next step:  topic modelling the corpus of the Journal of the American Chemical Society between 1879 and 1922.

Based on some sample files, there is some good news.  The topics I seem to be getting mention acids, bases, chemical compounds, and the kinds of things I would expect to see in a topic model of a chemistry journal, and there are no extremely strange topics that I would not expect to see.  That, I think, tells me that the text will be good enough to move forward and do some good mining.

On a side note with my processing I have also been extracting all of the tables of contents from the journal.  Ideally this should be done automatically but I’ve been doing it manually so that I can put some editorial notes in various parts of my spreadsheet (which I will share when I’ve finished).  For now, the spreadsheet contains a list of all of the officers of the American Chemical Society separated out by year.  Surprising (at least to me) is the fact that there is not as much overlap as I would have expected.  Some officers do continue to serve year after year, but there is actually a fairly high turnover.  New officers seem to come in every year.  The spreadsheet also contains every author in the journal between those years, what articles they’ve published, whether I consider them “prolific” (i.e. published many articles), and if there is any information about them in Wikipedia.  If someone knows of a more comprehensive database, specifically for chemists, let me know; so far, I’m not seeing many of the early authors/officers listed in Wikipedia.  This spreadsheet, I hope, can serve as a guide while I’m processing and hopefully can tell me if I make any significant errors when I start dividing up articles and years in the larger corpus.

All of this is a preface to try and get to the question I’m asking.  What is the network of scientists involved in the journal, and are the officers/editorial board influencing the content in any measurable way?  Originally, I had thought that a spreadsheet like the one I’m creating would help to answer this question.  I had thought that editors of the journal would be some of the most prolific authors, and I thought that there would be a significant continuity of officers over this time period.  I had not anticipated so many unique authors contributing to the journal, nor had I thought that the officers of the association would turn over as frequently.

There may still be a way to get at the question I’m asking, though.  I think that by topic modelling the corpus and seeing if particular authors are tied to particular topics, that may at least help to answer whether specific people have more influence over the journal’s content than others.  Also, I’m sure others have tried to tie Wikipedia information to networks like this.  Like I said, so far I’m not finding many scientists who have Wikipedia entries, though that may change as I move further into the twentieth century.  Perhaps even if I can find authors who have high influence over the corpus and a Wikipedia entry that may tell me something.

In any case, that’s where I am at the moment, and if there are thoughts about what might be useful to do (before I move into heavy duty processing of lots of files), let me know.