I’ve started some work on topic modelling, currently using two programs: the InPho Topic Explorer (http://inphodata.cogs.indiana.edu/) from Indiana University’s Cognitive Science Program and Mallet (http://mallet.cs.umass.edu/index.php). I’m open to suggestions for other methods to try, but this gets me started. Right now I’m still working with some sample files, but I look forward to running both programs on the entire corpus of the Journal of the American Chemical Society once I have the data.
From InPho, the topics I get are:
|Topic 0||per, found, precipitate, sulphuric, weight, substance, soluble, liquid, thus, total, nitric, hydrochloric, sulphate, value, gas, described, second, grams, oxygen, follows|
|Topic 1||per, new, chemical, iron, copper, ore, action, author, cent, review, germany, oil, see, assignor, research, richards, york, parts, zinc, analyses|
|Topic 2||gram, cent, per, sulphur, gave, method, methods, ten, hydrochloric, determinations, steel, tungsten, proteid, ammonium, cement, weighed, matter, oil, sulphide, aluminum|
|Topic 3||calc, sulfate, yield, secretary, sec, since, ave, held, electrons, prepared, system, boiling, atoms, product, curve, series, benzene, sulfuric, conductivity, lewis|
|Topic 4||sugar, action, etc, author, c.c, upon, heated, acids, oil, chem, grms, abstracts, obtained, liquid, air, oxygen, gas, ozone, lime, soda|
|Topic 5||new, form, table, per, weight, meeting, cent, slightly, der, book, tube, hydrogen, conductivity, west, sulphate, sec, sulphuric, estimation, oxygen, magnesium|
|Topic 6||section, city, ave, american, mass, chicago, william, water, charles, sodium, john, report, society, chemistry, washington, steel, philadelphia, members, sulphur, university|
|Topic 7||hydrogen, section, equation, concentration, ion, measurements, mercury, therefore, university, content, ann, curves, increase, theory, concentrations, points, derivatives, velocity, ions, room|
|Topic 8||cent, per, found, made, first, grams, soluble, sulphate, added, precipitate, small, containing, weight, much, sulphuric, gas, ether, would, form, substance|
|Topic 9||found, liquid, substance, ether, soluble, hydrochloric, thus, precipitate, gas, described, alcohol, value, second, alkali, glass, oxygen, acid, ethyl, material, specific|
|Topic 10||cent, much, hydrogen, grams, containing, ether, experiment, dried, fact, even, upon, form, place, separated, treated, various, following, soil, shall, magnesium|
|Topic 11||made, first, would, small, added, shown, pure, could, must, use, well, due, part, dioxide, order, table, gives, organic, precipitated, concentrated|
|Topic 12||acid, solution, water, one, results, method, may, two, chloride, used, potassium, alcohol, also, sodium, amount, temperature, time, present, experiments, salt|
|Topic 13||obtained, given, upon, nitrogen, following, mixture, chemical, crystals, chemistry, copper, conditions, heating, preparation, dilute, oxide, dry, case, error, known, iron|
|Topic 14||form, value, chair, constant, sulfuric, fig, table, chloride, temperature, experimental, solid, mixture, sulfide, subs, values, heat, carbon, hydroxide, slightly, journal|
|Topic 15||weight, values, reaction, solutions, point, much, concentration, pressure, fact, even, containing, temperature, equilibrium, sodium, experiment, dried, acids, melting, case, iodide|
|Topic 16||per, acids, heated, calculated, laboratory, journal, hydroxide, society, true, book, contain, cause, values, received, satisfactory, showing, special, year, manganese, show|
|Topic 17||form, table, constant, sulfuric, two, solution, slightly, value, cell, grams, surface, gave, solid, meeting, negative, phase, experimental, fig, hydroxide, however|
|Topic 18||grams, cent, gram, book, value, per, der, calculated, general, values, physical, und, sulfate, constant, inorganic, normal, die, ion, hydrazine, salts|
|Topic 19||water, new, milk, meeting, chemical, process, one, iron, apparatus, york, carbon, society, mass, read, fat, analysis, lime, secretary, furnace, use|
From Mallet, I get:
|1||chemical chemistry society american journal dr book work committee general|
|2||solutions concentration solution salts salt ion solubility ions conductivity cell|
|3||acid ch acids nh ester ii methyl ethyl obtained acetic|
|4||sugar experiments time action effect reaction starch amount rate power|
|5||theory atoms number surface oxygen molecules form hydrogen energy case|
|6||color oil red obtained mixture liquid yellow white water small|
|7||compounds compound reaction action group bromine chloride carbon derivatives formed|
|8||cc solution acid water added precipitate hydrochloric dissolved filtered excess|
|9||cent nitrogen gram results grams total amount weight sample found|
|10||alcohol water ether soluble solution found salt crystals melting acid|
|11||chem milk fat oil extract protein oils ash composition soc|
|12||method results made methods determination error obtained standard determinations found|
|13||work present paper fact great part study made question view|
|14||chloride potassium sodium silver solution acid ammonia ammonium nitrate oxide|
|15||analysis determination water soil organic matter plant chemical analyses methods|
|16||temperature values table pressure point heat equation constant data vapor|
|17||weight io atomic oo ii lead arsenic separation series oxide|
|18||st section meeting city mass pa secretary ave university york|
|19||iron process der steel copper coal gas gold ore furnace|
|20||tube air gas apparatus glass water temperature platinum liquid mercury|
As I said in a previous post, many of these results are not surprising. There are, however, some topics that merit further analysis.
In InPho, two surnames come up in topics 1 and 3 (richards and lewis). These could signal important people in the field, and it is worth seeing whether they show up in other, more focused contexts. Topic 6 (“section, city, ave, american, mass, chicago, william, water, charles, sodium, john, report, society, chemistry, washington, steel, philadelphia, members, sulphur, university”) also has very few words that have much to do with chemistry (aside from sodium, chemistry, and sulphur). My guess is that these have to do with meetings or with the business of the society. Unhelpfully, the names in this topic seem to be first names, which might just indicate that there are lots of people named Charles and John. I also wonder whether this might just be a catchall topic. For instance, the words American, Society, and Report would, I think, come up in almost any issue of the journal.
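One rough way to test the catchall hunch is to check what fraction of issues contain each of a topic's words. The snippet below is only a sketch: the function name `document_frequency` and the three "issues" are invented for illustration, since I don't have the real corpus yet, and real text would need proper tokenization rather than a simple whitespace split.

```python
def document_frequency(docs, words):
    """Return, for each word, the fraction of documents that contain it."""
    # Lowercase and split on whitespace; a crude stand-in for real tokenization.
    doc_sets = [set(doc.lower().split()) for doc in docs]
    return {w: sum(w in s for s in doc_sets) / len(doc_sets) for w in words}


# Toy stand-ins for journal issues (invented for illustration, NOT real data).
toy_issues = [
    "report of the american chemical society meeting in chicago",
    "the american society report on sodium chloride solutions",
    "report american society members elected in washington",
]

# A handful of words taken from InPho topic 6.
topic_6_words = ["american", "society", "report", "sodium", "chicago"]

df = document_frequency(toy_issues, topic_6_words)
for word, frac in sorted(df.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {frac:.2f}")
```

If most of a topic's top words sit near 1.0 on the real corpus, that would support reading it as a catchall rather than a meaningful theme.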
In Mallet, the topics that come up are worded quite differently, though I think they are substantively similar. The first topic is, I think, a catchall made up of words that probably show up in every journal article, such as those in “American Chemical Society.” Even so, “committee” is an interesting word to find in that topic. It probably reflects the reports of various committees in the sections of the journal that cover the activities of the society. Topic 13 is also interesting: the words “work present paper fact great part study made question view” do not seem to discuss chemistry directly, but rather the work of doing chemistry (like publishing).
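To put a rough number on how similar the two programs' topics are, one option is to compare their word lists directly. The sketch below uses Jaccard overlap (shared words divided by all distinct words), which is my own choice of measure and not something either program reports. The word lists are copied from the tables above; the variable names are just for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap between two word lists: |shared| / |all distinct|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# InPho topic 6, copied from the table above.
inpho_6 = (
    "section city ave american mass chicago william water charles sodium "
    "john report society chemistry washington steel philadelphia members "
    "sulphur university"
).split()

# Two Mallet topics that look related to society business.
mallet_topics = {
    1: "chemical chemistry society american journal dr book work committee general".split(),
    18: "st section meeting city mass pa secretary ave university york".split(),
}

# Print the shared words and the overlap score for each Mallet topic.
for number, words in mallet_topics.items():
    shared = sorted(set(inpho_6) & set(words))
    print(number, shared, round(jaccard(inpho_6, words), 2))
```

Because InPho lists 20 words per topic and Mallet lists 10, the score can never reach 1.0 here, so it is better read as a way of ranking candidate matches than as an absolute measure.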
In addition to thinking about the network of people and how they are institutionalizing the journal, I also think it might be interesting to look at these topics as a kind of window into the philosophy of chemistry. In other words, what are chemists talking about, and more importantly, does this match up with what historians and philosophers of science say was going on in the nineteenth century? It might be interesting to see if I can match up concepts from the Stanford Encyclopedia of Philosophy (http://plato.stanford.edu/entries/chemistry/) with the topic models I’m getting. More on that in a later post.
In all, I think these are some exciting preliminary results. I look forward to doing more in-depth topic modelling to see whether things change over time, whether I can understand more about the individual authors (and who is publishing on certain topics), and whether historians of chemistry are accurately understanding the topics of the day (at least in terms of what the flagship journal seems to consider important).