Analyzing Some Preliminary Topic Modelling

I’ve started some work on topic modelling, at the moment using two programs: the InPho Topic Explorer, from Indiana University’s Cognitive Science Program, and Mallet. I’m also open to suggestions for other methods to try, but this gets me started. Right now I’m still using some sample files, but I look forward to trying these out on the entire corpus of the Journal of the American Chemical Society once I have the data.

From InPho, the topics I get are

Topic 0 per, found, precipitate, sulphuric, weight, substance, soluble, liquid, thus, total, nitric, hydrochloric, sulphate, value, gas, described, second, grams, oxygen, follows
Topic 1 per, new, chemical, iron, copper, ore, action, author, cent, review, germany, oil, see, assignor, research, richards, york, parts, zinc, analyses
Topic 2 gram, cent, per, sulphur, gave, method, methods, ten, hydrochloric, determinations, steel, tungsten, proteid, ammonium, cement, weighed, matter, oil, sulphide, aluminum
Topic 3 calc, sulfate, yield, secretary, sec, since, ave, held, electrons, prepared, system, boiling, atoms, product, curve, series, benzene, sulfuric, conductivity, lewis
Topic 4 sugar, action, etc, author, c.c, upon, heated, acids, oil, chem, grms, abstracts, obtained, liquid, air, oxygen, gas, ozone, lime, soda
Topic 5 new, form, table, per, weight, meeting, cent, slightly, der, book, tube, hydrogen, conductivity, west, sulphate, sec, sulphuric, estimation, oxygen, magnesium
Topic 6 section, city, ave, american, mass, chicago, william, water, charles, sodium, john, report, society, chemistry, washington, steel, philadelphia, members, sulphur, university
Topic 7 hydrogen, section, equation, concentration, ion, measurements, mercury, therefore, university, content, ann, curves, increase, theory, concentrations, points, derivatives, velocity, ions, room
Topic 8 cent, per, found, made, first, grams, soluble, sulphate, added, precipitate, small, containing, weight, much, sulphuric, gas, ether, would, form, substance
Topic 9 found, liquid, substance, ether, soluble, hydrochloric, thus, precipitate, gas, described, alcohol, value, second, alkali, glass, oxygen, acid, ethyl, material, specific
Topic 10 cent, much, hydrogen, grams, containing, ether, experiment, dried, fact, even, upon, form, place, separated, treated, various, following, soil, shall, magnesium
Topic 11 made, first, would, small, added, shown, pure, could, must, use, well, due, part, dioxide, order, table, gives, organic, precipitated, concentrated
Topic 12 acid, solution, water, one, results, method, may, two, chloride, used, potassium, alcohol, also, sodium, amount, temperature, time, present, experiments, salt
Topic 13 obtained, given, upon, nitrogen, following, mixture, chemical, crystals, chemistry, copper, conditions, heating, preparation, dilute, oxide, dry, case, error, known, iron
Topic 14 form, value, chair, constant, sulfuric, fig, table, chloride, temperature, experimental, solid, mixture, sulfide, subs, values, heat, carbon, hydroxide, slightly, journal
Topic 15 weight, values, reaction, solutions, point, much, concentration, pressure, fact, even, containing, temperature, equilibrium, sodium, experiment, dried, acids, melting, case, iodide
Topic 16 per, acids, heated, calculated, laboratory, journal, hydroxide, society, true, book, contain, cause, values, received, satisfactory, showing, special, year, manganese, show
Topic 17 form, table, constant, sulfuric, two, solution, slightly, value, cell, grams, surface, gave, solid, meeting, negative, phase, experimental, fig, hydroxide, however
Topic 18 grams, cent, gram, book, value, per, der, calculated, general, values, physical, und, sulfate, constant, inorganic, normal, die, ion, hydrazine, salts
Topic 19 water, new, milk, meeting, chemical, process, one, iron, apparatus, york, carbon, society, mass, read, fat, analysis, lime, secretary, furnace, use

From Mallet I get

topicId words..
1 chemical chemistry society american journal dr book work committee general
2 solutions concentration solution salts salt ion solubility ions conductivity cell
3 acid ch acids nh ester ii methyl ethyl obtained acetic
4 sugar experiments time action effect reaction starch amount rate power
5 theory atoms number surface oxygen molecules form hydrogen energy case
6 color oil red obtained mixture liquid yellow white water small
7 compounds compound reaction action group bromine chloride carbon derivatives formed
8 cc solution acid water added precipitate hydrochloric dissolved filtered excess
9 cent nitrogen gram results grams total amount weight sample found
10 alcohol water ether soluble solution found salt crystals melting acid
11 chem milk fat oil extract protein oils ash composition soc
12 method results made methods determination error obtained standard determinations found
13 work present paper fact great part study made question view
14 chloride potassium sodium silver solution acid ammonia ammonium nitrate oxide
15 analysis determination water soil organic matter plant chemical analyses methods
16 temperature values table pressure point heat equation constant data vapor
17 weight io atomic oo ii lead arsenic separation series oxide
18 st section meeting city mass pa secretary ave university york
19 iron process der steel copper coal gas gold ore furnace
20 tube air gas apparatus glass water temperature platinum liquid mercury

As I said in a previous post, many of these results are not surprising.  There are, however, some topics that merit some further analysis.

In InPho, some names come up in topics 1 and 3 (richards and lewis), which could signal important people in the field; it is worth seeing whether they show up in other, more focused contexts. Also, topic 6, which has the words “section, city, ave, american, mass, chicago, william, water, charles, sodium, john, report, society, chemistry, washington, steel, philadelphia, members, sulphur, university,” has very few words that have much to do with chemistry (aside from sodium, chemistry, and sulphur). My guess is that these have to do with meetings or with the business of the society. Unhelpfully, the names in this topic seem to be first names, which might just indicate that there are lots of people named Charles and John. I also wonder whether this might just be a catchall topic. For instance, the words American, Society, and Report would, I think, come up in almost any issue of the journal.
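One way to test the catchall guess is to count how many documents each word appears in: a word that turns up in nearly every issue says little about any one topic. Here is a minimal sketch of that check. The function name `document_frequency` and the three sample texts are placeholders I made up for illustration, not real journal content, and the tokenizer is deliberately crude.

```python
import re


def document_frequency(documents, words):
    """Return, for each word, the fraction of documents that contain it."""
    if not documents:
        return {w: 0.0 for w in words}
    # Lowercase each document and reduce it to its set of alphabetic tokens.
    tokenized = [set(re.findall(r"[a-z]+", doc.lower())) for doc in documents]
    return {
        w: sum(w in tokens for tokens in tokenized) / len(tokenized)
        for w in words
    }


# Placeholder texts standing in for issues of the journal (NOT real data).
sample_issues = [
    "Report of the American Chemical Society meeting held in Chicago.",
    "The American Chemical Society report on sodium and sulphur analyses.",
    "Notes on the estimation of sulphur in steel.",
]

# A few words taken from InPho topic 6.
topic_6_words = ["american", "society", "report", "sodium", "chicago"]

df = document_frequency(sample_issues, topic_6_words)
for word, share in sorted(df.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {share:.2f}")
```

On the real corpus, words with a share close to 1.0 would be candidates for a stop list before re-running the models.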

In Mallet, the topics that come up are actually quite different in their word lists, though substantively similar, I think. The first topic is, I think, a catchall with words that probably show up in every journal article, like “American Chemical Society.” Nonetheless, I also think that “committee” is an interesting one in that topic, probably reflecting the reports of various committees that show up in the sections of the journal that report the activities of the society. Topic 13 is also interesting, with words (“work present paper fact great part study made question view”) that do not seem to discuss chemistry directly but rather the work of doing chemistry (like publishing).
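To put a rough number on how similar the two sets of topics are, one option is the Jaccard overlap between top-word lists (shared words divided by all distinct words). The sketch below uses the word lists printed above for InPho topic 6 and Mallet topics 1 and 18. The helper names `jaccard` and `best_match` are my own, and this is only one of several possible similarity measures, not something either program reports.

```python
def jaccard(words_a, words_b):
    """Jaccard overlap of two word lists: |A & B| / |A | B|."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(words, candidates):
    """Return the id of the candidate topic with the highest overlap."""
    return max(candidates, key=lambda tid: jaccard(words, candidates[tid]))


# InPho topic 6, as listed in this post.
inpho_6 = (
    "section city ave american mass chicago william water charles sodium "
    "john report society chemistry washington steel philadelphia members "
    "sulphur university"
).split()

# Mallet topics 1 and 18, as listed in this post.
mallet = {
    1: "chemical chemistry society american journal dr book work "
       "committee general".split(),
    18: "st section meeting city mass pa secretary ave university "
        "york".split(),
}

for topic_id, words in mallet.items():
    print(topic_id, round(jaccard(inpho_6, words), 3))

print("closest Mallet topic:", best_match(inpho_6, mallet))
```

Because the InPho lists here have twenty words and the Mallet lists ten, the overlap can never reach 1.0, so the scores are best read as a ranking rather than as absolute similarity.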

In addition to thinking about the network of people and how they are institutionalizing the journal, I also think it might be interesting to look at these topics as a kind of window into the philosophy of chemistry. In other words, what are chemists talking about and, more importantly, does this match up with what historians and philosophers of science say was going on in the nineteenth century? It might be interesting to see if I can match up concepts from the Stanford Encyclopedia of Philosophy with the topic models I’m getting. More on that in a later post.

In all, I think that these are some exciting preliminary results, and I look forward to doing some more in-depth topic modelling to see whether things change over time, see if I can understand more about the individual authors (and who is publishing on certain topics), and also see whether historians of chemistry are accurately understanding the topics of the day (at least in terms of what the flagship journal seems to think important).

