Instead of simply selecting the most frequent word submitted by collective thought stream contributors, frequencies can be adjusted according to the frequencies of synonymous words that were also submitted. This helps to ensure that the next word selected for the collective thought stream reflects the general meaning intended by the largest possible number of contributors. I downloaded WordNet, a freely available lexical database of English nouns, verbs, adjectives and adverbs, and linked it to an R package of the same name. I then wrote some R code to generate a similarity matrix of all 630 words that were found to follow “this is a” in the Twitter corpus. The similarity measure I used was the proportion of synonyms of a word (plus the word itself) that were shared with each other word. This measure is asymmetric. A word like “exam” can share 100% of itself plus its synonyms with the synonyms of the word “test”, so every time someone submits “exam”, they basically mean “test”. By contrast, the word “test” might share only 33% of itself plus its synonyms with the synonyms of the word “exam”, so a submission of “test” counts only as a partial vote toward “exam”. Synonym-adjusted relative frequency scores were calculated for each word as the sum of the word frequencies multiplied by shared synonym proportions.
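To make the asymmetric measure concrete, here is a minimal Python sketch (not the R code used for the analysis). The synonym sets are made-up toy data rather than real WordNet output, and the names `SYNSETS`, `similarity` and `adjusted_scores` are hypothetical. The direction of the weighting follows the “exam”/“test” description above, and the scores are left as un-normalized weighted counts; both choices are assumptions.

```python
# Toy synonym sets (hypothetical, NOT real WordNet output).
# Each set contains the word itself plus its synonyms.
SYNSETS = {
    "exam": {"exam", "examination", "test"},
    "test": {"test", "exam", "examination", "trial", "tryout",
             "run", "essay", "prove", "quiz"},
    "cat": {"cat", "feline"},
}


def similarity(a, b, synsets=SYNSETS):
    """Proportion of a's set (a plus its synonyms) that also appears in b's set."""
    set_a = synsets[a]
    set_b = synsets[b]
    return len(set_a & set_b) / len(set_a)


def adjusted_scores(freqs, synsets=SYNSETS):
    """Synonym-adjusted score for each word w:
    sum over submitted words v of freq(v) * similarity(v, w).
    Assumption: a submission of v counts toward w in proportion to
    how much of v's synonym set is shared with w.
    """
    return {
        w: sum(count * similarity(v, w, synsets) for v, count in freqs.items())
        for w in freqs
    }


if __name__ == "__main__":
    freqs = {"exam": 2, "test": 3, "cat": 1}  # toy submission counts
    print(similarity("exam", "test"))  # all of exam's set is in test's set
    print(similarity("test", "exam"))  # only 3 of test's 9 entries are in exam's set
    print(adjusted_scores(freqs))
```

With these toy counts, “test” collects its own 3 submissions plus both “exam” submissions in full, while “exam” collects its own 2 plus only a third of each “test” submission.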
See below for a plot of each of the 630 words that followed “this is a” in the Twitter corpus, ranked by the relative frequency of their meanings as measured by their synonym-adjusted frequency scores.
This figure shows that, despite adjustments for the frequencies of synonymous words, the rankings of the two most frequent words (“great” and “good”) did not change from Plot 1. The rankings of many other words did change, sometimes dramatically. The word “neat” jumped to 3rd place, even though it followed “this is a” only once in the Twitter corpus. This is mainly because “neat” shared 72% of its synonyms with “great”, so every instance of the word “great” counted as 0.72 of a vote for “neat”. Similar results occurred for “large” (due to shared synonyms with the already frequent words “great” and “big”) and for “swell” (due to shared synonyms with the two most frequent words, “great” and “good”). This plot demonstrates that shared meanings across many submitted words can be incorporated into the selection of the next word to be added to a collective thought stream.