A Closer Look At The Language Library Behind Sosolimited's ReConstitution 2012

Cindy Chung from the University of Texas at Austin explains how they made the debate-analyzing library as objective as possible.

Kathleen Flood

Sosolimited’s real-time web app deconstructing the Presidential and Vice Presidential debates, ReConstitution 2012, has launched as part of our new Digital Gallery just in time for the second Presidential debate tonight. If you’ve played around with it at all, you may be wondering how the language development team from the University of Texas at Austin chose to categorize certain words as positive, negative, lying-by-the-seat-of-their-pants, etc.

Language is inherently subjective, but there’s a lot more meaning behind word choice than we usually notice when just listening to a person speak. Sosolimited’s John Rothenberg took the Presidential Debates, “a high-stakes verbal situation,” as a jumping-off point, explaining, “You can log into the site and see a flowing transcript and see all kinds of statistical and graphical explosions of information… it’s like a second lens into the debate.”

The app is designed to find frequent words and phrases and trending topics in the debate, to reveal how deceptively or authentically each candidate is speaking, and to show how the candidates compare to each other and to past Presidential candidates. “It’s not designed to be this super serious, analytical tool,” says Rothenberg.

We shot off a few questions to Cindy Chung, a fellow in the Department of Psychology at The University of Texas at Austin, who, along with Professor James Pennebaker, helped develop the language-analyzing software that Sosolimited uses in its app.

The Creators Project: What originally drew you to studying the meaning behind language?
Cindy Chung:
I was one of those nerdy kids who actually enjoyed looking up definitions in a dictionary or synonyms in a thesaurus. It was so satisfying to be able to express what I was thinking or feeling in just the right word. In university, I was introduced to social psychology, the systematic study of how people interact with and influence one another. What's cool about social psychologists is that they take ordinary, everyday behaviors and study them systematically. What they find out about people and their relationships turns out to be quite extraordinary: people sometimes behave in ways that are not in line with their motivations and reasoning for those behaviors, or with how they remember them.

What I found peculiar was that language is the primary means by which people interact and influence one another, yet spoken and written language have historically received little attention in social psychology, because it's a lot of work to collect and analyze large samples of text. But with computer programs and digital text available on the internet, there are entirely new ways for social psychologists to study the meanings behind the words we use in everyday life.

Just as in most other areas of social psychology, you find the same extraordinary thing with language: what people say may not be what they mean. But by studying their words systematically, and relating their word use to questionnaires and behaviors, you can discover how people might really be thinking or feeling and relating to others, and perhaps predict how they will behave in the future. I came to work at The University of Texas at Austin with Professor Jamie Pennebaker because he is the world's leading expert on linking people's word use to their personalities and psychological states. This is where I was trained as a social psychologist and professional word counter to understand the meanings behind language.

How did you apply your research to Sosolimited’s ReConstitution 2012? How did the collaboration unfold?
Sosolimited had read about Professor Jamie Pennebaker's work on relating words to psychological states when he was the featured scientist in the New York Times. At the time, Sosolimited had been receiving rave reviews, also from the New York Times, for their live remix of the 2010 UK Prime Minister debates, and they made the connection to Pennebaker's work. Justin Manor, a partner at Sosolimited, contacted Pennebaker with the goal of making ReConstitution 2012 more about revealing the underlying psychological meanings behind words. Knowing that I love language-based data visualizations, Pennebaker brought me into a conversation with Manor about the kinds of things you could discover about a person based on their word use.

Using Linguistic Inquiry and Word Count (LIWC, pronounced “Luke”), a computerized word counting program developed by Pennebaker, he and I had analyzed the personalities of the 2004 U.S. Presidential and Vice Presidential candidates with Professor Rich Slatcher, now at Wayne State University, and Dr. Lori Stone. We had applied various algorithms to the candidates' word use that previous research had shown to distinguish young vs. old people, men vs. women, truths vs. lies, depressed vs. non-depressed people, etc. The results of our analyses told us who had spoken most like a woman (Edwards), who had spoken in the least honest way (Bush), who had spoken most like a depressed person (Kerry), and who had spoken most like a past President (Cheney).

Drawing from the 2004 analysis, our blog on previous political campaigns, and Pennebaker's popular book The Secret Life of Pronouns, Sosolimited selected the most reliable and interesting things you can tell about people from their word use, keeping in mind the artistic goals of the project. Over several months of discussion with Pennebaker and me, they were professional and diligent about making sure that each feature they reported on was grounded in scientific research. They invited me to their studio in Boston to apply the algorithms to previous candidates' language in debates so that they could understand the meaning and maths of LIWC features and replicate the results on their own machines. As a fan of language-based data visualizations, I felt that being at the Sosolimited studio was like having a backstage pass to my favorite rock band, only much nerdier. I was so impressed by their intellectual curiosity, their artistic eye for detail, and their professional and collaborative energy. They drew on their earlier, ingenious work piping real-time closed-captioning feeds straight into language processing modules to produce the most elegant works of digital art. ReConstitution 2012 is fun, smart, mischievous, and a beautiful representation of research on the psychology of language.

Assuming you want the library to stay as neutral as possible, how did you go about assigning a positive or negative weight to words… especially more ambiguous ones?
LIWC is a software program that is made up of a processor and a dictionary. The processor matches words in a given text to words in its dictionary. The LIWC dictionary was initially developed by Professor Jamie Pennebaker to assess texts for psychological features. So, the dictionary includes categories of words relating to psychological constructs (e.g., positive and negative emotion categories, social word categories, an insight category, etc.). It also includes words related to topics such as achievement, leisure, death, etc. In developing the LIWC dictionary, Pennebaker ensured that at least three of four judges agreed that each word belonged to its assigned category across several rounds of ratings.
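To make the processor-plus-dictionary idea concrete, here is a minimal sketch in Python. It is not LIWC's actual code or data: the words, category labels (`posemo`, `negemo`, `pronoun`, `article`), vote counts, and the helper names `build_dictionary` and `category_percentages` are all hypothetical. It assumes the dictionary keeps a word in a category only when at least three of four judges agreed, that function words are added directly by grammatical category, and that results are reported as each category's share of the total word count.

```python
import re
from collections import Counter

# Hypothetical ratings (illustrative only, not real LIWC data): how many of
# four judges agreed that each word belongs to each candidate category.
JUDGE_VOTES = {
    ("happy", "posemo"): 4,
    ("love", "posemo"): 3,
    ("hurt", "negemo"): 4,
    ("sharp", "negemo"): 2,  # ambiguous: only 2 of 4 judges agreed, so dropped
}

# Hypothetical function words, added directly by their grammatical category.
FUNCTION_WORDS = {"i": "pronoun", "we": "pronoun", "the": "article", "a": "article"}


def build_dictionary(votes, function_words, min_agree=3):
    """Map each word to its categories, keeping a rated word only when at
    least `min_agree` of the four judges agreed on that category."""
    dictionary = {}
    for (word, category), n_agree in votes.items():
        if n_agree >= min_agree:
            dictionary.setdefault(word, set()).add(category)
    for word, category in function_words.items():
        dictionary.setdefault(word, set()).add(category)
    return dictionary


def category_percentages(text, dictionary):
    """Match each word in `text` against the dictionary and return every
    matched category's share of the total word count, as a percentage."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {}
    counts = Counter()
    for word in words:
        for category in dictionary.get(word, ()):
            counts[category] += 1
    return {cat: 100.0 * n / len(words) for cat, n in counts.items()}


lexicon = build_dictionary(JUDGE_VOTES, FUNCTION_WORDS)
print(category_percentages("We love this and I am happy", lexicon))
```

In the example sentence, 7 words are counted. "We" and "I" match `pronoun`, and "love" and "happy" match `posemo`, so each category comes out at 2 of 7 words, roughly 28.6%. "Sharp" never enters the dictionary because too few judges agreed on it. Reporting percentages rather than raw counts keeps short and long speeches comparable.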

Finally, function words (e.g., articles, prepositions, pronouns, auxiliary verbs, etc.) were added to the LIWC dictionary, since these words already belong to established grammatical categories. Function words, and especially pronouns, tend to emerge as the most reliable correlates of psychological states in many studies using LIWC. Today there are many papers that show the psychological functions of function words. This research using LIWC, psychological and behavioral assessments, and statistical models reveals who we are and how we relate to others based on the words we use in everyday life. This research is summarized in Pennebaker's book, The Secret Life of Pronouns, and it is the basis of the visualizations in ReConstitution 2012.

Do you think the meaning behind word choice and the way the information is presented in this project has the potential to sway voters? Was that ever an intention?
It was never the intention to sway voters. The intention was to reveal psychological features that might otherwise be difficult to detect. ReConstitution 2012 reveals these features without subjecting the Presidential candidates to a battery of psychological questionnaires, a polygraph, truth serum, or a mood ring. The intention was to objectively summarize the personality traits and psychological states of a candidate relative to the other guy without knowing a priori who the “winner” would be on any of the traits, even if it would make “our guy” (whoever he is) seem less attractive on some dimensions. That's the science for you. And the art of ReConstitution 2012 is that it engages and sparks curiosity in the prettiest and cleverest ways. Whether or not the project sways votes remains an empirical question…

Tune in during the last two debates (Tuesday, October 16th, 9:00 PM ET, and Monday, October 22nd, 9:00 PM ET) to see the system in action live, or anytime afterward for a recap.