
Ah, sarcasm. It’s a guilty pleasure few of us would admit to enjoying, yet almost all of us have employed it at one point or another. It’s a verbal catharsis, a momentary sour-lemon bitterness, a passing scorn to ease a flare of frustration or score a stinging humorous point. In my work as a software developer, I’ve joked with friends about how useful a sarcasm HTML tag would be, akin to the tags that enable us to display online text as bold or italic or underlined. I’ve also heard people pine for a “sarcasm font,” some typeface that would effectively convey a healthy dose of verbal irony. There’s even a fully realized punctuation mark, the SarcMark, available for download and perfect for anyone concerned that the irony of their text might not come across sufficiently to a recipient. Just tack this little mark onto the end of your sarcastic sentences like a little dry wit flag and voilà, no more embarrassing apologies for misunderstood missives!
Wanting some way to express sarcasm in text-based communication is not a new idea – while probably not limited to my professional community, it’s something for which many of us who work with computers and code seem to find ample need. We are, apparently, a pretty sardonic bunch. In fact, someone out in the Twitterverse proposed, with an unknown level of tongue-in-cheekedness, that the W3C (the international community responsible for developing Web standards) include a <sarcasm> tag as part of the specification for the latest HTML standard. The W3C tweeted back with good humor, while basically informing this person they probably shouldn’t hold their breath.
Sarcasm is a form of indirect speech, a sophisticated construct where the real message gets conveyed implicitly through a combination of verbal and non-verbal cues. It often involves saying the opposite of what you really mean, and is usually easiest to recognize in face-to-face verbal dialogue. It’s a fairly ambiguous form of communication, but is, curiously, quite common in the nuance-poor online world. Comments on blog posts, customer reviews of products and other opinionated, user-generated content are often written with an acerbic tongue. But text is a tricky platform for implicit speech. It can be hard enough for humans to recognize sarcasm in text form, let alone computers, which is why it’s notable that an Israeli research team has actually developed a machine algorithm that can recognize sarcasm.
This novel sardonic bloodhound is called SASI, a Semi-supervised Algorithm for Sarcasm Identification. SASI can recognize sarcasm in online customer reviews with 77 percent precision. In developing the algorithm, the research team of Oren Tsur, Dmitry Davidov and Ari Rappoport looked at 66,000 product reviews on Amazon.com for 120 products including books, music players, digital cameras, camcorders, GPS devices, e-readers, game consoles, and mobile phones*. They had three people tag sentences for sarcasm to create a small seed of 80 sentences of annotated data. The team identified sarcastic patterns in the reviews, ranked the sentences by level of sarcasm and created a classification algorithm. They applied the algorithm against the seed set, which helped it learn words and patterns that distinguish sarcastic remarks. The algorithm achieved 81 percent recognition in the pattern acquisition phase.
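For the code-minded, here is a toy sketch of the general idea: turn each sentence into a list of pattern matches, then label a new sentence by finding the most similar hand-tagged seed sentence. This is not SASI itself. The patterns, the seed sentences and the nearest-neighbor lookup are all made up for illustration, and the real algorithm learns its patterns from the data rather than having them typed in by hand.

```python
import re

# Hypothetical surface patterns, hand-written for illustration only.
PATTERNS = [
    re.compile(r"\bgreat\b.*\bif you\b", re.I),
    re.compile(r"\boh\b.*\breally\b", re.I),
    re.compile(r"!{2,}|\.\.\."),
]

# Invented seed sentences with invented human labels (True = sarcastic).
SEED = [
    ("Great for insomniacs, if you enjoy staring at the ceiling.", True),
    ("Oh, the battery really lasts a whole hour...", True),
    ("The camera takes sharp photos in low light.", False),
    ("Setup took about ten minutes.", False),
]


def features(sentence):
    """Return a 0/1 vector: one entry per pattern that matches."""
    return [1 if p.search(sentence) else 0 for p in PATTERNS]


def classify(sentence):
    """Copy the label of the seed sentence with the closest feature vector."""
    target = features(sentence)

    def distance(item):
        return sum((a - b) ** 2 for a, b in zip(features(item[0]), target))

    nearest = min(SEED, key=distance)
    return nearest[1]
```

Calling `classify("Great book, if you need a doorstop.")` matches the first pattern and lands on the first seed sentence, so it comes back flagged as sarcastic. With only three patterns and four seeds, anything subtler will sail right past it.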
Having put SASI through basic training, the team introduced it to an evaluation set of reviews containing completely new sarcastic sentences. Each sentence in this new set had again been classified for sarcasm by those three human annotators. Using what it had learned in the pattern acquisition phase, SASI achieved 77 percent precision for sniffing out sarcasm in the new data, and just over 81 percent pattern evaluation efficiency. Not perfect, but it’s easily better than some of my more literal friends.
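A quick note on what "precision" means here: of all the sentences the algorithm flags as sarcastic, it is the fraction the human annotators also called sarcastic. Its companion measure, recall, is the fraction of the truly sarcastic sentences that the algorithm managed to catch. A minimal sketch, using made-up labels rather than the study's data:

```python
def precision_recall(predicted, actual):
    """Both arguments are lists of booleans, True meaning 'sarcastic'."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if a and not p)

    flagged = true_pos + false_pos          # everything the algorithm flagged
    truly_sarcastic = true_pos + false_neg  # everything the humans flagged

    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / truly_sarcastic if truly_sarcastic else 0.0
    return precision, recall


# Invented labels for eight sentences, for illustration only.
predicted = [True, True, True, True, False, False, False, False]
actual = [True, True, True, False, True, False, False, False]

precision, recall = precision_recall(predicted, actual)
```

In this invented example the algorithm flags four sentences and three of them are genuinely sarcastic, so precision is 0.75. It also misses one of the four sarcastic sentences, so recall is 0.75 as well.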
So, does this matter? Do we care about a computer being able to recognize sarcasm? It’s actually pretty interesting from a commercial point of view — user preference studies suggest some users dislike sarcastic product reviews, finding them biased, while others actually prefer them. Therefore, the ability to identify sarcasm in reviews could improve the personalization of content ranking and recommendation systems. Review summarization and opinion-mining systems that attempt to aggregate public sentiment could also benefit if sarcasm could be identified and not included in the average scores, where its often opposite meaning would skew the results inappropriately.
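To see how sarcasm can skew an aggregate, consider a minimal sketch that averages star ratings with and without the reviews a detector has flagged. The review data and the `sarcastic` flag below are invented; in practice the flag would come from a classifier such as SASI, with all the imperfection that implies.

```python
from statistics import mean


def average_rating(reviews, skip_sarcastic=True):
    """Average the star ratings, optionally leaving out flagged reviews."""
    kept = [
        r["stars"]
        for r in reviews
        if not (skip_sarcastic and r["sarcastic"])
    ]
    return mean(kept) if kept else None


# Invented reviews, for illustration only.
reviews = [
    {"stars": 5, "sarcastic": False},
    {"stars": 4, "sarcastic": False},
    {"stars": 5, "sarcastic": True},  # five stars, but the text means the opposite
    {"stars": 1, "sarcastic": False},
]

with_sarcasm = average_rating(reviews, skip_sarcastic=False)
without_sarcasm = average_rating(reviews)
```

Here the sarcastic five-star review lifts the average to 3.75, while leaving it out gives roughly 3.33, which is closer to what the sincere reviewers actually thought.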

In the meantime, I’m sure I’ll keep finding uses for that <sarcasm> HTML tag in my occasional forays into playful snarkiness and can only hope the W3C someday comes to its senses and realizes what a boon this would be to online communication. As the great acerbic master Groucho Marx once said: “A child of five could understand this. Fetch me a child of five.”
Ah, sarcasm.
References:
- ICWSM – A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews. Tsur, Davidov, and Rappoport.
* Oh, and if you’re interested, Shure and Sony noise-cancellation earphones, Dan Brown’s “Da Vinci Code” and Amazon’s Kindle e-reader attracted the most sarcastic comments.

Caroline Sober

Down with the SarcMark!
Punctuation for sarcasm must be free, standards-compliant, and historically accurate. Join the revolution to free sarcasm from the capitalist chains of Sarcasm, Inc., by punctuating your sarcastic sentences with ¡
More info at http://opensarcasm.org