Proceedings Paper

Language and dialect identification in social media analysis
Author(s): Stephen Tratz; Douglas Briesch; Jamal Laoudi; Clare Voss; V. Melissa Holland

Paper Abstract

Historically unwritten Arabic dialects are increasingly appearing online in social media texts and are often intermixed with other languages, including Modern Standard Arabic, English, and French. The next-generation analyst will need new capabilities to quickly distinguish among the languages appearing in a given text and to identify informative patterns of language switching that occur within a user’s social network, patterns that may correspond to socio-cultural aspects such as participants’ perceived and projected group identity. This paper presents work to (i) collect texts written in Moroccan Darija, a low-resource Arabic dialect from North Africa, and (ii) build an annotation tool that (iii) supports the development of automatic language and dialect identification and (iv) provides social and information network visualizations of the languages identified in tweet conversations.

Paper Details

Date Published: 5 June 2014
PDF: 11 pages
Proc. SPIE 9122, Next-Generation Analyst II, 91220K (5 June 2014); doi: 10.1117/12.2059092
Author Affiliations
Stephen Tratz, U.S. Army Research Lab. (United States)
Douglas Briesch, U.S. Army Research Lab. (United States)
Jamal Laoudi, ARTI (United States)
Clare Voss, U.S. Army Research Lab. (United States)
V. Melissa Holland, U.S. Army Research Lab. (United States)

Published in SPIE Proceedings Vol. 9122:
Next-Generation Analyst II
Barbara D. Broome; David L. Hall; James Llinas, Editor(s)

© SPIE