A process-based approach to dynamic social network analysis using Enron e-mail

Analyzing the dynamic interactions in a social network can yield a deeper and more accurate understanding than the traditional, static view.
15 July 2006
Wayne Chung

Common methods of social network analysis (SNA) emphasize the static analysis of role relationships: the roles of different actors are understood based on patterns of transactions conducted over stationary channels that connect them.1–3 Semantic methods for analyzing a stable corpus of transactions dominate the techniques for message classification. Such stationary analyses are poorly suited to identifying behaviors that are potentially sparse but well-coordinated (synchronous). Malevolent organizations such as terrorist cells may use a variety of techniques to escape detection, such as cloaking language and embedding communications into otherwise innocuous background traffic. The sheer quantity of natural interactions of large numbers of actors in an active social network can be computationally daunting.

In this work, we move beyond traditional static methods of SNA to develop a set of dynamic process models that encode various modes of behavior in active social networks. These models will serve as the basis for a new application of the Process Query System (PQS)4 to the identification and tracking of dynamic processes. We are working on a process-driven approach to dynamic social network analysis (DSNA) based on the idea that a network's organizational structure and its characteristic messaging patterns both reflect distributed dynamic processes operating on the network.

A primary goal in applying a process-driven approach to DSNA is to develop techniques and a general framework that can be applied across multiple languages and domains. The PQS technology developed at Dartmouth College provides an ideal platform for performing such process-based analysis. By exploiting the temporal attributes of the transactions, we are able to distinguish likely threads of conversation using only rudimentary content analysis. Temporal attributes of social network transactions include the time, sequence, and rate of communications. Using these dynamic attributes addresses the limited ability of stationary SNA to locate sparse transactional channels, which often reflect key relationships. It should be emphasized, however, that process-driven approaches do not preclude traditional SNA from being applied to the greatly reduced set of transactions.
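The temporal attributes named above (time, sequence, and rate) can be computed from message meta-data alone. The following is a minimal sketch, not the PQS implementation: the function name `channel_rates`, the tuple layout `(sender, recipient, sent_at_seconds)`, and the hourly rate unit are all assumptions made for illustration.

```python
from collections import defaultdict


def channel_rates(messages):
    """Summarize temporal attributes per directed channel.

    `messages` is an iterable of (sender, recipient, sent_at_seconds)
    tuples; this layout is a hypothetical stand-in for e-mail meta-data.
    """
    times = defaultdict(list)
    for sender, recipient, sent_at in messages:
        times[(sender, recipient)].append(sent_at)

    stats = {}
    for channel, ts in times.items():
        ts.sort()  # sequence: order messages by send time
        span = ts[-1] - ts[0]
        gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
        stats[channel] = {
            "count": len(ts),
            "gaps": gaps,  # seconds between consecutive messages
            # Rate is undefined for a channel with a single timestamp.
            "rate_per_hour": (len(ts) - 1) / span * 3600.0 if span > 0 else None,
        }
    return stats


example = [("a", "b", 0), ("a", "b", 1800), ("a", "b", 3600), ("c", "d", 500)]
summary = channel_rates(example)
```

In this toy input the channel from `c` to `d` carries a single message, so it stays visible as a sparse channel rather than being averaged away in an aggregate count.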

The dynamic attributes and meta-data of the Enron e-mail dataset were analyzed by PQS in an online fashion (see Figure 1). The e-mail meta-data for each message included the sender, the recipient, and the time it was sent. A primitive bag-of-words model was used to classify e-mails into topic threads for an initial, coarse separation. The e-mails were replayed and classified, and correlation probabilities were generated based on an exponential decay-time kernel. The time kernel, classification, and contact chaining were used to generate the conversation segments, which were in turn used as the basis for functional role assignments.
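As a rough illustration of how a decay-time kernel, a coarse topic label, and contact chaining might combine to form conversation segments, consider the sketch below. It is not the Trafen Engine's algorithm. The `Email` record, the half-life parameterization, the threshold value, the greedy assignment rule, and the choice to compare each message only against the most recent message of a segment are all assumptions; the `topic` field stands in for the output of a bag-of-words classifier that is not shown.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    sender: str
    recipient: str
    sent_at: float  # seconds since an arbitrary epoch
    topic: str      # coarse label from a (hypothetical) bag-of-words classifier


def decay_score(dt_seconds, half_life_seconds=3600.0):
    """Exponential decay kernel: 1.0 at dt = 0, 0.5 after one half-life."""
    if dt_seconds < 0:
        return 0.0
    return math.exp(-math.log(2) * dt_seconds / half_life_seconds)


def link_score(earlier, later, half_life_seconds=3600.0):
    """Score for `later` continuing the conversation that contains `earlier`."""
    if earlier.topic != later.topic:
        return 0.0
    # Contact chaining: the two messages must share at least one actor.
    shared = {earlier.sender, earlier.recipient} & {later.sender, later.recipient}
    if not shared:
        return 0.0
    return decay_score(later.sent_at - earlier.sent_at, half_life_seconds)


def segment(emails, threshold=0.25, half_life_seconds=3600.0):
    """Replay messages in time order and greedily assign each to a segment."""
    segments = []
    for msg in sorted(emails, key=lambda e: e.sent_at):
        best, best_score = None, threshold
        for seg in segments:
            score = link_score(seg[-1], msg, half_life_seconds)
            if score >= best_score:
                best, best_score = seg, score
        if best is None:
            segments.append([msg])  # no segment is close enough: start a new one
        else:
            best.append(msg)
    return segments


a = Email("alice", "bob", 0.0, "trading")
b = Email("bob", "carol", 1800.0, "trading")
c = Email("dave", "erin", 2000.0, "trading")
d = Email("carol", "alice", 100000.0, "trading")
segments = segment([a, b, c, d])
```

Here `b` joins `a` because the two share an actor and are close in time, `c` starts its own segment because it shares no actor with the existing one, and `d` starts another because the kernel has decayed well below the threshold by the time it is sent.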


Figure 1. The Trafen Engine, an implementation of Process Query System (PQS) technology, was used to process Enron e-mail.
 

The probabilistic temporal groupings were then used to identify actors' functional roles within a process. The temporal groupings can be modeled as collections of functional primitives that identify the basic role of the actor in a conversation segment. Individuals may play different roles within a social network during different processes, as well as over the life of a process. For example, an actor may initiate an activity, recruit new members into the network, broker active contacts to those members, and then remove him- or herself once a network stabilizes. In more complex processes, the application of functional primitives may represent groups of actors fulfilling roles such as accounting, administration, and human resources.

The functional primitives provide the basic understanding of which role the actor was playing in a conversation. The role label provides information to identify what underlying process is driving communications and helps to identify important actors.5 By determining who initiates the conversation chain, how the conversation thread grows, and who joins the conversation, we were able to infer information unavailable using traditional SNA. The use of functional roles also allowed us to learn the state of the current social network and how various actors were connected. This use of roles within the social networks allowed us to infer the organizational chart of Enron and distinguish various business units within it. Specifically, we were able to identify various executives and associate them with different business units.
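To make the idea of functional primitives concrete, the sketch below labels actors in a single conversation segment by who starts it, who brings in new actors, and who joins. The role names (`initiator`, `recruiter`, `joiner`), the function `assign_roles`, and the tuple layout are illustrative assumptions rather than the primitives actually used in this work.

```python
from collections import defaultdict


def assign_roles(segment):
    """Assign simple role labels within one conversation segment.

    `segment` is a list of (sender, recipient, sent_at) tuples; the
    layout and the role names are hypothetical.
    """
    ordered = sorted(segment, key=lambda m: m[2])
    roles = defaultdict(set)
    seen = set()
    for index, (sender, recipient, _) in enumerate(ordered):
        if index == 0:
            roles[sender].add("initiator")  # sends the first message
            seen.add(sender)
        if sender not in seen:
            roles[sender].add("joiner")     # enters by sending unprompted
            seen.add(sender)
        if recipient not in seen:
            roles[sender].add("recruiter")  # brings a new actor in
            roles[recipient].add("joiner")
            seen.add(recipient)
    return {actor: sorted(labels) for actor, labels in roles.items()}


conversation = [("alice", "bob", 0), ("bob", "carol", 10), ("carol", "alice", 20)]
role_table = assign_roles(conversation)
```

An actor can hold several labels in one segment, which matches the observation that individuals may play different roles over the life of a process.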

To perform meaningful social network analysis, a means for reducing and filtering massive volumes of data is necessary. In addition, the ability to perform social network analysis in an online and predictive fashion would be a great improvement over current methods. Incorporating dynamics through a process-based approach has allowed us to analyze social networks in real time and to establish basic roles and hierarchy with little natural language processing. We will continue this work, expanding on the notion of a process to define higher-level roles and to model various group-level processes.


Figure 2. Process-based analysis of the relative timing of messages allowed a specific conversation segment to be extracted from a background of other messages.

Author
Wayne Chung
Thayer School of Engineering at Dartmouth College
Hanover, NH

References:
1. S. Wasserman, K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, New York, 1994. doi:10.2277/0521387078
2. R. Hanneman, Introduction to Social Network Analysis, 2001.