Proceedings Volume 9122

Next-Generation Analyst II

Barbara D. Broome, David L. Hall, James Llinas
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 10 June 2014
Contents: 6 Sessions, 21 Papers, 0 Presentations
Conference: SPIE Sensing Technology + Applications 2014
Volume Number: 9122

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 9122
  • Information Fusion and Analysis
  • Information Visualization
  • Big Data and Information Management
  • Participatory Sensing & Cognition
  • Poster Session
Front Matter: Volume 9122
This PDF file contains the front matter associated with SPIE Proceedings Volume 9122 including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee listing.
Information Fusion and Analysis
Automatic theory generation from analyst text files using coherence networks
This paper describes a three-phase process for extracting knowledge from analyst textual reports. Phase 1 performs natural language processing on the source text to extract subject-predicate-object triples. In phase 2, these triples are fed into a coherence network analysis process using genetic algorithm optimization. Finally, the highest-value subnetworks are processed into a semantic network graph for display. Initial work on a well-known data set (a Wikipedia article on Abraham Lincoln) has shown excellent results without any specific tuning. Next, we ran the process on the SYNthetic Counter-INsurgency (SYNCOIN) data set, developed at Penn State, yielding interesting and potentially useful results.
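As a concrete illustration of phase 1, the sketch below extracts subject-predicate-object triples with a simple dependency heuristic. The paper does not name its NLP toolkit; spaCy, the model name, and the heuristic are assumptions for illustration only.

```python
# Minimal sketch of Phase 1 (subject-predicate-object extraction) using spaCy.
# The dependency heuristic below is illustrative, not the authors' implementation.
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_triples(text):
    """Yield (subject, predicate, object) triples from verb dependencies."""
    doc = nlp(text)
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [t for t in token.children if t.dep_ in ("nsubj", "nsubjpass")]
            objects = [t for t in token.children if t.dep_ in ("dobj", "attr")]
            for s in subjects:
                for o in objects:
                    yield (s.lemma_, token.lemma_, o.lemma_)

for triple in extract_triples("Abraham Lincoln led the Union during the Civil War."):
    print(triple)  # e.g. ('Lincoln', 'lead', 'Union')
```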
Using Complex Event Processing (CEP) and vocal synthesis techniques to improve comprehension of sonified human-centric data
Jeff Rimland, Mark Ballora
The field of sonification, which uses auditory presentation of data to replace or augment visualization techniques, is gaining popularity and acceptance for analysis of “big data” and for assisting analysts who are unable to utilize traditional visual approaches due to either: 1) visual overload caused by existing displays; 2) concurrent need to perform critical visually intensive tasks (e.g. operating a vehicle or performing a medical procedure); or 3) visual impairment due to either temporary environmental factors (e.g. dense smoke) or biological causes. Sonification tools typically map data values to sound attributes such as pitch, volume, and localization to enable them to be interpreted via human listening. In more complex problems, the challenge is in creating multi-dimensional sonifications that are both compelling and listenable, and that have enough discrete features that can be modulated in ways that allow meaningful discrimination by a listener. We propose a solution to this problem that incorporates Complex Event Processing (CEP) with speech synthesis. Some of the more promising sonifications to date use speech synthesis, which is an "instrument" that is amenable to extended listening, and can also provide a great deal of subtle nuance. These vocal nuances, which can represent a nearly limitless number of expressive meanings (via a combination of pitch, inflection, volume, and other acoustic factors), are the basis of our daily communications, and thus have the potential to engage the innate human understanding of these sounds. Additionally, recent advances in CEP have facilitated the extraction of multi-level hierarchies of information, which is necessary to bridge the gap between raw data and this type of vocal synthesis. We therefore propose that CEP-enabled sonifications based on the sound of human utterances could be considered the next logical step in human-centric "big data" compression and transmission.
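For readers unfamiliar with parameter-mapping sonification, the sketch below maps a data series to the pitch of successive sine tones and writes a WAV file. It is a generic illustration of the pitch/volume mapping the abstract mentions, not the authors' CEP or vocal-synthesis pipeline; all parameter choices are assumptions.

```python
# Generic parameter-mapping sonification sketch: data values drive pitch.
import numpy as np
import wave

def sonify(values, sr=44100, note_dur=0.25, fname="sonified.wav"):
    lo, hi = min(values), max(values)
    tones = []
    for v in values:
        norm = (v - lo) / ((hi - lo) or 1.0)   # scale value to [0, 1]
        freq = 220.0 * 2 ** (norm * 2)         # map to 220-880 Hz (two octaves)
        t = np.linspace(0, note_dur, int(sr * note_dur), endpoint=False)
        tones.append(0.5 * np.sin(2 * np.pi * freq * t))
    signal = np.concatenate(tones)
    with wave.open(fname, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)                      # 16-bit samples
        w.setframerate(sr)
        w.writeframes((signal * 32767).astype(np.int16).tobytes())

sonify([3, 1, 4, 1, 5, 9, 2, 6])               # rising pitch tracks rising values
```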
A data fusion approach to indications and warnings of terrorist attacks
David McDaniel, Gregory Schaefer
Indications and Warning (I&W) of terrorist attacks, particularly IED attacks, require detection of networks of agents and patterns of behavior. Social Network Analysis tries to detect a network; activity analysis tries to detect anomalous activities. This work builds on both to detect elements of an activity model of terrorist attack activity: the agents, resources, networks, and behaviors. The activity model is expressed as RDF triple statements whose tuple positions are elements or subsets of a formal ontology for activity models. The advantage of a model is that its elements are interdependent, so evidence for or against one will influence the others, producing a multiplier effect. The advantage of the formality is that detection can occur hierarchically, that is, at different levels of abstraction. The model matching is expressed as a likelihood ratio between input text and the model triples. The likelihood ratio is designed to be analogous to the track correlation likelihood ratios common in JDL fusion level 1. This required development of a semantic distance metric for positive and null hypotheses as well as for complex objects. The metric uses the Web 1T (one-terabyte) database of one- to five-gram frequencies for priors. This size requires the use of big data technologies, so a Hadoop cluster is used in conjunction with OpenNLP natural language and Mahout clustering software. Distributed data fusion MapReduce jobs distribute parts of the data fusion problem to the Hadoop nodes. For the purposes of this initial testing, open source models and text inputs of similar complexity to terrorist events were used as surrogates for the intended counter-terrorist application.
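The sketch below illustrates the shape of such a likelihood-ratio computation over triples. The semantic distance is a crude string-similarity placeholder; the paper's metric is built on Web 1T n-gram frequencies and runs on a Hadoop cluster, neither of which is reproduced here.

```python
# Toy likelihood-ratio match between an observed triple and a model triple.
# The distance function and the null prior are illustrative placeholders.
from difflib import SequenceMatcher

def semantic_distance(a, b):
    # Placeholder: string similarity standing in for the n-gram-based metric.
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()

def likelihood_ratio(obs_triple, model_triple, p_null=0.5):
    # P(observation | model) / P(observation | null hypothesis),
    # with per-slot distances converted to pseudo-likelihoods.
    p_match = 1.0
    for obs, mod in zip(obs_triple, model_triple):
        p_match *= max(1.0 - semantic_distance(obs, mod), 1e-6)
    return p_match / (p_null ** len(obs_triple))

obs = ("insurgent cell", "acquires", "detonator")
model = ("agent", "acquires", "IED component")
print(likelihood_ratio(obs, model))   # ratio > 1 favors the model hypothesis
```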
Warfighter information services: lessons learned in the intelligence domain
S. E. Bray
A previous paper presented a vision of how a common set of services within a framework could provide all the information-processing needs of Warfighters. Central to that vision was the concept of a “Virtual Knowledge Base”. This paper presents an implementation of these ideas in the intelligence domain. Several innovative technologies were employed in the solution; these are presented and their benefits explained. The project was successful, validating many of the design principles for such a system that had been proposed in earlier work. Many of these principles are discussed in detail, explaining lessons learned. The results showed that it is possible to make vast improvements in the ability to exploit available data, making it discoverable and queryable, wherever it resides, from anywhere within a participating network; and to exploit machine reasoning to make faster and better inferences from available data, enabling human analysts to spend more of their time on difficult analytical tasks rather than on searching for relevant data. It was also demonstrated that a small number of generic information-processing services can be combined and configured in a variety of ways (without changing any software code) to create “fact-processing” workflows, in this case to create different intelligence analysis capabilities. It remains to be demonstrated that the same generic services can be reused to create analytical and situational-awareness capabilities for logistics, operations, planning, or other military functions, but this is considered likely.
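A minimal sketch of the configuration-driven composition idea follows: generic services are selected and ordered purely by a workflow description, with no code changes. The service names and registry are invented for illustration and are not the project's actual framework.

```python
# Sketch: composing generic services into a "fact-processing" workflow from
# configuration alone. Service names and behaviors are illustrative stand-ins.
SERVICES = {
    "extract": lambda facts: [f.strip().lower() for f in facts],
    "filter":  lambda facts: [f for f in facts if "unknown" not in f],
    "enrich":  lambda facts: [f + " [geo-tagged]" for f in facts],
}

def run_workflow(config, facts):
    for step in config:              # the workflow is defined purely by config
        facts = SERVICES[step](facts)
    return facts

print(run_workflow(["extract", "filter", "enrich"],
                   ["Convoy at grid 123 ", "UNKNOWN contact "]))
# -> ['convoy at grid 123 [geo-tagged]']
```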
A survey of automated methods for sensemaking support
Complex, dynamic problems present a challenge for the design of analysis support systems and tools, largely because there is limited reliable a priori procedural knowledge describing the dynamic processes in the environment. Problem domains that are non-cooperative or adversarial introduce added difficulties involving suboptimal observational data and/or data containing the effects of deception or covertness. The fundamental nature of analysis in these environments is based on composite approaches involving mining or foraging over the evidence, discovery and learning processes, and the synthesis of fragmented hypotheses; together, these can be labeled sensemaking procedures. This paper reviews and analyzes the features, benefits, and limitations of a variety of automated techniques that offer possible support to sensemaking processes in these problem domains.
Information Visualization
Neural network based visualization of collaborations in a citizen science project
Alessandra M. M. Morais, Rafael D. C. Santos, M. Jordan Raddick
Citizen science projects are those in which volunteers collaborate in scientific projects, usually by donating idle computer time for distributed data processing efforts or by actively labeling or classifying information. Shapes of galaxies, whale sounds, and historical records are all examples of data that citizen scientists label or classify through a data collection system. To be successful, a citizen science project must captivate users and keep them interested in the project and in the science behind it, thereby increasing the time they spend collaborating with the project. Understanding the behavior of citizen scientists and their interaction with the data collection systems may help increase user involvement, categorize users according to different parameters, facilitate their collaboration with the systems, inform the design of better user interfaces, and allow better planning and deployment of similar projects and systems. User behavior can be actively monitored or derived from interaction with the data collection systems, and records of these interactions can be analyzed using visualization techniques to identify patterns and outliers. In this paper we present results on the visualization of more than 80 million interactions of almost 150 thousand users with the Galaxy Zoo I citizen science project. Visualization of the attributes extracted from their behaviors was done with a clustering neural network (the Self-Organizing Map) and a selection of icon- and pixel-based techniques. These techniques allow the visual identification of groups of similar behavior in several different ways.
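The sketch below shows a minimal self-organizing map of the kind used for the behavioral clustering described above, implemented directly in NumPy. The random feature matrix stands in for the per-user attributes extracted from the Galaxy Zoo interaction logs; grid size and training schedule are illustrative assumptions.

```python
# Minimal self-organizing map (SOM) in NumPy as a stand-in for the paper's
# clustering neural network; data and hyperparameters are illustrative.
import numpy as np

def train_som(data, grid=(10, 10), iters=5000, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    ys, xs = np.mgrid[0:h, 0:w]
    for i in range(iters):
        x = data[rng.integers(len(data))]
        # Best-matching unit (BMU): node whose weight vector is closest to x.
        d = np.linalg.norm(weights - x, axis=2)
        by, bx = np.unravel_index(d.argmin(), d.shape)
        # Decay learning rate and neighborhood radius over time.
        frac = 1.0 - i / iters
        lr, sigma = lr0 * frac, sigma0 * frac + 1e-3
        dist2 = (ys - by) ** 2 + (xs - bx) ** 2
        g = np.exp(-dist2 / (2 * sigma ** 2))[:, :, None]
        weights += lr * g * (x - weights)
    return weights

features = np.random.rand(1000, 6)   # e.g. per-user session counts and rates
som = train_som(features)            # nearby nodes end up with similar users
```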
Visualizing common operating picture of critical infrastructure
Lauri Rummukainen, Lauri Oksama, Jussi Timonen, et al.
This paper presents a solution for visualizing the common operating picture (COP) of critical infrastructure (CI). The purpose is to improve the situational awareness (SA) of strategic-level actors and source system operators in order to support decision making. The information is obtained through the Situational Awareness of Critical Infrastructure and Networks (SACIN) framework. The system consists of an agent-based solution for gathering, storing, and analyzing the information, together with the user interface (UI) presented in this paper. The UI consists of multiple views visualizing information from the CI in different ways. CI actors are categorized into 11 separate sectors, and events are used to represent meaningful incidents. Past and current states, together with geographical distribution and logical dependencies, are presented to the user. Current states are visualized as segmented circles representing event categories. Geographical distribution of assets is displayed with a well-known map tool. Logical dependencies are presented in a simple directed graph, and users also have a timeline for reviewing past events. The objective of the UI is to provide an easily understandable overview of the CI status. Testing methods such as a walkthrough, an informal walkthrough, and the Situation Awareness Global Assessment Technique (SAGAT) were therefore used in the evaluation of the UI. Results showed that users were able to obtain an understanding of the current state of the CI, and the usability of the UI was rated as good. In particular, the designated display for the CI overview and the timeline were found to be efficient.
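As a rough illustration of the logical-dependency view, the sketch below renders a small directed graph of critical-infrastructure sectors with NetworkX and Matplotlib. The sector names and edges are invented examples; the SACIN UI itself is a custom application rather than this toolchain.

```python
# Illustrative directed graph of CI dependencies (edge A -> B: B depends on A).
import networkx as nx
import matplotlib.pyplot as plt

deps = nx.DiGraph()
deps.add_edges_from([
    ("Energy", "Water supply"),   # water pumping depends on power
    ("Energy", "Telecom"),
    ("Telecom", "Finance"),
    ("Transport", "Food supply"),
])
nx.draw_networkx(deps, pos=nx.spring_layout(deps, seed=3),
                 node_color="lightgray", arrows=True)
plt.axis("off")
plt.show()
```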
Visualization of multi-INT fusion data using Java Viewer (JVIEW)
Erik Blasch, Alex Aved, James Nagy, et al.
Visualization is important for multi-intelligence fusion, and we demonstrate issues in presenting physics-derived (i.e., hard) and human-derived (i.e., soft) fusion results. Physics-derived solutions (e.g., imagery) typically involve sensor measurements that are objective, while human-derived solutions (e.g., text) typically involve language processing. Both kinds of results can be geographically displayed for user-machine fusion. Attributes of an effective and efficient display are not well understood, so we demonstrate issues and results for filtering, correlation, and association of data for users, be they operators or analysts. Operators require near-real-time solutions, while analysts have the opportunity of non-real-time solutions for forensic analysis. In a use case, we demonstrate examples using the JVIEW concept, which has been applied to piloting, space situational awareness, and cyber analysis. Using the open-source JVIEW software, we showcase a big data solution for a multi-intelligence fusion application for context-enhanced information fusion.
A visual analytic framework for data fusion in investigative intelligence
Guoray Cai, Geoff Gross, James Llinas, et al.
Intelligence analysis depends on data fusion systems to provide capabilities for detecting and tracking important objects, events, and their relationships in connection with an analytical situation. However, automated data fusion technologies are not mature enough to offer reliable and trustworthy information for situation awareness. Given the trend of increasing sophistication of data fusion algorithms and decreasing transparency of the data fusion process, analysts are left out of the data fusion process cycle with little to no control over, or confidence in, the data fusion outcome. Following the recent rethinking of data fusion as a human-centered process, this paper proposes a conceptual framework for developing an alternative data fusion architecture. The idea is inspired by recent advances in our understanding of human cognitive systems, the science of visual analytics, and the latest thinking about human-centered data fusion. Our conceptual framework is supported by an analysis of the limitations of existing fully automated data fusion systems, in which the effectiveness of important algorithmic decisions depends on the availability of expert knowledge or knowledge of the analyst's mental state in an investigation. The success of this effort will result in next-generation data fusion systems that can be better trusted while maintaining high throughput.
Human terrain exploitation suite: applying visual analytics to open source information
Timothy Hanratty, John Richardson, Mark Mittrick, et al.
This paper presents the concept development and demonstration of the Human Terrain Exploitation Suite (HTES) under development at the U.S. Army Research Laboratory’s Tactical Information Fusion Branch. The HTES is an amalgamation of four complementary visual analytic capabilities that target the exploitation of open source information. Open source information, specifically news feeds, blogs and other social media, provide a unique opportunity to collect and examine salient topics and trends. Analysis of open source information provides valuable insights into determining opinions, values, cultural nuances and other socio-political aspects within a military area of interest. The early results of the HTES field study indicate that the tools greatly increased the analysts’ ability to exploit open source information, but improvement through greater cross-tool integration and correlation of their results is necessary for further advances.
Big Data and Information Management
Profile-based autonomous data feeding: an approach to the information retrieval problem in a high communications latency environment
This paper proposes the use of user profiles for selecting and prioritizing data for transmission. The approach has three variants. First, a profile can be created for an individual user. This may provide the best results; however, it requires transmitting a separate profile up for each prospective user. Second, each user's correspondence with a set of shared profiles can be tracked. Finally, this can be extended to match a user not just with a single profile but with (possibly different) profiles for each dimension tracked. The benefits of each of these approaches are discussed and the implementation pathway is considered.
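A minimal sketch of the underlying selection step, assuming items and user profiles are described by feature vectors: items are ranked by similarity to the profile and sent greedily until a transmission budget is exhausted. The cosine scoring and byte budget are illustrative assumptions, not the paper's design.

```python
# Profile-based prioritization sketch: rank items by similarity to the user's
# profile vector and fill a limited downlink greedily. All values illustrative.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def select_for_transmission(items, user_profile, byte_budget):
    """items: list of dicts with 'id', 'features' (np.array), 'size' (bytes)."""
    ranked = sorted(items, key=lambda it: cosine(it["features"], user_profile),
                    reverse=True)
    chosen, used = [], 0
    for it in ranked:
        if used + it["size"] <= byte_budget:   # greedy fill of the link budget
            chosen.append(it)
            used += it["size"]
    return chosen

profile = np.array([0.9, 0.1, 0.4])            # user's learned interest vector
items = [{"id": i, "features": np.random.rand(3), "size": 100} for i in range(20)]
print([it["id"] for it in select_for_transmission(items, profile, byte_budget=500)])
```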
Exploiting social media for Army operations: Syrian crisis use case
Sue E. Kase, Elizabeth K. Bowman, Tanvir Al Amin, et al.
Millions of people exchange user-generated information through online social media (SM) services. The prevalence of SM use globally and its growing significance to the evolution of events has attracted the attention of the Army and other agencies charged with protecting national security interests. The information exchanged in SM sites and the networks of people who interact with these online communities can provide value to Army intelligence efforts. SM could facilitate the Military Decision Making Process by providing ongoing assessment of military actions from a local citizen perspective. Despite potential value, there are significant technological barriers to leveraging SM. SM collection and analysis are difficult in the dynamic SM environment and deception is a real concern. This paper introduces a credibility analysis approach and prototype fact-finding technology called the “Apollo Fact-finder” that mitigates the problem of inaccurate or falsified SM data. Apollo groups data into sets (or claims), corroborating specific observations, then iteratively assesses both claim and source credibility resulting in a ranking of claims by likelihood of occurrence. These credibility analysis approaches are discussed in the context of a conflict event, the Syrian civil war, and applied to tweets collected in the aftermath of the Syrian chemical weapons crisis.
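The sketch below shows a generic mutual-reinforcement truth-finding iteration of the kind the abstract describes: claim credibility and source reliability are alternately updated until claims can be ranked. It is a textbook-style stand-in, not Apollo's published algorithm.

```python
# Generic fact-finding iteration: credible claims are backed by reliable
# sources, and reliable sources assert credible claims. Data is illustrative.
def fact_find(observations, iters=20):
    """observations: list of (source, claim) pairs."""
    sources = {s for s, _ in observations}
    claims = {c for _, c in observations}
    rel = {s: 1.0 for s in sources}            # source reliability
    cred = {c: 1.0 for c in claims}            # claim credibility
    for _ in range(iters):
        cred = {c: sum(rel[s] for s, cl in observations if cl == c)
                for c in claims}
        top = max(cred.values())
        cred = {c: v / top for c, v in cred.items()}   # normalize each round
        rel = {s: sum(cred[c] for src, c in observations if src == s)
               for s in sources}
        top = max(rel.values())
        rel = {s: v / top for s, v in rel.items()}
    return sorted(cred.items(), key=lambda kv: -kv[1])

observations = [("@alice", "strike hit the depot"), ("@bob", "strike hit the depot"),
                ("@carol", "strike hit the depot"), ("@eve", "no strike occurred")]
print(fact_find(observations))    # claims ranked by estimated credibility
```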
A qualitative readiness-requirements assessment model for enterprise big-data infrastructure investment
Mohammed M. Olama, Allen W. McNair, Sreenivas R. Sukumar, et al.
Over the last three decades, information technology has grown exponentially to serve the information processing needs of data-driven businesses in government, science, and private industry by capturing, staging, integrating, conveying, analyzing, and transferring data that help knowledge workers and decision makers make sound business decisions. Data integration across enterprise warehouses is one of the most challenging steps in a big data analytics strategy. Several levels of data integration have been identified across enterprise warehouses: data accessibility, common data platform, and consolidated data model. Each level of integration has its own set of complexities and requires a certain amount of time, budget, and resources to implement; the levels are designed to address the technical challenges inherent in consolidating disparate data sources. In this paper, we present a methodology based on industry best practices to measure the readiness of an organization and its data sets against the different levels of data integration. We introduce a new Integration Level Model (ILM) tool for quantifying the readiness of an organization and its data systems to share data at a given level of data integration. It is based largely on the established and accepted framework provided by the Data Management Association's Data Management Body of Knowledge (DAMA-DMBOK). It comprises several key data management functions and supporting activities, together with several environmental elements that describe and apply to each function. The proposed model scores the maturity of a system's data governance processes and provides a pragmatic methodology for evaluating integration risks. The higher the computed scores, the better managed the source data system and the greater the likelihood that the data system can be brought in at a higher level of integration.
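A toy sketch of how such a readiness score might be computed: each data management function is rated, the ratings are combined into a weighted maturity score, and the score is compared against per-level thresholds. The function names, weights, and cutoffs are invented for illustration and are not the ILM's actual values.

```python
# ILM-style readiness scoring sketch; all names and thresholds are assumptions.
LEVELS = [("data accessibility", 2.0), ("common data platform", 3.0),
          ("consolidated data model", 4.0)]           # (level, minimum score)

def readiness(ratings, weights):
    """ratings: 0-5 per data management function; returns achievable levels."""
    score = sum(weights[f] * ratings[f] for f in ratings) / sum(weights.values())
    return [level for level, cutoff in LEVELS if score >= cutoff], score

ratings = {"data governance": 3, "data quality": 4,
           "metadata management": 2, "data security": 4}
weights = {f: 1.0 for f in ratings}                   # equal weights as a default
print(readiness(ratings, weights))
# -> (['data accessibility', 'common data platform'], 3.25)
```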
Utilizing semantic Wiki technology for intelligence analysis at the tactical edge
Challenges exist for intelligence analysts to efficiently and accurately process large amounts of data collected from a myriad of available data sources. These challenges are even more evident for analysts who must operate within small military units at the tactical edge. In such environments, decisions must be made quickly without guaranteed access to the kinds of large-scale data sources available to analysts working at intelligence agencies. Improved technologies must be provided to analysts at the tactical edge so that they can make informed, reliable decisions, since the tactical edge is often a critical collection point for important intelligence data. To aid tactical edge users, new types of intelligent, automated technology interfaces are required that allow them to rapidly explore information at the intersection of hard and soft data fusion, such as multi-INT signals, semantic models, social network data, and natural language processing of text. The ability to fuse these types of data is paramount to providing decision superiority. For these types of applications, we have developed BLADE. BLADE allows users to dynamically add, delete, and link data via a semantic wiki, allowing for improved interaction between different users. Analysts can see information updates in near-real time thanks to a common underlying set of semantic models operating within a triple store, which allows for updates on related data points from independent users tracking different items (persons, events, locations, organizations, etc.). The wiki can capture pictures, videos, and related information. New information added directly to pages is automatically updated in the triple store, and its provenance and pedigree are tracked over time, making that data more trustworthy and more easily integrated with other users' pages.
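The sketch below illustrates the wiki-to-triple-store flow using rdflib as a stand-in store, recording a simple provenance record alongside each asserted triple. BLADE's actual store, schema, and namespaces are not public; everything here is an assumption for illustration.

```python
# Sketch: asserting wiki facts into a triple store with per-assertion
# provenance. rdflib and the example namespace are illustrative stand-ins.
from datetime import datetime, timezone
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/blade/")
g = Graph()

def add_fact(subj, pred, obj, author):
    s, p, o = EX[subj], EX[pred], EX[obj]
    g.add((s, p, o))
    stmt = EX[f"stmt-{len(g)}"]        # simple provenance record per assertion
    g.add((stmt, EX.assertedBy, Literal(author)))
    g.add((stmt, EX.assertedAt,
           Literal(datetime.now(timezone.utc).isoformat())))
    g.add((stmt, EX.about, s))

add_fact("Checkpoint7", "locatedIn", "SectorEast", author="analyst_3")
print(g.serialize(format="turtle"))
```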
Participatory Sensing & Cognition
User-centric incentive design for participatory mobile phone sensing
Mobile phone sensing is a critical underpinning of pervasive mobile computing, and is one of the key factors for improving people's quality of life in modern society via collective utilization of the on-board sensing capabilities of people's smartphones. The increasing demands for sensing services and ambient awareness in mobile environments highlight the necessity of active participation of individual mobile users in sensing tasks. User incentives for such participation have been continuously offered from an application-centric perspective, i.e., as payments from the sensing server, to compensate users' sensing costs. These payments, however, are manipulated to maximize the benefits of the sensing server, ignoring the runtime flexibility and benefits of participating users. This paper presents a novel framework for user-centric incentive design and develops a universal sensing platform that translates heterogeneous sensing tasks into a generic sensing plan specifying the task-independent requirements of sensing performance. We use this sensing plan as input to reduce three categories of sensing costs, which together cover the possible sources hindering users' participation in sensing.
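A minimal sketch of the task-translation step: a heterogeneous task request is mapped to a generic sensing plan holding task-independent performance requirements. All field names and defaults are invented for illustration; the paper does not publish its plan schema.

```python
# Sketch: translating a task-specific request into a generic sensing plan.
# Field names and defaults are hypothetical, not the paper's schema.
def to_generic_plan(task):
    return {
        "sensors": task.get("sensors", ["accelerometer"]),
        "sample_rate_hz": task.get("rate", 1.0),
        "duty_cycle": min(1.0, task.get("coverage", 0.5)),  # fraction of time on
        "upload_interval_s": task.get("upload_every", 300),
    }

noise_map_task = {"sensors": ["microphone"], "rate": 8000, "coverage": 0.1}
print(to_generic_plan(noise_map_task))
```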
Conversational sensing
Alun Preece, Chris Gwilliams, Christos Parizas, et al.
Recent developments in sensing technologies, mobile devices and context-aware user interfaces have made it possible to represent information fusion and situational awareness for Intelligence, Surveillance and Reconnaissance (ISR) activities as a conversational process among actors at or near the tactical edges of a network. Motivated by use cases in the domain of Company Intelligence Support Team (CoIST) tasks, this paper presents an approach to information collection, fusion and sense-making based on the use of natural language (NL) and controlled natural language (CNL) to support richer forms of human-machine interaction. The approach uses a conversational protocol to facilitate a flow of collaborative messages from NL to CNL and back again in support of interactions such as: turning eyewitness reports from human observers into actionable information (from both soldier and civilian sources); fusing information from humans and physical sensors (with associated quality metadata); and assisting human analysts to make the best use of available sensing assets in an area of interest (governed by management and security policies). CNL is used as a common formal knowledge representation for both machine and human agents to support reasoning, semantic information fusion and generation of rationale for inferences, in ways that remain transparent to human users. Examples are provided of various alternative styles for user feedback, including NL, CNL and graphical feedback. A pilot experiment with human subjects shows that a prototype conversational agent is able to gather usable CNL information from untrained human subjects.
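As a toy illustration of the NL-to-CNL step, the sketch below matches free-text reports against controlled-English templates. The template set and phrasing are invented; the paper's CNL and conversational protocol are considerably richer.

```python
# Toy NL-to-CNL translation via pattern templates; templates are hypothetical.
import re

TEMPLATES = [
    (re.compile(r"(?:i )?(?:saw|spotted) (?:a |an )?(?P<thing>[\w ]+?)"
                r" (?:at|near) (?P<place>[\w ]+)", re.I),
     "there is a {thing} that is located at the place {place}"),
]

def nl_to_cnl(report):
    for pattern, template in TEMPLATES:
        m = pattern.search(report)
        if m:
            return template.format(**m.groupdict())
    return None   # no match: fall back to asking a clarifying question

print(nl_to_cnl("Spotted a grey truck near the north bridge"))
# -> "there is a grey truck that is located at the place the north bridge"
```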
Using cognitive architectures to study issues in team cognition in a complex task environment
Paul R. Smart, Katia Sycara, Yuqing Tang
Cognitive social simulation is a computer simulation technique that aims to improve our understanding of the dynamics of socially-situated and socially-distributed cognition. This makes cognitive social simulation techniques particularly appealing as a means to undertake experiments into team cognition. The current paper reports on the results of an ongoing effort to develop a cognitive social simulation capability that can be used to undertake studies into team cognition using the ACT-R cognitive architecture. This capability is intended to support simulation experiments using a team-based problem solving task, which has been used to explore the effect of different organizational environments on collective problem solving performance. The functionality of the ACT-R-based cognitive social simulation capability is presented and a number of areas of future development work are outlined. The paper also describes the motivation for adopting cognitive architectures in the context of social simulation experiments and presents a number of research areas where cognitive social simulation may be useful in developing a better understanding of the dynamics of team cognition. These include the use of cognitive social simulation to study the role of cognitive processes in determining aspects of communicative behavior, as well as the impact of communicative behavior on the shaping of task-relevant cognitive processes (e.g., the social shaping of individual and collective memory as a result of communicative exchanges). We suggest that the ability to perform cognitive social simulation experiments in these areas will help to elucidate some of the complex interactions that exist between cognitive, social, technological and informational factors in the context of team-based problem-solving activities.
Language and dialect identification in social media analysis
Stephen Tratz, Douglas Briesch, Jamal Laoudi, et al.
Historically-unwritten Arabic dialects are increasingly appearing online in social media texts and are often intermixed with other languages, including Modern Standard Arabic, English, and French. The next generation analyst will need new capabilities to quickly distinguish among the languages appearing in a given text and to identify informative patterns of language switching that occur within a user’s social network—patterns that may correspond to socio-cultural aspects such as participants’ perceived and projected group identity. This paper presents work to (i) collect texts written in Moroccan Darija, a low-resource Arabic dialect from North Africa, and (ii) build an annotation tool that (iii) supports development of automatic language and dialect identification and (iv) provides social and information network visualizations of languages identified in tweet conversations.
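A compact sketch of character n-gram language identification with scikit-learn, a standard baseline for this task. The six inline training strings are placeholders; distinguishing Moroccan Darija from Modern Standard Arabic, French, and English in real tweets requires annotated data of exactly the kind the authors collected.

```python
# Character n-gram language ID baseline; the tiny corpus is a placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["bonjour mon ami", "merci beaucoup", "hello my friend",
               "thanks a lot", "salam labas", "wach nta mzyan"]
train_langs = ["fr", "fr", "en", "en", "darija", "darija"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),  # char n-grams
    MultinomialNB(),
)
clf.fit(train_texts, train_langs)
print(clf.predict(["merci my friend", "labas"]))
```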
Poster Session
Application of the JDL data fusion process model to hard/soft information fusion in the condition monitoring of aircraft
Joseph T. Bernardo
Hard/soft information fusion has been proposed as a way to enhance diagnostic capability for the condition monitoring of machinery. However, there is a limited understanding of where hard/soft information fusion could and should be applied in the condition monitoring of aircraft. Condition-based maintenance refers to the philosophy of performing maintenance when the need arises, based upon indicators of deterioration in the condition of the machinery. The addition of the multisensory capability of human cognition to electronic sensors may create a fuller picture of machinery condition. Since 1988, the Joint Directors of Laboratories (JDL) data fusion process model has served as a framework for information fusion research. Advances are described in the application of hard/soft information fusion in condition monitoring using terms that condition-based maintenance professionals in aviation will recognize. Emerging literature on hard/soft information fusion in condition monitoring is organized into the levels of the JDL data fusion process model. Gaps in the literature are identified, and the author’s ongoing research is discussed. Future efforts will focus on building domain-specific frameworks and experimental design, which may provide a foundation for improving flight safety, increasing mission readiness, and reducing the cost of maintenance operations.
Predicting student success using analytics in course learning management systems
Mohammed M. Olama, Gautam Thakur, Allen W. McNair, et al.
Educational data analytics is an emerging discipline concerned with developing methods for exploring the unique types of data that come from the educational context. For example, predicting college student performance is crucial for both students and educational institutions: it can support timely intervention to prevent students from failing a course, increase the efficacy of advising functions, and improve course completion rates. In this paper, we present efforts carried out at Oak Ridge National Laboratory (ORNL) toward applying predictive analytics to academic data collected from 2009 through 2013 in one of the most commonly used learning management systems, Moodle. First, we identified the data features useful for predicting student outcomes, such as students' scores on homework assignments, quizzes, and exams, their activity in discussion forums, and their overall GPA in the term in which they enrolled in the course. Then, logistic regression and neural network predictive models are used to identify, as early as possible, students who are in danger of failing the course in which they are currently enrolled. These models compute the likelihood of any given student failing (or passing) the current course. Numerical results are presented to evaluate and compare the performance of the developed models and their predictive accuracy.
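A minimal sketch of the logistic-regression variant on synthetic data, assuming scikit-learn; the feature columns mirror those named above (assignment, quiz, and exam scores, forum activity, GPA), and the labeling rule is fabricated purely to make the example run.

```python
# Logistic-regression at-risk prediction sketch on synthetic student features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.random((500, 5))                   # columns: hw, quiz, exam, forum, GPA
y = (X @ np.array([1.5, 1.0, 2.0, 0.5, 1.0])
     + 0.3 * rng.standard_normal(500) < 2.8).astype(int)  # 1 = at risk (synthetic)

model = LogisticRegression().fit(X, y)
p_fail = model.predict_proba(X[:5])[:, 1]  # likelihood each student fails
print(np.round(p_fail, 2))
```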