Proceedings Volume 9499

Next-Generation Analyst III

Barbara D. Broome, Timothy P. Hanratty, David L. Hall, et al.
Purchase the printed version of this volume at proceedings.com or access the digital version at SPIE Digital Library.

Volume Details

Date Published: 15 June 2015
Contents: 7 Sessions, 25 Papers, 0 Presentations
Conference: SPIE Sensing Technology + Applications 2015
Volume Number: 9499

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 9499
  • Exploitation of Social Media
  • Advance Concepts I
  • Emerging Technology
  • Human Machine Interaction
  • Advance Concepts II
  • Interactive Poster Session
Front Matter: Volume 9499
Front Matter: Volume 9499
This PDF file contains the front matter associated with SPIE Proceedings Volume 9499, including the Title Page, Copyright information, Table of Contents, Introduction (if any), Authors, and Conference Committee listing.
Exploitation of Social Media
Challenges in the use of social media data for the next generation analyst
This paper discusses the opportunities and challenges present for the next generation analyst in the use of social media data. Focusing particularly on the detection of deception and misinformation within such data, a review of current approaches is followed by the elaboration of a theoretical model for social media analysis premised on activity-based intelligence. Considering this model with regard to latent challenges to analytical performance and potential opportunities for analytical calibration, the discussion articulates an approach for open-source, next generation intelligence analysis.
Yik Yak: a social media sensor
This is the first academic paper that focuses specifically on the new social media application Yik Yak. To provide a solid foundation, a brief overview of several anonymous social media platforms is given. A social media sensor framework is then presented that uses a three-layered approach to the application of analytic tools. Specifically, the use of keyword, geolocation, sentiment, and network analysis is explored from the perspective of social media as a sensor. Challenges and criticisms are exposed, along with some possible solutions. A theoretical case study is then offered that outlines a potential use of social media as a sensor for emergency managers. The paper culminates with a data collection effort for the development of a Yik Yak lexicon: an 18-day study that collected Yik Yak posts and Twitter tweets simultaneously. The top 100 keywords for each platform were collected for every 24-hour period and run through a relative change comparison. Overall, Yik Yak offers a more stable baseline compared to Twitter.
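As an illustration of the relative change comparison described above, the following minimal Python sketch compares the top keyword lists of two consecutive 24-hour windows. The tokenization, the top-100 cutoff, and the turnover metric are illustrative assumptions, not the authors' published procedure.

```python
from collections import Counter

def top_keywords(posts, n=100):
    """Count whitespace tokens across a day's posts and return the top-n keywords."""
    counts = Counter(token.lower() for post in posts for token in post.split())
    return [word for word, _ in counts.most_common(n)]

def relative_change(top_today, top_yesterday):
    """Fraction of today's top keywords that were NOT in yesterday's top list.
    0.0 = identical lists (stable baseline), 1.0 = complete turnover."""
    previous = set(top_yesterday)
    changed = sum(1 for word in top_today if word not in previous)
    return changed / len(top_today)

# Toy example: two consecutive 24-hour windows from one platform.
day1 = ["exam week is brutal", "anyone at the library", "exam stress"]
day2 = ["exam week is brutal", "snow day tomorrow", "library is packed"]
print(relative_change(top_keywords(day2), top_keywords(day1)))
```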
Employing socially driven techniques for framing, contextualization, and collaboration in complex analytical threads
Arthur Wollocko, Jennifer Danczyk, Michael Farry, et al.
The proliferation of sensor technologies continues to impact Intelligence Analysis (IA) work domains. A historical procurement focus on sensor platform development and acquisition has resulted in increasingly advanced collection systems; however, such systems often demonstrate classic data overload conditions by placing increased burdens on already overtaxed human operators and analysts. Support technologies and improved interfaces have begun to emerge to ease that burden, but these often focus on single modalities or sensor platforms rather than underlying operator and analyst support needs, resulting in systems that do not adequately leverage operators' natural attentional competencies, unique skills, and training. One particular reason emerging support tools often fail is the gap between military applications and their functions, and the functions and capabilities afforded by cutting-edge technology employed daily by modern knowledge workers who are increasingly “digitally native.” With the entry of Generation Y into these workplaces, “net generation” analysts, who are familiar with socially driven platforms that excel at giving users insight into large data sets while keeping cognitive burdens at a minimum, are creating opportunities for enhanced workflows. By using these ubiquitous platforms, net generation analysts have trained skills in discovering new information socially, tracking trends among affinity groups, and disseminating information. However, these functions are currently under-supported by existing tools. In this paper, we describe how socially driven techniques can be contextualized to frame complex analytical threads throughout the IA process. This paper focuses specifically on collaborative support technology development efforts for a team of operators and analysts. Our work focuses on under-supported functions in current working environments, and identifies opportunities to improve a team’s ability to discover new information and disseminate insightful analytic findings. We describe our Cognitive Systems Engineering approach to developing a novel collaborative enterprise IA system that combines modern collaboration tools with familiar contemporary social technologies. Our current findings detail specific cognitive and collaborative work support functions that defined the design requirements for a prototype analyst collaborative support environment.
Social network analysis realization and exploitation
Jack H. Davenport, James J. Nolan
Intelligence analysts demand rapid information fusion capabilities to develop and maintain accurate situational awareness and understanding of dynamic enemy threats in asymmetric military operations. The ability to extract meaning in relationships between people, objects, and locations from a variety of unstructured text datasets is critical to proactive decision making. Additionally, the ability to automatically cluster text documents about entities and discover connections between those documents allows the analyst to navigate an extremely large collection of documents. Analysts also demand a temporal understanding of the extracted relationships between entities and connections between documents. We describe approaches to automatically realize the social networks via concept extraction, relationship extraction, and document connection algorithms; we also describe approaches to exploit the network by visualizing the results to the analyst such that changes over time are evident.
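The following minimal Python sketch, offered only as an illustration, shows how extracted relationship triples might be assembled into a time-stamped network for analyst exploitation. The triple format, attribute names, and use of the networkx library are assumptions and do not reflect the authors' implementation.

```python
import networkx as nx

# Hypothetical (subject, relation, object, timestamp) triples produced by
# upstream concept/relationship extraction over unstructured text.
triples = [
    ("Person_A", "met_with", "Person_B", "2015-03-01"),
    ("Person_B", "located_at", "Warehouse_7", "2015-03-02"),
    ("Person_A", "located_at", "Warehouse_7", "2015-03-05"),
]

graph = nx.MultiDiGraph()
for subj, relation, obj, when in triples:
    # Keep the relation type and timestamp as edge attributes so the analyst
    # can filter the network by time window and see how it changes.
    graph.add_edge(subj, obj, relation=relation, timestamp=when)

# Example exploitation: list everything connected to a location of interest.
print(list(graph.in_edges("Warehouse_7", data=True)))
```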
Advance Concepts I
A scalable architecture for extracting, aligning, linking, and visualizing multi-Int data
Craig A. Knoblock, Pedro Szekely
An analyst today has a tremendous amount of data available, but each of the various data sources typically exists in its own silo, so an analyst has a limited ability to see an integrated view of the data and has little or no access to contextual information that could help in understanding the data. We have developed the Domain-Insight Graph (DIG) system, an innovative architecture for extracting, aligning, linking, and visualizing massive amounts of domain-specific content from unstructured sources. Under the DARPA Memex program we have already successfully applied this architecture to multiple application domains, including the enormous international problem of human trafficking, where we extracted, aligned, and linked data from 50 million online Web pages. DIG builds on our Karma data integration toolkit, which makes it easy to rapidly integrate structured data from a variety of sources, including databases, spreadsheets, XML, JSON, and Web services. The ability to integrate Web services allows Karma to pull in live data from various social media sites, such as Twitter, Instagram, and OpenStreetMap. DIG then indexes the integrated data and provides an easy-to-use interface for query, visualization, and analysis.
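As a rough illustration of the kind of alignment step such a toolkit performs, the sketch below maps two differently structured source records onto a single shared schema and then links them on a common key. The field names and mapping functions are invented for illustration and are not Karma's actual mapping model.

```python
# Two records describing the same kind of person, arriving in different shapes.
source_a = {"name": "J. Doe", "phone_number": "555-0100", "city": "Los Angeles"}
source_b = {"full_name": "Jane Doe", "tel": "555-0100", "location": {"town": "Los Angeles"}}

# Per-source mapping functions onto a shared target schema (name, phone, city).
def map_a(rec):
    return {"name": rec["name"], "phone": rec["phone_number"], "city": rec["city"]}

def map_b(rec):
    return {"name": rec["full_name"], "phone": rec["tel"], "city": rec["location"]["town"]}

aligned = [map_a(source_a), map_b(source_b)]

# Once aligned to one schema, records can be indexed and linked on shared keys,
# e.g. grouping by phone number to connect the two sources.
by_phone = {}
for rec in aligned:
    by_phone.setdefault(rec["phone"], []).append(rec)
print(by_phone)
```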
Classification of short-lived objects using an interactive adaptable assistance system
“Although we know that it is not a familiar object, after a while we can say what it resembles.” The core task of an aerial image analyst is to recognize different object types in aerial or satellite images based on clearly classified characteristics. An interactive recognition assistance system compares selected features with a fixed set of reference objects (the core data set) and is therefore mainly designed to evaluate durable, single objects such as a specific type of ship or vehicle. Over time, aerial image analysts on missions encountered a changed form of warfare: the task was no longer to classify, and thereby recognize, a single durable object, but to classify highly variable objects that the reference set no longer matched. To address this new scope, we introduce a concept for further development of the interactive assistance system so that it can also handle short-lived, not clearly classifiable, and highly variable objects such as dhows. Dhows are the type of ship often used during pirate attacks off the coast of West Africa. These ships are often built or extended by the pirates themselves and follow no particular pattern, unlike the standard construction of a merchant ship. In this work we distinguish between short-lived and durable objects. The interactive adaptable assistance system is intended to assist image analysts with the classification of objects that are new and not yet listed in the reference set. Human interaction and perception are important factors in realizing this task and achieving recognition. We therefore had to model the classification of short-lived objects with appropriate procedures that take all aspects of such objects into consideration. In this paper we outline suitable measures and possibilities for categorizing short-lived objects via simple basic shapes, as well as a temporary data storage concept for short-lived objects. The interactive adaptable approach offers the possibility of inserting data (objects) into the system directly and on site. To mitigate the risk of manipulation, entry of data (objects) into the main reference (core data) set is granted only to a central authorized unit.
Recognition of human-vehicle interactions in group activities via multi-attributed semantic message generation
Vinayak Elangovan, Amir Shirkhodaie
Improved situational awareness has been a vital ongoing research effort for U.S. homeland security in recent years. Many outdoor anomalous activities involve vehicles as the primary means of transportation to and from the scene where a plot is executed. Analysis of the dynamics of Human-Vehicle Interaction (HVI) helps to identify correlated patterns of activities representing potential threats. The objective of this paper is twofold. First, we discuss a method for temporal HVI event detection and verification for the generation of HVI hypotheses. To recognize HVI events effectively, a Multi-attribute Vehicle Detection and Identification (MVDI) technique for detection and classification of stationary vehicles is presented. Second, we describe a method for identifying pertinent anomalous behaviors through analysis of state transitions between two successively detected events. Finally, we present a technique for generating HVI semantic messages and report experimental results that demonstrate the effectiveness of semantic messages for the discovery of HVI in group activities.
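A minimal sketch of the state-transition analysis mentioned above is given below, assuming a hand-built table of expected HVI event transitions. The event vocabulary and the flagged pairs are illustrative assumptions rather than the method of the paper.

```python
# Hypothetical vocabulary of detected Human-Vehicle Interaction (HVI) events.
# Successive event pairs not listed as expected are flagged for analyst review.
expected_transitions = {
    ("vehicle_arrives", "person_exits_vehicle"),
    ("person_exits_vehicle", "person_opens_trunk"),
    ("person_opens_trunk", "person_loads_object"),
    ("person_loads_object", "vehicle_departs"),
}

def flag_anomalies(event_sequence):
    """Return the successive event pairs that fall outside the expected table."""
    pairs = zip(event_sequence, event_sequence[1:])
    return [pair for pair in pairs if pair not in expected_transitions]

observed = ["vehicle_arrives", "person_exits_vehicle",
            "person_loads_object", "vehicle_departs"]
# ("person_exits_vehicle", "person_loads_object") is flagged: the trunk was
# never observed being opened, so a semantic message could note a gap or anomaly.
print(flag_anomalies(observed))
```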
Torpedo: topic periodicity discovery from text data
Jingjing Wang, Hongbo Deng, Jiawei Han
Although history may not repeat itself, many human activities are inherently periodic, recurring daily, weekly, monthly, yearly or following some other periods. Such recurring activities may not repeat the same set of keywords, but they do share similar topics. Thus it is interesting to mine topic periodicity from text data instead of just looking at the temporal behavior of a single keyword/phrase. Some previous preliminary studies in this direction prespecify a periodic temporal template for each topic. In this paper, we remove this restriction and propose a simple yet effective framework Torpedo to mine periodic/recurrent patterns from text, such as news articles, search query logs, research papers, and web blogs. We first transform text data into topic-specific time series by a time dependent topic modeling module, where each of the time series characterizes the temporal behavior of a topic. Then we use time series techniques to detect periodicity. Hence we both obtain a clear view of how topics distribute over time and enable the automatic discovery of periods that are inherent in each topic. Theoretical and experimental analyses demonstrate the advantage of Torpedo over existing work.
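The following sketch illustrates only the second stage of such a pipeline, assuming a topic's daily intensity series has already been produced by the topic modeling module; it uses plain autocorrelation as a generic stand-in for the periodicity detector, not Torpedo's actual estimator.

```python
import numpy as np

def dominant_period(series, max_lag=None):
    """Return the lag (in time steps) with the highest autocorrelation."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..len(x)-1
    acf /= acf[0]                                       # normalize so acf[0] == 1
    max_lag = max_lag or len(x) // 2
    return int(np.argmax(acf[1:max_lag]) + 1)           # skip lag 0

# Toy topic intensity with a weekly (period-7) rhythm plus noise.
rng = np.random.default_rng(0)
days = np.arange(70)
intensity = 1.0 + 0.8 * np.sin(2 * np.pi * days / 7) + 0.1 * rng.standard_normal(70)
print(dominant_period(intensity))  # expected to print 7
```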
Emerging Technology
A survey of tools and resources for the next generation analyst
David L. Hall, Jake Graham, Emily Catherman
We have previously argued that a combination of trends in information technology (IT) and changing habits of people using IT provide opportunities for the emergence of a new generation of analysts who can perform effective intelligence, surveillance, and reconnaissance (ISR) using a “do it yourself” (DIY) or “armchair” approach (see D.L. Hall and J. Llinas (2014)). Key technology advances include: i) new sensing capabilities including the use of micro-scale sensors and ad hoc deployment platforms such as commercial drones, ii) advanced computing capabilities in mobile devices that allow advanced signal and image processing and modeling, iii) intelligent interconnections due to advances in “web N” capabilities, and iv) global interconnectivity and increasing bandwidth. In addition, the changing habits of the digital natives reflect new ways of collecting and reporting information, sharing information, and collaborating in dynamic teams. This paper provides a survey and assessment of tools and resources to support this emerging analysis approach. The tools range from large-scale commercial tools such as IBM i2 Analyst Notebook, Palantir, and GeoSuite to emerging open source tools such as GeoViz and DECIDE from university research centers. The tools include geospatial visualization tools, social network analysis tools and decision aids. A summary of tools is provided along with links to web sites for tool access.
Addressing information management and dissemination challenges for the next generation analyst
Jesse Kovach, Laurel Sadler, Niranjan Suri, et al.
Recent technological advances in the areas of sensors, computation, and storage have led to the development of relatively inexpensive sensors that have been deployed on a wide scale and are able to generate large volumes of data. However, tactical networks have not been able to keep pace in terms of their ability to transfer all of the sensor data from the edge to an operations center for analysis. This paper explores multiple techniques to help bridge this gap, by using a three-pronged approach based on value of information-based dissemination, active sensor query capabilities, and anomaly detection mechanisms. These capabilities are being integrated into an open-source sensor platform deployed in a testbed environment for evaluation purposes.
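The value-of-information prong might be illustrated, very roughly, by the sketch below, which prioritizes sensor messages against a link budget using an invented scoring function. The weights, fields, and greedy selection are assumptions for illustration only.

```python
def voi_score(msg):
    """Toy value-of-information score: more relevant, more anomalous, fresher is better.
    The weights are illustrative, not those used by the authors."""
    return 0.5 * msg["relevance"] + 0.3 * msg["anomaly"] + 0.2 * msg["freshness"]

def select_for_dissemination(messages, budget_bytes):
    """Greedily send the highest-VoI messages that fit the tactical link budget."""
    chosen, used = [], 0
    for msg in sorted(messages, key=voi_score, reverse=True):
        if used + msg["size"] <= budget_bytes:
            chosen.append(msg["id"])
            used += msg["size"]
    return chosen

messages = [
    {"id": "cam_3_frame", "size": 900_000, "relevance": 0.4, "anomaly": 0.1, "freshness": 0.9},
    {"id": "acoustic_alert", "size": 2_000, "relevance": 0.9, "anomaly": 0.8, "freshness": 1.0},
    {"id": "status_ping", "size": 500, "relevance": 0.2, "anomaly": 0.0, "freshness": 1.0},
]
print(select_for_dissemination(messages, budget_bytes=50_000))
```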
Next generation data harmonization
Chandler Armstrong, Ryan M. Brown, Jillian Chaves, et al.
Analysts are presented with a never-ending stream of data sources. Often, the subsets of data sources needed to solve a problem are easily identified, but the process of aligning the data sets is time consuming. Semantic technologies, however, allow for fast harmonization of data to overcome these problems. These include ontologies that serve as alignment targets, visual tools and natural language processing that generate semantic graphs in terms of the ontologies, and analytics that leverage these graphs. This research reviews a developed prototype that employs all of these approaches to perform analysis across disparate data sources documenting violent extremist events.
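As a toy illustration of harmonizing records against an ontology alignment target, the sketch below emits subject-predicate-object triples for event records from two sources. The class and property names are invented and do not correspond to the prototype's ontologies.

```python
# Hypothetical ontology terms serving as the alignment target.
ONT = {
    "ViolentEvent": "ex:ViolentEvent",
    "occurredAt": "ex:occurredAt",
    "hasDate": "ex:hasDate",
}

def record_to_triples(record):
    """Harmonize one source-specific event record into ontology-aligned triples."""
    event = f"ex:event/{record['id']}"
    return [
        (event, "rdf:type", ONT["ViolentEvent"]),
        (event, ONT["occurredAt"], record["place"]),
        (event, ONT["hasDate"], record["date"]),
    ]

# Two sources with different origins, mapped into one semantic graph.
graph = []
graph += record_to_triples({"id": "a1", "place": "City_X", "date": "2014-07-01"})
graph += record_to_triples({"id": "b7", "place": "City_X", "date": "2014-07-03"})

# Analytics can now query the shared graph, e.g. all events at City_X.
print([s for s, p, o in graph if p == ONT["occurredAt"] and o == "City_X"])
```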
Intelligence Reach for Expertise (IREx)
Christina Hadley, James R. Schoening, Yonatan Schreiber
IREx is a search engine for next-generation analysts to find collaborators. U.S. Army Field Manual 2.0 (Intelligence) calls for collaboration within and outside the area of operations, but finding the best collaborator for a given task can be challenging. IREx will be demonstrated as part of the Actionable Intelligence Technology Enabled Capability Demonstration (AI-TECD) at the E15 field exercises at Ft. Dix in July 2015. It includes a Task Model for describing a task and its prerequisite competencies, plus a User Model (i.e., a user profile) for individuals to assert their capabilities and other relevant data. These models are built on a canonical suite of ontologies, which enables robust queries and keeps the models logically consistent. IREx also supports learning validation, where a learner who has completed a course module can search for and find a suitable task to practice and demonstrate that the new knowledge can be used in the real world for its intended purpose. The IREx models are in the initial phase of a process to develop them as an IEEE standard. This initiative is currently an approved IEEE Study Group, after which follow a standards working group, then a balloting group, and, if all goes well, an IEEE standard.
Human Machine Interaction
Collaborative interactive visualization: exploratory concept
Marielle Mokhtari, Valérie Lavigne, Frédéric Drolet
Dealing with an ever-increasing amount of data is a challenge that military intelligence analysts or teams of analysts face day to day. Increased individual and collective comprehension comes through collaboration between people: the better the collaboration, the better the comprehension. Nowadays, various technologies support and enhance collaboration by allowing people to connect and collaborate in settings as varied as mobile devices, networked computers, display walls, and tabletop surfaces, to name just a few. A powerful collaboration system includes traditional and multimodal visualization features to achieve effective human communication. Interactive visualization strengthens collaboration because this approach is conducive to incrementally building a mental assessment of the data's meaning. The purpose of this paper is to present an overview of the envisioned collaboration architecture and the interactive visualization concepts underlying the Sensemaking Support System prototype developed to support analysts in the context of the Joint Intelligence Collection and Analysis Capability project at DRDC Valcartier. It presents the current version of the architecture, discusses future capabilities to help analysts in the accomplishment of their tasks, and finally recommends collaboration and visualization technologies that allow them to go a step further, both as individuals and as a team.
Visualizing approaches for displaying measures of sentiment
Sue E. Kase, Heather Roy, Daniel N. Cassenti
The overall purpose of intelligence analysis platforms is to extract key information from multi-source data. Ultimately, these systems are meant to save intelligence analysts time and effort by offering knowledge discovery capabilities. However, intelligence analysis platforms only assist analysts to the extent they are designed with human factors in mind. Poorly designed intelligence analysis platforms can hinder the knowledge discovery process, or worse, promote the misinterpretation of analysis results. Future intelligence systems must be critical enablers for improving speed, efficiency, and effectiveness of command-level decision making. Human-centered research is needed to address the challenge of visualizing large data collections to facilitate orientation and context, enable the discovery and selection of relevant information, and provide dynamic feedback for identifying changes in the state of a targeted region or topic. From the perspective of the ‘Human as a Data Explorer,’ this study investigates the visual presentation of intelligence information to support timely and accurate decision making. The investigation is a starting point in understanding the rich and varied set of information visualizations sponsored by the Army in recent years. A human-subjects experiment explores two visualization approaches against a control condition for displaying sentiment about a set of topics with an emphasis on the performance metrics of decision accuracy and response time. The resulting data analysis is the first in a series of experiments providing input for technology development informing future interface designs and system prototypes.
Conversational sensemaking
Alun Preece, Will Webberley, Dave Braines
Recent advances in natural language question-answering systems and context-aware mobile apps create opportunities for improved sensemaking in a tactical setting. Users equipped with mobile devices act as both sensors (able to acquire information) and effectors (able to act in situ), operating alone or in collectives. The currently dominant technical approaches follow either a pull model (e.g. Apple’s Siri or IBM’s Watson, which respond to users’ natural language queries) or a push model (e.g. Google Now, which sends notifications to a user based on their context). There is growing recognition that users need more flexible styles of conversational interaction, where they are able to freely ask or tell, be asked or told, and seek explanations and clarifications. Ideally such conversations should involve a mix of human and machine agents, able to collaborate in collective sensemaking activities with as few barriers as possible. Desirable capabilities include adding new knowledge, collaboratively building models, invoking specific services, and drawing inferences. As a step towards this goal, we collect evidence from a number of recent pilot studies including natural experiments (e.g. situation awareness in the context of organised protests) and synthetic experiments (e.g. human and machine agents collaborating in information seeking and spot reporting). We identify some principles and areas of future research for “conversational sensemaking”.
Collaborative human-machine analysis using a controlled natural language
David H. Mott, Donald R. Shemanski, Cheryl Giammanco, et al.
A key aspect of an analyst's task in providing relevant information from data is reasoning about the implications of that data in order to build a picture of the real-world situation. This requires human cognition, based upon domain knowledge about individuals, events, and environmental conditions. For a computer system to collaborate with an analyst, it must be capable of following a reasoning process similar to that of the analyst. We describe ITA Controlled English (CE), a subset of English used to represent an analyst's domain knowledge and reasoning in a form that is understandable by both analyst and machine. CE can be used to express domain rules, background data, assumptions, and inferred conclusions, thus supporting human-machine interaction. A CE reasoning and modeling system can perform inferences from the data and provide the user with conclusions together with their rationale. We present a logical problem called the "Analysis Game", used for training analysts, which presents “analytic pitfalls” inherent in many problems. We explore an iterative approach to its representation in CE, where a person can develop an understanding of the problem solution by incremental construction of relevant concepts and rules. We discuss how such interactions might occur, and propose that such techniques could lead to better collaborative tools to assist the analyst and avoid the “pitfalls”.
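The general idea of rules plus rationale can be sketched as below, using a toy triple encoding of facts and one forward-chaining rule; this stands in for the concept only and is not the ITA Controlled English syntax or reasoner.

```python
# Facts and a rule in a toy triple form standing in for Controlled English statements.
facts = {("agent_X", "was_seen_at", "location_L"),
         ("location_L", "is_a", "border_crossing")}

# Rule: if ?p was_seen_at ?loc and ?loc is_a border_crossing,
#       then ?p may_have_crossed border.
def apply_rule(facts):
    inferred = {}
    for person, rel, loc in facts:
        if rel == "was_seen_at" and (loc, "is_a", "border_crossing") in facts:
            conclusion = (person, "may_have_crossed", "border")
            rationale = [(person, "was_seen_at", loc), (loc, "is_a", "border_crossing")]
            inferred[conclusion] = rationale
    return inferred

for conclusion, rationale in apply_rule(facts).items():
    print("conclusion:", conclusion)
    print("  because:", rationale)  # rationale shown back to the analyst
```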
Leveraging human oversight and intervention in large-scale parallel processing of open-source data
Enrico Casini, Niranjan Suri, Jeffrey M. Bradshaw
The popularity of cloud computing, along with the increased availability of cheap storage, has created the need to process and transform large volumes of open-source data in parallel. One way to handle such extensive volumes of information properly is to take advantage of distributed computing frameworks like Map-Reduce. Unfortunately, an entirely automated approach that excludes human intervention is often unpredictable and error-prone. Highly accurate data processing and decision-making can be achieved by supporting an automatic process through human collaboration, in a variety of environments such as warfare, cyber security, and threat monitoring. Although this mutual participation seems easily exploitable, human-machine collaboration in the field of data analysis presents several challenges. First, due to the asynchronous nature of human intervention, it is necessary to verify that once a correction is made, all the necessary reprocessing is carried out down the processing chain. Second, the amount of reprocessing often needs to be minimized in order to optimize the use of limited resources. To address these strict requirements, this paper introduces improvements to an innovative approach for human-machine collaboration in the parallel processing of large amounts of open-source data.
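The reprocessing concern can be illustrated with the minimal sketch below: a word-count-style map-reduce in which a human correction to one document invalidates only that document's cached map output. The caching scheme is an illustrative assumption, not the approach described in the paper.

```python
from collections import Counter

documents = {"doc1": "attack reported near bridge",
             "doc2": "bridge traffic normal",
             "doc3": "market crowded today"}

map_cache = {}          # per-document intermediate results

def map_doc(doc_id):
    """Map step: token counts for one document, cached until invalidated."""
    if doc_id not in map_cache:
        map_cache[doc_id] = Counter(documents[doc_id].split())
    return map_cache[doc_id]

def reduce_all():
    """Reduce step: merge all per-document counts."""
    total = Counter()
    for doc_id in documents:
        total += map_doc(doc_id)
    return total

print(reduce_all()["bridge"])          # 2

# A human analyst corrects doc2; only that document's map output is recomputed.
documents["doc2"] = "road closed after incident"
map_cache.pop("doc2", None)            # invalidate just the affected partition
print(reduce_all()["bridge"])          # 1, only doc2's map step was re-run
```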
Advance Concepts II
One decade of the Data Fusion Information Group (DFIG) model
The revision of the Joint Directors of Laboratories (JDL) Information Fusion model in 2004 discussed information processing, incorporated the analyst, and was coined the Data Fusion Information Group (DFIG) model. Since that time, developments in information technology (e.g., cloud computing, applications, and multimedia) have altered the role of the analyst. Data production has outpaced the analyst; however, the analyst still has the role of data refinement and information reporting. In this paper, we highlight three examples being addressed by the DFIG model. The first is the role of the analyst in providing semantic queries (through an ontology) so that the vast amounts of data available can be indexed, accessed, retrieved, and processed. The second is reporting, which requires the analyst to collect the data into a condensed and meaningful form through information management. The last is the interpretation of the resolved information from data, which must include contextual information not inherent in the data itself. Through a literature review, the DFIG developments of the last decade demonstrate the usability of the DFIG model to bring together the user (analyst or operator) and the machine (information fusion or manager) in a systems design.
Combining human and machine processes (CHAMP)
Machine reasoning and intelligence is usually done in a vacuum, without consultation of the ultimate decision-maker. The late consideration of the human cognitive process causes major problems in the use of automated systems to provide reliable and actionable information that users can trust and depend on to select the best Course of Action (COA). On the other hand, if automated systems are created exclusively based on human cognition, then there is a danger of developing systems that don’t push the boundaries of technology and are built mainly for the comfort level of selected subject matter experts (SMEs). Our approach to combining human and machine processes (CHAMP) is based on the notion of developing optimal strategies for where, when, how, and which human intelligence should be injected within a machine reasoning and intelligence process. This combination is based on the criteria of improving the quality of the output of the automated process while maintaining the computational efficiency required for a COA to be actuated in a timely fashion. This research addresses the following problem areas:
    Providing consistency within a mission: Injection of human reasoning and intelligence within the reliability and temporal needs of a mission to attain situational awareness, impact assessment, and COA development.
    Supporting the incorporation of data that is uncertain, incomplete, imprecise and contradictory (UIIC): Development of mathematical models to suggest the insertion of a cognitive process within a machine reasoning and intelligent system so as to minimize UIIC concerns.
    Developing systems that include humans in the loop whose performance can be analyzed and understood to provide feedback to the sensors.
Composable Analytic Systems for next-generation intelligence analysis
Lockheed Martin Advanced Technology Laboratories (LM ATL) is collaborating with Professor James Llinas, Ph.D., of the Center for Multisource Information Fusion at the University at Buffalo (State of NY), researching concepts for a mixed-initiative associate system for intelligence analysts that facilitates reduced analysis and decision times while proactively discovering and presenting relevant information based on the analyst’s needs, current tasks, and cognitive state. Today’s exploitation and analysis systems have largely been designed for a specific sensor, data type, and operational context, leading to difficulty in directly supporting the analyst’s evolving tasking and work product development preferences across complex operational environments. Our interactions with analysts illuminate the need to impact information fusion, exploitation, and analysis capabilities in a variety of ways, including understanding data options, algorithm composition, hypothesis validation, and work product development. Composable Analytic Systems, an analyst-driven system that increases the flexibility and capability to effectively utilize Multi-INT fusion and analytics tailored to the analyst’s mission needs, holds promise to address current and future intelligence analysis needs as US forces engage threats in contested and denied environments.
Generalist analysts at the edge and distributed analytics
Gavin Pearson, Bhopinder Madahar
Joint Vision 2020 highlights that the achievement of ‘full spectrum dominance rests upon information superiority’ and that information capabilities are changing rapidly. Similarly, the Eight Great Technologies and the McKinsey Global Institute have highlighted the criticality of ‘Big Data’ technologies. But most ‘Big Data’ technologies are predicated on the availability of a high-quality, high-bandwidth distributed information infrastructure and service-rich systems, and much of the technology is designed for use by highly trained data scientists. In deployed military operations the context is radically different: many analysts are generalists as opposed to highly trained data scientists, and the information infrastructure is frequently significantly smaller, sparser, and more brittle, but nevertheless complex. Further, operations are highly dynamic, temporally challenging, and conducted in an unfamiliar sociocultural environment. As Joint Vision 2020 states, ‘the need to shape ambiguous situations at the low end of the range of operations will present special challenges’. This paper outlines the S&T challenges associated with adapting ‘Big Data’ technologies to build a distributed analytic capability for deployed operations. In particular we discuss issues associated with: a) The adoption of data analytic platforms and the need for adaptation to a distributed coalition environment and tactical information infrastructures; b) The Volume, Velocity, Variety, Veracity, Viscosity and Value of information and of information processing, storage and distribution capabilities; c) The nature of the situations to be understood and the resulting impact on abstract representations and synergistic human-machine teams; d) The role of the human in collaboratively extracting understanding from information and directing the information system.
Interactive Poster Session
Dealing with extreme data diversity: extraction and fusion from the growing types of document formats
Peter David, Nichole Hansen, James J. Nolan, et al.
The growth in text data available online is accompanied by a growth in the diversity of available documents. Corpora with extreme heterogeneity in terms of file formats, document organization, page layout, text style, and content are common. The absence of meaningful metadata describing the structure of online and open-source data leads to text extraction results that contain no information about document structure and are cluttered with page headers and footers, web navigation controls, advertisements, and other items that are typically considered noise. We describe an approach to document structure and metadata recovery that uses visual analysis of documents to infer the communicative intent of the author. Our algorithm identifies the components of documents such as titles, headings, and body content, based on their appearance. Because it operates on an image of a document, our technique can be applied to any type of document, including scanned images. Our approach to document structure recovery considers a finer-grained set of component types than prior approaches. In this initial work, we show that a machine learning approach to document structure recovery using a feature set based on the geometry and appearance of images of documents achieves a 60% greater F1-score than a baseline random classifier.
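The evaluation pattern described above, a learned classifier over geometric and appearance features compared against a random baseline on F1-score, might look roughly like the sketch below. The features, labels, and classifier choice are invented stand-ins using scikit-learn, not the authors' feature set or model.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score

# Made-up geometric/appearance features per page region:
# [relative_y_position, relative_height, mean_font_size, is_bold]
X = [[0.05, 0.08, 30, 1],   # title
     [0.20, 0.04, 16, 1],   # heading
     [0.30, 0.40, 11, 0],   # body
     [0.95, 0.03, 9, 0],    # footer
     [0.06, 0.07, 28, 1],
     [0.25, 0.05, 15, 1],
     [0.40, 0.35, 11, 0],
     [0.96, 0.03, 9, 0]]
y = ["title", "heading", "body", "footer"] * 2

clf = RandomForestClassifier(random_state=0).fit(X[:4], y[:4])
baseline = DummyClassifier(strategy="uniform", random_state=0).fit(X[:4], y[:4])

held_out_X, held_out_y = X[4:], y[4:]
print("model F1:   ", f1_score(held_out_y, clf.predict(held_out_X), average="macro"))
print("baseline F1:", f1_score(held_out_y, baseline.predict(held_out_X), average="macro"))
```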
Towards an automated intelligence product generation capability
Alison M. Smith, Timothy W. Hawes, James J. Nolan
Creating intelligence information products is a time-consuming and difficult process for analysts faced with identifying key pieces of information relevant to a complex set of information requirements. Complicating matters, these key pieces of information exist in multiple modalities scattered across data stores, buried in huge volumes of data. This results in the current predicament in which analysts find themselves: information retrieval and management consume huge amounts of time that could be better spent performing analysis. The persistent growth in data accumulation rates will only increase the amount of time spent on these tasks without a significant advance in automated solutions for information product generation. We present a product generation tool, Automated PrOduct Generation and Enrichment (APOGEE), which aims to automate the information product creation process in order to shift the bulk of the analysts’ effort from data discovery and management to analysis. APOGEE discovers relevant text, imagery, video, and audio for inclusion in information products using semantic and statistical models of unstructured content. APOGEE’s mixed-initiative interface, supported by highly responsive backend mechanisms, allows analysts to dynamically control the product generation process, ensuring a maximally relevant result. The combination of these capabilities results in significant reductions in the time it takes analysts to produce information products while helping to increase overall coverage. In an evaluation with a domain expert, APOGEE showed the potential to cut product generation time by 20x. The result is a flexible end-to-end system that can be rapidly deployed in new operational settings.
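As a generic stand-in for the statistical relevance models mentioned above, the sketch below ranks candidate content items against an information requirement using TF-IDF cosine similarity. The requirement text, item identifiers, and the choice of model are illustrative assumptions, not APOGEE's implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

requirement = "movement of supplies across the northern border crossing"
candidates = {
    "report_17": "trucks observed moving supplies toward the northern crossing",
    "image_03_caption": "aerial photo of market square at midday",
    "intercept_42": "convoy scheduled to reach the border tonight",
}

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([requirement] + list(candidates.values()))
scores = cosine_similarity(matrix[0:1], matrix[1:]).flatten()

# Rank candidate items for inclusion in the draft product, highest relevance first.
ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
for item_id, score in ranked:
    print(f"{item_id}: {score:.2f}")
```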
Entity resolution using cloud computing
Alex James, Gregory Tauer, Adam Czerniejewski, et al.
Roles and capabilities of analysts are changing as the volume of data grows. Open-source content is abundant and users are becoming increasingly dependent on automated capabilities to sift and correlate information. Entity resolution is one such capability. It is an algorithm that links entities using an arbitrary number of criteria (e.g., identifiers, attributes) from multiple sources. This paper demonstrates a prototype capability, which identifies enriched attributes of individuals stored across multiple sources. Here, the system first completes its processing on a cloud-computing cluster. Then, in a data explorer role, the analyst evaluates whether automated results are correct and whether attribute enrichment improves knowledge discovery.
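A minimal sketch of attribute-based entity resolution of the kind described above follows, using a weighted match score over a few criteria with a link threshold. The attributes, weights, and threshold are illustrative assumptions rather than the prototype's algorithm.

```python
def match_score(a, b):
    """Weighted agreement over a few criteria; weights are illustrative only."""
    score = 0.0
    if a.get("passport") and a.get("passport") == b.get("passport"):
        score += 0.6                      # a shared identifier is strong evidence
    if a.get("name", "").lower() == b.get("name", "").lower():
        score += 0.25
    if a.get("city") == b.get("city"):
        score += 0.15
    return score

def resolve(records, threshold=0.7):
    """Return pairs of record ids judged to refer to the same individual."""
    links = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if match_score(records[i], records[j]) >= threshold:
                links.append((records[i]["id"], records[j]["id"]))
    return links

records = [
    {"id": "src1-001", "name": "John Smith", "passport": "X123", "city": "Tampa"},
    {"id": "src2-884", "name": "J. Smith",   "passport": "X123", "city": "Tampa"},
    {"id": "src3-412", "name": "John Smith", "passport": None,   "city": "Reno"},
]
print(resolve(records))   # expects [('src1-001', 'src2-884')]
```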