Proceedings Volume 6241

Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2006

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 17 April 2006
Contents: 8 Sessions, 35 Papers, 0 Presentations
Conference: Defense and Security Symposium 2006
Volume Number: 6241

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
Sessions:
  • Data Mining
  • Information Assurance and Security
  • Intrusion Detection
  • Internet Applications
  • Miscellaneous Models and Applications
  • Intrusion Detection and Network Security
  • Data Mining Algorithms and Applications
  • Poster Session
Data Mining
An algorithmic approach to mining unknown clusters in training data
In this paper, unsupervised learning is utilized to develop a method for mining unknown clusters in training data. The approach is based on the Bayesian Data Reduction Algorithm (BDRA), which has recently been developed into a patented system called the Data Extraction and Mining Software Tool (DEMIST). In the BDRA, the modeling assumption is that the discrete symbol probabilities of each class are a priori uniformly Dirichlet distributed, and it employs a "greedy" approach to selecting and discretizing the relevant features of each class for best performance. The primary metric for selecting and discretizing all relevant features contained in each class is an analytic formula for the probability of error conditioned on the training data. Thus, the primary contribution of this work is to demonstrate an algorithmic approach to finding multiple unknown clusters in training data, which represents an extension to the original data clustering algorithm. To illustrate performance, results are demonstrated using simulated data that contains multiple clusters. In general, the results of this work will demonstrate an effective method for finding multiple clusters in data mining applications.
Efficient mining of strongly correlated item pairs
Shuxin Li, Robert Lee, Sheau-Dong Lang
Past attempts to mine transactional databases for strongly correlated item pairs have been beset by difficulties. In an attempt to be efficient, some algorithms produce false positive and false negative results. In an attempt to be accurate and comprehensive, other algorithms sacrifice efficiency. We propose an efficient new algorithm that uses Jaccard's correlation coefficient, which is simply the ratio between the sizes of the intersection and the union of two sets, to generate a set of strongly correlated item pairs that is both accurate and comprehensive. The pruning of candidate item pairs based on an upper bound facilitates efficiency, and there is no possibility of false positives or false negatives. Testing of our algorithm on datasets of various sizes shows its effectiveness in real-world applications.
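As a concrete illustration of the abstract's central quantity, the sketch below (hypothetical Python, not the authors' implementation) computes the Jaccard coefficient for item pairs and uses the min/max support ratio as an upper bound to prune pairs before counting intersections; the threshold name `theta` and the toy transactions are illustrative.

```python
from itertools import combinations

def strongly_correlated_pairs(transactions, theta):
    """Find item pairs whose Jaccard coefficient >= theta.

    Jaccard(a, b) = |T(a) & T(b)| / |T(a) | T(b)|, where T(x) is the
    set of transactions containing item x.
    """
    # Transaction-id sets per item (one database scan).
    tids = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tids.setdefault(item, set()).add(tid)

    results = {}
    for a, b in combinations(sorted(tids), 2):
        sa, sb = len(tids[a]), len(tids[b])
        # Upper bound: intersection <= min(sa, sb) and union >= max(sa, sb),
        # so Jaccard <= min/max. Prune without counting the intersection.
        if min(sa, sb) / max(sa, sb) < theta:
            continue
        inter = len(tids[a] & tids[b])
        jaccard = inter / (sa + sb - inter)   # |A u B| = |A| + |B| - |A & B|
        if jaccard >= theta:
            results[(a, b)] = jaccard
    return results

print(strongly_correlated_pairs(
    [["milk", "bread"], ["milk", "bread", "eggs"], ["eggs"]], 0.5))
```

Because the bound needs only the per-item supports, the expensive intersection count is paid only for pairs that can still qualify.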
Genetic program based data mining to reverse engineer digital logic
James F. Smith III, Thanh Vu H. Nguyen
A data mining based procedure for automated reverse engineering and defect discovery has been developed. The data mining algorithm for reverse engineering uses a genetic program (GP) as a data mining function. A genetic program is an algorithm based on the theory of evolution that automatically evolves populations of computer programs or mathematical expressions, eventually selecting one that is optimal in the sense that it maximizes a measure of effectiveness, referred to as a fitness function. The system to be reverse engineered is typically a sensor. Design documents for the sensor are not available, and conditions prevent the sensor from being taken apart. The sensor is used to create a database of input signals and output measurements. Rules about the likely design properties of the sensor are collected from experts. The rules are used to create a fitness function for the genetic program. Genetic program based data mining is then conducted. This procedure incorporates not only the experts' rules into the fitness function, but also the information in the database. The information extracted through this process is the internal design specifications of the sensor. Uncertainty related to the input-output database and the expert based rule set can significantly alter the reverse engineering results. Significant experimental and theoretical results related to GP based data mining for reverse engineering will be provided. Methods of quantifying uncertainty and its effects will be presented. Finally, methods for reducing the uncertainty will be examined.
Database architecture for data mining to aid real-time range safety decision in a test range
Flight vehicles carrying sensitive payloads and large amounts of propellant pose a danger to life and property around the launch pad area. Conventional decision support systems have inherent limitations, and at times the man in the loop is under severe strain while analyzing the real-time data of a flight vehicle for range safety decisions. Newer technological inputs are essential for designing flight termination systems that can handle high-speed, highly maneuverable, and multi-platform-based flight vehicles. This calls for extensive trajectory simulation under various flight conditions and failure modes, together with actual subsystem test data and post-flight data, along with geographical, meteorological, and tracking instrument data, all collected and organized in a data warehouse for data mining. The information obtained in real time from this large database will aid range safety decision making in the complex scenario of flight testing at a test range. This paper briefly highlights the existing system and its constraints, and attempts to evolve an innovative system that combines a knowledge base with real-time data from multiple sensors, fusing the data from similar and dissimilar sensors using state-vector fusion techniques for more reliable and rapid range safety decision making.
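The state-vector fusion step admits a compact illustration. The sketch below is a minimal track-to-track fusion of two state estimates, assuming independent estimation errors; the variable names and toy numbers are illustrative, not from the paper.

```python
import numpy as np

def state_vector_fusion(x1, P1, x2, P2):
    """Track-to-track fusion of two state estimates (sketch; assumes
    independent estimation errors, ignoring common process noise)."""
    P1, P2 = np.atleast_2d(P1), np.atleast_2d(P2)
    K = P1 @ np.linalg.inv(P1 + P2)   # weight given to the second estimate
    x = x1 + K @ (x2 - x1)            # fused state vector
    P = P1 - K @ P1                   # fused covariance
    return x, P

# Two trackers report the same vehicle state with different confidence.
x1, P1 = np.array([100.0, 9.8]), np.diag([4.0, 0.5])   # e.g. radar track
x2, P2 = np.array([104.0, 10.1]), np.diag([1.0, 0.2])  # e.g. telemetry track
x, P = state_vector_fusion(x1, P1, x2, P2)
print(x, np.diag(P))   # fused estimate is pulled toward the tighter track
```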
Granular computing for data mining
Granular computing, as an emerging research field, provides a conceptual framework for studying many issues in data mining. This paper examines some of those issues, including data and knowledge representation and processing. It is demonstrated that one of the fundamental tasks of data mining is searching for the right level of granularity in data and knowledge representation.
Information Assurance and Security
How ISO/IEC 17799 can be used for base lining information assurance among entities using data mining for defense, homeland security, commercial, and other civilian/commercial domains
One goal of database mining is to draw unique and valid perspectives from multiple data sources. Insights fashioned from closely-held data stores are likely to possess a high degree of reliability. The degree of information assurance comes into question, however, when external databases are accessed, combined, and analyzed to form new perspectives. ISO/IEC 17799, Information technology - Security techniques - Code of practice for information security management, can be used to establish a higher level of information assurance among disparate entities using data mining in the defense, homeland security, commercial, and other civilian domains. Organizations that meet ISO/IEC information security standards have identified and assessed risks, threats, and vulnerabilities and have taken significant proactive steps to meet their unique security requirements. The ISO standards address twelve domains: risk assessment and treatment, security policy, organization of information security, asset management, human resources security, physical and environmental security, communications and operations management, access control, information systems acquisition, development and maintenance, information security incident management, business continuity management, and compliance. Analysts can be relatively confident that if organizations are ISO 17799 compliant, a high degree of information assurance is likely to be a characteristic of the data sets being used. The reverse also holds: extracting, fusing, and drawing conclusions from databases with a low degree of information assurance may be fraught with all of the hazards that come from knowingly using bad data to make decisions. Using ISO/IEC 17799 as a baseline for information assurance can help mitigate these risks.
Personal privacy, information assurance, and the threat posed by malware technology
Martin R. Stytz, Sheila B. Banks
In spite of our best efforts to secure the cyber world, the threats posed to personal privacy by attacks upon networks and software continue unabated. While there are many reasons for this state of affairs, clearly one reason for continued vulnerabilities in software is the inability to assess its security properties and test its security systems while it is in development. A second reason for this growing threat to personal privacy is the growing sophistication and maliciousness of malware, coupled with the increasing difficulty of detecting malware. The pervasive threat posed by malware, together with the difficulty of detecting its presence or an attempted intrusion, makes addressing the malware threat one of the most pressing issues that must be solved in order to ensure personal privacy for users of the internet. In this paper, we discuss the threat posed by malware, the types of malware found in the wild (outside of computer laboratories), and the techniques currently available for protecting against a successful malware penetration. The paper includes a discussion of anti-malware tools and suggestions for future anti-malware efforts.
Energy efficient link layer security solution for wireless LANs
Over the last several years, people have become heavily reliant on wireless LANs (WLANs) for information exchange. As wireless technology has no inherent physical protection, WLANs introduce serious new security threats to the personal information of individuals and organizations. Unfortunately, much of this growth has not been accompanied by an appropriate level of security in most corporate networks. The broadcast nature of wireless networks promotes casual eavesdropping on data traffic, with possible security threats including unauthorized use of networks and denial-of-service attacks. Therefore, as in any environment where data is transmitted over untrusted media, certain safeguards must be in place and effectively managed in order to protect the data. To this end, this paper introduces a wireless link-layer security protocol for WLANs that provides users of IEEE 802.11 WLANs a security level close to that of wired networks. The proposed security protocol consists of three components: WLAN clients (STAs), WLAN access points (APs), and an authentication and accounting server (AAS). Before an STA can access the network, the user of the STA must be authenticated to the AP, and the AP must be authenticated to the STA as well, so that there are no rogue APs in the network. Finally, the communication between STAs and APs, as well as between APs and the AAS, is protected against any kind of interception, modification, or fabrication. We performed extensive simulations to evaluate the security and energy consumption performance of the proposed protocol. The cryptographic primitives were selected based on their security and power consumption to make the proposed protocol a scalable and manageable solution for low-power wireless clients, such as PDAs.
Image sensor for security applications with on-chip data authentication
P. Stifter, K. Eberhardt, A. Erni, et al.
Sensors in a networked environment which are used for security applications could be jeopardized by man-in-the-middle or address-spoofing attacks. Such attacks can be thwarted through authentication and secure transmission of the sensor's data stream, by fusing the image sensor with the necessary digital encryption and authentication circuit, which fulfils the three standard requirements of cryptography: data integrity, confidentiality, and non-repudiation. This paper presents the development done by AIM, which led to the unique sensor SECVGA, a high-performance monochrome (B/W) CMOS active pixel image sensor. The device captures still and motion images with a resolution of 800x600 active pixels and converts them into a digital data stream. Beyond standard imaging, the on-chip cryptographic engine provides authentication of the sensor to the host, based on a one-way challenge/response protocol. The realized protocol exchanges a session key that secures the subsequent video data transmission. To achieve this, we calculate a cryptographic checksum derived from a message authentication code (MAC) for a complete image frame. The imager is equipped with an EEPROM so that it can be personalized with a unique and unchangeable identity. A two-wire I2C-compatible serial interface allows the imager's functions to be programmed: the various operating modes, including the authentication procedure, as well as control of the integration time, sub-frames, and the frame rate.
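A minimal sketch of the per-frame authentication idea described above, assuming an HMAC-SHA256 checksum and a simple challenge/response exchange; the key names, message layout, and frame-counter field are illustrative, not the SECVGA wire format.

```python
import hmac, hashlib, os

DEVICE_SECRET = os.urandom(32)   # stands in for the personalized identity key

def sensor_response(challenge: bytes, session_key: bytes) -> bytes:
    # Sensor proves knowledge of its identity key and binds the session key.
    return hmac.new(DEVICE_SECRET, challenge + session_key,
                    hashlib.sha256).digest()

def frame_mac(session_key: bytes, frame_no: int, pixels: bytes) -> bytes:
    # Cryptographic checksum over a complete image frame; the frame counter
    # prevents replay of earlier frames.
    msg = frame_no.to_bytes(4, "big") + pixels
    return hmac.new(session_key, msg, hashlib.sha256).digest()

# Host side: issue a challenge, verify the response, then check each frame.
challenge, session_key = os.urandom(16), os.urandom(32)
assert hmac.compare_digest(
    sensor_response(challenge, session_key),
    hmac.new(DEVICE_SECRET, challenge + session_key, hashlib.sha256).digest())

frame = bytes(800 * 600)                 # one 800x600 monochrome frame
tag = frame_mac(session_key, 0, frame)   # transmitted alongside the frame
```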
Mining security events in a distributed agent society
D. Dasgupta, J. Rodríguez, S. Balachandran
In a distributed agent architecture, tasks are performed on multiple computers that are sometimes spread across different locations. While it is important to collect security-critical sensory information from the agent society, it is equally important to analyze and report such security events in a precise and useful manner. Data mining techniques are found to be very efficient in the generation of security event profiles. This paper describes the implementation of such a security alert mining tool, which generates profiles of security events collected from a large agent society. In particular, our previous work addressed the development of a security console to collect and display alert messages (in IDMEF format) from a Cougaar (agent) society. These messages are then logged in an XML database for further off-line analysis. In our current work, stream mining algorithms are applied for sequencing and generating frequently occurring episodes, and then finding association rules among frequent candidate episodes. This alert miner can profile the most prevalent patterns as indications of frequent attacks in a large agent society.
Intrusion Detection
Distinguishing false from true alerts in Snort by data mining patterns of alerts
Jidong Long, Daniel Schwartz, Sara Stoecklin
The Snort network intrusion detection system is well known for triggering large numbers of false alerts. In addition, it usually only warns of a potential attack without stating what kind of attack it might be. This paper presents a clustering approach for handling Snort alerts more effectively. Central to this approach is the representation of alerts using the Intrusion Detection Message Exchange Format, which is written in XML. All the alerts for each network session are assembled into a single XML document, thereby representing a pattern of alerts. A novel XML distance measure is proposed to obtain the distance between two such XML documents. A classical clustering algorithm, implemented based on this distance measure, is then applied to group the alert patterns into clusters. Our experiment with the MIT 1998 DARPA data sets demonstrates that the clustering algorithm can distinguish between normal sessions that give rise to false alerts and those sessions that contain real attacks, and in about half of the latter cases can effectively identify the name of the attack.
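The paper's XML distance measure is not reproduced in the abstract; purely as a stand-in, the sketch below groups sessions by a simple Jaccard distance over their alert signatures using single-linkage merging. All names and the toy sessions are hypothetical.

```python
def alert_pattern_distance(doc_a, doc_b):
    """Stand-in distance between two sessions' alert patterns: Jaccard
    distance over alert-signature sets (the paper defines its own XML
    distance, which is not reproduced here)."""
    a, b = set(doc_a), set(doc_b)
    return 1.0 - len(a & b) / len(a | b) if (a or b) else 0.0

def single_linkage_clusters(patterns, max_dist):
    # Classical agglomerative grouping: two sessions share a cluster
    # whenever their alert patterns are within max_dist of each other.
    clusters = []
    for p in patterns:
        merged = [c for c in clusters
                  if any(alert_pattern_distance(p, q) <= max_dist for q in c)]
        for c in merged:
            clusters.remove(c)
        clusters.append(sum(merged, [p]))
    return clusters

sessions = [["portscan", "sadmind_ping"], ["portscan"], ["ftp_write"]]
print(single_linkage_clusters(sessions, 0.6))
```

The first two sessions merge (distance 0.5) while the third stays apart, mirroring the goal of separating false-alert sessions from genuinely different attack sessions.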
A novel interacting multiple model based network intrusion detection scheme
In today's information age, information and network security are of primary importance to any organization. Network intrusion is a serious threat to the security of computers and data networks. In internet protocol (IP) based networks, intrusions originate in different kinds of packets/messages contained in the open system interconnection (OSI) layer 3 or higher layers. Network intrusion detection and prevention systems observe the layer 3 packets (or layer 4 to 7 messages) to screen for intrusions and security threats. Signature-based methods use a pre-existing database that documents intrusion patterns as perceived in the layer 3 to 7 protocol traffic, and match the incoming traffic against it for potential intrusion attacks. Alternatively, network traffic data can be modeled and any large deviation from the established traffic pattern can be detected as network intrusion. The latter method, also known as anomaly-based detection, is gaining popularity for its versatility in learning new patterns and discovering new attacks. It is apparent that, for reliable performance, an accurate model of the network data needs to be established. In this paper, we illustrate using collected data that network traffic is seldom stationary, and we propose the use of multiple models to accurately represent the traffic data. The improvement in reliability of the proposed model is verified by measuring the detection and false alarm rates on several datasets.
Attribute selection using information gain for a fuzzy logic intrusion detection system
Jesús González-Pino, Janica Edmonds, Mauricio Papa
In the modern realm of information technology, data mining and fuzzy logic are often used as effective tools in the development of novel intrusion detection systems. This paper describes an intrusion detection system that effectively deploys both techniques and uses the concept of information gain to guide the attribute selection process. The advantage of this approach is that it provides a computationally efficient solution that helps reduce the overhead associated with the data mining process. Experimental results obtained with a prototype system implementation show promising opportunities for improving the overall detection performance of our intrusion detection system.
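Information gain itself is standard and easy to make concrete. A minimal sketch, with toy connection records standing in for real audit data:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(records, labels, attr):
    """IG(S, A) = H(S) - sum over values v of |S_v|/|S| * H(S_v)."""
    total, n = entropy(labels), len(records)
    by_value = {}
    for rec, lab in zip(records, labels):
        by_value.setdefault(rec[attr], []).append(lab)
    remainder = sum(len(part) / n * entropy(part)
                    for part in by_value.values())
    return total - remainder

# Toy connection records: rank attributes before building fuzzy rules.
records = [{"proto": "tcp", "flag": "S0"}, {"proto": "tcp", "flag": "SF"},
           {"proto": "udp", "flag": "SF"}, {"proto": "tcp", "flag": "S0"}]
labels = ["attack", "normal", "normal", "attack"]
for attr in ("proto", "flag"):
    print(attr, round(information_gain(records, labels, attr), 3))
```

Here the `flag` attribute separates the two classes perfectly and receives the maximal gain of 1 bit, so an attribute-selection step would keep it and could drop `proto`.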
Threshold-based clustering for intrusion detection systems
Signature-based intrusion detection systems look for known, suspicious patterns in the input data. In this paper we explore compression of labeled empirical data using threshold-based clustering with regularization. The main target of clustering is to compress the training dataset to a limited number of signatures, minimizing the number of comparisons needed to determine the status of an input event. Essentially, the clustering process merges clusters that are close enough; as a consequence, the original dataset is reduced to a limited number of labeled centroids. Combined with the k-nearest-neighbor (kNN) method, this set of centroids may be used as a multi-class classifier. Clearly, different attributes have different importance depending on the particular training database. This importance may be regulated in the definition of the distance using linear weight coefficients. The paper introduces a special procedure to estimate these weight coefficients. Experiments on the KDD-99 intrusion detection dataset have confirmed the effectiveness of the proposed methods.
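A minimal sketch of the threshold-based compression and weighted distance described above (not the paper's regularized procedure; the weight vector is supplied rather than estimated, and the toy data is illustrative):

```python
import numpy as np

def weighted_dist(x, y, w):
    # Attribute importance enters through linear weight coefficients.
    return np.sqrt(np.sum(w * (x - y) ** 2))

def threshold_cluster(X, labels, w, threshold):
    """Compress labeled data to centroids: merge any same-label pair of
    clusters whose centroids are closer than the threshold."""
    centroids = [(x.astype(float), lab, 1) for x, lab in zip(X, labels)]
    merged = True
    while merged:
        merged = False
        for i in range(len(centroids)):
            for j in range(i + 1, len(centroids)):
                (ci, li, ni), (cj, lj, nj) = centroids[i], centroids[j]
                if li == lj and weighted_dist(ci, cj, w) < threshold:
                    centroids[i] = ((ni * ci + nj * cj) / (ni + nj), li, ni + nj)
                    del centroids[j]
                    merged = True
                    break
            if merged:
                break
    return centroids

def knn_classify(x, centroids, w, k=1):
    # The compressed centroid set acts as a multi-class kNN classifier.
    ranked = sorted(centroids, key=lambda c: weighted_dist(x, c[0], w))
    votes = [lab for _, lab, _ in ranked[:k]]
    return max(set(votes), key=votes.count)

X = np.array([[0, 0], [0.1, 0], [5, 5], [5.1, 5]])
labs = ["normal", "normal", "dos", "dos"]
w = np.array([1.0, 1.0])
cents = threshold_cluster(X, labs, w, threshold=1.0)
print(len(cents), knn_classify(np.array([0.2, 0.1]), cents, w))
```

Four training points collapse to two centroids, and a nearby query is classified against the centroids rather than the full dataset.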
Distributed intrusion detection system based on fuzzy rules
Computational intelligence is the theory and practice of solving problems by simulating human intelligence on a computer, and is a development of artificial intelligence. Fuzzy techniques are among its most important theories; genetic fuzzy and neuro-fuzzy techniques combine fuzzy techniques with other novel methods. This paper presents a distributed intrusion detection system based on fuzzy rules that, through the use of neuro-fuzzy and genetic fuzzy techniques, has the characteristics of distributed parallel processing, self-organization, self-learning, and self-adaptation. In particular, fuzzy decision techniques can be used to reduce false detections. The results of the simulation experiments show that this intrusion detection system model is distributed, error tolerant, dynamically learning, and adaptive. It addresses the problem of low detection rates for new and hidden attacks, keeps the false detection rate low, and is efficient for distributed intrusion detection.
Internet Applications
Mining emotional profiles using e-mail messages for earlier warnings of potential terrorist activities
We develop a software system, Text Scanner for Emotional Distress (TSED), to help detect email messages suspected of coming from people under strong emotional distress. Multiple studies have confirmed that terrorist attackers experienced substantial emotional distress at some point before committing a terrorist attack. Therefore, if an individual in emotional distress can be detected on the basis of email texts, preventive measures can be taken. The proposed detection machinery is based on extraction and classification of emotional profiles from emails. An emotional profile is a formal representation of a sequence of emotional states through a textual discourse, where communicative actions are attached to these emotional states. The issues of extracting emotional profiles from text and reasoning about them are discussed and illustrated. We then develop an inductive machine learning and reasoning framework to relate an emotional profile to the class "emotional distress" or "no emotional distress", given a training dataset in which the class is assigned by an expert. TSED's machine learning is evaluated using a database of structured customer complaints.
Detecting people of interest from internet data sources
In previous papers, we have documented success in determining the key people of interest from a large corpus of real-world evidence. Our recent efforts focus on exploring additional domains and data sources. Internet data sources such as email, web pages, and news feeds make it easier to gather a large corpus of documents for various domains, but detecting people of interest in these sources introduces new challenges. Analyzing these massive sources magnifies entity resolution problems, and demands a storage management strategy that supports efficient algorithmic analysis and visualization techniques. This paper discusses the techniques we used in order to analyze the ENRON email repository, which are also applicable to analyzing web pages returned from our "Buddy" meta-search engine.
Web-based dynamic Delphi: a new survey instrument
JingTao Yao, Wei-Ning Liu
We present a mathematical model for a dynamic Delphi survey method that takes advantage of Web technology. A comparative study of the performance of the conventional Delphi method and the dynamic Delphi instrument is conducted. The results suggest that a dynamic Delphi survey may reach consensus quickly; however, the result may not be robust due to judgment leakage issues.
Dimensional reduction of web traffic data
Dimensional reduction may be effective for compressing data without loss of essential information; it may also be useful for smoothing data and reducing random noise. The model presented in this paper was motivated by the structure of the msweb web-traffic dataset from the UCI archive. It is proposed to reduce the dimension (the number of web-areas, or vroots, used) through an unsupervised learning process that maximizes a specially defined average log-likelihood divergence. Two web-areas are merged if they frequently appear together in the same sessions. Importantly, the roles of the web-areas in the merging process are not symmetrical: the web-area or cluster with the greater weight acts as an attractor and stimulates merging, while the smaller cluster tries to keep its independence; in both cases, the power of attraction or resistance depends on the weights of the corresponding clusters. This strategy prevents the creation of one super-big cluster and helps reduce the number of non-significant clusters. The proposed method is illustrated using two synthetic examples. The first is based on an ideal vlink matrix, which characterizes the weights of the vroots and the relations between them; the vlink matrix for the second example was generated using a specially designed web-traffic simulator.
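The exact average log-likelihood divergence criterion is not given in the abstract; as a rough stand-in, the sketch below greedily folds a lighter web-area into a heavier co-occurring one, illustrating only the attractor idea with a minimum-co-occurrence cutoff:

```python
from collections import Counter
from itertools import combinations

def merge_vroots(sessions, min_cooccur):
    """Greedy dimension reduction: repeatedly fold a lighter web-area into
    a heavier one when they co-occur in enough sessions. The paper's actual
    criterion (average log-likelihood divergence) is not reproduced here."""
    sessions = [set(s) for s in sessions]
    while True:
        weight = Counter(a for s in sessions for a in s)
        pairs = Counter()
        for s in sessions:
            pairs.update(combinations(sorted(s), 2))
        (a, b), n = pairs.most_common(1)[0] if pairs else ((None, None), 0)
        if n < min_cooccur:
            return sessions
        # The heavier area acts as the attractor and absorbs the lighter one.
        attractor, absorbed = (a, b) if weight[a] >= weight[b] else (b, a)
        sessions = [{attractor if x == absorbed else x for x in s}
                    for s in sessions]

print(merge_vroots([{"news", "sports"}, {"news", "sports"}, {"mail"}], 2))
```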
Miscellaneous Models and Applications
A novel mark embedding and attack identifying technique for watermarking
Qi Wei Lin, Gui Feng
With the rapid growth of network distribution of information such as images and video, there is an urgent need for copyright protection against piracy. As an effective method of ownership identification, digital watermarking has attracted much research interest, and much of that study is focused on information-hiding techniques. In this paper we propose a novel watermarking technique that approaches copyright protection from another side: a mark is used when the watermark is embedded, so that if the watermarked image is attacked, we can first identify the attack method and then take corresponding measures to lessen or remove the effect of the attack. As a result, a satisfactory extracted watermark can be obtained. In the proposed technique, the length of the mark sequence is usually chosen to be much smaller than the number of pixels in the sub-band, and the mark sequence can also be scrambled by a spread-spectrum code, making it considerably resistant to clipping operations and the estimation of the attack type correspondingly accurate. The algorithm therefore has the merits of robustness, strong anti-attack ability, and security.
Broad frequency acoustic response of ground/floor to human footsteps
The human footstep is one of several signatures that can serve as a useful parameter for human detection. In early research, the force of footsteps was measured on load cells, and the input energy from multiple footsteps was detected in the frequency range of 1-4 Hz. Cress investigated the seismic velocity response of outdoor ground sites to individuals who were crawling, walking, and running; in his work, the seismic velocity response was shown to be site-specific, with a characteristic frequency range of 20-90 Hz. The current paper presents vibration and sound pressure responses of human footsteps over a broad frequency range. The vibration and sound in the low-frequency band are well known in the literature and are generated by the force component normal to the ground/floor; this force is a function of the person's weight and manner of motion (e.g., walking, running). Forces tangential to the ground/floor from a footstep, together with the ground reaction, generate the high-frequency responses. The interaction of the foot with the ground/floor produces sliding contacts, and the result is a friction signal. The parameters of this friction signal, such as frequency band and vibration and sound magnitudes as functions of human walking style, were studied. The results of the tests are presented and discussed.
An implementation-independent threat model for group communications
Jason Hester, William Yurcik, Roy Campbell
The importance of group communications and the need to efficiently and reliably support them across a network is an issue of growing importance for the next decade. New group communication services are emerging, such as multimedia conferencing/groupware, distributed interactive simulations, sensor fusion systems, command and control centers, and network-centric military applications. While a succession of point-to-point unicast routes could provide group communications, this approach is inherently inefficient and unlikely to support the increased resource requirements of these new services. There is currently no comprehensive process for designing security into group communications schemes. Designing such protection is best done by proactive system engineering rather than by reacting with ad hoc countermeasures to the latest attack du jour. Threat modeling is the foundation of secure system engineering processes because it organizes system threats and vulnerabilities into general classes so they can be addressed with known protection techniques. Although there has been prior work on threat modeling, primarily for software applications, to our knowledge this is the first attempt at implementation-independent threat modeling for group communications. We discuss protection challenges unique to group communications and propose a process for creating a threat model for group communication systems, independent of the underlying implementation, based on classical security principles (confidentiality, integrity, availability, and authentication, or CIAA). It is our hope that this work will lead to better designs for protection solutions against threats to group communication systems.
AutoCorrel: a neural network event correlation approach
Maxwell G. Dondo, Nathalie Japkowicz, Reuben Smith
Intrusion detection analysts are often swamped by multitudes of alerts originating from installed intrusion detection systems (IDS) as well as logs from routers and firewalls on the networks. Properly managing these alerts and correlating them to previously seen threats is critical to the ability to effectively protect a network from attacks. Manually correlating events can be a slow, tedious task prone to human error. We present a two-stage alert correlation approach involving an artificial neural network (ANN) autoassociator and a single-parameter decision threshold-setting unit. By clustering closely matched alerts together, this approach would be beneficial to the analyst. In this approach, alert attributes are extracted from each alert's content and used to train an autoassociator. Based on the reconstruction error determined by the autoassociator, closely matched alerts are grouped together. Whenever a new alert is received, it is automatically categorised into one of the alert clusters, which identifies the type of attack and its severity level as previously known by the analyst. If the attack is entirely new and there is no match to the existing clusters, this is appropriately reflected to the analyst. There are several advantages to using an ANN based approach. First, ANNs acquire knowledge straight from the data without the need for a human expert to build sets of domain rules and facts. Second, once trained, ANNs can be very fast, accurate, and precise enough for near real-time applications. Finally, while learning, ANNs perform a type of dimensionality reduction, allowing a user to input large amounts of information without fearing an efficiency bottleneck. Thus, rather than storing the data in TCP Quad format (which stores only seven event attributes) and performing a multi-stage query on reduced information, the user can input all the relevant information available and instead allow the neural network to organise and reduce this knowledge in an adaptive and goal-oriented fashion.
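A minimal autoassociator sketch illustrating the reconstruction-error idea (a toy network, not the paper's architecture or its threshold-setting unit; the features and threshold rule are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

class Autoassociator:
    """One-hidden-layer autoassociator: alerts it reconstructs well are
    'known'; a high reconstruction error flags a new kind of alert."""
    def __init__(self, n_in, n_hidden, lr=0.1):
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_in))
        self.lr = lr

    def forward(self, x):
        h = np.tanh(x @ self.W1)
        return h, h @ self.W2

    def train(self, X, epochs=500):
        for _ in range(epochs):
            for x in X:
                h, y = self.forward(x)
                err = y - x                          # reconstruction error
                self.W2 -= self.lr * np.outer(h, err)
                dh = (err @ self.W2.T) * (1 - h ** 2)
                self.W1 -= self.lr * np.outer(x, dh)

    def error(self, x):
        _, y = self.forward(x)
        return float(np.mean((y - x) ** 2))

# Feature vectors extracted from alert contents (toy, normalized).
known = np.array([[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0.1, 0.9, 0]])
net = Autoassociator(3, 2)
net.train(known)
threshold = max(net.error(x) for x in known) * 1.5   # decision threshold
novel = np.array([0, 0, 1.0])
print(net.error(novel) > threshold)   # expected: True -> flag as new attack
```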
Data Modeling for predictive behavior hypothesis formation and testing
Holger M. Jaenisch, James W. Handley, Marvin H. Barnett, et al.
This paper presents a novel hypothesis analysis tool building on QUEST and DANCER. Unique to it is the ability to convert cause/effect relationships into analytical equation transfer functions for exploitation. In this, the third phase of our work, we derive Data Models for each unique word and its ontologically associated unique words. We form a classical control-theory transfer function using the associated words as the input vector and the assigned unique word as the output vector. Each transfer function model can be tested against new evidence to yield new output. Additionally, conjectured output can be passed through the inverse model to predict the requisite case observations required to yield that output. Hypotheses are tested using circumstantial evidence, notional similarity, evidential strength, and plausibility to determine whether they are supported or rejected. Examples of solving for evidence links from tool execution are provided.
Intrusion Detection and Network Security
End-to-end communications security
The current methodologies of network communication security and in-transit data security used within the enterprise do not adequately meet the ever-growing threats from internal as well as external sources. A new approach called End-to-End Communications Security is being used to successfully close these security gaps and bring enterprises into regulatory and audit compliance at the same time.
Extending key sharing: how to generate a key tightly coupled to a network security policy
Current state-of-the-art security policy technologies, beyond the small-scale limitations and largely manual nature of their accompanying management methods, are lacking in (a) the real-timeliness of policy implementation and (b) robustness, with vulnerabilities and inflexibility stemming from centralized policy decision making: even if, for example, a policy description or access control database is distributed, the actual decision is often a centralized action that forms a single point of failure in the system. In this paper we present a new fundamental concept that allows a security policy to be implemented through a systematic and efficient key distribution procedure. Specifically, we extend polynomial Shamir key splitting, in which a global key is split into n parts, any k of which can reconstruct the original key. We present a method in which, instead of "any k parts" being able to reconstruct the original key, the key can be reconstructed only if the parts are combined as the access control policy describes. This leads to an easily deployable key generation procedure that yields a single key per entity which "knows" its role in the specific access control policy from which it was derived. The system is efficient in that it may be used to avoid expensive PKI operations or pairwise key distributions, and it provides superior security due to its distributed nature, the tight coupling of the key to the policy, and the ease and speed with which policy changes may be implemented.
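The classical Shamir splitting that the paper extends is easy to state in code. A minimal sketch over a prime field (the modulus and parameters are illustrative; the paper's policy-coupled extension is not reproduced):

```python
import random

P = 2**127 - 1  # a Mersenne prime, used here as a toy field modulus

def split(secret, n, k):
    """Shamir (k, n) splitting: any k of the n shares reconstruct secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation of the degree-(k-1) polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = split(1234567890, n=5, k=3)
assert reconstruct(shares[:3]) == 1234567890   # any 3 shares suffice
assert reconstruct(shares[2:]) == 1234567890
```

The paper's contribution replaces the "any k" rule with combinations dictated by the access control policy; the polynomial machinery above is the shared foundation.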
A novel unsupervised anomaly detection based on robust principal component classifier
Intrusion detection systems (IDSs) need a mass of labeled data for training, which hampers the application and popularity of traditional IDSs. Classical principal component analysis is highly sensitive to outliers in the training data and leads to poor classification accuracy. This paper proposes a novel scheme based on a robust principal component classifier, which obtains principal components that are not strongly influenced by outliers. An anomaly detection model is constructed from the distances in the principal component space and the reconstruction error of the training data. Experiments show that the proposed approach can detect unknown intrusions effectively and performs especially well in terms of detection rate and false positive rate.
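A sketch of the two anomaly scores described above, using plain SVD-based PCA as a stand-in for the paper's robust estimator (which is not reproduced); the data is synthetic:

```python
import numpy as np

def fit_pca(X, q):
    mu = X.mean(axis=0)
    # In the paper the components come from a *robust* estimator that
    # down-weights outliers; plain SVD is used here as a stand-in.
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    var = s**2 / len(X)
    return mu, Vt[:q], var[:q]

def anomaly_scores(x, mu, V, var):
    z = V @ (x - mu)
    major = np.sum(z**2 / var)          # distance in principal component space
    recon = mu + V.T @ z
    minor = np.sum((x - recon)**2)      # reconstruction error
    return major, minor

rng = np.random.default_rng(1)
train = rng.normal(size=(500, 4)) @ np.diag([3, 2, 0.3, 0.1])
mu, V, var = fit_pca(train, q=2)
print(anomaly_scores(train[0], mu, V, var))              # normal connection
print(anomaly_scores(np.array([0, 0, 5.0, 5.0]), mu, V, var))  # anomaly
```

The anomalous point has a far larger reconstruction error because it lies in the low-variance directions discarded by the principal subspace.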
AINIDS: an immune-based network intrusion detection system
Qiao Yan, Jianping Yu
Intrusion detection can be viewed as a pattern classification problem. Since intrusion detection has intrinsic characteristics such as high-dimensional feature spaces, linear non-separability, and severe unevenness between normal and anomalous patterns, it is very difficult to detect intrusions directly with classical pattern recognition methods. The natural immune system is a self-adaptive and self-learning classifier that accomplishes recognition and classification through learning, memory, and association. We first use four-tuples to define the natural immune system and the intrusion detection system, and then give a mathematical formalization of the performance indices of an intrusion detection system. Finally, we design and develop an immune-based network intrusion detection system, AINIDS, which includes a data collector, a packet header parser and feature extraction component, an antibody generation and antigen detection component, a co-stimulation and report component, and a rule optimization component. The antibody generation and antigen detection component is the key module of AINIDS. In this component, passive immune antibodies and automatic immune antibodies, the latter including memory automatic immune antibodies and fuzzy automatic immune antibodies, are proposed by analogy with the natural immune system. The passive immune antibodies inherit available rules and can detect known intrusions rapidly. The automatic immune antibodies integrate statistical methods with fuzzy reasoning to improve detection performance and can discover novel attacks. AINIDS is tested both with data collected from our LANs and with data from the 1999 DARPA intrusion detection evaluation data sets. Both experiments show that AINIDS has a good detection rate for old and novel attacks.
Data Mining Algorithms and Applications
Clustering method via independent components for semi-structured documents
Tong Wang, Da-Xin Liu, Xuanzuo Lin, et al.
This paper presents a novel clustering method for XML documents. Much current research effort in document clustering is devoted to supporting the storage and retrieval of large collections of XML documents, but traditional text clustering approaches cannot capture the structural information of semi-structured documents. Our technique first extracts relative path features to represent each document; the documents are then transformed into the Vector Space Model (VSM), and a similarity computation is proposed. Before clustering, we apply Independent Component Analysis (ICA) to reduce the dimensionality of the VSM; to the best of the authors' knowledge, ICA has not been used for XML clustering before. The standard C-means partition algorithm is also improved: when a solution can no longer be improved, the algorithm makes its next iteration after an appropriate disturbance of the local minimum solution. Thus the algorithm can escape the local minimum and, at the same time, reach the whole search space. Experimental results on two real datasets and one synthetic dataset show that the proposed approach is efficient and outperforms a naive clustering method without ICA.
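A minimal sketch of the ICA-then-cluster pipeline, assuming scikit-learn (not mentioned in the paper); plain k-means stands in for the authors' perturbed C-means variant, and the toy VSM is synthetic:

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.cluster import KMeans

# Toy VSM: rows = documents, columns = relative-path features (tf weights).
rng = np.random.default_rng(0)
vsm = np.vstack([rng.normal(0, 1, (20, 50)) + 3 * rng.random(50),
                 rng.normal(0, 1, (20, 50)) - 3 * rng.random(50)])

# Dimension reduction with ICA before clustering.
ica = FastICA(n_components=5, random_state=0)
reduced = ica.fit_transform(vsm)

# Plain k-means stands in for the paper's C-means variant, which restarts
# from a disturbed solution whenever it gets stuck in a local minimum.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
print(labels)
```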
Mining hospital management data
Shusaku Tsumoto, Yuko Tsumoto
This paper gives an approach to hospital management data using statistical data mining. For analysis, distribution analysis, correlation and univariate regression analysis, and generalized linear models were applied. The analysis produced several interesting findings, suggesting that the reuse of stored data can provide a powerful tool to support the long-term management of a university hospital.
Visualization of similarities and dissimilarities in rules using MDS
This paper proposes a visualization approach that shows the similarity relations between rules based on multidimensional scaling (MDS), which assigns a two-dimensional Cartesian coordinate to each data point from information about the similarities between that data point and the others. First, semantic and syntactic similarities of rules are obtained after the rules are induced from a dataset. Then, MDS is applied to each similarity measure, visualizing the difference between the semantic and syntactic similarities. The method was evaluated on two medical data sets; the experimental results show that knowledge useful to domain experts could be found.
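Classical MDS is compact enough to sketch. Given a rule-to-rule distance matrix (for example, one minus a similarity), it returns 2-D coordinates for plotting; the toy matrix below is illustrative:

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: embed items in `dim` dimensions so that pairwise
    distances approximate the entries of the distance matrix D."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J          # double-centered squared distances
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]      # keep the largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

# Toy rule-to-rule distances (e.g. 1 - similarity between induced rules).
D = np.array([[0.0, 0.2, 0.9],
              [0.2, 0.0, 0.8],
              [0.9, 0.8, 0.0]])
coords = classical_mds(D)
print(coords)   # rules 0 and 1 land close together, rule 2 far away
```

Running the same embedding on the semantic and the syntactic distance matrices and plotting both point sets makes the difference between the two similarity notions visible.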
Damage assessment of mission essential buildings based on simulation studies of low yield explosives
There has been a lack of investigation into low-yield explosives used by terrorists against small but high-occupancy buildings. Mitigating the threat of terrorist attacks against high-occupancy buildings housing network equipment essential to the mission of an organization is a challenging task, and it is difficult to predict how, why, and when terrorists may attack these assets. Many factors must be considered in creating a safe building environment. Although the dominant threat mode may change in the future, bombings have historically been a favorite tactic of terrorists: ingredients for homemade bombs are easily obtained on the open market, as are the techniques for making bombs, and bombings are easy and quick to execute. This paper discusses the problems encountered and the insights gained in analyzing small-scale explosions on older military base buildings. In this study, we examine the placement of various bombs in buildings using the shock wave simulation code CTH and examine the damage effects on the interior of the building, particularly the damage incurred by a computer center. These simulation experiments provide data on the effectiveness of a building's security and an understanding of the phenomenology of shocks as they propagate through rooms and corridors. The study's purpose is to motivate researchers to take seriously the threat of small-yield explosives against moderately sized buildings. Visualizations from this analysis are used to understand the complex flow of the air blasts around corridors and hallways. Finally, we make suggestions for improving the mitigation of such terrorist attacks. The intent of this study is not to provide breakthrough technology, but to provide a tool and a means for analyzing the material hardness of a building and, eventually, an incentive for more security. The information mentioned in this paper is in the public domain and easily available via the internet as well as in any public library or bookstore; therefore, the information discussed here is unclassified and in no way reveals any new methodology or technology.
Poster Session
A noise-immune cryptographic information protection method for facsimile information transmission and the realization algorithms
Vladimir G. Krasilenko, Vitaliy F. Bardachenko, Alexander I. Nikolsky, et al.
We analyse existing methods of cryptographic protection for facsimile information transfer, consider their shortcomings, and establish the need for a better degree of information protection. A method of information protection based on presenting the input data as images is proposed. We offer a new noise-immune algorithm for realizing this method, which transforms an input frame by pixel transposition according to an entered key; in decoding mode, the reverse transformation of the image with the same key is applied. The practical realization of the method takes into account noise in the transmission channels and information distortion by scanners, faxes, and similar devices. We show that these influences reduce to transformations of the input image coordinates. We present the algorithm in detail, consider its basic steps, and demonstrate the feasibility of the proposed method by means of the developed software. The implemented algorithm corrects frame distortions: rotation, scaling, pixel dropout, and the like. At low noise levels (loss of less than 10 percent of the pixel information) it is possible to encode, transfer, and decode any type of image or text set in a 12-point font. The software filters for information restoration and noise removal allow fax data to be transferred with 30 percent pixel loss for 18-point text. This tolerable percentage of data loss can be increased considerably by using a software character recognition block, which can be realized with fuzzy-neural algorithms. Examples of the encoding and decryption of images and texts are shown.
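The core key-driven pixel transposition is easy to illustrate. A minimal sketch (the paper's noise correction, geometry restoration, and recognition blocks are not reproduced; the key string and tiny frame are illustrative):

```python
import random

def keyed_permutation(n, key):
    """Derive a deterministic pixel permutation from the shared key."""
    order = list(range(n))
    random.Random(key).shuffle(order)
    return order

def encode(pixels, key):
    perm = keyed_permutation(len(pixels), key)
    return [pixels[i] for i in perm]

def decode(scrambled, key):
    # Invert the permutation: position dst came from position perm[dst].
    perm = keyed_permutation(len(scrambled), key)
    out = [0] * len(scrambled)
    for dst, src in enumerate(perm):
        out[src] = scrambled[dst]
    return out

frame = list(range(16))   # a tiny 4x4 'image'
assert decode(encode(frame, key="shared-key"), key="shared-key") == frame
```

Because each pixel survives individually, losing a fraction of them corrupts only the corresponding fraction of the decoded image, which is what makes the scheme tolerant to channel noise.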
The design and application of data warehouse during modern enterprises environment
Lijuan Zhou, Chi Liu, Chunying Wang
Interest in analyzing data has grown tremendously in recent years. To analyze data, a multitude of technologies is needed, namely technologies from the fields of data warehousing, data mining, and on-line analytical processing (OLAP). This paper proposes a system structure model of the data warehouse for the modern enterprise environment, based on enterprises' information requirements and users' actual demands; it analyses the benefits of this kind of model in practical applications and describes the construction of the data warehouse model. It also proposes overall design plans for modern enterprise data warehouses. The data warehouse we built in practical applications offers high query performance, data efficiency, and independence of logical and physical data. In addition, a data warehouse contains many materialized views over the data provided by distributed heterogeneous databases, for the purpose of efficiently implementing decision support, OLAP queries, or data mining. One of the most important decisions in designing a data warehouse is the selection of the right views to be materialized. In this paper, we also design algorithms for selecting a set of views to materialize in a data warehouse. First, we give the algorithms for selecting materialized views; then we use experiments to demonstrate the power of our approach. The results show that the proposed algorithm delivers an optimal solution. Finally, we discuss the advantages and shortcomings of our approach and future work.
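The abstract does not detail its view-selection algorithm; purely as a generic stand-in, the sketch below makes the standard greedy benefit-per-storage choice under a space budget, with hypothetical view names and cost figures:

```python
def select_views(candidates, budget):
    """Greedy materialized-view selection (a standard stand-in; the paper's
    own selection algorithm is not reproduced).

    candidates: {view: (benefit, size)}, where benefit estimates the
    query-cost saving if the view is materialized and size is its storage.
    """
    chosen, used = [], 0
    remaining = dict(candidates)
    while remaining:
        # Pick the view with the best benefit per unit of storage.
        view, (benefit, size) = max(remaining.items(),
                                    key=lambda kv: kv[1][0] / kv[1][1])
        del remaining[view]
        if used + size <= budget:
            chosen.append(view)
            used += size
    return chosen

views = {"sales_by_month": (900, 30), "sales_by_region": (500, 10),
         "sales_by_sku": (400, 80)}
print(select_views(views, budget=50))  # ['sales_by_region', 'sales_by_month']
```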
A practical timing attack on RSA over a LAN
Today, the specific implementation of a cryptosystem is possibly of greater importance than the underlying cryptographic algorithm itself. Through side-channel cryptanalysis, an adversary may deduce a secret key just by monitoring implementation-specific side channels, such as execution time or power consumption during a cryptographic operation. In this paper, we describe a successful remote timing attack against a server running a protocol similar to SSL. Using a fully automated attack on Chinese Remainder Theorem (CRT) implementations of RSA, we show it is practical to recover a 1024-bit key in under an hour over a local area network.