Proceedings Volume 1154

Real-Time Signal Processing XII

J. P. Letellier
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 6 December 1989
Contents: 1 Session, 28 Papers, 0 Presentations
Conference: 33rd Annual Technical Symposium 1989
Volume Number: 1154

Table of Contents

All Papers
Automatic Mapping Of Large Signal Processing Systems To A Parallel Machine
Harry Printz, H. T. Kung, Todd Mummert, et al.
Since the spring of 1988, Carnegie Mellon University and the Naval Air Development Center have been working together to implement several large signal processing systems on the Warp parallel computer. In the course of this work, we have developed a prototype of a software tool that can automatically and efficiently map signal processing systems to distributed-memory parallel machines, such as Warp. We have used this tool to produce Warp implementations of small test systems. The automatically generated programs compare favorably with hand-crafted code. We believe this tool will be a significant aid in the creation of high speed signal processing systems. We assume that signal processing systems have the following characteristics:
  • They can be described by directed graphs of computational tasks; these graphs may contain thousands of task vertices.
  • Some tasks can be parallelized in a systolic or data-partitioned manner, while others cannot be parallelized at all.
  • The side effects of each task, if any, are limited to changes in local variables.
  • Each task has a data-independent execution time bound, which may be expressed as a function of the way it is parallelized, and the number of processors it is mapped to.
In this paper we describe techniques to automatically map such systems to Warp-like parallel machines. We identify and address key issues in gracefully combining different parallel programming styles, in allocating processor, memory and communication bandwidth, and in generating and scheduling efficient parallel code. When iWarp, the VLSI version of the Warp machine, becomes available in 1990, we will extend this tool to generate efficient code for very large applications, which may require as many as 3000 iWarp processors, with an aggregate peak performance of 60 gigaflops.
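The mapping problem sketched above, assigning tasks with known execution-time bounds to processors, can be illustrated with a simple greedy list scheduler; this is a generic heuristic, not the CMU/NADC tool's actual algorithm, and the task names and times are invented:

```python
# A generic greedy list scheduler: tasks with known execution-time bounds
# are placed, longest first, on the currently least-loaded cell.

def map_tasks(task_times, num_cells):
    """Assign each task to a cell; return the assignment and the makespan."""
    loads = [0.0] * num_cells
    assignment = {}
    # Placing long tasks first lets short ones balance the remainder.
    for task in sorted(task_times, key=task_times.get, reverse=True):
        cell = min(range(num_cells), key=loads.__getitem__)
        assignment[task] = cell
        loads[cell] += task_times[task]
    return assignment, max(loads)

times = {"fft": 8.0, "filter": 6.0, "detect": 5.0, "log": 1.0}
assignment, makespan = map_tasks(times, 2)
```

With two cells, the 8-unit task lands alone on one cell and the others balance the second, giving a makespan of 11 time units.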
Programming Methods For Crystallography
Louis Auslander, Michael Cook, Myoung Shenefelt
Determining the structure of a crystal by X-ray methods requires a repeated computation of the 3-dimensional Fourier transform. In this paper, we present a method that uses the group theoretic properties of crystallographic groups to reduce the Fourier transform computation.
Novel Coding Approach To High Throughput Rate FIR Filter Implementation
A. H. Tewfik, G. E. Sobelman
The implementation of high throughput rate finite impulse response filters is studied using a geometrical coding point of view. The output of the filter is recognized as the projection of an arbitrary multidimensional input vector of data values onto a fixed coefficient vector. The fixed vector induces a partition of the space of possible input vectors into multidimensional strips. All vectors lying in a single strip produce the same filter output. Based on this observation, we propose to compute the output of the filter at each time instant by finding a codeword that lies in the same strip as the input vector and reading a look-up table entry that corresponds to the codeword.
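The projection view of FIR filtering described above can be sketched as follows; the scalar codebook and the two-tap filter are illustrative assumptions, not the authors' coding scheme:

```python
# The filter output is the inner product of the input window with the tap
# vector; windows that quantise to the same codeword share one precomputed
# output, so filtering reduces to a table lookup.
from itertools import product

def fir_direct(h, x):
    """Reference FIR output: projection of the window onto the taps."""
    return sum(hi * xi for hi, xi in zip(h, x))

def build_table(h, levels):
    """Precompute the output for every codeword (quantised window)."""
    return {cw: fir_direct(h, cw) for cw in product(levels, repeat=len(h))}

def fir_lookup(table, levels, x):
    """Quantise each sample to its nearest level, then read the table."""
    cw = tuple(min(levels, key=lambda q: abs(q - xi)) for xi in x)
    return table[cw]

h = (0.5, 0.25)
levels = (-1.0, 0.0, 1.0)
table = build_table(h, levels)
y = fir_lookup(table, levels, (0.9, -1.1))   # window quantises to (1.0, -1.0)
```

The table has one entry per codeword (9 here), trading memory for a multiply-free inner loop.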
Two-Dimensional Adaptive Beamforming Techniques
T. V. Ho, J. Litva
In this paper we present a number of algorithms for two-dimensional (2-D) adaptive beamforming with planar array antennas. These are: (1) the 2-D LMS algorithm, (2) the 2-D Howells-Applebaum algorithm, and (3) the 2-D QRD-LS algorithm. In the latter case we also present an architecture for systolic array processing. The derivation of the first two algorithms is based on the classical methods of gradient descent and control-loop, respectively, which were developed initially for 1-D arrays. The relationship between the 2-D and 1-D adaptive arrays will be discussed. The concept of 2-D eigenvector beams is also presented. The paper shows that an eigenbeam for a 2-D array can be formed by using two independent 1-D eigenbeams which operate on row l and column m of the array. As a consequence, this leads to the introduction of a 3-D systolic array for performing the QRD-LS minimization. When the data flow is in time-skewed format, adaptations along rows and columns of the 2-D array can be carried out simultaneously.
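The 1-D LMS update that the first algorithm generalizes to planar arrays can be sketched in a few lines; this is the textbook real-valued, single-snapshot form, not the paper's 2-D complex formulation:

```python
# One LMS adaptation step: w <- w + mu * e * x, where e is the error
# between the desired response d and the array output w . x.

def lms_step(w, x, d, mu):
    """Return updated weights and the instantaneous error."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    e = d - y
    w = [wi + mu * e * xi for wi, xi in zip(w, x)]
    return w, e

# Adapt a 2-tap weight vector toward the target response d = 1.0.
w = [0.0, 0.0]
for _ in range(200):
    w, e = lms_step(w, [1.0, 0.5], 1.0, 0.1)
```

For a fixed input the error contracts by a factor (1 − μ‖x‖²) per step, so after 200 steps it is negligible.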
A Real-Time Multi-Processor System For Knowledge-Based Target-Tracking
P. D.S. Irwin, S. A. Farson, A. J. Wilkinson
This paper describes a multi-processor implementation of a knowledge-based system which has been developed with the aim of achieving real-time image interpretation within a specific application area, namely tracking targets in I.R. imagery. Image segmentation and feature extraction algorithms generate a reduced resolution representation of the image in terms of various features. This feature data is then fed to a knowledge-based system for image interpretation. A custom chip-set for implementation of the 'front-end' segmentation and feature extraction algorithms is described which uses a bit-level systolic array architecture to meet the high computational requirements in real-time. Reduced resolution feature arrays which are generated by this hardware are simultaneously mapped into the memory of an array of INMOS transputers. A multi-processor 'farm' architecture is described capable of implementing the first level of image interpretation algorithms in real-time. Efficient task scheduling within the multi-processor system is critical for real-time operation and an optimal algorithm for scheduling the various tasks involved in implementing the rule set, which forms the core of the knowledge-based system, is presented.
An Efficient Algorithm For Predicting The Response Of A Laterally Inhibited Neural Network
B. V.K. Vijaya Kumar, Michael Lemmon
Neural networks are usually described mathematically using sets of coupled non-linear differential equations. Thus, we can determine the response of the network to any input stimulus by numerically integrating the corresponding differential equations (which can be computationally intensive). In this paper, a new algorithm is presented for determining the input-output behavior of laterally inhibited neural networks. This algorithm is based on an analytical understanding and thus does not involve any numerical integrations. Initial results indicate that this new algorithm provides a speed up of up to three orders of magnitude. We also propose a VLSI Systolic Array implementation of this algorithm.
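The conventional approach the authors improve on, direct numerical integration of the network's coupled differential equations, might look like the forward-Euler sketch below; the network form and parameters are illustrative assumptions:

```python
# Forward-Euler integration of a lateral-inhibition network
# dx/dt = -x + relu(I - W x), where W holds inhibitory coupling weights.

def relu(v):
    return [max(0.0, vi) for vi in v]

def simulate(W, I, dt=0.01, steps=2000):
    """Integrate the network ODE to (approximate) steady state."""
    n = len(I)
    x = [0.0] * n
    for _ in range(steps):
        inhib = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        drive = relu([I[i] - inhib[i] for i in range(n)])
        x = [x[i] + dt * (-x[i] + drive[i]) for i in range(n)]
    return x

# Two units with mutual inhibition: the stronger input suppresses the weaker.
W = [[0.0, 1.5], [1.5, 0.0]]
x = simulate(W, [1.0, 0.6])
```

Every time step costs a full matrix-vector product, which is exactly the expense an analytical input-output characterization avoids.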
Neural Net Classifier For Millimeter Wave Radar
Joe R. Brown, Sue Archer, Mark R. Bower
This paper describes the development of a neural net classifier for use in an automatic target recognition (ATR) system using millimeter wave (MMW) radar data. Two distinctive neural net classifiers were developed using mapping models (back-propagation and counterpropagation) and compared to a quadratic (Bayesian-like) classifier. A statistical feature set and a radar data set were used for both training and testing all three classifier systems. This statistical feature set is often used to test MMW ATRs prior to using actual data. Results are presented and indicate that the back-propagation net performed at near 100 percent accuracy for the statistical feature set and slightly outperformed the counterpropagation model in this application. Both networks show promising results on real radar data.
Application Of Selective Linear Predictive Coding (SLPC) In Enhancing Cross-Range Resolution Of Inverse Synthetic Aperture Radar (ISAR)
D. Nandagopal, D. Longstaff, G. Nash, et al.
A high range resolution X-band Inverse Synthetic Aperture Radar (ISAR) using stepped frequency waveforms is described. Target images are produced by performing an inverse Fourier transform in the range dimension and linear predictive coding in the cross-range dimension. Enhancement in cross-range resolution is produced by applying the selective linear predictive coding (SLPC) technique. The signal processing technique is tested using simulated point targets, data from ship targets, and a motor vehicle.
The Square Root In Signal Processing
R. W. Stewart, R. Chapman, T. S. Durrani
This paper considers the use of the arithmetic operation of square-rooting in signal processing. Many algorithms have been reformulated to avoid the square root operation in an attempt to increase processing speed. Rather than reformulating the algorithms to be square root free, with the inherent problems of numerical instability, loss of orthogonality, and overflow/underflow, the square root is reconsidered from first principles, and arrays are designed that are approximately twice as fast and approximately half the chip area of the analogous division. A non-restoring division array is also presented, and it is shown that with minimal modification the array also performs square roots. The result that square rooting is in fact a simpler operation than division has considerable implications for many signal processing algorithms. Two examples are considered: first, it is shown that there is no advantage in using square root free Givens transformations in QR triangular arrays; and second, contrary to common conception, that unnormalised least squares lattices are no more complex to implement than square root normalised least squares lattices.
Linear Array For Efficient Execution Of Partitioned Matrix Algorithms
Jaime H. Moreno, Tomas Lang
We propose a class-specific linear array suitable for partitioned execution of matrix algorithms, which achieves high efficiency, exploits pipelining within cells in a simple manner, has an off-cell communication rate lower than its computation rate, and has a small storage per cell (whose size is independent of problem size). This array is well suited to use the MMG method, a data-dependency graph-based mapping technique. The MMG method has capabilities to realize fixed-size data and partitioned problems as algorithm-specific arrays, and to map algorithms onto class-specific arrays. The array proposed here uses the mapping capabilities of the method, which combine coalescing and cut-and-pile as partition strategies. Mapping is illustrated using the LU-decomposition algorithm; results obtained from mapping other algorithms are also indicated. Performance estimates of the mappings show that, for example, LU-decomposition of a 2000 by 2000 matrix computed in a linear array with 100 cells, two operation units per cell in a 4-stage pipeline, and 50 [nsec] clock period (i.e., 4000 [Mflops]), achieves 87% efficiency (3480 [Mflops]). This performance is obtained while requiring communication among cells of only 5 [Mwords/sec] and peak external I/O bandwidth for the entire array also of 5 [Mwords/sec]. Moreover, for a problem of this size, the use of cut-and-pile leads to storage requirements of only 8000 words per memory module.
Fast Mapping Of Gravity Equations On A Linear Array
R. S. Baheti, V. J. Karkhanis, D. R. O'Hallaron, et al.
Computing a gravity model on a sequential machine requires O(N²) time, where N is the degree and order of the model. With the increasing demand for higher resolution gravity models, it will be necessary to develop more efficient techniques for performing the computations required by these models. In this paper, we describe a parallel mapping of a gravity model on a linear array. The mapping uses N + 5 cells running in parallel to compute the gravity equations in O(N) time. The mapping is the basis for an implementation on the Warp computer at Carnegie Mellon.
Mutual information and estimative consistency
R. C. McCarty
The high speed processing of sample data from continuous stochastic signal processes, such as broad-band, noise-like spread spectrum signals, provides highly correlated, stochastically-dependent sample information. The utilization of dependent sample data in the estimation of the parameters of the parent process generally provides biased but, more importantly, inconsistent parametric estimates. For a set of continuous random variables {X₁, …, Xₙ; −∞ < Xᵢ < ∞, i = 1, …, n; n ≥ 2}, Blachman, circa 1966, proposed a two-dimensional vector measure of mutual information, I₀, which was easily extended to the general n-dimensional case. In 1988, the author of this paper proposed a consistent sample estimate of an extended Blachman measure of mutual information for the case of multivariate exponential-type probability distributions. A more generalized estimated sample measure of mutual information provides a means to determine an appropriate sample interval between adjacent temporal samples which will not vanish unless the samples are statistically independent. These samples can then be used to generate the usual statistically consistent process moment estimates.
Fast Algorithms To Compute Multidimensional Discrete Fourier Transform
I. Gertner, R. Tolimieri
This paper presents algorithms to compute an N-dimensional Discrete Fourier Transform. First, existing algorithms are surveyed. A multiplicative symmetrized and orbit exchange algorithm is presented. Then a line algorithm to compute the multidimensional DFT is derived in two and N dimensions. The explicit congruences for the minimal number of lines covering a multidimensional grid are given when the data size in one dimension is equal to a power of a prime number.
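The row-column decomposition that multidimensional DFT algorithms build on can be sketched as follows; the inner 1-D transform is a naive O(N²) DFT for clarity, and this is the baseline the paper's line algorithms improve upon:

```python
# 2-D DFT by row-column decomposition: 1-D DFTs along every row, then
# along every column of the intermediate result.
import cmath

def dft(x):
    """Naive 1-D DFT, O(n^2), for illustration only."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n)
                for m in range(n)) for k in range(n)]

def dft2(a):
    """2-D DFT: transform every row, then every column."""
    rows = [dft(r) for r in a]
    cols = list(zip(*rows))
    out_cols = [dft(list(c)) for c in cols]
    return [list(r) for r in zip(*out_cols)]

a = [[1, 0], [0, 0]]
A = dft2(a)   # the DFT of a unit impulse is all-ones
```

An N₁ × N₂ input needs N₁ + N₂ one-dimensional transforms this way; line algorithms aim to cover the grid with fewer 1-D passes.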
Optimizing Architectures For Parallel FFT Processing
R. Keith Bardin, J. Daryl Sisk
In the design of high-performance embedded processor systems dedicated to a predefined range of tasks, the best designs will result from the simultaneous optimization of the hardware architecture and the algorithms for the required task suite. This paper presents studies in progress of techniques for such optimizations applied to synthetic-aperture radar (SAR) and inverse SAR (ISAR) image processing algorithms. Our approach has been to implement scaled-down model calculations of real test problems on a variable-topology parallel test processor. We present here some initial results on the parallelization of the fast Fourier transform (FFT) algorithm from this investigation, and propose an enhancement of the hypercube processor topology which appears advantageous for many applications.
Computing The Two Dimensional Fast Fourier Transform On A General Purpose Mesh Connected Multiprocessor
E. M. Johansson, A. J. De Groot, S. R. Parker
This paper discusses an implementation of the two dimensional fast Fourier transform (FFT) on SPRINT, the Systolic Processor with a Reconfigurable Interconnection Network of Transputers. SPRINT is a 64 element multiprocessor developed at Lawrence Livermore National Laboratory for the experimental evaluation of systolic algorithms and architectures. The implementation is a radix two decimation in time algorithm, valid for an arbitrary sized p x q mesh of processors and an arbitrary sized P x Q complex input array (P, Q, p, and q must all be powers of two). The processors are interconnected with their nearest neighbors along North-South-East-West communication links. The problems of array partitioning, bit reversal, subarray transform computation, and weighted (butterfly) combinations are all discussed. Finally, benchmark results are presented, and speedup and efficiency are discussed.
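The bit-reversal reordering step mentioned above, which a radix-two decimation-in-time FFT applies before its butterfly stages, can be sketched as:

```python
# A decimation-in-time radix-2 FFT permutes its input so that element i
# moves to the position whose index is i with its bits reversed.

def bit_reverse(i, bits):
    """Reverse the low `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def reorder(x):
    """Permute a power-of-two-length sequence into bit-reversed order."""
    bits = len(x).bit_length() - 1
    return [x[bit_reverse(i, bits)] for i in range(len(x))]

order = reorder(list(range(8)))   # [0, 4, 2, 6, 1, 5, 3, 7]
```

On a mesh of processors this permutation becomes inter-processor traffic, which is why the paper treats bit reversal as a distinct implementation problem.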
Parallel Algorithms For Automatic Target Identification Using CO2 Laser Radar Imagery
Arthur V. Forman Jr., Daniel J. Sullivan
This paper describes a parallel implementation of algorithms for automatic target identification using a CO2 laser radar sensor, which provides pixel-registered visual, infrared, relative range and Doppler (motion) images. Images are preprocessed to remove artifacts and speckle and to detect edges. The processed images are then passed to a transformation which provides a representation independent of translation, rotation, and scaling of the input scene. The transformed image is designed to be input to a neural network to extract features representative of the targets. This paper also discusses three-dimensional transformation of range images for pattern matching and the implementation of algorithms in a single instruction, multiple data (SIMD) parallel architecture.
Pattern Recognition Using A Double Layer Associative Memory
David T.Y. Lam, John E. Carroll
We describe and present results of a high-order double layer associative memory (HODLAM). The memory can utilise up to nearly 100% of the network storage capacity, has high tolerance to patterns with high cross-correlations, and has high potential for translation-invariant pattern recognition. We demonstrate the capability of the memory by using it to recognise a set of 64 characters, each being drawn on an 8×8 pixelated array. We also propose a possible optical implementation scheme for the memory and examine modifications that lead to translation invariance.
A VLSI Electro-Optical Crossbar For Signal Processing
Mehrnoosh Mary Eshaghian, V. K. Prasanna Kumar
In this paper, we study an electro-optical crossbar for highly parallel computing. This VLSI architecture interconnects a set of processors on a single chip using Gallium Arsenide laser diodes and has a switching time on the order of nanoseconds. We also propose modified versions of this model for non-Von-Neumann computations, such as neural and data-flow computing. In the last section, we discuss applications in signal processing.
Systolic Implementation Of Kalman Filters Using Programmable Real-Time Incoherent Matrix Multiplier
S. Barua
The optical implementation of systolic Kalman filters is proposed in this paper. The systolic architecture employs the Faddeev algorithm to implement the complex equations present in Kalman filtering. The programmable real-time incoherent matrix multiplier developed by the Hughes Research Laboratories is employed in the proposed architecture for implementing the Faddeev algorithm.
Recursive Optical Notching Filter
Robert W. Brandstetter, Philip G. Grieve
A Recursive Optical Notching Filter (RONF) was developed to eliminate narrow band interference (NBI) experienced by RF receiver systems such as those used in radar and communications. Electromagnetic interference (EMI) can drastically reduce the effective range of radars and cause intermodulation products in the receiver and subsequent demodulation. The RONF functions to identify the interferers, their numbers and locations, as well as adaptively construct a notch filter structure which uniquely excises the NBI from the RF signal spectrum. Further, by the use of a novel recursive architecture it is possible to develop notch depths approaching the signal dynamic range. The RONF operations are accomplished by impressing the unprocessed RF signal onto a laser carrier by means of an acousto-optic modulator (Bragg cell). The modulated laser beam is then optically Fourier transformed to produce a real-time frequency spectrum. At the carrier optical wavelength, the frequency translated RF signal with NBI appears in spatial coordinates for parallel processing, with spatial intercepts of the EMI provided by programmed amplitude and phase "blockers" constructed by the Programmable Spatial Filter (PSF). It is at the PSF that optical recursion is used to obtain superior notching depth. With the NBI removed, the optical signal is then inverse Fourier transformed and the original radio frequency signal is recovered by optical heterodyne conversion. Laboratory tests with radar systems as well as various related stimuli have been conducted under field and on-site conditions. The RONF test systems have demonstrated notch depths greater than 40 dB using the recursive architecture.
Self-Aligned Optical Holographic Clock Distribution Networks
Freddie Lin, Eva M. Strzelecki
Several novel optical holographic VLSI clock distribution networks are presented. Cascaded holographic structures used for the realization of these networks have the following unique characteristics: (i) high-density uniform fanouts; (ii) reduced clock skew problems; and (iii) self-aligned holograms. These clock distribution networks can be realized in either a free-space or a single-mode planar waveguide geometry.
LiNbO3 Channel Waveguide Mode Annihilation Switching Array And Its Applications
Ray T. Chen
Optical signal processing and computing require spatial as well as temporal dimensionality. We report a thermally annealed cutoff modulator and a modulator array that can provide wide modulation bandwidth and multi-channel optical signal processing capability. Experimental results for a single channel device and a modulator array are presented. A detailed theory is given to explain the working principle of the cutoff modulator. The increase of the linear dynamic range of the device after appropriate heat treatment is quantitatively explained. The improvement is due to the proper combination of the coupling condition and the heat-treatment perturbed, electric-field-induced attenuation coefficient. A modulator array with 333 channels/cm packing density and 1.6 GHz modulation bandwidth is also demonstrated experimentally. Further applications utilizing the cutoff modulator array, such as addition and subtraction of two numbers and optical logic gate operations, are also realized.
Adaptive Signal Processing Using A Liquid Crystal Television
Stephen T. Welstead, Michael J. Ward, Denise M. Blanchard, et al.
In this paper we describe a way of using a two dimensional liquid crystal display to provide extended dynamic range for one dimensional spatial light modulation. We illustrate this use of the device in an adaptive signal processing application which requires high accuracy representation of a one dimensional weight vector. One dimension of the television display screen is used to specify the components of the vector. The second dimension is used to provide increased numerical accuracy for each of these components. In this way, we overcome the recognized low dynamic range and limited number of gray scales that is characteristic of liquid crystal displays at the pixel level. Preliminary experimental results verifying this use of the liquid crystal television as an improved accuracy spatial light modulator are presented.
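The two-dimensional encoding idea, one display axis indexing weight components and the other extending their precision, might be sketched as follows; the binary power-of-two pixel weighting below is an illustrative assumption, not necessarily the authors' scheme:

```python
# Each display column encodes one weight; the pixels down that column act
# as binary digits with power-of-two transmittance weights, so K on/off
# pixels give K bits of accuracy despite low per-pixel dynamic range.

def encode_column(w, bits):
    """Quantise w in [0, 1) to `bits` binary pixels (MSB first)."""
    level = int(w * (1 << bits))
    return [(level >> (bits - 1 - b)) & 1 for b in range(bits)]

def decode_column(pixels):
    """Optical readout model: sum pixel transmittances with 2^-k weights."""
    return sum(p / (1 << (k + 1)) for k, p in enumerate(pixels))

col = encode_column(0.8125, 4)    # 0.8125 = 13/16 -> pixels [1, 1, 0, 1]
w = decode_column(col)
```

The decode step models the optics summing light through one column, recovering the weight to 4-bit precision from purely binary pixels.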
Hybrid Digital-Optical Processors: A Performance Assessment
Eugene Pochapsky, David Casasent
Hybrid digital-optical processors are designed to use the best features of their components, the speed of an analog optical processor and the accuracy of a digital processor, to overcome the accuracy limitations of analog systems and the relatively slower speed of digital systems. The application of hybrid processors to the solution of systems of linear algebraic equations using the bimodal algorithm is shown by analysis to be potentially more rapid than exclusively digital processing only when the range of significant system eigenvalues is smaller than the linear dynamic range of the analog optical processor by at least a factor of two, provided the proper preconditioning is employed. Thus, while performance is significantly improved by preconditioning and by maximizing the accuracy of the analog optical processor, only a limited number of problems, those with a small range of eigenvalues, can be efficiently solved to high accuracy using the hybrid processor. This precludes its use as a replacement for a digital processor. The hybrid processor may find applications as an adjunct to an analog optical processor. However, by design the hybrid processor has increased complexity and reduced speed with respect to the simple analog optical processor. These sacrifices may not be justified in real-time applications to which an optically-based processor would be uniquely suited.
Electro-Optical Implementations For 2-D Array Neural Nets
Fathi M. A. Salam, Yiwen Wang
We show that our recently developed formulation for 2-D array neural net processing is suitably mapped into various electro-optical implementations. The implementations all make use of recent developments pertaining to vector-vector, vector-matrix, or matrix-matrix multiplications. Our formulation hence utilizes an already developed technology and adapts it for 2-D neural net processing.
Radar Imaging Using The Wigner-Ville Distribution
B. Boashash, O. P. Kenny, H. J. Whitehouse
The need for analysis of time-varying signals has led to the formulation of a class of joint time-frequency distributions (TFDs). One of these TFDs, the Wigner-Ville distribution (WVD), has useful properties which can be applied to radar imaging. This paper first discusses the radar equation in terms of the time-frequency representation of the signal received from a radar system. It then presents a method of tomographic reconstruction for time-frequency images to estimate the scattering function of the aircraft. An optical architecture is then discussed for the real-time implementation of the analysis method based on the WVD.
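A minimal discrete WVD slice, forming the lag product x[n+m]·x*[n−m] at one time instant and Fourier transforming over the lag, can be sketched as follows; windowing and analytic-signal details of a practical implementation are omitted:

```python
# One time slice of a discrete Wigner-Ville distribution: build the
# instantaneous autocorrelation over symmetric lags that stay inside the
# record, then take its magnitude spectrum over the lag variable.
import cmath

def wvd_slice(x, n, nfft):
    """Magnitude of the WVD of sequence x at time index n, nfft bins."""
    half = min(n, len(x) - 1 - n)
    r = [x[n + m] * x[n - m].conjugate() for m in range(-half, half + 1)]
    return [abs(sum(rm * cmath.exp(-2j * cmath.pi * k * (m - half) / nfft)
                    for m, rm in enumerate(r))) for k in range(nfft)]

# A complex tone of frequency 0.125 cycles/sample: the lag product then
# oscillates at twice that rate, so the energy concentrates in bin 4 of 16.
x = [cmath.exp(2j * cmath.pi * 0.125 * t) for t in range(32)]
slice16 = wvd_slice(x, 16, 16)
```

The frequency-doubling of the lag kernel is the WVD's well-known trait; for a single tone the slice peaks sharply in one bin.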
Spatial Redundancy Reduction In High Spectral Resolution Images Using Parametric Modeling
C. Mailhes, F. Castanie, P. Vermande
This paper deals with redundancy reduction in multispectral images covering about 200 spectral bands. The first part is an extension of an already published paper and is devoted to the scalar compression of each image pixel. Each pixel is an interferogram, and an autoregressive model of order 10 gives a set of parameters that characterize the interferogram and are well-suited to quantization and transmission: the first 10 points, 10 reflection coefficients, and a model error vector. With scalar methods, a compression ratio of 4 or 6 is reached. In the second part, the compression is considered at the image level, on a column-by-column real-time processing basis. For classification applications, the first points of the normalized interferogram are shown to yield a discriminating vector. In the compression processing applied to each determined class, only one error vector and one first point vector need be transmitted. As regards the reflection coefficients, a DPCM structure is proposed. The achieved compression ratio is shown to depend on the image and the desired spectral matching. With a good trade-off between these, a ratio between 10 and 16 is obtained.
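The per-pixel autoregressive modeling step can be illustrated with the standard Levinson-Durbin recursion, which yields the reflection coefficients such a scheme quantizes; the order-2 example and autocorrelation values below are illustrative, not the paper's data:

```python
# Levinson-Durbin recursion: from autocorrelation values r[0..p], compute
# the AR(p) coefficients, the reflection coefficients, and the residual
# (prediction error) power.

def levinson(r, order):
    a = [1.0] + [0.0] * order          # a[0] is fixed at 1
    k_list = []
    err = r[0]
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                 # reflection coefficient for stage i
        k_list.append(k)
        a_new = a[:]
        for j in range(1, i):
            a_new[j] = a[j] + k * a[i - j]
        a_new[i] = k
        a = a_new
        err *= 1.0 - k * k             # error power shrinks each stage
    return a[:order + 1], k_list, err

r = [0.5 ** m for m in range(3)]       # autocorrelation of an AR(1) process
a, ks, err = levinson(r, 2)            # recovers a1 = -0.5, k2 = 0
```

Reflection coefficients are bounded by one in magnitude for a valid autocorrelation, which makes them well-behaved under quantization, one reason parametric coders transmit them rather than the AR coefficients directly.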
A VLSI Centroid Extractor For Real-Time Target Tracking Applications
T. S. Axelrod, T. F. Tassinari, G. O. Barnes, et al.
The Automatic Centroid Extractor (ACE) is a CMOS VLSI chip that performs target extraction and centroiding functions for an extensible real-time target tracking system. Contiguous pixels whose intensities exceed a given threshold are grouped into separate objects, and three weighted sums of pixel intensities that can rapidly be converted to the target centroid and average intensity are calculated for each object. Proper assignment of contiguous pixels to their respective objects is guaranteed for all convex shapes and most concave shapes. Data for an N×M pixel image is processed in raster scan format with a maximum value of 1024 for N and M. ACE can process in excess of 500 objects per frame and accepts input at up to 20 Megapixels per second. To increase the flexibility of the chip, arbitrary circular regions of the image can be masked so that they do not enter the target extraction process. This allows any set of fixed targets to be ignored if they are not of interest. The chip is a full custom design containing approximately 200,000 transistors and is being fabricated using Hewlett Packard's CMOS40 process. The ACE chip interfaces directly to the Inmos Transputer over a serial link. The communication link is used to select the threshold and to specify the circular regions of the image that will be masked, as well as to transfer target centroids and status information.
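The weighted sums described above can be sketched in software; connected-component labeling is simplified here to a single object comprising all above-threshold pixels, whereas the chip separates contiguous groups:

```python
# Accumulate the three sums a centroid extractor needs per object: total
# intensity, intensity-weighted row sum, and intensity-weighted column sum.
# Centroid and mean intensity then follow by division.

def centroid(image, threshold):
    """Return (row, col) centroid and mean intensity of bright pixels."""
    s = sr = sc = count = 0
    for r, row in enumerate(image):
        for c, v in enumerate(row):
            if v > threshold:
                s += v
                sr += v * r
                sc += v * c
                count += 1
    return sr / s, sc / s, s / count

img = [[0, 0, 0],
       [0, 9, 3],
       [0, 3, 0]]
row, col, mean = centroid(img, 1)   # intensity-weighted centre of the blob
```

Only running sums are needed during the raster scan, which is what lets the hardware keep up with a 20-Megapixel-per-second stream.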