Depicting Web images for the blind and visually impaired
To access digital images on the Web, most blind and visually impaired individuals currently rely on screen readers. These text-to-speech programs depend on the alternative (alt) text associated with an image tag in the HTML of a Web page.1 However, detailed alt text describing an image is not always present on a Web page, in which case the file path or site address is read aloud instead. This limits access to online voting and election data, which is often displayed through graphs and maps. Existing technologies make it challenging for visually impaired users to quickly understand data trends from Web images.
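The fallback behavior described above can be made concrete with a short sketch: a parser that flags image tags whose alt text is missing or empty, the situation that forces a screen reader to announce the file name instead of a description. The HTML snippet and class name below are our own illustrative examples, not part of the authors' system.

```python
# Sketch: flag <img> tags with missing or empty alt text, the case in
# which a screen reader falls back to reading the src attribute aloud.
# The sample HTML below is a made-up example for illustration.
from html.parser import HTMLParser

class AltTextAuditor(HTMLParser):
    """Collect the src of every img tag that lacks usable alt text."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        d = dict(attrs)
        alt = (d.get("alt") or "").strip()
        if not alt:  # absent or empty alt: only the file name is spoken
            self.missing.append(d.get("src", "?"))

page = """
<img src="turnout_by_state.png" alt="Bar graph of 2012 voter turnout by state">
<img src="results_map_final_v2.png">
"""
auditor = AltTextAuditor()
auditor.feed(page)
print(auditor.missing)  # → ['results_map_final_v2.png']
```

Only the second image is flagged: its screen-reader rendering would be the unhelpful file name rather than a description of the map.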
A variety of approaches have been taken to solve the issue of Web image accessibility (and tactile representation of graphics).2 While some focus on automatically generating alt text for Web images,1 other haptic and tactile solutions on the market and in research include haptic force-feedback pens.3 Although these pens are widely available and regarded as among the better haptic technologies, they give the user only one point of contact with the virtual object at a time.4 In addition, such systems are expensive and require the generation of haptic models, which demand more computing power than computer graphics.3
To address the deficiencies of existing solutions, we designed a system that translates maps and graphs on the Web into computer-aided-design (CAD) models that can be turned into 3D objects. The combination of translation software and 3D printing hardware creates a solution with high flexibility in selecting the spatial and spectral information of interest, and one that gives more surface contact with the represented object than haptic pens allow. Our solution for 2D-to-3D conversion performs the five steps shown in Figure 1.
The implementation of these five steps employs a color histogram-based classification of graphic images with an underlying Weibull probability distribution model, orthogonal image projections for axis detection, threshold-based segmentation, and connectivity analysis for bar graph detection.5 All algorithmic parameters are optimized over an exhaustive search of the parameter space for maximum accuracy. Accuracy is measured using an F-score against a repository of 676 reference images gathered from 50 voting and election Web pages. Each image in the repository has been manually labeled with its graph/non-graph status, graph type, and the presence of a vertical or horizontal axis. Our preliminary accuracy evaluations on the reference data are summarized in Table 1.
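The axis-detection step can be illustrated with a small sketch: summing the dark pixels of a binarized graph image along each row and column (the orthogonal projections), a column covered by dark pixels over nearly the full image height is a candidate vertical axis. The toy image and the coverage threshold below are our own illustrative choices, not the authors' implementation or tuned parameters.

```python
# Sketch of axis detection via orthogonal projections on a 0/1 image
# (list of rows). A vertical axis shows up as a column whose dark-pixel
# count approaches the image height. The 8x8 toy "bar graph" and the
# 0.9 coverage threshold are illustrative assumptions.

def projections(img):
    """Return (row_sums, col_sums) of a binary image."""
    rows = [sum(r) for r in img]
    cols = [sum(c) for c in zip(*img)]
    return rows, cols

def find_vertical_axis(img, coverage=0.9):
    """Index of the first column dark over >= coverage of the height."""
    _, cols = projections(img)
    height = len(img)
    for x, count in enumerate(cols):
        if count >= coverage * height:
            return x
    return None

# Toy image: column 0 is a solid vertical axis, bottom row a horizontal
# axis, plus two short "bars" rising from the baseline.
img = [[1 if x == 0 or y == 7 else 0 for x in range(8)] for y in range(8)]
for y in range(4, 7):
    img[y][2] = img[y][5] = 1

print(find_vertical_axis(img))  # → 0
```

The same column projection, thresholded lower, also locates the bars themselves, which is why projections pair naturally with the threshold-based segmentation and connectivity analysis described above.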
Having identified the image as a type of graph or map and determined the regions of interest, we convert the image information into a CAD model and construct the Standard Tessellation Language (STL) file necessary for 3D printing (see Figure 2, left and middle columns). We explored three methods of generating CAD models. The first was a frequency elevation map: the higher the color frequency, the lower the elevation. The second, a color elevation map, assigns the largest color values the lowest elevations. The third is an edge elevation map, which relies on edge detection and depicts greater edge intensities with higher elevations. Once the CAD model-generation method is chosen, there are two ways of constructing the 3D shapes of the desired elevation. The face method involves generating 3D shapes per region of interest (see Figure 2, bottom row, center column). The pixel method generates a cube per pixel with the desired elevation (see Figure 2, top row, center column). Ultimately, the face method proved to be much more efficient than the pixel method. For example, the speed of the STL file creation was 109.78 times faster, the G-code 3D printer instruction generation was 46.45 times faster, and the STL file size was 5319.87 times smaller for the bar graph image in Figure 2.
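The size gap between the two construction methods follows directly from triangle counts. In a binary STL file, each triangle record occupies 50 bytes and an axis-aligned box needs 12 triangles, so a pixel method that emits one cube per pixel scales with pixel count, while a face method scales only with the number of regions of interest. The image size and region count below are made-up illustrative numbers, not the Figure 2 data.

```python
# Rough sketch of why the face method produces far smaller STL files.
# Binary STL: 80-byte header + 4-byte triangle count, then 50 bytes per
# triangle; a rectangular box is 6 faces x 2 triangles = 12 triangles.
# The 100x100 image with 5 bar regions is an illustrative assumption.

TRIANGLES_PER_BOX = 12
BYTES_PER_TRIANGLE = 50
HEADER_BYTES = 84  # 80-byte header + 4-byte uint32 triangle count

def stl_size(num_boxes):
    """Approximate binary STL size for num_boxes axis-aligned boxes."""
    return HEADER_BYTES + num_boxes * TRIANGLES_PER_BOX * BYTES_PER_TRIANGLE

pixel_boxes = 100 * 100  # pixel method: one cube per pixel
face_boxes = 5           # face method: one box per bar region
ratio = stl_size(pixel_boxes) / stl_size(face_boxes)
print(f"pixel/face STL size ratio: {ratio:.0f}x")
```

Even this toy case yields a three-order-of-magnitude size ratio, consistent in spirit with the several-thousand-fold reduction reported for the bar graph in Figure 2; G-code generation time shrinks for the same reason, since the slicer processes far less geometry.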
We developed several prototype modules for translating graphs and maps from the Web to 3D printouts. In addition, we have illustrated the challenges of preserving the fidelity of the printout to the original CAD model while working with 3D printers. There are three main challenges. One is input quality, including low resolution and small image size. Discrepancies in spatial resolution and dynamic range between input and output pose another. Finally, the software-hardware interface, that is, the mechanical and material constraints of 3D printing, must be considered when building the STL model. Ultimately, we will seek to design a more dynamic multimodal haptic solution.
We would like to acknowledge Sharon Laskowski, Shaneé Dawkins, Steven Booth, Lou Ann Blake, and Patrick Leahy. Certain commercial equipment and instruments are identified in this paper. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the equipment identified is necessarily the best available for the purpose.
Andrea Bajcsy is a computer science major. She is in the Gemstone Program working on a three-year research project called Project Haptic and participated in the National Institute of Standards and Technology (NIST) 2013 SURF Program. Her research interests include image processing, computer vision, artificial intelligence, and human-computer interaction.
Mary Brady is the manager of the Information Systems Group. The group is focused on developing measurements, standards, and underlying technologies that foster innovation throughout the information life cycle from collection and analysis to sharing and preservation.