The many different file formats that exist may benefit software companies, but they can confuse computer users interested only in data. For vendors, a distinct format may enable specific application features or help prevent customers from switching to a competitor's product. Many formats store nearly identical content, are proprietary or closed, and sometimes even change over time. This is exemplified by 3D file formats.1 We have documented more than 140 of them and believe there are many more. As a result, file transformations are a necessary part of everyday life, whether between *.doc and *.pdf, *.bmp and *.jpg, *.aac and *.mp3, *.mov and *.mpg, *.3ds and *.dae, or *.dwg and *.stp. Building a conversion system is hard enough without the added complexities of closed file formats, the release of new ones, and the creation of new domains such as 3D video. Our work attempts to address these problems and take steps toward providing a universal file-format conversion system.
Figure 1. left: The front end of the NCSA Polyglot Web interface. Users can drag and drop files into the large area in the middle, select a target file format from the list, and click the upload button to automatically convert the files. right: After conversion, the files can be downloaded in the bottom area.
Figure 2. left: The I/O-graph can be inspected through the Web interface. right: A universal converter can also be made into a universal viewer. This is an example in the 3D domain. The user has uploaded a *.dae and *.stp file. Both are automatically converted to *.obj and displayed below.
Within each domain, numerous conversion programs attempt to transform between some subset of formats. In the image domain, the popular ImageMagick3 utility converts between a large number of image types. Over the past few years, there has been a rise in Web-based format-conversion services. Some are fee-based, with staff performing the conversion and verifying results. Others are free and automated, but not verified. Examples of services are Zamzar,4 Media Convert,5 and Youconvertit.6 All of these programs and services rely on implementations of file loader and exporter programs to perform the conversion and on humans to assess the quality. What is not widely understood is that the loader implementations are as important as the file format they load. For instance, implementations may only partially support the file-format specifications (some can be more than 1,000 pages long), or they may incorrectly interpret the specification or approximate a closed format based on reverse engineering. In addition, many conversions introduce differences because they require a change of content representation, such as moving from a boundary representation to faceted surfaces within the 3D domain. Given these issues, one must ask whether there is a means of converting from format A to format B, what the quality will be, and what it will cost to implement a quality loader.
Our approach assumes that software vendors with closed formats also have the best loader implementations for those formats, because they created them. Rather than re-implementing what is already available, we propose to simply use it. First, we document existing software in terms of its load/save and import/export operations. That includes open and closed software, as well as focused conversion utilities and user-centric graphical applications. Second, we unite all supported formats and enable searching for conversion paths. Third, we automate conversion execution based on the selected path. Finally, we automatically assess the quality according to user-driven criteria. Our framework is called National Center for Supercomputing Applications (NCSA) Polyglot.7
The heart of NCSA Polyglot is a data structure we call an input/output-graph (I/O-graph). This structure stores as vertices each of the formats supported by uniting a set of programs. Edges within this graph indicate a conversion path between formats through one of the programs. To find a set of conversions between a format A and B, we search for a path between the two respective vertices.
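The I/O-graph idea can be illustrated with a minimal Python sketch. This is not the Polyglot implementation: the class name `IOGraph`, its methods, and the program names `ViewerA` and `ModelerB` are hypothetical, and breadth-first search is simply one straightforward way to find a path with the fewest conversion steps.

```python
from collections import defaultdict, deque


class IOGraph:
    """Formats are vertices; each edge is a conversion done by one program."""

    def __init__(self):
        # source format -> list of (target format, program name)
        self.edges = defaultdict(list)

    def add_program(self, name, inputs, outputs):
        """Add one edge for every (input, output) pair the program supports."""
        for src in inputs:
            for dst in outputs:
                if src != dst:
                    self.edges[src].append((dst, name))

    def find_path(self, source, target):
        """Breadth-first search. Returns [(program, src, dst), ...] or None."""
        if source == target:
            return []
        previous = {source: None}
        queue = deque([source])
        while queue:
            fmt = queue.popleft()
            for nxt, program in self.edges[fmt]:
                if nxt in previous:
                    continue  # already reached by a path that is no longer
                previous[nxt] = (fmt, program)
                if nxt == target:
                    return self._unwind(previous, target)
                queue.append(nxt)
        return None

    @staticmethod
    def _unwind(previous, target):
        """Walk back from the target to the source and reverse the steps."""
        steps = []
        node = target
        while previous[node] is not None:
            src, program = previous[node]
            steps.append((program, src, node))
            node = src
        steps.reverse()
        return steps


graph = IOGraph()
graph.add_program("ViewerA", inputs=["stp", "igs"], outputs=["obj"])
graph.add_program("ModelerB", inputs=["obj", "3ds"], outputs=["dae", "obj"])
for program, src, dst in graph.find_path("stp", "dae"):
    print(f"{program}: {src} -> {dst}")
```

In this toy graph neither program converts *.stp to *.dae directly, so the search chains the two through *.obj.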
To automate the transformation process, we use AutoHotKey,8 a language that can script both command-line and graphical user interface (GUI) programs. (Many of the high-end applications that interest us are GUI based.) By imposing several standards on the scripts, we are able to use them in an automated manner. Specifically, we impose a naming convention that indicates the operation the script supports, such as ‘open save,’ and a required header comment specifying the supported input/output formats. From these scripts we can obtain all information required to build an I/O-graph and execute paths through it.
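To make the role of these conventions concrete, the following Python sketch extracts graph edges from script names and header comments. The exact conventions are not given here, so everything specific is an assumption for illustration: file names of the form `<application>_<operation>.ahk`, and header lines of the form `; inputs: ...` and `; outputs: ...` (`;` is the AutoHotkey comment character). The functions `parse_script` and `collect_edges` are hypothetical names.

```python
import re

# Hypothetical header layout: "; inputs: stp, igs" and "; outputs: obj".
HEADER_LINE = re.compile(r"^;\s*(inputs|outputs)\s*:\s*(.*)$", re.IGNORECASE)


def parse_script(filename, text):
    """Return (application, operation, inputs, outputs) for one script."""
    stem = filename.rsplit(".", 1)[0]
    application, _, operation = stem.rpartition("_")
    formats = {"inputs": [], "outputs": []}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(";"):
            break  # the header ends at the first non-comment line
        match = HEADER_LINE.match(line)
        if match:
            key = match.group(1).lower()
            formats[key] = [
                f.strip().lower() for f in match.group(2).split(",") if f.strip()
            ]
    return application, operation, formats["inputs"], formats["outputs"]


def collect_edges(scripts):
    """scripts maps filename -> script text. Returns sorted (src, dst, app)."""
    loads, saves = {}, {}
    for filename, text in scripts.items():
        app, _operation, inputs, outputs = parse_script(filename, text)
        loads.setdefault(app, set()).update(inputs)
        saves.setdefault(app, set()).update(outputs)
    # An application that can load src and save dst provides the edge src -> dst.
    return sorted(
        (src, dst, app)
        for app in loads
        for src in loads[app]
        for dst in saves[app]
        if src != dst
    )
```

Under these assumptions, a pair of scripts such as `ModelerB_open.ahk` and `ModelerB_save.ahk` contributes one edge for each format the application can open paired with each format it can save.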
Our system, composed of many third-party applications, will likely have several ways to convert between formats. Clearly, some will be better than others. Given a reasonably sized data set and the ability to directly load one format within each domain, we attempt to assign ‘quality’ weights to each conversion path indicating the amount of information retained through the transformation process. The system then uses these weights to choose a conversion path with the least likely information loss.
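One way such weights could drive path selection is sketched below in Python. How weights combine along a path is not specified above, so this sketch assumes each conversion has a retention score in (0, 1] and that the retention of a path is the product of its steps. Under that assumption, maximizing retention is equivalent to minimizing the sum of negative logarithms, which Dijkstra's algorithm handles. The function name `best_path` and the edge tuple layout are hypothetical.

```python
import heapq
import math


def best_path(edges, source, target):
    """Find the conversion path that retains the most information.

    edges: iterable of (src, dst, program, retained) with 0 < retained <= 1.
    Returns (overall_retained, [(program, src, dst), ...]) or None.
    """
    graph = {}
    for src, dst, program, retained in edges:
        # -log turns a product of retention scores into a sum of losses.
        graph.setdefault(src, []).append((dst, program, -math.log(retained)))

    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        cost, fmt = heapq.heappop(heap)
        if fmt == target:
            break
        if cost > best.get(fmt, math.inf):
            continue  # stale queue entry
        for dst, program, loss in graph.get(fmt, []):
            new_cost = cost + loss
            if new_cost < best.get(dst, math.inf):
                best[dst] = new_cost
                previous[dst] = (fmt, program)
                heapq.heappush(heap, (new_cost, dst))

    if target not in best:
        return None
    steps, node = [], target
    while node != source:
        src, program = previous[node]
        steps.append((program, src, node))
        node = src
    steps.reverse()
    return math.exp(-best[target]), steps
```

With this scoring, a two-step path whose steps each retain 0.9 of the information (0.81 overall) would be preferred over a direct conversion that retains only 0.5.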
A Web interface hides the scripted applications from users (see Figures 1 and 2). In this setup, a Polyglot daemon runs in the background on a Web server and monitors an upload folder for conversion tasks. Multiple daemons can be run on separate computers to monitor the same shared upload folder to scale performance. For further flexibility, we provide a simple application programming interface to allow Java programs to make conversion requests of the Polyglot server.
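The shared-folder arrangement can be sketched in Python as follows. How Polyglot daemons actually coordinate is not described here, so this sketch makes an assumption: each daemon claims a pending file by renaming it into its own working directory, so that a file is handled by only one daemon. The names `claim_next` and `process_pending` are hypothetical, and rename semantics on network file systems may differ from local ones.

```python
import os


def claim_next(upload_dir, claimed_dir):
    """Try to claim one pending file. Returns its new path, or None."""
    for name in sorted(os.listdir(upload_dir)):
        src = os.path.join(upload_dir, name)
        if not os.path.isfile(src):
            continue
        dst = os.path.join(claimed_dir, name)
        try:
            os.rename(src, dst)  # fails if another daemon moved the file first
        except OSError:
            continue  # claimed elsewhere or removed, so try the next file
        return dst
    return None


def process_pending(upload_dir, claimed_dir, convert):
    """Claim and convert every pending file once.

    A daemon would call this repeatedly, sleeping between calls.
    convert is any callable that takes the claimed file's path.
    """
    results = []
    while True:
        path = claim_next(upload_dir, claimed_dir)
        if path is None:
            return results
        results.append(convert(path))
```

Adding capacity then amounts to starting another daemon, with its own working directory, that points at the same upload folder.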
To summarize, file-format conversion is an inevitable part of modern cyber life. Rather than performing the tedious, if even possible, task of implementing loaders and converters for each file format, we developed NCSA Polyglot using existing software. The technology can assist in accomplishing high-quality conversions in an extensible and computationally scalable framework. We are currently working on adding new quality measures and providing an interface so users can select the one that captures the most relevant information.
This research was supported by a National Archives and Records Administration (NARA) supplement to the National Science Foundation (NSF) Partnerships for Advanced Computational Infrastructure cooperative agreement CA #SCI-9619019. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NSF, NARA, or the US government.
Kenton McHenry, Peter Bajcsy
National Center for Supercomputing Applications (NCSA)
Kenton McHenry is a research programmer working on problems of 3D content creation, conversion, and preservation. His research interests include computer vision, pattern recognition, and automation.
Peter Bajcsy is a research scientist working on problems related to automatic transfer of image content to knowledge. His interests include image processing, novel sensor technology, and computer and machine vision.