Apr 28, 2003
By Pat McGrew, EDP
Welcome to the HVCO Data Management Pavilion of OutputLinks.com!
Let me toss an idea at you. Transform programs are useful and relevant to information delivery activities today.
Why am I bringing this up? Because in mid-April, on a popular forum read by many folks in our industry, a vendor posted a white paper that discussed converting print streams to data. His first statement was that print streams could not be converted to data. I have to admit that the outrageousness of the statement got my attention. He later backed down a bit and said that they "should not" be converted, but even there I found myself taking exception. You can probably guess why! Many of us have been doing exactly this for many years.
His statement left me wondering what made this vendor consider the process so horrifying, and what it means for the rest of us when a vendor spreads this type of information in a white paper to his customers and others.
In the main, his point was that parsing print data streams is almost impossible because the data within the stream has no context. He said, for example, that a program could not reliably identify an account number that didn't occur in exactly the same place in the page layout from page to page, account to account.
While identifying discrete data elements in print streams is not trivial, there are many vendors in our industry who cracked this parsing problem late in the last century. There are a variety of approaches that yield excellent results, including the use of user-defined trigger strings and pattern matching to drive the parsing routines. The vendor who took exception to the reliability of these routines didn't believe that they could handle the results of conditional processing routines which form the final print output stream, but in fact all of the vendor programs that I am aware of handle this type of print output with grace and élan.
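To make the trigger-string idea concrete, here is a minimal Python sketch of how an account number might be found even when it moves around the page. The trigger text, the account-number pattern, the function name and the sample pages are all invented for illustration; this is not any vendor's actual implementation, and real products deal with far messier streams than plain lines of text.

```python
import re

# Hypothetical user-defined trigger string and value pattern.
ACCOUNT_TRIGGER = "ACCOUNT NO:"
ACCOUNT_PATTERN = re.compile(r"\d{4}-\d{6}")


def find_account_numbers(pages):
    """Return one account number (or None) for each page of line data."""
    results = []
    for page in pages:
        found = None
        for line in page:
            # The trigger may sit on any row and at any column of the page.
            pos = line.find(ACCOUNT_TRIGGER)
            if pos == -1:
                continue
            # Pattern-match only the text that follows the trigger.
            match = ACCOUNT_PATTERN.search(line, pos + len(ACCOUNT_TRIGGER))
            if match:
                found = match.group(0)
                break
        results.append(found)
    return results


# Made-up pages: the account number lands in a different place on each one.
SAMPLE_PAGES = [
    ["STATEMENT OF ACCOUNT", "ACCOUNT NO: 1234-567890    PAGE 1"],
    ["", "Dear Customer,", "    re: your ACCOUNT NO:   9876-543210"],
    ["CONTINUED FROM PREVIOUS PAGE"],
]

ACCOUNTS = find_account_numbers(SAMPLE_PAGES)
print(ACCOUNTS)
```

Because the routine keys on the trigger rather than on a fixed row and column, the position of the account number on the page does not matter. A page with no trigger simply yields no value, which a real process would flag for review.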
Another point of concern was that the parsing routines could only be guaranteed to work on print streams they had been tested on, and that print streams are so variable that no guarantee was possible. Again, the primary vendors of transform technology will tell you that their routines have been built with loving care and with a sensitivity to the variations across older legacy print streams as well as modern, well-formed print streams. Sure, we all do regression testing when we put new processes in place. And, of course, we may uncover a bug or two that needs attention. But across our industry there are some incredibly good program designers and architects who have laid the groundwork for programs that can reliably transform 3800-style line data or 1986-era Metacode to PDF, XML or image.
What became apparent is that the vendor in question didn't trust the design, architecture and implementation of the parsing routines commonly used in our industry, and was, therefore, counseling his customers to avoid such programs and re-engineer their legacy systems. While all of us would probably agree that it would be ideal to be able to do that type of re-engineering, the reality is that requirements for new forms of output from existing print streams as well as data extraction from print streams often arrive without the luxury of budget or time to re-engineer.
Let's pick this up next time! If this is valuable, drop us a line at pm@outputlinks.com