Data Capture

1977-2000

Data Capture 1977-2000 is a big subject, and data capture has always been a big challenge. Computing is built on the GIGO principle: garbage in, garbage out. The data that goes into a computer has to be clean, accurate and formatted exactly as the computer expects it. As the volumes of data going into computers during this 23-year period grew exponentially, new techniques and technologies were needed to replace the 70-year-old punched card. This Section of the Archive traces the evolution of data capture from the Punch Room to the Desktop to DIY. The extensive Case Studies tell the stories of how many well-known organisations addressed these changes and of the systems they used.


In 1890 Dr Herman Hollerith invented the punched card, a piece of slim cardboard the size of the old US Dollar bill, to process and analyse the US Census; the 80-column format later became the standard. By cutting holes in pre-defined positions on the card to represent standard numbers and alphabetic characters, it was possible to use electro-mechanical machines to sort and collate data which could then be tabulated. For nearly 80 years thereafter the punched card was the primary method of inputting data into automated data processing equipment and, later, electronic data processing equipment, better known as computers. The work of cutting the holes was called 'punching'. To ensure accuracy, it was necessary to check that the holes had been punched in the correct positions: the punching operation was repeated and any discrepancies were identified. This was called 'verifying'.
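
The verification principle translates directly into modern terms. Below is a minimal illustrative sketch in Python (not the logic of any historical machine): it compares a second keying pass against the first and reports the card columns where they disagree, which is exactly the discrepancy check a verifier performed electro-mechanically.

```python
def verify(first_pass: str, second_pass: str) -> list[int]:
    """Compare two keying passes of the same 80-column record.

    Returns the 1-based column numbers where the passes disagree,
    i.e. the positions a verifier operator would have to re-check.
    """
    # Pad both passes to the full 80-column card width.
    a = first_pass.ljust(80)
    b = second_pass.ljust(80)
    return [col + 1 for col, (x, y) in enumerate(zip(a, b)) if x != y]

# Example: the second operator keyed '5' where the first keyed 'S'.
print(verify("INVOICE 10S4  GBP 120.50",
             "INVOICE 1054  GBP 120.50"))  # -> [11]
```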

The punches and verifiers were noisy, electro-mechanical machines and usually smelled of oil and card chads. As volumes of work grew these machines were organised into machine shops, lined up in rows. These shops were called Punch Rooms. All the work was batched and controlled. The operatives were rather like factory machine workers. Their productivity was measured in 'key depressions per hour' [kdph]. They were paid piecework. Often they worked shifts. All the operatives were women.

The Punch Room was located next door to the room containing the processing equipment, later known as the computer room. Virtually all the operators in the computer room were men. In another, more office-like room could be found the technical staff, called systems analysts and programmers. An occasional woman could be found among the technical staff. All the technical staff were salaried.

Between 1890 and the mid-1960s the punch/verifier changed little in functionality. Paper tape was introduced for some work and one company introduced a 40-column card. The first major technological change came with the key-to-tape system. This was electronic rather than electro-mechanical and worked on the principle of keying directly to magnetic tape at each station. The resulting magnetic tapes were taken to a pooler to create a single input tape for the computer. Key-to-tape worked well, but all the magnetic tape handling made it tedious, and it proved a short-lived technology. The successor technology was key-to-disk, introduced from around 1969.
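
Pooling itself was conceptually simple: each station's tape held one or more keyed batches, and the pooler copied them in sequence onto a single input tape, checking the batch control counts as it went. The Python sketch below is a loose illustration of that idea, with lists standing in for tape reels and an invented "BATCH,id,count" control header; it is not any vendor's actual format.

```python
def pool_tapes(tapes: list[list[str]]) -> list[str]:
    """Merge per-station 'tapes' into one input 'tape', checking batch counts."""
    pooled = []
    for tape in tapes:
        i = 0
        while i < len(tape):
            # Each batch starts with an invented control header: BATCH,<id>,<count>.
            tag, batch_id, count = tape[i].split(",")
            assert tag == "BATCH", f"expected a batch header, got {tape[i]!r}"
            records = tape[i + 1 : i + 1 + int(count)]
            if len(records) != int(count):
                raise ValueError(f"batch {batch_id}: record count mismatch")
            pooled.append(tape[i])
            pooled.extend(records)
            i += 1 + int(count)
    return pooled

station_1 = ["BATCH,001,2", "rec-a", "rec-b"]
station_2 = ["BATCH,002,1", "rec-c"]
print(pool_tapes([station_1, station_2]))
```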

The key-to-disk system used a powerful minicomputer to drive a large number [up to 64] of terminals. The early terminals were like electronic replicas of keypunches, but visual display units were introduced later. It was an application-specific computer system in which, for the first time, the application software and the operating system were monolithic [an approach copied years later by PCs]. The systems were quite capable of supporting 64 operators, each keying 20,000 kdph, on 64 different data capture applications. The early key-to-disk systems also transferred data to the mainframe on magnetic tape, but telecommunications transfer later became the norm. Whereas the key-to-tape systems had had only limited success in displacing key punches and verifiers, key-to-disk was hugely successful.
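
What made the keyed data clean before it ever reached the mainframe was that the application software enforced the record format at the keystroke. The sketch below is purely illustrative, with invented field names and simple type codes; it shows the kind of per-field format checking a key-to-disk application applied as each fixed-width record was keyed.

```python
# Illustrative record format: (field name, length, type), where
# 'N' = numeric only and 'A' = alphabetic only.
RECORD_FORMAT = [
    ("account_no", 8, "N"),
    ("surname", 20, "A"),
    ("amount", 7, "N"),
]

def validate_record(record: str) -> list[str]:
    """Check one fixed-width keyed record against the format; return errors."""
    errors, pos = [], 0
    for name, length, ftype in RECORD_FORMAT:
        field = record[pos:pos + length]
        if len(field) < length:
            errors.append(f"{name}: record too short")
            break
        value = field.strip()
        if ftype == "N" and not value.isdigit():
            errors.append(f"{name}: expected numeric, got {field!r}")
        elif ftype == "A" and not value.replace(" ", "").isalpha():
            errors.append(f"{name}: expected alphabetic, got {field!r}")
        pos += length
    return errors

print(validate_record("00123456SMITH               0012050"))  # -> []
print(validate_record("0012345XSMITH               0012050"))  # account_no error
```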

The key-to-disk systems were fully functional computer systems. It was not long before telecommunication facilities were added to allow the systems to network and permit remote as well as local terminals. In 1978 file processing, data management, expanded telecoms and high-level programming languages were added and the systems became known as ‘Distributed Processing’ systems. Distributed processing was the beginning of the end of the punch room and the old industrialised world of centralised data capture.

The holy grail of data capture has always been source data capture: capturing the data at the point of creation. It is more efficient and it avoids transcription errors. With the ability of the data capture system to operate remote terminals, it became possible to move the terminals out of the punch room and into the offices where the data was created. This was the tipping point. The punch room was no longer a necessity. Operatives left the punch room to work in offices, becoming computer-literate, multi-skilled, salaried clerical workers. It was not an event. It was a trend. It took 15 years to make the punch room obsolescent. Exceptionally, ROCC installed its last punch-room-style system in 2006, a conversion of an older system.

As the punch rooms went electronic, they became quieter and cleaner. The décor changed from factory to office: carpets and potted plants appeared. However, the organisational structure did not change. There was a female supervisor reporting to a male manager, and managing the computing operations was a DP Manager. I never met a female DP Manager. There was a de facto glass ceiling that was rarely breached. Interestingly, the more entrepreneurial supervisors often started their own data capture bureaux and traded successfully for many years. Even in the remodelled punch rooms, piece-work payments persisted to the end, as they did in the bureaux.

Once the terminals moved out of the punch room, the nature of the work changed quickly. With the addition of file processing capabilities, more computing work was undertaken at remote sites. The terminals were multi-functional and could be used for communicating with mainframes. Modern desktop computing had begun. In the early 1990s these terminals were replaced by PCs, or by fully multi-functional terminals on intranets.

By the late 1980s, the necessity to create a piece of paper for an audit trail and then transcribe data from that paper into machine-readable format was coming under serious challenge. The pace of change in IT during the 1980s, and the profusion of new ideas, led organisations to take a strategic approach to working with and exploiting their information assets, rather than the tactical approach of displacing one technology with another while leaving the work itself unchanged, a process called computerisation. The strategic approach became known as business process re-engineering and was a 1990s phenomenon.

Data capture ceased to be a department or a function. It became a desktop task.

Between 1970 and 1990 most data capture was undertaken by proprietary systems unique to a particular manufacturer. The manufacturer supplied all the hardware, software and services. From the 1990s the products became commodities and suppliers began to use industry-standard hardware while providing their own software and services. Eventually the suppliers became consultancies designing and implementing sophisticated systems to solve complex problems using commodity components.

Another type of data capture, scanning, saw significant evolutionary development during 1977-2000. The early scanners were extremely large, noisy, expensive paper-handling machines. There are two generic types of scanner: the scanner that merely captures an image of a page, and the scanner that optically recognises the characters on the page [OCR]. The early scanners were OCR systems which could read a limited number of stylised character fonts. By the late 1970s, handprint recognition had been introduced. It is not possible to read every character or mark on a piece of paper, because paper can be crumpled, folded or torn, and characters can be faded, poorly printed or badly presented. A repair facility is therefore required to correct the characters that cannot be properly read. By integrating a key-to-disk system with an OCR scanner it was possible to repair characters without re-handling the paper. The productivity of scanning was vastly improved and the technique became cost-effective.
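
The repair workflow is easy to express in modern terms. The following Python sketch is hypothetical (the data structures and threshold are invented, not any vendor's API): characters the recogniser cannot read with sufficient confidence are flagged and keyed by an operator viewing the scanned image on screen, so the paper is never handled again.

```python
from dataclasses import dataclass

@dataclass
class RecognisedChar:
    char: str          # recogniser's best guess, '?' if unreadable
    confidence: float  # 0.0 .. 1.0

REJECT_THRESHOLD = 0.80  # illustrative; real systems tuned rejection per font

def repair(field: list[RecognisedChar], operator_keys) -> str:
    """Replace low-confidence characters with operator-keyed corrections.

    `operator_keys` stands in for the repair terminal: it receives the
    flagged position and returns the character the operator keys while
    viewing the scanned image of the document.
    """
    out = []
    for pos, rc in enumerate(field):
        if rc.confidence < REJECT_THRESHOLD:
            out.append(operator_keys(pos))  # manual repair, no paper re-handling
        else:
            out.append(rc.char)
    return "".join(out)

# Example: the scanner rejected one digit of an amount field.
scanned = [RecognisedChar("1", 0.99), RecognisedChar("2", 0.97),
           RecognisedChar("?", 0.10), RecognisedChar("5", 0.95)]
print(repair(scanned, operator_keys=lambda pos: "0"))  # -> "1205"
```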

Scanning systems began as proprietary products and, like keyboard data capture systems, became commodities by the 1990s. Moreover, instead of being large, cumbersome pieces of machinery, scanners became desktop units that could be connected to PCs and networks. The non-OCR data capture scanning system worked by scanning a document as an image and then presenting it on a split screen, highlighting the characters to be keyed. It was simple and it worked well.

There were also specialised scanners, such as the cheque encoders that could read magnetic [MICR] characters or, later, OCR characters from the bottom of a cheque. An operator could then key the written amount, which would be encoded/printed on the cheque for subsequent processing. Another specialised scanner was the bar-code scanner. The big advantage of this recognition scanner was that the item to be scanned did not have to be carefully pre-positioned for scanning, as any trip to the supermarket will prove.
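
Part of what makes bar codes tolerant of casual positioning is that the symbology carries its own arithmetic check, so a misread is rejected rather than accepted. As a concrete illustration, the sketch below computes and verifies the widely published EAN-13 check digit that retail scanners validate on every read; the sample numbers are arbitrary.

```python
def ean13_check_digit(first_12: str) -> int:
    """Compute the EAN-13 check digit for the first 12 digits.

    Digits in odd positions (1st, 3rd, ...) are weighted 1, digits in
    even positions are weighted 3; the check digit brings the weighted
    sum up to a multiple of 10.
    """
    assert len(first_12) == 12 and first_12.isdigit()
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(first_12))
    return (10 - total % 10) % 10

def is_valid_ean13(code: str) -> bool:
    """True if a 13-digit code's final digit matches its check digit."""
    return (len(code) == 13 and code.isdigit()
            and ean13_check_digit(code[:12]) == int(code[-1]))

print(is_valid_ean13("4006381333931"))  # True  - check digit is consistent
print(is_valid_ean13("4006381333932"))  # False - a one-digit misread is caught
```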

In the 1980s there was extensive experimentation with new modes of data capture, including handprint capture, voice input and many handheld devices. Once again, many of these proprietary technologies became commoditised and were included in new generations of devices as standard features.

To complete the picture, Do It Yourself [DIY] data capture began in the 1980s and is now ubiquitous. The original idea was commercial: if data could be captured at source and the source was not part of the organisation, then the organisation had externalised the labour cost of data capture. The organisation had also succeeded in directly connecting a customer, client, agent, counter-party or similar to its own information system. This was embryonic e-commerce. The online shopping story told elsewhere in the Archive is a good example of this activity.

It is difficult to compress a 23-year history into a short summary. Of necessity the period has been broken into technological development phases. This approach distorts what actually happened, in that most of the technologies were in concurrent use at any one time: organisations developed their IT systems to meet their own needs, some were technology leaders, most were technology followers, and the working life of many systems was nearly 20 years. A second distortion can be found in the Case Studies. The ones listed are there solely because, by accident, they survived; they are broadly representative but hugely incomplete, covering between 5% and 7% of the projects that ROCC and its antecedents undertook. Another distortion is the scale of usage of the minicomputer-driven data capture systems. No firm numbers exist, but the best estimate is that by the mid-1980s there were around 3,000 installed systems in the UK across all manufacturers, with an average of upwards of 10 terminals per system. A significant proportion of the centralised production systems, referred to in the Archive as Punch Room, operated extended or multiple shifts; bureaux often ran 3 shifts, or 24/7 as it is now called. Thus, over 30,000 people worked on these systems each day.

ROCC and its antecedents had at least two competencies: very large systems and very sophisticated systems. Innovation was a necessity, not a by-product. As a result, the solutions found to problems often gave the systems an eclectic character. The only item never connected to a system was a coffee machine, although that might have been the most useful.