"Metadata and data quality problems in the digital library" by Jeff Beall has been published in the Journal of Digital Information, volume 6, issue 3.
In addition to the digital library literature, this short paper draws on research from diverse areas, including descriptive cataloging, information retrieval, technical communication, and perspectives on data and information quality from many contexts and types of information systems. The paper proposes three levels of data quality: absolute data quality, faithful reproduction data quality, and born digital data quality. It illustrates a typology of errors with examples, reports a small study, and describes some solutions for fixing data quality errors. Jeff recommends two further areas of research: "First, the development of a standardized method for calculating and comparing data quality among different databases would help digital library managers measure data quality and focus on data that needs remediation. Second, more research into the error rate of scanning of textual objects is needed. Research is needed to determine whether the error rates of optical character recognition are acceptable and to what extent they hinder searching and document access."
Jeff Beall is Catalog Librarian at Auraria Library, University of Colorado at Denver and Health Sciences Center. He is also a skywatcher: a picture of Mercury that he took last month was featured on one of NASA's websites.