Automatic metadata capture Task 4.1.2


Develop software that automatically identifies properties of an image. These data “facets” will be captured
without human intervention and will provide categories of information that allow users to search and browse
virtual collections more effectively.

Specimen label data will be subjected to Optical Character Recognition (OCR) software to extract the text string,
and methods to improve the accuracy of OCR on handwritten labels will be researched. OCR-extracted text
from handwritten labels will need further processing and validation, for example via crowdsourcing
methodologies (objective 2).
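
As an illustration only, the hand-off from OCR to validation could be sketched as follows. The record fields, the
confidence score, the 0.9 threshold and the function name are all assumptions made for this sketch and are not part
of the task definition. The only rule taken from the text above is that text from handwritten labels always goes
on to validation.

```python
from dataclasses import dataclass


@dataclass
class OcrResult:
    label_id: str       # hypothetical specimen label identifier
    text: str           # text string extracted by OCR
    confidence: float   # assumed OCR confidence score in [0, 1]
    handwritten: bool   # assumed flag set by an upstream step or a curator


def route_for_validation(results, min_confidence=0.9):
    """Split OCR results into accepted text and a validation queue."""
    accepted, review_queue = [], []
    for r in results:
        # Handwritten labels always go to validation. Sending low-confidence
        # printed labels there as well is an assumption of this sketch.
        if r.handwritten or r.confidence < min_confidence:
            review_queue.append(r)
        else:
            accepted.append(r)
    return accepted, review_queue
```

The review queue would then feed whichever crowdsourcing workflow is adopted under objective 2.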

