- Summary from the Description of Work (17 Jan 2013)
- Executive Summary
- Introduction
-
Section 1: Review of development of tools and workflows which incorporate automatic or semiautomatic metadata capture using OCR
- Introduction
-
Trial 1: Comparing a range of OCR software tools
- Materials and Methods
- Results
- Discussion
-
Trial 2: Comparing OCR tools being used in herbaria
- Materials and Methods
- Results
- Discussion
-
Trial 3: Multiple OCR trials of diverse specimens
- Materials and Methods
- Results
- Workflows: Incorporating OCR into digitisation workflows
- Discussion
-
Section 2: Review of development of NLP for parsing OCR text into Darwin Core fields
- Introduction
- Review
-
Section 3: Review of (semi) automatic specimen image classification, i.e. (semi) automatic tagging of specimen images from certain collectors or expeditions, using template matching software
- Part 1: Semi-automated Classification of Herbarium Specimens by means of Template Matching Algorithms
- Part 2: Review and trials of Handwritten Text Recognition (HTR)
- Introduction
- Materials and Methods
- Results
- Discussion
-
Section 4: Review of automatic capture of characters, including colour and shape, as well as EXIF data
-
Part 1: Computer vision for specimen classification
- Summary
- Tools Used
- Software Prototypes
- Specimen segmentation
- Method
- Morphological feature detection
- Calculating physical dimensions
- Colour analysis
- Heat maps for regions of interest
- Dissemination
- Links
- References
-
Part 2: Correlation of leaf colour and DNA quality
- Introduction
- Materials and Methods
- Results
-
References
- Software and Projects
-
Part 1: Computer vision for specimen classification
- Appendix 1A: Settings for ABBYY Recognition Server v3 at RBGE
- Appendix 1B: Trial 2 - Summary of OCR output for one specimen from each institute
- Appendix 1C: Settings for ABBYY FineReader v12 Professional at RBGK
- Appendix 1D: File preparation at RBGK
- Appendix 1E: Scores for each specimen from each institute by word
- Appendix 1F: OCR Software Results from RBGK testing of different formatting options
- Appendix 2: Screenshots of portals using
-
Appendix 3: Protocol for using Transkribus for natural history collections
-
Introduction
- Step 1: Register and download software
- Step 2: Log in
- Step 3: Upload documents to your private collection
- Step 4: Segment your document into text blocks and baselines
- Step 5: Manually transcribe a training dataset of 100 pages
- Step 6: Training the HTR model
- Step 7: Running the HTR model
-
Introduction
-
Appendix 4: Protocols for sampling and extracting DNA from herbarium specimens at RBGE
- DNA Extraction Methodology: using the QIAGEN automated QIAxtractor
Project element:
Report file(s):
Author(s):