doctr-api/notes/notes.md
2024-10-26 19:20:19 -04:00

Image Pre-processing

  1. Invert image (Tesseract 3.05 and older handle light text on a dark background; 4.x and later expect dark text on a light background)
  2. Rescale
  3. Binarize
  4. Remove noise
  5. Dilation and erosion
  6. Rotation and deskewing
  7. Remove borders
  8. Add missing borders
  9. Transparency and alpha channel

Invert Image

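import cv2
image = cv2.imread('tmp/image.jpg')  # hypothetical input path
inverted_image = cv2.bitwise_not(image)  # flip every pixel: 255 - value
cv2.imwrite('tmp/inverted_image.jpg', inverted_image)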

Rescale

Binarize

  1. Convert the image to grayscale first.
  2. Threshold it to black and white.
    • Threshold values need tuning per image; expect some trial and error.

Remove Noise