doctr-api/notes/notes.md
2024-10-26 19:20:19 -04:00

# Image Pre-processing
1. Invert image - Tesseract 3.0 only?
2. Rescale
3. Binarize
4. Remove noise
5. Dilation and erosion
6. Rotation and deskewing
7. Remove borders
8. Missing borders
9. Transparency and alpha channel
## Invert Image
```python
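import cv2  # OpenCV; `image` is assumed to be an array already loaded, e.g. with cv2.imread
inverted_image = cv2.bitwise_not(image)  # flip every pixel value (255 - value for 8-bit images)
cv2.imwrite('tmp/inverted_image.jpg', inverted_image)  # the tmp/ directory must already exist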
```
## Rescale
## Binarize
1. Convert the image to grayscale first.
2. Convert to black and white with a threshold.
   * Threshold values need adjusting per image, so expect some testing.
## Remove Noise