# Image Pre-processing

1. Invert image - Tesseract 3.0 only?
2. Rescale
3. Binarize
4. Remove noise
5. Dilation and erosion
6. Rotation and deskewing
7. Remove borders
8. Missing borders
9. Transparency and alpha channel

## Invert Image

```python
import cv2

image = cv2.imread('tmp/image.jpg')  # placeholder input path
inverted_image = cv2.bitwise_not(image)  # flip every pixel value
cv2.imwrite('tmp/inverted_image.jpg', inverted_image)
```

## Rescale
## Binarize

1. Convert the image to grayscale first.
2. Convert the grayscale image to black and white.
   * Adjust the threshold value; the best setting varies by image and may require testing.
## Remove Noise