Sometimes you need to set up an OCR microservice that accepts images and returns text. In this article, I will explain the basic ideas behind creating your own OCR service for free, using Python, FastAPI, Tesseract, Redis, Celery, and Docker.
A microservice architecture is one variation of server architecture: it is built from small, independent services (web apps) that interact with each other over protocols such as SOAP, REST, GraphQL, or RPC. In our system, the microservices will communicate in the REST architectural style through one main node (fastAPI_service), but keep in mind that this is not necessarily the best way to do it.
The FastAPI service will be the entry point to our OCR service; all communication with clients goes through it. It is built with Python + FastAPI + uvicorn and exposes three endpoints: POST /api/v1/ocr/img, GET /api/v1/ocr/status/{task_id}, and GET /api/v1/ocr/text/{task_id}.
We will use the POST /api/v1/ocr/img endpoint to upload images; it also starts a task and returns the task id to us. Using GET /api/v1/ocr/status/{task_id} we will get the status of that task (OCR is a fairly "heavy" process and takes some time to execute), and after receiving the success status we will call the GET /api/v1/ocr/text/{task_id} endpoint to see the final result.
Each service will live in its own folder. I use VS Code to write the code and the pipenv package to create virtual environments. For local testing outside Docker, I will run the services in those virtual environments.
Let’s look at main.py.
from fastapi import FastAPI

from app.routers import ocr

app = FastAPI()
app.include_router(ocr.router)


@app.get("/")
async def root():
    return {"message": "Hello gzht888.com!"}
Here we define only one root GET endpoint, which returns a simple JSON message. We also include the OCR router, which holds the list of endpoints that belong to OCR. It is good practice not to put all your eggs in one basket by cramming every endpoint into one file, because that file becomes overloaded and hard to understand. Try to divide your code into small, logically independent pieces and connect them with a few short lines in one main file. Let’s have a look at the OCR router that we included in the main FastAPI app.
from fastapi import APIRouter

from model import ImageBase64

router = APIRouter(
    prefix="/ocr",
    tags=["ocr"],
)


@router.get("/status")
async def get_status():
    return {"message": "ok"}


@router.get("/text")
async def get_text():
    return {"message": "ok"}


@router.post("/img")
async def create_item(img: ImageBase64):
    return {"message": "ok"}
In the ocr.py file we have a router that contains three endpoints: GET /ocr/status, GET /ocr/text, and POST /ocr/img. We also import the data model for the POST endpoint; a minimal sketch of that model is shown below. For now, the endpoints only contain some simple placeholder logic so we can test them.
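The ImageBase64 model lives in model.py, which is not shown here. A minimal sketch of it, assuming a single base64 string field named img_body_base64 (the same key that tasks.py sends to the other services later in this article), could look like this:

from pydantic import BaseModel


class ImageBase64(BaseModel):
    # The uploaded image, encoded as a base64 string.
    img_body_base64: str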
We can start our service from cmd.exe:

uvicorn app.main:app --reload

uvicorn is an ASGI web server implementation for Python; this command runs the app defined in main.py. The --reload flag means that if we change the code in our files, uvicorn will automatically restart with the new code.
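If you prefer starting the server from Python instead of the command line, uvicorn also provides a run() helper. A minimal equivalent, assuming the same app.main:app module path, would be:

# run.py - programmatic equivalent of "uvicorn app.main:app --reload"
import uvicorn

if __name__ == "__main__":
    uvicorn.run("app.main:app", host="127.0.0.1", port=8000, reload=True)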
For testing our endpoints, we will use the Thunder Client extension in VS Code. First, we will check the GET http://127.0.0.1:8000 and GET http://127.0.0.1:8000/api/v1/ocr/status endpoints.
Both are working fine; next we need to write some real logic. We will receive the image as a base64 string and return a generated task_id. Using this task_id we will call GET /ocr/status and receive the OCR processing status; there will be three kinds of status: pending, success, and error. After receiving the success status, we will fetch the text from the GET /ocr/text endpoint, using our task_id.
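To make the whole flow concrete, here is a rough client-side sketch using the requests package. The field names task_id, task_status, and text, and the port 8001 (which we will expose later via docker-compose), are my assumptions based on the rest of this article, not guaranteed to match the repository exactly:

import base64
import time

import requests

BASE_URL = "http://localhost:8001/api/v1/ocr"

# 1. Encode a local image as a base64 string.
with open("sample.png", "rb") as f:
    img_base64 = base64.b64encode(f.read()).decode()

# 2. Start the OCR task and remember its id.
task = requests.post(f"{BASE_URL}/img", json={"img_body_base64": img_base64}).json()
task_id = task["task_id"]

# 3. Poll the status endpoint until the task leaves the PENDING state.
while True:
    status = requests.get(f"{BASE_URL}/status/{task_id}").json()["task_status"]
    if status != "PENDING":
        break
    time.sleep(1)

# 4. Fetch the recognized text once the task has succeeded.
if status == "SUCCESS":
    print(requests.get(f"{BASE_URL}/text/{task_id}").json()["text"])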
To store and monitor task status, we will use the redis package, and to execute tasks in parallel we will use the celery package. Below is a rough sketch of how the router will hand work off to Celery; after that, let's install Docker and write some code.
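This is only my sketch of how the OCR router can enqueue work in Celery and report task status. The create_task name matches the task we will define in tasks.py, but the AsyncResult-based status and text endpoints, the import paths, and the assumption that the Celery app is configured with a Redis result backend are illustrative rather than the exact repository code:

from celery.result import AsyncResult
from fastapi import APIRouter

from model import ImageBase64
from tasks import app as celery_app, create_task

router = APIRouter(
    prefix="/ocr",
    tags=["ocr"],
)


@router.post("/img")
async def create_ocr_task(img: ImageBase64):
    # Hand the base64 payload to the Celery worker and return the task id.
    task = create_task.delay(img.img_body_base64)
    return {"task_id": task.id}


@router.get("/status/{task_id}")
async def get_status(task_id: str):
    result = AsyncResult(task_id, app=celery_app)
    return {"task_id": task_id, "task_status": result.status}


@router.get("/text/{task_id}")
async def get_text(task_id: str):
    result = AsyncResult(task_id, app=celery_app)
    return {"task_id": task_id, "text": result.result if result.successful() else None}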
Go to https://www.docker.com/ and install Docker Desktop. After installation, if you are on Windows, you also need to install WSL and get a Linux system image. Detailed instructions are available at https://docs.docker.com/desktop/install/windows-install/.
Now we need to create a Dockerfile that contains all the Docker commands required to run our FastAPI service inside a Docker environment.
FROM python:3.11
WORKDIR /app
RUN apt-get update && apt-get install -y && apt-get clean
RUN pip install --upgrade pip
COPY ./requirements.txt .
RUN pip install -r requirements.txt && rm -rf /root/.cache/pip
COPY . .
In short, we start from a Python 3.11 image, create a working directory named "app", copy requirements.txt into it, and install all the packages listed in that file. We then remove pip's cached data and copy all of our files into the working directory inside the container. With that, our first entry-point container is ready. Now we need to create one more file, docker-compose.yml. Briefly, it is a simple file that describes how to build, deploy, and run all the containers together with a single command.
version: '3.8'

services:
  web:
    build: ./fastapi_service
    ports:
      - 8001:8000
    command: uvicorn app.main:app --host 0.0.0.0 --reload
We are ready to containerize our first FastAPI service. To do this, we use docker-compose, which creates and starts all the containers. Run cmd.exe inside the folder that contains the docker-compose.yml file, execute the docker-compose up --build command, and have a look.
Now our FastAPI service is running inside Docker. Note that we have mapped it to port 8001 via the docker-compose file.
First, clone https://github.com/abizovnuralem/ocr. You will see the following project structure.
I have separated each app into its own folder with its own instance of celery and redis; I decided to use three instances of redis and celery to keep each microservice independent. Each app has its own Dockerfile, where we set up the base system and install all the required packages from requirements.txt. Each app also contains a main.py file as its entry point and a tasks.py file where the celery tasks are executed. The routers folder contains the endpoints that let the containers communicate with each other over REST. As an illustration, a preprocessing task inside img_prepro might look roughly like the sketch below.
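This sketch assumes Pillow is used for a simple grayscale-and-threshold binarization and that a CELERY_RESULT_BACKEND environment variable points at the service's own Redis instance; the names here are my assumptions, not the exact repository code:

import base64
import io
import os

from celery import Celery
from PIL import Image

app = Celery(
    "tasks",
    broker=os.environ.get("CELERY_BROKER_URL"),
    backend=os.environ.get("CELERY_RESULT_BACKEND"),
)


@app.task(name="preprocess_image")
def preprocess_image(img_base64: str) -> str:
    # Decode the base64 payload into a Pillow image and convert it to grayscale.
    raw = base64.b64decode(img_base64)
    image = Image.open(io.BytesIO(raw)).convert("L")
    # Simple fixed-threshold binarization to make tesseract's job easier.
    image = image.point(lambda p: 255 if p > 150 else 0)
    # Re-encode the result back to base64 for the next service.
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode()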
The main logic of the whole system lives in the fastapi_service/app/tasks.py file. It orchestrates the entire process: it receives images, triggers the preprocessing step, and starts the tesseract recognition process.
import os
import time
import json

import requests

from celery import Celery

from routers.ocr.model import PreProsImgResponse

app = Celery('tasks', broker=os.environ.get("CELERY_BROKER_URL"))


def check_until_done(url):
    """Poll the given status URL until the task succeeds, fails, or times out."""
    attempts = 0
    while True:
        response = requests.get(url)
        if response.status_code == 200 and response.json()['task_status'] == "PENDING" and attempts < 60:
            time.sleep(1)
            attempts += 1
        elif response.status_code == 200 and response.json()['task_status'] == "SUCCESS":
            return True
        else:
            return False


def convert_img_to_bin(img):
    """Send the image to the img_prepro service and return the preprocessed image."""
    response = requests.post(url="http://img_prepro:8000/api/v1/img_prep/img", json={"img_body_base64": img})
    task = response.json()
    if check_until_done("http://img_prepro:8000/api/v1/img_prep/status" + f"/{task['task_id']}"):
        url = "http://img_prepro:8000/api/v1/img_prep/img" + f"/{task['task_id']}"
        response = requests.get(url)
        return response.json()['img']
    raise Exception("Sorry, something went wrong")


def get_ocr_text(img):
    """Send the preprocessed image to the tesseract service and return the recognized text."""
    response = requests.post(url="http://tesseract:8000/api/v1/tesseract/img", json={"img_body_base64": img})
    task = response.json()
    if check_until_done("http://tesseract:8000/api/v1/tesseract/status" + f"/{task['task_id']}"):
        url = "http://tesseract:8000/api/v1/tesseract/text" + f"/{task['task_id']}"
        response = requests.get(url)
        return response.json()['text']
    raise Exception("Sorry, something went wrong")


@app.task(name="create_task")
def create_task(img: str):
    try:
        bin_img = convert_img_to_bin(img)
        text = get_ocr_text(bin_img)
        return text
    except Exception as e:
        print(e)
        return {"text": "error"}
bin_img = convert_img_to_bin(img)
text = get_ocr_text(bin_img)
These two calls do the heavy lifting. The first sends the image, via REST, to the img_prepro microservice, which performs preprocessing that helps tesseract recognize the text more accurately and faster. The second starts the tesseract engine inside tesseract_service, again via REST, and returns the final result.
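The tesseract_service side can be just as small. Here is a sketch of its recognition task, assuming the pytesseract package wraps the tesseract binary installed in that container; again, these are illustrative names rather than the exact repository code:

import base64
import io
import os

import pytesseract
from celery import Celery
from PIL import Image

app = Celery(
    "tasks",
    broker=os.environ.get("CELERY_BROKER_URL"),
    backend=os.environ.get("CELERY_RESULT_BACKEND"),
)


@app.task(name="recognize_text")
def recognize_text(img_base64: str) -> str:
    # Decode the (already preprocessed) image and run tesseract over it.
    raw = base64.b64decode(img_base64)
    image = Image.open(io.BytesIO(raw))
    return pytesseract.image_to_string(image)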
Now let's test the whole pipeline. First, we need to convert a test image to a base64 string. We can use https://codebeautify.org/image-to-base64-converter to get the image string, and then, with the help of the POST http://localhost:8001/api/v1/ocr/img endpoint, we will get the task_id.
Using GET http://localhost:8001/api/v1/ocr/status/2591ec33-11d2-4dec-8cf4-cea15e05517e we monitor the task execution status, and after receiving the SUCCESS status we get the text from GET http://localhost:8001/api/v1/ocr/text/2591ec33-11d2-4dec-8cf4-cea15e05517e.