Innovative Software Technology-Revolutionize Text Extraction in HarmonyOS with OCR and Regex

Optical Character Recognition (OCR) converts images of text into machine-readable text. For many applications, however, raw text isn't enough: an app that scans an ID card needs specific fields such as the ID number, name, or date of birth. In a HarmonyOS app, regular expressions (regex) let you pull exactly those fields out of the OCR output.

This guide walks you through building a HarmonyOS app that:

Captures photos with CameraKit.
Decodes the captured images into pixel maps with ImageKit.
Performs OCR with CoreVisionKit.
Extracts specific fields from the recognized text with regex.

Let's get started.

Prerequisites

Before you begin, ensure you have:

DevEco Studio installed.
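A HarmonyOS project that imports the following kits:
- CameraKit
- CoreVisionKit
- ImageKit
- AbilityKit
- BasicServicesKit (for the BusinessError type)
The ohos.permission.CAMERA permission declared in module.json5 and granted by the user at runtime.
A working knowledge of TypeScript (HarmonyOS apps are written in ArkTS, which extends TypeScript).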

Step 1: Setting Up the Camera

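CameraKit models the camera as a camera manager, a camera input, preview and photo outputs, and a capture session that ties them together. The method below opens the first camera the device reports, attaches a preview output and a photo output, and registers a callback that runs OCR on every captured photo: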

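import { camera } from '@kit.CameraKit';
import { image } from '@kit.ImageKit';
import { common } from '@kit.AbilityKit';
import { BusinessError } from '@kit.BasicServicesKit';

// Method of the page component. getCameraDevices, getCameraInput, getPreviewOutput,
// getPhotoOutput, getCaptureSession, beginConfig and startSession are helper methods
// of the same class (not shown) that wrap the corresponding CameraKit calls.
async initCamera(surfaceId: string): Promise<void> {
  this.cameraMgr = camera.getCameraManager(getContext(this) as common.UIAbilityContext);
  let cameraArray = this.getCameraDevices(this.cameraMgr);
  this.cameraDevice = cameraArray[0]; // First reported device; usually, but not always, the back camera
  this.cameraInput = this.getCameraInput(this.cameraDevice, this.cameraMgr)!;
  await this.cameraInput.open();

  this.capability = this.cameraMgr.getSupportedOutputCapability(this.cameraDevice, camera.SceneMode.NORMAL_PHOTO);
  this.previewOutput = this.getPreviewOutput(this.cameraMgr, this.capability, surfaceId)!;
  this.photoOutput = this.getPhotoOutput(this.cameraMgr, this.capability)!;

  // Run OCR on every photo delivered by the photo output
  this.photoOutput.on('photoAvailable', (err: BusinessError, photo: camera.Photo) => {
    if (err || photo === undefined) {
      console.error(`photoAvailable failed: ${err?.code}`);
      return;
    }
    photo.main.getComponent(image.ComponentType.JPEG, async (compErr: BusinessError, component: image.Component) => {
      try {
        if (compErr || component === undefined) {
          console.error(`getComponent failed: ${compErr?.code}`);
          return;
        }
        const buffer = component.byteBuffer; // JPEG-encoded photo bytes
        this.idCardResult = await this.recognizeImage(buffer);
        this.result = JSON.stringify(this.idCardResult);
      } finally {
        photo.release(); // Free the native image once OCR has finished
      }
    });
  });

  // Configure the session with the input and both outputs, then start it (via the helpers)
  this.captureSession = this.getCaptureSession(this.cameraMgr)!;
  this.beginConfig(this.captureSession);
  this.startSession(this.captureSession, this.cameraInput, this.previewOutput, this.photoOutput);
}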

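// To capture an image (the photo arrives in the photoAvailable callback above):
async takePicture(): Promise<void> {
  // capture() returns a Promise, so await it to surface capture errors
  await this.photoOutput?.capture();
}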
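The getPhotoOutput helper is not shown above. One reasonable choice, assuming a higher resolution helps recognize the small print on an ID card, is to take the photo profile with the most pixels from capability.photoProfiles and pass it to cameraMgr.createPhotoOutput(). In the sketch below, SizeLike, ProfileLike and pickLargestPhotoProfile are hypothetical stand-ins that mirror only the size field of camera.Profile, so the selection logic can be tried outside a device:

```typescript
// Hypothetical stand-ins mirroring only the size field of CameraKit's
// camera.Profile; in an ArkTS project you would use camera.Profile directly.
interface SizeLike {
  width: number;
  height: number;
}

interface ProfileLike {
  size: SizeLike;
}

// Return the profile with the largest pixel count, or undefined when the
// device reports no photo profiles. On a tie the earlier profile is kept.
function pickLargestPhotoProfile(profiles: ProfileLike[]): ProfileLike | undefined {
  let best: ProfileLike | undefined = undefined;
  for (const p of profiles) {
    const area = p.size.width * p.size.height;
    if (best === undefined || area > best.size.width * best.size.height) {
      best = p;
    }
  }
  return best;
}
```

Larger photos also take longer to decode and recognize, so if latency matters you may prefer a mid-sized profile instead.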

Step 2: Performing OCR with CoreVisionKit

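Once a photo is captured, its JPEG bytes are decoded into a PixelMap with ImageKit and passed to textRecognition.recognizeText() from CoreVisionKit. The call resolves to a result whose value property holds all of the recognized text as a single string: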

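import { textRecognition } from '@kit.CoreVisionKit';
import { image } from '@kit.ImageKit';

async recognizeImage(buffer: ArrayBuffer): Promise<IDCardData> {
  // Decode the JPEG bytes into a PixelMap that CoreVisionKit can read
  const imageResource = image.createImageSource(buffer);
  const pixelMapInstance = await imageResource.createPixelMap();

  const visionInfo: textRecognition.VisionInfo = { pixelMap: pixelMapInstance };
  // Enable text direction detection, e.g. for a card photographed at an angle
  const textConfig: textRecognition.TextRecognitionConfiguration = { isDirectionDetectionSupported: true };

  let recognitionString = '';
  try {
    // Text recognition is only available on devices with this system capability
    if (canIUse('SystemCapability.AI.OCR.TextRecognition')) {
      const result = await textRecognition.recognizeText(visionInfo, textConfig);
      recognitionString = result.value;
    }
  } finally {
    // Release native resources even when recognition fails or is unsupported
    pixelMapInstance.release();
    imageResource.release();
  }

  return extractDataWithRegex(recognitionString);
}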
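Recognition is not always perfect: on small or worn print, OCR can return the letter O for 0 or I or l for 1, and a digits-only pattern applied to that text will then miss the ID number. The sketch below is one hedged way to tolerate such misreads for the ID number only. DIGIT_LOOKALIKES, normalizeDigits and findTcknLoose are hypothetical names, not part of CoreVisionKit, and which confusions are worth mapping depends on your own scans:

```typescript
// Hypothetical table of characters OCR may confuse with digits.
// Tune it against real scans; over-eager mappings can hide genuine errors.
const DIGIT_LOOKALIKES: Record<string, string> = {
  O: '0', o: '0',
  I: '1', l: '1', '|': '1',
  S: '5', B: '8',
};

// Replace look-alike characters with digits; leave everything else untouched.
function normalizeDigits(raw: string): string {
  return raw.split('').map((ch) => DIGIT_LOOKALIKES[ch] ?? ch).join('');
}

// Capture 11 digits or look-alikes after the ID-number label, normalize them,
// and accept the value only if it is now 11 digits with a non-zero first digit.
// (The i flag also admits lowercase variants; unmapped ones fail the final check.)
function findTcknLoose(text: string): string | undefined {
  const m = text.match(/(?:T\.?\s*C\.?\s*Kimlik\s*No|TR\s*identity\s*No)[\s:]*([0-9OoIl|SB]{11})/i);
  if (!m) {
    return undefined;
  }
  const digits = normalizeDigits(m[1]);
  return /^[1-9]\d{10}$/.test(digits) ? digits : undefined;
}
```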

Step 3: Extracting Specific Information Using Regex

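OCR returns every piece of text it detects, including field labels and card headers, but the app usually needs only a few values. The patterns below target a Turkish national ID card, whose labels are printed in both Turkish and English; each pattern matches a label and captures the value that follows it: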

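// Structured result; a field stays undefined when its label is not found
interface IDCardData {
  tckn?: string; name?: string; surname?: string;
  birthDate?: string; gender?: string; documentNo?: string;
  rawText: string;
}

interface RegexPatterns {
  tckn: RegExp; surname: RegExp; name: RegExp;
  birthDate: RegExp; gender: RegExp; documentNo: RegExp;
}

// Each pattern matches a Turkish or English label, then captures the value after it
const patterns: RegexPatterns = {
  tckn: /(?:T\.?\s*C\.?\s*Kimlik\s*No|TR\s*identity\s*No)[\s:]*([1-9]\d{10})/i, // 11 digits, no leading zero
  surname: /(?:Soyadı|Surname)[\s:]*([A-ZÇĞİÖŞÜ]+)/i,
  // \b keeps "Adı" from matching inside "Soyadı"; only the first word of a name is captured
  name: /(?:\bAdı|Given\s*Name(?:\(s\))?)[\s:]*([A-ZÇĞİÖŞÜ]+)/i,
  birthDate: /(?:Doğum\s*Tarihi|Date\s*of\s*Birth)[\s:]*([\d./-]+)/i,
  gender: /(?:Cinsiyeti\s*\/\s*Gender)[\s:]*([EKMF])/i, // E/M = Erkek/Male, K/F = Kadın/Female
  documentNo: /(?:Seri\s*No|Document\s*No)[\s:]*([A-Z0-9]{5,})/i,
};

function extractDataWithRegex(text: string): IDCardData {
  // match() returns null for a missing label; ?.[1] turns that into undefined
  return {
    tckn: text.match(patterns.tckn)?.[1],
    name: text.match(patterns.name)?.[1],
    surname: text.match(patterns.surname)?.[1],
    birthDate: text.match(patterns.birthDate)?.[1],
    gender: text.match(patterns.gender)?.[1],
    documentNo: text.match(patterns.documentNo)?.[1],
    rawText: text
  };
}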

Because each pattern is anchored on a printed label, adding a field means adding one pattern, and a field the OCR missed comes back as undefined instead of breaking the whole parse.
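A regex match only proves that the ID number has the right shape. The TCKN also carries two check digits, so verifying them is a cheap way to catch an OCR misread before the data leaves the device. The sketch below implements the standard TCKN checksum rule; isValidTckn is our own illustrative name, not a HarmonyOS API:

```typescript
// Validate a Turkish identity number (TCKN) with its two check digits:
//   d10 = ((d1 + d3 + d5 + d7 + d9) * 7 - (d2 + d4 + d6 + d8)) mod 10
//   d11 = (d1 + d2 + ... + d10) mod 10
function isValidTckn(tckn: string): boolean {
  if (!/^[1-9]\d{10}$/.test(tckn)) {
    return false; // must be 11 digits with a non-zero first digit
  }
  const d = tckn.split('').map(Number);
  const oddSum = d[0] + d[2] + d[4] + d[6] + d[8];
  const evenSum = d[1] + d[3] + d[5] + d[7];
  // JavaScript's % can return a negative value, so shift it back into 0..9
  const check10 = (((oddSum * 7 - evenSum) % 10) + 10) % 10;
  const check11 = d.slice(0, 10).reduce((a, b) => a + b, 0) % 10;
  return d[9] === check10 && d[10] === check11;
}
```

A typical use is `if (result.tckn && !isValidTckn(result.tckn))`, followed by asking the user to scan the card again.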

Example Output

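Consider the following raw text returned by OCR (the ID and serial numbers are partly masked with * for privacy; on a real card these positions hold digits and letters, which is what the tckn and documentNo patterns require):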

T.C. Kimlik No: 1234*******
Adı: MEHMET
Soyadı: YILMAZ
Doğum Tarihi: 01.01.1990
Cinsiyeti / Gender: E
Seri No: A1234**

For the unmasked text, the parser returns a structured object like this one (masked the same way, with rawText omitted for brevity):

{
  "tckn": "1234*******",
  "name": "MEHMET",
  "surname": "YILMAZ",
  "birthDate": "01.01.1990",
  "gender": "E",
  "documentNo": "A1234**"
}
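Every field in this output is still a plain string. If the app needs a real date, the birthDate value can be validated and converted. The hypothetical helper below assumes the day-first DD.MM.YYYY layout shown in the sample and, like the birthDate pattern, also accepts / or - as separators:

```typescript
// Hypothetical helper: convert a day-first date such as "01.01.1990" to ISO 8601
// ("1990-01-01"). Returns undefined when the string is not a real calendar date.
function birthDateToIso(raw: string): string | undefined {
  const m = raw.match(/^(\d{2})[.\/-](\d{2})[.\/-](\d{4})$/);
  if (!m) {
    return undefined;
  }
  const day = Number(m[1]);
  const month = Number(m[2]);
  const year = Number(m[3]);
  // Date.UTC rolls impossible dates over (31.02 becomes 03.03), so compare back
  const date = new Date(Date.UTC(year, month - 1, day));
  if (date.getUTCFullYear() !== year || date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) {
    return undefined;
  }
  return `${m[3]}-${m[2]}-${m[1]}`;
}
```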

Bonus: Releasing Camera Resources

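Release camera resources as soon as the app no longer needs them, for example when the page is hidden or destroyed. The camera is shared with other apps, so a session that is never released can keep them from opening it: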

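async releaseCamera(): Promise<void> {
  // Stop the running session before tearing down its input and outputs
  await this.captureSession?.stop();
  await this.cameraInput?.close();
  await this.previewOutput?.release();
  // Only needed if your app also created an image receiver (not shown in this article)
  await this.receiver?.release();
  await this.photoOutput?.release();
  await this.captureSession?.release();
}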
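Each await in releaseCamera runs only if the previous step succeeded, so a rejected close() would leave the remaining outputs and the session unreleased. A hedged alternative is to attempt every step and collect the failures. releaseAll below is a hypothetical, framework-independent helper, not a CameraKit API:

```typescript
// Hypothetical helper: run release steps in order and keep going when one fails.
// Each step is a function that returns a Promise, or undefined when the
// resource was never created (e.g. () => this.cameraInput?.close()).
async function releaseAll(steps: Array<() => Promise<void> | undefined>): Promise<unknown[]> {
  const errors: unknown[] = [];
  for (const step of steps) {
    try {
      await step();
    } catch (e) {
      errors.push(e); // remember the failure and continue with the next step
    }
  }
  return errors;
}
```

Inside releaseCamera you would pass one arrow function per resource, in the same order as above, and log whatever errors come back.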

Conclusion

Combining CameraKit for capture, CoreVisionKit for text recognition, and regex for field extraction gives a HarmonyOS app structured data instead of a block of raw text. The same pipeline applies to other documents, such as receipts or business cards: only the patterns change.

Key Takeaways (TL;DR)

Utilize CameraKit for image capture.
Process images with CoreVisionKit for OCR.
Employ Regex to extract structured data (e.g., TCKN, name, date of birth).
Always ensure proper release of camera resources.

Authored by Mehmet Algul