Comparing AWS Textract and AWS Rekognition to extract text from images using PHP

A few months ago I tried using AWS Rekognition to detect text in images. The results were okay for casual use cases but overall the quality was pretty poor (primarily because Rekognition isn’t intended to be used as an OCR product).

A few days ago (May 29), AWS announced the general availability of Textract, an actual OCR product. Out of curiosity, I ran the same image through Textract that I had run through Rekognition to compare the results. While Textract isn’t 100% accurate, it’s a huge improvement over Rekognition (as should be expected, since OCR is what it’s intended for).

Below is a side-by-side comparison of the results from the two services:

Textract Results                               | Rekognition Results
DetectedText                      | Confidence | DetectedText                          | Confidence
DANS PUMP AND GO                  | 98%        | DANS PLMP AND GO                      | 98%
15238 MAIN ST                     | 99%        | 15238 MAIN ST                         | 100%
NEWTOWN                           | 100%       | NEWTOWN                               | 100%
CAROLINA 93812                    | 96%        | CAROLINA 93812                        | 97%
ST-TX: 11089984                   | 99%        | ST-TX: 11089987 (555) 708-2224        | 98%
(555) 708-2224                    | 100%       |                                       |
2014-02-25 IW424534:9338300 07:09 | 99%        | 2014-02-25 TW420534: 34:9338300 07:09 | 94%
TERMINAL: 509338300 OPER: A       | 89%        | TERMINAL: 509338300 OPER: A           | 99%
Fuel                              | 99%        | Fuel (G) ($/G)                        | 99%
(G) ($/G)                         | 98%        |                                       |
($)                               | 95%        | ($)                                   | 99%
Pump 9                            | 80%        | Pump 9                                | 97%
Premium                           | 93%        | Prem ium 40.000 1.345 53.80*          | 98%
40.000 1.345 53.80*               | 98%        |                                       |
Total Owed                        | 97%        | Total Owed 53.80                      | 98%
53.8                              | 100%       |                                       |
TOTAL PAID                        | 100%       | TOTAL PAID                            | 100%
CREDIT CARD                       | 100%       | CREDIT CARD 53.80                     | 99%
53.8                              | 99%        |                                       |
VISA                              | 100%       | VISA *kkkkkkkkkkk4597                 | 98%
$4,444,444,440,597                | 58%        |                                       |
INV. 972821 AUTH. 545633          | 99%        | INV. 972821 AUTH. 545633              | 99%
Purchase                          | 98%        | Purchase                              | 100%
S 0010010010 00 127               | 89%        | S 0010010010                          | 98%
00 APPROVED – THANK YOU           | 92%        |                                       |
                                  | 94%        |                                       |
IMPORTANT                         | 100%       |                                       |
                                  | 95%        |                                       |
Retain This Copy For Your Records | 100%       |                                       |

As always, here’s the source I used for this test:

<?php
// Load the AWS SDK for PHP via Composer's autoloader.
require '/home/bitnami/vendor/autoload.php';

use Aws\Textract\TextractClient;
use Aws\Exception\AwsException;

$client = new TextractClient([
  'version' => 'latest',
  'region' => 'us-west-2',
  'credentials' => [
    'key' => '', // IAM user key
    'secret' => '', // IAM user secret
  ],
]);

try {
  // Run text detection on an image that is already stored in S3.
  $result = $client->detectDocumentText([
    'Document' => [
      'S3Object' => [
        'Bucket' => 'fkrekognition',
        'Name' => 'receipt_preview.jpg',
      ],
    ],
  ]);
} catch (AwsException $e) {
  // Stop here: $result is undefined when the request fails.
  echo "<pre>Error: " . htmlspecialchars($e->getMessage()) . "</pre>";
  exit(1);
}

echo "<h1>Textract</h1>";
echo "<img src=\"https://s3-us-west-2.amazonaws.com/fkrekognition/receipt_preview.jpg\"><br />";
$i = 0;
echo "<table border=1 cellspacing=0><tr><td>#</td><td>BlockType</td><td>Text</td><td>Confidence</td></tr>";
foreach ($result['Blocks'] as $phrase) {
  // Blocks also contain PAGE and WORD entries; only print whole lines.
  if ($phrase['BlockType'] == "LINE") {
    $i++;
    // Escape the detected text so characters like < or & don't break the table.
    echo "<tr><td>$i</td><td>".$phrase['BlockType']."</td><td>".htmlspecialchars($phrase['Text'])."</td><td>".round($phrase['Confidence'])."%</td></tr>";
  }
}
echo "</table>";

echo "<h1>Raw Output</h1><pre>";
print_r($result);
echo "</pre>";
?>

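The table shows that Textract’s confidence scores vary more than Rekognition’s (the masked card number came back at 58%), so one way to use them is to flag low-confidence lines for manual review. Below is a minimal sketch, assuming the Blocks shape returned by detectDocumentText. The helper name lowConfidenceLines, the 90% threshold, and the decimal confidence values in the sample are made up for illustration; the sample is hand-written to mirror a few rows of the table, not output from a live call.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): returns the LINE blocks whose
// confidence is below $threshold so they can be reviewed by hand.
function lowConfidenceLines(array $blocks, float $threshold = 90.0): array
{
    $flagged = [];
    foreach ($blocks as $block) {
        // Skip PAGE and WORD blocks; only LINE blocks hold a full line of text.
        if (($block['BlockType'] ?? '') !== 'LINE') {
            continue;
        }
        $confidence = $block['Confidence'] ?? 0.0;
        if ($confidence < $threshold) {
            $flagged[] = [
                'text'       => $block['Text'] ?? '',
                'confidence' => round($confidence),
            ];
        }
    }
    return $flagged;
}

// Hand-made sample shaped like $result['Blocks'] (illustrative values only).
$sampleBlocks = [
    ['BlockType' => 'PAGE'],
    ['BlockType' => 'LINE', 'Text' => 'TOTAL PAID', 'Confidence' => 99.8],
    ['BlockType' => 'LINE', 'Text' => 'Pump 9', 'Confidence' => 80.2],
    ['BlockType' => 'LINE', 'Text' => '$4,444,444,440,597', 'Confidence' => 58.1],
    ['BlockType' => 'WORD', 'Text' => 'Pump', 'Confidence' => 75.0],
];

foreach (lowConfidenceLines($sampleBlocks) as $line) {
    echo $line['text'] . ' (' . $line['confidence'] . "%)\n";
}
```

In the script above, the same function could be called with $result['Blocks'] in place of the sample array.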
 

Using AWS Rekognition to Detect Text in Images with PHP

A couple of years ago, I tinkered with a solution that used a webcam to capture images of receipts, convert the images to raw text, and store the text in a database. My scrappy solution worked okay, but it lacked the accuracy to make it viable for anything real-world.

Since AWS Rekognition has launched in the meantime, I figured I’d try it out and see how it compares. I used a fake receipt to see how it’d do.

Like every other AWS product I’ve used, it was incredibly easy to work with. I’ll share the simple script I used at the bottom of this post but, needless to say, there’s not much to it.

While use was a breeze, the results were disappointing. The main problem is that Rekognition is limited to ONLY 50 words in an image. So clearly it’s not a full-on OCR tool.
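
To check how close an image comes to that limit, one option is to tally the detections by type, since Rekognition returns both LINE and WORD entries in TextDetections. This is a minimal sketch assuming that response shape; countDetectionsByType and the sample data are hypothetical, made up for illustration.

```php
<?php
// Hypothetical helper: tallies Rekognition text detections by their Type
// field, which is either 'LINE' or 'WORD'.
function countDetectionsByType(array $detections): array
{
    $counts = ['LINE' => 0, 'WORD' => 0];
    foreach ($detections as $detection) {
        $type = $detection['Type'] ?? 'UNKNOWN';
        $counts[$type] = ($counts[$type] ?? 0) + 1;
    }
    return $counts;
}

// Hand-made sample shaped like $result['TextDetections'].
$sampleDetections = [
    ['DetectedText' => 'TOTAL PAID', 'Type' => 'LINE', 'Id' => 0],
    ['DetectedText' => 'TOTAL', 'Type' => 'WORD', 'Id' => 1, 'ParentId' => 0],
    ['DetectedText' => 'PAID', 'Type' => 'WORD', 'Id' => 2, 'ParentId' => 0],
];

$counts = countDetectionsByType($sampleDetections);
echo "Lines: {$counts['LINE']}, words: {$counts['WORD']}\n";

// 50 is the per-image word limit described above.
if ($counts['WORD'] >= 50) {
    echo "Word limit reached; some text was probably left out.\n";
}
```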

Somewhat more disappointing was the limited range of confidence scores Rekognition returned (it provides a confidence score for each text detection). The overall output was pretty accurate, but not accurate enough for me to consider it “wow” worthy. Despite this, all of the confidence scores were above 93%, so the scores did little to separate the good detections from the bad ones.
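
A quick way to see how narrow that range is would be to summarize the scores. The sketch below computes the minimum, maximum, and mean confidence from an array shaped like TextDetections; summarizeConfidence is a hypothetical helper and the sample scores are illustrative, not the receipt’s actual output.

```php
<?php
// Hypothetical helper: summarizes the Confidence values of a list of
// detections. Returns nulls when no detection has a Confidence value.
function summarizeConfidence(array $detections): array
{
    $scores = array_column($detections, 'Confidence');
    if ($scores === []) {
        return ['min' => null, 'max' => null, 'mean' => null];
    }
    return [
        'min'  => min($scores),
        'max'  => max($scores),
        'mean' => array_sum($scores) / count($scores),
    ];
}

// Illustrative sample shaped like $result['TextDetections'].
$sampleScores = [
    ['DetectedText' => 'NEWTOWN', 'Confidence' => 100.0],
    ['DetectedText' => 'Pump 9', 'Confidence' => 97.0],
    ['DetectedText' => 'Fuel', 'Confidence' => 94.0],
];

$summary = summarizeConfidence($sampleScores);
printf("min %.1f%%, max %.1f%%, mean %.1f%%\n", $summary['min'], $summary['max'], $summary['mean']);
```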

AWS Rekognition has a long way to go before it’s competitive as an OCR service. Its performance in object detection/facial recognition (which is the heart and primary use case of Rekognition) may be better, but I haven’t tested that at this point.

You can view the full analysis and output of the receipt image here.

Below is the code used to generate the output linked above:

<?php
// Load the AWS SDK for PHP via Composer's autoloader.
require '/home/vendor/autoload.php';

use Aws\Rekognition\RekognitionClient;
use Aws\Exception\AwsException;

$client = new RekognitionClient([
    'version'     => 'latest',
    'region'      => 'us-west-2',
    'credentials' => [
        'key'    => 'IAM KEY',
        'secret' => 'IAM SECRET'
    ]
]);

try {
    // Detect text in an image that is already stored in S3.
    $result = $client->detectText([
        'Image' => [
            'S3Object' => [
                'Bucket' => 'S3 BUCKET CONTAINING IMAGE',
                'Name' => 'receipt_preview.jpg',
            ],
        ],
    ]);
} catch (AwsException $e) {
    // Stop here: $result is undefined when the request fails.
    echo "<pre>Error: " . htmlspecialchars($e->getMessage()) . "</pre>";
    exit(1);
}

echo "<h1>Rekognition</h1>";
$i = 0;
echo "<table border=1 cellspacing=0><tr><td>#</td><td>DetectedText</td><td>Type</td><td>ID</td><td>ParentId</td><td>Confidence</td></tr>";
foreach ($result['TextDetections'] as $phrase) {
    $i++;
    // ParentId is empty for LINE detections, so default it to avoid a notice.
    $parentId = $phrase['ParentId'] ?? '';
    echo "<tr><td>$i</td><td>".htmlspecialchars($phrase['DetectedText'])."</td><td>".$phrase['Type']."</td><td>".$phrase['Id']."</td><td>".$parentId."</td><td>".round($phrase['Confidence'])."%</td></tr>";
}
echo "</table>";

echo "<h1>Raw Output</h1><pre>";
print_r($result);
echo "</pre>";
?>

 

Webcam Captures Text and Stores in MySQL Database

While visiting my family recently, I saw my dad entering numbers from each of the 5-8 ticket receipts he receives daily to keep track of the work he’s done, report for payroll, etc.  I knew there had to be an easier way to collect this information without having to key each ticket manually or without using a clunky, slow scanner.  After a bit of research, I found an API for OCR from Haven OnDemand and I wrote a simple script to use the camera on his laptop to snap pictures of the tickets, scrape the text and position of the text from the tickets, store it all in a MySQL database, and retain the image of the tickets in a digital archive.
Demo: Snapping image via webcam and storing text

The script itself is actually very simple:

<?php
// Connect to MySQL; replace the placeholders with real credentials.
$con = mysqli_connect('localhost', '<user>', '<pw>', '<db>');
if (!$con) {
    print "Unable to connect to database.";
    exit();
}

// Save the raw image posted by the webcam page under a timestamped name.
$name = date('Y-m-d_H:i:s');
$newname = "images/" . $name . ".jpg";
$file = file_put_contents($newname, file_get_contents('php://input'));
if (!$file) {
    print "Unable to write image to directory.";
    exit();
}

// Public URL of the saved image, which the OCR API fetches.
$filePath = 'http://' . $_SERVER['HTTP_HOST'] . dirname($_SERVER['REQUEST_URI']) . '/' . $newname;
$result_json = file_get_contents("https://api.idolondemand.com/1/api/sync/ocrdocument/v1?apikey=<redacted>&url=" . urlencode($filePath) . "&mode=scene_photo");
$json_a = json_decode($result_json, true);

// Store each detected text block with its position. The prepared statement
// keeps OCR text that contains quotes from breaking the query.
$stmt = mysqli_prepare($con, "insert into image (name,pxleft,pxtop,pxwidth,pxheight,result) values (?,?,?,?,?,?)");
foreach ($json_a['text_block'] ?? [] as $p) {
    $result_text = htmlspecialchars($p['text']);
    $result_left = $p['left'];
    $result_top = $p['top'];
    $result_width = $p['width'];
    $result_height = $p['height'];
    mysqli_stmt_bind_param($stmt, 'ssssss', $name, $result_left, $result_top, $result_width, $result_height, $result_text);
    mysqli_stmt_execute($stmt);
}
print "$filePath\n";
?>

While the script works well, the API isn’t great at picking up small text or parsing large amounts of text. At only about 80% accuracy, I can’t confidently rely on its interpretation of the text. To compound the issues with the API, the images from the webcam are low quality: shaky hands, varying lighting, etc. drop the quality of the images I’m trying to scrape text from.
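
One rough way to put a number on accuracy is to compare the OCR output against a hand-typed version of the same ticket. The sketch below uses PHP’s built-in levenshtein() to get a character-level score. This is an assumed approach, not how the 80% estimate above was measured, and characterAccuracy and the sample strings are hypothetical.

```php
<?php
// Hypothetical helper: character-level accuracy between the expected text and
// the OCR output, as 1 - (edit distance / length of the longer string).
// strlen() counts bytes, so this is only exact for single-byte (ASCII) text.
function characterAccuracy(string $expected, string $actual): float
{
    $length = max(strlen($expected), strlen($actual));
    if ($length === 0) {
        return 1.0; // Two empty strings match perfectly.
    }
    return 1.0 - levenshtein($expected, $actual) / $length;
}

// Made-up example: the OCR misread the digit 0 as the letter O.
$expected = 'TICKET 10482';
$actual   = 'TICKET 1O482';
printf("Accuracy: %.1f%%\n", characterAccuracy($expected, $actual) * 100);
```

Averaging this score over the text blocks of a few hand-checked tickets would give a baseline to compare against after changing the lighting or camera setup.
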
This will be a project I keep playing with to improve the accuracy and speed of snapshots and storage.  Until I can figure out a way to improve the accuracy, though, this isn’t very practical.