Using AWS Rekognition to Detect Text in Images with PHP

A couple of years ago, I tinkered with a solution that used a webcam to capture images of receipts, convert the images to raw text, and store the text in a database. My scrappy solution worked okay, but it lacked the accuracy to make it viable for anything real-world.

AWS Rekognition has launched since then, so I figured I’d try it out and see how it compares. I ran a fake receipt through it as a test.

Like every other AWS product I’ve used, it was incredibly easy to work with. I’ll share the simple script I used at the bottom of this post; there’s not much to it.

While use was a breeze, the results were disappointing. The main problem is that Rekognition detects at most 50 words per image, so it’s clearly not a full-on OCR tool.

Somewhat more disappointing was the narrow range of the confidence scores Rekognition returned (it provides one for each text detection). The overall output was pretty accurate, but not accurate enough for me to consider it “wow” worthy. Even so, every confidence score was above 93%, so the scores did little to flag which detections were wrong.
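
To make the confidence scores concrete, here is a minimal sketch of filtering detections by type and score. The sample array is hypothetical (its values are invented for illustration, not taken from my receipt test), and filterDetections is a helper name of my own, not part of the AWS SDK. Only the DetectedText, Type, Id, ParentId, and Confidence keys come from the detectText response.

```php
<?php
// Hypothetical sample shaped like the 'TextDetections' array in a detectText result.
// The texts and scores are made up for illustration.
$detections = [
    ['DetectedText' => 'TOTAL 12.50', 'Type' => 'LINE', 'Id' => 0, 'Confidence' => 98.7],
    ['DetectedText' => 'TOTAL', 'Type' => 'WORD', 'Id' => 1, 'ParentId' => 0, 'Confidence' => 99.1],
    ['DetectedText' => '12.50', 'Type' => 'WORD', 'Id' => 2, 'ParentId' => 0, 'Confidence' => 93.4],
];

// Return the text of every detection of the given type ('LINE' or 'WORD')
// whose confidence is at or above the threshold.
function filterDetections(array $detections, string $type, float $minConfidence): array
{
    $kept = [];
    foreach ($detections as $d) {
        if ($d['Type'] === $type && $d['Confidence'] >= $minConfidence) {
            $kept[] = $d['DetectedText'];
        }
    }
    return $kept;
}

// Keep only the words Rekognition is at least 95% sure about.
$words = filterDetections($detections, 'WORD', 95.0);
print_r($words);
```

With every score sitting above 93%, as in my test, a threshold like this has little room to separate good reads from bad ones.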

AWS Rekognition has a long way to go before it’s competitive as an OCR service. Its performance in object detection and facial recognition (the heart and primary use case of Rekognition) may be better, but I haven’t tested that at this point.

You can view the full analysis and output of the receipt image here.

Below is the code used to generate the output linked above:

<?php
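// Load the AWS SDK for PHP through Composer's autoloader (adjust the path for your install).
require '/home/vendor/autoload.php';
use Aws\Rekognition\RekognitionClient;

// The class is imported above, so the short name is enough here.
$client = new RekognitionClient([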
    'version'     => 'latest',
    'region'      => 'us-west-2',
    'credentials' => [
        'key'    => 'IAM KEY',
        'secret' => 'IAM SECRET'
    ]
]);

$result = $client->detectText([
    'Image' => [
        'S3Object' => [
            'Bucket' => 'S3 BUCKET CONTAINING IMAGE',
            'Name' => 'receipt_preview.jpg',
        ],
    ],
]);

echo "<h1>Rekognition</h1>";
echo "<table border=1 cellspacing=0><tr><td>#</td><td>DetectedText</td><td>Type</td><td>ID</td><td>ParentId</td><td>Confidence</td></tr>";
$i = 0;
foreach ($result['TextDetections'] as $phrase) {
    $i++;
    // LINE detections have no parent, so ParentId may be missing or null.
    $parentId = $phrase['ParentId'] ?? '';
    // Escape the detected text so stray < or & characters don't break the table.
    $text = htmlspecialchars($phrase['DetectedText']);
    echo "<tr><td>$i</td><td>$text</td><td>".$phrase['Type']."</td><td>".$phrase['Id']."</td><td>$parentId</td><td>".round($phrase['Confidence'])."%</td></tr>";
}
echo "</table>";

echo "<h1>Raw Output</h1><pre>";
print_r($result);
echo "</pre>";
?>
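
The table above prints Id and ParentId for each detection. As a rough sketch of how those two fields relate, the snippet below groups WORD detections under the LINE they belong to. The sample data is hypothetical, and groupWordsByLine is an illustrative helper of my own rather than anything the SDK provides.

```php
<?php
// Hypothetical detections: two lines, each followed by its words.
$detections = [
    ['DetectedText' => 'COFFEE 3.25', 'Type' => 'LINE', 'Id' => 0],
    ['DetectedText' => 'TOTAL 3.25', 'Type' => 'LINE', 'Id' => 1],
    ['DetectedText' => 'COFFEE', 'Type' => 'WORD', 'Id' => 2, 'ParentId' => 0],
    ['DetectedText' => '3.25', 'Type' => 'WORD', 'Id' => 3, 'ParentId' => 0],
    ['DetectedText' => 'TOTAL', 'Type' => 'WORD', 'Id' => 4, 'ParentId' => 1],
    ['DetectedText' => '3.25', 'Type' => 'WORD', 'Id' => 5, 'ParentId' => 1],
];

// Build an array keyed by line Id, holding the line text and its words in order.
function groupWordsByLine(array $detections): array
{
    $lines = [];
    // First pass: register every LINE by its Id.
    foreach ($detections as $d) {
        if ($d['Type'] === 'LINE') {
            $lines[$d['Id']] = ['text' => $d['DetectedText'], 'words' => []];
        }
    }
    // Second pass: attach each WORD to the line named by its ParentId.
    foreach ($detections as $d) {
        if ($d['Type'] === 'WORD' && isset($d['ParentId'], $lines[$d['ParentId']])) {
            $lines[$d['ParentId']]['words'][] = $d['DetectedText'];
        }
    }
    return $lines;
}

$lines = groupWordsByLine($detections);
print_r($lines);
```

Grouping this way makes it easier to treat a receipt line such as an item and its price as one record instead of a flat list of words.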

 

Webcam Captures Text and Stores in MySQL Database

While visiting my family recently, I saw my dad keying in numbers from each of the 5-8 ticket receipts he receives daily to keep track of the work he’s done, report for payroll, etc. I knew there had to be an easier way to collect this information than keying in each ticket manually or using a clunky, slow scanner. After a bit of research, I found an OCR API from Haven OnDemand and wrote a simple script that uses the camera on his laptop to snap pictures of the tickets, scrape the text and its position from each ticket, store it all in a MySQL database, and retain the ticket images in a digital archive.

Demo: Snapping image via webcam and storing text

The script itself is actually very simple:

<?php
// Replace the <...> placeholders with your own connection details and API key.
$con = mysqli_connect('localhost', '<user>', '<pw>', '<db>');

// Save the posted webcam frame under a timestamped name.
$name = date('Y-m-d_H:i:s');
$newname = "images/" . $name . ".jpg";
$file = file_put_contents($newname, file_get_contents('php://input'));
if (!$file) {
    print "Unable to write image to directory.";
    exit();
}

// Build a public URL for the saved image and hand it to the OCR API.
$filePath = 'http://' . $_SERVER['HTTP_HOST'] . dirname($_SERVER['REQUEST_URI']) . '/' . $newname;
$result_json = file_get_contents("https://api.idolondemand.com/1/api/sync/ocrdocument/v1?apikey=<redacted>&url=" . urlencode($filePath) . "&mode=scene_photo");
$json_a = json_decode($result_json, true);

// A prepared statement keeps quotes in the OCR text from breaking the insert.
$stmt = mysqli_prepare($con, "insert into image (name,pxleft,pxtop,pxwidth,pxheight,result) values (?,?,?,?,?,?)");

// Store one row per text block, with its position and size in pixels.
foreach ($json_a['text_block'] ?? [] as $p) {
    $result_text = htmlspecialchars($p['text']);
    $result_left = $p['left'];
    $result_top = $p['top'];
    $result_width = $p['width'];
    $result_height = $p['height'];
    mysqli_stmt_bind_param($stmt, 'ssssss', $name, $result_left, $result_top, $result_width, $result_height, $result_text);
    mysqli_stmt_execute($stmt);
}
print "$filePath\n";
?>

While the script works well, the API isn’t great at picking up small text or parsing large amounts of text. It is only about 80% accurate, which isn’t enough to rely confidently on its interpretation of the text. To compound the issues with the API, the images from the webcam are low quality: shaky hands, varying lighting, etc. all degrade the images I’m trying to scrape text from.
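
One way to put a number on accuracy, assuming you have a hand-typed copy of what the ticket actually says, is to compare it with the OCR output character by character. The sketch below uses PHP's built-in levenshtein function. The characterAccuracy helper and the sample strings are illustrative assumptions of mine, and this is not necessarily how the 80% figure above was arrived at.

```php
<?php
// Character-level accuracy: 1 minus the edit distance divided by the longer length.
// Note: before PHP 8, levenshtein() only accepts strings up to 255 characters,
// so long tickets would need to be compared line by line.
function characterAccuracy(string $expected, string $actual): float
{
    $longest = max(strlen($expected), strlen($actual));
    if ($longest === 0) {
        return 1.0; // two empty strings match trivially
    }
    return 1.0 - levenshtein($expected, $actual) / $longest;
}

// Hypothetical example: the OCR confused the letter O and the digit 0 twice.
$expected = 'TOTAL 12.50';
$ocr = 'T0TAL 12.5O';
$accuracy = characterAccuracy($expected, $ocr);
printf("%.0f%%\n", $accuracy * 100); // prints 82%
```

Two wrong characters out of eleven already drop this line to about 82%, which shows how quickly small misreads add up on a ticket full of numbers.
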
This is a project I’ll keep playing with to improve the accuracy and speed of snapshots and storage. Until I can figure out a way to improve the accuracy, though, it isn’t very practical.