A Comparison of Speak! and Envision AI, Two Text and Object Recognition Apps for Android

Steven Kelley

Access to Android-compatible text and object recognition just got easier with the launch of the Speak! app, available as a free download from the Google Play store. Envision AI, the other viable Android recognition app, is free for 14 days, after which you must either select a subscription plan or limit your monthly use of the app. Plans cost $4.99 per month, $39.99 per year, or $199.99 for a lifetime subscription.
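For readers weighing the Envision AI plans, a rough cost comparison may help. The Python sketch below is only an illustration built on the three prices quoted above. It assumes prices stay fixed, ignores taxes and the free limited-use option, and compares whole years of use. The helper names `plan_costs` and `cheapest_plan` are hypothetical.

```python
# Envision AI plan prices quoted in the article, in US cents to avoid float rounding.
MONTHLY_CENTS = 499
ANNUAL_CENTS = 3999
LIFETIME_CENTS = 19999


def plan_costs(years):
    """Return the total cost in cents of each plan over a whole number of years."""
    return {
        "monthly": MONTHLY_CENTS * 12 * years,
        "annual": ANNUAL_CENTS * years,
        "lifetime": LIFETIME_CENTS,  # one-time payment, independent of years
    }


def cheapest_plan(years):
    """Return the name of the plan with the lowest total cost (hypothetical helper)."""
    costs = plan_costs(years)
    return min(costs, key=costs.get)


# Print a side-by-side comparison for a few sample durations.
for years in (1, 5, 6):
    costs = plan_costs(years)
    summary = ", ".join(f"{name} ${cents / 100:.2f}" for name, cents in costs.items())
    print(f"{years} year(s): {summary} -> cheapest: {cheapest_plan(years)}")
```

Under these assumptions, a full year on the monthly plan costs $59.88 against $39.99 on the annual plan, and the lifetime plan becomes the cheapest option only from the sixth year of use.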

The developer of Speak! is Eyal Hochberg, an engineer based in Tel Aviv. Hochberg explains that for quite some time he’s been “thinking about how technology can improve the lives of people with disabilities and give them more independence.” Speak! is a project he pursues in his free time. According to Hochberg, the cloud-based services driving AI products can be costly, and some companies offset the cost by charging a subscription. Hochberg explains that with Speak!, on the other hand, “I take these services and work very hard to enhance them to get as close as possible to cloud quality. Another benefit of this approach is that the app works offline. For users who pay for Internet data that might make a big cost difference.”

How does Speak! compare to Envision AI? While we might all applaud the developer’s desire to keep the app free, if it doesn’t provide the same level of quality as Envision AI, it may not be as useful.

Getting Started and Recognizing Text with Speak!

Speak! offers many of the same features found in Envision AI. When the app is first opened, you will find three main buttons at the bottom of the screen: Read Text, Scan, and Google. When you double tap the Read Text button, the app snaps a picture of whatever is in front of the camera and begins reading any recognized text. This can be compared to the Document button in Envision, within the Text menu. Speak! is self-voicing, so it’s not necessary for TalkBack to be on for text to be spoken. You can pause, fast-forward, and reverse the reading using buttons at the bottom of the screen. Adjust the speech rate in the Settings menu on the top right of the screen.

One of the immediate differences in capturing text on Speak! versus Envision AI is that Speak! does not prompt you regarding framing the image. When a target document isn't framed properly, Envision AI might prompt with “Not all edges are visible” and will capture the image automatically once the edges are visible, or when you tap the screen. Though not always accurate, this is certainly a handy feature. Speak! will automatically align text when processing, though it may take some practice to position the camera to capture all the text on a page.

The Scan Menu

The Scan button has a sub-menu that includes Text, Barcode, Object, and Color.

Text

The Text button in Speak! works similarly to the Instant button in Envision AI. When selected, both apps look for text within the camera frame to interpret. In this mode, Speak! does offer guidance by vibrating as it frames the text, and the app begins reading automatically once the text is framed. Both apps permit a quick new read of the next item: tap Restart in Speak! or Instant in Envision AI.

Barcode

Tapping the Barcode menu item in Speak! initiates a search for a barcode within the camera frame. While testing this feature, I specifically chose dimmer light and several cylindrical products. Speak! and Envision AI were nearly equal in their performance. Envision AI, however, took several tries to locate the barcode on a spice bottle, while Speak! located it very quickly.

When a barcode is identified, Speak! opens the product specs in Google, which can then be read by TalkBack or Select to Speak. Envision AI, on the other hand, speaks the name and a brief description of the item. If you're in a grocery store, you might find the Envision AI barcode scan handier because it's faster. If you're looking for more product details, you might find the immediate link to Google from Speak! to be more useful.

Object

Mobile object recognition using a low-cost app is still in its infancy. Once selected, the Object mode in Speak! stays on continuously. As a result, when the camera is moved from scene to scene, or object to object, the app speaks the various items it interprets. Envision AI, on the other hand, takes a picture of what the camera is pointing to when the Describe Scene button is tapped. After processing for a second or two, the app describes the scene. Overall, some of the descriptions from Speak! seemed more accurate, but both apps provided results that seemed to be in an “experimental” state. Speak! identified a flowerpot correctly that Envision AI described as a bench. Speak! identified the cab of a truck as a vehicle, whereas Envision AI described it as a “man sitting on a bench.” In all fairness, I recommend approaching the object recognition features in existing AI apps with great optimism—the potential far outweighs the issues in these early experiments.

As an aside, in each of the scenes tested for object recognition, I obtained more meaningful results using the TapTapSee app, which still relies on a network of humans to interpret scenes visually.

Color

Speak! offers a color identifier that toggles between two modes: Basic on or Basic off. With Basic on, the color responses tend to be general, like “green,” “brown,” “black,” etc. With it turned off, you will hear more precise descriptions like “olive green,” “saddle brown,” “dark grey,” etc. Determining the accuracy was difficult because at any given time the camera frame may have multiple colors in it. When the camera frame was filled with tree leaves, “green” was reported. A black iPad cover was reported to be “dark grey,” and a white t-shirt was reported as “silver.” Envision AI does not offer a color identifier.
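The article's source material does not say how Speak! chooses its color names. One common approach, sketched below in Python purely as an assumption, is to report the named color closest to the measured pixel value. The palettes, RGB values, sample pixels, and the function `nearest_color` are all hypothetical and are not taken from the app. The sketch shows how a larger palette could plausibly turn a shaded white shirt into “silver” and a brightly lit black cover into “dark grey.”

```python
# Illustrative palettes only: these names and RGB values are assumptions,
# not data taken from Speak!.
BASIC_COLORS = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "grey": (128, 128, 128),
    "green": (0, 128, 0),
    "brown": (150, 75, 0),
}
DETAILED_COLORS = {
    **BASIC_COLORS,
    "dark grey": (64, 64, 64),
    "silver": (192, 192, 192),
    "olive green": (85, 107, 47),
    "saddle brown": (139, 69, 19),
}


def nearest_color(rgb, palette):
    """Return the palette name whose RGB value is closest to rgb.

    Closeness is measured by squared Euclidean distance in RGB space.
    """
    def distance_squared(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, palette[name]))

    return min(palette, key=distance_squared)


# Hypothetical camera readings: white fabric in shade, black plastic in bright light.
shaded_white_shirt = (200, 200, 200)
black_cover_in_light = (50, 50, 50)

for label, pixel in (("shirt", shaded_white_shirt), ("cover", black_cover_in_light)):
    basic = nearest_color(pixel, BASIC_COLORS)
    detailed = nearest_color(pixel, DETAILED_COLORS)
    print(f"{label}: basic={basic}, detailed={detailed}")
```

If the app works along these lines, the more precise names are not necessarily errors: the camera sees the lit or shaded surface rather than the object's nominal color, so lighting shifts which named color is nearest.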

Google

The last button on the bottom of the Speak! home screen is Google. When first opened, you'll be prompted to download the Google Lens app. Once installed, tapping this button prompts a Google response to the items in the picture. A picture of a small ladder received a response of several bench seats with a price comparison. A picture of my computer was correctly identified as a MacBook Pro and I was offered several similar images. Google Lens uses Google’s AI for object identification, and apparently to offer additional information about the objects recognized in a scene. If you don’t like the responses you get when you tap the Object menu in Speak!, you have the option to ask Google to take a crack at it!

Recognizing Text: A Comparison Between Speak! and Envision AI

Text recognition, through the Read Text button and the Text item in the Scan menu, may be the most widely used feature of the Speak! app. Speak! automatically recognizes a wide variety of languages, including English, French, Spanish, Portuguese, Italian, German, Dutch, Danish, Turkish, Swedish, Finnish, Hungarian, Romanian, Czech, Slovak, and Vietnamese. Unlike Envision AI, Speak! did not fire the camera flash in dim lighting, so I compared text recognition in both apps under the same three lighting conditions: dim light, a portable LED task light, and daylight.

Using Read Text in Speak! and Document in Envision AI, the overall results were worst in dim light, marginally better with the LED light, and best in daylight. I used a magazine-style print booklet with a newsprint-sized serif font. In both the dim light and LED light trials, after processing I was able to make out the general highlights of the article, but many of the details were unclear: letters were left off some words, some words were skipped completely, and the like. After making several attempts, each with the same results, I scanned the page with KNFB Reader for the sake of comparison. Under all lighting conditions, KNFB Reader provided much more accurate results. It should be noted, however, that KNFB Reader is not free, although the Android version costs less than an annual subscription to Envision AI.

If your goal is quick identification or skimming text, the document reading modes in both Speak! and Envision are about equal. Unless the lighting conditions and printed fonts are ideal, however, neither app has an accuracy level that would be high enough for a student or professional to rely on.

The Speak! app offers several great features for the user with low vision, including the ability to magnify text and change its contrast and colors by selecting the Show Text button once the app begins reading the text out loud. When Show Text is activated, two sliders appear directly below the text window. The top slider enlarges and wraps the text within the window. By default, text appears in high-contrast white-on-black; moving the bottom slider from left to right changes the view to black text on a yellow, blue, green, or red background. By default, converted text is copied automatically to the clipboard so it can be pasted into a document or email as needed.

In Speak!, the spoken results seemed more accurate when I used the Text item in the Scan menu instead of the Read Text button. Both Envision AI and Speak! permit you to select a text-to-speech engine built into the phone for text conversion if the phone is offline. On my phone, both apps were set to use Google TTS (text to speech), and it's unclear whether this difference in performance is just subjective or is, in fact, the result of a different TTS option used in the Scan > Text menu item.

Overall, the Speak! app is intuitive to open and begin using. The developer responded promptly and in detail to an email asking about a tutorial. He explained, “…users don't read those, the app strives to be intuitive. The only explanations I have are on the Google Play page.” Like many other users, I often dive into an app and begin using it before referring to any documentation, and Speak! is certainly one that can immediately be used productively without documentation. But I often do go to the Help or Tutorial documentation for specific items or clarification, and for new users this might be a great addition.

The Bottom Line

Speak! is a welcome addition to the Android toolbox of text and object recognition apps. While the text recognition did best with optimal lighting, its accuracy level was equal to that of the Envision AI app, and both are best suited to quickly identifying and skimming a document. The higher-level accuracy a student or professional may need will require a stand-alone OCR app such as KNFB Reader. Speak! will benefit from some type of user guidance in the Read Text mode, so you can tell if the target document is completely framed before taking the picture. Barcode reading and color identification features were both quick and accurate. Neither app provided reliably accurate results with object recognition. You will get better results for object recognition with another app, like TapTapSee, which is also free. You can download both Speak! and Envision AI from the Google Play store. Users of Speak! will find a menu item, Contact Developer, in the Settings menu to forward suggestions or ask for guidance.

This article is made possible in part by generous funding from the James H. and Alice Teubert Charitable Trust, Huntington, West Virginia.
