Sight Tech Global is a mainstream accessibility conference that focuses on big-picture technology trends and aims to give a glimpse of what access technology may look like in the next 5 or 10 years. Now in its second year, the conference is organized by online news site TechCrunch and was entirely virtual and free to attend. It featured speakers from Apple, Google, Microsoft, HumanWare, Vispero, and other leading mainstream and access technology companies, and covered such topics as tactile images, autonomous vehicles, and indoor navigation. The main stage sessions are available to watch for free and include full text transcripts. Some highlights of the 2021 event are covered below.
The Holy Braille in Action
There has been much talk in recent years about potential solutions for rendering multiline braille and tactile graphics electronically on the same screen. Greg Stilson, Head of Global Innovation for the American Printing House for the Blind (APH), did more than just talk about the future: he demonstrated it. Stilson fired up an early prototype of a graphics device developed in partnership between APH and HumanWare, which he casually referred to as a tactile Kindle. This is not APH's first foray into developing a tactile image device; their earlier partnership with Orbit Research yielded the Graphiti, a 40-by-60-pin tactile tablet capable of creating touchable imagery. But one thing missing from that product was the ability to display braille in the form familiar to readers, as opposed to jumbo-sized dots.
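To give a sense of how a pin-array display like the Graphiti turns a picture into something touchable, here is a minimal sketch, not APH's or HumanWare's actual pipeline, that downsamples a grayscale image into a 40-by-60 grid of raised and lowered pins. The block-averaging approach, the threshold value, and the assumption that the image is at least as large as the grid are all illustrative:

```python
def image_to_pin_grid(pixels, rows=40, cols=60, threshold=128):
    """Downsample a grayscale image (list of rows, values 0-255) to a
    rows x cols grid of raised (True) / lowered (False) pins.

    Each pin covers a block of pixels; the block's average brightness
    is compared to a threshold, and dark areas raise the pin.
    Assumes the image is at least rows x cols pixels.
    """
    h, w = len(pixels), len(pixels[0])
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Pixel block covered by this pin
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            block = [pixels[y][x]
                     for y in range(y0, y1)
                     for x in range(x0, x1)]
            avg = sum(block) / len(block)
            row.append(avg < threshold)  # dark pixels -> raised pin
        grid.append(row)
    return grid
```

Feeding the grid to real hardware would of course depend on the device's own protocol; the sketch only shows the image-to-pins reduction step.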
As Stilson points out, "We can’t change the way that the pin actually feels to somebody. We can’t change the way Braille feels. It’s just too much to change. It’s too much to accept. So, we’re like, OK, the way I look at it is like if you took the way a print font looks, if you took away the way of a specific print font looks to a sighted person, the adoption rate would be far lower. So, I wanted to make sure that we really refined the pin and the way the pin feels so that people are comfortable as soon as they lay their fingers on it."
In fact, much of the user research performed over the past year examined these concepts, including how many lines of braille to include in the device and how thick to make the graphic lines, major design decisions that will ultimately guide the development of the product.
Having both braille and graphics on the same screen would be hugely beneficial to many types of users; a math textbook was given as one possible use. Imagine a geometry book that includes a story problem referencing an image. Ideally, a student would read the math problem, then locate the image to come up with a solution. Taking this idea to the next level, it may someday be possible for that student to touch an area of the image to zoom in and reveal more detail. Meanwhile, the entire screen could be relayed to a parent or teacher so they can follow along with what the student is learning. In the live demo, it took about two seconds for an image or text to be displayed on the machine.
Stilson envisions creating a means for software developers to connect to the unit, potentially enabling many uses beyond APH's educational needs. Application designers could feel what a screen looks like in real time while designing an app. Travelers could read tactile maps or learn about their surroundings. With the right software, just about any image available online could be rendered on the machine. This technology will not be cheap, but Stilson also points out that it can currently cost $30,000 to produce one math textbook. Some of this cost comes from the embossers and the amount of paper used (think thousands of pages for a large textbook). APH is exploring possible sources for grants to help defray the costs, as the current Federal Quota System used by school districts would not easily support such a large purchase. With a bit of luck and lots of testing, we may see this as-yet-unnamed device by the end of 2023.
Apple Taking Images to the Next Level
This year's event opened with a Q&A session with Apple's Sarah Herrlinger, Senior Director of Global Accessibility Policy & Initiatives, and Jeffrey Bigham, Research Lead, AI/ML Accessibility. AI and ML were common themes of the conference.
Artificial Intelligence, or AI, is the process of solving a task that would normally require human intelligence. Machine Learning, or ML, is a major part of current AI research; it uses huge sets of data to detect patterns and make predictions about objects, scenes, and text. For instance, a machine learning model might be fed thousands of images of different trees in order to teach it what a tree looks like and the characteristics trees share. But not all trees look the same. They come in many shapes and sizes. Some have leaves, while others don't. To compound this further, trees look different when viewed from different angles. Where older methods of identifying an image might seek an exact match for a picture, or something very close to it, machine learning takes the previous images of trees into account and tries to determine whether the next image has the same characteristics. It can also use clues about the surroundings; for example, most trees would be found in an outdoor setting.
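The difference between matching exact pictures and learning shared characteristics can be illustrated with a toy sketch. The features and numbers below are invented purely for illustration and have nothing to do with any real model; an actual system learns from thousands of images, not five hand-written rows:

```python
# Hypothetical feature vectors: (height_m, foliage_ratio, trunk_width_m).
# A bare winter tree (low foliage) is included so the "tree" class
# captures variety rather than one exact appearance.
TREES = [(12.0, 0.8, 0.5), (6.0, 0.6, 0.3), (20.0, 0.1, 0.9)]
OTHERS = [(2.0, 0.0, 0.1), (30.0, 0.0, 4.0)]  # e.g., mailbox, building

def centroid(samples):
    """Average feature vector of a set of examples."""
    n = len(samples)
    return tuple(sum(s[i] for s in samples) / n for i in range(3))

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def classify(sample):
    """Nearest-centroid rule: label a new, never-seen example by which
    class's average characteristics it most resembles."""
    if distance(sample, centroid(TREES)) < distance(sample, centroid(OTHERS)):
        return "tree"
    return "other"
```

A new tree-like sample such as `(10.0, 0.7, 0.4)` is labeled "tree" even though it matches none of the training rows exactly, which is the essential point of the paragraph above.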
Apple has made huge strides in the depth of its image recognition features, which started to become widely available with iOS 14 in the fall of 2020. This fall, iOS 15 included a new feature called Live Text, which allows you to interact with text found in a picture. This can be accessed by taking a photo using the Camera app. As an example, if you take a picture of a business card that includes a phone number, you can select the number and dial it directly from the app. Like most new Apple features, it was accessible with VoiceOver from day one.
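Once the text in a photo has been recognized, finding the dialable part is conceptually a pattern search over the OCR output. This sketch is not Apple's implementation; the regular expression covers only US-style numbers and is purely illustrative:

```python
import re

def extract_phone(ocr_text):
    """Return the first US-style phone number found in OCR'd text,
    normalized to bare digits, or None if no number is present."""
    match = re.search(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}", ocr_text)
    if match is None:
        return None
    return re.sub(r"\D", "", match.group())  # strip punctuation and spaces
```

Given OCR output like `"Jane Doe\nAcme Corp\n(555) 867-5309"`, the function returns the digits ready to hand to a dialer.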
Machine Learning has also been used to create features which are mostly targeted at certain users, such as people who are blind or have low vision. From the photo viewer, VoiceOver users can select an option to explore the image, allowing a user to move their finger around the screen to hear the relationship of items in a photo. When I took a picture of my desk, it identified my laptop, a mixer to its left, the wooden desk, and even some of the text near the controls on the mixer. Instead of presenting all of these items in one large description, as some image apps do, each element is selectable using regular VoiceOver commands, and I could move my finger around the screen to understand the relationship between these objects.
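At its core, this kind of touch exploration is hit-testing: each recognized object carries a bounding box, and a touch returns whatever label lies under the finger. A minimal sketch, with made-up labels and coordinates standing in for a real detector's output:

```python
# Hypothetical detections: (label, left, top, right, bottom) in screen points
DETECTIONS = [
    ("laptop", 100, 200, 400, 500),
    ("mixer", 10, 220, 90, 480),
]

def label_at(x, y, objects):
    """Return the label of the first detected object whose bounding box
    contains the touch point (x, y), or None for empty background."""
    for label, left, top, right, bottom in objects:
        if left <= x <= right and top <= y <= bottom:
            return label
    return None
```

Because the boxes preserve position, a screen reader can announce "mixer" to the left of "laptop" rather than flattening everything into a single paragraph of description.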
Machine Learning has also been tapped to provide accessibility to apps which may not have been accessible in the past. It does this through the Screen Recognition feature, which attempts to recognize common controls like buttons or sliders and then conveys this information to a VoiceOver user. And while this is exciting, Herrlinger notes the feature is not meant to give app developers a free pass when it comes to making their products accessible.
"We are really encouraging of developers and kind of work with them to make sure that they’re putting the time and energy into the accessibility of their own apps so that members of the blind community have that full experience in a way that is just a little bit better than what we’re able to do with our machine learning models."
Performing these tasks on device is another huge technology advancement. Formerly, phones were not powerful enough to manage all of the data required for recognizing images or text on the device itself; instead, they would upload images to the cloud for processing by high-powered computers. While many apps still use this approach, it often results in slower response times and could lead to security breaches. Allowing for image recognition and other features on the device itself largely negates these challenges, and also allows these features to work in areas where Internet access is not available. Bigham points out that even just 10 years ago, the prevailing thought was that there was not enough computing power in the world to make these huge leaps in Machine Learning possible. Seasoned iPhone users may recall VizWiz, an app which could give information about what's in a picture. The difference between then and now is that at the time, a human was supplying the answer, not a computer. Now, there is a growing amount of data that can be used to provide these descriptions automatically. With numerous ways to learn about your surroundings on an iPhone alone, researchers and engineers will be looking at not only how to improve the computer-generated responses to questions, but also how to make the experience as fluid and seamless as possible for users.
Apple, as they often do, did not give many specifics on features planned for the near future, but it is quite evident this is a major area of focus for the company. They have made numerous advancements in Machine Learning over the past few years, and it will be interesting to follow their developments going forward.
Looking Forward with Google Lookout
Across the virtual aisle, one of the main areas of focus for Google and the Android platform has been Google Lookout, an app available on many modern Android devices which can recognize text, food labels, currency, and objects, among other things. One area of emphasis for the Lookout team is finding ways to bridge the gap between the images used to train the data models and the pictures they receive from users. Often, user photos are less clear or taken at a less-than-ideal angle, so the results presented are not what might be expected. Lookout Product Manager Scott Adams wants to tackle some of these challenges and find ways for users to get the exact pieces of information they are seeking.
"What if only part of that object is visible? Is there a way we can let the person know that and perhaps coach them so they can get better results?"
Another example Adams gives is the text of a newspaper, where there may be multiple headlines and stories on a single page, laid out with many columns and pieces of information. He would love to find a way for the app and users to be able to tell the difference between headlines and articles, and then focus on the article or piece of information they desire. Currently, text is often read as one large block, making it more difficult to discern specific parts of a complex page or document.
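One simple way an app might begin to separate headlines from body text is to compare the pixel height of each recognized line against the median height on the page, since headlines are usually set in larger type. This crude heuristic is only an illustration of the problem Adams describes, not how Lookout works:

```python
def split_headlines(lines, ratio=1.5):
    """Given (text, height_px) pairs from OCR, tag each line as a
    'headline' if its height is at least `ratio` times the median
    line height, otherwise as 'body'. A crude layout heuristic."""
    heights = sorted(h for _, h in lines)
    median = heights[len(heights) // 2]
    return [(text, "headline" if h >= ratio * median else "body")
            for text, h in lines]
```

With the lines tagged, a reader could jump between headlines first and then drill into the article they want, instead of hearing the whole page as one block.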
Among the new features added in 2021 is support for handwriting, which can be found inside the Documents mode of the app. The decision to include handwriting in the existing documents mode was intentional to create a smoother experience for the user. Support for more devices and languages is also in development, though Senior Software Engineer Andreína Reyna admits this can be challenging given the vast Android ecosystem.
"We think that there’s a group of features or baseline of features that are so important that we really wanted to make sure they worked on all devices. And so, we have been testing and doing this gradual rollout to make sure that the features that we have are supported in all of our devices."
Although the iPhone is the dominant choice among blind and low vision users, especially in the United States, Android is used by nearly three quarters of the world's mobile phone users, and Lookout remains a key component of Google's accessibility strategy. It's available for free from Google Play.
A Stellar Preview for HumanWare
Dr. Louis-Philippe Massé, Vice-President of Product Innovation and Technologies for HumanWare, gave a first glimpse at a new GPS-based product the company is currently calling the Stellar Trek. Building on past innovations from the Victor and Trekker lines, the new product will include both modern GPS features and a built-in camera. It's the latter inclusion that Massé feels will vastly improve the navigation experience for blind travelers. The device is a bit thicker than a smartphone, with two high-resolution cameras on the back and a simple set of buttons on the front intended to be operated with the thumb. A familiar challenge for GPS users is what is often referred to as the FFF, or final forty feet, problem: the gap between a GPS guiding you to the correct block or area and actually locating the door or landmark you are seeking. The Stellar Trek will use a combination of the cameras, more accurate GPS technologies, and an on-board voice assistant to help solve these challenges. "We want to give additional help such that it will be a little like having a friend helping you going to those final 40 feet. So, we will use the AI and the [AUDIO OUT] to eventually locate potential threats. And sometimes, when you’re on the sidewalk, the path to that door is not a straight line. So, we will say, OK, you will have to go at 10 hours for 40 feet, and then turn to 2 hours, and so on".
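The "10 hours" and "2 hours" in that quote are clock-face directions, a convention long used in orientation and mobility instruction: 12 o'clock is straight ahead, and each hour is worth 30 degrees. A small sketch of the conversion (the function names are ours, not HumanWare's):

```python
def clock_to_degrees(hour):
    """Clock-face direction to a bearing in degrees, measured clockwise
    from the direction of travel (12 o'clock = 0 degrees)."""
    return (hour % 12) * 30

def clock_to_offset(hour):
    """Signed turn angle: negative means turn left, positive turn right.
    10 o'clock becomes -60 degrees, 2 o'clock becomes +60 degrees."""
    degrees = clock_to_degrees(hour)
    return degrees - 360 if degrees > 180 else degrees
```

So "go at 10 hours for 40 feet, then turn to 2 hours" translates to a 60-degree left adjustment, 40 feet of travel, then a 120-degree swing back to the right.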
The device is envisioned as a stand-alone unit, with no Internet connection required, but cloud connectivity is something that is being explored for the future, perhaps as a subscription add-on or a one-time fee, with an offline option remaining available. Massé cites battery life as one major advantage of a stand-alone unit, with potential battery life of five or six times that of a smartphone. The Stellar Trek is expected to sell for north of $1,000 and be launched this spring.
The Future Is Autonomous, but It May Be a While
When participants on a panel of experts on autonomous vehicles were asked by moderator Brian Bashin how long it would take until a blind person would have a reasonable chance of hailing a driverless vehicle, the responses were all over the map, ranging from a few years to decades. This range reflects some of the complexities that various groups are working through while developing and honing next-generation vehicle technology.
There are a variety of considerations to think about when teaching a vehicle how to coexist with a person who is blind, starting with simply recognizing a cane as an object to avoid, since in some cases, a white cane may just blend in with the surroundings. Beyond this, it's important to recognize the varying travel methods of people. For instance, a blind person may explore the curb cut with their cane or react differently to a light change or other nearby objects. The interactions between vehicles and a sighted pedestrian may be different than that same interaction with a blind traveler.
Aside from this, consider the situation where you are at an airport and are trying to find your Uber or Lyft. Currently, you might call the driver and describe what you look like so they can locate you. But an autonomous vehicle has no human at the controls, so alternatives will need to be considered. In Arizona, vehicles can honk the horn by request of a passenger, a feature used by many users, not just those who are blind. But perhaps a horn honk sound is too abrasive, and a more polite sound should be employed. There is also the matter of directing the rider to the vehicle, or to the door at their destination, and doing so in a way that is both efficient and safe.
With all of these and many other issues to work out, it becomes clearer why opinions on when autonomous vehicles will be a normal part of life vary so tremendously. So, look for a driverless car in 5 years, or perhaps 50.
More to Explore
There were more great panels and sessions than what we mentioned above, including a look at the future of Amazon Alexa, a panel on indoor navigation and mapping, and a look at the latest developments for Microsoft's Seeing AI app. Be sure to check the agenda page to find all of the main sessions from the conference. Will many of these predictions come to fruition in the near future? Time will tell, but we'll be able to track the progress made when the conference returns for 2022.
This article is made possible in part by generous funding from the James H. and Alice Teubert Charitable Trust, Huntington, West Virginia.