August 2010 Issue  Volume 11  Number 4

Web Issues

An Introduction to the Open Library Project

Until fairly recently, the number of sources providing books in any format accessible to people who are blind or visually impaired was quite limited. The high cost of producing braille or recording human narration put most off-the-shelf titles out of reach for most readers. While the National Library Service (NLS) program was the most popular reading source, it still left many bibliophiles longing for more.

For the last 25 years, technology has made it possible to convert printed materials into electronic formats, but only recently has this power truly been explored. Websites like Project Gutenberg and Bookshare have amassed collections that include tens of thousands of titles. This may seem like a lot until one browses the collection at the Library of Congress, the world's largest, whose books alone number more than 20 million. Clearly, access to the printed word has a long way to go.

However, a new project undertaken by the Internet Archive seeks to reduce this disparity. The Internet Archive website, perhaps best known for its snapshots of websites dating back to 1996, recently launched an accessible books search engine on Open Library, a free service offering over 600,000 titles in an accessible DAISY format to qualified individuals. As Jon Hornstein, a spokesperson for the Internet Archive, explains it, the goal is to build a collection "that will scale well and promote the mission of the Internet Archive, which is universal access to knowledge."

How It Works

The operation is perhaps one of the more elaborate when it comes to creating accessible materials. Over 200 people work in 20 scanning centers across the world to digitize printed material. Many of these centers are housed at major libraries in such places as Boston and San Francisco. Although public-domain books were initially the primary focus, Open Library began to include more modern titles this past May. Once a book is scanned, it is processed by specialized software that converts the book's text into a variety of formats. For public-domain books, everything from a plain-text version to a file suitable for Amazon's Kindle is produced. For books not in the public domain, only an accessible DAISY version is made available for download. Individuals who have received an NLS decryption key for their digital talking book player can read these books.

The entire collection is available on the Open Library website for perusal and download with no account sign-up or registration fee. It's apparent the organization performed quite a bit of research when designing the site, as many navigational aids are included to allow for easier browsing. The National Federation of the Blind and San Francisco Lighthouse were also consulted and provided valuable input, according to Hornstein. The result is a very accessible site offering thousands of books free for the taking.

The Collection

In browsing the collection, Open Library's roots as a source of public-domain books become quite apparent. Many searches return titles that are several hundred years old--great for historians, but perhaps a bit disappointing for those seeking the latest bestsellers.

The collection is rapidly expanding, however, in large part due to a donation drive launched earlier this year. The archive has pledged to scan and make available the first 10,000 books it receives at its donation center. With a turnaround time of roughly three to four weeks, this presents a potential opportunity for those seeking accessible versions of books they wish to read for pleasure, business, or school. Many more recent titles, in fact, are already available, ranging from Stephenie Meyer's Twilight series to Glenn Beck's Common Sense.

Testing the Books

This entire discussion would be moot unless the DAISY files provided worked with modern digital talking book players. And while there still is some work to be done on this front, the responsibility now falls largely with the manufacturers of these players. To explain, when support for NLS digital books was introduced in current models, it was assumed these books would be audio only, as this is the book format the NLS provides. Open Library breaks new ground not only by piggybacking on this NLS key for its own format, but also by offering text-based DAISY books using this key. Because of this, many players that do not expect to see text in an NLS-authorized book will not play these titles. This situation is due to change shortly, surmises Hornstein, as the Internet Archive is working with manufacturers to allow for this format.

In our lab tests, only the HumanWare VictorReader Stream successfully played the encrypted titles from the Open Library site, and this only after upgrading to the latest firmware. To play books on the Stream, place them in the same folder you would use for NLS titles. On the other hand, BookSense and PLEXTALK stumbled and did not provide any intelligible output. The public-domain books, however, were read by all of the players in our tests, and should be usable by any standard DAISY reader.

What's Ahead?

Open Library was created when engineers at the Internet Archive realized the potential of DAISY-formatted books and were able to adapt their current scanning methods to support universal access to Internet Archive materials. These same advances may enable more service enhancements in the future. Hornstein is exploring the possibility of server-side text-to-speech for book titles, which would allow for DAISY audio versions of books to be created dynamically. Expanding the collection to an international audience is also a goal of the archive. "We do fully intend to make the material available to people worldwide; we just need to make sure we do it in a way that doesn't violate our copyright. It's a real patchwork of potential solutions."

Room for Improvement

In addition to the work that still needs to be done to ensure compatibility with a wider variety of book readers, some areas of the website could also be improved. Although there is a checkbox to search for eBooks, this does not search for accessible titles, as one might expect. Rather, only public-domain books are included in these results. There is a special page to search for encrypted accessible books, but selecting many of the links off this page returns the user to the search of all books in the collection, making it somewhat difficult to separate accessible titles from those with just bibliographic information. In addition, there is little documentation on how to properly copy the DAISY books to a compatible player, or even which players are currently supported. This information would be a welcome addition, especially for new visitors. These minor quibbles are pretty common with websites undergoing growing pains, and we would expect them to be ironed out over time.

The Big Picture

To the employees of the Internet Archive, the Open Library project is not a competitor to other offerings such as Bookshare or the NLS's own program. For example, there are no plans to include human-recorded narration, a feature often desirable for textbooks and titles with a plethora of diagrams and pictures. Even Bookshare, which perhaps provides the closest to this style of service, is different, according to Hornstein. "Bookshare focuses more on books that are in print. We offer books that are not readily available in other places," he comments, referring to the millions of out-of-print books published since 1923, the cutoff for much copyrighted material. And considering that most of the more than 20 million books available at the Library of Congress are still not available in an accessible form, it's a commendable undertaking.

The Internet Archive is a 501(c)(3) nonprofit and accepts donations of books to scan as well as financial contributions. To learn more about the book drive, please visit the Open Library's book drive webpage.

Copyright © 2010 American Foundation for the Blind. All rights reserved. AccessWorld is a trademark of the American Foundation for the Blind.