Towards a practical digital library

I started collecting digital content in the 1980s. Before that, I was satisfied to print things out, bind them, and put them on a shelf. My graduate research produced about three linear feet of printed papers, sorted by author. I wrote my first book mostly from printed references, though I did all the writing online. When I started my second book, Authentication, I decided to collect, catalog, and save my references digitally. I stored everything in a tree of folders, one per author, arranged alphabetically.

My library now contains several thousand items, from Gutenberg ebooks to marketing brochures to technical papers. It uses over 8 GB of storage, including catalogs and metadata. I used to read classic fiction on Palm Pilots and early smartphones. Now I read everything from fiction to technical reports on a tablet, either Android or iOS. This environment poses a whole set of challenges. I’ve found some tools to make my library work, more or less: Calibre, OPDS, and DRM-free books.

My main objective is “Get it Once, Organize it Once, and Read it Anywhere.”

Here are the challenges:

Can’t read all books on all devices – If I get a book from iTunes, I can’t read it on my Android. Some books are available everywhere, while others are only available from specific vendors. I can cope if a book is DRM-free. Kindle reader apps are available on almost every platform I’m likely to use, but they aren’t especially sophisticated. If I’m using an ebook edition instead of a paper book, the figures and tables must be readable, and Kindle publishers often overlook this detail. I’ve also found that iBooks does a better layout and formatting job than Kindle.
Books are hard to find in a large library – Ebook readers seem to assume no one will keep more than a few dozen books inside their program. When I dropped a thousand or so books on my iPad, iBooks became unusable. I had to scroll through long lists of books, especially to reach authors or titles near the end of the alphabet. Imagine looking for a scene near the end of a video without a “jump to chapter” feature. Some ebook readers, including iBooks, try to improve this with search functions, and the iBooks search has definitely improved over the years.
Library organization as an afterthought – Most ebook readers support a notion of “collections” as a way to sort books into categories. The arrangement of collections only applies to the local copy inside the app. There is no way to share the books, along with their arrangement into collections, between one of your devices and another. If you store the same books on several devices, you must organize the collections from scratch on each device.
Storage “in the cloud” is an unkept promise – I can store all of my Kindle-bought books with Amazon and all of my iTunes-bought books with Apple. The thousands of books and papers collected from elsewhere are just out of luck. I can store everything on a commercial service like Dropbox, but the library ends up as one giant list again.

If Only…

Here are a few “almost” solutions that I sometimes rely on for now, at least with iOS:

GoodReader – this is the Swiss Army Knife of iOS apps. It allows you to slurp a hierarchy of files into your i-device, and it has built-in readers for lots of formats: PDF, text, MS office formats, images, video, … almost everything you might want, except for ebooks. The developers rely heavily on built-in iOS features, so the app probably won’t ever appear on Android or other tablets.
Kindle – while its ebook formatting falls short at times, and it won’t serve up books I acquired elsewhere, it is at least available on all the platforms I’m likely to read on. If I have no choice but to use a current, DRM-protected book, Kindle is the least painful alternative.

The Android world provides a different set of apps that solve things a little differently. I’ll discuss those another time.

My Better Solution

I construct a cloud-based library that uses the Open Publication Distribution System (OPDS). This is the same catalog format used by Project Gutenberg, the Internet Archive, and O’Reilly Books (the latter two were part of the team that developed the OPDS standard). Many ebook readers (except Kindle, iBooks, and others from vendors with captive ecosystems) support OPDS as a way to find and download books online.
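
OPDS 1.x catalogs are Atom XML feeds: each book is an `<entry>`, and a link with `rel="http://opds-spec.org/acquisition"` tells the reader app where to download the file. The Python sketch below is a rough illustration of that shape, not what Calibre2OPDS actually emits; the function names, the book dictionary keys, and the EPUB-only media type are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ACQUISITION_REL = "http://opds-spec.org/acquisition"
ACQUISITION_FEED = "application/atom+xml;profile=opds-catalog;kind=acquisition"


def _add(parent, tag, text=None, **attrs):
    """Append a child element with optional text and attributes."""
    el = ET.SubElement(parent, tag, attrs)
    if text is not None:
        el.text = text
    return el


def build_acquisition_feed(feed_id, title, self_href, books):
    """Return a minimal OPDS acquisition feed as an XML string.

    `books` is a list of dicts with the hypothetical keys
    "id", "title", "author", and "href" (path to an EPUB file).
    """
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # Plain tag names plus an xmlns attribute put every element in the Atom namespace.
    feed = ET.Element("feed", xmlns=ATOM_NS)
    _add(feed, "id", feed_id)
    _add(feed, "title", title)
    _add(feed, "updated", now)
    _add(feed, "link", rel="self", href=self_href, type=ACQUISITION_FEED)
    for book in books:
        entry = _add(feed, "entry")
        _add(entry, "id", book["id"])
        _add(entry, "title", book["title"])
        _add(entry, "updated", now)
        _add(_add(entry, "author"), "name", book["author"])
        # The acquisition link is what lets an ebook app download the book.
        _add(entry, "link", rel=ACQUISITION_REL, href=book["href"],
             type="application/epub+zip")
    return ET.tostring(feed, encoding="unicode")
```

A real catalog adds navigation feeds (by author, title, and so on), cover images, and richer metadata, but an OPDS-capable reader needs little more than this to list a book and fetch it.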

I use Calibre to collect the documents and their associated metadata (author, title, publication date, etc.) into a database. I use Calibre2OPDS to build the OPDS catalog. I upload the library and catalog to password-protected space on a shared web server. I can search the catalog using a web browser, or I can search it from an ebook reader using OPDS.
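
Calibre keeps its database as a SQLite file named `metadata.db` at the top of the library folder, and that is the kind of data a catalog generator works from. The query below is a read-only sketch that assumes the `books`, `authors`, and `books_authors_link` tables found in recent Calibre versions. The schema is internal to Calibre and could change, so Calibre’s own `calibredb list` command is the supported route for scripting; the function names here are hypothetical.

```python
import sqlite3
from pathlib import Path

# Assumes Calibre's schema: books(id, title, sort), authors(id, name),
# and books_authors_link(book, author) joining the two.
TITLE_AUTHOR_QUERY = """
    SELECT books.title, GROUP_CONCAT(authors.name, ' & ')
    FROM books
    JOIN books_authors_link ON books_authors_link.book = books.id
    JOIN authors ON authors.id = books_authors_link.author
    GROUP BY books.id
    ORDER BY books.sort
"""


def open_readonly(db_path):
    """Open a SQLite file read-only via a URI so nothing can be written to it."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_titles_and_authors(conn):
    """Return (title, authors) tuples in Calibre's title sort order."""
    return conn.execute(TITLE_AUTHOR_QUERY).fetchall()
```

For example, `list_titles_and_authors(open_readonly("Calibre Library/metadata.db"))` would return one tuple per book, with multiple authors joined by “ & ”. Opening the file read-only matters because Calibre itself may have the database open.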

I keep the master copy of the library on my desktop computer. After I add books to the library, I rebuild the OPDS catalog. Then I run synchronization software to move new and updated files to the web server.
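
The synchronization step boils down to working out which files are new, changed, or deleted since the last upload. Tools such as rsync handle this comparison themselves; the sketch below only illustrates the idea, assuming a manifest of SHA-256 digests (a plain dict of relative path to digest) saved after each successful upload. The function names and manifest format are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large PDFs don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_upload(library_root, manifest):
    """Compare the local library with the manifest from the last upload.

    Returns (to_upload, to_delete, new_manifest): relative paths that are new
    or changed, relative paths that vanished locally, and the digests to save
    once the upload succeeds.
    """
    root = Path(library_root)
    current = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in root.rglob("*")
        if p.is_file()
    }
    to_upload = sorted(rel for rel, d in current.items() if manifest.get(rel) != d)
    to_delete = sorted(set(manifest) - set(current))
    return to_upload, to_delete, current
```

Saving the new manifest (for example with `json.dump`) only after the upload finishes means a failed transfer is simply retried next time. The manifest file should live outside the library folder, or it will show up as a changed file on every run.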

What Works

All of my books are available anywhere I have Internet access. My individual devices have enough storage that I can download lots of books ahead of time, for stretches when I’m offline.
The library is organized by “most recently added” as well as by title, author, and keyword. I can look up a book under any of those listings.
The cataloging software breaks up the listings into reasonable pieces, so I don’t have to scroll through absurdly long lists.
I can host copyrighted material because the password protection keeps others from accessing it (more or less).
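
Breaking a long alphabetical listing into pieces can be done by bucketing entries under their first letter and splitting any oversized bucket into numbered pages. This is a sketch of the general idea, not Calibre2OPDS’s actual splitting rules; the default page size of 50, the “#” bucket for titles that don’t start with a letter, and the “T-2” style labels are assumptions.

```python
from collections import defaultdict


def paginate_by_letter(titles, page_size=50):
    """Group titles by initial letter, then split oversized groups into pages.

    Returns a dict mapping a page label ("E", or "T-1", "T-2", ... when a
    letter has more than page_size titles) to that page's sorted titles.
    """
    groups = defaultdict(list)
    for title in sorted(titles, key=str.casefold):
        first = title[:1].upper()
        # Digits, punctuation, and empty titles share a single "#" bucket.
        groups[first if first.isalpha() else "#"].append(title)

    pages = {}
    for letter in sorted(groups):
        items = groups[letter]
        if len(items) <= page_size:
            pages[letter] = items
        else:
            for start in range(0, len(items), page_size):
                pages[f"{letter}-{start // page_size + 1}"] = items[start:start + page_size]
    return pages
```

Each page then becomes its own small feed, linked from a navigation feed of letters. A “most recently added” listing is simpler still: sort by the date each book entered the library, newest first, and keep the first page.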

What Doesn’t Work

Captive cloud-based readers (Kindle, iBooks) will keep track of the pages I’ve read, notes I’ve taken, and bookmarks I’ve added on any supported device, and share that with all my other devices. The OPDS catalog is read-only and can’t collect or share such information. A few third-party ebook apps use Dropbox or a similar service to share such information, but the techniques are mostly vendor-specific.
There is no “search box” searching. The catalog is served as static web content, and there is no server-side search mechanism. I have to browse the alphabetical listings by author, title, or keyword.
I can’t host really sensitive information. The OPDS feed isn’t encrypted, and is vulnerable to sniffing.
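
The post doesn’t say which password scheme protects the server space. If it is HTTP Basic authentication, as is common for password-protected directories on shared web hosts, the exposure goes beyond the feed itself: every request carries the user name and password merely base64-encoded (RFC 7617), so anyone sniffing plain HTTP traffic can recover them. The sketch below shows both directions; the function names and credentials are illustrative only.

```python
import base64


def basic_auth_header(user, password):
    """Build the Authorization header value a client sends for HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def recover_credentials(header_value):
    """What an eavesdropper can do with a sniffed header.

    Base64 is an encoding, not encryption, so no key is needed to reverse it.
    """
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic auth header")
    # The user name can't contain ':', so split at the first colon.
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password
```

Serving the library over HTTPS, where the host supports it, would encrypt both the credentials and the feed in transit.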