Bookmarking

May 2012

Long write-up of an old project I worked on for a while

In February, I started work on Recall, a web application that I hope will be a significant improvement on the way people use and save bookmarks. Today, Recall entered closed beta.

Most people who have used bookmarks have used them as provided by their web browser. Normally there is a menu item (and often a key-binding) to save a bookmark and bookmarks can be organised into folders and (perhaps) sub-folders by topic. Sometimes there is a bookmark organiser which provides for limited editing.

A few people use social bookmarking services, which tend to be web applications that allow you to share bookmarks with the public. These often have hooks into web browsers and provide basic tagging and sharing facilities. Sometimes they include some kind of gimmick as well; for example, putting virtual sticky notes onto web pages.

Browsers and social bookmarking services do a reasonable job of saving bookmarks, but they do very little to help users examine, categorise, search and otherwise work with their bookmarks.

If the user has bookmarked a few web pages about a novelist, they might be interested in what books the novelist has written and when. They might be interested to see reviews of that novelist’s works. Probably, they would be interested in seeing if there are any novelists that are considered similar (if someone’s discovering Jack Kerouac for the first time, wouldn’t it be useful if they were shown William S. Burroughs?). In order to draw these kinds of useful inferences, you need to look at many people’s bookmarks in the aggregate. Web browsers can’t do this, so the significance of bookmarking “On the Road” is ignored. Social bookmarking services have the potential to do this, but squander the opportunity with poorly considered interfaces and low-quality browsing.
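To make the idea of looking at bookmarks "in the aggregate" concrete, here is a minimal sketch in Python. It is not how Recall works; it simply counts which items turn up alongside a given item in other people's bookmark sets. The function name, the user names and the data are all made up for illustration.

```python
from collections import Counter


def similar_items(user_bookmarks, target, top_n=3):
    """Rank items that co-occur with `target` across users' bookmark sets.

    `user_bookmarks` maps a (hypothetical) user name to the set of things
    that user has bookmarked.
    """
    counts = Counter()
    for items in user_bookmarks.values():
        if target in items:
            # Every other item this user saved counts as one vote.
            counts.update(item for item in items if item != target)
    return [item for item, _ in counts.most_common(top_n)]


# Made-up data, purely for illustration.
bookmarks = {
    "alice": {"Jack Kerouac", "William S. Burroughs", "Allen Ginsberg"},
    "bob": {"Jack Kerouac", "William S. Burroughs"},
    "carol": {"Jack Kerouac", "Jane Austen"},
    "dave": {"Jane Austen", "Charlotte Bronte"},
}

print(similar_items(bookmarks, "Jack Kerouac", top_n=1))
```

A single browser only ever sees one of these sets, which is why it cannot draw this kind of inference on its own.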

How could you bookmark “On the Road”, anyway? You might bookmark the Amazon web page for the book. Perhaps you would bookmark the Wikipedia article. These are both good information sources, but your web browser won’t make anything of them. To the browser, all bookmarks are the same: a link and a title. This is short-sighted, because a book and a web page don’t have a lot in common, and they are significantly different objects to the user. A book has individual properties, like a title, an author, an ISBN, and so on. But there’s more to it than that, because a book has social significance, such as membership of a literary movement. Obviously, a computer program can’t understand what it means for a book to be a part of a literary movement, but it can know facts about what a book is considered to be. A good bookmark manager should have strong conceptions about the kinds of things that users are really trying to note, such as books or places. Treating every bookmark as though it were just a web page means picking up the semantic information that the user is trying to impart and then discarding it.
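As a rough sketch of what "a strong conception of the thing being bookmarked" might look like in code, the Python below separates a plain web page bookmark from a book bookmark. The class and field names are my own assumptions for illustration, not Recall's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bookmark:
    """What a browser stores: a link and a title."""
    url: str
    title: str


@dataclass
class BookBookmark(Bookmark):
    """A bookmark that knows it refers to a book (hypothetical model)."""
    author: str = ""
    isbn: Optional[str] = None  # left empty here rather than guessed
    movements: List[str] = field(default_factory=list)


def describe(bookmark):
    """Show that typed bookmarks can be treated differently."""
    if isinstance(bookmark, BookBookmark):
        return f"book: {bookmark.title} by {bookmark.author}"
    return f"web page: {bookmark.title}"


on_the_road = BookBookmark(
    url="https://en.wikipedia.org/wiki/On_the_Road",
    title="On the Road",
    author="Jack Kerouac",
    movements=["Beat Generation"],
)

print(describe(on_the_road))
print(describe(Bookmark(url="https://example.com", title="Example")))
```

The point is only that the extra fields exist at all; once they do, the program has something to reason about beyond a link and a title.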

Web browsers make it hard to share your bookmarks with people. There is no way for you to easily publish your bookmarks for everyone to see, as you might want to do if you are leading a university seminar group or creating a reading list for your company. There’s also no way to share something as a one-off, as you might do if you wanted to put all your bookmarks with advice for buying a suit into an email to a friend with his first job interview coming up. Being able to easily share bookmarks on either a general or a personal level is an important feature.

Most people these days work from a number of different computers. You might have one for your work, one for your home and perhaps a smartphone too. When your bookmarks are stored in a web browser, the hassle of moving them around is so serious that most people just don’t bother. Social bookmarking services synchronise your bookmarks by keeping them in a central database so that they can be accessed from multiple locations. This seems like a fair solution, but a centralised database is a large single point of failure, for either technical or political reasons. Magnolia was a social bookmarking service in the late 2000s that suffered catastrophic data loss, taking all users’ bookmarks with it. In 2010, an internal Yahoo presentation announcing that Delicious (a social bookmarking service owned by Yahoo) would be shut down in the near future caused a chaotic exodus as users scrambled to save their bookmarks (in the end, they were safe). The proper solution to this problem is to take the approach that some email clients take: keep a copy of all data locally and periodically reconcile that data with a central repository. With this approach, you have local copies of your data and the ability to access it anywhere. A few web browsers take this approach, but their bookmarks can’t be made available to other web browsers, which is a problem if, for example, you use your favourite web browser at home but are forced to use another at work.
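The "keep a local copy and periodically reconcile" approach can be sketched in a few lines of Python. This is a deliberately naive version that assumes each bookmark carries a modification timestamp and that the newest copy wins; it does not handle deletions or conflicting edits, and the store layout is an assumption of mine rather than anything Recall does.

```python
def reconcile(local, remote):
    """Merge two bookmark stores, keeping the newest copy of each bookmark.

    Both stores map url -> (modified_at, data). Newest-wins is an assumed
    policy chosen for simplicity; deletions are not handled.
    """
    merged = dict(remote)
    for url, (modified_at, data) in local.items():
        if url not in merged or modified_at > merged[url][0]:
            merged[url] = (modified_at, data)
    return merged


# Made-up stores: "a" was edited locally while offline, "c" was added elsewhere.
local = {
    "https://example.com/a": (2, {"title": "A (edited offline)"}),
    "https://example.com/b": (1, {"title": "B"}),
}
remote = {
    "https://example.com/a": (1, {"title": "A"}),
    "https://example.com/c": (3, {"title": "C"}),
}

merged = reconcile(local, remote)
# After a sync, both sides would adopt the merged store.
local, remote = dict(merged), dict(merged)
print(sorted(merged))
```

Because the local store is complete in itself, losing the central repository costs you the synchronisation but not the bookmarks.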

One of the times when you most want to have reading material is when the internet isn’t available. Lots of people have a tube commute, and this would be an ideal time to read through bookmarks. Sadly, there is no phone signal underground (there isn’t even any air conditioning, or hope). One of the advantages of having a local cache of the contents of bookmarks would be the ability to read and tag them when offline. Web browsers don’t store the contents of bookmarks, and no social bookmarking service I know of works offline, but given the above discussion about keeping a local repository there is nothing preventing such a local cache.

Web browsers tend to allow categorisation by hierarchical folders. This works great for a while, until you have things that don’t fit neatly into your hierarchy, or you have things that fit so well into the hierarchy that they could legitimately go in two places. Hierarchical classification is strongly influenced by library classification where the aim is to place a book so that it can be easily found, is near other books of a similar subject and perhaps has shelf adjacency with other books by the same author. This is based around the constraint that books cannot be in two places at once: a book on Computational Biology can’t be both under Biology and under Computer Science. Obviously, this isn’t true of data — a bookmark doesn’t occupy physical space and this is part of the reason that social bookmarking services opt for tagging instead.

Tagging is a fairly popular “Web 2.0” concept. Gmail uses tags for email, for example. Using tagging, you create your own tags, and apply as many tags to something as you like. Tagging does away with the controlled vocabulary of hierarchical classification. This doesn’t matter so long as you don’t have to collaborate with anyone. Your email inbox is something just for you so tagging makes a lot of sense, but because bookmarks are so often shared, this can cause difficulty. Someone might tag their university reading “articles”, and this will make a lot of sense, because that’s what it is. However, a journalist comes along and tags some things with “articles” because they’ll help him write articles in the future. Afterwards, someone adding lots of books adds one about journalistic writing, and tags it with “articles”, because that’s what it’s about. Three people, all adding tags rationally, and each baffled to see the others’ use of tags in the public feed. Even worse, this kind of mix-up confuses the algorithm for bookmark recommendation, which starts giving irrelevant suggestions. A controlled vocabulary would solve this problem, but the real answer is to apply tags to three important facets of a bookmark: what it is, what it’s about and what it’s for.
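Here is a small Python sketch of the three facets, using the "articles" example. The facet names (`is_a`, `about`, `used_for`) and the bookmark titles are invented for illustration; the only thing taken from the discussion above is the split into what a bookmark is, what it's about and what it's for.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FacetedTags:
    """Tags split by facet (field names are hypothetical)."""
    is_a: Set[str] = field(default_factory=set)      # what it is
    about: Set[str] = field(default_factory=set)     # what it's about
    used_for: Set[str] = field(default_factory=set)  # what it's for


def find(bookmarks, facet, tag):
    """Return titles of bookmarks carrying `tag` in the given facet."""
    return [title for title, tags in bookmarks.items()
            if tag in getattr(tags, facet)]


# The same word, "articles", used in three different senses.
bookmarks = {
    "Seminar reading, week 3": FacetedTags(is_a={"articles"}),
    "Background on a local election": FacetedTags(used_for={"articles"}),
    "A book on journalistic writing": FacetedTags(is_a={"book"}, about={"articles"}),
}

print(find(bookmarks, "is_a", "articles"))
print(find(bookmarks, "about", "articles"))
print(find(bookmarks, "used_for", "articles"))
```

Each query now returns only the bookmarks that use "articles" in that sense, so the three people no longer trip over each other in a shared feed.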

In some ways browsers make life hard for themselves by creating unhelpful user interfaces to their bookmarks and not including features such as tags, but in other ways they simply could not be a good way to share bookmarks.

There are three classes of problems that web browsers have. Some of them are even shared by the social bookmarking services. The first class is problems that could be fixed with a bit of work and a few changes. These are problems of design: issues like having a menu-based user interface or not working offline.

The second class is problems of conception: issues that could be solved only with quite a lot of changes and a re-architecture. Having really good classification facilities and having a better conception of what a bookmark can represent are included. These are things that web browsers could fix, but doing so would be a huge change.

The final class is problems that web browsers just couldn’t fix, even if they suddenly had the motivation. Web browsers can’t share bookmarks publicly because they aren’t in a position to serve resources over the internet. They can’t make your data available to you wherever you are. Most fundamentally, they can’t use the data gained from analysis of the bookmarks of many users in the aggregate because they don’t store the bookmarks of many users.

I’ve been working on Recall in my spare time for two months now. I want to solve these problems. I’m not there yet (in fact — I’m not even close!), but I do have a start. If you’ve read this far you might want to apply for a place in the closed beta.