Sunday, December 17, 2006

If I May Unsuggest Something...

It seems like a lot of people, particularly authors, are wasting time lately with LibraryThing's Unsuggester. It's a fun little tool for amusement, but it also seems to me that there are some wrong lessons one can take away from the Unsuggester, based on misunderstandings and wrong premises. (I've seen a few authors complaining and fulminating about the people who don't read their books based on an Unsuggester list, for example...)

In case you don't know, the Unsuggester is the flip side of LibraryThing's BookSuggester. LibraryThing itself is a website that allows users to catalog their book collections, and compare those collections to those entered by others. BookSuggester takes one book title as input, and trawls the database of all collections to find out which books are most likely to be also owned by people who own that book. The assumption there is that if someone owns both books, that person is likely to have enjoyed both books, and thus such lists will tend to be indicative of what the user will also enjoy. (Of course, anyone who spends the time required to catalog a decent-sized collection of books is at least mildly compulsive, so the enjoyment assumption is not as strong as it first appears.)

Unsuggester is just the opposite -- it checks to see what books are the least likely to be in the same collections as the input title. Because of the name "Unsuggester," everyone seems to be assuming that means that there is some group of people who "hate" these books, and thus it is saying something about "fans" of particular books. This is entirely false; what Unsuggester is saying is that people who own Book X are notably less likely to own these other books than LibraryThing's general statistics suggest they would. Since BookSuggester's correlations are shaky to begin with, trying to do the same thing in reverse (essentially looking for figures in the negative space between datapoints) is close to reading tea leaves to tell the future.
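To make the "less likely than the general statistics suggest" idea concrete, here is a minimal sketch in Python. LibraryThing's actual method isn't something I know, so everything here is an assumption for illustration: the function names, the toy collections, and the choice of a simple ratio (ownership rate among people who own the input book, divided by the ownership rate across everyone). A ratio well below 1 means "owned less often than you'd expect," which is all an "unsuggestion" amounts to.

```python
from collections import Counter

def ownership_lift(collections, target):
    """Return {book: ratio}, where ratio is the share of target-owners who
    own the book divided by the share of all users who own it.
    Hypothetical helper for illustration, not LibraryThing's real method."""
    total = len(collections)
    owners = [c for c in collections if target in c]
    if not owners:
        return {}
    overall = Counter(book for c in collections for book in set(c))
    among_owners = Counter(book for c in owners for book in set(c))
    return {
        book: (among_owners[book] / len(owners)) / (overall[book] / total)
        for book in overall
        if book != target
    }

def unsuggest(collections, target, n=3):
    """Books with the lowest ratio: the ones target-owners own least often
    relative to the overall population. Ties are broken alphabetically."""
    lift = ownership_lift(collections, target)
    return sorted(lift, key=lambda book: (lift[book], book))[:n]

# Toy data, made up for this example.
toy = [{"A", "B"}, {"A", "B"}, {"C", "D"}, {"C", "A"}, {"D"}]
bottom_two = unsuggest(toy, "A", n=2)
```

Notice that nothing in the calculation knows or cares whether anyone liked, hated, or even read any of these books. It only counts who listed what.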

The first big unexamined assumption is that users of LibraryThing form a relatively amorphous, undifferentiated population. (Otherwise, the statistical models would simply not be useful.) I think this is assuming a spherical cow; from the results Unsuggester gives, it's clear that there are several different communities using the LibraryThing system, and that they are using it for different purposes. For example, there is at least one community of crafters using LibraryThing to catalog their books -- but it's not clear if those people also read for pleasure, or if they're also cataloging those pleasure-reading books, should they exist.

And that leads into the second assumption: that all of these books were read for pleasure, and kept because they were enjoyed. There are other reasons for reading, and there are clearly communities on LibraryThing who have collections of books for reference and other not directly pleasurable purposes. (BookSuggester and Unsuggester's "if you like..." paradigm is only plausible if the books in the database are there because people liked them.)

Statistical correlations become more dependable as the number of datapoints behind them grows -- thus, LibraryThing is only as dependable as its users (who are self-selected to begin with -- never a good thing for a random sample -- and who also self-select the books they enter), and is presumably at its most dependable with bestselling books and at its least dependable with little-known works. (So the authors most likely to look for validation here will be the ones that Unsuggester has the least to say to.)
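A rough way to see how much the number of datapoints matters is the textbook margin of error for an observed proportion. The sketch below is only that, a sketch: the function name and the owner counts are made up, it uses the normal approximation (which is itself shaky for small samples), and it assumes a random sample, which a self-selected user base is not. So the real uncertainty is, if anything, worse than these numbers suggest.

```python
import math

def proportion_margin(successes, n, z=1.96):
    """Approximate 95% margin of error for an observed proportion,
    using the normal approximation. Hypothetical helper for illustration."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    return z * math.sqrt(p * (1 - p) / n)

# Same observed rate (10%), very different amounts of data.
little_known = proportion_margin(2, 20)        # 2 of 20 owners also list book Y
bestseller = proportion_margin(1000, 10000)    # 1000 of 10000 owners do
```

With 20 owners, the margin is larger than the 10% rate being measured; with 10,000 owners it is a small fraction of it. The little-known book simply doesn't have enough owners for the comparison to mean much.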

Now, I'm not saying this isn't a fun little tool, but...that's all it is. It's not providing you with valuable insights into the minds of your readers. It's just providing statistical correlations from a biased and unreliable constellation of data. Please obsess about something more useful and directly relevant to your prose (since, lord knows, nobody can stop a writer from obsessing about something).

