Dictionopolis and Digitopolis are still deeply separate and often-warring cities, despite various peace offerings over the years. Dictionopolisians in particular scorn the idea that mere numbers could tell them anything important about words.
Ben Blatt gently disagrees. Nabokov's Favorite Word Is Mauve is his argument.
He gathered the full texts of a number of books -- about 1,500, primarily novels, including classics, recent #1 New York Times bestsellers, and recent award winners -- and ran various tests of bits of literary advice and supposed genre markers, to see if the books generally considered better or more commercially successful actually did this or that more consistently.
There are obviously quibbles to be had about his methods: literature is a vast field, and even the fairly large data-sets Blatt assembled for this book are small compared with the ocean of published fiction. (He also runs some tests against fan-fiction, where he has an even larger pool, which makes a good comparison to traditionally published work.) But his tests seem well-planned to me, and I didn't catch him claiming anything the data didn't support -- this is a measured, reasonable book with disclaimers against assuming too much on too-slim evidence.
So Blatt starts off by looking at the common advice to avoid adverbs, expands on that a bit to other kinds of words and constructions new writers are often told to eschew, and explains his methodology in the first chapter. After that, he dives right into the question of whether male and female writers have definable differences: this is the most interesting chapter in the book, and I'd love to see it expanded with further research into larger data-sets. Later chapters investigate the differences between UK and US writers -- besides the obvious giveaway word choices -- how to determine the authorship of an anonymous work, first lines, cliches, and the question of what idiosyncratic words particular authors use more than anyone else.
The book is filled with charts, and has an extensive section of notes at the end (including lists of all of the books used for the tests). Frankly, this book is about as good as I could have hoped it would be, and better than I expected. I thought I'd have my usual "the author has completely neglected to consider X, which makes conclusions G, H, and K very shaky" argument to make, but Blatt was conscientious and organized enough to forestall all of those. (Others may have different objections, but mine were mostly minor -- he sometimes says "New York Times bestseller" when he's only looking at a set of books that hit #1, and all publishing hands know the set of all bestsellers is vastly more diverse than the very top few.) And his lists are all chosen, as far as I could tell, by the closest things to objective criteria he could find, so his own tastes and preferences don't seem to enter into the selection criteria or the data-sets in general.
This is not exactly science. But it's the closest thing to science I've ever seen applied to the world of literature, and has interesting, reasonable results based on defined, transparent data-sets and test cases. That is really impressive, and any reasonably numbers-obsessed book-lover should definitely take a look at this one.