By Sharon Begley
(Reuters) - In the escalating battle of big data vs. human experts, score another win for numbers.
The most accurate
predictions of which movies the U.S. Library of Congress will deem
"culturally, historically, or aesthetically significant" are not the
views of critics or fans but a simple algorithm applied to a database,
according to a study published on Monday.
The crucial data, scientists reported in Proceedings of the National
Academy of Sciences, are what the Internet Movie Database (IMDb.com)
calls "Connections" - films, television episodes and other works that
allude to an earlier movie.
Of the 15,425 films in IMDb.com examined in the study, the measure that was
most predictive of which made it into the Library of Congress's National
Film Registry, which honors "significant" movies, was the number of
references to it by other films released many years later.
The 1972 classic "The Godfather," for instance, is referred to by 1,323
films and television episodes, which as recently as 2014 quoted the
"offer he can't refuse" line, referred to the famous horse-head scene,
or played the theme music. "Godfather" made the registry
The number of
references to a film more than 25 years after its release was a nearly
infallible predictor of whether it would make the registry, topping 91
percent accuracy, said applied mathematician and study author Max
Wasserman of Northwestern University.
Critics' judgments, Oscar wins, and box-office numbers did not come close.
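The counting rule described above can be illustrated with a minimal sketch. It is not the researchers' code: the function name, the data layout, and the toy films below are assumptions made for illustration, and only the 25-year lag comes from the study as reported.

```python
from collections import Counter

LAG_YEARS = 25  # lag threshold reported in the article


def long_lag_citations(release_year, references):
    """Count references made more than LAG_YEARS after a film's release.

    release_year: dict mapping film title -> release year
    references:   iterable of (citing_year, cited_title) pairs
    Both structures are hypothetical; IMDb's actual data format may differ.
    """
    counts = Counter()
    for citing_year, cited_title in references:
        year = release_year.get(cited_title)
        # Skip unknown titles and references that fall within the lag window.
        if year is not None and citing_year - year > LAG_YEARS:
            counts[cited_title] += 1
    return counts


# Toy data invented for illustration only (not from IMDb or the study).
release_year = {"Film A": 1950, "Film B": 1980}
references = [
    (1960, "Film A"),  # 10 years after release: too early to count
    (1990, "Film A"),  # 40 years after release: counts
    (2010, "Film A"),  # 60 years after release: counts
    (1990, "Film B"),  # 10 years after release: too early to count
    (2010, "Film B"),  # 30 years after release: counts
]

counts = long_lag_citations(release_year, references)
ranking = counts.most_common()  # films ordered by long-lag citation count
```

Ranking films by this count gives the kind of ordering the article describes, in which a film's position reflects how often much later works refer to it.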
Films are nominated for the registry by the public and chosen by the
Librarian of Congress in consultation with a board of experts including
critics, academics, directors, screenwriters and other industry
By the 25-year-lag rule, the 1971 box-office disappointment "Willy Wonka &
the Chocolate Factory" should be in the registry: IMDb lists 52 long-lag
citations to it, the 37th most in the Northwestern analysis.
In December, six months after the scientists submitted their paper, the
Library added "Willy Wonka" to the list of 650 cinematic immortals, just
as the research predicted.
"Experts have biases that can affect how they evaluate things," said physicist
and co-author Luis A.N. Amaral of Northwestern. "Automated, objective
methods don't suffer from that. It may hurt our pride, but they can
perform as well as or better than experts."
Other movies identified by the Northwestern algorithm as likely to make the
registry include "Dumbo," "Spartacus" and "The Shining."
Of course, humans are not entirely superfluous: flesh-and-blood creators
must decide to refer to an earlier gem in order to establish the crucial
(Reporting by Sharon Begley; Editing by Nick Zieminski)