Some of today’s children will grow up to be Presidents, artistic luminaries and notorious criminals. A century from now, long after they have completed their noteworthy deeds, historians and biographers will attempt to document their lives and times. And thanks to the shift from written to digital records, those scholars of a future past will face a challenge very different from the job of contemporary academics.
Through Twitter, Facebook and email, a child in 2010 will, over their life, produce a body of writing that dwarfs the collected output of even the most prolific Founding Fathers such as John Adams and Thomas Jefferson. This volume will shift the problems of historical research from the archeological recovery of rare texts and letters to the process of sifting through vast fields of digital information that weave through legal gray areas of corporate and private ownership.
“The problem we are going to face isn’t the loss of literacy, or the end of electricity, but having too much information,” said John Unsworth, dean of the University of Illinois’ Library School. “It’s the abundance problem, not the scarcity problem, that we should be focused on. There’s very little that isn’t recorded [these days]. The big problem we’re going to have is ‘I know it’s in there somewhere, but where is it?’”
Carved In Data
Writing survives through the centuries either by being inscribed in a durable medium such as stone or animal hide, or by proliferating so thoroughly that the odds favor at least one copy of a text persisting through time, Unsworth said. While emails and blog posts are not carved in stone, they spread more easily and in greater numbers than any other medium in human history, all but ensuring their survival for discovery by future historians.
“Digital information's best hope for survival is its remarkable capacity for proliferation. Even a single email message leaves copies and traces of itself on dozens of servers as it makes its way across the Internet from me to you,” said Matthew Kirschenbaum, the associate director of the University of Maryland’s Maryland Institute for Technology in the Humanities.
“Add in the potential for backup copies at each site, and you start to see what I mean. Once information is ‘on’ the Web it's almost impossible to completely expunge.”
Even though YouTube videos and instant messages seem more fleeting than illuminated manuscripts or stone carvings, almost every bit of information passed over the Internet has been saved somewhere, by someone, said Howard Rosenbaum, an associate professor of information science at the School of Library and Information Science at Indiana University Bloomington.
“When Gmail first got started, people didn’t read their end user agreement, and they were shocked to realize even if they left Gmail, Google would still save their emails,” said Rosenbaum. “eBay has saved every transaction that has ever taken place. They save everything.”
In addition to companies and individuals preserving digital information, institutions have also devoted themselves to saving the immense volume of information on the Internet.
The Internet Archive, a nonprofit founded in 1996, has saved almost every version of every publicly accessible webpage posted since its founding, Unsworth said. Similarly, the Library of Congress has teamed up with Twitter to save every Tweet.
With that much material saved in so many places, the problem for future historians shifts from one of looking for rare bits of writing to one of mining huge stores of data.
“We’re going to need strategies to deal with lots of information, and they’re going to be computational,” said Unsworth. “The future historian will need to do some data mining.”
Solving the Abundance Problem
As of last year, the Internet Archive was collecting data at a rate of 3 terabytes a day. For comparison, the entire book holdings of the Library of Congress, the largest library in the world, add up to only about 20 terabytes, according to the Library of Congress.
Twitter claims to process 50 million Tweets a day. At 140 characters per Tweet, that’s a mass of letters, produced every 24 hours, almost 1,400 times the size of the complete works of William Shakespeare.
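The arithmetic behind these comparisons can be checked in a few lines. The sketch below uses only the figures quoted above (3 terabytes a day, 20 terabytes, 50 million Tweets, 140 characters per Tweet). The one number not taken from the article is the size of Shakespeare's complete works, assumed here to be roughly 5 million characters of plain text, which is the figure the 1,400 ratio implies.

```python
# Figures quoted in the article.
ARCHIVE_TB_PER_DAY = 3
LOC_BOOKS_TB = 20
TWEETS_PER_DAY = 50_000_000
MAX_CHARS_PER_TWEET = 140

# Assumption (not from the article): the complete works of Shakespeare
# run to roughly 5 million characters as plain text.
SHAKESPEARE_CHARS = 5_000_000


def days_to_match_library(archive_tb_per_day, library_tb):
    """Days the archive needs to collect as much data as the library holds."""
    return library_tb / archive_tb_per_day


def daily_tweet_chars(tweets_per_day, chars_per_tweet):
    """Upper bound on daily characters: every Tweet at maximum length."""
    return tweets_per_day * chars_per_tweet


def shakespeare_multiples(total_chars, shakespeare_chars):
    """How many copies of Shakespeare's works the character count equals."""
    return total_chars / shakespeare_chars


days = days_to_match_library(ARCHIVE_TB_PER_DAY, LOC_BOOKS_TB)
chars = daily_tweet_chars(TWEETS_PER_DAY, MAX_CHARS_PER_TWEET)
ratio = shakespeare_multiples(chars, SHAKESPEARE_CHARS)

print(f"Archive matches the library's book holdings every {days:.1f} days")
print(f"Tweet characters per day (upper bound): {chars:,}")
print(f"Equivalent copies of Shakespeare per day: {ratio:,.0f}")
```

Under that assumption, the Internet Archive takes in the equivalent of the Library of Congress's book holdings about once a week. Because not every Tweet uses all 140 characters, the Tweet figure is an upper bound.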
To find the material they want, future historians studying the present will need to develop computer programs that can identify information relevant to their particular interests amid the noise of a nearly limitless mass of data.
Programs that can separate pertinent text from useless text already exist, such as the software credit card companies use to monitor accounts for suspicious behavior, Kirschenbaum said. Incidentally, some historians have already begun using this technology.
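As a rough illustration of what separating pertinent text from noise can mean, the toy filter below scores each message by how many of a researcher's search terms it contains and keeps those above a threshold. It is a minimal sketch, not the fraud-detection software Kirschenbaum mentions; the function names, sample messages and threshold are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a message and split it into a set of distinct words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def relevance(message, query_terms):
    """Fraction of the query terms that appear in the message."""
    terms = {t.lower() for t in query_terms}
    if not terms:
        return 0.0
    return len(tokenize(message) & terms) / len(terms)


def filter_relevant(messages, query_terms, threshold=0.5):
    """Keep messages whose score meets the threshold, highest score first."""
    scored = [(relevance(m, query_terms), m) for m in messages]
    kept = [(s, m) for s, m in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in kept]


# Hypothetical sample data, invented for this example.
messages = [
    "Lunch was great today",
    "Drafting my campaign speech for the senate race",
    "The senate vote is tomorrow",
    "New cat video!!",
]

hits = filter_relevant(messages, ["senate", "campaign"])
for message in hits:
    print(message)
```

A real system would have to go well beyond matching words, but the shape is the same: score everything, then discard what falls below a cutoff.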
And if historians can learn to wrestle with those large data sets, a whole new field of history could emerge, Rosenbaum said. With such a large number of literate people producing such a large quantity of writing, historians could construct social histories, as opposed to narratives focused on great men, like never before.
“Rather than concentrating on an individual, this database will allow historians to make a profile of an entire population over time,” Rosenbaum said.
However, before historians can apply any searching programs to collected emails and blog posts of future biography subjects, they must first obtain those emails. Since that data belongs to the companies controlling the email or social networking program, historians might find it difficult to gain access to the data.
“The biggest challenge to researchers of the future is not finally going to be technological in my opinion, but legal and social,” Kirschenbaum said.
Corporations vs. History
In the past, personal communications like letters belonged solely to the people sending and receiving them. Notable civic figures often donated their papers to universities or museums, while the surviving family members of other famous people granted historians access to their relatives’ correspondence.
But in our digital age, emails and text messages belong as much to the company that owns the communications as they do to the correspondents, Kirschenbaum said.
“Every different online service has its own Terms of Service, and these can make it difficult, almost impossible, for persons other than the individual who created the account to gain access,” Kirschenbaum said. “We've seen this, for example, with servicemen and -women killed overseas, when the family and next of kin tries to get access to their email accounts. It’s not always been possible, and some cases have gone to court. Given that, you can imagine the kind of hurdles scholars and archivists will face.”
This problem is only getting worse. Unlike the early days of the Internet, when people created autonomous, individual web pages for themselves, more and more personal information is ending up on platforms owned by intermediary companies such as Facebook or MySpace.
Not only does that information become the possession of those companies, but the password wall that prevents people from viewing Facebook pages also prevents archival organizations like the Internet Archive from recording the pages, Unsworth said.
“Corporations are legally considered to be persons in U.S. law, and have the same rights, including privacy rights,” Unsworth said. “It’s tremendously difficult to get at that stuff, and it rarely lasts long enough to pass out of privacy restrictions. If Jesus had a really good lawyer, we never would have heard of him.”
To get around this problem, people can leave explicit written instructions authorizing the release of their emails upon their death. Or, better still for historians, they can download all their emails onto a hard drive, at which point the emails are no longer under corporate restrictions, Unsworth said.
But even if a large portion of the future’s historical documents remains locked in a corporate vault, it won’t be a new problem for historians. From lost languages to missing texts, compiling a narrative from partial information has been part of writing history since the ancient Greek historian Herodotus, and will remain so well into the future.
“It’s true that these are problems, but it’s worth remembering that they’re not new problems,” Unsworth said.
“The cultural record is always partial. The reason it is missing stuff might change, but it will always be missing stuff.”