User:ThomasW/Notes What the Web Said Yesterday

From XPUB & Lens-Based wiki

Lepore, Jill (2015) The Cobweb, Can the Internet be archived?, newyorker.com [Online] Available: http://www.newyorker.com/magazine/2015/01/26/cobweb (24.11.2015)

The average life of a Web page is about a hundred days.

BuzzFeed deleted more than four thousand of its staff writers’ early posts, apparently because, as time passed, they looked stupider and stupider. Social media, public records, junk: in the end, everything goes.

Facebook has been around for only a decade; it won’t be around forever. Twitter is a rare case: it has arranged to archive all of its tweets at the Library of Congress. In 2010, after the announcement, Andy Borowitz tweeted, “Library of Congress to acquire entire Twitter archive—will rename itself Museum of Crap.” Not long after that, Borowitz abandoned that Twitter account. You might, one day, be able to find his old tweets at the Library of Congress, but not anytime soon: the Twitter Archive is not yet open for research.

To overwrite, in computing, means to destroy old data by storing new data in their place; overwriting is an artifact of an era when computer storage was very expensive.

But a 2013 survey of law- and policy-related publications found that, at the end of six years, nearly fifty per cent of the URLs cited in those publications no longer worked. According to a 2014 study conducted at Harvard Law School, “more than 70% of the URLs within the Harvard Law Review and other journals, and 50% of the URLs within United States Supreme Court opinions, do not link to the originally cited information.”

As Licklider saw it, books were good at displaying information but bad at storing, organizing, and retrieving it. “We should be prepared to reject the schema of the physical book itself,” he argued, and to reject “the printed page as a long-term storage device.” The goal of the project was to imagine what libraries would be like in the year 2000. Licklider envisioned a library in which computers would replace books and form a “network in which every element of the fund of knowledge is connected to every other element.”

Copyright is the elephant in the archive. One reason the Library of Congress has a very small Web-page collection, compared with the Internet Archive, is that the Library of Congress generally does not collect a Web page without asking, or, at least, giving notice. “The Internet Archive hoovers,” Abbie Grotke, who runs the Library of Congress’s Web-archive team, says. “We can’t hoover, because we have to notify site owners and get permissions.” (There are some exceptions.) The Library of Congress has something like an opt-in policy; the Internet Archive has an opt-out policy.

When the Conservative Party in Britain deleted ten years’ worth of speeches from its Web site, it also added a robots.txt, which meant that, the next time the Wayback Machine tried to crawl the site, all its captures of those speeches went away, too. (Some have since been restored.) In a story that ran in the Guardian, a Labour Party M.P. said, “It will take more than David Cameron pressing delete to make people forget about his broken promises.”
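Note to self: a minimal sketch of how a robots.txt rule shuts a crawler out of part of a site, using Python's standard urllib.robotparser. The rules, the example.org URLs and the helper name is_crawl_allowed are hypothetical; the article does not quote the actual file. The user-agent string "ia_archiver" is an assumption about how the Internet Archive's crawler identified itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the real file's contents are not given in the article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /speeches/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ia_archiver" is an assumed crawler name, used here only for illustration.
    for url in (
        "http://example.org/speeches/2005.html",
        "http://example.org/news/today.html",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, "ia_archiver", url))
```

This only shows the crawl-time check. The removal of already-archived captures described in the article is a policy decision of the archive, not something the robots.txt file does by itself.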

But Britain’s legal-deposit laws mean that the British Library doesn’t have to honor a request to stop collecting.

“It’s extremely audacious,” Illien says. “In Europe, no organization, or very few, would take that risk.” There’s another feature to legal-deposit laws like those in France, a compromise between advocates of archiving and advocates of privacy.

Kahle is a digital utopian attempting to stave off a digital dystopia. He views the Web as a giant library, and doesn’t think it ought to belong to a corporation, or that anyone should have to go through a portal owned by a corporation in order to read it.

It’s hard not to worry that the Wayback Machine will end up like the computer in Douglas Adams’s “Hitchhiker’s Guide to the Galaxy,” which is asked what is the meaning of “life, the universe, and everything,” and, after thinking for millions of years, says, “Forty-two.” If the Internet can be archived, will it ever have anything to tell us?