#Amazonfail, the Google Books Settlement, and the importance of open access for preserving cultural heritage: In honor of National Library Week

Over the past two years for National Library Week, I have posted about the importance of openness of publication and accessibility of government information and the limitations of relying on Google. Free Government Information, Public.Resource.org, OpentheGovernment (PDF), and others are continuing to do a great job of promoting openness with regard to government (and scholarly) information. Unfortunately, most people are not aware of the great usefulness and importance of government information. But they do know about Amazon, Google, and YouTube, and many of us use them every day. How would most people find information if these services stopped working?

The #Amazonfail censorship / glitch / griefing situation last weekend shows the power of publics working together and the organic nature of much of tagging and movementsourcing; people will often be able to create a simple way of communicating information with each other (the first person to use the #Amazonfail tag on Twitter used it because it worked as a folksonomy of the situation, and it spiralled from there because it was effective). But it also shows the difficulty for all when most rely on one source, Amazon, for information about bestsellers and similar items.

Siva Vaidhyanathan says that #Amazonfail is about more than just crowdsourcing and user tagging; it is about “metadata, cataloging, books, Web commerce, and justice.” A commenter quoted in the New York Times states that “We have to now keep a more diligent eye on Amazon and how they handle the world’s cultural heritage.”

Have we really placed Amazon (and similar companies) in charge of our cultural heritage? Perhaps not directly, but many people have high expectations for these companies’ ability to make information accessible, even if this does not take into account most of the aspects of information literacy.

But libraries differ from these for-profit companies in how they organize information and why they exist. Most libraries are nonprofit, and their goal is to serve some type of public (what librarians call a patron group). Libraries are generally built on similar organizational systems, such as Library of Congress or Dewey classification, and they are intentionally duplicative in their collections. Not only do libraries often have the same item in their collections, but through interlibrary loan, libraries are tied together in a larger network. And unlike with Amazon and Google, even if a library’s online catalog wasn’t working, a user could still use the organizational system to find useful information.

But another major difference is that libraries, and even Twitter, directly rely on people for the system to work, not an algorithm, as with Amazon and Google. As we’ve seen with Googlebombing and likely with #Amazonfail, it is possible for an algorithm to be fooled or to provide inaccurate information.

We rely on Google quite openly, even though sometimes the information is not right. For example, as of when this post is posted, the top result when googling “four stages of tornadoes” gives the blunt answer of “u suck balls” from wiki.answers. This can’t possibly be anywhere close to the correct answer to this scientific question, but it is the one Google’s algorithm is choosing!

In my previous posts, I mentioned how what Google has promised from Google Books isn’t what is actually available in many cases. However, some are expecting this settlement between two private/non-public entities to somehow also be a settlement that protects the interests of the public, though there are many that disagree, including Siva Vaidhyanathan, some vehemently. There is a group of professors attempting to intervene in the Google settlement on behalf of the public:

“The proposed settlement will make Google the only company in the world with a license to use orphaned works.  No other company will be able to buy a similar license because, outside the context of the proposed class-action settlement in this case, there is no one from whom to buy such a license….The settling parties plot a cartel in orphaned works.

…  Because exclusive rights in orphaned works do not serve the ultimate purpose of copyright, the public domain has a claim to free, fair use of orphaned works.

We have the right to intervene to present the public domain’s claim to free, fair use of orphaned works.  None of the present parties will present our claim….”

And what about YouTube? While there is much government information on YouTube, what happens if the company goes out of business? Free Government Information asks:

[Are] agencies that rely on YouTube as a channel of communication keeping copies of the videos they post there? Would they make them available through another channel? What if … libraries had copies?

Relying on private companies (like Google, like YouTube, like West) to give us access to government information leaves us without options if these access points disappear.

Access to government-funded scientific information is presently under challenge from H.R. 801, the Fair Copyright in Research Works Act, introduced by Rep. John Conyers. If enacted, the bill would reverse the National Institutes of Health (NIH) Public Access Policy regarding public access to taxpayer-funded research and make it impossible for other federal agencies to put similar policies into place. Publicly funded medical research is the metadata of our lives: we don’t see it, but it affects our health and how we live our lives.

Many oppose this bill, including Harvard University, which has written a letter opposing this legislation:

The NIH public access policy has meant that all Americans have access to the important biomedical research results that they have funded through NIH grants. Some 3,000 articles in the life sciences are added to this invaluable public resource each month because of the NIH policy, and one million visitors a month use the site to take advantage of these research papers. The policy respects copyright law and the valuable work of scholarly publishers.

[Instead of passing this bill], Congress should broaden the mandate to other agencies, by passing the Federal Research Public Access Act first introduced in 2006. Doing so would increase transparency of government and of the research that it funds, and provide the widest availability of research results to the citizens who funded it.

Google, Amazon, and the publishing industry offer highly valuable and useful tools and services, but we should not allow closed proprietary systems to determine how we address information that belongs entirely or in part to the public, like the public domain, government publications, and publicly funded studies. And even when “public” information is not at issue, we need to become more wary of relying solely on these systems.

Multiple systems, locations, and means of access are essential to preserve our cultural heritage. As Free Government Information explains with regard to government information, in terms applicable to so much more:

… no single digital archive or repository can ever be as secure and safe as multiple archives, libraries, and repositories. … The nature of digital information is that it can easily be corrupted, altered, lost, or destroyed. It can become unreadable or unusable without constant attention. Relying on any single entity is simply not as safe as relying on multiple organizations. … But this is about more than redundant copies. It is also about relying on different organizations because they have different funding sources, different constituencies, different technologies, and different collections. No single digital collection can ever be as safe as multiple, reliable digital collections.


Our first nerd president?

Arguably, Obama is the first nerd president. (Considering that Thomas Jefferson’s books were the basis of the Library of Congress, my vote is for second nerd president).

Obama, who collected Spider-Man comics as a kid, has now appeared in a sold-out Spider-Man comic.
And while Obama has more pressing problems to fix — like the economy — there are “nerd” issues that should be considered, such as intellectual property policy.

The Obama campaign, in Technology and Innovation for a New Generation, stated:

Intellectual property is to the digital age what physical goods were to the industrial age. Barack Obama believes we need to update and reform our copyright and patent systems to promote civic discourse, innovation and investment while ensuring that intellectual property owners are fairly treated.

Public.Resource.org has five suggestions regarding how the government can better serve the public. They include:

1. Rebooting .Gov. How the Government Printing Office can spearhead a revolution in governmental affairs…[including making government publications, including caselaw, available in an easier to access format]
2. FedFlix. Government videos are an essential national resource for vocational and safety training and can also help form a public domain stock footage library, a common resource for the YouTube and remix era.
3. The Library of the U.S.A. A book series and public works job program to create an archival series of curated documents drawn from our cultural institutions, …
4. The United States Publishing Academy. …
5. The Rural Internetification Administration …bring[ing] high-speed broadband to 98% of rural Americans just as the Rural Electrification Administration did for electricity in the last century.

While the incoming Obama administration is interested in these issues, some have serious concerns about the implementation. Siva Vaidhyanathan says

the General Services Administration is negotiating with YouTube (a Google service) to post federal hearings, etc.

… there is no clear reason for the government to solidify YouTube’s market dominance. In fact, there is no reason why the GSO could not mandate that all federal agencies post their videos in open forms — accessible, repostable, and mashable — on their own sites.

Then We the People could repost them on YouTube with commentary and maybe some cartoon graphics mixed in. Better yet, because .gov can’t deal with the bandwidth demands of too many folks pulling down popular videos, the federal government should post open format video as bittorrent files.

Maybe the Obama administration can help explain why Nancy Pelosi has Congress’ YouTube channel intro video, Capitol Cat Cam, hosted by cats and featuring a Rickroll. (Question: is including a section of Never Gonna Give You Up fair use? I doubt the lawyers of the RIAA would think so!)

An update on locked-up / “owned” government information: In Honor of National Library Week

Just try to read me on Google Book Search!

Works by the U.S. government are in the public domain* — but are they truly available to the public? Some publishers have managed to lock up public domain materials or have not made them accessible as publicly promised.

Government-created public domain materials have been locked away from the public through contract (Westlaw directly with the government) and through default settings that vary from stated policy (Google Books). An additional complicating factor for public domain government documents is their official status or “citability.” I’ll be discussing all of these factors.

No company should be allowed to hold all of the public domain cards through contracts or licensing agreements when the entire deck of the public domain clearly belongs to the public.

First, consider the contracts and licensing that keep public domain materials locked away. The most recent highly publicized incident involves the U.S. Government Accountability Office (GAO) legislative histories, an excellent source of information about the reasoning behind why laws were created the way they were.

Since 1921, the GAO has compiled 20,597 legislative histories of most public laws from 1915-1995. Daniel Cornwall at Free Government Information posts about how GAO entered an agreement to scan these documents

with a commercial partner [West] when the GAO office is within driving distance of a number of major universities and when public-spirited organizations like the Internet Archive and Public Resource might have been happy to come up with a solution to provide this taxpayer-funded information at zero cost to the taxpayers and either zero or minimal costs to GAO. Conceivably, there might have been some way for the Government Printing Office to incorporate this into GPO Access.

In regards to the West contract, several commenters have explained the reasons why this situation is deeply troubling:

Why would the GAO enter into a relationship giving a private commercial entity exclusive rights to this valuable public resource? … As far as I can tell, no one from either the GAO or Thomson West has responded to the concerns raised. Robert J. Ambrogi at Legal Blog Watch

Wholesale privatization without a careful, public examination of other, more citizen-friendly, alternatives is not acceptable. Daniel Cornwall at Free Government Information

Locking up these documents is ‘a cautionary tale for any government agency that wants to leverage its records with the help of private enterprise.’ Simon Fodden at Slaw.ca

Fortunately, “rogue archivist” Carl Malamud has made documents relating to the GAO legislative histories available, including a website/ad that has

[West] go[ing] so far as to boast that you should purchase this exclusive “product” from West because the GAO law librarians (public employees!) have done all the work for you!

Unfortunately, this is far from the only example of a private company locking up public domain materials created by a government agency because of their value. For example, the OpenCRS project (and others) attempts to make available Congressional Research Service Reports that as a whole are only available from commercial vendors (Penny Hill Press, LexisNexis, and CRS documents). We the taxpayers are spending over $100 million a year on the production of these reports by government employees, yet we do not have comprehensive access without paying. The House bill introduced to open these reports to the public, H.R. 2545, has had no action since the date of introduction.

Think about how this bizarre situation has turned public domain on its head – a government employee has created public domain documents, a company has acquired them, and the public now needs to go through that company to see those documents. And if you are a subscriber who has paid for access and then you attempt to download all of those public domain documents to make them publicly available – look out! If you do this, you’ve likely violated your contract / license with the company to access those public domain documents, even though those public domain documents have no copyright protection. The commercial vendor considers the license to trump public domain status.

Now if you think locking up materials with exclusive deals is bad, consider being told that government documents are accessible, yet they are not! A year ago, I posted An Open Letter to Google, William Patry, and Google’s Library Partners regarding the amount of public domain materials that according to Google’s own policies should be available for download.

Some of the documents I discussed that should be available are:

These types of documents are still not available. One of the examples given by the Prelinger Library over two years ago of a document that ironically was not available — was the law itself. The specific version of the copyright law mentioned on the Prelinger blog that was digitized September 2005 from the University of Michigan? Still not available as of today!

U.S. copyright law (yes, the law itself, and government documents regarding it) is still not freely available on Google Books here, here, here, and I could go on. I ironically love the copyrighted material notice on this snippet view of Circular 92, the Copyright Office’s publication of the law.

Since these documents are in the public domain, yet tied up by Google, whom can we go to in order to correct this? Obviously, my post last year and the complaints of others didn’t do enough. Last year, in addition to posting, I contacted some of the partner libraries; the response (mentioned here) was that it was Google’s responsibility to make those documents downloadable.

Siva Vaidhyanathan has frequently written about the dangers of trusting Google too much (see The Googlization of Everything) including in regards to Google Book Search:

We could solve each of the problems [of difficulty of finding materials in books, exclusivity of research sources, and the public’s unwillingness to use print sources] without Google, although it would take a deep commitment from the public and its institutions to make good information more accessible. … Google’s is still the most ambitious plan, however, and its much bolder venture into the world of print offers us at least three reasons to worry: privacy, privatization, and property.

An additional wrinkle to this already complicated situation is “If government documents are freely available, can we trust their accuracy? And can we cite them?”

The American Association of Law Libraries (AALL) last year published a report on authentication of online state legal resources. The results are startling:

Of the five states (Alaska, Indiana, New Mexico, Tennessee, and Utah) which give official status to their online legal resources, none are authenticated and only Utah requires permanent public access. (emphasis added)

While there are many new upstarts trying to make government information available to all (AltLaw, Precydent, PublicResource), the documents these services provide are not authenticated or published by a trusted vendor (Lexis or Westlaw), so they aren’t citable in court filings (due to citation rules). Therefore, attorneys and the public still need to rely on paid services to access public domain government documents.

Ian Gallacher has suggested a radical solution to the reliance on paid access to court cases and statutes: that law schools collectively take up the mantle of assuring that all legal public domain materials are available to all. He admits that this will be a struggle; after all, West fought to have its internal pagination system recognized as copyrightable.

But can we continue to rely on companies to allow access to what belongs to us all? I don’t think that is a safe bet to make. There is nothing wrong with non-profit organizations, government entities, and for-profit companies all having access to these materials. I would not want for-profits to be shut out of the marketplace because they should continue to provide value-added services, such as annotations, unique search functions and organization, and people-written analysis. As Jim Jacobs on Free Government Info states,

“The problem as we see it is that so many agencies seem ignorant of the fact that privatizing access to said digitized public domain information actually limits access in the long run.”

Siva Vaidhyanathan’s statement about Google applies more generally to the present status quo leaving the responsibility for access to public domain materials in the hands of companies:

The process of privatization is particularly troubling. Of course, we should not pretend that libraries operate outside market forces or do not depend on outsourcing many of their functions. But we must recognize that some of the thorniest problems facing libraries today — paying for and maintaining commercial electronic databases and cataloging services — are a direct result of rapid privatization and onerous contract terms. … The long-term risk of privatization is simple: Companies change and fail. Libraries and universities last. Should we entrust our heritage and collective knowledge to a business that has been around for less time than Brad Pitt and Jennifer Aniston were together? A hundred years from now, Google may well not exist.

For U.S. citizens, it is the responsibility of us all to push to make public domain documents accessible to us and resist privatization of public domain materials.

*Government works created by the U.S. federal government are not protected by copyright; instead these works (with limited exceptions for materials withheld for security, export control, and policy reasons) are in the public domain.

**Bias statement: I use products by Lexis, Westlaw, and Google every day, due to their usefulness to my work and personal life. These companies have a stated responsibility to their shareholders, not to the United States public.