Out-Law / Your Daily Need-To-Know

EDITORIAL: Every search engine should obtain permission from a website before copying its pages or even snippets of text, according to a ruling by a Belgian court today.

The Court of First Instance in Brussels upheld a previous ruling in favour of newspaper group Copiepresse, as OUT-LAW reported earlier today. Google News and Google's caching of web pages infringe copyright, it said.

A colleague's translation of today's judgment (and she asks me to point out that it is only a rough translation) suggests that the approach of each of the world's leading search engines, Google, Yahoo! and Windows Live Search (formerly MSN Search) is incompatible with Europe's copyright regime. Remarkable as that may seem, lawyers at all these companies likely saw it coming.

These search engines use automated programs called robots to index pages on the web. Nobody expects Google to phone or write a nice letter asking permission in advance before indexing a page. Nobody, perhaps, except Copiepresse. Secretary General Margaret Boribon told me last year that all search engines should obtain permission before indexing pages that carry copyright notices.

The much more practical approach of search engines has been to follow a protocol known as the robots exclusion standard. If a site owner does not want its pages to be found, it says so in a file on its website. That file is always called robots.txt. (See, for instance, the robot instructions of the New York Times, which allow much of the site to be indexed but identify particular pages and sections as off-limits.) That protocol has existed since 1994.
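For readers who have never looked inside one of these files, here is a rough sketch of how a well-behaved robot might consult robots.txt before indexing a page. It uses the parser in Python's standard library. The rules, the robot name "ExampleBot" and the example.com addresses are all invented for illustration; they are not taken from any real site or search engine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, made up for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /subscribers/
"""


def may_index(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit this robot to fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/news/today.html",
        "https://example.com/archive/2007/story.html",
    ):
        print(url, may_index(ROBOTS_TXT, "ExampleBot", url))
```

Under these made-up rules the story under /news/ may be indexed and the one under /archive/ may not. A page that no rule mentions is treated as open to robots, which is the opt-out default at the heart of this dispute.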

Google, Yahoo! and Windows Live Search don’t just identify pages when their robots visit; they also take snapshots of the web pages. They offer access to these snapshots via links in their search results marked 'cached' or 'cached page'. Following that link, rather than the headline link, takes the user to a page on the search engine's own site – not the target site. The user is seeing a copy and Copiepresse says it's an unlawful copy.

Just as a robots.txt file can keep pages out of the index, a metadata command in a page's code prevents caching. NOARCHIVE is a flag to Google that a page should be excluded from its cache. Experienced site operators know this: you can find the term in the code of many online newspaper pages that charge for archive material.
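In practice the flag usually sits in a meta tag in the page's head section. The sketch below shows one way a robot could check for it before keeping a snapshot. It assumes the common convention of a meta tag named "robots" (or "googlebot") whose content lists directives separated by commas; the sample pages are invented, and this is not a description of how Google's own software works.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values can be None.
        attr = {name: (value or "") for name, value in attrs}
        # Assumption: only these two meta names are checked in this sketch.
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def allows_caching(html: str) -> bool:
    """Return False if the page carries a NOARCHIVE directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives


# Hypothetical pages, made up for this illustration.
PAID_ARCHIVE_PAGE = '<html><head><meta name="robots" content="NOARCHIVE"></head></html>'
ORDINARY_PAGE = "<html><head><title>Today's news</title></head></html>"

if __name__ == "__main__":
    print(allows_caching(PAID_ARCHIVE_PAGE))
    print(allows_caching(ORDINARY_PAGE))
```

Note the default: a page that says nothing is treated as cacheable. That silence is exactly what Google asked the court to read as consent.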

Google argued in Belgium that where robots.txt and NOARCHIVE commands were missing, a site editor was "explicitly or at least implicitly" consenting that their pages would be indexed and accessible via Google's cache.

That argument has worked in the US. A Nevada court considered the issue and ruled last January (25-page / 201KB PDF) that the conduct of a site operator who knew about these protocols and chose not to use them amounted to "a grant of a licence to Google" for indexing and caching.

The Belgian court felt differently.

It ruled today (44-page / 1.2MB PDF) that it cannot be deduced that the absence of technical protections is an unconditional authorisation. Google's method of storing copyright-protected work in its cache and granting access to the internet user without transferring the user to the original site is an act of unauthorised reproduction and communication to the public, contrary to Belgium's copyright law, it said. Google's situation was even more reprehensible, the court reasoned, because Google News went further than indexing and caching: it reproduced a headline and extract from a third party site.

A literal reading of UK copyright laws might draw the same conclusion. It may be no coincidence that the search engine industry took off in the US, with its more flexible approach to fair use, rather than Europe. The issue probably hasn't arisen here before because it is so much cheaper and easier to follow the established protocols than to sue a search giant.

The court rejected Google's attempt to fit Google News within copyright law's recognition of a right to review. Google News counts articles and classifies them by theme, said the court. This is automated. Google does not give any analytical opinion or comparison or criticise the articles. It cannot fall within the exception of news reporting either, it said. And the failure by Google News to carry a writer's byline was characterised as an attack on the moral rights of an author.

So the court concluded that Google's cache infringes copyright and so does Google News.

The case did not have to address the separate question of whether infringement takes place when a search engine indexes a page to perform its primary search function, a process that involves breaking a page into tiny elements for analysis and cross-referencing in its huge index. That's another argument, discussed briefly in OUT-LAW's previous analysis of this case.

Google has vowed to appeal, but there is a slight twist in today's ruling.* The penalty, if I understand it correctly, is not a disaster for Google. The parties are still disputing whether Google complied fully with an earlier ruling and, if it did not, a daily fine applies for any past non-compliance. Google has now removed the Copiepresse members' content, as I understand it. However, there were other media organisations that supported Copiepresse's claim in the court and the court set a penalty for infringements of their members' work. This penalty strikes me as inconsistent with the court's rejection of an implied licence to copy because it seems to place the onus on these content owners to notify Google by emailing a particular address that an infringement has been spotted – and Google then has a grace period of 24 hours in which to stop that infringement. If it fails, Google pays €1,000 for each day that the infringement continues. But for the appearance of the work in Google News or Google's cache in the first place, Google is not penalised.

This case was more about money than the technicalities of copyright law. Copiepresse made clear that it wants to be paid for its content appearing in Google News. I can't see Google paying up. So Copiepresse wins a moral victory but its members will surely have lost the considerable traffic and consequent ad revenue that Google News brought to their sites. Users will lose access to some news and the use of the cache function. I can't see how anyone wins here.

By Struan Robertson, Editor of OUT-LAW. These are the personal views of the author and do not necessarily represent the views of Pinsent Masons.

* 14/02/2007: These final paragraphs have been updated since the article first appeared last night.
