Data Scraping from Web Services

This month’s Wired magazine has a perceptive article about so-called "data scraping" or "screen scraping" practices. It discusses the practical side of data scraping, such as IP address banning or blocking as a remedy to prevent scraping, the use of cease and desist letters, and the use of properly licensed web services application programming interfaces (APIs) as a way to control such practices.

The article does not go into the legal theories or court cases that might be used to prevent data scraping, such as claims under the Computer Fraud and Abuse Act (CFAA) or cases concerning unfair competition.

Source: Should Web Giants Let Startups Use the Information They Have About You?, by Josh McHugh

2 Responses to Data Scraping from Web Services

  1. Wm Kapke says:

    This ends abruptly; you don’t state any conclusions or offer any knowledge on the subject.

    Can you offer any?

  2. Harry says:

    Thanks for your comment. My brief blog posting was merely intended to make note of an article, and point out the issue of screen scraping and data scraping.

    The issue raises a number of legal concerns, such as:
    - Copyright infringement and limitations of copyright law as it applies to databases, as well as fair use of copyrighted works;
    - The evolving common law doctrine of “trespass to chattels” (i.e., intermeddling with personal property as contrasted to real estate) as it applies to excessive use of computer resources;
    - Violation of the Computer Fraud and Abuse Act (CFAA) by an unauthorized intrusion;
    - Violation of anti-circumvention requirements of the Digital Millennium Copyright Act (DMCA); and
    - Breach of online contracts, such as obligations arising from web site Terms of Service (TOS) or clickable online agreements.

    Permission from the data owner also raises questions. For example, the Robots Exclusion Standard (the robots.txt protocol) is purely advisory: it keeps out “law-abiding” bots, such as the Googlebot that Google uses to index web content, but rogue bots simply ignore it. Separately, end users may authorize a data scraper to act as their agent and scrape “their” data from multiple web sources, for example, to collect their personal data for use in aggregated form with a single web service.
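
    To illustrate what "law-abiding" means in practice, here is a minimal sketch of how a well-behaved bot might consult robots.txt before fetching a page. It uses the robots.txt parser from Python's standard library. The robots.txt content, the example.com URLs, the bot name "ExampleScraper" and the helper is_allowed are all hypothetical, made up for this illustration. Nothing enforces the check: a rogue bot simply never runs it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (example.com).
# All bots are asked to stay out of /private/; a bot calling itself
# "ExampleScraper" (a made-up name) is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleScraper
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is the voluntary check a cooperative bot performs before
    each request. It is advisory only and blocks nothing by itself.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "https://example.com/index.html"),
        ("Googlebot", "https://example.com/private/data"),
        ("ExampleScraper", "https://example.com/index.html"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

    A real bot would normally download the file from the site's /robots.txt address rather than use an inline string; the inline version is used here only so the sketch runs without network access.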

    For a comprehensive discussion of the issue, please see Ian C. Ballon, Bots, Screen Scraping, Content Aggregation and the Evolving Doctrine of Database Trespass, 6 Cyberspace Lawyer 15 (May 2001). That seven-page article also gives practical suggestions for database owners and data scrapers.

    For a briefer discussion, see an excerpt from the 2009 Supplement of Michael D. Scott, Scott on Information Technology Law (Aspen, 3d Ed.).