By Timothy B. Lee | Ars Technica | Sept. 4, 2012
Users who participate in BitTorrent swarms for popular files are likely to have their IP addresses logged by monitoring companies within three hours. That's the conclusion of a paper being presented this week at the SecureComm conference in Italy by Tom Chothia and colleagues at the University of Birmingham.
To arrive at this conclusion, the researchers observed "1,033 swarms across 421 trackers for 36 days over 2 years." They reported that "monitoring is prevalent for popular content (i.e., the most popular torrents on The Pirate Bay) but absent for less popular content."
Users who think they can evade detection just by using common blocklists are probably fooling themselves. "Publicly available blocklists, used by privacy-conscious BitTorrent users to prevent contact with monitors, contain large incidences of false positives and false negatives," the Birmingham team concludes.
The BitTorrent protocol relies on servers called trackers to help clients find others interested in swapping pieces of the same file (the total collection of people exchanging the same file with each other is called the "swarm"). When a client joins a swarm, it announces itself to the tracker, which provides a list of other peers on the network. This system allows two basic ways to monitor a BitTorrent network. With indirect monitoring, peers simply join the network in order to get the tracker to provide a list of the IP addresses of other network users, but then take no further action. Direct monitoring, on the other hand, involves going a step further and communicating with other peers.
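To make the indirect approach concrete, here is a minimal Python sketch of the step a passive observer would perform on a tracker's reply. It assumes the tracker returns peers in the widely used "compact" form described in BEP 23 (six bytes per IPv4 peer: four for the address, two for the port), which the paper summary above does not discuss. The function name decode_compact_peers is hypothetical.

```python
import ipaddress
import struct


def decode_compact_peers(blob: bytes) -> list[tuple[str, int]]:
    """Decode a compact IPv4 peer list into (ip, port) pairs.

    Assumes the BEP 23 layout: 4 bytes of address followed by a
    2-byte port, both in network (big-endian) byte order.
    """
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list length must be a multiple of 6")
    peers = []
    for offset in range(0, len(blob), 6):
        # "!4sH" = network byte order, 4 raw bytes, one unsigned 16-bit int
        ip_bytes, port = struct.unpack_from("!4sH", blob, offset)
        peers.append((str(ipaddress.IPv4Address(ip_bytes)), port))
    return peers
```

An observer that only collects these pairs never contacts the peers themselves, which is why the resulting evidence is considered weak.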
Indirect monitoring provides only weak evidence that peers have engaged in copyright infringement, since clients can join a network without actually swapping files. Direct monitoring can provide more conclusive evidence of infringement, since it allows the monitors to see how much of the file each client claims to have downloaded and—in principle, at least—actually exchange copies of an infringing file with others on the network.
Chothia and his colleagues used several criteria to distinguish between ordinary BitTorrent clients and those that appear to have joined the network only to observe the activities of other peers. First, subnets belonging to monitoring firms tend to have a large fraction of their IP addresses connected to BitTorrent networks, those addresses tend to stay connected for long periods of time, and each IP address tends to connect to many different swarms. Few ordinary users use the BitTorrent network so intensively. These characteristics can all be determined merely by requesting lists of active clients from BitTorrent trackers.
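A rough Python sketch of how two of these tracker-only signals could be combined is shown below. It is not the researchers' code. The /24 grouping, the thresholds min_ips and min_swarms_per_ip, and the function name suspicious_subnets are all assumptions chosen for illustration, and the connection-duration signal is left out for brevity.

```python
from collections import defaultdict
import ipaddress


def suspicious_subnets(sightings, min_ips=8, min_swarms_per_ip=10.0):
    """Flag /24 subnets that look unusually active.

    sightings: iterable of (ip, swarm_id) pairs taken from tracker peer lists.
    The thresholds are illustrative guesses, not values from the paper.
    """
    # Which swarms was each IP address seen in?
    swarms_by_ip = defaultdict(set)
    for ip, swarm_id in sightings:
        swarms_by_ip[ip].add(swarm_id)

    # Group the observed addresses by their enclosing /24 subnet.
    ips_by_subnet = defaultdict(set)
    for ip in swarms_by_ip:
        subnet = ipaddress.ip_network(f"{ip}/24", strict=False)
        ips_by_subnet[subnet].add(ip)

    flagged = []
    for subnet, ips in ips_by_subnet.items():
        avg_swarms = sum(len(swarms_by_ip[ip]) for ip in ips) / len(ips)
        # Many active addresses in one subnet, each present in many swarms.
        if len(ips) >= min_ips and avg_swarms >= min_swarms_per_ip:
            flagged.append(str(subnet))
    return sorted(flagged)
```

A subnet flagged this way is only a candidate; the direct checks described next are what provide stronger evidence.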
More conclusive evidence can be found by interacting directly with other clients on the network. BitTorrent clients swap files in small pieces called blocks, and they must announce which blocks they have and which they still need in order to facilitate the swapping process. They do this by announcing a bitfield, in which each bit indicates whether the client has obtained a specific block. If a peer is actually trying to download a file, the client should report that it has more and more blocks over time, and it should never report not having a block after previously reporting having it. But the researchers found that not all peers behave this way.
"Although the majority of peers reported steady progression towards completing the download, peers in 20 small subnets always reported completions of between 45 percent and 55 percent," the paper reports. "For these IP addresses, further inspection of the bitfields showed no consistency: they appeared to be generated randomly, rather than reflecting a progressively completing download."
The researchers note that this behavior "was not observed in any of the swarms sharing public domain content; the most likely explanation is that these were monitors."
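The bitfield test described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' tooling: each bitfield is modeled as a list of booleans, the 45 to 55 percent band comes from the quoted finding, and the choice to require both signals at once, along with the names completion, is_consistent, and looks_like_monitor, is hypothetical.

```python
def completion(bitfield: list[bool]) -> float:
    """Fraction of blocks the peer claims to hold."""
    return sum(bitfield) / len(bitfield)


def is_consistent(history: list[list[bool]]) -> bool:
    """True if no block reported as held is later reported as missing."""
    for earlier, later in zip(history, history[1:]):
        if any(had and not has for had, has in zip(earlier, later)):
            return False
    return True


def looks_like_monitor(history: list[list[bool]],
                       low: float = 0.45, high: float = 0.55) -> bool:
    """Flag a peer whose bitfields stay in a narrow band yet contradict each other.

    Combining the two signals is an assumption made for this sketch.
    """
    stuck = all(low <= completion(b) <= high for b in history)
    return stuck and not is_consistent(history)
```

A genuine downloader passes is_consistent because its set of held blocks only grows, while a randomly generated bitfield at roughly 50 percent completion will almost always drop blocks between observations.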
Using these methods, the researchers compiled a list of IP addresses they suspected of being used for monitoring, then compared them with known information about who is observing the BitTorrent network. In some cases, the IP addresses correspond to firms that have publicly acknowledged that they engage in BitTorrent monitoring. In other cases, the IP addresses belong to firms that are engaged in IP enforcement efforts but have not publicly acknowledged running monitoring software. (The researchers report that one firm publicly acknowledged that it operated monitoring software after the firm's IP addresses had come up in the team's research.) Still other subnets belonged to hosting companies; the researchers speculate these are being leased by copyright enforcement firms.
The researchers also compared the subnets identified in their research with blocklists used by BitTorrent users to prevent their clients from communicating with IP addresses suspected of belonging to monitoring companies. While there was significant overlap between the blocklists and the researchers' own findings, they found both false positives and false negatives. That means that currently available blocklists cannot reliably protect BitTorrent users from detection, and from potential legal problems, for sharing infringing files on BitTorrent.
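The comparison can be pictured with a short Python sketch. It assumes both lists are given as CIDR strings and treats the suspected-monitor list as the reference, so "false positive" here means only "blocked but not suspected," not a verified error. The function name compare_blocklist and the sample ranges in the usage note are hypothetical.

```python
import ipaddress


def compare_blocklist(blocklist, suspected):
    """Return (false_positives, false_negatives) as lists of CIDR strings.

    false_positives: blocked ranges that overlap no suspected monitor subnet.
    false_negatives: suspected monitor subnets not fully covered by any blocked range.
    """
    blocked = [ipaddress.ip_network(n) for n in blocklist]
    suspects = [ipaddress.ip_network(n) for n in suspected]
    false_negatives = [
        str(s) for s in suspects
        if not any(s.subnet_of(b) for b in blocked)
    ]
    false_positives = [
        str(b) for b in blocked
        if not any(s.overlaps(b) for s in suspects)
    ]
    return false_positives, false_negatives
```

For example, with a blocklist of 198.51.100.0/24 and 192.0.2.0/24 and suspected subnets 198.51.100.0/25 and 203.0.113.0/24, the second blocked range counts as a false positive and the second suspected subnet as a false negative.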
The research is likely to set off a new round in the perpetual arms race between file-sharers and copyright enforcers. Users will presumably take advantage of the new monitor-detection techniques identified by Chothia et al. to produce more accurate blocklists. Monitoring firms may respond by tweaking their monitoring clients to behave more like real clients, and by more frequently changing the subnets they use for monitoring.
In principle, a monitoring client could perfectly emulate the behavior of an ordinary client, completely foiling detection efforts. But to fully emulate the behavior of ordinary clients, monitoring companies would need to avoid making an implausibly large number of connections to the BitTorrent network from any specific subnet. To do that, they would likely need to lease a large number of servers and IP address blocks, which could be cost-prohibitive. Given that these firms don't have infinite resources, the arms race between infringing users and enforcement firms isn't likely to stop any time soon.