Detecting Textual Reuse in News Stories, At Scale

Authors

  • Tom Nicholls, Reuters Institute for the Study of Journalism, University of Oxford

Keywords:

computational methods, news production, churnalism, news agency, automated content analysis, online news

Abstract

Motivated by the debate around “churnalism” and online media, this article develops, evaluates, and validates a computational method for detecting shared text between different news articles, at scale, using n-gram shingling. It differentiates between newswire copy, public relations material, source-to-source copying, and common-source and incidental overlaps. I evaluate the method, quantitatively and qualitatively, and show that it can effectively handle newswire content, copying, and other forms of reuse. Substantively, I find lower levels of news agency and press release copy reuse than is suggested by previous studies, and conclude that the news agency finding is robust, but the lack of press release copy found might reflect limitations of the method and the changing practices of journalists.
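
As a rough illustration of the general idea behind n-gram shingling, and not the article's own implementation, the sketch below splits two texts into overlapping word n-grams and measures how much of one text's shingle set also appears in the other. The shingle length of 5, the simple lowercase tokenizer, the containment measure, and the example sentences are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import re


def shingles(text, n=5):
    """Return the set of overlapping word n-grams (shingles) in text.

    The tokenizer (lowercase, letters/digits/apostrophes) and n=5 are
    illustrative assumptions, not parameters taken from the article.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(candidate, source, n=5):
    """Share of the candidate's shingles that also occur in the source.

    Returns 0.0 when the candidate is too short to form any shingle.
    """
    cand = shingles(candidate, n)
    if not cand:
        return 0.0
    return len(cand & shingles(source, n)) / len(cand)


# Hypothetical example texts: a wire sentence embedded in a longer story.
wire = ("The minister said on Tuesday that the new policy "
        "would take effect next year.")
story = ("Speaking to reporters, the minister said on Tuesday that the new "
         "policy would take effect next year, a move critics oppose.")

wire_in_story = containment(wire, story)   # how much of the wire copy is reused
story_in_wire = containment(story, wire)   # how much of the story is wire copy
```

Containment is asymmetric by design: a short agency item can be wholly reproduced inside a longer story that also contains original material, so the two directions give different scores.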

Author Biography

Tom Nicholls, Reuters Institute for the Study of Journalism, University of Oxford

Research Fellow, Reuters Institute for the Study of Journalism, University of Oxford

Published

2019-09-09

Section

Articles