Big Data, Big Questions | Working Within a Black Box: Transparency in the Collection and Production of Big Twitter Data

Authors

  • Kevin Driscoll USC Annenberg School for Communication & Journalism
  • Shawn Walker University of Washington

Keywords:

methodology, data, big data, social media

Abstract

Twitter seems to provide a ready source of data for researchers interested in public opinion and popular communication. Indeed, tweets are routinely integrated into the visual presentation of news and scholarly publishing in the form of summary statistics, tables, and charts provided by commercial analytics software. Without a clear description of how the underlying data were collected, stored, cleaned, and analyzed, however, readers cannot assess their validity. To illustrate the critical importance of evaluating the production of Twitter data, we offer a systematic comparison of two common sources of tweets: the publicly accessible Streaming API and the “fire hose” provided by Gnip PowerTrack. This study represents an important step toward higher standards for the reporting of social media research.

Author Biographies

Kevin Driscoll, USC Annenberg School for Communication & Journalism

PhD candidate    9/2013

Shawn Walker, University of Washington

PhD student    9/2013

Published

2014-06-16

Section

Special Sections