A. Data Appendix
A.1. Newspaper articles
We first provide additional information on NewsLibrary. For each article, NewsLibrary provides the newspaper name, the headline, the date, the byline (if any), and approximately the first 80 words of the article. We focus on these first 80 words because, at the time of our data construction (June-August 2019), our subscription allowed us to access the article previews in bulk. [21] An example of such a newspaper snippet is shown in Figure A.1.
Figure A.1: Example of a local newspaper article snippet
Alameda Times-Star
County outlines ways to lower shelter hostility
8 March 2005
Can Alameda County blunt opposition to current plans to permit emergency homeless shelters at hundreds of residential locations in unincorporated communities? That appeared likely Monday as county planners suggested ways in which shelters – such as in the land-use game Monopoly – would not automatically pass go and neighbors could voice their approval or opposition. […]
In principle, NewsLibrary encompasses around 4,000 unique outlets for 2005-2008. However, for many outlets only a handful of articles are available: around 1,500 outlets contain fewer than 1,000 snippets (for all four years combined). In all our analyses, we only consider outlets with more than 1,000 articles. In addition, many outlets are not local newspapers in the sense that they cannot be assigned to a county (e.g., the “Army Communicator” or the “Air & Space” magazine). Furthermore, NewsLibrary often lists different editions of the same outlet separately. For instance, “Augusta Chronicle, the (GA)”, “Augusta Chronicle, the: Web Edition Articles (GA)”, and “Augusta Chronicle, the: Blogs (GA)” are listed separately. While our initial corpus covers all 2,618 outlets with more than 1,000 articles (amounting to almost 50 million article snippets, see Section A.3), our main analyses focus on the 305 outlets for which county-level circulation data is available (see Section 2). We also collapse different editions of the same outlet (as in the Augusta Chronicle example) into one observation because the Alliance for Audited Media circulation data is typically not available separately for different editions of the same outlet. The 16 million articles mentioned in the main text refer to the outlets used in our main regression analyses.
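The outlet filtering and edition collapsing described above can be illustrated with a minimal Python sketch. It is not the procedure used for the paper: the regular expression for edition names, the function names, and the snippet counts are hypothetical, and the sketch assumes that the 1,000-snippet threshold is applied to each NewsLibrary listing before editions are merged. The assignment of outlets to counties and the match to circulation data are not modeled.

```python
import re
from collections import defaultdict

# Hypothetical naming rule, inferred only from the Augusta Chronicle example:
# "<name>: <edition> (<STATE>)" is treated as an edition of "<name> (<STATE>)".
EDITION_RE = re.compile(
    r"^(?P<name>[^:]+):\s*[^()]+?\s*(?P<state>\([A-Z]{2}\))$"
)


def base_outlet(listing: str) -> str:
    """Map a NewsLibrary listing name to its (assumed) parent outlet name."""
    listing = listing.strip()
    match = EDITION_RE.match(listing)
    if match:
        return f"{match.group('name').strip()} {match.group('state')}"
    return listing


def filter_and_collapse(snippet_counts: dict, min_snippets: int = 1000) -> dict:
    """Keep listings with more than `min_snippets` snippets, then sum the
    counts of listings that map to the same parent outlet."""
    collapsed = defaultdict(int)
    for listing, n_snippets in snippet_counts.items():
        if n_snippets > min_snippets:
            collapsed[base_outlet(listing)] += n_snippets
    return dict(collapsed)


# Made-up snippet counts, for illustration only.
counts = {
    "Augusta Chronicle, the (GA)": 40000,
    "Augusta Chronicle, the: Web Edition Articles (GA)": 5000,
    "Augusta Chronicle, the: Blogs (GA)": 1200,
    "Army Communicator": 300,
}
result = filter_and_collapse(counts)
print(result)
```

With these made-up counts, the three Augusta Chronicle listings are merged into a single observation and the low-volume listing is dropped. Applying the threshold after collapsing instead would be an equally plausible reading of the text and could retain editions that individually fall below 1,000 snippets.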
[21] Full articles were available on a pay-per-piece basis, which was prohibitively expensive given our broad coverage in time and space.