Authors:
(1) Yan Long, Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, USA ([email protected]);
(2) Chen Yan, College of Electrical Engineering, Zhejiang University, Hangzhou, China ([email protected]);
(3) Shilin Xiao, College of Electrical Engineering, Zhejiang University, Hangzhou, China ([email protected]);
(4) Shivan Prasad, Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, USA ([email protected]);
(5) Wenyuan Xu, College of Electrical Engineering, Zhejiang University, Hangzhou, China ([email protected]);
(6) Kevin Fu, Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, USA ([email protected]).
Table of Links
Abstract and I. Introduction
II. Threat Model & Background
III. Webcam Peeking through Glasses
IV. Reflection Recognizability & Factors
V. Cyberspace Textual Target Susceptibility
VI. Website Recognition
VII. Discussion
VIII. Related Work
IX. Conclusion, Acknowledgment, and References
APPENDIX A: Equipment Information
APPENDIX B: Viewing Angle Model
APPENDIX C: Video Conferencing Platform Behaviors
APPENDIX D: Distortion Analysis
APPENDIX E: Web Textual Targets
Abstract—Personal video conferencing has become a new norm after COVID-19 caused a seismic shift from in-person meetings and phone calls to video conferencing for daily communications and sensitive business. Video leaks participants’ on-screen information because eyeglasses and other reflective objects unwittingly expose partial screen contents. Using mathematical modeling and human subjects experiments, this research explores the extent to which emerging webcams might leak recognizable textual and graphical information gleaming from eyeglass reflections captured by webcams. The primary goal of our work is to measure, compute, and predict the factors, limits, and thresholds of recognizability as webcam technology evolves in the future. Our work explores and characterizes the viable threat models based on optical attacks using multi-frame super resolution techniques on sequences of video frames. Our models and experimental results in a controlled lab setting show it is possible to reconstruct and recognize with over 75% accuracy on-screen texts that have heights as small as 10 mm with a 720p webcam. We further apply this threat model to web textual contents with varying attacker capabilities to find thresholds at which text becomes recognizable. Our user study with 20 participants suggests present-day 720p webcams are sufficient for adversaries to reconstruct textual content on big-font websites. Our models further show that the evolution towards 4K cameras will tip the threshold of text leakage to reconstruction of most header texts on popular websites. Besides textual targets, a case study on recognizing a closed-world dataset of Alexa top 100 websites with 720p webcams shows a maximum recognition accuracy of 94% with 10 participants even without using machine-learning models. Our research proposes near-term mitigations including a software prototype that users can use to blur the eyeglass areas of their video streams. For possible long-term defenses, we advocate an individual reflection testing procedure to assess threats under various settings, and justify the importance of following the principle of least privilege for privacy-sensitive scenarios.
I. INTRODUCTION
Online video calls have become ubiquitous as a remote communication method, especially since the recent COVID19 pandemic that caused almost universal work-from-home policies in major countries [24], [27], [31] and made video conference a norm for companies and schools to accommodate interpersonal communications even after the pandemic [6], [15], [43], [51].
While video conferencing provides people with the convenience and immersion of visual interactions, it unwittingly reveals sensitive textual information that could be exploited by a malicious party acting as a participant. Each video
participant’s screen could contain private information. The participant’s own webcam could capture this information when it is reflected by the participant’s eyeglasses and unwittingly provide the information to the adversary (Figure 1). We refer to this attack as a webcam peeking attack. Furthermore, adversary capabilities will only continue to increase with improvements to resolution, frame rate, and more. It is thus important to understand the consequences and limits of webcam peeking attacks in present-day and possible future settings.
Previous work shows that similar attacks exploiting optical reflection off nearby objects in controlled setups are feasible, such as observing teapots on a desk with highend digital single-lens reflex (DSLR) cameras and telescopes at a distance [25], [26]. The challenge and characterization of peeking using the more ubiquitous webcams, however, are qualitatively different due to the lower-quality images of present-day webcams. The lower-quality webcam images are caused by unique types of distortions, namely the shot and ISO noise due to insufficient light reception, and call for new image-enhancing techniques. In addition, new mathematical models and analysis frameworks are needed to understand the threat model of webcam peeking attacks. Finally, this new threat model requires a dedicated evaluation to clarify the potential threats and mitigations to the average video conference user.
There are many types of media that can leak over optical reflections, including text and graphics. We focus on textual leakage in this work as it’s a natural starting point for measurable recognizability and modeling of the fundamental baseline of information leakage, but also provides insights into the leakage of non-textual information such as inferring displayed websites through recognizing graphical contents on the screen. We seek to answer the following three major questions: Q1: What are the primary factors affecting the capability of the webcam peeking adversary? Q2: What are the physical limits of the adversary’s capability in the present day and the predictable future, and how can adversaries possibly extend the limits? Q3: What are the corresponding threats of webcam peeking against cyberspace targets and the possible mitigations against the threats?
To answer Q1, we propose a simplified yet reasonably accurate mathematical model for reflection pixel size. The model includes factors such as camera resolution and glass-screen distance and enables the prediction of webcam peeking limits as camera and video technology evolve. By using the complex-wavelet structural similarity index as an objective metric for reflection recognizability, we also provide semi-quantitative analysis for other physical factors including environmental light intensity that affect the signal-to-noise ratio of reflections.
To answer Q2, we analyze the distortions in the webcam images and propose multi-frame super resolution reconstruction for effective image enhancement to extend the limits. We then gather eyeglass reflection data in optimized lab environments and evaluate the recognizability limits of the reflections through both crowdsourcing workers on Amazon Mechanical Turk and optical character recognition models. The evaluation shows over 75% accuracy on recognizing texts that have a physical height of 10 mm with a 720p webcam
To answer Q3, we focus on web textual targets to build a benchmark that enables meaningful comparisons between present-day and future webcam peeking threats. We first map the limits derived from the model and evaluations to web textual content by surveying previous reports on web text size and manually inspecting fonts in 117 big-font websites. Then, we conduct a user study with 20 participants and play a challengeresponse game where one author acts as an adversary to infer HTML contents created by other authors. Results of the user study suggest that present-day 720p webcams can peek texts in the 117 big-font websites and future 4K webcams are predicted to pose threats to header texts from popular websites. We investigated the underlying factors enabling easier webcam peeking in the user study by analyzing the correlation between adversary recognition accuracy and multiple factors. We found, for example, user-specific parameters including browser zoom ratio play a more important role than the glass-screen distance. Besides texts, we also explored the feasibility of recognizing websites through graphical content with 10 participants and observed accuracies as high as 94% on recognizing a closed-world dataset of Alexa top 100 websites.
Finally, we discuss possible near-term mitigations including adjusting environmental lighting and blurring the glass area in software. We also envision long-term solutions following an individual reflection assessment procedure and a principle of least privilege. In summary, the goal of this work is to provide a theoretical foundation and benchmark for the study of emerging webcam peeking threats with evolving webcam technologies and the development of securer video conferencing infrastructures. We summarize our main contributions:
∙ Our work quantifies the limits and primary factors that predict the degree of information leakage from webcam peeking by using theoretical modeling and experimentation. This characterization helps predict future unknown vulnerabilities tied to the limits of evolving webcam technologies that do not yet exist.
∙ A benchmark centering on web textual targets that enables comparisons of webcam peeking threats. Our benchmarking methodology builds upon web text design conventions and a 20-participant user study on present-day cameras such that the benchmark can be applied to both hypothetical and emerging cameras in the coming years.
∙ Analysis on near-term mitigations including using software-based blurring filters and changing physical setups as well as possible long-term defenses by proactive testing and following a principle of least privilege. Our analysis investigates the potential effectiveness and implementation methods of different protections.
This paper is