Table of Links
Abstract and I. Introduction
II. Threat Model & Background
III. Webcam Peeking through Glasses
IV. Reflection Recognizability & Factors
V. Cyberspace Textual Target Susceptibility
VI. Website Recognition
VII. Discussion
VIII. Related Work
IX. Conclusion, Acknowledgment, and References
APPENDIX A: Equipment Information
APPENDIX B: Viewing Angle Model
APPENDIX C: Video Conferencing Platform Behaviors
APPENDIX D: Distortion Analysis
APPENDIX E: Web Textual Targets
II. THREAT MODEL & BACKGROUND
A. Threat Model
In this work, we study the webcam peeking attack during online video conferences, where the adversary and the victim are both participants. We assume that the device the victim uses to join the video conference consists of a display screen and either a built-in or an external webcam mounted on top of the screen, as in most cases, and that the victim wears glasses with a reflectance larger than 0, i.e., at least a portion of the light emanated by the monitor screen can be reflected from the glasses to the webcam. We do not enforce constraints on the devices used by the adversary. When the adversary launches the attack, we assume the victim is facing the screen and webcam in such a way that the screen-emanated light has a single-reflection optical path into the webcam through the eyeglass lens’s outer surface. We do not assume the adversary has any control over or information about the victim’s device.
We assume that the victim’s up-link video stream is enabled during the attack, and the adversary can acquire the down-link video stream of the victim. The adversary can achieve that by either directly intercepting the down-link video stream data, or recording the victim’s video with the video conferencing platform being used or even third-party screen recording services. Since the webcam peeking attack does not require active interaction between the victim and the adversary, the adversary does not need to attempt a real-time attack but can store the video recording and analyze the videos offline.
B. Glasses
The most common types of glasses that people wear in a video conferencing setting are prescription glasses [40] and blue-light blocking (BLB) glasses [11], [50]. BLB glasses can either have prescriptions with BLB coating or be nonprescription (flat). The reflectance and curvature of glass lenses are the two most important characteristics in the process of reflecting screen optical emanations.
Reflectance. Reflectance of a lens surface is the ratio between the light energy reflected and the total energy incident on the surface [5]. Reflectance is wavelength-dependent. The higher the reflectance, the more light can be reflected to and captured by a webcam.
Curvature. Curvature of a lens surface represents how much it deviates from a plane. The concepts of curvature, radius, and focal length of an eyeglass lens are used interchangeably on different occasions and are related by: Curvature = 1/Radius = 1/(2 × FocalLength). Smaller curvature leads to larger-size reflections. Both the outer and inner surfaces of a lens can reflect, but the outer surface often has a smaller curvature and thus produces better-quality reflections (Appendix A). This paper refers to the eyeglass lens curvature/radius/focal length as that of the outer surface if not specified otherwise.
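To make the link between curvature and reflection size concrete, the sketch below treats the outer lens surface as an ideal convex spherical mirror. It assumes the textbook spherical-mirror relation (radius = 2 × focal length) and the paraxial mirror equation; the function names and the example distances are illustrative and do not come from the paper. It only models the size of the virtual image relative to the screen and ignores the camera geometry.

```python
def focal_length_from_radius(radius_mm: float) -> float:
    """Focal length magnitude of a spherical mirror (assumed relation: f = R / 2)."""
    return radius_mm / 2.0


def curvature_from_radius(radius_mm: float) -> float:
    """Curvature is the reciprocal of the radius (units: 1/mm)."""
    return 1.0 / radius_mm


def reflection_magnification(radius_mm: float, screen_distance_mm: float) -> float:
    """Paraxial magnification of the virtual image formed by a convex mirror.

    From the mirror equation with a convex surface, m = f / (d + f),
    where f is the focal length magnitude and d the screen-to-lens distance.
    """
    f = focal_length_from_radius(radius_mm)
    return f / (screen_distance_mm + f)


if __name__ == "__main__":
    # Hypothetical values: a screen 400 mm from the lens, two lens radii.
    for radius in (200.0, 400.0):
        m = reflection_magnification(radius, 400.0)
        print(f"radius={radius} mm, curvature={curvature_from_radius(radius):.4f} 1/mm, "
              f"magnification={m:.3f}")
```

Under these assumptions, doubling the radius (halving the curvature) increases the magnification, which matches the statement that smaller curvature leads to larger reflections.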
C. Digital Camera Imaging System
Digital cameras have sensing units uniformly distributed on the sensor plane, each of which is a Charge-coupled Device (CCD) or Complementary Metal-oxide-semiconductor (CMOS) circuit unit that converts the energy of the photons it receives within a certain period of time, i.e., the exposure time, to an amplitude-modulated electric signal. Each sensing unit then corresponds to a “pixel” in the digital domain. The quality of a digital image to human perception is mainly determined by its pixel resolution, color representation, the amount of received light that is of interest, and various imaging noise. The two key imaging parameters that are closely related to webcam peeking attacks are described below.
Exposure Time. Theoretically, the longer the exposure time, the more photons will hit the imaging sensors, and thus there can be potentially more light of interest captured. The images with a longer exposure time will generally be brighter. The downside of having a longer exposure time is the aggravated motion blur when imaging a moving object.
ISO Value. The ISO value represents the amplification factor of the photon-induced electrical signals. In darker conditions, the user can often make the images brighter by increasing the ISO value. The downside of having a higher ISO is the simultaneous amplification of various imaging noises.
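The trade-offs of the two parameters can be illustrated with a toy single-pixel model. This is a simplified sketch, not the paper's imaging model: it assumes the collected signal grows linearly with exposure time, that noise consists of shot noise plus a fixed read noise, and that the ISO setting acts as a plain multiplicative gain applied to both. All names and numbers are hypothetical.

```python
import math


def simulate_pixel(photon_rate: float, exposure_s: float,
                   gain: float, read_noise: float = 0.0) -> dict:
    """Toy model of one pixel's output.

    photon_rate: photons per second reaching the pixel (assumed constant).
    gain: amplification factor standing in for the ISO setting.
    """
    signal = photon_rate * exposure_s              # more exposure, more photons
    shot_noise = math.sqrt(signal)                 # Poisson shot noise (std dev)
    noise = math.sqrt(shot_noise ** 2 + read_noise ** 2)
    return {
        "signal": gain * signal,                   # gain brightens the image...
        "noise": gain * noise,                     # ...and amplifies noise equally
        "snr": signal / noise if noise > 0 else float("inf"),
    }


def motion_blur_px(speed_px_per_s: float, exposure_s: float) -> float:
    """Blur length in pixels for an object moving at constant image-plane speed."""
    return speed_px_per_s * exposure_s


if __name__ == "__main__":
    low_gain = simulate_pixel(1000.0, 0.01, gain=1.0)
    high_gain = simulate_pixel(1000.0, 0.01, gain=4.0)
    long_exposure = simulate_pixel(1000.0, 0.04, gain=1.0)
    print(low_gain, high_gain, long_exposure)
    print(motion_blur_px(300.0, 1 / 30))
```

In this model, raising the gain brightens the output without improving the signal-to-noise ratio, while a longer exposure improves it at the cost of a proportionally longer motion blur.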
D. Text Size Representations
It is important to select proper representations of text size in both the digital and physical domains, since the size of the smallest recognizable text is the key metric for webcam peeking limits. When texts are digital, i.e., in the victim’s software such as browsers and in the webcam image acquired by the adversary, we use point size and pixel size to represent the text size, respectively. In the physical domain, i.e., when the texts are displayed on users’ screens as physical objects, we use the cap height of the fonts in the physical unit mm to represent the size, as it is invariant across different computer displays and enables quantitative analysis of the threats. Cap height is the uniform height of capitalized letters when font style and size are specified and is thus usually used as a convenient representation of physical text size and the base for other font parameters [22], [23].
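As a rough illustration of moving from the digital to the physical representation, the sketch below converts a point size into a physical cap height in mm. It rests on several assumptions that are not taken from the paper: the CSS convention of 96 px and 72 pt per inch, a display whose pixel density and scaling factor are known, and a font-specific cap-height-to-em ratio (0.7 is used here only as a placeholder; the real value depends on the font).

```python
MM_PER_INCH = 25.4
CSS_PX_PER_INCH = 96.0   # CSS reference pixel density (assumed convention)
PT_PER_INCH = 72.0


def point_to_css_px(point_size: float) -> float:
    """Convert a font size in points to CSS pixels."""
    return point_size * CSS_PX_PER_INCH / PT_PER_INCH


def physical_cap_height_mm(point_size: float, screen_ppi: float,
                           device_pixel_ratio: float = 1.0,
                           cap_height_ratio: float = 0.7) -> float:
    """Estimate the on-screen cap height in mm.

    screen_ppi: physical pixels per inch of the display.
    device_pixel_ratio: physical pixels per CSS pixel (display scaling).
    cap_height_ratio: cap height divided by em size; placeholder value.
    """
    device_px = point_to_css_px(point_size) * device_pixel_ratio
    em_height_mm = device_px / screen_ppi * MM_PER_INCH
    return em_height_mm * cap_height_ratio


if __name__ == "__main__":
    # Hypothetical display: 96 PPI, no scaling, 12 pt text.
    print(round(physical_cap_height_mm(12.0, screen_ppi=96.0), 2))
```

The same point size maps to different physical heights on displays with different pixel densities or scaling, which is why a physical unit such as mm is the more convenient basis for comparing setups.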
Authors:
(1) Yan Long, Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, USA ([email protected]);
(2) Chen Yan, College of Electrical Engineering, Zhejiang University, Hangzhou, China ([email protected]);
(3) Shilin Xiao, College of Electrical Engineering, Zhejiang University, Hangzhou, China ([email protected]);
(4) Shivan Prasad, Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, USA ([email protected]);
(5) Wenyuan Xu, College of Electrical Engineering, Zhejiang University, Hangzhou, China ([email protected]);
(6) Kevin Fu, Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, USA ([email protected]).
This paper is