Table of Links
Abstract and I. Introduction
II. Threat Model & Background
III. Webcam Peeking through Glasses
IV. Reflection Recognizability & Factors
V. Cyberspace Textual Target Susceptibility
VI. Website Recognition
VII. Discussion
VIII. Related Work
IX. Conclusion, Acknowledgment, and References
APPENDIX A: Equipment Information
APPENDIX B: Viewing Angle Model
APPENDIX C: Video Conferencing Platform Behaviors
APPENDIX D: Distortion Analysis
APPENDIX E: Web Textual Targets
VII. DISCUSSION
A. Proposed Near-Term Mitigations
B. Improve Video-conferencing Infrastructure
Individual Reflection Assessment Procedure. Our analysis and evaluation reveal that different individuals face varying degrees of potential information leakage when subjected to webcam peeking. Specifically, factors spanning software settings, hardware devices, and environmental conditions affect reflection quality. Even for the same user, the threat level varies when the user joins video conferences from different places or at different times of the day. These factors make it infeasible to recommend or implement a single set of protective settings (e.g., which glasses/cameras/filter strengths to use) before the actual user's setup is known.
Providing usable security requires understanding how serious the problem is before attempting to eliminate it. In light of this, we advocate an individual reflection assessment procedure that future video conferencing platforms could provide. The procedure can be offered as an option after users are notified of the potential risk of webcam peeking. It may follow a methodology similar to the one used in this work: (1) display test patterns such as text and graphics, (2) collect webcam video for a certain period of time, and (3) compare the reflection quality in the video against the test patterns to estimate the threat level of webcam peeking. With the estimated threat level, the platform can then notify the user of the types of on-screen content that might be affected and offer protection options such as filtering or entering the meeting under the PoLP discussed below.
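The three-step assessment above can be sketched in a few lines. The scoring metric (normalized cross-correlation between the displayed pattern and the reflection patch cropped from the webcam frame) and the threat thresholds below are illustrative assumptions for exposition, not the paper's actual procedure:

```python
def reflection_threat_score(pattern, patch):
    """Normalized cross-correlation between the displayed test pattern
    and the reflection patch cropped from the webcam frame, both given
    as flat lists of grayscale values of equal length. Returns a value
    in [-1, 1]: 1.0 is a perfect reflection, 0.0 is no correlation."""
    ma = sum(pattern) / len(pattern)
    mb = sum(patch) / len(patch)
    a = [v - ma for v in pattern]
    b = [v - mb for v in patch]
    denom = (sum(x * x for x in a) * sum(x * x for x in b)) ** 0.5
    return 0.0 if denom == 0 else sum(x * y for x, y in zip(a, b)) / denom

def threat_level(score):
    # Hypothetical thresholds; a real deployment would calibrate them
    # against recognizability results such as those in Sections IV-VI.
    if score > 0.7:
        return "high"      # e.g., large-font on-screen text at risk
    if score > 0.3:
        return "medium"    # e.g., big headers / website layouts at risk
    return "low"
```

A platform could run this per displayed pattern and report the worst-case level, then gate the filtering and PoLP options on that result.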
Principle of Least Pixels. Cameras are becoming more capable than average users realize, unwittingly exposing information beyond what users intend to share. The fundamental privacy design challenge with webcam technology is "oversensing" [28], where overly capable sensors provide too much information to downstream processing, i.e., more data than is needed to complete a function such as a meaningful face-to-face conversation. Oversensing is the sensor-world equivalent of violating the classic Principle of Least Privilege (PoLP) [52]. We believe long-term protection of users ought to follow a PoLP (perhaps a Principle of Least Pixels) as webcam hardware and computer vision algorithms continue to improve. Thus, we recommend that future infrastructure and privacy-enhancing modules apply the PoLP not just to software, but to the camera data streams themselves. In sensitive conversations, the infrastructure could provide only the minimal amount of information needed and allow users to incrementally grant higher access privileges to the other parties. For example, PoLP blurring techniques might blur all objects in the video meeting at the beginning and then intelligently unblur only what is absolutely necessary to hold a natural conversation.
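As an illustration of the blur-first, unblur-on-grant idea, here is a minimal pure-Python sketch. The naive box blur, the `(y0, y1, x0, x1)` region format, and the `polp_frame` helper are all hypothetical choices for exposition; a real system would use an optimized blur and a face/object detector to propose grantable regions:

```python
def box_blur(img, k=3):
    """Naive box blur over a 2D list of grayscale values (edge-clamped)."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Average the k x k neighborhood, clamping at the borders.
            vals = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            out[y][x] = sum(vals) / len(vals)
    return out

def polp_frame(frame, granted_regions, k=3):
    """Blur the entire frame, then restore only the regions the user has
    explicitly granted, each given as (y0, y1, x0, x1) half-open bounds
    (e.g., a face box from a detector)."""
    out = box_blur(frame, k)
    for (y0, y1, x0, x1) in granted_regions:
        for y in range(y0, y1):
            out[y][x0:x1] = frame[y][x0:x1]
    return out
```

Starting from an empty grant list and letting the user add regions mid-call mirrors the incremental privilege escalation described above.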
C. User Opinion Survey
We collected opinions on our findings of webcam peeking risks and expectations of protections from 60 people, including the 20 people who participated in the user study and 40 people who did not. We did not find apparent differences between the two groups' opinions. The overall opinions are reported below.
Textual Recognition. For the discovered risk of textual recognition, 40% of the interviewees found it a larger risk than they had expected; 48.3% found it close to their expectation; 11.7% had expected worse consequences than what we found. In addition, 76.7% of the interviewees think this problem needs to be addressed, while 23.3% think they can tolerate this level of privacy leakage.
Website Recognition. 61.7% of the interviewees found it a larger risk than they had expected; 30% found it close to their expectation; 8.3% had expected worse consequences than what we found. In addition, 86.7% of the interviewees think this problem needs to be addressed, while 13.3% think they can tolerate this level of privacy leakage.
Reflection Assessment. Regarding the proposed idea of reflection assessment procedures that may be provided by video conferencing platforms in the future, 95% of the interviewees said they would like to use it; 85%, 68.3%, 45%, and 20% of the 60 interviewees would like to use it when meeting with strangers, colleagues, classes, and family/friends respectively.
Glass-blur Filters. Regarding the possible protection of using filters to blur the glass area, 83.3% of the interviewees said they would like to use it; 78.3%, 51.7%, 43.3%, and 11.7% of the 60 interviewees would like to use it when meeting with strangers, colleagues, classes, and family/friends respectively.
D. Ethical Considerations
The AMT and user opinion surveys received an IRB exemption (No. HUM00208544) from the authors' institutions. The downloaded results were anonymized by keeping only the answers and deleting all other identifiable information, including worker IDs. The results on the AMT and survey websites were deleted. We compensated workers at $18/hour.
The textual and website recognition user studies were IRB-approved (No. ZDSYHS-2022-5). We ensured that participants and others who might have been affected by the experiments were treated ethically and with respect, and anonymized participants by referring to them in randomized order. No personal information other than the videos and questionnaires was collected. The HTML files used in the studies were randomly generated by the authors and do not involve the participants' private information or contain any unethical or disrespectful content. The securely stored videos were used only for this research and were not disclosed to third parties or used for other purposes.
E. Limitation & Future Work
This work used human-based recognition to evaluate the performance limits of reflection recognition. In future scenarios such as forensic investigations carried out by specialized institutions, we believe trained expert humans or machine learning methods may be employed to further increase the accuracy of reflection recognition. Compared to machine learning-based recognition, human-based recognition helps us understand the threats posed by a wide range of adversarial parties, including even common users of video conferencing, and thus provides an estimate of the lower bound of the limits imposed by camera hardware and other factors. We believe attack performance can always be improved by designing more sophisticated machine learning models with more parameters, increasing the size and diversity of the training dataset, etc. Further, machine learning recognition is likely to face over-fitting and generalizability problems in webcam peeking due to highly varying personal environment conditions. Thus, we believe the limits posed by a machine learning recognition back end are subject to very large variances and require dedicated future work to quantify.
A certain level of bias was introduced in the user study by informing the participants of the study's purpose. We envision that a future study may conduct a real-world validation of this attack by performing it without participants' awareness while carefully following ethical regulations. Alternatively, public videos on social media could be analyzed to investigate how often such information leakage occurs. A future study could also systematically interview professionals in different types of businesses to explore information leakage conditions, frequencies, and concerns. Contextual factors and user attitudes in real-world situations complement this work's focus and are worth investigating in future research.
Authors:
(1) Yan Long, Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, USA ([email protected]);
(2) Chen Yan, College of Electrical Engineering, Zhejiang University, Hangzhou, China ([email protected]);
(3) Shilin Xiao, College of Electrical Engineering, Zhejiang University, Hangzhou, China ([email protected]);
(4) Shivan Prasad, Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, USA ([email protected]);
(5) Wenyuan Xu, College of Electrical Engineering, Zhejiang University, Hangzhou, China ([email protected]);
(6) Kevin Fu, Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, USA ([email protected]).
This paper is available on arxiv under ATTRIBUTION-NONCOMMERCIAL-NODERIVS 4.0 INTERNATIONAL license.
[5] Details and open-source code of this prototype implementation can be found at https://github.com/longyan97/EyeglassFilter.