Face Recognition Study - FAQ

 

Faces of Facebook: Privacy in the Age of Augmented Reality


Alessandro Acquisti (Heinz College, Carnegie Mellon University)

Ralph Gross (Heinz College, Carnegie Mellon University)

Fred Stutzman (Heinz College, Carnegie Mellon University)



DRAFT slides: Faces of Facebook: Privacy in the Age of Augmented Reality. Complete results to be presented at BlackHat Las Vegas, August 4, 2011



The authors gratefully acknowledge research support from the National Science Foundation under grant # 0713361, from the US Army Research Office under contract # DAAD190210389, from the Carnegie Mellon Berkman Fund, from the Heinz College, and from Carnegie Mellon CyLab. The authors thank Nithin Betegeri, Aravind Bharadwaj, Varun Gandhi, Markus Huber, Aaron Jaech, Ganesh Raj ManickaRaju, Rahul Pandey, Nithin Reddy, and Venkata Tumuluri for outstanding research assistance, and Laura Brandimarte, Samita Dhanasobhon, Nitin Grewal, Anuj Gupta, Hazel Diana Mary, Snigdha Nayak, Soumya Srivastava, Thejas Varier, and Narayana Venkatesh for additional assistance.



Please note: this is a DRAFT document. We will keep adding Q&As if we receive or read relevant questions about the study in comments and emails. Please bear with us as we add content and work towards a final, clean version of this FAQ. Thank you!


Summary


We investigated the feasibility of combining publicly available Web 2.0 data with off-the-shelf face recognition software for the purpose of large-scale, automated individual re-identification. Two experiments demonstrated the ability to identify strangers online (on a dating site where individuals protect their identities by using pseudonyms) and offline (in a public space), based on photos made publicly available on a social network site. A third proof-of-concept experiment illustrated the ability to infer strangers' personal or sensitive information (their interests and Social Security numbers) from their faces, by combining face recognition, data mining algorithms, and statistical re-identification techniques. The results highlight the implications of the inevitable convergence of face recognition technology and increasing online self-disclosures, and the emergence of "personally predictable" information. They raise questions about the future of privacy in an "augmented" reality world in which online and offline data will seamlessly blend.

General questions


Q. What is this research about?


We studied the consequences and implications of the convergence of three technologies: face recognition, cloud computing, and online social networks. Specifically, we investigated whether the combination of publicly available Web 2.0 data and off-the-shelf face recognition software may allow large-scale, automated, end-user individual re-identification. We identified strangers online (across different online services: Experiment 1), offline (in the physical world: Experiment 2), and then inferred additional, sensitive information about them, combining face recognition and data mining, thus blending together online and offline data (Experiment 3). Finally, we developed a mobile phone application to demonstrate the ability to recognize and then predict someone's sensitive personal data directly from their face in real time.



Q. What were the results of Experiment 1?


Experiment 1 was about online-to-online re-identification. We took unidentified profile photos from a popular dating site (where people use pseudonyms to protect their privacy) and used face recognition to compare them with identified photos from social networking sites. Specifically, we used only the parts of Facebook profiles that can be publicly accessed via a search engine; we did not even log on to the network itself. In this way, we re-identified a statistically significant proportion of the dating site's members.
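To make the comparison step concrete, here is a minimal sketch of how a pseudonymous "probe" photo might be matched against a gallery of identified photos. It is not the software or method used in the study: it assumes that some face recognizer has already turned each photo into a numeric feature vector, and the vectors, the labels, the 0.9 threshold, and the function names below are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(probe, gallery, threshold=0.9):
    """Return (label, score) for the most similar gallery entry.

    The label is None when even the best score falls below the threshold,
    i.e. when the probe should be treated as "not re-identified".
    """
    best_label, best_score = None, -1.0
    for label, vector in gallery.items():
        score = cosine_similarity(probe, vector)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Hypothetical feature vectors standing in for identified profile photos.
gallery = {
    "identified_profile_a": [0.9, 0.1, 0.3],
    "identified_profile_b": [0.1, 0.8, 0.5],
}

# Hypothetical probes standing in for pseudonymous dating-site photos.
probe_close = [0.88, 0.12, 0.31]   # nearly parallel to profile_a
probe_far = [0.5, 0.5, -0.7]       # resembles neither gallery entry

print(best_match(probe_close, gallery))
print(best_match(probe_far, gallery))
```

The threshold is the important design choice in a sketch like this: a lower value re-identifies more probes but produces more false matches, so any reported re-identification rate only makes sense together with the threshold used.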



Q. What were the results of Experiment 2?


Experiment 2 was about offline-to-online re-identification. It was conceptually similar to Experiment 1, but we focused on re-identifying students on the campus of a North American college. We took images of them with a webcam and then compared those shots to images from Facebook profiles. Using this approach, we re-identified about one third of the subjects in the experiment.



Q. What were the results of Experiment 3, and how do they relate to "Augmented Reality"?


We use the term augmented reality in a slightly extended sense, to refer to the merging of online and offline data that new technologies make possible. If an individual's face in the street can be identified using a face recognizer and identified images from social network sites such as Facebook or LinkedIn, then it becomes possible not just to identify that individual, but also to infer additional, and more sensitive, information about her, once her name has been (probabilistically) inferred. In our third experiment, as a proof-of-concept, we predicted the interests and Social Security numbers of some of the participants in the second experiment. We did so by combining face recognition with the algorithms we developed in 2009 to predict SSNs from public data. SSNs were just one example of what it is possible to predict about a person: conceptually, the goal of Experiment 3 was to show that it is possible to start from an anonymous face in the street, and end up with very sensitive information about that person, in a process of data "accretion." In the context of our experiment, it is this blending of online and offline data - made possible by the convergence of face recognition, social networks, data mining, and cloud computing - that we refer to as augmented reality.
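Because the name is inferred only probabilistically, every later inference in the "accretion" chain inherits that uncertainty. The sketch below shows how confidence compounds across steps under the simplifying assumption that the steps succeed independently. That assumption, the function name, and both probabilities are hypothetical; they are not figures or methods from the study.

```python
def chained_confidence(step_probabilities):
    """Probability that every step in a chain of inferences is correct,
    assuming (for illustration) that the steps succeed independently."""
    result = 1.0
    for p in step_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
        result *= p
    return result


# Hypothetical values, for illustration only.
face_to_name = 0.5        # chance the face is matched to the right name
name_to_attribute = 0.4   # chance the attribute is right, given the right name

overall = chained_confidence([face_to_name, name_to_attribute])
print(f"overall confidence: {overall:.2f}")
```

Under this assumption the end-to-end confidence can never exceed that of the weakest step, which is why the accuracy of the initial face match bounds everything inferred afterwards.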



Q. Are these results scalable?


The capabilities of automated face recognition *today* are still limited - but keep improving. Although our studies were completed in the "wild" (that is, with real social network profile data, webcam shots taken in public, and so forth), they are nevertheless the output of a controlled set of experiments. The results of a controlled experiment do not necessarily translate to reality with the same level of accuracy. However, considering the technological trends in cloud computing, face recognition accuracy, and online self-disclosures, it is hard not to conclude that what we present today as a proof-of-concept may tomorrow become as common as everyday text-based search engine queries.



Q. What are the implications of this study?


Our study is less about face recognition and more about the privacy concerns raised by the convergence of various technologies. There is no obvious answer or solution to the privacy concerns raised by widely available face recognition and identified (or identifiable) facial images. Google's Eric Schmidt observed that, in the future, young individuals may be entitled to change their names to disown youthful improprieties. It is much harder, however, to change someone's face. Beyond adapting to a world where every stranger in the street could quite accurately predict sensitive information about you (such as your SSN, but also your credit score or sexual orientation), we need to think about policy solutions that can balance the benefits and risks of peer-based face recognition. Self-regulation and opt-in mechanisms are not going to work, since the results we presented are based on publicly available information.



Q. Face recognition has been around for a long while, and Web 2.0 companies have deployed it in their tools/applications. What is new about this study?


Indeed, in recent times, Google has acquired Neven Vision, Riya, and PittPatt and deployed face recognition into Picasa. Apple has acquired Polar Rose, and deployed face recognition into iPhoto. Facebook has licensed Face.com to enable automated tagging. So far, however, these end-user Web 2.0 applications are limited in scope: they are constrained by, and within, the boundaries of the service in which they are deployed. Our focus, instead, was on examining whether the convergence of publicly available Web 2.0 data, cheap cloud computing, data mining, and off-the-shelf face recognition is bringing us closer to a world where anyone may run face recognition on anyone else, online and offline - and then infer additional, sensitive data about the target subject, starting merely from one anonymous piece of information about her: the face.



Q. Who funded your research?


The National Science Foundation (under Grant 0713361) and the U.S. Army Research Office (under Contract DAAD190210389, through Carnegie Mellon's CyLab). We also received support from the Carnegie Mellon Berkman Fund, Heinz College, and CyLab.



Q. Were the tests IRB approved?


Yes, they were approved. As in our previous studies, no SSNs (or faces) were harmed during the writing of this paper.