Information-Theoretic Approach to Prioritize Data for Post-Disaster Survey

With the popularity of sensor-equipped mobile devices, huge amounts of multimedia data (e.g., text, images, audio, and video) are being generated at an unprecedented scale. Harnessing this data for humanitarian efforts, such as disaster response, has recently become popular. In disaster response, data triage – prioritizing data collection and data analysis – is a crucial phase because the generated data, produced either by humans or by sensors, is not only large in volume but also noisy.

A popular approach is to prioritize the data based on its importance (i.e., the urgency of having the data and of further understanding it). The important data is then analyzed first, either by humans or by machines, to make sense of it. An alternative approach that we proposed in our recent study [1] is to prioritize the data based on the uncertainty of whether or not the data would enhance situational awareness during a disaster, quantified by information entropy. The rationale for using uncertainty as the measure of priority is that if the probability of a bridge being damaged is 99%, we are already nearly certain about this event and thus unwilling to allocate resources (i.e., human analysts) to interpret the data (e.g., satellite images) associated with the bridge. Instead, it makes more sense to spend resources analyzing the data we are uncertain about, e.g., the image of a bridge with a 50% chance of being damaged.
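A minimal sketch of this idea in Python (the function name and the damage probabilities below are hypothetical, not from the paper): the binary entropy of an event peaks at a 50% probability and vanishes near 0% or 100%, so sorting data items by entropy puts the most uncertain ones first.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a Bernoulli event with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical damage probabilities for three bridges.
bridges = {"bridge_A": 0.99, "bridge_B": 0.50, "bridge_C": 0.80}

# Survey the most uncertain bridges first (highest entropy).
by_priority = sorted(bridges, key=lambda b: binary_entropy(bridges[b]), reverse=True)
print(by_priority)  # bridge_B (p = 0.50) first, bridge_A (p = 0.99) last
```

Assessing `bridge_B` first removes a full bit of uncertainty, whereas assessing `bridge_A` removes almost none.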

In [1], we proposed a framework to assess post-disaster data by giving higher priority to the data that carries more information (i.e., larger uncertainty). By assessing the high-priority data first, the framework quickly reduces the total uncertainty about the disaster over time. We evaluated the framework in a case study of inspecting bridges after the 2001 Nisqually earthquake. The results show the superiority of our framework over the current surveying method.

[1] Sasan Tavakkol, Hien To, Seon Ho Kim, Patrick Lynett, and Cyrus Shahabi, "An Entropy-based Framework for Efficient Post-disaster Assessment Based on Crowdsourced Data," The 2nd Workshop on Emergency Management using GIS (EM-GIS 2016), San Francisco, CA, USA, October 2016.

Differentially Private Publication of Location Entropy

Location entropy (LE) is a prominent metric for measuring the popularity of various locations (e.g., points of interest). It has applications in many areas of research, including multi-agent systems, wireless sensor networks, geosocial networks, personalized web search, image retrieval, and spatial crowdsourcing. Location entropy captures the intrinsic diversity of a location without necessarily looking at its functionality (e.g., is it a coffee shop or a private home? an airport terminal? a park or a museum?). To illustrate LE, the figure below shows two locations with the same number of users and the same number of visits. Which one do you think is more popular? Intuitively, the second location is more popular because all users frequently visit it, as opposed to the first location, where the black user makes most of the visits.

[Figure (Picture1.png): two locations with the same number of users and visits but different visit distributions]
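The intuition above is exactly what LE formalizes: the Shannon entropy of the distribution of a location's visits over its users. A minimal sketch, using hypothetical per-user visit counts for two locations like those in the figure (the user names and counts are illustrative, not from the paper):

```python
import math

def location_entropy(visits: dict) -> float:
    """Shannon entropy (in bits) of the distribution of visits over users."""
    total = sum(visits.values())
    return -sum((c / total) * math.log2(c / total)
                for c in visits.values() if c > 0)

# Two locations: same 3 users, same 9 total visits, different distributions.
skewed  = {"black": 7, "red": 1, "blue": 1}   # one user dominates the visits
uniform = {"black": 3, "red": 3, "blue": 3}   # all users visit equally often

print(location_entropy(skewed))   # lower entropy
print(location_entropy(uniform))  # higher entropy: more diverse, "more popular"
```

The uniform location attains the maximum possible entropy for three users, log2(3) ≈ 1.58 bits, while the skewed one scores noticeably lower.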

Current solutions for computing LE (or location popularity in general) require full access to users' past visits to locations, which raises serious privacy concerns. Thus, in our recent study [1], we proposed a set of techniques based on differential privacy for publishing location entropy computed from raw location-visit data without violating users' location privacy. Our techniques would enable data aggregators such as Google, Microsoft, and Apple to share their data with many industries and organizations (e.g., academia, the CDC) in the form of aggregated or processed location data for the greater good, e.g., research or preventing the spread of disease.

Furthermore, we envision data aggregators such as Google using location entropy in two ways. First, location entropy can serve as a metric for finding popular locations in location data. Our techniques make it possible to publish such high-entropy locations to third parties without violating users' location privacy; e.g., Niantic could use the published popular locations as PokeStops in the Pokemon Go game. Second, Google may use location entropy as a measure of privacy, revealing a location on a user's behalf only if the location is a popular place (as quantified by location entropy). Instead of directly using location entropy, our techniques add noise to its actual value so that an attacker cannot reliably learn whether or not a particular user is present in the original data.
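As a rough illustration of this noise-adding step, the sketch below uses the standard Laplace mechanism with an assumed sensitivity bound; the paper's actual mechanisms and sensitivity analysis (e.g., how a single user's contribution to a location's entropy is bounded) are more involved, and the function names and parameter values here are hypothetical.

```python
import math
import random

def laplace_sample(scale: float) -> float:
    """Sample from a zero-mean Laplace distribution via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def publish_noisy_entropy(true_le: float, epsilon: float, sensitivity: float) -> float:
    """Laplace mechanism: perturb the true entropy with noise of scale
    sensitivity / epsilon, yielding an epsilon-differentially private value."""
    return true_le + laplace_sample(sensitivity / epsilon)

# Assumed bound on how much one user's presence can change the entropy.
SENSITIVITY = 0.3

# Publish a noisy version of a location's true entropy (e.g., 1.58 bits).
print(publish_noisy_entropy(true_le=1.58, epsilon=0.5, sensitivity=SENSITIVITY))
```

The published value stays close to the true entropy on average, but any single run is noisy enough that an attacker cannot confidently infer one user's presence from it.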

[1] Hien To, Kien Nguyen, and Cyrus Shahabi, "Differentially Private Publication of Location Entropy," In Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL 2016), San Francisco, CA, USA, October 31 – November 3, 2016.