Research Challenges in Spatial crowdsourcing

This post tries to answer 2 questions.

  1. Briefly define spatial crowdsourcing (i.e., mobile crowdsourcing)
  2. Research challenges & proposed solutions

What is Spatial Crowdsourcing?

I am sure many of you know crowdsourcing, applications like Amazon Mechanical Turk, where you have a bundle of tasks that could be difficult for the computer to solve. So you give them to people and the crowd can do the tasks for you, let say one dollar per task. Spatial crowdsourcing is similar to crowdsourcing, except people must travel to the task location in order to perform it. For example, going to a specific place to take a picture or capture a video. The question is why suddenly spatial crowdsourcing become popular The reason is that 1) smartphones are now so popular 2) there are many sensors within them and 3) the network quality is getting higher, like 4G LTE. With those advancements, you now can now take a video and upload to server anywhere. Those three also enable us to develop SC applications listed below.


Examples of Spatial Crowdsourcing Applications

For example, we can report traffic condition on the way with Waze while OpenSignal helps us to collects data on wireless coverage and tell us the best carrier network. We actually build a system for collecting video data in our lab. Its name is MediaQ & iRain.

More information about spatial crowdsourcing can be found here.

Research challenges in spatial crowdsourcing can be found [24].

Research challenges & proposed solutions

As SC is a special case of crowdsourcing where workers need to physically present at task locations in order to perform the task, research challenges in crowdsourcing are also applicable in SC. Those challenges includes but not limited to 1) worker quality [1,2,3,4], 2) trust issue [5,6,7] (workers may not be trustful), 3) how much to pay [8,9], crowdsourcing database [12,13] and user studies [10,11]. However, I here focus on challenges that are unique to SC, particularly dynamism, privacy and trust.

The first one is the dynamic nature of the workers and tasks because they can come and go anytime. For one time instance is optimal, if we know everything. However, we don’t know the future.

With server-assigned SC [14], the objective is to maximize the number of assigned tasks (i.e., utility) given task set and worker set, referred to as Maximum Task Assignment (MTA). Note that not all the tasks are assigned because of the spatial constraint between workers and tasks. Constraint 1: A worker is likely to perform tasks in his/her proximity, defined as a working region. A working region may cover multiple tasks and a task may be covered by multiple working regions. Constraint 2: each worker may also have a capacity on the maximum number of tasks his/her can perform in a period of time (i.e., one-time instance), referred to as maxT. With the two constraints, in [14] MTA is reducible to max flow problem, which can be solved optimally in polynomial time, referred to as BASIC.

In [14], MTA is also extended to multiple time instances, where the server continuously receive workers and tasks and the goal is to maximize the number of assigned tasks in multiple time instances. The challenge is that the server has no idea about the future tasks and workers; therefore, the global optimum is not possible. However, the BASIC approach can be significantly improved given the (global or partial) knowledge of worker density. The intuition is that a task is likely to be performed in the future if it is in worker dense area and vice-versa. Therefore, we assign a higher priority to the tasks in worker sparse area, referred to as density-based heuristic. Another distance-based heuristic tries to minimize the average travel distance of the workers by assigning a higher priority to tasks whose covering workers are close. Both density-based heuristic and distance-based heuristic improves the BASIC algorithm in terms of #assigned tasks and the average travel distance, respectively.

[15] is an extension of [14], in which multiple kinds of spatial tasks require different skill-sets the workers to be performed best, e.g., the task of taking a high-quality picture is assigned to a photographer or a chef can be a good candidate to perform the task of rating a particular dish in a restaurant. With this assumption, to maximize overall task assignment, we define maximum score assignment (MSA) problem and show that MTA is a special case of MSA. MSA (one instance) is reducible to the maximum weighted bipartite matching problem (BASIC) while the heuristics density-based heuristic and distance-based heuristic can also be applied to improve the BASIC solution.

[16] is another extension of [14], in which we define the notion of complex tasks consisting of some spatial sub-tasks. A complex task require assignments of all of its sub-tasks! For example, professional works are often complicated and could be divided into smaller and atomic sub-tasks, e.g., one wants to obtain pictures of 5 specific buildings and he requires pictures of all of those buildings; none of them is allowed to be missed, otherwise the capturing of all other buildings becomes useless. With this problem setup, the problem of maximizing complex task assignment (MCTA) can be transformed to max flow problem by adding dummy nodes. Similarly, max flow algorithm guarantees optimal solution with on time instance.

The second challenge is location privacy of the workers. Current solution requires the workers to send their locations to a centralized server. And someone with access to this server can infer sensitive information, such as their health status or religious views. For example, if someone is visiting cancer center someone can infer he probably has cancer.

Location privacy is one of the major impediments that may hinder workers from participation in spatial crowdsourcing systems. In [17], we focus on protecting the location privacy of the workers, our goal is to develop an efficient framework for worker-selected spatial crowdsourcing that enables the participation of the crowdsourcing users (i.e., requesters and workers) without compromising their privacy (i.e., the server may be not trustful). The framework enables the server to efficiently assign tasks to workers without knowing workers’ locations. The experiment results show that the cost of privacy (communication cost and travel cost) is practical, particularly the increase in travel cost is minimal, which is important and somewhat surprising. A demonstration of this study is showcased in [18], which provides a toolbox for multiple parties, cell service provider, SC administrator, end users to evaluate the impact of different factors in the private SC framework. Recently, in [21], we published a journal version that subsumes [17], in which we deal with dynamic datasets (i.e., workers moves) and the trustworthiness of the assigned tasks (i.e., redundant task assignment).

There are two different perspectives of the privacy problem in [17], from the data owner (cell service provider – CSP) and adversaries (crowdsourcing service provider – SC-server). From the CSP perspective, the framework enables CSP to release their customer data to SC companies with strong privacy protection. From the SC-server’s point of view, the framework enables SC companies to use sanitized data to provide crowdsourcing services for end users (i.e., workers/requesters).

The last challenge is the issue of trust. That is how to assign tasks to workers that guarantee quality of the tasks

In [19], a trustworthy SC framework is proposed. The main idea is to assign a task to multiple workers such that the aggregated reputation of the assigned workers satisfy the confidence level of the task.

Future directions

The study in [17] can be extended to protecting both privacy of both workers and requesters at the same time. In fact, in a study of TaskRabbit – a real-world SC app, we show that adversary can infer both requesters’ homes and worker’s trajectory. This is SC-related problem because requesters’ homes and workers’ trajectories are closely related. There are existing studies on protecting either locations or trajectory, but there is no work on combining them. Having a framework to release both data would be interesting.

Note that our privacy framework is minimal and generic. That is, for each task, the goal is to find a geocast region that contains sufficient users. This framework would be applicable for other applications. For example, Starbucks may want to send ads (e.g., deal of the day, discounts) to all nearby customers via push notification without knowing their locations.

There are other directions that we have not explored yet, including 1) how do we know worker actually went to the task location (spatial assurance) 2) how much to pay the workers? 3) methods to guarantee that the quality of the task responses.

[1] 2010 – Corroborating information from disagreeing views

[2] 2010 – Learning From Crowds

[3] 2010 – Quality Management on Amazon Mechanical Turk

[4] 2011 – How Much Spam Can You Take? An Analysis of Crowdsourcing Results to Increase Accuracy

[5] 2011  – Truth in Crowdsourcing

[6] 2011 – Iterative Learning for Reliable Crowdsourcing Systems

[7] 2012 – Whom to ask? jury selection for decision-making tasks on micro-blog services

[8] 2009 – Financial Incentives and the “Performance of Crowds”

[9] 2010 – The Labor Economics of Paid Crowdsourcing

[10] 2008 – Crowdsourcing User Studies With Mechanical Turk

[11] 2010 – Running experiments on MTurk

[12] 2011 – CrowdDB: Answering Queries with Crowdsourcing

[13] 2011 – Human-powered sorts and joins.

[14] Leyla Kazemi and Cyrus Shahabi, GeoCrowd: Enabling Query Answering with Spatial Crowdsourcing, ACM SIGSPATIAL GIS, Redondo Beach, CA, November 2012

[15] Hien To, Leyla Kazemi, and Cyrus Shahabi, A Server-Assigned Spatial Crowdsourcing Framework, Journal ACM Transactions on Spatial Algorithms and Systems , Volume 1 Issue 1, Article No. 2, New York, NY, USA, August 2015

[16] Hung Dang, Tuan Nguyen, and Hien To, Maximum Complex Task Assignment: Towards Tasks Correlation in Spatial Crowdsourcing, in Proceedings of International Conference on Information Integration and Web-based Applications & Services, Vienna, Austria , 2-4 December 2013

[17] Hien To, Gabriel Ghinita, and Cyrus Shahabi, A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing, In Proceedings of the 40th International Conference on Very Large Data Bases, Pages 919-930, Hangzhou, China, September 2014

[18] Hien To, Gabriel Ghinita, and Cyrus Shahabi , PrivGeoCrowd: A Toolbox for Studying Private Spatial Crowdsourcing, 2015 IEEE 31st International Conference on Data Engineering (ICDE), (demonstration), Page 1404 – 1407, Korea, 13-17 April 2015

[19] Leyla Kazemi, Cyrus Shahabi, and Lei Chen, GeoTruCrowd: Trustworthy Query Answering with Spatial Crowdsourcing, International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2013), Orlando, Florida , November 5-8, 2013

[20] Hien To, Mohammad Asghari, Dingxiong Deng, Cyrus Shahabi. SCAWG: A Toolbox for Generating Synthetic Workload for Spatial Crowdsourcing. In proceeding of International Workshop on Benchmarks for Ubiquitous Crowdsourcing: Metrics, Methodologies, and Datasets (CROWDBENCH 2016), Sydney, Australia, March 14-18, 2016

[21] Hien To, Liyue Fan, Luan Tran, Cyrus Shahabi. Real-Time Task Assignment in Hyper-Local Spatial Crowdsourcing under Budget Constraints. In proceeding of IEEE International Conference on Pervasive Computing and Communications (PerCom 2016), Sydney, Australia, March 14-18, 2016

[22] Hien To, Gabriel Ghinita, Liyue Fan, and Cyrus Shahabi, Differentially Private Location Protection for Worker Datasets in Spatial Crowdsourcing, IEEE Transactions on Mobile Computing, June 29, 2016

[23] Hien To, Ruben Geraldes, Cyrus Shahabi, Seon Ho Kim, and Helmut Prendinger, An Empirical Study of Workers’ Behavior in Spatial Crowdsourcing, Third International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data, DOI:, San Francisco, CA, USA, June 26 – July 1, 2016

[24] Hien To, Task Assignment in Spatial Crowdsourcing: Challenges and Approaches, The 3rd ACM SIGSPATIAL 2016 PhD Symposium, San Francisco, CA, USA, October 31 – November 3, 2016