Book recap – Don’t make me think

“Don’t Make Me Think” is a great book for acquiring a basic understanding of UI design. Here I recap my favorite points from the book.

  1. we don’t read web pages, we scan them
  2. we don’t try to figure out how things work, we muddle through
  3. things that are related logically should also be related visually
  4. web conventions/formats are your friends
  5. make it obvious what is clickable
  6. wide vs. deep site hierarchies are a tradeoff, but generally users don’t mind a lot of clicks as long as each click is painless and they have continued confidence that they’re on the right track
  7. eliminate instructions entirely by making everything self-explanatory, or as close to it as possible
  8. web navigation is similar to how people find their way around in stores
  9. web navigation conventions (search, home, bookmarks) are useful
    1. persistent navigation includes: site ID, home, search, utilities (e.g., How to Buy), sections (e.g., News, Products, Downloads)
    2. spell out the scope of search to avoid confusion, e.g., Search for a Book. You can also provide search options
  10. every page needs a name; the name needs to be in the right place, needs to be prominent, and needs to match what you clicked
  11. show “you are here” with a marker or with breadcrumbs (which show the path from the homepage to where you are); breadcrumbs should be at the top
  12. tabs are good for large sites because they are self-evident, hard to miss, clickable…
  13. Trunk test
    • what site is this? site id
    • what page am I on? page name
    • what are the major sections of this site? sections
    • what are my options at this level? local navigation
    • where am I in the scheme of things? you are here
    • how can I search?
  14. Designer vs. developer arguments
    • Moving the discussion away from the realm of what’s right or wrong into the realm of what works or doesn’t work
  15. What do you test?
    1. What do they like/love?
    2. How does it fit into their lives?
    3. What works well?
    4. How hard is it to do key tasks?
    5. Do they get the point of the site?
    6. Does it seem like what they need?
    7. Do they get the navigation?
    8. Can they guess where to find things?
    9. Pay more attention to actions and explanations than opinions as opinions during user tests are notoriously unreliable
  16. Typical test problems
    1. Users are unclear on the concept
    2. The words they are looking for aren’t there
    3. There is too much going on

Research Challenges in Spatial Crowdsourcing

This post tries to answer 2 questions.

  1. Briefly define spatial crowdsourcing (i.e., mobile crowdsourcing)
  2. Research challenges & proposed solutions

What is Spatial Crowdsourcing?

I am sure many of you know crowdsourcing through applications like Amazon Mechanical Turk, where you have a bundle of tasks that may be difficult for a computer to solve. So you give them to people, and the crowd does the tasks for you for, let’s say, one dollar per task. Spatial crowdsourcing is similar to crowdsourcing, except that people must travel to the task location in order to perform the task, for example, going to a specific place to take a picture or capture a video. Why has spatial crowdsourcing suddenly become popular? The reasons are that 1) smartphones are now ubiquitous, 2) they carry many sensors, and 3) network quality keeps improving, as with 4G LTE. With those advances, you can now take a video and upload it to a server from anywhere. These three factors also enable us to develop the SC applications listed below.


Examples of Spatial Crowdsourcing Applications

For example, we can report traffic conditions along the way with Waze, while OpenSignal collects data on wireless coverage and tells us which carrier network is best. In our lab, we have also built systems for collecting video data, named MediaQ & iRain.

More information about spatial crowdsourcing can be found here.

Research challenges in spatial crowdsourcing can be found in [24].

Research challenges & proposed solutions

As SC is a special case of crowdsourcing in which workers need to be physically present at task locations in order to perform the tasks, research challenges in crowdsourcing also apply to SC. Those challenges include, but are not limited to, 1) worker quality [1,2,3,4], 2) trust [5,6,7] (workers may not be trustworthy), 3) how much to pay [8,9], 4) crowdsourcing databases [12,13], and 5) user studies [10,11]. Here, however, I focus on challenges that are unique to SC, particularly dynamism, privacy and trust.

The first one is the dynamic nature of the workers and tasks, because they can come and go at any time. For a single time instance, where we know all the workers and tasks, the assignment can be solved optimally. However, we don’t know the future.

With server-assigned SC [14], the objective is to maximize the number of assigned tasks (i.e., utility) given a task set and a worker set, referred to as Maximum Task Assignment (MTA). Note that not all tasks can be assigned because of the spatial constraints between workers and tasks. Constraint 1: a worker is likely to perform only tasks in his/her proximity, defined as a working region. A working region may cover multiple tasks, and a task may be covered by multiple working regions. Constraint 2: each worker may also have a capacity, the maximum number of tasks he/she can perform in a period of time (i.e., one time instance), referred to as maxT. With the two constraints, [14] shows that MTA is reducible to the max flow problem, which can be solved optimally in polynomial time; this approach is referred to as BASIC.
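The reduction above can be illustrated with a small sketch. This is not the implementation from [14]; it assumes circular working regions, Euclidean distance, and a hypothetical input layout (`workers` maps an id to `(x, y, radius, max_t)`, `tasks` maps an id to `(x, y)`). The flow network has a source connected to each worker with capacity maxT, a unit edge from a worker to every task inside his/her working region, and a unit edge from each task to the sink. Max flow is computed with plain BFS augmenting paths (Edmonds-Karp).

```python
from collections import defaultdict, deque
from math import hypot


def max_task_assignment(workers, tasks):
    """Return a sorted list of (worker_id, task_id) pairs of maximum size.

    workers: {id: (x, y, radius, max_t)}  -- hypothetical layout
    tasks:   {id: (x, y)}                 -- hypothetical layout
    """
    src, dst = "src", "dst"
    cap = defaultdict(lambda: defaultdict(int))  # residual capacities
    edges = []  # worker -> task edges allowed by the spatial constraint

    for w, (wx, wy, radius, max_t) in workers.items():
        cap[src][("w", w)] = max_t  # Constraint 2: worker capacity maxT
        for t, (tx, ty) in tasks.items():
            if hypot(wx - tx, wy - ty) <= radius:  # Constraint 1: working region
                cap[("w", w)][("t", t)] = 1
                edges.append((w, t))
    for t in tasks:
        cap[("t", t)][dst] = 1  # each task is assigned at most once

    def find_path():
        """BFS for an augmenting path; returns a parent map or None."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    if v == dst:
                        return parent
                    queue.append(v)
        return None

    while True:
        parent = find_path()
        if parent is None:
            break
        # Every path ends on a unit task -> sink edge, so the bottleneck is 1.
        v = dst
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u

    # A saturated worker -> task edge carries one unit of flow: an assignment.
    return sorted((w, t) for w, t in edges if cap[("w", w)][("t", t)] == 0)


if __name__ == "__main__":
    workers = {"w1": (0.0, 0.0, 2.0, 1), "w2": (0.0, 3.0, 1.5, 1)}
    tasks = {"t1": (0.0, 2.0), "t2": (2.0, 0.0), "t3": (10.0, 10.0)}
    print(max_task_assignment(workers, tasks))
```

In the example data, task t3 lies outside every working region, so it stays unassigned regardless of the algorithm, which mirrors the note that not all tasks can be assigned.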

In [14], MTA is also extended to multiple time instances, where the server continuously receives workers and tasks and the goal is to maximize the number of assigned tasks over all time instances. The challenge is that the server has no knowledge of future tasks and workers; therefore, the global optimum cannot be guaranteed. However, the BASIC approach can be significantly improved given (global or partial) knowledge of worker density. The intuition is that a task is likely to be performed in the future if it is in a worker-dense area, and vice versa. Therefore, we assign a higher priority to tasks in worker-sparse areas, referred to as the density-based heuristic. Another, distance-based heuristic tries to minimize the average travel distance of the workers by assigning a higher priority to tasks whose covering workers are close. The density-based heuristic and the distance-based heuristic improve the BASIC algorithm in terms of the number of assigned tasks and the average travel distance, respectively.
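A minimal sketch of the density-based intuition follows. It is an illustration only and makes a loud assumption: worker density around a task is approximated by the number of currently known workers whose working region covers the task, which may differ from the density measure used in [14]. The function name and input layout are hypothetical.

```python
from math import hypot


def prioritize_by_sparsity(tasks, workers):
    """Order task ids so that tasks in worker-sparse areas come first.

    tasks:   {id: (x, y)}          -- hypothetical layout
    workers: [(x, y, radius), ...] -- hypothetical layout
    Density proxy (an assumption): number of workers covering the task.
    """
    def covering_workers(task_id):
        tx, ty = tasks[task_id]
        return sum(1 for wx, wy, r in workers if hypot(wx - tx, wy - ty) <= r)

    # Fewer covering workers means a lower chance of being served later,
    # so such tasks get a higher priority (they appear earlier in the list).
    return sorted(tasks, key=covering_workers)


if __name__ == "__main__":
    tasks = {"a": (0.0, 0.0), "b": (5.0, 5.0)}
    workers = [(0.0, 0.0, 1.0), (0.5, 0.0, 1.0), (5.0, 5.0, 1.0)]
    print(prioritize_by_sparsity(tasks, workers))
```

A server could feed tasks to the assignment step in this order, or use the counts as edge costs, so that scarce opportunities in sparse areas are used before tasks that many future workers could still serve.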

[15] is an extension of [14] in which different kinds of spatial tasks are best performed by workers with different skill sets, e.g., the task of taking a high-quality picture is best assigned to a photographer, and a chef can be a good candidate for rating a particular dish in a restaurant. Under this assumption, to maximize the overall task assignment, we define the maximum score assignment (MSA) problem and show that MTA is a special case of MSA. MSA (for one time instance) is reducible to the maximum weighted bipartite matching problem (BASIC), while the density-based and distance-based heuristics can also be applied to improve the BASIC solution.

[16] is another extension of [14], in which we define the notion of complex tasks consisting of several spatial sub-tasks. A complex task requires all of its sub-tasks to be assigned. For example, professional work is often complicated and can be divided into smaller, atomic sub-tasks, e.g., someone wants pictures of 5 specific buildings and requires pictures of all of them; none may be missed, otherwise the pictures of the other buildings become useless. With this setup, the problem of maximizing complex task assignment (MCTA) can be transformed into a max flow problem by adding dummy nodes. Similarly, the max flow algorithm guarantees an optimal solution for one time instance.

The second challenge is the location privacy of the workers. Current solutions require the workers to send their locations to a centralized server, and someone with access to this server can infer sensitive information, such as their health status or religious views. For example, if someone visits a cancer center, an observer can infer that he or she probably has cancer.

Location privacy is one of the major impediments that may hinder workers from participating in spatial crowdsourcing systems. In [17], we focus on protecting the location privacy of the workers. Our goal is to develop an efficient framework for worker-selected spatial crowdsourcing that enables the participation of crowdsourcing users (i.e., requesters and workers) without compromising their privacy (i.e., the server may not be trustworthy). The framework enables the server to efficiently assign tasks to workers without knowing the workers’ locations. The experimental results show that the cost of privacy (communication cost and travel cost) is practical; in particular, the increase in travel cost is minimal, which is important and somewhat surprising. A demonstration of this study is showcased in [18], which provides a toolbox for multiple parties (cell service provider, SC administrator, end users) to evaluate the impact of different factors in the private SC framework. Recently, in [22], we published a journal version that subsumes [17], in which we deal with dynamic datasets (i.e., workers move) and the trustworthiness of the assigned tasks (i.e., redundant task assignment).

The privacy problem in [17] can be viewed from two perspectives: that of the data owner (cell service provider – CSP) and that of the adversary (crowdsourcing service provider – SC-server). From the CSP’s perspective, the framework enables the CSP to release its customer data to SC companies with strong privacy protection. From the SC-server’s point of view, the framework enables SC companies to use sanitized data to provide crowdsourcing services for end users (i.e., workers/requesters).

The last challenge is the issue of trust, that is, how to assign tasks to workers in a way that guarantees the quality of the task results.

In [19], a trustworthy SC framework is proposed. The main idea is to assign a task to multiple workers such that the aggregated reputation of the assigned workers satisfies the confidence level of the task.
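As an illustration of this idea, the sketch below checks whether a group of workers meets a task’s confidence level. It assumes, and this is an assumption rather than a restatement of [19], that each worker’s reputation is the independent probability of answering correctly and that the aggregated reputation is the probability that a strict majority of the assigned workers answer correctly. Function names are hypothetical.

```python
def majority_correct_probability(reputations):
    """Probability that a strict majority of independent workers is correct.

    reputations: list of per-worker probabilities of a correct answer
    (assumed independent; this aggregation rule is an assumption).
    """
    # dist[k] = probability that exactly k of the workers seen so far are correct
    dist = [1.0]
    for p in reputations:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)   # this worker is wrong
            nxt[k + 1] += q * p       # this worker is correct
        dist = nxt
    majority = len(reputations) // 2 + 1
    return sum(dist[majority:])


def satisfies_confidence(reputations, confidence):
    """True if the aggregated reputation reaches the task's confidence level."""
    return majority_correct_probability(reputations) >= confidence


if __name__ == "__main__":
    print(majority_correct_probability([0.8, 0.7, 0.7]))
    print(satisfies_confidence([0.8], 0.82))
    print(satisfies_confidence([0.8, 0.7, 0.7], 0.82))
```

Under these assumptions a single worker with reputation 0.8 does not reach a confidence level of 0.82, while adding two workers with reputation 0.7 does, which is why redundant assignment can raise the quality of the result.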

Future directions

The study in [17] can be extended to protect the privacy of both workers and requesters at the same time. In fact, in a study of TaskRabbit, a real-world SC app, we show that an adversary can infer both requesters’ homes and workers’ trajectories. This is an SC-specific problem because requesters’ homes and workers’ trajectories are closely related. There are existing studies on protecting either locations or trajectories, but no work combines them. A framework for releasing both kinds of data would be interesting.

Note that our privacy framework is minimal and generic. That is, for each task, the goal is to find a geocast region that contains sufficient users. This framework would be applicable to other applications. For example, Starbucks may want to send ads (e.g., deal of the day, discounts) to all nearby customers via push notifications without knowing their locations.

There are other directions that we have not explored yet, including 1) how do we know a worker actually went to the task location (spatial assurance)? 2) how much should we pay the workers? and 3) how can we guarantee the quality of the task responses?

[1] 2010 – Corroborating information from disagreeing views

[2] 2010 – Learning From Crowds

[3] 2010 – Quality Management on Amazon Mechanical Turk

[4] 2011 – How Much Spam Can You Take? An Analysis of Crowdsourcing Results to Increase Accuracy

[5] 2011 – Truth in Crowdsourcing

[6] 2011 – Iterative Learning for Reliable Crowdsourcing Systems

[7] 2012 – Whom to ask? jury selection for decision-making tasks on micro-blog services

[8] 2009 – Financial Incentives and the “Performance of Crowds”

[9] 2010 – The Labor Economics of Paid Crowdsourcing

[10] 2008 – Crowdsourcing User Studies With Mechanical Turk

[11] 2010 – Running experiments on MTurk

[12] 2011 – CrowdDB: Answering Queries with Crowdsourcing

[13] 2011 – Human-powered sorts and joins.

[14] Leyla Kazemi and Cyrus Shahabi, GeoCrowd: Enabling Query Answering with Spatial Crowdsourcing, ACM SIGSPATIAL GIS, Redondo Beach, CA, November 2012

[15] Hien To, Leyla Kazemi, and Cyrus Shahabi, A Server-Assigned Spatial Crowdsourcing Framework, Journal ACM Transactions on Spatial Algorithms and Systems , Volume 1 Issue 1, Article No. 2, New York, NY, USA, August 2015

[16] Hung Dang, Tuan Nguyen, and Hien To, Maximum Complex Task Assignment: Towards Tasks Correlation in Spatial Crowdsourcing, in Proceedings of International Conference on Information Integration and Web-based Applications & Services, Vienna, Austria , 2-4 December 2013

[17] Hien To, Gabriel Ghinita, and Cyrus Shahabi, A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing, In Proceedings of the 40th International Conference on Very Large Data Bases, Pages 919-930, Hangzhou, China, September 2014

[18] Hien To, Gabriel Ghinita, and Cyrus Shahabi , PrivGeoCrowd: A Toolbox for Studying Private Spatial Crowdsourcing, 2015 IEEE 31st International Conference on Data Engineering (ICDE), (demonstration), Page 1404 – 1407, Korea, 13-17 April 2015

[19] Leyla Kazemi, Cyrus Shahabi, and Lei Chen, GeoTruCrowd: Trustworthy Query Answering with Spatial Crowdsourcing, International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2013), Orlando, Florida , November 5-8, 2013

[20] Hien To, Mohammad Asghari, Dingxiong Deng, Cyrus Shahabi. SCAWG: A Toolbox for Generating Synthetic Workload for Spatial Crowdsourcing. In Proceedings of International Workshop on Benchmarks for Ubiquitous Crowdsourcing: Metrics, Methodologies, and Datasets (CROWDBENCH 2016), Sydney, Australia, March 14-18, 2016

[21] Hien To, Liyue Fan, Luan Tran, Cyrus Shahabi. Real-Time Task Assignment in Hyper-Local Spatial Crowdsourcing under Budget Constraints. In Proceedings of IEEE International Conference on Pervasive Computing and Communications (PerCom 2016), Sydney, Australia, March 14-18, 2016

[22] Hien To, Gabriel Ghinita, Liyue Fan, and Cyrus Shahabi, Differentially Private Location Protection for Worker Datasets in Spatial Crowdsourcing, IEEE Transactions on Mobile Computing, June 29, 2016

[23] Hien To, Ruben Geraldes, Cyrus Shahabi, Seon Ho Kim, and Helmut Prendinger, An Empirical Study of Workers’ Behavior in Spatial Crowdsourcing, Third International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data, San Francisco, CA, USA, June 26 – July 1, 2016

[24] Hien To, Task Assignment in Spatial Crowdsourcing: Challenges and Approaches, The 3rd ACM SIGSPATIAL 2016 PhD Symposium, San Francisco, CA, USA, October 31 – November 3, 2016

Experience finding a software internship

Although my experience is limited, I am encouraged by my post “Kinh nghiem du hoc theo VEF” (on studying abroad through VEF) from nearly 3 years ago, which benefited a number of people, so I am writing this post to give first-year students an end-to-end overview.

About me: 3rd-year graduate student at USC (through VEF). Because of the 2-year rule, I intend to intern until I use up the full 18 months. First-year internship @ Teradata Corp. I am going again this year, hopefully somewhere else!

1) Where to apply?

There are plenty, listed below. Feel free to add more.

Google | Oracle | Akamai Technologies | Groupon
Amazon | eBay | Square | VMware
Facebook | Twitter | TellApart | Qualcomm
Microsoft | Teradata | Greylock | Splunk
Yahoo | ESRI | Bloomberg | Pinterest
IBM Research | Salesforce | Samsung | Evernote
LinkedIn | CBS Interactive | Nimble Storage | Orbitz
Cisco Systems, Inc. | Adobe | Ericsson
Lifecrowd | Yelp

2) Application process

For most internships, passing the phone interview is enough. Some also have host matching (Google) or an onsite interview (Teradata) after the phone interview.

  • Google “company_name careers”, search for “intern”
  • Pick a few suitable jobs
  • Submit an application with a 1-page resume, e.g., mine
  • Wait for an interview, usually by phone/Skype
  • Wait 1-2 weeks for the result

Resume: keep it to 1 page and include only the most important things, because a recruiter skims your resume for only 20 seconds.

Cover letter?

Some jobs have a cover letter field (usually optional). In my first year of applying I learned to write cover letters, but later I found that they take a lot of time to write and do not matter much for internship applications, so I stopped. I think a cover letter matters more for full-time positions.

When you apply for an internship @ Google, passing the phone interview means you are qualified for the position (your resume is put in the candidate pool). Next comes host matching, where a project/group finds you. This step depends a fair amount on luck: 1) is there a project that matches your profile? 2) will they find you when they search? For problem 1, apply as early as possible. Many companies start looking for summer interns as early as October! For problem 2, the recruiter (your contact) plays an important role, as the person who markets your profile to the groups that need interns. “However, some recruiters are active and some are not (depending on their ability)!” Source: a labmate who interned at Google 3 times.

With Google, I am still waiting for my recruiter to find me a project, and nothing has come up yet! I have emailed friends who work at Google to see whether they know of any group that needs people.

3) Referrer, Career fair?


Being referred by a friend (an employee or an intern) is very important for getting a chance to interview (the first door). So do some networking and email friends to ask for a referral. In my first year, I was also referred by a labmate for my internship at Teradata, then got a phone interview, then an onsite interview. I have also referred a friend into a full-time position at Teradata. Of course, the profile has to match for a referral to make sense.

As far as I know, at Microsoft or IBM, your chance of being interviewed is small without a referrer. This year a friend who works at MS put my resume into the “internal database”, and I received an email from MS saying that I have a referrer. I hope it goes better than last year (I applied and never heard back).

Research Intern

If you apply for a research internship (Microsoft, IBM, …), a referrer is even more necessary. For MS Research, even having a referrer does not give much of an advantage; you need to contact the person at MS Research you want to work with. They all have a so-called internal database, so if you only apply online, your chance of being interviewed is low. At a conference I met a woman who works at IBM Almaden, and she told me, “I will put your resume in our database. We will start looking at resumes early next year”. I hope to get an interview.

Career fair?

For the same reason as with the cover letter, I find that unless you have an interview scheduled at the career fair in advance, going there just costs you a whole day. After all, can you impress a recruiter in 2 minutes? If not, the recruiter at the career fair will simply hand you a brochure and tell you to apply online.

4) Coding interview

Overview

This part is the most important. A typical interview is 45 minutes (10-15 intro, 30 coding, 0-5 Q/A).

  1. 10-15 mins for intro: resume-based questions
  2. 30 mins for coding: 1-3 questions, write code on Google Docs or any online IDE (e.g.,
  3. 0-5 mins for Q/A: ask questions to show that you are interested in the company

Sometimes they spend only 1-2 mins on the intro and go straight to coding. For coding, refer to two books: 1) Cracking the Coding Interview 2) Programming Interviews Exposed: Secrets to Landing Your Next Job, Second Edition. For further reference, there are many updated interview questions. In general, if you have no interview experience yet, you need about 24 hours to review. Some lessons I have learned:

  1. When reviewing, focus on data structures such as list, set, hashtable, tree
  2. Make the effort to memorize syntax by writing code on paper or in Notepad 🙂
  3. Explain to the interviewer as you code, e.g., why you are coding it this way
  4. Write clean code, comment occasionally when needed, and handle edge cases (null, empty, zero…) when needed

Coding interview

I will use my 2 phone interviews with Google as examples. I failed the first time and passed the second.

To be updated/continued…

Family

The more I study, the more I realize how important family is. Family is the foundation for the development of each individual as well as of society. One of the main purposes of my recent trip back to Vietnam was to visit my family and relatives. I have learned to say loving words to my parents and siblings, to nurture our affection and keep our relationships close. I visited relatives on both sides of the family to catch up and keep in touch. There are many people I can learn from: my grandparents, my parents. I listened to my grandparents tell stories of people who succeeded and people who failed in their work/careers, and of families who were both devoted to their parents and good at making a living. I have never felt I learned as much from everyone as during these 2 weeks in Vietnam.

On the material side, during this trip I also bought some household items to improve daily life. I am also thinking of trading the house in the alley for one on a main street, so that when I return to Vietnam later, my siblings will also have a home that is convenient for getting to work. On the emotional side, I also feel content.

My Favorite quotes


  1. “Research is finding the truth, which is very hard to find. But when the fact is found, it is neat and clever and it just works.” Hien To
  2. “The mind is everything. What you think you become” Buddha
  3. “My true religion is Kindness.” Dalai Lama
  4. “Everything has beauty, but not everyone can see.” Confucius
  5. “All you need is love.” John Lennon
  6. “It is never too late to be what you might have been.” George Eliot
  7. “Don’t go through life, grow through life.” Eric Butterworth
  8. “Live Each Day As If It Was Your Last.” Steve Jobs
  9. “Remember that happiness is a way of travel, not a destination.” Roy L. Goodman
  10. If you live in fear of the future because of what happened in your past, you’ll end up losing what you have in the present.
  11. “Darkness cannot drive out darkness: only light can do that. Hate cannot drive out hate: only love can do that.” Martin Luther King, Jr.
  12. “The best way to predict the future is to invent it.” Alan Kay
  13. “Your time is limited, so don’t waste it living someone else’s life” Steve Jobs
  14. “All men dream: but not equally. Those who dream by night in the dusty recesses of their minds wake in the day to find that it was vanity: but the dreamers of the day are dangerous men, for they may act their dreams with open eyes, to make it possible. This I did.” T. E. Lawrence
  15. “If you want to make your dreams come true, the first thing you have to do is wake up.” J.M. Power
  16. “Stay hungry, stay foolish” Steve Jobs
  17. “Twenty years from now you will be more disappointed by the things that you didn’t do than by the ones you did do. So throw off the bowlines. Sail away from the safe harbor. Catch the trade winds in your sails. Explore. Dream. Discover.” H. Jackson Brown, Jr.
  18. “The only way to do great work is to love what you do” Steve Jobs
  19. “I am prepared for the worst, but hope for the best.” Benjamin Disraeli
  20. “A successful man is one who can lay a firm foundation with the bricks that others throw at him.” Sidney Greenberg
  21. “You can have everything in life you want, if you will just help other people get what they want.” Zig Ziglar
  22. “People don’t buy for logical reasons. They buy for emotional reasons.” Zig Ziglar
  23. “A manager is not a person who can do the work better than his men; he is a person who can get his men to do the work better than he can.” Frederick W. Smith
  24. “Management is doing things right; leadership is doing the right things.” Peter F. Drucker
  25. “Leadership is simply the ability of an individual to coalesce the efforts of other individuals toward achieving common goals. It boils down to looking after your people and ensuring that, from top to bottom, everyone feels part of the team.” Frederick Smith
  26. “Ask not what your country can do for you; ask what you can do for your country.” John F. Kennedy
  27. “People who are crazy enough to think they can change the world, are the ones who do.” Apple Inc