I have been completely enamored with +Jon Kleinberg's keynote address from HCOMP2013. It is the first model of human computation in field-theoretic terms I’ve encountered, and it is absolutely brilliant.
Kleinberg is concerned with badges, like those used on Foursquare, Coursera, StackOverflow and the like. The badges provide some incentive to complete tasks that the system wants users to perform; they gamify the computational goals so people are motivated to complete them. Kleinberg’s paper provides a model for understanding how these incentives influence behavior.
In this model, agents can act in any number of ways. If we consider StackOverflow, users might ask a question, answer a question, vote on questions and answers, and so on. They can also do something else entirely, like wash their cars. Each agent’s activity is represented as a vector in a high-dimensional space: one dimension for each action they might perform. Figure 2 considers a two-dimensional slice of that action space, with distinct actions on the x and y axes. The dashed lines represent badge thresholds: completing 15 actions of type A1 earns you a badge, as does completing 10 actions of type A2. On this graph, Kleinberg draws arrows whose length and orientation represent the optimal decision policies for users as they move through the action space.
Users begin with preferences for taking some actions over others, and the model assumes that the badges have some value for the users. The goal of the model is to show how badges augment users’ action preferences as they approach a badge. Figure 2 shows that a user near the origin has no strong incentive toward actions of either type. But as one starts accumulating actions and nearing a badge, the optimal policy changes: when I have 12 actions of type A1, I have a stronger incentive to take that action again than I did when I only had 5.
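To make this concrete for myself, here is a minimal sketch of the incentive dynamic in a single badge dimension. To be clear, this is my simplification, not the paper’s model: I assume a quadratic cost for deviating from a baseline preference, a discount factor, a one-time badge reward, and made-up parameter values, and I solve for the optimal rate of action A1 at each count by value iteration.

```python
# A minimal sketch, assuming one badge dimension: states are counts of
# action A1, the user pays a quadratic cost for deviating from their
# baseline preference, and the badge pays a one-time reward at the
# threshold. All parameter values are invented.

GAMMA = 0.95       # discount factor (assumed)
PREF = 0.2         # baseline rate of choosing A1 (assumed)
THRESHOLD = 15     # badge at 15 actions of type A1, as in Figure 2
REWARD = 5.0       # value the user places on the badge (assumed)

def optimal_policy(threshold=THRESHOLD, pref=PREF, reward=REWARD,
                   gamma=GAMMA, sweeps=500):
    """Value iteration over counts n = 0 .. threshold.

    W[n] is the user's value at count n. Past the threshold the user
    reverts to their baseline preference at zero cost, so the
    continuation value there is just the one-time reward.
    """
    W = [0.0] * (threshold + 1)
    W[threshold] = reward
    grid = [i / 100 for i in range(101)]   # candidate rates for A1
    for _ in range(sweeps):
        for n in range(threshold - 1, -1, -1):
            W[n] = max(-(x - pref) ** 2
                       + gamma * (x * W[n + 1] + (1 - x) * W[n])
                       for x in grid)
    # Recover the policy: the rate of A1 achieving the max at each count.
    return [max(grid, key=lambda x: -(x - pref) ** 2
                + gamma * (x * W[n + 1] + (1 - x) * W[n]))
            for n in range(threshold)]

if __name__ == "__main__":
    for n, x in enumerate(optimal_policy()):
        print(f"count={n:2d}  optimal A1 rate={x:.2f}")
```

Running this, the optimal rate sits near the baseline preference when the badge is far away and climbs steeply over the last few counts before the threshold, which is just the field of arrows in Figure 2.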
In this way the badge thresholds work like attractors for user behavior; Kleinberg discussed how to decide where to place badges in order to motivate desirable behavior. It’s also interesting to see what happens to user behavior after the threshold is crossed. In Figure 2, you see that once you’ve received a badge in a dimension, you lose all incentive to continue performing actions in that dimension. And that’s exactly what you find in the data taken from StackOverflow. Figure 3 shows the activity of users in the days before and after earning a badge. In the run-up to a badge, user activity in that dimension sees a sharp spike. After the badge is received, there’s a precipitous fall as that activity returns to the levels seen when it is not motivated by a badge.
I’ve been calling fields like the one represented in Figure 2 “goal fields”, because they represent orientation towards certain goals. The goal field describes the natural “flow” or trajectory of users as they move through the action space. In his talk, Kleinberg compared the model to a magnetic field, with workers like iron filings orienting themselves along the field lines. He’s interested in making the metaphor precise enough to describe the goal orientation of agents in real activity fields like StackOverflow.
Interestingly, his model suggests bounds on where badges can be placed, and limitations on what behavior can be extracted from users in this way. Figure 9 from the paper describes the beautiful and confusing landscape carved out by all possible badge placements for two dimensions of action. Notice that some of the action space is inaccessible under any badge placement policy; for instance, no badge thresholds will yield an action policy of (10%, 60%).
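As a sanity check on that inaccessibility claim, here is a one-dimensional shadow of Figure 9, under the same made-up assumptions as the sketch above: sweep badge thresholds and badge values, and record the fraction of pre-badge effort each placement induces.

```python
# A one-dimensional shadow (my simplification) of the paper's Figure 9:
# sweep badge thresholds T and badge values R, and record the fraction
# of pre-badge steps spent on action A1 under the optimal policy.
GAMMA, PREF = 0.95, 0.2   # same assumed constants as the sketch above

def prebadge_rate(T, R, gamma=GAMMA, pref=PREF, sweeps=200):
    W = [0.0] * (T + 1)
    W[T] = R
    grid = [i / 100 for i in range(1, 101)]   # exclude 0 so waits are finite
    for _ in range(sweeps):
        for n in range(T - 1, -1, -1):
            W[n] = max(-(x - pref) ** 2
                       + gamma * (x * W[n + 1] + (1 - x) * W[n])
                       for x in grid)
    policy = [max(grid, key=lambda x: -(x - pref) ** 2
                  + gamma * (x * W[n + 1] + (1 - x) * W[n]))
              for n in range(T)]
    # Expected steps spent at count n is 1 / rate (geometric waiting
    # time), so the fraction of pre-badge steps devoted to A1 is:
    return T / sum(1 / x for x in policy)

if __name__ == "__main__":
    rates = sorted(prebadge_rate(T, R)
                   for T in (5, 10, 15, 25)
                   for R in (0.5, 2.0, 8.0, 32.0))
    print("reachable pre-badge A1 rates:", [round(r, 2) for r in rates])
    # No placement pushes the rate below the baseline preference (0.2):
    # badges can pull effort up toward a threshold, but never suppress it.
```

Even this toy version shows a forbidden region: whole ranges of action rates that no badge placement can induce.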
I’ve been fascinated by this paper and its implications for the last few days, and I’m flooded with ideas for extending and refining the model. I’ll try to list a few thoughts I had:
I’d be interested to see if the model can be extended to accommodate action sets that aren’t entirely orthogonal. I wonder if actions might be categorized into certain types or clusters, with badges aimed at augmenting behavior for all actions within that category. For instance, “not using StackOverflow” isn’t just another dimension of action; it’s an entirely different class of actions. It would then be interesting to see how badges aimed at one action class impact incentives for other classes.
I’m also interested in the possibility of evolving badges that represent moving targets instead of static thresholds, as a way of avoiding the precipitous cliff past a threshold. If users were able to maintain a constant (and short) distance from a potential reward, then they would always act like highly motivated workers on the verge of an incentive. Call this the “carrot and stick” badge, in contrast to the threshold badge. If users never get a reward they’ll lose incentive; but if reaching a threshold always unlocks new, nearby badges, then I’m always in a motivated position to keep acting. Instead of a magnetic force field, I’m thinking of something like an event horizon: I keep getting closer without ever falling in.
One way to engineer a carrot-stick badge is to have its placement emerge as a function of the average activity of users. For instance, if instead of earning a badge for answering 600 questions, I earn a badge for being in the top 1% of question-answerers, the badge becomes a moving target. If I ever stop answering questions, I risk losing the badge to workers who stay busier than me.
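Here is a toy simulation of the idea; the behavioral rule and every number in it are my assumptions, not anything from the paper. Each day the badge cutoff is recomputed as the 99th percentile of activity, and agents within striking distance of the cutoff work harder.

```python
# An illustrative simulation of a percentile badge. The behavioral rule
# (work harder when the cutoff is within reach) and every number here
# are my assumptions, not anything from the paper.
import random

random.seed(0)

N_AGENTS, DAYS = 1000, 200
BASE_RATE = 0.2   # baseline chance of answering a question on a given day
BOOST = 0.6       # extra effort when the badge feels within reach
REACH = 10        # "within reach" = at most this many answers below the cutoff

counts = [0] * N_AGENTS
for day in range(DAYS):
    # The badge threshold is recomputed daily as the 99th percentile.
    cutoff = sorted(counts)[int(0.99 * N_AGENTS)]
    for i in range(N_AGENTS):
        near = cutoff - REACH <= counts[i] <= cutoff
        rate = BASE_RATE + (BOOST if near else 0.0)
        if random.random() < rate:
            counts[i] += 1
    if day % 50 == 0:
        print(f"day {day:3d}: cutoff at {cutoff} answers")
# Because the cutoff drifts upward with the population, agents chasing it
# stay on the motivated side of the threshold indefinitely, instead of
# falling off the post-badge cliff in Figure 3.
```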
Badges are often a way of identifying experts, and you might think that carrot-stick badges are especially suited to identifying experts in real-world situations. In science, experts aren’t merely the authorities who hold the “right answers” to hard questions: being an expert in physics 30 years ago would not suffice for being an expert today, and carrot-stick badges might model this situation better. Experts also deform the problem space itself, and can be responsible for changing the standards for future experts and the space of what questions can be answered. In other words, expertise is a higher-order type of work: you have your plumbers and scientists, and then you have experts of those types. Experts in one domain might not have any relation to experts in another domain; what they share in common is their relation to the average performance within their respective domains, and those relations might not be obvious or translatable across domains.

For instance, lots of people can drive cars, so the actions of an expert driver (a stunt driver, or a NASCAR driver) might be very close to the actions of a normal, competent driver. In contrast, not many people are good at physics, so the difference between the average person’s skills at physics and the expert’s skills might be huge. In the driving case, the widespread competence in the community means that finding experts requires very sensitive methods for distinguishing the two; in the physics case, the differences are so large that the criteria for expertise can be much looser.
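To put a rough number on that last intuition, here is a toy calculation of my own: measure the gap between expert and average skill in units of the community’s spread, assuming (purely for illustration) normally distributed skill and invented numbers.

```python
# A toy quantification of the point above: how sensitive an
# expert-detection method must be depends on the gap between expert and
# average skill measured in units of the community's spread. The skill
# numbers are invented, and normality is assumed purely for illustration.
from statistics import NormalDist

def expert_separation(avg, expert, spread):
    """Return the expert's z-score relative to the community, and the
    fraction of the community scoring at or above the expert's level,
    i.e. how crowded the expert's neighborhood is."""
    z = (expert - avg) / spread
    crowd = 1 - NormalDist().cdf(z)
    return z, crowd

# Driving: nearly everyone is competent, so the spread is small and the
# expert sits close to the mean.
print("driving:", expert_separation(avg=70, expert=78, spread=8))
# Physics: skill varies wildly, and the expert is far out in the tail.
print("physics:", expert_separation(avg=20, expert=90, spread=15))
```

In these made-up numbers, the expert driver sits only one standard deviation above the mean, so roughly a sixth of the community performs at that level and telling the real expert apart takes sensitive measurement; the physicist sits nearly five standard deviations out, where essentially no one else is.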
I have much more to say, but I should get back to the conference.
The keynote address was a version of the paper “Steering user behavior with badges”, which can be found here: http://www.cs.cornell.edu/home/kleinber/www13-badges.pdf
More on HCOMP2013: http://www.humancomputation.com/2013/index.html