Are reciprocal ratings the cure or the curse?

You take an Uber ride. You get out of the car, open your Uber app, and leave the driver a 4.0 / 5.0 rating. As you walk away, the driver pulls out her app and gives you a 5.0 / 5.0 rating. This is an example of reciprocal ratings, i.e., a system in which both parties get to rate each other.

There are many examples of reciprocal ratings, especially where the experience is co-created and / or shared by both parties (albeit possibly in different ways). For example, on Airbnb, guests get to rate hosts and vice versa. In debate competitions, adjudicators score the speakers, and the speakers often get a chance to rate the adjudicators in return, based on how well the adjudicators justify their decisions and on the quality of their feedback.

The question I’d like to discuss in this post is: do reciprocal ratings bring net benefits or net harm?

The Case For Reciprocal Ratings: Fairness & Incentives to Perform

Starting from first principles, it seems fair to let both sides rate each other, especially if both sides share responsibility for an experience and / or are affected by the other side’s actions. For example, the overall Uber ride experience depends on both the driver’s performance (e.g., cleanliness of the vehicle) and the rider’s behavior (e.g., arriving on time).

I really like the concept of Wittgenstein’s ruler, which Nassim Nicholas Taleb (the author of “The Black Swan”) talked about in a tweet:

It is worth repeating: “When you use a ruler to measure the table, you are also using the table to measure the ruler.” Sometimes the best test of a ruler is not an external judge but the tables it measures.

If we look at practical consequences, reciprocal ratings may help both parties become more accountable for their behavior and / or decisions. In a debate competition, for example, letting debaters rate adjudicators in return incentivizes adjudicators to (a) be more responsible in reaching a decision, and (b) be more detailed and thorough in explaining it. Just as an adjudicator’s feedback can help debaters improve, a debater’s feedback can help an adjudicator learn to judge debates better. Taking a step back, this benefit is not unique to reciprocal ratings but applies to ratings in general. When people know that their performance is being measured (and that the measurement is linked to some carrots or sticks), they are more likely to put in effort. It is all about incentives. Economics 101.

You could say that reciprocal ratings make the interests of both parties more intertwined. Because debaters have a chance to rate adjudicators, it is now in the best interests of both debaters and adjudicators that debaters receive well-thought-through feedback after a debate round. In a way, reciprocal ratings put everyone “in the same boat”.

The Case Against Reciprocal Ratings: Inflated Ratings?

But sometimes two parties can go from being close to each other to being too close to each other. Making the interests of both parties interdependent creates incentives to cooperate, but also incentives to cheat. For example, if debaters had to submit their feedback on adjudicators under their real names, one could argue that some debaters would inflate their scores for an adjudicator for fear of retaliation by that adjudicator (assuming the debaters have a significant chance of meeting the same adjudicator in a future round).

There are real-world systems in which people leave reviews under their real names. Airbnb guest reviews, for example, are published with the guests’ real names and profile pictures. This increases the perceived legitimacy and authenticity of the reviews in the eyes of prospective guests browsing the property’s page.

Assuming that (a) hosts get to rate guests in return and (b) hosts get to see the guests’ ratings and comments, the following scenario becomes possible: a guest inflates the rating for his / her host out of fear that, if he / she gives the host a low rating, the host will retaliate with a low (or even lower) rating in return. The problem is symmetrical: one could argue that the host also has an incentive to inflate his / her rating of the guest for exactly the same reasons.

Assuming the ratings are indeed inflated, would that break the whole rating system?

Before we dive into this question, let us first look at the bigger picture: why do ratings matter in the first place? How does a platform like Airbnb use them? It is worth pointing out that what matters more is the relative ranking rather than the absolute score; the differences between scores, not their absolute values, hold the key. For example, Airbnb uses the relative ranking of host properties to order the search results that match a user’s criteria; similarly, Uber uses driver ratings to prioritize ride assignments.

With that established, let’s come back to the inflated ratings problem. For simplicity, let us study one side of the problem: assume Airbnb guests inflate their ratings of their hosts. What happens then? (The other side of the problem, i.e., hosts inflating their ratings of guests, can be analyzed with the same reasoning.)

Let’s break down the problem into two possible scenarios:

[Scenario A] Rating inflation is a widespread problem, i.e., the majority of guests inflate their ratings of hosts, or what the defenders of fairness would call “the whole system is rigged”.

There are two sub-scenarios:

(A1) The majority of guests inflate their ratings by a similar absolute amount, e.g., +1 star.
=> Verdict: In this sub-scenario, rating inflation does not affect the effectiveness of the search ranking algorithm. If every host’s score is bumped up by +1 star, their relative order does not change (as long as the shift does not push any score past the 5-star ceiling), i.e., a potential guest searching for a property would still see the hosts ranked in the same order.
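To make sub-scenario A1 concrete, here is a minimal Python sketch. It assumes, purely for illustration, that search results are ordered by each host’s average rating; `rank_hosts` is a hypothetical stand-in, not Airbnb’s actual algorithm, and the host names and scores are made up.

```python
def rank_hosts(scores):
    """Return host names ordered from highest to lowest average rating."""
    return sorted(scores, key=scores.get, reverse=True)

# Made-up honest average ratings (out of 5 stars).
honest = {"host_a": 3.1, "host_b": 2.4, "host_c": 3.8, "host_d": 1.9}

# Sub-scenario A1: every score shifts up by the same absolute amount (+1 star).
# All honest scores are below 4.0, so nothing is pushed past the 5-star ceiling.
inflated = {host: score + 1.0 for host, score in honest.items()}

print(rank_hosts(honest))    # ['host_c', 'host_a', 'host_b', 'host_d']
print(rank_hosts(inflated))  # ['host_c', 'host_a', 'host_b', 'host_d']
```

A uniform shift preserves the order because adding the same constant to every score never changes which of two scores is larger. The caveat is the ceiling: if some honest scores were above 4.0, the +1 shift would have to be capped at 5.0, and several top hosts could end up tied.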

(A2) The majority of guests inflate their ratings up to a certain level, e.g., everyone gives their host at least 4 stars, even when they think the host deserves only 2 or 3 stars. Then things get a bit tricky. You could say that in this case, the really stellar hosts still get their 5-star ratings and rise to the top of the competition; they would still be prioritized by Airbnb’s search ranking algorithm. However, one could no longer tell the mediocre hosts (e.g., those who deserve 3 stars) apart from the really bad ones (e.g., those who deserve 2 stars), as their scores are all inflated to 4 stars across the board.
=> Verdict: In this sub-scenario, rating inflation does make the ranking algorithm less effective. It can still split hosts into groups based on their ratings (stellar hosts vs. everyone else) and prioritize those groups in search results, but the grouping becomes less granular. One could argue the practical results may not be too bad, since the stellar 5-star hosts would still appear at the top of the search results. If we assume that the top search results are also the most clicked-on, then guests’ final choices are likely not distorted much. This reasoning reminds me of the Pareto principle (the 80/20 rule). Applied in this context, 20% of your search results (the top ones) may generate 80% of your revenue. If this holds, then as long as the top search results are not distorted, the search ranking algorithm has served its purpose.
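Sub-scenario A2 can be sketched the same way. The sketch below assumes (hypothetically) that every rating below 4 stars is raised to a floor of 4, ratings already at or above 4 are left alone, and hosts are scored by their plain average; the hosts and their ratings are made up.

```python
FLOOR = 4  # assumption: no guest gives fewer than 4 stars

# Made-up honest ratings each host would receive from three guests.
honest_ratings = {
    "stellar_host": [5, 5, 5],
    "mediocre_host": [3, 3, 3],
    "bad_host": [2, 2, 2],
}

def inflate_to_floor(ratings, floor=FLOOR):
    """Raise every rating below the floor up to the floor; keep the rest as-is."""
    return [max(r, floor) for r in ratings]

def average_scores(ratings_by_host):
    """Map each host to the plain average of its ratings."""
    return {host: sum(rs) / len(rs) for host, rs in ratings_by_host.items()}

honest_avg = average_scores(honest_ratings)
inflated_avg = average_scores(
    {host: inflate_to_floor(rs) for host, rs in honest_ratings.items()}
)

print(honest_avg)    # {'stellar_host': 5.0, 'mediocre_host': 3.0, 'bad_host': 2.0}
print(inflated_avg)  # {'stellar_host': 5.0, 'mediocre_host': 4.0, 'bad_host': 4.0}

# The stellar host still comes out on top, but the mediocre and bad hosts now tie.
assert inflated_avg["stellar_host"] > inflated_avg["mediocre_host"]
assert inflated_avg["mediocre_host"] == inflated_avg["bad_host"]
```

The floor erases exactly the information that separated the two lower hosts, while the top of the ranking survives intact.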

[Scenario B] Rating inflation is an isolated problem, i.e., only a very small percentage of guests inflate their ratings of hosts. The majority of guests rate their hosts honestly.

The answer here is quite straightforward: this would have a very limited impact on the search rankings. Perhaps a small number of hosts would get their ratings bumped up a bit, but the majority of hosts are ranked fairly. By the definition of an “isolated problem” above, this is not a problem that causes massive headaches for the average user, and hence not worth losing sleep over.
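For Scenario B, a short calculation shows why the impact stays small. If k out of n guests each add b stars, a host’s average rises by at most k × b / n. The sketch below uses made-up numbers (20 guests who would honestly give 3 stars, one of whom adds 2 stars), and `inflate_some` is a hypothetical helper written only for this illustration.

```python
def average(ratings):
    """Plain average of a list of star ratings."""
    return sum(ratings) / len(ratings)

def inflate_some(ratings, n_inflaters, bump, ceiling=5):
    """Let the first n_inflaters guests add `bump` stars (capped at the ceiling);
    everyone else rates honestly."""
    return [
        min(r + bump, ceiling) if i < n_inflaters else r
        for i, r in enumerate(ratings)
    ]

honest = [3] * 20  # made-up: 20 guests who would honestly give 3 stars
partly_inflated = inflate_some(honest, n_inflaters=1, bump=2)

print(average(honest))           # 3.0
print(average(partly_inflated))  # 3.1, i.e., a shift of 1 * 2 / 20 = 0.1 stars
```

A single inflated review moves this host by a tenth of a star; a host with more reviews would move even less.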

The Verdict on Reciprocal Ratings

Based on the very limited analysis thus far, having reciprocal ratings is probably a good idea. Caveats: 1) I have (very lazily) considered only inflated ratings as a downside of reciprocal ratings, though there could be many more, and 2) the design of the rating system could affect the players’ incentives. For example, is one side asked to rate the other side first? Are the ratings published in real time? Are they published anonymously? And so on.

All in all, I find the design of reciprocal ratings, and of ratings in general, to be a fascinating real-world game-theory topic. The next time I take an Uber ride and rate a driver, I’ll certainly “think twice” before giving those 5 stars.