Here's a more visual look at the score distributions by rating and by copyright, using percent-stack bars:
https://www.dropbox.com/s/jtmw9gdmjopz7ak/score_percent_stack.png?dl=0
It's color-coded by score. I left the high/low extremes in shades of gray.
-
Actually, I take back what I said about some tags being useless for analysis because they don't have enough variance in score. They're still informative, insofar as they tell us that posts with those tags tend to score lower than posts in general, so an approver/uploader shouldn't be penalized for low scores if such tags were the cause.
As much as it'd be fantastic if everything were distributed like love_live!_school_idol_project -- huge range with a nice, almost even distribution in score -- the fact is that most tags, and posts in general, are distributed such that it is really hard to use score to distinguish between a) posts that are just mediocre or slightly subpar and b) posts that are outright bad. score:0 usually (with notable exceptions) places around the 15th to 20th percentile, so I don't think most folks would be comfortable concluding across the board that posts with score:0 are terribad and don't deserve to be on the site.
Posts with negative scores are generally the bottom 1% to 5%, but they're too rare to be truly useful. I don't think users care too much about going out of their way to vote down posts that they don't like, and besides, you can just blacklist stuff that you don't want to see anyway.
So the proposed percentile-vs-peers approach probably isn't going to help us do a much better job of detecting low quality, compared to what we have at the moment.
What the distribution data is truly useful for, I think, is refining how we set the bar for good- to high-quality posts. Right now, the janitor trial report uses a score:3+ threshold, presumably because 3 is the median post score. It's asking, "what fraction of the approved posts are kinda-sorta better than at least half of all other posts?" But it's clear that, depending on rating/copyright, the threshold for "half of posts" could be anywhere from score:0 (jojo) to score:6 (lovelive).
This is where percentile-vs-peers would really shine, because a percentile of over 50~60% after taking into account inflated/deflated post scores would be a lot more meaningful than a single flat cutoff across the board.
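To make that concrete, here's a rough sketch of how a percentile-vs-peers number could be computed. This is only my assumption of how it might work, not anything the site actually does: the function name percentile_vs_peers is made up, ties are split down the middle (mid-rank), and the score lists are invented for illustration rather than pulled from real data.

```python
from bisect import bisect_left, bisect_right


def percentile_vs_peers(score, peer_scores):
    """Mid-rank percentile (0-100) of `score` among `peer_scores`.

    Hypothetical helper: `peer_scores` stands for the scores of posts
    sharing the same rating/copyright as the post being judged.
    """
    if not peer_scores:
        raise ValueError("peer_scores must not be empty")
    ranked = sorted(peer_scores)
    below = bisect_left(ranked, score)          # peers scoring strictly lower
    ties = bisect_right(ranked, score) - below  # peers with the same score
    return 100.0 * (below + 0.5 * ties) / len(ranked)


# Invented score lists, NOT real site data.
deflated_tag = [0, 0, 0, 0, 1, 1, 2, 3]   # a tag where scores run low
inflated_tag = [2, 4, 5, 6, 6, 7, 9, 12]  # a tag where scores run high

for name, peers in (("deflated", deflated_tag), ("inflated", inflated_tag)):
    print(name, percentile_vs_peers(3, peers))
```

With these made-up numbers, the same score:3 lands near the top of the deflated group and near the bottom of the inflated one, which is exactly what a flat score:3+ cutoff can't see.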
I have to rethink my idea for the graphical visualization, though. It would have used fixed-width intervals, and I'm now convinced that wouldn't be a good representation.