Donmai

Resurrecting the Score Weighting Conversation

Posted under General

Just a thought that occurred to me a while back: When ordering a tag by "score", it seems the oldest posts may be underrepresented, because membership grows over time and so does the number of votes a new post can collect. The later a post is added, the more votes it gets depending on its impact. Even for extremely popular posts from way back, the votes only represent a part of the population at that time. In order to receive new votes, those posts must appear in a search.

I'm not suggesting modifying the current score value, but what if we added a new property equal to the existing score multiplied by the ratio of the current population to the population at the time the post was added? ...assuming that such a historic value exists or can be calculated?

Age-Weighted Score = Score * (current population value / (population value at date/time post was added))
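As a rough sketch of the formula above (the function name and the population figures are made up for illustration, not real site data):

```python
def age_weighted_score(score, population_then, population_now):
    """Scale a raw score by how much the population has grown since the post was added."""
    if population_then <= 0:
        raise ValueError("population_then must be positive")
    return score * (population_now / population_then)


# Hypothetical numbers: a post with score 40, added when the site had
# 10,000 members, viewed now that it has 50,000.
print(age_weighted_score(40, 10_000, 50_000))  # 200.0
```

A post added today would keep its raw score, since the ratio is 1.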

Weighted scores were approached in forum #1413, but that was related to the rating rather than age.

Challenge my sanity. Go ahead.

How would you measure “population”? I think the only measure available would be the amount of accounts existing, but we don’t know how many users could be considered active at a given time. Just going by account count is pretty useless since with time, the amount of abandoned accounts is probably growing a lot faster than the amount of concurrently active users. If I had to guess the amount of concurrently active users by forum activity, I’d even say that it’s been staying roughly the same for quite a while.

Btw, I sometimes look for top-scoring images from given age brackets:
some_tag order:score id:1..1000000
some_tag order:score id:1000001..2000000
some_tag order:score id:2000001..3000000
some_tag order:score id:3000001..4000000
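If anyone wants to generate those bracket searches rather than type them out, here is a minimal sketch. The helper name, the bracket size, and the max_id argument are my own assumptions; it only builds the search strings shown above and does not talk to the site.

```python
def bracket_queries(tag, max_id, bracket_size=1_000_000):
    """Build one 'order:score' search string per ID bracket, up to max_id."""
    queries = []
    start = 1
    while start <= max_id:
        end = start + bracket_size - 1
        queries.append(f"{tag} order:score id:{start}..{end}")
        start = end + 1
    return queries


for query in bracket_queries("some_tag", 4_000_000):
    print(query)
```

With max_id set to 4,000,000 this prints the same four searches as listed above.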

kittey said:

How would you measure “population”? I think the only measure available would be the amount of accounts existing, but we don’t know how many users could be considered active at a given time. Just going by account count is pretty useless since with time, the amount of abandoned accounts is probably growing a lot faster than the amount of concurrently active users. If I had to guess the amount of concurrently active users by forum activity, I’d even say that it’s been staying roughly the same for quite a while.

Btw, I sometimes look for top-scoring images from given age brackets:
some_tag order:score id:1..1000000
some_tag order:score id:1000001..2000000
some_tag order:score id:2000001..3000000
some_tag order:score id:3000001..4000000

Right, I was talking about the number of users on a given date. If that's not possible, then I'll try the ID grouping method next time it matters. Thanks!

Why would the population at the date/time the post was added be relevant? I've altered the score (both up and down) of many posts that were made many years before I joined the site, and I'm pretty sure post #1 has a much higher score than the total site population at the time it was posted.

Unbreakable said:

Just a note, your forum #1413 link should be topic #1413 instead.

Thanks--I'm a bit rusty.

skylightcrystal said:

Why would the population at the date/time the post was added be relevant? I've altered the score (both up and down) of many posts that were made many years before I joined the site, and I'm pretty sure post #1 has a much higher score than the total site population at the time it was posted.

My thought was that the membership is always growing. If that's true, then in 2010, the member population may have been (completely random number here) 10,000. In 2019 it could be 50,000. So viewing sample posts from 2010 and 2019, there are going to be 5 times as many members voting on a post in 2019 as in 2010. I'm just curious to see if there's a way to level that score value to a percentage of the population at the time of the post.

And on that note:

kittey said:

How would you measure “population”? I think the only measure available would be the amount of accounts existing, but we don’t know how many users could be considered active at a given time...

A thought: Member accounts have start dates ("Member since 2007-01-28") and possibly inactive dates for bans. That would give active periods for all members, so it would be possible to give a membership count for any given date.
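A minimal sketch of that count, assuming each account can be reduced to a join date plus an optional date it became inactive (the data layout and the sample accounts are invented for illustration):

```python
from datetime import date


def members_on(day, accounts):
    """Count accounts that had joined and were not yet inactive on the given day.

    accounts: list of (joined, inactive_from) tuples; inactive_from is None
    when the account has never been banned or closed.
    """
    return sum(
        1
        for joined, inactive_from in accounts
        if joined <= day and (inactive_from is None or day < inactive_from)
    )


# Invented sample data, not real accounts.
accounts = [
    (date(2007, 1, 28), None),
    (date(2009, 5, 1), date(2012, 3, 1)),
    (date(2015, 7, 4), None),
]

print(members_on(date(2010, 1, 1), accounts))  # 2
print(members_on(date(2013, 1, 1), accounts))  # 1
print(members_on(date(2019, 1, 1), accounts))  # 2
```

This only counts registered accounts that aren't banned; it says nothing about whether those members were actually visiting or voting.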

Again, this is just a thought exercise for me so I don't wind up with Alzheimer's.

10half said:

Again, this is just a thought exercise for me so I don't wind up with Alzheimer's.

You can't equate membership numbers and score since at some point in the past ALL accounts were able to influence score, while now only those with gold level or above can. On top of that, a higher count of registered members does not translate directly into more activity. Not everyone who has an account will be an active member. A more complicated model that would track # of members that edited, favorited, translated or uploaded a work would be needed to accurately track activity - then we would have to put that side by side with score metrics. Even then this isn't an ideal metric for what I think you want to do.

10half said:

My thought was that the membership is always growing. If that's true, then in 2010, the member population may have been (completely random number here) 10,000. In 2019 it could be 50,000. So viewing sample posts from 2010 and 2019, there are going to be 5 times as many members voting on a post in 2019 as in 2010. I'm just curious to see if there's a way to level that score value to a percentage of the population at the time of the post.

Only if nobody in the 9 years since then has seen and voted on the post.

I think the only way this could have potentially been done is if the unique view count for all vote-eligible members were being accurately tracked for the last 12 years, which it wasn't. However, even then, that doesn't count members that vote up posts from the index view without even looking at the post. Additionally, there have been several voting bugs over the years that have allowed members to endlessly vote up/down a post if they wanted to.

So all-in-all, it's a little too late to consider this concept. Things would have needed to have been done the right way back when this site got started, but they weren't, so you'll need to use the imperfect method of just ordering the score count by different periods of time as already mentioned above.

If score ordering is to be weighted at all, we should go back to the ideas mentioned in topic #1413 and start by trying to correct for rating and/or tags that appear on a disproportionate number of highly upvoted posts. A smart ordering system would make some attempt to emphasize the posts that have the highest overall quality rather than simply what is the most popular.
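One possible way to correct for rating, purely as a sketch: divide each post's score by the average score of posts sharing its rating, so a post is ranked against its own category. This is just one simple normalization I'm assuming for illustration, not something proposed in topic #1413, and the post data below is invented.

```python
from collections import defaultdict
from statistics import mean


def rating_relative_scores(posts):
    """Map post id -> score divided by the average score for that post's rating."""
    by_rating = defaultdict(list)
    for post in posts:
        by_rating[post["rating"]].append(post["score"])
    averages = {rating: mean(scores) for rating, scores in by_rating.items()}

    relative = {}
    for post in posts:
        avg = averages[post["rating"]]
        # Guard against a rating whose average score is zero.
        relative[post["id"]] = post["score"] / avg if avg else 0.0
    return relative


# Invented sample posts: "s" = Safe, "e" = Explicit.
posts = [
    {"id": 1, "rating": "s", "score": 30},
    {"id": 2, "rating": "s", "score": 10},
    {"id": 3, "rating": "e", "score": 200},
    {"id": 4, "rating": "e", "score": 100},
]

relative = rating_relative_scores(posts)
ranked = sorted(relative, key=relative.get, reverse=True)
print(ranked)  # [1, 3, 4, 2]
```

In this made-up sample the Safe post with score 30 outranks the Explicit post with score 200, because each is compared with the average for its own rating. It does nothing about quality as such, only about the rating skew.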

By my count, less than 10% of the top 1000 posts listed by order:score are rated Safe, and most of those are of the bikini/lingerie/pantyshot type of Safe. If you can't fix this, you needn't bother adjusting score weights based on assumptions about site viewership through time — the ordering will still be meaningless, because the pinups and porn still rise to the top. Considering how much terribly-drawn jerkoff material (post #449447 and post #529971, for instance) was uploaded in the early years of this site, adding a multiplier that improves its position in the ranking may only make the problem worse.

BrokenEagle98 said:

...this could have potentially been done is if the unique view count for all vote-eligible members were being accurately tracked for the last 12 years, which it wasn't...
...that doesn't count members that vote up posts from the index view without even looking at the post...
...there have been several voting bugs over the years that have allowed members to endlessly vote up/down a post...

iridescent_slime said:

...correct for rating and/or tags that appear on a disproportionate number of highly upvoted posts...
...make some attempt to emphasize the posts that have the highest overall quality rather than simply what is the most popular...
...adding a multiplier that improves [porn's] position in the ranking may only make the problem worse...
...If you can't fix this, you needn't bother adjusting score weights based on assumptions about site viewership through time — the ordering will still be meaningless...

Erk. Oh well--I'm wiser for knowing.

iridescent_slime said:

...Considering how much terribly-drawn jerkoff material (post #449447 and post #529971, for instance) was uploaded in the early years of this site...

Hey--reference link pop-ups! I really am rusty.

Thanks all for indulging me.
