Donmai

Discovering similar users

Posted under Bugs & Features

Provence said:

Pretty interesting.
My range starts at 4.95% (or something like that) and goes up to 12% ^-^.

Yeah, it's very interesting and so I decided to look at the stats for my profile.
Oh! Mine starts at 1.73% up to 3.18%.
I only recognize one of the nicks, from comments and other stuff.

Pywackett-Barchetta said:

I've been waiting several hours for mine to generate, with no luck... is that to be expected?

Even after it's generated, the page says "check back later" until you manually refresh it.

Pywackett-Barchetta said:
I've been waiting several hours for mine to generate, with no luck... is that to be expected?

Yeah, me too [even with hard refresh]. Probably completely crushed under everyone trying it.

MagicalAsparagus said:

I removed about a hundred posts from favourites yesterday, and suddenly I get 11.89%. Magical.

Hmm, removing some posts automatically leaves you with a much smaller variety of favourites, and therefore a higher match.

Highest match 5%... btw I wonder how this is calculated too. For example: if I had only one faved picture, and the other person had 100 with mine included, would the match be 100% or 1%? Or maybe something else?

MagicalAsparagus said:

How is this calculated, anyway? I decided to do a manual check. I'm supposed to have an 11.70% match with this guy:
http://danbooru.donmai.us/users/146366
If he has 900 favs, then we should have at least 90 of those in common. But we don't :/ I looked through all his favs, and maybe 10 to 20 of them actually match, if not fewer. Maybe there's some kind of error?

It only looks at posts uploaded in the last 2 years.
Take your number of favorites: 71
The other guy's number of favorites: 4
And your shared number of favorites: 2
Then calculate: √(71 * 4) ≈ 16.85
Then: 2 / 16.85 ≈ 0.1187 (11.87%)
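
For anyone who wants to check their own numbers, here is a minimal Python sketch of the calculation described above: shared favorites divided by the square root of the product of both favorite counts (this is the same shape as cosine similarity between two sets). The function name and the zero-count handling are my own assumptions, not something taken from the site's code.

```python
import math


def favorite_similarity(own_count: int, other_count: int, shared_count: int) -> float:
    """Shared favorites divided by sqrt(own_count * other_count)."""
    if own_count <= 0 or other_count <= 0:
        # Assumption: an empty favorites list counts as no match at all.
        return 0.0
    return shared_count / math.sqrt(own_count * other_count)


# Counts from the post above: 71 favorites, 4 favorites, 2 shared.
print(f"{favorite_similarity(71, 4, 2):.2%}")
```

All three counts are meant to be taken after the 2-year cutoff has been applied, which is why they can be much smaller than the totals shown on a profile.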

The problem is that he has extremely few favorites within the last 2 years (you don't have that many either).
Currently it only looks at people that have 200 total favorites or more, but if most of those 200 are older than 2 years the sample size can still be too small. It should ignore posts more than 2 years old when checking if someone has 200 favorites.
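
A rough sketch of the suggested fix, in Python: apply the 2-year cutoff first, then check the 200-favorite minimum against what is left. The data layout (pairs of post ID and upload time), the helper names, and the 730-day approximation of "2 years" are all assumptions for illustration; the real implementation may look nothing like this.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=2 * 365)  # assumption: "last 2 years" as 730 days
MIN_FAVORITES = 200               # threshold mentioned in the post


def recent_favorites(favorites, now):
    """Keep only the IDs of favorited posts uploaded inside the window.

    `favorites` is an iterable of (post_id, uploaded_at) pairs (hypothetical layout).
    """
    cutoff = now - WINDOW
    return {post_id for post_id, uploaded_at in favorites if uploaded_at >= cutoff}


def is_eligible(favorites, now):
    """Count only recent favorites toward the 200 minimum."""
    return len(recent_favorites(favorites, now)) >= MIN_FAVORITES
```

With this ordering, a user with 250 favorites of which only 50 are recent would be skipped instead of producing a match based on a tiny sample.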

richie said:

Highest match 5%... btw I wonder how this is calculated too. For example: if I had only one faved picture, and the other person had 100 with mine included, would the match be 100% or 1%? Or maybe something else?

Your favorites: 1
Other guy's favorites: 100
Shared favorites: 1
√(1 * 100) = 10
1 / 10 = 0.1 (10%)
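
The same example can be reproduced from actual sets of post IDs rather than bare counts. This is only a sketch: the post IDs are made up and the function name is hypothetical.

```python
import math


def set_similarity(mine: set, theirs: set) -> float:
    """Size of the overlap divided by sqrt(len(mine) * len(theirs))."""
    if not mine or not theirs:
        # Assumption: no favorites on either side means no match.
        return 0.0
    return len(mine & theirs) / math.sqrt(len(mine) * len(theirs))


mine = {42}                  # one favorite (made-up post ID)
theirs = set(range(1, 101))  # 100 favorites, including post 42
print(set_similarity(mine, theirs))  # 0.1
```

So when one list is fully contained in the other, the result is neither 100% nor 1% but the square root of the size ratio, here √(1/100) = 10%.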

BrokenEagle98 said:

Just curious, but is there a mathematical or programming reason for the 200 post limit?

I think you need a bigger sample size or the results may become too imprecise with large margins of error.

Huh. Yesterday, it didn't do anything. Today, everyone it finds is around 35 million % similar.

So far I've got:

35118900.00%
1983100.00%
40628900.00%
27847800.00%
45156600.00%
34837700.00%
33843200.00%
42963000.00%
36114200.00%
1208200.00%
45808500.00%
39338100.00%
0.00%

The 0% one isn't even actually 0%.
