Donmai

Duplicate parenting changes?

Posted under Tags

Hi, I've noticed lately that many images are being tagged duplicate. It's usually the second upload of a given image that gets the tag, even if it has a larger filesize or better compression.
For example, the second image has a Pixiv source, while the first upload is from Twitter, with worse but not really noticeable compression.

The general rule of thumb was to label the bigger image as parent and not use duplicate for either in this case, since they have different sources. Artists tend to upload to Twitter first, then to a site with proper or no image compression, like Nico Nico or Pixiv.
Has this changed in the last few months? I see the Twitter images are often labeled as parents, while the 'higher quality' Pixiv versions are left unapproved.
I can't find any conversation about it, so I thought I'd ask.

keonas said:

[…], where it's usually the 2nd image uploaded of a type, even if the image is of larger size, or better compression.
ex, the 2nd image has a pixiv source, while the first uploaded is from twitter, with worse but not really noticeable compression.

The general rule of thumb was to label the bigger image as parent and not use duplicate for either in this case, […]

FYI, both images you linked are pixel-exact duplicates. The Pixiv version has a larger filesize, but it doesn’t look any better, it just has less efficient compression. The Twitter version looks just as good and has more efficient compression (and thus a smaller filesize), which is why it was made the parent. Bigger doesn’t always mean better and all these duplicates are seriously annoying many users.
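
To illustrate how two files can be pixel-exact duplicates while differing in filesize, here is a minimal Python sketch. It is not how the site or any bot actually compares posts. It assumes the pixels are already decoded into raw bytes, and it uses zlib (the DEFLATE compression that PNG also relies on) as a stand-in for an image encoder. The names `pixel_digest`, `file_a` and `file_b` are made up for this example.

```python
import hashlib
import zlib


def pixel_digest(width, height, pixels):
    """Hash decoded pixel data together with the image dimensions."""
    header = f"{width}x{height}:".encode()
    return hashlib.sha256(header + pixels).hexdigest()


# Hypothetical 64x64 RGB image as raw bytes (a simple repeating pattern).
WIDTH, HEIGHT = 64, 64
raw = bytes(
    (x * 3 + y * 5 + c * 7) % 256
    for y in range(HEIGHT)
    for x in range(WIDTH)
    for c in range(3)
)

# Two "files" holding the same pixels, stored with different compression effort.
file_a = zlib.compress(raw, 1)  # fast, usually larger
file_b = zlib.compress(raw, 9)  # slow, usually smaller

# Decoding both gives identical pixels, so the digests match.
same_pixels = (
    pixel_digest(WIDTH, HEIGHT, zlib.decompress(file_a))
    == pixel_digest(WIDTH, HEIGHT, zlib.decompress(file_b))
)

print(len(file_a), len(file_b), same_pixels)
```

The byte sizes of the two files will usually differ, but the decoded pixels are the same, so the larger file carries no extra visual information.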

kittey said:

FYI, both images you linked are pixel-exact duplicates. The Pixiv version has a larger filesize, but it doesn’t look any better, it just has less efficient compression. The Twitter version looks just as good and has more efficient compression (and thus a smaller filesize), which is why it was made the parent. Bigger doesn’t always mean better and all these duplicates are seriously annoying many users.

Twitter traditionally had far worse meta compression, which is why it was never made the parent. If it is about annoying users, we used to have threads about delaying or banning Twitter image uploads for 2-3 days to wait for the Pixiv ones, because the Twitter ones would become useless.
Unless the images were of different file types or one had artifacts, you couldn't visually tell the difference anyway.
If Twitter compression got so much better that we don't need the Pixiv/Nico versions which appear later, why don't we make an announcement about it so people stop what's now considered reposting? I'm sure many are still reposting just because they think this is still the way.

Or would it be possible now to implement image versioning, so we can just stack them in one post?

This has been going on for a while now. The number of pixel duplicates has decreased dramatically thanks to the bot, which deters the practice of one-upping an existing post with a "better" Pixiv duplicate, since these will not make it past the queue anymore. Unless the approver doesn't see the duplicate tag...

Twitter stepped up their compression. That has also been a thing for a while now. Not to mention, sometimes artists post the higher file size/res version on Twitter compared to their Pixiv uploads. So instead of being source dependent, it is now dependent on the image itself and the uploader.

It begs the question: why blindly upload something with a minimal filesize difference? I suggest using the duplicate checker https://duplicatebooru.herokuapp.com/ if you absolutely must upload that image with a 0.01 MB difference.

As the others said, my bot only tags perfect pixel duplicates, so it's not a matter of compression or resolution. My bot will never tag or change the parenting of posts with different dimensions.
As for the Twitter vs Pixiv dilemma, Twitter stopped compressing as harshly about a year ago, and Pixiv started stripping metadata from their images, so effectively there's no difference between the two platforms in 90% of cases. If we already have an 800 KB picture, you really shouldn't upload an 860 KB picture just because of 60 KB of difference. Realistically, there's a visual distinction only at significant percentages of filesize difference, something like 200 KB vs 1 MB, or 1 MB vs 6 MB. A 10% filesize difference means the pictures are exactly the same in all but the rarest of cases.
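
The filesize rule of thumb above can be sketched roughly like this in Python. This is only an illustration of the arithmetic, not what the bot does (the bot only compares pixels). The function names are hypothetical, and the 10% threshold is taken from the figure mentioned in this post, measured here against the smaller file as an assumption.

```python
def relative_size_difference(size_a: int, size_b: int) -> float:
    """Return the filesize gap as a fraction of the smaller file."""
    smaller, larger = sorted((size_a, size_b))
    if smaller <= 0:
        raise ValueError("file sizes must be positive")
    return (larger - smaller) / smaller


def likely_worth_comparing(size_a: int, size_b: int, threshold: float = 0.10) -> bool:
    """True if the gap is large enough that a visual difference is plausible.

    The 10% default is an assumption based on the rule of thumb above.
    """
    return relative_size_difference(size_a, size_b) > threshold


# 800 KB vs 860 KB: a 7.5% gap, below the threshold.
print(likely_worth_comparing(800_000, 860_000))
# 200 KB vs 1 MB: a 400% gap, worth a closer look.
print(likely_worth_comparing(200_000, 1_000_000))
```

A large gap only suggests that the pictures might differ visibly; it still takes an actual look at both images to tell which one is better.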

It's really annoying when you're browsing the site and half the posts you see are duplicates, and were only uploaded because someone decided to raise their profile counts.
I tried to revert the parenting at first and DM the users who abused this the most, but I was ignored and the parenting was routinely re-reverted so that the dupe uploader's post was the parent, so I decided to automate it. You can still find people constantly trying to revert my edits. Note that the worst offenders for this are builders, not members. I wouldn't even have implemented a bot if the issue was with new users, but it was always the same usual suspects doing this, so there was no other way to deal with the issue.

It had gotten to the point where for certain artists there are several hundred duplicate posts. Just look at ridiculous cases like duplicate ebifurya status:any, ixy duplicate status:any, shiseki_hirame status:any duplicate. All of them are pet tags of certain builders who just spam duplicates without even looking at the pictures.

Since I started running the bot in December, duplicate uploads have dropped to about a quarter of what they were. If I were a bit less charitable I'd say it's because they've noticed that child posts don't rack up as many upvotes as parents, but I'll just assume instead that they realized their errors and started checking better.

In any case, the rule against duplicates was always there, it simply wasn't enforced until recently - and you really shouldn't need a rule to know that you shouldn't upload the same exact picture three times in a row.

Again, I agree with these points; visually we couldn't tell the difference between the images anyway. Thank you for explaining the bot. I really couldn't find info on duplicates earlier than 7 months ago, and it didn't mention this.

There's still clearly a bias left over from a lot of people not knowing about these changes though.
Maybe I didn't see it until recently because I wasn't uploading images with just a 200-300 KB difference anyway, but there are hundreds of people still re-uploading what they consider the "better" version from Pixiv. There was always the duplicate rule, yet a "better quality" version (even if it wasn't really different) was not considered a duplicate, and they seem to have stuck with that.
There was even the argument that Pixiv is "true compression" since the file keeps the size it had on your device, even after uploading it. Twitter, on the other hand, re-compresses it after uploading. Again, visually it doesn't look different, so this is fine.

It's good that duplicate traffic was reduced, but to completely deter reposting, I now think we should make an announcement.
