I hate having to explain this repeatedly in private, so I'm putting it up in a thread because this requires concrete resolution and I won't shut up about it until that happens.
The policy on how we resolve duplicate uploads is outdated and offers next to no reasoning for why it is the way it is. The goal of this thread is to elaborate on this policy so we can reduce the confusion around it among uploaders, gardeners, and approvers.
So let's explain this a bit. The current duplicate policy (per use of the tag which is also pretty faulty) states this:
- Do not flag duplicates for deletion if your only reason is that the image is a duplicate.
However, you will find that there are users who, through plain ignorance or simply by seeking an easy way to get promoted or 'look like a good uploader' in the gallery, upload what we would call 'duplicates'. The most concerning cases are when the upload happens to be a third-party edit or a discrete image sample (such as a pixiv sample), seemingly made legitimate by re-uploading the image to a mirror site so that it looks difficult to dispute (obscure sources like imgur, discord, or fc2).
This is a multi-faceted problem exacerbated by the fact that we have basically zero current policy or code of operations for how images are gardened, vetted, and approved. Approvers are simply told to use their own instincts and approve what they like, regardless of that image's actual validity or integrity. Uploaders are guided towards doing things correctly by the howto pages, but nothing stops them from purposefully uploading incorrectly without consequence if they're allowed to get away with it, aside from more experienced and well-meaning users catching that behavior firsthand.
There is a current pattern though, and this is what it looks like right now:
- User unknowingly uploads an image known to be a sample from a source site (such as a pixiv sample). This is fixed and remedied through scripts and Moderator+ level gardeners in topic #14156. This is expected behavior.
- User knowingly uploads a sample when an original already exists. Their post is moved to deletion. This is expected behavior.
However, if you throw in the fact that a user is allowed to upload any image from their computer and does not necessarily have to provide a site source, this inevitably gets more complicated, since we can't verify the cases above ourselves.
Let's take post #3003247 for example. I moved this post to immediate deletion under the assumption and guise that it was a sample (and subsequently lost my approval privileges over it, go figure). This isn't totally wrong, but it ignores the base reason why we would be interested in removing posts like these.
For one, if you check, the image does not actually come from where the uploader claims it does. It already fails the first check of validity by being an md5 mismatch with the file at the source provided. Dig deeper and you find it's actually a lossy-lossy image, and the metadata present in the original image is missing, suggesting it is indeed a resaved 'edit'.
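For anyone who wants to run that md5 check themselves, here's a minimal sketch in Python. It assumes you've already downloaded both the uploaded file and the file from the claimed source as raw bytes; the function names and the sample bytes are mine and purely illustrative, not anything the site provides.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex md5 digest of a file's raw bytes."""
    return hashlib.md5(data).hexdigest()


def matches_source(uploaded: bytes, source: bytes) -> bool:
    """True only if the two files are byte-for-byte identical by md5."""
    return md5_hex(uploaded) == md5_hex(source)


# Made-up stand-ins for real file contents (hypothetical data).
original = b"\xff\xd8\xff\xe0 example image bytes"
resaved = original + b"\x00"  # any change at all, even one byte

print(matches_source(original, original))
print(matches_source(resaved, original))
```

A match means the upload is the exact file from the source. A mismatch only tells you the bytes differ, not why, so it's the first check and not the last one.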
Every time you resave a file in a lossy format through your editor, you introduce more lossy artifacts regardless of whether you actually changed anything. You can observe this by way of an actual image diff, or a visual layer comparison in Photoshop, GIMP, Krita, or whatever allows you to compare it personally. On that scale the change is quite small and practically insignificant to the naked eye, but it is a change regardless.
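If you'd rather see what an image diff boils down to than eyeball layers, here's a rough sketch in Python. It assumes both images are already decoded into rows of (R, G, B) tuples with the same dimensions; decoding the actual files is left to whatever tool you use. The `diff_stats` name and the pixel values are made up for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def diff_stats(a: Image, b: Image) -> Tuple[int, int]:
    """Return (number of differing pixels, largest per-channel difference)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    changed = 0
    max_delta = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            # Largest difference across the R, G, and B channels of this pixel.
            delta = max(abs(ca - cb) for ca, cb in zip(pa, pb))
            if delta:
                changed += 1
                max_delta = max(max_delta, delta)
    return changed, max_delta


# Made-up 2x2 pixel data standing in for an original and a resaved copy.
original = [[(200, 120, 40), (201, 121, 41)],
            [(10, 10, 10), (255, 255, 255)]]
resaved = [[(200, 120, 40), (203, 121, 40)],
           [(10, 11, 10), (255, 255, 255)]]

print(diff_stats(original, resaved))
```

Differences of a level or two per channel are exactly the kind of thing you won't notice by eye but that still make the file a different image.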
This third-party saved image was uploaded after the original was already upped from the source site, and it was even mistakenly marked as a revision by a gardener, despite the fact that the artist almost never revises their work on pixiv. Here's the crux then: How do we handle third-party edits like this?
As for that specific image, it's hard to deduce what the original uploader did or was thinking, even after they were asked for comment. The user had already been pointed to resources on how to upload correctly and still managed to mess up somehow. But it happened, and we have no way of handling such things under current policy, though a number of approvers with the knowledge to confirm an image's validity have taken things into their own hands, myself included.
Before, it would be: Did the user understand that they made a mistake? Then perhaps it's best to point them to topic #14156 and hope that someone amends their mistake. Since we have no support for uploading from sites like E-Hentai, Weibo or bcy, sometimes this mistake honestly happens (see topic #14119). So it'd be appropriate to request a replacement, even if it isn't from a site that gains automated sample replacement support through user-run scripts (primarily pixiv, again see topic #14119).
Yet when a user uploads an image they unknowingly or even knowingly made an edit to, does that warrant replacement or forgiveness? Especially in an 'innocent' case like this where the image was double-saved in their editor and then re-uploaded here. After all, here in booru there is no real concept of truly deleting an image aside from expunging it, which we almost never do aside from completely inappropriate content.
To put this into an example, I'll copy this from another message I recently sent.
We've done it in the past and that's exactly what I complained about regarding replacing images from Twitter -> Pixiv or third-party Discord -> Pixiv. But to put it in simpler terms: Say nonamethanks uploads an image correctly from an artist's Twitter and I upload an image with 99.9% visual identity... the catch being that I uploaded it from "Mike's Anime Gallery" on Pinterest. Let's say Mike here has sourced the image from Pixiv or Tumblr where the image is untainted by resampling artifacts. But Pinterest resamples images. And then I claim I forgot where I found it because I 'thought' it was the same image from Pixiv and the link is already lost, or better yet I stay silent and feign total ignorance.
This is my opinion and my reasoning. Replacing third-party edits isn't helpful, nor is it even warranted, because we cannot say for sure whether we are replacing an image that we'd like to reference in the future. It also attributes credit to the wrong user, since they are not the original uploader of the image once it is found, which is a poor way of giving due credit. It has been done in the past, where images uploaded from Facebook or Discord have been replaced with their pixiv/seiga/artist twitter variants, but I raised hell about this enough to make the policy change and was partly responsible for why approvers aren't allowed to replace images in-place anymore.
Just to wrap things up here: We say we don't delete duplicates, but we delete tons of 'duplicate' content that's rejected in help:third-party edit: lossy-lossless, lossless-lossy, or even the currently nonexistent lossy-lossy (which would cover a great many images in the gallery if it were ever applied). Artists can apply any of these conditions themselves... but it's a rare and in-between case, and oftentimes it's a third party that applies it.
These categories are where the concern over deleting duplicates arises, since it's hard to tell and sometimes can only really be verified with tools you provide yourself.
If you need to read more on topical appropriateness for where a duplicate would be completely fine to upload, you can check out my writeup at https://hackmd.io/s/SJXiK1L9- or in topic #14426. Of course, sometimes I'm not totally right but I'm open to editing and fixing it with suggestions.