Donmai

On the topic of replacing non-samples

Posted under General

Because I'm fairly annoyed at the way some approvers choose to approach this feature, I've decided to open up a new topic separate from topic #14063 describing what I personally think is wrong with this approach.

The post replacement feature was originally developed to replace image samples only, so that the post index wouldn't be cluttered en masse with explicit 1up replacements and we wouldn't have to resort to moving favorites (which isn't perfect even now). Yet it is now used in a multitude of cases, some fairly justified and some not:

  • A user uploaded the wrong image (say, a thumbnail or site asset, or the wrong image in a gallery set) and an approver forgives that mistake by replacing it with the correct image. This is fine.
  • Corrupted images -> uncorrupted images. Also fine. About par since corrupted images are also unconditionally unwanted.
  • An approver replaces an "inferior" image with a "superior" image, sometimes with and usually without checking for integral superiority in every single aspect of an image file's quality. Sometimes the image doesn't even share complete visual identity, putting aside compression artifacts.

There have been a number of cases, however, where an approver has gone out of their way to replace an image with one that was actually worse than the original, or inherently different, without checking. In that respect, I'd like to clear up why we shouldn't do the third case, and should instead stick to the normal way of 1upping or supplanting Twitter uploads with their "superior" Pixiv counterparts.

  • It messes with repost bots. This indirectly harms an image's survival on other imageboards. If ours is eventually lost, for example through an artist requesting a hard deletion of all their work, and no one catches it, then it is gone for good. And even if we still have it, it harms the artist's exposure.
  • It messes with IQDB and SauceNAO. Again, an unintended effect of the post replacement feature since they have to re-index whatever gets replaced.
  • Some users prefer alternate sources, simply because it's easier to share a user's work that way. A somewhat unique case, but it is one nonetheless.
  • Cases where a Twitter image, or an image from another inferior site would be considered superior in at least one other criterion listed in help:post relationships, or other similar criteria listed as such.
    • Notably if a Twitter image (PNG, RGBA32) is better compressed than one from Pixiv/Seiga/Tumblr/whathaveyou. See post #2844376.

I finished a writeup on this a few months ago in topic #14426, specifically about which "duplicates" we want to keep. There are numerous cases where detecting whether an image has been revised is difficult and not immediately obvious without doing a layer comparison. See post #2851989, post #2863296, and post #2767376 (along with their respective children).

post #2767376, in particular, is interesting because the pawoo counterpart image sources are no longer alive. And they were different than that of the pixiv versions (as the pixiv versions had the palm tree in the background drawn in after, whereas the pawoo images didn't have them at all or were drawn in differently). Of course, I would say they are now "long lost" but luckily gelbooru has them saved because of their repost bots.

It's a great example in particular, because it shows the extent to which human error goes when comparing images for identity. Let me draw this out with one of them.

While I don't disagree with the notion that it's great to have fewer duplicates on the server, the fact is simply that we cannot trust ourselves to do otherwise because it will eventually lead to the sort of case where if we're not careful, we'll be replacing totally fine images with images that are potentially worse. What if somebody replaces an image uncensored on twitter/pawoo with an image censored on pixiv? And if no one notices until months, perhaps years later, long after the original source is gone?

The fact is that while duplicates are not preferred, I prefer having duplicates over inherent human mistakes any day of the week. This sort of impromptu duplicate superseding and "removal" is subject to mistakes that are simply out of our control lest we go through an arduous check of everything concerning an image's quality, no matter in what respect it is. The reason why replacing samples and corrupted images, or replacing the wrongly uploaded image with the right image is accepted is because we can confirm, with zero doubt in our minds, that the previous image was unwanted. Samples and corrupted images will ALWAYS be worse than their original counterparts. You cannot say the same for this.

More importantly, why change a system that's worked completely fine in regards to 1upping other users? If you upload from Twitter, you basically accept the risk that your post will be superseded by a superior post if it is reposted somewhere that doesn't resample the original image. It's better than always having to quickly check for visual identity and then have to check the file quality or information present also.

Anyways, I'll have more to bump this topic with, but I just got a new phone in the mail today and I have to set that up.


I would say that it should still be allowed for the third option.
MD5 mismatch doesn't fall under this third section and they should never get replaced. If that's done multiple times it should be grounds for a negative feedback.
It probably also should not be allowed for different... ehm... "suffixes" like .png or .jpg, but I think that's been clear for months already.

As for metadata alone... Should we really place importance on this? I actually doubt that we should, because we are first and foremost a site where we simply look at images. It might be important to a few users, but I think it's not worth creating a fight over it because the normal user, even up to Builder, won't notice it.
I also doubt that it will help approvals or that someone will reject an image because of a difference in metadata. The most important thing there is still whether the image is lossy (jpgs from Twitter) or lossless (png). Any further approach would probably be too much and too hard to understand for most users.
So I think metadata alone really shouldn't matter or at least not to an extent to call something "inferior". If it's necessary to replace those, probably not, but I wouldn't budge here, because we first care about how the picture looks.

If the Twitter post is still superior in one aspect, I think that depends, and there is a margin of appreciation. There might be a Twitter post with more artifacts but higher resolution than an artifactless pixiv version that is only half the size.
I honestly wouldn't replace those images if we have one or more different aspects. But I also wouldn't nag someone about it, either.

In short, I personally think that re-uploading an image should be done in every case, but I wouldn't take it so far as to attack someone for replacing something with a worse post when the difference only matters in metadata, because it isn't visible to the normal (i.e. nearly all) users.

Mikaeri said:

While I don't disagree with the notion that it's great to have fewer duplicates on the server, the fact is simply that we cannot trust ourselves to do otherwise because it will eventually lead to the sort of case where if we're not careful, we'll be replacing totally fine images with images that are potentially worse. What if somebody replaces an image uncensored on twitter/pawoo with an image censored on pixiv? And if no one notices until months, perhaps years later, long after the original source is gone?

The fact is that while duplicates are not preferred, I prefer having duplicates over inherent human mistakes any day of the week. This sort of impromptu duplicate superseding and "removal" is subject to mistakes that are simply out of our control lest we go through an arduous check of everything concerning an image's quality, no matter in what respect it is. The reason why replacing samples and corrupted images, or replacing the wrongly uploaded image with the right image is accepted is because we can confirm, with zero doubt in our minds, that the previous image was unwanted. Samples and corrupted images will ALWAYS be worse than their original counterparts. You cannot say the same for this.

I tend to agree. Human error is the biggest source of problems in not just Danbooru, but in basically every field. In this example, I think it's worth considering if we augmented the replacement system with an image comparison tool that programmatically compares images to validate if a replacement is valid, or if a new upload should be done instead.

Resemble.js is a really cool example of how this problem could be solved. Their demo lets you upload two images, scale them to the same size, and then compare them. As an exercise, I took two images from Danbooru with considerable differences and downscaled one of them just to ensure that it was a good representation of what might happen. I changed a couple defaults and got this result, which shows about a 13% difference and clearly highlights the differences.

If someone were doing an image replacement, this type of analysis could be shown to throw up a red flag to say "Hey! This isn't really the same image!" Since samples usually have minor artifacting differences and are at different scales, it wouldn't show too much of a difference between a sample and its original. By comparison, a palm tree existing or not existing would be quite a lot different. In fact, Danbooru could just reject a replacement if the diff came out above a certain threshold.
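To make the threshold idea above concrete, here is a minimal sketch in Python. It is not how Resemble.js works internally and not anything Danbooru implements; the function names, the grayscale-grid representation, the nearest-neighbor scaling, the per-pixel tolerance of 16 and the 5% cutoff are all assumptions chosen for illustration.

```python
from typing import List

# Assumption: an image is a row-major grid of grayscale values (0-255).
Grid = List[List[int]]


def resize_nearest(img: Grid, width: int, height: int) -> Grid:
    """Nearest-neighbor resize, so both images can share one size."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def mismatch_percent(a: Grid, b: Grid, tolerance: int = 16) -> float:
    """Percentage of pixels that differ by more than `tolerance`."""
    # Scale both images down to the smaller of the two sizes.
    width = min(len(a[0]), len(b[0]))
    height = min(len(a), len(b))
    a = resize_nearest(a, width, height)
    b = resize_nearest(b, width, height)
    differing = sum(
        1
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
        if abs(pa - pb) > tolerance
    )
    return 100.0 * differing / (width * height)


def replacement_allowed(old: Grid, new: Grid, max_mismatch: float = 5.0) -> bool:
    """Hypothetical gate: refuse a replacement whose diff is too large."""
    return mismatch_percent(old, new) <= max_mismatch


# A 2x2 "sample" with slight artifacting, its 4x4 original, and a 4x4
# "revision" where one quadrant was redrawn.
sample = [[105, 105], [105, 105]]
original = [[100] * 4 for _ in range(4)]
revised = [[255, 255, 100, 100], [255, 255, 100, 100],
           [100] * 4, [100] * 4]

print(replacement_allowed(sample, original))  # small diff, passes
print(replacement_allowed(sample, revised))   # redrawn area, rejected
```

The tolerance absorbs compression noise, while the percentage cutoff catches redrawn regions. Both numbers would need tuning against real sample/original pairs before being trusted.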

On a more unrelated note: I've often found it frustrating to play "spot the differences" between parent and child posts. I would really love some fields that would enable us to track the differences between posts and show them in some clear way. It's not always obvious what's different between 5 different images.

SoulEvansEater said:

I tend to agree. Human error is the biggest source of problems in not just Danbooru, but in basically every field. In this example, I think it's worth considering if we augmented the replacement system with an image comparison tool that programmatically compares images to validate if a replacement is valid, or if a new upload should be done instead.

Y'know, there should be a Github issue for that... oh wait, there is... (issue #3196) :p

On a more unrelated note: I've often found it frustrating to play "spot the differences" between parent and child posts. I would really love some fields that would enable us to track the differences between posts and show them in some clear way. It's not always obvious what's different between 5 different images.

That's an interesting idea. If there's more support for that, I could pitch it on Github. (ツ)

SoulEvansEater said:

I tend to agree. Human error is the biggest source of problems in not just Danbooru, but in basically every field. In this example, I think it's worth considering if we augmented the replacement system with an image comparison tool that programmatically compares images to validate if a replacement is valid, or if a new upload should be done instead.

Resemble.js is a really cool example of how this problem could be solved. Their demo lets you upload two images, scale them to the same size, and then compare them. As an exercise, I took two images from Danbooru with considerable differences and downscaled one of them just to ensure that it was a good representation of what might happen. I changed a couple defaults and got this result, which shows about a 13% difference and clearly highlights the differences.

If someone were doing an image replacement, this type of analysis could be shown to throw up a red flag to say "Hey! This isn't really the same image!" Since samples usually have minor artifacting differences and are at different scales, it wouldn't show too much of a difference between a sample and its original. By comparison, a palm tree existing or not existing would be quite a lot different. In fact, Danbooru could just reject a replacement if the diff came out above a certain threshold.

On a more unrelated note: I've often found it frustrating to play "spot the differences" between parent and child posts. I would really love some fields that would enable us to track the differences between posts and show them in some clear way. It's not always obvious what's different between 5 different images.

I like this idea.
Comparing images is a struggle even for users who have a lot of experience with images, and I assume one doesn't have both images at hand when replacing them.
Of course errors can still happen even with this tool that might get implemented sometime in the future.

BrokenEagle98 said:

Y'know, there should be a Github issue for that... oh wait, there is... (issue #3196) :p

I added a reference to my thoughts in that issue! Thanks!

That's an interesting idea. If there's more support for that, I could pitch it on Github. (ツ)

I kinda would have suggested it myself, but I can't think of a way to add anything to an existing model or to create a new model that would track the diffs. Ideally you'd see them on the parent and the children showing differences, but that could become expensive query wise. That's my only real negative thing on it.

SoulEvansEater said:

I kinda would have suggested it myself, but I can't think of a way to add anything to an existing model or to create a new model that would track the diffs. Ideally you'd see them on the parent and the children showing differences, but that could become expensive query wise. That's my only real negative thing on it.

Ah, I thought you were talking about a Javascript tool that would be made available client-side that would allow users to compare images visually... or would that alone suffice with your intent?

SoulEvansEater said:

On a more unrelated note: I've often found it frustrating to play "spot the differences" between parent and child posts. I would really love some fields that would enable us to track the differences between posts and show them in some clear way. It's not always obvious what's different between 5 different images.

Mm, well I might as well stick my ideas out here, but here's what I had in mind:

  • Display file info and metadata somewhere. Colorspace used, ICC profile if present, thumbnail, offset, shit like that. Clientside or serverside, the computation is cheap and quick. You can just use ExifTool or Exiv2, keep a binary on the server that's available to use or just dump that info into the post table somewhere if you want that information searchable. Tags like `bitdepth:24` or something.
  • Have a diff available for metadata if above is checked off.
  • Have some sort of built-in diffchecker. Indeed, Resemble.js is an image analysis library that provides these features. It doesn't have to be that in particular, but something. Something preferably quicker than ImageMagick, that's for sure. Make sure it tries to stay as neutral as possible in regards to scaling/resizing algorithms, as nearest-neighbor, bilinear, and bicubic among other scaling algorithms all provide noticeably different images.
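As a rough illustration of the first two bullets, the sketch below reads basic file info from a PNG header and diffs it between two files. A real setup would more likely shell out to ExifTool or Exiv2 as suggested; this version only uses Python's standard library and only understands the PNG `IHDR` chunk. The function names and the `bits_per_pixel` field are assumptions made up for this example.

```python
import struct
from typing import Dict, Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Channels per pixel for each PNG color type, per the PNG specification.
CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def read_png_info(data: bytes) -> Dict[str, int]:
    """Read width, height, bit depth and color type from a PNG's IHDR chunk."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR data starts at byte 16: width, height, bit depth, color type.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
        # e.g. 24 for 8-bit RGB, 32 for 8-bit RGBA
        "bits_per_pixel": bit_depth * CHANNELS[color_type],
    }


def diff_metadata(
    old: Dict[str, int], new: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return only the fields that differ, as (old, new) pairs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

Calling `diff_metadata(read_png_info(a), read_png_info(b))` on two files' bytes would then show at a glance that, say, one file is 24-bit RGB and the other 32-bit RGBA, without anyone opening an image editor.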

I won't comment on the main stuff yet until others have had a chance to chime in.

I agree on the issue that uploads from one first-party source should never be replaced by an upload from a different first-party source. Twitter->Pixiv replacements should not be happening at all. Any kind of first-party revisions in the same source should be off-limits too. Replacement is meant for samples (third-party in a sense, since it's resized by the website rather than the artist) that are mistakenly uploaded.

Type-kun said:

@Mikaeri (and @Randeel too), while we're at it, I'd like to hear the detailed reasoning for replacing on (post #2895435/post #2894622) and (post #2896838/post #2896652) pairs, especially the latter case when you swapped the image back to twitter sample and then uploaded pixiv one yourself, instead of simply uploading the twitter version if you thought it was worth keeping.

Nothing like having to explain the same exact thing to five different people in three different places, sigh. Not to mention I already got a neutral over it, so what do I have left to lose.

We. Don't. Replace. Better. Images. With. Worse. Ones. PERIOD. Even if it's in a single respect, the Twitter version is better for archiving to most users than the pixiv version simply because most users will not care about the extra embedded data in the image. Users came to favorite one image from one site, not an image for another site. The only time replacement with a different image is appropriate for these kinds of things is if the approver uploads it on accident and sees a better one pop up in a short timespan, letting a user like myself know to replace their image with the superior one.

And the only reason the pixiv version is the parent (which makes this look like a 1up, given) is because as decided from consensus on help:post relationships the image with more metadata present is considered the more 'original' image, even though to be fair it could really go either way. Parent-child relationships don't describe which post is better (albeit the implication is usually there), but instead what posts are blindly related.

The reason I replaced his around is that I told him not to do the re-up inappropriately the first time. The proper method of reuploading from a source has always been to replace back the original, intended image at the time or briefly after, and then up the new image separately, in case any sort of revision or major change has happened without our knowledge and confirmation.

So I suppose it's bad behavior (as Renim claims I'm doing) when I'm simply correcting something I saw that shouldn't have happened in the first place. In the same way that some users sometimes hard-replace samples, where in the past that was simply the only option -- and even now when it happens because some users can't be bothered to wait for someone to replace it or simply don't know (as they don't check similar images), especially when it's an unsourced sample that they have absolutely zero obligation to check.

As approvers, we are gracious enough to forgive some uploading mistakes. One example I can think of in particular is replacing AnoCold's unsourced, bloated png finds of lpip's twitter images: lpip user:AnoCold commenter:DanbooruBot. However, we have no obligation to do this every time, and in fact these uploads should really have gone to deletion given we can't confirm where the user found them originally. I would consider this misuse on my part, and in retrospect I would probably have uploaded the Twitter versions myself. In the case that this looked like "credit hoarding" as Renim claims, I can always feign ignorance because I simply did not know they were already uploaded, even if they were worse versions and unsourced. In fact, the previous paragraph describes what happened with this image: post #2880343.

In any case, if the feature is going to be used this way then I would rather any of us not have it at all except for mods. The work's already been done to replace the huge amount of image samples except for Twitter (simply because no one at the moment cares for them). We can take a few soft deletions. What's wrong with forcing users to abide by the original policy of samples always going to deletion? In this case, at least they learn faster instead of us having to always forgive those mistakes simply for the betterment of our gallery with replaceme and the like. It beats having to deal with very human and very messy mistakes like these ones.


EB said:

I agree on the issue that uploads from one first-party source should never be replaced by an upload from a different first-party source. Twitter->Pixiv replacements should not be happening at all. Any kind of first-party revisions in the same source should be off-limits too. Replacement is meant for samples (third-party in a sense, since it's resized by the website rather than the artist) that are mistakenly uploaded.

Pretty much this. We shouldn't hard-replace pre-revisions with revisions that are simply higher-res since we can't always confirm these differences. Although yes, I've done this a few times (such as with kisaragi nana when I assumed he uploaded a smaller res to one of his galleries on accident), I would prefer this be the exception rather than the norm. And that's still something that I take the time to check with deep scrutiny.

There are times where I've "swapped" the order of my posts simply because I wanted it to look in-order in the index while there was still time (as nobody has favorited it yet and the like) such as with post #2880871 but I've quickly come to regret it a little now because it fudges up when you try searching the image on IQDB and SauceNAO. In fact, the Gelbooru link to the image is broken. I don't know if this is a bug anyone has noticed yet, but it is a side effect of us doing this sort of thing. It probably applies to all the replacements uploaded.

Chiera said:

If you see something like that happen, then just upload the Twitter image (or the image that got replaced) again.
That's safe.

Users came for one image, not another. Why mess up a well-established order? And why interfere with the repost bots?

Mikaeri said:

Users came for one image, not another. Why mess up a well-established order? And why interfere with the repost bots?

I guess repost bots for other sites are more or less unimportant to us. They don't belong to this site, so it is Gelbooru's problem if they rely only on repost bots. If we then lose an image completely, it's not because of "badly working" repost bots but because we replaced Twitter with Pixiv. I guess that's reason enough not to replace Twitter with Pixiv, and that extends to possible revisions.

I guess if you replace an image that way but upload the version again under your name, then you are wronging the replacer/uploader.
So in this case, Randeel did replace the Twitter version with the Pixiv version. There is consensus that this is wrong, I guess.
But I guess it is also not ok to revert this replacement just like that. Because now the Pixiv version is uploaded under your name, but before it was uploaded under Randeel's name.

That's why one should upload the Twitter version again. Because you can't answer one wrongdoing with another. Maybe it wasn't your intention to make it look like this, and I think your reasons why Randeel's approach is wrong do also have very solid points.

So to make another note: Replacements are not there to revert a false replacement. Instead upload the file that got replaced again if possible. Then it is stored here and it won't get flagged (even if inferior) because the re-upload is justified.

Mikaeri said:

Nothing like having to explain the same exact thing to five different people in three different places, sigh. Not to mention I already got a neutral over it, so what do I have left to lose.

If you have to continuously explain why you are doing something, you probably should have taken this as a sign that something is wrong with whatever you are doing.

Mikaeri said:

In any case, if the feature is going to be used this way then I would rather any of us not have it at all except for mods.

FYI, this solution had been brought up already, and I'm afraid we'll have to resort to it after all. I really don't like the idea of adding more workload to our mod staff, but if you builders start using this feature to prove whatever points to each other, we'll have to restrict it. Maybe it'll become a separate permission afterwards, someday.

Still waiting for @Randeel's answer. Why do you replace twitter images with pixiv ones, shortly after you upload?

---

Going back on-topic, it might be technically possible to allow builder+ only to replace samples from the same source, leaving non-sample replacements to mod+, which probably would be ideal in terms of workload distribution. I'll need all possible data on how sample uploads happen at all, since we generally have rewrite rules in place.
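One way to picture that split is the sketch below. Everything in it is hypothetical: the level names and ordering, the `may_replace` function, and especially the use of a matching hostname as a stand-in for "same source", which is a crude assumption that real sample URLs may not satisfy.

```python
from urllib.parse import urlparse

# Hypothetical level ordering, for illustration only.
LEVELS = {"member": 0, "builder": 1, "moderator": 2, "admin": 3}


def same_site(url_a: str, url_b: str) -> bool:
    """Crude stand-in for "same source": both URLs share a hostname."""
    host_a = urlparse(url_a).hostname
    host_b = urlparse(url_b).hostname
    return host_a is not None and host_a == host_b


def may_replace(level: str, old_url: str, new_url: str, old_is_sample: bool) -> bool:
    """Mod+ may replace anything; builder+ only same-source samples."""
    rank = LEVELS[level]
    if rank >= LEVELS["moderator"]:
        return True
    if rank >= LEVELS["builder"]:
        return old_is_sample and same_site(old_url, new_url)
    return False
```

Under this rule a builder could swap `https://img.example.com/sample/1.jpg` for `https://img.example.com/orig/1.png` if the old file is flagged as a sample, but a cross-site replacement would need a moderator. How the sample flag itself gets set is the open question, which is where the data on how sample uploads happen would come in.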

Type-kun said:

Still waiting for @Randeel's answer. Why do you replace twitter images with pixiv ones, shortly after you upload?

Mostly to reduce clutter and the number of duplicates. I usually replace other user's uploads this way, too.
I haven't gotten a warning or anything of the sort from a Mod+ so I thought it was ok to do it.

Randeel said:

Mostly to reduce clutter and the number of duplicates. I usually replace other user's uploads this way, too.
I haven't gotten a warning or anything of the sort from a Mod+ so I thought it was ok to do it.

It's been spelled out not to do it since the feature was implemented. See evazion's second post in topic #14063. It was left open to discussion, but a number in that thread were wary of Twitter->Pixiv replacements and so it certainly doesn't seem there was any agreement it was OK.

EB said:

It's been spelled out not to do it since the feature was implemented. See evazion's second post in topic #14063. It was left open to discussion, but a number in that thread were wary of Twitter->Pixiv replacements and so it certainly doesn't seem there was any agreement it was OK.

QFT, but just to back this up:

Nobody ever said it was okay to do this. In fact, doing so was up for contention from the beginning. It's going to take some time to perhaps "reverse" all of these mistakes, but I'd rather it this way than us having to continue checking manually. This has gone under the radar for so long because I trusted that approver to be responsible with checking for similarity or superiority, but he hasn't been. And that's why I issued that negative.

EB said:

It's been spelled out not to do it since the feature was implemented. See evazion's second post in topic #14063. It was left open to discussion, but a number in that thread were wary of Twitter->Pixiv replacements and so it certainly doesn't seem there was any agreement it was OK.

To be honest, that might have been written here, but there is something that is called Custom.
I think that should apply here, even if Evazion wrote something in that topic that is against it and multiple people, too.
The action of replacing Twitter with Pixiv wasn't done by Randeel alone but by a number of approvers and sometimes even by the admins.
So I would definitely take into account that there was activity going on, namely that Twitter was being replaced by Pixiv.
I still agree, we shouldn't do that, but that's a different layer in that issue as far as I know.
