Donmai

[New Feature] Post Replacements

Posted under Bugs & Features

fireattack said:

From help:replacement notice:

I didn't get the reasoning about this particular one.

Sure, PNG is larger than JPG, and someone may prefer the "lightweight" JPG version. But one could just as well prefer the "smaller size" sample images, and we're replacing those anyway.

And it's essentially no different from a low quality JPG -> high quality JPG replacement, which is allowed, since both practices increase the file size.

Basically, I don't understand why replacing low-quality JPEG with high-quality JPEG from alternate source is allowed, while replacing low-quality JPEG with high-quality PNG from alternate source is not.

Again, I am personally fine with both approaches (replacing them in-post or posting as parent), just feel this *reasoning* doesn't make any sense.

You can always convert a lossless format into a lossy format without any repercussions to the original file, as whatever sampling/compression may have happened doesn't alter the original source file. You can even compress a lossy JPEG to be even smaller in filesize from the lossy JPEG itself. But you can't convert lossy into lossless because then that legitimizes the image degradation (lossy-lossless). So I see why there might be confusion.
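To make the lossy-lossless point concrete, here is a minimal sketch in Python. It does not use real JPEG or PNG codecs: `lossy_quantize` is a hypothetical stand-in for lossy compression, and a zlib round trip stands in for saving as a lossless format. The only claim it illustrates is that a lossless re-save preserves whatever damage is already there.

```python
import zlib

def lossy_quantize(samples, step):
    """Stand-in for lossy compression: snap each 0-255 sample to a multiple of `step`."""
    return bytes(min(255, round(s / step) * step) for s in samples)

def lossless_roundtrip(data):
    """Stand-in for a lossless re-save: compress, then decompress, changing nothing."""
    return zlib.decompress(zlib.compress(data))

original = bytes([12, 13, 14, 200, 201, 202, 90, 91])
degraded = lossy_quantize(original, step=8)   # detail is thrown away here
resaved = lossless_roundtrip(degraded)        # "lossy-lossless" file

print(resaved == degraded)  # the lossless copy keeps the degraded data exactly
print(resaved == original)  # the original detail does not come back
```

The lossless container is faithful to its input, so if the input was already degraded, the result is a larger file with the same artifacts.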

Truth be told, we shouldn't even be doing low quality JPG -> higher quality JPG in the first place because we can't check consistently for visual identity (small censor bars, line adjustments, etc.), but it's only very lightly allowed because some approvers have taken it into their own hands to reduce the number of duplicates on the site. I'm against it overall, but I do see why they'd want to do it, so I don't really argue (albert himself has replaced a Twitter JPG post with a pixiv JPG post, which has led to this sort of thing happening).

Anyways, for further reasoning as to perhaps why we'd want to keep a JPG rather than fully replace it with the lossless PNG, one part of it is because I don't want to see more mistakes, and the other part is that if we know it's from a legitimate source, that means the artist intentionally released it in a JPG format first before uploading it in a png format somewhere else. Twitter, especially, can't just convert your png to jpg out of the blue whenever you upload an image post to the site. And of course, the last note -- we don't know if there are artists out there that unintentionally make their own posts lossy-lossless by converting their own jpg to png (which might be a headache in and of itself to really discover).

Mikaeri said:

I know how compression works, but I think that's beside the point I'm trying to make. The whole discussion is built on the assumption that the PNG version is legit and genuine, i.e. it has no JPEG artifacts compared to the replaced JPEG version. That is no different from replacing a low quality JPEG version (Twitter, for example) with a high quality one (pixiv, for example): we need to examine that too (nothing stops an artist from unintentionally re-saving a low quality JPEG as a nominally high quality one, either).

Anyway, since you think even low quality JPG -> higher quality JPG from alternate source shouldn't be allowed / encouraged, that is consistent and answered my question.

Twitter, especially, can't just convert your png to jpg out of the blue whenever you upload an image post to the site

Not sure if I understand what you mean correctly here, but Twitter does convert PNG to JPG. I vaguely remember that they didn't before, but at least currently, all PNGs are converted to JPGs when uploading to Twitter.

And for what it's worth, Twitter always re-compresses JPEG images to 85% quality (4:2:0 chroma subsampling) too, which is why the quality is always shit over there.
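For anyone unfamiliar with 4:2:0, it means the color (chroma) planes are stored at half resolution in both directions, so one chroma sample covers a 2x2 block of pixels. A rough sketch of that step follows. Averaging each block is an assumption made for illustration (real encoders may use other filters), and `subsample_420` is a hypothetical helper, not anything Twitter publishes.

```python
def subsample_420(chroma):
    """Reduce a chroma plane (rows of ints, even width and height) to one value per 2x2 block."""
    out = []
    for y in range(0, len(chroma), 2):
        row = []
        for x in range(0, len(chroma[y]), 2):
            block_sum = (chroma[y][x] + chroma[y][x + 1]
                         + chroma[y + 1][x] + chroma[y + 1][x + 1])
            row.append(block_sum // 4)  # average of the four samples
        out.append(row)
    return out

cb = [
    [100, 100, 200, 200],
    [100, 100, 200, 200],
    [50, 150, 10, 10],
    [50, 150, 10, 10],
]
small = subsample_420(cb)
print(small)  # [[100, 200], [100, 10]]
```

Note the bottom-left block: a sharp 50/150 color edge collapses into a single 100, which is the kind of color bleeding you see around fine linework.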

fireattack said:

Not sure if I understand what you mean correctly here, but Twitter does convert PNG to JPG. I vaguely remember that they didn't before, but at least currently, all PNGs are converted to JPGs when uploading to Twitter.

Twitter may do this for some accounts, but it doesn't do it for every account. For example (art I saw scrolling through my timeline), pito_(sh02327) (NSFW) and eneco can still post PNGs.

fireattack said:

I know how compression works, but I think that's beside the point I'm trying to make. The whole discussion is built on the assumption that the PNG version is legit and genuine, i.e. it has no JPEG artifacts compared to the replaced JPEG version. That is no different from replacing a low quality JPEG version (Twitter, for example) with a high quality one (pixiv, for example): we need to examine that too (nothing stops an artist from unintentionally re-saving a low quality JPEG as a nominally high quality one, either).

You are correct. We can always assume good faith from legitimate sources, but I do mention it because some artists have done questionable things before out of naivete. pixiv #63227947, for example, is a collection of all jpg:large samples from Twitter. Then there are posts like post #2528974 and pretty much 80% of everything under kimagure_blue md5_mismatch (worse quality after revision). That's why I would rather keep both even if they are the same file format, as both Twitter and pixiv/seiga uploads are acceptable in tandem, even with Twitter's compression artifacts.

Anyway, since you think even low quality JPG -> higher quality JPG from alternate source shouldn't be allowed / encouraged, that is consistent and answered my question.

The feature was originally developed because there was a need to replace the vast majority of image samples from Twitter (and more recently, Tumblr) without having to resort to making wholly new and separate posts (hard replacing). As such, the only replacements I'm especially tolerant of outside of that is stuff that we can basically confirm to be of indubitably better quality. It's always possible to do a diff if a visual comparison just doesn't offer enough, anyway -- when file formats remain the same,

Not sure if I understand what you mean correctly here, but Twitter does convert PNG to JPG. I vaguely remember that they didn't before, but at least currently, all PNGs are converted to JPGs when uploading to Twitter.

And for what it's worth, Twitter always re-compresses JPEG images to 85% quality (4:2:0 chroma subsampling) too, which is why the quality is always shit over there.

fossilnix has answered this in forum #133432, but AFAIK Twitter does not. See this search: filetype:png source:https://twitter.com/. Pixelwise, images remain the same for PNGs, it's just that Twitter makes those files go through a second round of bad compression (some pngcrush-like software? Who knows), which leads to things like post #2687722 and post #2624341 being a common case, even though both have the exact same image data. I've only ever done this sort of replacement once (post #2766377).
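To illustrate how two files can hold the exact same image data and still differ in size and checksum: PNG stores its pixel data with DEFLATE, so the same pixels can be packed more or less tightly. The sketch below uses zlib directly on a made-up pixel buffer rather than real PNG files, so treat it as an analogy, not as a reproduction of what Twitter's pipeline does.

```python
import hashlib
import zlib

# Made-up "pixel data": a flat orange area followed by a repeating gradient.
pixels = bytes([255, 128, 0, 255]) * 4096 + bytes(range(256)) * 16

weak = zlib.compress(pixels, 1)    # fast, less thorough compression
strong = zlib.compress(pixels, 9)  # slower, more thorough compression

same_pixels = zlib.decompress(weak) == zlib.decompress(strong)
same_file = hashlib.md5(weak).hexdigest() == hashlib.md5(strong).hexdigest()

print(len(weak), len(strong))  # compressed sizes of the two "files"
print(same_pixels)             # decoded data is identical
print(same_file)               # but the files themselves are not
```

So an MD5 mismatch alone says nothing about whether the image changed; only comparing the decoded pixels does.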

Well, Twitter is a pretty shitty site to upload from anyway. The JPEG artifacts are usually not enough to completely ruin an image, but they're definitely a killjoy for a great number of users, only mitigated when an artist provides a resolution 'absurd' enough to offset the effect of image compression. A number of Yoshida Iyo's posts are like that (yoshida_iyo absurdres), as are yaman's older Twitter posts (yaman absurdres).

Mikaeri said:

fossilnix has answered this in forum #133432, but AFAIK Twitter does not. See this search: filetype:png source:https://twitter.com/. Pixelwise, images remain the same for PNGs, it's just that Twitter makes those files go through a second round of bad compression (some pngcrush-like software? Who knows), which leads to things like post #2687722 and post #2624341 being a common case, even though both have the exact same image data. I've only ever done this sort of replacement once (post #2766377).

Thanks for pointing out it doesn't always happen, but it does happen in most if not all of my testing (you can try it yourself: I just uploaded a whole batch of PNG files to my own Twitter and they all got re-compressed to JPGs, except the one I mentioned above, so it's not my account). I still don't know what decides it though, considering even some of my very small images (in terms of dimensions and filesize) got recompressed too.

OK, after some research, it appears that if your image contains transparency information (even if it's just 0% transparency for all pixels←I was wrong, you need to have at least 1 pixel that doesn't have 100% opacity), Twitter will NOT re-compress it to JPG. See: http://ravenworks.ca/twitimagefix/

I just downloaded a few images in filetype:png source:https://twitter.com/ and checked them in Photoshop (to start with, if they have transparency information, they will appear as "layer 0"; otherwise, just "background"; then you can convert transparency to a layer mask (Layer Mask > From Transparency) for further investigation). I can confirm this is correct.

For example, post #2779166 has a transparent pixel at the bottom-left corner.
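The rule as observed above (at least one pixel that is not fully opaque) can be written down as a small check. This is only a sketch of that observation, not documented Twitter behavior. It assumes the image has already been decoded into 8-bit RGBA tuples, and `has_non_opaque_pixel` is a made-up name.

```python
def has_non_opaque_pixel(rgba_pixels):
    """Return True if any pixel's alpha is below 255 (i.e. not 100% opaque)."""
    return any(alpha < 255 for (_r, _g, _b, alpha) in rgba_pixels)

# An alpha channel that is fully opaque everywhere does not count.
fully_opaque = [(10, 20, 30, 255)] * 4

# One nearly invisible change, e.g. a single corner pixel at alpha 254, is enough.
one_corner_pixel = [(10, 20, 30, 255)] * 3 + [(10, 20, 30, 254)]

print(has_non_opaque_pixel(fully_opaque))      # False
print(has_non_opaque_pixel(one_corner_pixel))  # True
```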

fireattack said:

Thanks for pointing out it doesn't always happen, but it does happen in most if not all of my testing (you can try it yourself: I just uploaded a whole batch of PNG files to my own Twitter and they all got re-compressed to JPGs, except the one I mentioned above, so it's not my account). I still don't know what decides it though, considering even some of my very small images (in terms of dimensions and filesize) got recompressed too.

Strange. Maybe we should stick it in howto:Twitter somewhere.

EDIT: See above, ignore me

(migrating this question from topic #14156):

☆♪ said:

Also, I have a situation that I'm unsure about: pixiv #58308768 has post #2151686 and post #2118042 (at least), which were uploaded from Twitter. They are PNGs, and I've confirmed that they're pixel-for-pixel identical -- in the case of those two posts, at least -- but the ones from pixiv are better compressed (smaller filesize for the exact same image) and have slightly more metadata. Since the images are the same, it's definitely not worth uploading and parenting IMO. Should we replace the images? The pixiv ones really are superior in every way, and there's nothing to be lost by getting rid of Twitter's versions, although there's also not very much to be gained. I hesitate, however, to set a precedent of replacing from a different source because that can be catastrophic if not done very sparingly and very carefully. Thoughts?

IMO we should hold off on cross-source replacements until we have better verification processes (issue #3196). It's not an easy task, because any verification strict enough to prevent replacement of 99.9% similar revisions would very likely run into problems with preventing replacement of legitimate samples.

I suggest we focus on clearing the image sample backlog first. It will be easier to make stricter checks if we don't have to worry about blocking bot replacements.

evazion said:

(migrating this question from topic #14156):

IMO we should hold off on cross-source replacements until we have better verification processes (issue #3196). It's not an easy task, because any verification strict enough to prevent replacement of 99.9% similar revisions would very likely run into problems with preventing replacement of legitimate samples.

I suggest we focus on clearing the image sample backlog first. It will be easier to make stricter checks if we don't have to worry about blocking bot replacements.

This. I recommend uploading the pixiv ones separately. We can have duplicates, that's fine. I prefer having duplicates over making human error. If you're an approver and you see it, you can, but the responsibility is entirely on you to ensure that those posts are to be correctly replaced.

Mikaeri said in forum #133798:

^

@☆♪ Read topic #14063/p3 when you get the time. It'll provide you some insight on the already existing situation.

EDIT: Also help:replacement notice

I think that the description on that wiki is a tad inconsistent. The bullet point for replacement with better lossless compression says that metadata must be the same, but the example (post #2766377) doesn't have exactly the same metadata -- the Pixiv replacement has physical dimension metadata that the Twitter version lacked, in addition to being better compressed. The two examples I posted are the exact same situation. So it would seem to be a grey area that the rules both condone and forbid.

As for the general discussion, I'm in agreement that we should be very careful and that an automated system to check is desirable.

Mikaeri said:

This. I recommend uploading the pixiv ones separately. We can have duplicates, that's fine. I prefer having duplicates over making human error. If you're an approver and you see it, you can, but the responsibility is entirely on you to ensure that those posts are to be correctly replaced.

I still don't think it's worth the favorite-migration etc. costs of a new upload since the actual images are 100% identical. If these images aren't being replaced, I probably won't upload the Pixiv ones, though I suppose there's nothing stopping someone else from doing so.

My hesitation, as obnoxious as it may be, is largely that I'm confident that I'm careful enough but don't trust most users to do the same. But I may not even trust my future self, if this sort of thing were to become common enough that we got used to it. It's almost impossible to really force yourself to keep paying as much attention when you do something many times, which is why you want a computer having your back. So it does probably make sense to hold off until the samples are cleaned up and we can put a checking system in place. For cases like these even an extremely strict and simple check (pixel-for-pixel identical) would suffice. The only counter-argument that comes to mind is that since works are often deleted from their original sources, we run a small risk of losing the opportunity by waiting. It's probably not so bad in this case because what we'd actually lose is really only the physical dimension metadata (and a chance to save a little bit of space).

☆♪ said:

I think that the description on that wiki is a tad inconsistent. The bullet point for replacement with better lossless compression says that metadata must be the same, but the example (post #2766377) doesn't have exactly the same metadata -- the Pixiv replacement has physical dimension metadata that the Twitter version lacked, in addition to being better compressed. The two examples I posted are the exact same situation. So it would seem to be a grey area that the rules both condone and forbid.

As for the general discussion, I'm in agreement that we should be very careful and that an automated system to check is desirable.

I did not notice that, thanks for catching my mistake. Supposedly I should have uploaded them in separate posts, huh. There are other examples I can dig up later, but for now I'll just close it at that.

I still don't think it's worth the favorite-migration etc. costs of a new upload since the actual images are 100% identical. If these images aren't being replaced, I probably won't upload the Pixiv ones, though I suppose there's nothing stopping someone else from doing so.

My hesitation, as obnoxious as it may be, is largely that I'm confident that I'm careful enough but don't trust most users to do the same. But I may not even trust my future self, if this sort of thing were to become common enough that we got used to it. It's almost impossible to really force yourself to keep paying as much attention when you do something many times, which is why you want a computer having your back. So it does probably make sense to hold off until the samples are cleaned up and we can put a checking system in place. For cases like these even an extremely strict and simple check (pixel-for-pixel identical) would suffice. The only counter-argument that comes to mind is that since works are often deleted from their original sources, we run a small risk of losing the opportunity by waiting. It's probably not so bad in this case because what we'd actually lose is really only the physical dimension metadata (and a chance to save a little bit of space).

Well, yeah -- that's the biggest crux with this whole "if we do do that sort of replacement" that I was talking about in previous posts. The only way we'd be able to do it safely is if we can ensure complete visual identity consistently. Perhaps it could be an option on the sidebar for users with approval privileges: Open up a dialogue box, put in a valid image URL or file, and compare it with the original and return some sort of % value and/or temporary image diff. Heck, come to think of it, such a tool on booru could be solely client-sided just like Diffchecker's image diff tool. As long as there's just some sort of way we can ensure that, then I see absolutely no problem with doing so, even with the extra metadata.

Personally, I do think you or someone else should upload those better versions, and under those provisions it'd actually be acceptable to. But as it stands, it is completely fair game for someone else to up a superior "duplicate" that is more savvy on filesize/lossless compression, as you say. That's actually the reason why I chose to upload a bunch of those karutamo images separately instead of replacing them: karutamo user:Mikaeri. Plus, no one's going to get the idea of uploading them from Pixiv again if say, they saw those images weren't "uploaded" yet.

I think we did have this debate a few times before, about whether to keep twitter :large png samples that were actually better than their :orig counterparts if only for their smaller filesize and same exact image data. And in the end we decided to just keep the worse one even though by all measures the :large would've been fine. Apparently, :orig png files are guaranteed to go through a round of bad compression, but :large png's less so? It's strange.

EDIT: Would also mention that such a tool would also be massively helpful for uploading revisions.
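As a rough idea of what such a comparison tool would compute, here is a sketch that returns a percentage of differing pixels plus a per-pixel mask that could be rendered as a temporary diff. It assumes both images are already decoded into rows of pixel tuples. `diff_images` is a hypothetical name and nothing here reflects an existing Danbooru feature.

```python
def diff_images(a, b):
    """Compare two decoded images (lists of rows of pixel tuples).

    Returns (percent_different, mask), where mask[y][x] is True if the pixel differs.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    mask = [[pa != pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
    total = sum(len(row) for row in mask)
    changed = sum(cell for row in mask for cell in row)
    percent = 100.0 * changed / total if total else 0.0
    return percent, mask

old = [[(0, 0, 0), (255, 255, 255)],
       [(10, 10, 10), (20, 20, 20)]]
new = [[(0, 0, 0), (255, 255, 255)],
       [(10, 10, 10), (20, 20, 21)]]  # one channel of one pixel changed

percent, mask = diff_images(old, new)
print(percent)  # 25.0
print(mask)     # [[False, False], [False, True]]
```

A strict "pixel-for-pixel identical" rule is then just `percent == 0.0`. Anything above zero would need a human to look at the mask, since a small censor bar and harmless noise can produce similar percentages.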

Some questions have come up about this section:

What is typically discouraged:

  • Images of non-identical filetypes that aren't samples.
    • GIF/MP4 file → Ugoira. Ugoiras are not considered necessarily better than their GIF/MP4 counterparts (post #2625276). Upload it separately.

In the case of a user taking an ugoira from Pixiv, converting it to another filetype, uploading it and sourcing it to Pixiv, I would assume that it would be preferable to replace it using Danbooru's native "raw" ugoira system.

Trying to get some clarification on what to do with posts that have notes here. Initial post says that replacements are not allowed. Was not sure if that is in the sense that they won't go through at all (because at this point it looks like they do: but the notes have to be manually readjusted) or that they just shouldn't be done. Is there any way post replacements could work like the "Copy notes" function in adjusting the notes to the new resolution?
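For reference, adjusting notes to a new resolution is mostly a matter of scaling each note's box by the ratio between the old and new image dimensions. The sketch below assumes a note is just an x/y/width/height rectangle in pixels. The `Note` class and `rescale_note` are made up for illustration and are not Danbooru's actual schema or its "Copy notes" code.

```python
from dataclasses import dataclass

@dataclass
class Note:
    x: int
    y: int
    width: int
    height: int

def rescale_note(note, old_size, new_size):
    """Scale a note from an image of old_size (w, h) to one of new_size (w, h)."""
    sx = new_size[0] / old_size[0]
    sy = new_size[1] / old_size[1]
    return Note(
        x=round(note.x * sx),
        y=round(note.y * sy),
        width=round(note.width * sx),
        height=round(note.height * sy),
    )

# A sample at 600x800 replaced by the original at 1200x1600.
note = Note(x=100, y=50, width=200, height=80)
print(rescale_note(note, (600, 800), (1200, 1600)))
# Note(x=200, y=100, width=400, height=160)
```

This only works cleanly when the replacement has the same aspect ratio and framing as the old image. A crop or revision would still need manual adjustment.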

EB said:

Trying to get some clarification on what to do with posts that have notes here. Initial post says that replacements are not allowed. Was not sure if that is in the sense that they won't go through at all (because at this point it looks like they do: but the notes have to be manually readjusted) or that they just shouldn't be done. Is there any way post replacements could work like the "Copy notes" function in adjusting the notes to the new resolution?

Replacements used to not go through for posts with notes, but that's been fixed for some time, so I've removed that bit from the opening.

As for notes not being adjusted, I thought they used to do that? I've created issue #3815 for it.

RaisingK said:

As for notes not being adjusted, I thought they used to do that? I've created issue #3815 for it.

I had thought they had done it at one point too, but I'm not sure as I can't remember if I had replaced posts with notes on them before. In any case, this still seems to be an issue.
