
Tag for orphaned unsampled images

Posted under General

To my knowledge, ArtStation and Tumblr are the two common sources that previously offered higher-quality images but have since removed access to them. For the legacy posts added to the site before access was cut off, I suggest we create a tag to indicate this, and document exactly what was available, for new users who may not know where larger images that don't match their source come from (a user asked me about this in post #2744665). The defunct original-image situations are described in part at artstation sample and tumblr sample.

To go about this, I suggest orphaned_original, or some other name if it proves more popular (defunct, legacy?). We could have just that umbrella tag or, as I'd suggest, add site-specific tags alongside it, such as orphaned_artstation_original and orphaned_tumblr_original, with more as we come across them.

tapnek said:

I like the legacy prefix too. My question is how do we go about populating these tags?

I prefer it as well. I think it would be best to populate it through automated tagging. @BrokenEagle98, I imagine you'd be the most qualified to run such a script; would you be able to in this case?

One difficulty I just thought of is telling which posts have the Tumblr raw or ArtStation original images. For both sources, usually only the post ID, not the image link, is recorded. Some can be identified by going through the replacement logs, but beyond that, I'm not sure how to find and tag them.

I could tag all Tumblr posts and Artstation posts with legacy from before the original images were removed from those sites (only those with post ID links), but I wouldn't be able to be absolutely sure that those images were indeed of the raw and original sizes. Is that what we want to do?
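The broad selection described here (pre-cutoff posts whose source is a post-ID link) might look roughly like the sketch below. It is only an illustration under stated assumptions: the URL patterns are guesses at each site's post-link format, the `Post` record is hypothetical, and the cutoff dates are not given in the thread, so the caller has to supply them. As noted, this yields candidates, not confirmed originals.

```python
import re
from dataclasses import dataclass
from datetime import date

# Assumed shapes of "post ID" source links, as opposed to direct image links.
# These patterns are illustrative, not an exhaustive list of either site's URL formats.
POST_LINK_PATTERNS = {
    "tumblr": re.compile(r"^https?://[\w-]+\.tumblr\.com/post/\d+"),
    "artstation": re.compile(r"^https?://(www\.)?artstation\.com/artwork/\w+"),
}

@dataclass
class Post:
    # Hypothetical minimal post record; a real script would read these fields from the site.
    id: int
    source: str
    created: date

def legacy_candidates(posts, cutoffs):
    """Yield (site, post) for posts whose source is a post-ID link and that were
    uploaded before that site's cutoff date (when originals stopped being served).
    `cutoffs` maps site name -> date and must be supplied by the caller."""
    for post in posts:
        for site, pattern in POST_LINK_PATTERNS.items():
            if site in cutoffs and pattern.match(post.source) and post.created < cutoffs[site]:
                yield site, post
                break  # a source URL can only belong to one site
```

Direct image links (for example a `media.tumblr.com` URL) are skipped on purpose, since those already identify which file was uploaded.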

BrokenEagle98 said:

I could tag all Tumblr posts and Artstation posts with legacy from before the original images were removed from those sites (only those with post ID links), but I wouldn't be able to be absolutely sure that those images were indeed of the raw and original sizes. Is that what we want to do?

I don't think so. A Tumblr/ArtStation post that isn't a legacy original is one for which a better version exists, which is good to know because we might eventually find it. Tagging those as originals makes us think we have the best version when we don't.

You could tag everything above each site's maximum resolution. That would only be wrong on posts that are sourced wrongly or upscaled by the uploader or something. You'd miss some, but it's better than nothing?
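A minimal sketch of the "above the site's maximum resolution" heuristic, assuming a hypothetical per-site table of the largest size each site still serves. The Tumblr width of 1280 is an assumption (its largest non-raw size), and ArtStation's limit is left unknown, matching the discussion below.

```python
# Hypothetical table of the largest size each site still serves, as (max_width, max_height).
# None means "no known limit". Both entries are assumptions to be replaced with real values.
MAX_SERVED = {
    "tumblr": (1280, None),
    "artstation": (None, None),
}

def exceeds_site_max(site, width, height, limits=MAX_SERVED):
    """Return True if the post is larger than anything the site currently serves,
    which suggests (but does not prove) that it came from a since-removed original.
    With no known limit for the site, this always returns False."""
    max_w, max_h = limits.get(site, (None, None))
    too_wide = max_w is not None and width > max_w
    too_tall = max_h is not None and height > max_h
    return too_wide or too_tall
```

An image exactly at the limit is not flagged, which errs toward missing some originals rather than tagging non-originals.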

You could also check the image currently available at the source, and tag the post if it's different. But I'm pretty sure I've seen Tumblr images change after they were uploaded, so you might accidentally tag non-legacy images that way too.
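The source comparison boils down to hashing whatever the source serves now and comparing it with the post's MD5. This is a sketch only: downloading the image is left out, and `source_bytes` is assumed to be that download. As pointed out, a mismatch cannot distinguish a removed original from a revised image.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Lowercase hexadecimal MD5 digest of raw file bytes."""
    return hashlib.md5(data).hexdigest()

def differs_from_source(post_md5: str, source_bytes: bytes) -> bool:
    """True if the bytes currently at the source don't hash to the post's MD5.
    A mismatch only says the files differ: it cannot tell a removed original
    apart from an image the artist revised after upload."""
    return md5_hex(source_bytes) != post_md5.strip().lower()
```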

If a post has one of your own MD5 mismatch comments but no replacement, you know it's not a legacy original. You could use that to test any approach you're trying out for false positives.
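That check could be as simple as intersecting a heuristic's output with the set of posts that have a mismatch comment but no replacement. The sketch below assumes that set has already been collected and treats it as known non-originals; the function name is hypothetical.

```python
def check_false_positives(tagged_ids, known_non_legacy_ids):
    """Compare a heuristic's output against posts believed NOT to be legacy originals
    (MD5 mismatch comment present, no replacement). Returns the wrongly tagged IDs
    and the fraction of the known set they represent."""
    known = set(known_non_legacy_ids)
    wrongly_tagged = sorted(known & set(tagged_ids))
    rate = len(wrongly_tagged) / len(known) if known else 0.0
    return wrongly_tagged, rate
```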

☆♪ said:

You could tag everything above each site's maximum resolution. That would only be wrong on posts that are sourced wrongly or upscaled by the uploader or something. You'd miss some, but it's better than nothing?

I know Tumblr's max dimensions, but not ArtStation's. Does it even have max dimensions?

You could also check the image currently available at the source, and tag the post if it's different. But I'm pretty sure I've seen Tumblr images change after they were uploaded, so you might accidentally tag non-legacy images that way too.

Not sure what you mean here. All I'll see at the source is an MD5 mismatch if the post is a legacy original (or, in the case of revisions, even if it isn't), but there's no way to tell whether it has always been a mismatch or only became one because the raw is no longer available.

If a post has one of your own MD5 mismatch comments but no replacement, you know it's not a legacy original. You could use that to test any approach you're trying out for false positives.

Some original Tumblr/ArtStation posts uploaded before the raws were removed still got MD5 mismatch comments, since my script rechecks posts from one week back and one month back every day.
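The timing problem can be made concrete: a post uploaded shortly before the raws disappeared still has rechecks scheduled after the removal, so it can pick up a mismatch comment purely because the raw is gone. This sketch models the one-week and one-month rechecks described above, approximating "one month" as 30 days (an assumption); the function names are hypothetical.

```python
from datetime import date, timedelta

# Recheck offsets from the description above; "one month" is approximated as 30 days.
RECHECK_OFFSETS = (timedelta(days=7), timedelta(days=30))

def recheck_dates(upload_day: date):
    """Days on which a post uploaded on upload_day gets rechecked."""
    return [upload_day + off for off in RECHECK_OFFSETS]

def mismatch_after_removal(upload_day: date, removal_day: date) -> bool:
    """True if a post uploaded before removal_day has at least one recheck on or
    after it, i.e. it could get a mismatch comment caused only by the missing raw."""
    return upload_day < removal_day and any(
        d >= removal_day for d in recheck_dates(upload_day)
    )
```

Posts in that window would then sit in the "mismatch comment, no replacement" set even though they are originals, so that set is not a perfectly clean list of non-originals.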

BrokenEagle98 said:

Not sure what you mean here. All I'll see at the source is an MD5 mismatch if the post is a legacy original (or, in the case of revisions, even if it isn't), but there's no way to tell whether it has always been a mismatch or only became one because the raw is no longer available.

That's all I meant. I wasn't recommending it as is, just throwing out ideas in case someone could think of a way to intersect whatever information is available to us. I actually didn't even know Tumblr supported revisions, so that would probably have even more false positives than I expected.

I know Tumblr's max dimensions, but not ArtStation's. Does it even have max dimensions?

No idea. IIRC @chinatsu was our resident ArtStation expert?

Generally, I think it's better to have some originals untagged than to have non-originals tagged as them. As we come up with more methods of finding them, we can tag more of them.