Donmai

Tumblr sources (also bookmarklet-related)

Posted under General

tapnek said:

This solves the JPEG instead of PNG problem. Big question however is do images with raw in the "link" get around the 1920px tall and 1280px wide limits, i.e. not get resized?

Yes, they do. I have already replaced some posts with it.

Big if true. It does appear to solve the PNG->JPEG problem, at least in the few cases I've tested. I also found the http://data.tumblr.com/ domain, which, judging by the headers, is the Amazon S3 bucket that serves as the origin for the http://68.media.tumblr.com proxies. In other words, use http://data.tumblr.com; that's most likely where the original file is ultimately stored.
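A minimal sketch of the rewrite described above, for anyone scripting it. The assumptions are that a sized file name ends in `_<width>` right before the extension (as in `_1280`) and that the path carries over to data.tumblr.com unchanged. The function name and the example path are made up for illustration, and nothing here checks that the resulting URL actually exists.

```python
import re
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumption: sized files end in _<digits> before the extension, e.g. _1280.png.
SIZE_SUFFIX = re.compile(r"_\d+(\.(?:png|jpe?g|gif))$", re.IGNORECASE)


def to_raw_url(url: str) -> Optional[str]:
    """Guess the data.tumblr.com _raw URL for a sized media.tumblr.com URL.

    Returns None when the URL does not look like a sized Tumblr media URL.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host != "media.tumblr.com" and not host.endswith(".media.tumblr.com"):
        return None
    # Swap the size suffix for _raw, keeping the file extension.
    new_path, count = SIZE_SUFFIX.subn(r"_raw\1", parts.path)
    if count == 0:
        return None
    return urlunsplit((parts.scheme, "data.tumblr.com", new_path, "", ""))


if __name__ == "__main__":
    # Hypothetical path, only to show the shape of the rewrite.
    print(to_raw_url("http://68.media.tumblr.com/abc123/tumblr_xyz_1280.png"))
```

Returning None instead of guessing keeps non-Tumblr sources and already-unsized URLs untouched.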

Big question however is do images with raw in the "link" get around the 1920px tall and 1280px wide limits, i.e. not get resized?

It looks that way. post #2762307: 960x1920 (1.5MB .png). Tumblr raw: 1500x3000 (3.1MB .png).
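A rough way to flag posts worth checking against the raw file, assuming the 1280px-wide and 1920px-tall limits quoted above are exact. Both helper names are made up for this sketch, and it is only a heuristic: an original can legitimately be exactly that size.

```python
MAX_WIDTH = 1280   # width limit as quoted in this thread
MAX_HEIGHT = 1920  # height limit as quoted in this thread


def hits_size_cap(width: int, height: int) -> bool:
    """True if the image sits exactly on one of the limits, which hints at a resize."""
    on_width_cap = width == MAX_WIDTH and height <= MAX_HEIGHT
    on_height_cap = height == MAX_HEIGHT and width <= MAX_WIDTH
    return on_width_cap or on_height_cap


def is_downscale_of(small, large, tolerance=0.01):
    """True if `small` is smaller than `large` in both dimensions and has
    the same aspect ratio within a relative tolerance."""
    small_w, small_h = small
    large_w, large_h = large
    if small_w >= large_w or small_h >= large_h:
        return False
    small_ratio = small_w / small_h
    large_ratio = large_w / large_h
    return abs(small_ratio - large_ratio) <= tolerance * large_ratio


if __name__ == "__main__":
    # Dimensions from the example above: the post vs. the Tumblr raw file.
    print(hits_size_cap(960, 1920))
    print(is_downscale_of((960, 1920), (1500, 3000)))
```

A hit from either check only says the post deserves a manual comparison with the raw file.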

This is like, the one time I wish we had some sort of system for having 'dual' links... One for the direct image, another for the HTML link.

Tumblr doesn't delete images off their server even if the original post is deleted -- through enough effort you can still find bad tumblr id original sizes if someone has reblogged them, or you somehow have the direct image URL to it.

I tried doing a search through my browser history hopefully for direct image URLs saved from using the bookmarklet, but I've got nothing for some of them... Sigh. Is there any way we can look through old upload entries (that contain what URL we fetched from) or are those routinely deleted? @evazion

This is like, the one time I wish we had some sort of system for having 'dual' links... One for the direct image, another for the HTML link.

It is a good idea, but the difficulty is with fixing all the places that assume a post only has one source... which is everything.

I tried doing a search through my browser history hopefully for direct image URLs saved from using the bookmarklet, but I've got nothing for some of them... Sigh. Is there any way we can look through old upload entries (that contain what URL we fetched from) or are those routinely deleted? @evazion

/uploads is only kept for one day so old entries may or may not be in the backups, I'm not sure. I don't know how long the backups are retained either, even if they are in the backups. Would need to bug albert to see if he's able and willing to restore this data.

I really hope he's willing to. We can do so much with that information, and it'd pretty much eradicate all tumblr samples with bad tumblr id's.

Still though, one wonders... @worldendDominator how did you even find that? Were you just playing around with the URL? Because this isn't documented anywhere on the internet AFAIK, you're the first to find this.

Anyway, I'm going to hold off on bumping topic #13646, if only because I don't want other users going gung-ho over replacing these images through a separate post. This is a job solely for approvers. If I find a contributor mass-reuploading full-sizes when the replacement feature is readily available, I will immediately issue a negative, because we currently don't have any established routine job to 'purge' image samples from the database.

So let me echo it loudly here:

Do not upload _raw versions of existing posts. Let an approver replace them.

Randeel is already at work, and so am I... Perhaps if BrokenEagle98 was promoted to approver we could get stuff done a lot quicker. I also assume reiyasona might be able to come up with a script later.


Probably, but it took years for both the twitter and tumblr full-sizes to come to light, and I assume the same will go for pawoo.

In any case it's a pretty big discovery, this. How would the site-origin hierarchy for parent-child relationships be changed up by this?

If the two are found to be the exact same image, it will depend entirely on metadata or compression. In terms of priority:

  • Most metadata
  • Best compression (no artifacts + low filesize)
  • Least corrupted (no corrupted blocks or trailing data, etc)
    • Ex. post #2566630, exact same image, but exiftool gives a warning: Warning: [minor] Trailer data after PNG IEND chunk
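The "trailing data" case can be checked without exiftool by walking the PNG chunk list. A minimal sketch: the layout used (8-byte signature, then chunks of 4-byte big-endian length, 4-byte type, data, 4-byte CRC) comes from the PNG specification, while the function and helper names are made up here. It does not verify CRCs, and the demo bytes are only structurally a PNG (no IDAT chunk), not a displayable image.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def trailing_bytes_after_iend(data: bytes) -> int:
    """Return how many bytes follow the IEND chunk (0 means a clean file).

    Raises ValueError if the data is not a PNG or ends before IEND.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4  # chunk header + data + CRC
        if end > len(data):
            raise ValueError("truncated chunk")
        if chunk_type == b"IEND":
            return len(data) - end
        pos = end
    raise ValueError("no IEND chunk found")


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk (demo helper, used only to create test bytes)."""
    body = chunk_type + payload
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + body + struct.pack(">I", crc)


if __name__ == "__main__":
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # 1x1, 8-bit grayscale
    clean = PNG_SIGNATURE + make_chunk(b"IHDR", ihdr) + make_chunk(b"IEND", b"")
    print(trailing_bytes_after_iend(clean))
    print(trailing_bytes_after_iend(clean + b"extra"))
```

For a real file, read it with `open(path, "rb").read()` and pass the bytes in; a non-zero result corresponds to the warning exiftool gives.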

Recently there's been a trend of reuploading revisions, but users should know not to blindly upload new images if they turn out to be worse (more artifacts and whatnot). I have slight fears that people will upload blindly just to take credit for those things. Some artists make a strange habit of introducing more artifacts in a replacement without any other change.

So this new information is pretty much putting me, among other approvers, on high alert.


BrokenEagle98 said:

Just FYI... I've started a full regular scan taking the above into account. It may take a while to complete though...

You should really get approver privs for this kind of thing. I think we'd be done a lot faster, haha. Anyways, contacted reiyasona and RaisingK about it in DM.

Note to self: will also require new edits to tumblr sample, image sample, and topic #13646 in addition to the fixes going on with the bookmarklet/upload page.

So currently we're in the midst of deciding whether _1280 should indeed be treated as a sample size or not... They should probably be since we do so for Twitter, but I understand the other problems with doing that since _1280 has been so widespread. Currently, the vast majority of _1280 "samples" match these two searches:

And some of these already have their full-sizes uploaded from pixiv, which would mean an immediate move to deletion for the inferior tumblr post if that's to be the case.

@D1ce This is an important discussion you might need to know, since you just replaced one manually.

@NWF_Renim @Type-kun @albert

Mikaeri said:

So currently we're in the midst of deciding whether _1280 should indeed be treated as a sample size or not... They should probably be since we do so for Twitter, but I understand the other problems with doing that since _1280 has been so widespread. Currently, the vast majority of _1280 "samples" match these two searches:

And some of these already have their full-sizes uploaded from pixiv, which would mean an immediate move to deletion for the inferior tumblr post if that's to be the case.

@D1ce This is an important discussion you might need to know, since you just replaced one manually.

@NWF_Renim @Type-kun @albert

Although I'm glad this resolves having to manually download and check for a correct PNG vs JPEG file, it looks like I'm going to be busy replacing a number of my uploads once again. These sample replacements (and re-replacements) are really adding up on my deletion counts. :/

D1ce said:

Although I'm glad this resolves having to manually download and check for a correct PNG vs JPEG file, it looks like I'm going to be busy replacing a number of my uploads once again. These sample replacements (and re-replacements) are really adding up on my deletion counts. :/

Just... hold off on replacing them for a moment if you can. An approver will get to yours in due time. In the meantime, @BrokenEagle98 let's not tag them "tumblr samples" just yet, though you can comment the link in for now.

Replacing samples outside of the post replacement function is currently a fairly complicated issue since it inflates the deletion count for users, and opens up the possibility for stealing credit. Currently, we don't have any sort of regular cleanup job for these, so it's a really, really strange situation.

