
Correcting sources for bad_link posts


There are currently about 36k posts tagged with bad_link, and I've recently been adding proper source links to them when I have time (and when I can actually find the proper source). However, I have a couple of questions.

1. Is there any sort of campaign to add correct sources for posts tagged with bad_link when possible? (I'm posting this with the assumption that there isn't, as I can't find any trace of one.)

2. Would it be possible to automate the correction of bad_link posts from drawr?
The main reason I'm asking is that art from drawr is indexed on SauceNAO, with a few small gaps. However, even if it were possible to automate this somehow, there are some issues that I've noticed while correcting drawr source links, mainly relating to artist tags.
Some drawr posts have no artist tag, but aren't tagged with artist_request. Some of these posts are from artists who already have a tag, but whose artist entry doesn't list their drawr account among its URLs. Other drawr posts have "incorrect" artist tags because people didn't connect the dots between an already-tagged pixiv account and a drawr account with the same username. It's a bit of a mess.

3. Should posts tagged with bad_link from sites such as Twitter, Tumblr, and drawr be tagged with source_request if the proper source link can't be found?

Obviously, given factors such as artists deleting their accounts/artworks, the mass deletion of Tumblr blogs, and the fact that some sites that these posts are from don't have any sort of valid source URL that isn't a direct image link, it's likely that a decent portion of these posts can't be "fixed". However, I think that this is an issue that's worth shining a metaphorical spotlight on. If people are interested in helping to fix these posts, I can post some additional tools/advice that would be helpful.

Well, I've been having my bot add bad link to posts that don't contain a valid source, in an attempt to crowdsource the finding of that information, since it's way too much for one person to handle and I'm way too busy as it is. If there's anything I could do that could be automated, I could look into coding something up for that. Basically, anything that doesn't require manual intervention or inspection.

BrokenEagle98 said:

Well, I've been having my bot add bad link to posts that don't contain a valid source, in an attempt to crowdsource the finding of that information, since it's way too much for one person to handle and I'm way too busy as it is. If there's anything I could do that could be automated, I could look into coding something up for that. Basically, anything that doesn't require manual intervention or inspection.

Would it be possible and/or reasonable to replace drawr sources automatically (with the "Drawr ID" URL) based on saucenao hits?
If so, would it be possible to add source_request to any bad_link drawr posts that don't get any saucenao hits, as well as adding artist_request to any posts that are like the situations I mentioned in my original post?

Etou said:

Would it also be possible to automatically correct Nijie sources using saucenao hits, or would that require human discretion?

It looks like a lot of Nijie posts are indexed on ascii2d, so it should be possible to automatically correct some of those using MD5 hashes, if anyone here would be able to do that.

Etou said:

Would it be possible and/or reasonable to replace drawr sources automatically (with the "Drawr ID" URL) based on saucenao hits?
If so, would it be possible to add source_request to any bad_link drawr posts that don't get any saucenao hits, as well as adding artist_request to any posts that are like the situations I mentioned in my original post?

Etou said:

Would it also be possible to automatically correct Nijie sources using saucenao hits, or would that require human discretion?

Sorry, I saw this when you posted it, but then got distracted and forgot about it.

Anyways, I'm limited to 300 sauce per day, so SauceNAO isn't an avenue I can use at least.

Etou said:

It looks like a lot of Nijie posts are indexed on ascii2d, so it should be possible to automatically correct some of those using MD5 hashes, if anyone here would be able to do that.

Recently Ascii2D did something that has made it difficult/impossible to query using a bot. I think they're using some kind of captcha protection, since after a bunch of failed queries on my part, when I went to the site to check manually, it asked me to do a captcha before it would let me query, which isn't something it normally does.

Etou said:

It looks like a lot of Nijie posts are indexed on ascii2d, so it should be possible to automatically correct some of those using MD5 hashes, if anyone here would be able to do that.

Some, maybe. I'll add those to the list of things to check, but given what happened with my twitter sweep, I don't expect much success there, either. And most of those posts are bad id anyway.

BrokenEagle98 said:

Recently Ascii2D did something that has made it difficult/impossible to query using a bot. I think they're using some kind of captcha protection, since after a bunch of failed queries on my part, when I went to the site to check manually, it asked me to do a captcha before it would let me query, which isn't something it normally does.

I've gotten a captcha once before, but I don't remember the circumstances. Maybe my throttling is protecting me?
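For anyone curious what I mean by throttling, here's a minimal sketch of the idea in Python. It isn't my actual bot code, and the `Throttle` class and its parameter names are made up for illustration. It only enforces a minimum gap between requests; whether that's really what keeps a captcha away is a guess on my part.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls to wait().

    Hypothetical helper: the clock and sleep functions are injectable
    so the behaviour can be checked without really sleeping.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Called too soon after the last request: pause for the rest
                # of the interval before letting the caller continue.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

You'd call `throttle.wait()` right before each query, so a loop over many posts never fires faster than one request per `min_interval` seconds.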

BrokenEagle98 said:

Sorry, I saw this when you posted it, but then got distracted and forgot about it.

Anyways, I'm limited to 300 sauce per day, so SauceNAO isn't an avenue I can use at least.

Forgive me if I'm missing something obvious, but why can't you just automatically correct 300 posts per day if that's what Saucenao limits you to?
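To make the question concrete, here's a rough sketch of what I'm picturing, in Python. The function names are hypothetical, and it only splits a backlog into per-day batches that fit under a daily cap; the actual lookup and the source edit are left out, since I don't know how your bot does those.

```python
from itertools import islice


def daily_batches(post_ids, daily_limit=300):
    """Yield successive lists of at most daily_limit post IDs, one list per day."""
    it = iter(post_ids)
    while True:
        batch = list(islice(it, daily_limit))
        if not batch:
            return
        yield batch


def days_needed(backlog_size, daily_limit=300):
    """Days required to work through the backlog (ceiling division)."""
    return -(-backlog_size // daily_limit)
```

At 300 lookups per day, a backlog of 36,000 posts would take `days_needed(36000)`, i.e. 120 days. That uses the total bad_link count from my first post purely as an illustration; the drawr subset is smaller.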

RaisingK said:

Some, maybe. I'll add those to the list of things to check, but given what happened with my twitter sweep, I don't expect much success there, either. And most of those posts are bad id anyway.

Ascii2d seems to have Nijie more comprehensively indexed than Twitter, at least, and it lists MD5 hashes to boot, so it should be more effective than it was with bad link Twitter posts.
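As a sketch of the matching step I have in mind (Python, not tested against the real site): given the MD5 of the file on Danbooru and a list of (MD5, URL) pairs gathered from search results, pick the result whose hash is identical. How those pairs would actually be collected from Ascii2D is not shown here, and the function names and the shape of the `candidates` list are assumptions of mine.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a file's raw bytes."""
    return hashlib.md5(data).hexdigest()


def find_md5_match(post_md5, candidates):
    """Return the source URL whose MD5 equals post_md5, or None.

    candidates is an iterable of (md5_hex, source_url) pairs; this
    shape is hypothetical and stands in for scraped search results.
    """
    wanted = post_md5.strip().lower()
    for candidate_md5, url in candidates:
        if candidate_md5.strip().lower() == wanted:
            return url
    return None
```

An exact hash match means the file is byte-for-byte identical, so those cases shouldn't need a human to compare the images. Anything without an exact match would still have to be checked by hand.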

Thanks. I've corrected all the remaining ones that I could find a source for through Saucenao. There are ~44 posts remaining, including three from Horne. I'll try to see if I can correct any of the others based on titles/commentary and clues I can get from the Wayback Machine.


punished_K said:

Should I change these to page URL instead?

Yes. Direct image URLs should only be used on supported sites which convert automatically to the page URL on Danbooru's post view (such as Pixiv). Otherwise, they're bad link.

What should I do if I heavily suspect a bad link to have a bad source? For example, post #2243782 - I checked SauceNAO and browsed through the artist's Tumblr accounts to no avail, and Ascii2D just shows a dead Twitter link with a different hash. Since the artwork is slightly NSFW, it could easily have been wiped from Tumblr entirely because of the porn ban, too.


Also, in general, it seems like it might be possible to retrieve at least some Tumblr posts nowadays - an example would be post #285677, where the previous source:

Actually contained a link above the image to the source Tumblr post. It seems not to work if the Tumblr post/blog was deleted, though (the field where the link would be is simply blank). This feature is quite new, and it definitely won't work for source:*data.tumblr* posts, but it should still make the task a little more feasible.

I wonder if it would be possible to somehow automate the task? There are around 5.6k posts for the search bad_link source:*tumblr* -third-party_source -source:*data.tumblr*, and while I don't expect most of them to be still active, getting proper sources for just a few percent of that would be a gain.
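As a starting point, here's a small Python sketch of the selection step only. It reproduces the search above as a local filter over post records, which could be handy if someone already has a dump of bad_link posts. I'm assuming each record is a dict with `source` and `tag_string` keys; the function name is made up, and checking whether the Tumblr page still resolves is not covered.

```python
from fnmatch import fnmatchcase


def matches_tumblr_search(post):
    """Mimic: bad_link source:*tumblr* -third-party_source -source:*data.tumblr*"""
    tags = set(post["tag_string"].split())
    source = post["source"].lower()
    return (
        "bad_link" in tags
        and "third-party_source" not in tags
        and fnmatchcase(source, "*tumblr*")
        and not fnmatchcase(source, "*data.tumblr*")
    )
```

Filtering a list is then just `[p for p in posts if matches_tumblr_search(p)]`.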


Additionally, I'm wondering what'd be more favorable in the case of Tumblr posts that *would* end up with a bad_id - a direct link to the image, or a reblog?
A direct link is near-guaranteed to be functional even if the post/account is deleted, assuming it does not break Tumblr's rules, though it will not point at the original post or artist anymore.
A reblog can still disappear if the reblogger changes username, deletes the reblog or their entire blog, but it would at least still display the OP's old @ - like in this post, along with commentary/other images in the post.


KagayakuShiningGate said:

What should I do if I heavily suspect a bad link to have a bad source?

I think what you mean is bad id, not bad source. The latter would suggest a source that's close to, but not exactly, the original (like linking to the artist's Tumblr rather than the post itself), whereas what you're describing suggests you're concerned that the source itself is now gone, which would fall under the former.

In a case like that, there's nothing we can do with the links we have on hand. By tagging a bad link post with bad id, the implication is that the direct image link itself is also now dead (and we've got sources like that), when the direct image link in post #2243782 still functions. If you want to try and find the original post, consider using methods such as the Wayback Machine if possible.
