Donmai

Image Sample Cleanup Project

BrokenEagle98 said:

I created a new tag, protected_link, for those posts that I'm unable to access to verify. The examples I've found so far include protected tweets.

This is a good measure. I've added some posts to the tag. There are some nice mass edits that could be done such as adding the tag to all posts sourced from an account made private.

I think we could also have tags for the following manually confirmed instances:

  • Deleted or suspended account
  • Individual deleted post (account still active)
    • deleted post (though this could be confused for Danbooru's deletion status)

All of these could be subsets of bad ID. It would be useful to know what makes something a bad ID.

So you are suggesting that there are two types of Bad ID:
1. The artist deleted the account completely -> deleted account
2. The artist only deleted that one post -> deleted link

The third is not bad id, but tangentially related:
3. Only visible to certain users (for example, those who follow the artist) -> protected image/link

Did I get that right? If yes, then +1. Run another script.

Provence said:

So you are suggesting that there are two types of Bad ID:
1. The artist deleted the account completely -> deleted account
2. The artist only deleted that one post -> deleted link

👍

The third is not bad id, but tangentially related:
3. Only visible to certain users (for example, those who follow the artist) -> protected image/link

Did I get that right? If yes, then +1. Run another script.

Yes. As for whether it is a type of "bad ID", I guess that is a minor philosophical question: if something is visible to a select few but not to normal users, is it a bad ID? I think so, since by default you cannot see it, and we may or may not get lucky and find someone who has been given access by the artist. So this would be a third type of bad ID, in my opinion.

And as a pedantic aesthetic note, I would say that deleted source should be the name for no. 2.

That is the follow-up question :P.
I mean, it probably doesn't help. The distinction is certainly there, but it feels constructed, because an artist deleting their account just means that we can edit the artist's wiki page. That has been done before, and I believe most of that work was done by @henmere ^^.

I feel that not being accessible is what makes something a bad link. There could be different causes for this, such as the account being deleted, the post being deleted, or the account or post being made private. If you aren't convinced by that, I must relent, because it seems straightforward and intuitive to me.

As for why have a deleted source tag, it just seems like it could be useful. If not then I guess it's not.

Well, I think that our bad id tag covers both instances.
If the artist made their profile private or deleted it, then one should update the artist's wiki page here on Danbooru. Our goal is still to make as few tags as possible, especially if concepts are covered by other things.

Just to give some information on what goes into determining a bad id, the two most common cases are:

  • The post source returns a 404
  • Every size of an image source returns a 404

Some instances that I've manually added include orphaned image links, i.e. the post was deleted but the image link is still good. This was done only for sources where it is impossible to determine the links for the other image sizes without the post link, such as Tinami.
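As a rough illustration of the two common cases above, here is a minimal Python sketch. It is not the actual script: the function name `looks_like_bad_id` and its parameters are hypothetical, and it assumes the HTTP status codes have already been collected by some other step.

```python
from typing import Iterable, Optional

# Hypothetical helper, not the actual script. The status codes are assumed
# to have been gathered elsewhere (for example by requesting each URL).
def looks_like_bad_id(post_status: Optional[int] = None,
                      image_statuses: Iterable[int] = ()) -> bool:
    """Return True if either of the two common bad-ID cases applies."""
    # Case 1: the post source itself returns 404.
    if post_status == 404:
        return True
    # Case 2: every known size of the image source returns 404.
    statuses = list(image_statuses)
    return len(statuses) > 0 and all(code == 404 for code in statuses)
```

Orphaned image links were added manually in the cases described above, so this sketch makes no attempt to special-case them.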

I created wiki pages for inactive account and protected link. Give them a look and change them as needed; I left room for revisions.

Also, regarding the protected link tag, I have no qualms against that being made a subset of bad id. I just assumed that there would be more argument against it, since it's not truly known what the state of that post is without being able to access it.

Luckily, it's easy enough to change at this point even without a script. Just add bad_twitter_id to all of the posts in the tag search source:https://twitter.com/ protected_link.
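For anyone who would rather script that edit, the tag-string side of it could look something like the Python sketch below. The helper name `add_tag` is hypothetical, and fetching the posts from the search and saving them back is left out entirely.

```python
# Hypothetical helper: adds a tag to a space-separated tag string.
# Fetching the posts and saving the result are not shown here.
def add_tag(tag_string: str, new_tag: str) -> str:
    """Return tag_string with new_tag appended, unless it is already present."""
    tags = tag_string.split()
    if new_tag not in tags:
        tags.append(new_tag)
    return " ".join(tags)

# Example with a made-up tag string:
print(add_tag("protected_link 1girl", "bad_twitter_id"))
```

Running it twice on the same string leaves the string unchanged, so re-running over the same search should be harmless.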

Provence said:

@BrokenEagle98
Please don't go out and tag something that is probably not even needed.
Which means waiting out the discussion.
One artist you have just edited has this info in their profile:
http://danbooru.donmai.us/artists/117400
It's written that the Twitter account is suspended.

I'd say that this is more a problem of the artist database than a tag problem. The artist database should be updated with the necessary information.

Well, the old links can still be useful for determining a pattern between an artist's accounts (whether said person is actually the artist or not, as often seems to happen when artists create separate R-18 accounts without linking back to their main one) and for retrieving the artist name from an old or perhaps dead link (mostly for curious users who find collections outside Danbooru that aren't kept updated and want to find the artist with said link).

Some CSS code that Nitrogen09 came up with (topic #9662/p7) made me think about running my scripts on a daily basis, so that any image samples get tagged earlier to help the moderators. Therefore, I've modified my scripts so that they can have ranges. For the daily run, I'll only have them investigate the previous 3 days.

During the test mockup, it already caught 2 image samples (image_sample age:..3d) and 75 MD5 mismatches (md5_mismatch age:..3d). Twitter seems to be the worst offender for changing images.

Also, I'll still do the complete run every month or so.
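To make the range idea concrete, here is a small Python sketch under stated assumptions. The names `age_query` and `md5_mismatch` are hypothetical and this is not the actual script; it only shows how a search limited to the last few days could be built, and how a stored MD5 could be compared against the bytes currently served at the source.

```python
import hashlib

# Hypothetical helpers, not the actual scripts.
def age_query(tag: str, days: int) -> str:
    """Build a tag search limited to posts from the last `days` days."""
    return f"{tag} age:..{days}d"

def md5_mismatch(stored_md5: str, source_bytes: bytes) -> bool:
    """Return True if the source file no longer hashes to the stored MD5."""
    current = hashlib.md5(source_bytes).hexdigest()
    return current != stored_md5.lower()

print(age_query("image_sample", 3))
```

A daily run would call the check only for posts matching the 3-day search, while the monthly run would drop the age restriction.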

Would be appreciated.
The load of approved samples should be minimized then.

Is a flag warranted in the case of parent:2537699? Notice how it's marked as an image sample, but the watermarks on the images are different... @BrokenEagle98 could your script have produced a false positive for DeviantArt revisions? -- I notice it's the ArtStation version that is flagged, sorry!

Anyway, should this image be flagged? I actually think the older watermark is less intrusive, since it lacks the uggo Patreon logo.

Okay, so ignore all that, because I misunderstood something. Revised posts are marked MD5 mismatch, as with post #2456560, which was revised to have the ugly watermark.
