Is there a way to detect if a *_(cosplay) tag is missing the corresponding cosplay tag?
*_(cosplay) -cosplay won't work, due to the way wildcards work, and searching every single one by hand makes things like topic #13313 pretty much impossible.
Posted under General
Is there a way to detect if a *_(cosplay) tag is missing the corresponding cosplay tag?
*_(cosplay) -cosplay won't work, due to the way wildcards work, and searching every single one by hand makes things like topic #13313 pretty much impossible.
It's possible using the BigQuery data dump from topic #12774. Not particularly easy, to be fair, but possible. Here's the query, it's effectively a *_(cosplay) -cosplay search:
BigQuery: https://bigquery.cloud.google.com/savedquery/657582419813:dbddc4af3565484c9773d076744061be
SELECT CONCAT("post #", STRING(id)), CONCAT("http://danbooru.donmai.us/posts/", STRING(id)) FROM [danbooru-data:danbooru.posts] WHERE id IN (SELECT id FROM [danbooru-data:danbooru.posts] WHERE REGEXP_MATCH(tags.name, r'.*_\(cosplay\)')) AND id NOT IN (SELECT id FROM [danbooru-data:danbooru.posts] WHERE tags.name = 'cosplay') AND TRUE
And here are the results: https://docs.google.com/spreadsheets/d/1UXnVGTQ6DUkMh5elzWaEf7luan3MxYXLGIHVb-EzJWw/edit?usp=sharing. Turns out there are a lot, 5300 posts. It will probably take a script to fix all these.
So there are ~6000 *_(cosplay) tags, but only ~150 with more than 50 posts. I suppose implicating those wouldn't be unmanageable. Although I'm not sure how much it would help, I think the long tail of *_(cosplay) tags with only a few posts are where most of the missing cosplay tags come from.
Just finished tagging all of the *_(cosplay) tags with cosplay.
I built my own mass updater a while ago, so things like this are pretty easy, plus it takes in the same format as the bulk_update_requests, i.e. it handles the imply, alias, or update format, although imply and alias are really just a shortcut for certain forms of update.
Forgot to make a note of it, but it turned out to be ~5000 posts that ended up getting edited. Since *_(cosplay) is such a special tagging (in that it's acknowledged that every *_(cosplay) should also have cosplay), should some kind of daily or weekly maintenance task be set up to specifically handle this tagging issue ...? I can't see developing a wildcard implication function to handle this solitary issue, yet the amount of untagged posts will just continue to pile up again.
Updated
That's a good idea, and it's not hard to do either. Submitted a patch (issue #2753) for it; it autotags *_(cosplay) in the same way that resolutions / file types are recognized and autotagged.
Here's the converse search, cosplay -*_(cosplay) (i.e. things tagged cosplay but missing a character_(cosplay) tag):
https://bigquery.cloud.google.com/savedquery/657582419813:e5f7943e727445fabef2a72d7260f1dc
https://docs.google.com/spreadsheets/d/1PNJxFCwdvC1L-6eiR0A6mSIJCPGPzMIcGA9IDMn9wto/edit?usp=sharing
SELECT CONCAT("post #", STRING(id)) AS post, CONCAT("http://danbooru.donmai.us/posts/", STRING(id)) AS url, GROUP_CONCAT_UNQUOTED(CASE WHEN tags.category = 4 THEN tags.name END, " ") AS tags FROM [danbooru-data:danbooru.posts] WHERE id NOT IN (SELECT id FROM [danbooru-data:danbooru.posts] WHERE REGEXP_MATCH(tags.name, r'.*_\(cosplay\)')) AND id IN (SELECT id FROM [danbooru-data:danbooru.posts] WHERE tags.name = 'cosplay') AND TRUE GROUP BY id, post, url ORDER BY id DESC
Yeesh, you just had to bump up the difficulty ... :p
Unfortunately, all of those will require manual editing. That is, unless someone can program an AI with pattern recognition capabilities equal to what humans have...
Just a quick question about issue #2753 though... does it only update upon upload, or does it also update on post editing as well...?
It works whenever a post is saved, just before implications are added. So effectively it is like a wildcard implication, except hardcoded instead of something more general. It will affect existing posts, but they will have to be force updated with a dummy edit to make them add the character tags.
Pull request #2753 went live today, solving this situation.
BrokenEagle98 said:
Pull request #2753 went live today, solving this situation.
How about older posts? Are those auto-retagged?