Donmai

[Prototype] Wiki Link Report

Posted under General

Intro

Basically, any time an alias is created or a wiki page is renamed, it can leave behind wiki links that need to be updated to the new location. I've thought about this for a while, and unfortunately they can only be updated manually since we're dealing with language: a simple find-and-replace could leave an awkward or incorrect sentence in place.

I've started cleaning up these links, and I've already accumulated >2000 wiki edits after going through only a few years of aliases/renames (Wiki page report). This shows how many bad links can build up over time.

Data

Period: 2017-03-28T00:00:00.000Z - 2017-03-29T00:00:00.000Z

Orig Name | New Name | Alias/Rename | Wikis to update
bunny ears gesture | bunny ears prank | Alias | (v)
nanashi (shirogane usagi) (0) | nanashi (ganesagi) (138) | Rename | (list_of_original_characters)

Instructions

The following lists the particulars of each Change type along with any caveats.

Alias

These are detected when tag aliases get approved.

  • Orig Name: Alias antecedent
  • New Name: Alias consequent

100% straight-up. There are no ambiguities with this one. Every wiki link or pool name in Pages to Update should have the Orig Name replaced with the New Name.
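Building the Pages to Update column means scanning each wiki body for links that still point at the Orig Name. Below is a minimal Python sketch. It assumes DText's `[[target]]` / `[[target|text]]` wiki-link syntax and `{{...}}` tag-search syntax, and it assumes names compare case-insensitively with spaces and underscores treated as equal. `find_stale_links` and `normalize` are hypothetical helpers. The sketch only reports the links; the actual rewording is still left to a person.

```python
import re

# Assumed DText syntax: [[target]] or [[target|display text]] for wiki links,
# and {{tag1 tag2 ...}} for tag-search links.
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")
TAG_SEARCH = re.compile(r"\{\{([^}]+)\}\}")


def normalize(name: str) -> str:
    """Assumed normalization: case-insensitive, spaces equivalent to underscores."""
    return name.strip().lower().replace(" ", "_")


def find_stale_links(body: str, orig_name: str) -> list[str]:
    """Return the link markup in `body` that still points at `orig_name`."""
    target = normalize(orig_name)
    hits = []
    for m in WIKI_LINK.finditer(body):
        if normalize(m.group(1)) == target:
            hits.append(m.group(0))
    for m in TAG_SEARCH.finditer(body):
        # A tag search can hold several tags; flag it if any of them matches.
        if target in (normalize(tag) for tag in m.group(1).split()):
            hits.append(m.group(0))
    return hits
```

Wiki links are reported first, then tag searches, each in the order they appear in the body.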

Rename

These are detected when wiki pages get renamed.

  • Orig Name: Original wiki page name
    • The number in parentheses indicates how many posts are tagged with the original wiki name.
  • New Name: New wiki page name
    • The number in parentheses indicates how many posts are tagged with the new wiki name.

There are some ambiguities with this one. If the change is legitimate, then it should be treated like an alias. However, some users recycle wiki pages, where the original name has nothing to do with the new name. Additionally, sometimes the posts tagged with the Orig Name haven't been switched over to the New Name, which can be seen from the numbers in parentheses. These cases will be indicated, even if there are no Pages to Update.
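The partial-rename case can be flagged mechanically from the two post counts. A minimal sketch follows. `rename_note` is a hypothetical helper, and the counts are assumed to be the same numbers shown in parentheses. Recycled pages cannot be detected this way; a person still has to compare the two names.

```python
from typing import Optional


def rename_note(orig_name: str, orig_count: int, new_name: str, new_count: int) -> Optional[str]:
    """Return a review note for a partial rename, or None if the posts were moved."""
    # Posts still on the original tag mean the wiki moved but the tag did not.
    if orig_count > 0:
        return (
            f"Partial rename detected! {orig_count} post(s) still tagged {orig_name}, "
            f"{new_count} tagged {new_name}. "
            "Update the posts and wikis with the new tag or revert the wiki name."
        )
    return None
```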

Replace

These are detected when one tag gets completely replaced by another. This is done by keeping a daily log of all active tags and comparing each day with the previous one. All tags that were active previously but not currently are candidates, and all tags added at the same time as the candidate tags are potential names for the replacement tag.

  • Orig Name: Removed tag
  • New Name: List of added tags
    • The number in parentheses indicates how many post versions the removed tag has in common with the added tag.

There are a lot more ambiguities with this one. Legitimate replacements are pretty straightforward, but depending on all the other tags the tag gardener changed at the same time, there might be some noise in the New Name column that needs to be sorted through. Additionally, a tag removal rather than a replacement might produce no legitimate new names, usually because it is an ambiguous tag that is no longer used. For these, a little detective work is necessary to determine the correct new name in the Pages to Update, or potentially to remove those entries altogether.
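One way to triage a Replace row is sketched below. It assumes each row is available as a mapping from added tag to its shared post-version count. Rows with no candidates are flagged as plain removals; the rest are sorted so the likeliest replacement comes first. `triage_replace` is a hypothetical name. Ranking by count is only an assumption about what separates the real replacement from gardening noise, not a guarantee.

```python
def triage_replace(removed_tag: str, common_counts: dict[str, int]) -> str:
    """Summarize one Replace row for manual review (hypothetical helper).

    common_counts maps each added tag to the number of post versions it shares
    with the removed tag, i.e. the number shown in the New Name column.
    """
    if not common_counts:
        # Plain removal: there is no replacement name to link to, so the
        # affected wiki entries need investigation or removal.
        return f"{removed_tag}: removal, investigate Pages to Update"
    # Most shared post versions first; ties broken alphabetically for stable output.
    ranked = sorted(common_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return f"{removed_tag} -> " + ", ".join(f"{tag} ({n})" for tag, n in ranked)
```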

Delete

These are detected when a wiki page gets deleted.

  • Orig Name: The deleted wiki page's name
  • New Name: Not used

There are some ambiguities with this one, but usually it means removing all instances of Orig Name. Additionally, there might be some posts still tagged with the old name that need to be cleaned up.

Final

The goal would be to eventually hand this over to be implemented in Reportbooru/DanbooruBot, so that the above data and the cleanup would not depend on one person. It currently covers only one day of aliases/renames, but there may not be enough changes for that, so it could be changed to a week.

Unlike with my other report (topic #13112), I plan to create a new forum post for each new set of data. If it's desired, I could also mark out the wikis that have already been handled.

Edit:
  • (2017-08-31) Added an instructions section so that it's understood what all of the different columns and numbers mean.

Updated

evazion said:

Everything up to here is handled.

Pool names are another thing that could be checked. If an artist name changes, there may be pools that need to be renamed too.

Ah, good point... I hadn't thought of that. That could be a little more difficult, though, because of the nonstandard naming scheme for pools, mostly around artist qualifiers. When an artist has one, it sometimes gets added to the pool qualifier and sometimes it doesn't.

At the very least though, I could search the pool descriptions for wiki links and use that to hopefully find most of the pools that need their name changed.
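That pool search could be sketched in Python as follows. The sketch assumes pools are available as `(id, name, description)` tuples, that descriptions use DText `[[...]]` links, and that qualifiers are written as a trailing `_(...)`. Description links are treated as strong hits. Name matches, with or without the qualifier, are kept separate because a bare name such as `nanashi` will also match unrelated pools. `pools_to_check` is a hypothetical helper.

```python
import re

# Assumed DText wiki-link syntax: [[target]] or [[target|display text]].
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")
# Assumed qualifier format: a trailing "_(...)", e.g. "nanashi_(ganesagi)".
QUALIFIER = re.compile(r"_\([^)]*\)$")


def normalize(name: str) -> str:
    """Assumed normalization: case-insensitive, spaces equivalent to underscores."""
    return name.strip().lower().replace(" ", "_")


def pools_to_check(pools, old_name):
    """pools: iterable of (pool_id, pool_name, description) tuples.

    Returns (linked, name_match): pools whose description links to old_name,
    and pools whose name contains old_name with or without its qualifier.
    """
    old = normalize(old_name)
    variants = {old, QUALIFIER.sub("", old)}
    linked, name_match = [], []
    for pool_id, pool_name, description in pools:
        if any(normalize(m.group(1)) == old for m in WIKI_LINK.finditer(description)):
            linked.append(pool_id)
        elif any(v in normalize(pool_name) for v in variants):
            name_match.append(pool_id)
    return linked, name_match
```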

Daily Report (2017-04-06)

Period: 2017-04-05T00:00:00.000Z - 2017-04-06T00:00:00.000Z

Orig Name | New Name | Alias/Rename | Wikis to update
hojo (0) | professor_hojo (0) | Fixed | N/A
houren yabusame (60) | houlen yabusame (0) | Rename | (shion_(len'en)), (brilliant_pagoda_or_haze_castle), (kuzu_suzumi), (senri_tsurubami), (reactivate_majestical_imperial), (earthen_miraculous_sword), (earthen_miraculous_sword), (evanescent_existence), (evanescent_existence), (evanescent_existence), (enraku_tsubakura), (len'en)

Updated

For today's report, I noticed that someone had renamed a wiki without moving the tags. To flag this, I now add the post counts for the antecedent and consequent tags on renames. The fix is to either revert the wiki name to the original, or update all of the tags and wikis to the new name.

Would it help to add instructions when the above occurs though?

Example:

Partial rename detected!
Update the posts and wikis with the new tag or revert the wiki name.

Thoughts?

I think it would be fairly easy to add a validation server-side that prevents you from renaming a wiki with a non-empty tag. Possibly it could be optional like the secondary validations for bulk update requests are. I will make an issue for it.
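The validation could look something like the following Python sketch. It is not Danbooru's code (Danbooru is a Rails app). `validate_wiki_rename`, `RenameError`, and the `skip_secondary_validations` flag are hypothetical names modeled on the optional check described above.

```python
class RenameError(Exception):
    """Raised when a wiki rename would leave posts on the old tag."""


def validate_wiki_rename(old_title, new_title, post_counts, skip_secondary_validations=False):
    """Refuse to rename a wiki whose tag still has posts (hypothetical check).

    post_counts maps tag name -> post count.
    """
    # Optional check: the caller can bypass it explicitly.
    if skip_secondary_validations:
        return
    remaining = post_counts.get(old_title, 0)
    if remaining > 0:
        raise RenameError(
            f"{old_title} still has {remaining} posts; "
            f"retag them to {new_title} before renaming the wiki"
        )
```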

EDIT: issue #2964.

Updated

Daily Report (2017-04-10)

Period: 2017-04-09T00:00:00.000Z - 2017-04-10T00:00:00.000Z

Orig Name | New Name | Alias/Rename | Wikis to update
kaban | kaban (kemono friends) | Alias | (lucky_beast_(kemono_friends)), (serval_(kemono_friends)), (uchida_aya), W(pool 12141)
shabushabu yarou (0) | piroya (shabushabu) (8) | Rename | (shabushabu_yaro), (piroya_(shabushabu))
detective conan | N/A | Delete | R(pool 5256), R(pool 5255)
animal crossing | N/A | Delete | (freya)

For pools: R = pool rename, W = wiki rename

Updated

With the above report, I've started to incorporate the tracking of tag replacements, i.e. replacing one tag with another. For example, from the above report, uss_indiana_(bb-58) was replaced with uss_indiana_(bb-50).

I've also started tracking tag searches (i.e. {{tag_search}}) in addition to wiki links, since these will also require renaming if a tag gets changed.

Note: Only tag replacements that affect a wiki link or tag search are shown.

Tag Tracking Details

Since tags aren't versioned, this is being tracked by storing the tag counts at the end of each day (GMT). Tags that have become empty (tag_count == 0) since the previous day are "antecedent" candidates, and tags that are new or repopulated (tag_count > 0) since the previous day are "consequent" candidates. Further refining is done by checking each post version where an "antecedent" was removed to see if any of the "consequents" was added at the same time.
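The two steps described here can be sketched in Python as follows. The sketch assumes the end-of-day tag counts are stored as `{tag: post_count}` dicts, and that each post version is reduced to its sets of removed and added tags. `find_replacements` is a hypothetical name. Each antecedent maps to a counter of consequents, where the count is the number of post versions that removed the antecedent and added that consequent in the same edit.

```python
from collections import Counter


def find_replacements(yesterday, today, post_versions):
    """yesterday, today: dict tag -> post count at the end of each day.
    post_versions: iterable of (removed_tags, added_tags) pairs for the day.
    Returns {antecedent: Counter({consequent: shared post versions})}.
    """
    # Antecedent candidates: tags that had posts yesterday but are empty today.
    antecedents = {t for t, n in yesterday.items() if n > 0 and today.get(t, 0) == 0}
    # Consequent candidates: tags that are new or repopulated today.
    consequents = {t for t, n in today.items() if n > 0 and yesterday.get(t, 0) == 0}
    pairs = {a: Counter() for a in antecedents}
    for removed, added in post_versions:
        # A post version supports a pair if it removed the antecedent and
        # added the consequent in the same edit.
        for a in antecedents.intersection(removed):
            pairs[a].update(consequents.intersection(added))
    return pairs
```

Tags that merely gained posts (like a general tag added in the same edit) are filtered out by the consequent snapshot check, which trims some of the noise but not all of it.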

Admittedly there can still be hits and misses with the above, in addition to multiple "consequent" candidates for a single "antecedent", but I'm hoping to refine the process as I go along.

Daily Report (2017-04-16)

Period: 2017-04-15T00:00:00.000Z - 2017-04-16T00:00:00.000Z

Orig Name | New Name | Change | Pages to update
gesu02 (0) | suzutaka (ringo kakigoori) (10) | Rename | (suzutaka_(ringo_kakigoori))
aisawa (ais0511) (0) | aisawa natsu (21) | Rename | (aisawa_natsu)
unajuu | unajuu_(set_mk) (20) | Replace | (juubako)
cici_(monster_musume) | shiishii (4) | Replace | (monster_musume_no_iru_nichijou)
mizuoka_(curry) | curry_fuumi (11) | Replace | (curry_fuumi)
tachi | tachi_(weapon) (14) | Replace | (tachi_(weapon)), (tachiuo), (shokudaikiri_mitsutada)

For wikis: ( ) = wiki link, { } = tag search
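For reference, the notation in the Pages to update column can be produced with a small formatter. This Python sketch assumes each entry is a `(kind, target)` pair. The kind labels and `format_pages` are hypothetical; the output markers follow the legends used in this thread (( ) and { } for wikis, R and W for pools).

```python
# Hypothetical entry kinds mapped to the report's markers.
FORMATS = {
    "wiki_link": "({})",            # ( ) = wiki link
    "tag_search": "{{{}}}",         # { } = tag search (braces escaped for str.format)
    "pool_rename": "R(pool {})",    # R = pool rename
    "pool_wiki_rename": "W(pool {})",  # W = wiki rename
}


def format_pages(entries):
    """Join (kind, target) pairs into one Pages to update cell.

    Raises KeyError for an unknown kind rather than guessing a marker.
    """
    return ", ".join(FORMATS[kind].format(target) for kind, target in entries)
```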

Updated
