
[Prototype] User Report Ver 6.3


Latest Update

  • (2021-11-03)
    • Updated data for Oct 2021

Version History


Ver 6.2 -> Ver 6.3

  • User reports are now CSV/JSON only.

Ver 6.1 -> Ver 6.2

  • Updated post appeals
    • Added reason/no reason
    • Fixed resolved not being recorded correctly

Ver 6.0 -> Ver 6.1

  • Added keeper report

Ver 5.4 -> Ver 6.0

  • Completely reworked most of the underlying code
    • Statistics are now stored in discrete day units
    • This will facilitate more frequent updates to the report (like weekly)
    • Greater flexibility in showing periods other than 30 days
  • Removed the comment score and favcount tables
  • Added table showing top 10 tags for a user's uploads
  • Adjusted Google-based analytics for issue #3336
    • Only 1% of page hits are being registered
  • Changed the format to percentage-based on comment post table as a trial
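As a rough illustration of why discrete day units make other period lengths easy, here is a minimal sketch. The `daily_counts` layout (a dict keyed by date) and the `period_total` name are assumptions for illustration, not the report's actual storage format.

```python
from datetime import date, timedelta

def period_total(daily_counts, end_day, days=30):
    """Sum per-day counts over the `days` days ending on `end_day` (inclusive).

    daily_counts is a hypothetical {date: count} mapping for one user
    and one statistic.
    """
    start = end_day - timedelta(days=days - 1)
    return sum(n for day, n in daily_counts.items() if start <= day <= end_day)

# Hypothetical data: one edit per day throughout October 2021.
october = {date(2021, 10, 1) + timedelta(days=i): 1 for i in range(31)}
monthly = period_total(october, date(2021, 10, 31), days=30)
weekly = period_total(october, date(2021, 10, 31), days=7)
```

With day-level storage, a weekly report is the same call with `days=7`.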

Ver 5.3 -> Ver 5.4

  • Added post replacements table

Ver 5.2 -> Ver 5.3

  • Added additional columns to pool data
    • Active, Description, Name
  • Added Comment Rankings section
    • Shows the top 100 commented posts
    • Shows the top 10 tags for each category
      • Cutoff for this and all of the following was 100 comments
      • Order for this and all of the following was average comments per post
    • Shows comments by post ratings
    • Shows comments by post score
    • Shows comments by favorite count

Ver 5.1 -> Ver 5.2

  • Added forum topic analytic data

Ver 5.0 -> Ver 5.1

  • Added pool, user, and post analytic data

Ver 4.9 -> Ver 5.0

  • Added delta columns for uploads table <evazion: forum #118890>
    • Parent delta is the number of posts parented after upload
    • Rating delta is the number of posts with rating changes after upload
    • Tag delta is the number of tags added/removed after upload
      • This does not include metatags or request tags
  • Added score and favcount metrics to uploads table
  • Changed how upload tables are split
    • Instead of Builder+/Platinum-, it is now Contributor/Member
    • Removed queue bypass column from Member Upload table
  • Added additional links to the uploads table
    • Tag categories redirect to the related tags page, which shows the top 25 upload tags for that user over the previous month
  • Combined delete and undelete columns for notes and wiki pages
  • Added user feedback table
  • Truncate user names > 15 characters
  • Added prior column to request and translate tables
  • Removed '+' or '-' from all column names
    • Positive or adds are normal
    • Negative or removes are surrounded with parentheses '('')'
  • Added page analytics for wiki and artist
  • Removed intro and implementation details sections

Ver 4.8 -> Ver 4.9

  • Added description changes to Pools report

Ver 4.7 -> Ver 4.8

  • Added two new categories to Pools report <provence: forum #122362>
    • Obsolete adds and obsolete removes
    • These are adds or removes which were undone by another user at a later time
  • Added a new Supply vs Demand table

Ver 4.6 -> Ver 4.7

  • Added two new table types
    • Translator Tags table which tracks all of the adds/subtracts of certain translator tags. <Provence: From a Dmail>
    • Requests tables which track all of the adds/subtracts of non-translator *_requests tags.
      • Adds table is ordered by total tags added, showing those who are producing those requests.
      • Removes table is ordered by total tags removed, showing those who are fulfilling those requests.

Ver 4.5 -> Ver 4.6

  • Added Rank Difference for most tables
    • Difference between current month and prior month
    • Nonactivity in the prior month gets displayed as "None"
    • For the bottom/top upload tables:
      • Not making the 100 upload cutoff the prior month gets displayed as "Null"
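The rank difference rules above could be sketched roughly as follows. The function name, its parameters, and the sign convention (positive means the user moved up) are assumptions for illustration, not taken from the report code.

```python
def rank_difference(current_rank, prior_rank, prior_active=True):
    """Return the rank change from the prior month to the current month.

    prior_active=False means the user had no activity in the prior month.
    prior_rank=None with prior_active=True means the user was active but
    missed the cutoff (only relevant to the top/bottom upload tables).
    """
    if not prior_active:
        return "None"
    if prior_rank is None:
        return "Null"
    # Assumed convention: positive result means the user climbed the table.
    return prior_rank - current_rank
```

For example, moving from 5th to 3rd place gives `rank_difference(3, 5)`, which is `2` under this convention.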

Ver 4.4 -> Ver 4.5

Ver 4.3 -> Ver 4.4

  • Removed uploads from post table since uploads now have their own table
  • Added granularity to Add Tags and Remove Tags by splitting them up into GenTags, Chartags, CopyTags, ArtTags, EmptyTags
    • Empty Tags were the same as Tag Errors in the Uploads table (Now renamed in the Upload table as well)

Ver 4.2 -> Ver 4.3

  • Added top/bottom 25 upload tagger tables <kuuderes shadow: forum #118765>
  • Added count for positive/negative comments in comment table <Provence: forum #118825>

Ver 4.1 -> Ver 4.2

  • Correlated artist wiki page changes made through the artist interface <chodorov: forum #118651>
    • Added Wiki column to Artist table; includes Wiki Body edits through artist interface
    • Disregarded all Wiki edits for the Wiki table made through the artist interface

Ver 4.0 -> Ver 4.1

  • Removed alias hits from Tag Errors in Uploads table <provence: forum #118642>
  • Added column Removes in Uploads table <kuuderes shadow: forum #118660>
    • It disregards the following tag removes though: tagme, commentary, check_commentary, translated, check_translation, partially_translated, check_my_note, check_pixiv_source, *_request
  • Separated Builder+ & Platinum- Uploads table; lower cutoff for Platinum- <provence: forum #118661>
  • Added data duration to table notes <Jarlath: forum #118636>

Note: Removes were kept separate from Tag Errors since some of the changes were subjective, e.g. hair color. Also, Tag Errors are currently mutually exclusive from the other tag categories, whereas Removes are not.
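A minimal sketch of how the disregarded tag removes listed for the Removes column might be filtered out. The `counted_removes` helper is hypothetical; only the list of ignored tags comes from the changelog above.

```python
from fnmatch import fnmatchcase

# Tags whose removal is disregarded, as listed for the Removes column.
IGNORED_REMOVES = [
    "tagme", "commentary", "check_commentary", "translated",
    "check_translation", "partially_translated", "check_my_note",
    "check_pixiv_source", "*_request",
]

def counted_removes(removed_tags):
    """Keep only the removed tags that should count toward Removes."""
    return [
        tag for tag in removed_tags
        if not any(fnmatchcase(tag, pattern) for pattern in IGNORED_REMOVES)
    ]
```

Here `*_request` is handled as a shell-style wildcard, so `translation_request` and `artist_request` are both ignored.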

Ver 3.1 -> Ver 4.0

  • Added several new categories, to include: uploads, forum topics, tag implications, tag aliases, bulk update requests and post appeals. Added additional data to forum post and comment tables.

Ver 3.1 (forum #118437)

Ver 2.0 (forum #118420)

Ver 1.0 (forum #118371 and forum #118382)

Data

(gone due to forum post length limits... no ETA on return)

Raw Data

Google Drive (CSV)

Google Drive (JSON)

Code

https://github.com/BrokenEagle/PythonScripts


tapnek said:

Toks mentioned something about a similar tool being available to mods and above. Can we have that compared to your implementation?

I planned on submitting a GitHub issue this weekend. I waited because I wanted to have something solid to show first. My idea would be that they could make available what they already have, just updated once a day instead of on demand like Mods/Admins have. Then, they could make any incremental changes as needed.

Nitrogen09 said:

@BrokenEagle98
It seems that you forgot to include the "Note changes" category.

Thanks... things tend to get missed when you're reviewing your own work :p

I updated it just now.

Jarlath said:

Apparently, I'm a chatty bastard. And one who sticks with safe content.

This only goes back 60-90 days, right?

30 days, actually. I guess it might be good to include the number of days the report covers somewhere. I'll save that for tomorrow though. Danbooru practically dies whenever I try to upload or update that first post (it's about 55KB), and I'm tired of fighting with it... :P


tapnek said:

Toks mentioned something about a similar tool being available to mods and above. Can we have that compared to your implementation?

Taken on 25th August at 4:22 CEST and I think the data on these reaches back to midnight of 25th of July also in CEST. (Underline means no upload limit, ✓ means approval rights, which have no meaning in presented data)
Part 1
Part 2
Below these statistics, the dashboard also shows comments that are under the score 0 threshold and have at least 3 votes. In a column next to these, there are also appeals (only the ones on deleted posts are listed), the 10 most recent user records, and the 10 most recent mod actions.
I'll also add that it defaults to a minimum date of 2 days ago and to showing only base users, but it's flexible enough. The dashboard also takes longer to load than any other page, but that's likely because it loads data from several different routes, given the variety.

Question about the error category by uploads:
Does that also cover tags that are empty due to an alias request?
For example, say I like the pearl_bracelet tag and use it, but there is currently a BUR that requests "alias pearl_bracelet -> beads_bracelet". Does that also cover this?

Jarlath said:

Apparently, I'm a chatty bastard. And one who sticks with safe content.

Nah. You're not the only one. Me too. It's not that I hate pervy stuff, it's just that I'm not so fond of that kind of stuff.

And what does that make me? Well, lol. Someone needs to upload those things anyway :< :P.

But I consider it very interesting that nearly no user with Platinum- appears on top at any of the lists (except Claverhouse in the Pools and multiple users in the Appeal list). So I guess that nearly all users who deserve a promotion are promoted.

Hm. Wasn't the original idea that those reports exist to show users ready for promotion - that is, below Builder? Contributors, builders and admins are going to dominate the ratings simply because they have more tools available to do so. If reports are to be integrated, they'll have to have some parameters after all - namely, the max user level displayed.

Also, thinking about it further, would this report make more sense if updated daily and discarded (as currently suggested), or monthly and stored? At a glance, second way seems to be more useful and fun to follow. Not only the change history can be seen this way (think of all the graphs that could be built with that), but data could also be aggregated over a few months if necessary, while placing relatively little load on the server. Maybe even go further and store weekly data, automatically aggregating 4 last weeks by default.

Before ideas skyrocket further though, we'd better ask @albert - would it be possible to store weekly aggregated data for user activity on reportbooru, and then use it for reports mentioned here? I'm pretty sure reportbooru was created exactly for this kind of stuff, but I'm still not sure what it's capable of. If we're going to join data over few weeks, data should be stored entirely without cutoff counts, so it might end up as quite a lot of rows - but probably not that much, since most users are lurkers anyway, so I doubt it ever goes over 10MB total. @BrokenEagle98, you have some source data to work with, correct? Can you estimate how many rows those reports would have if done weekly and with no cutoff? Last 4 weeks, if you can, it should be illustrative enough.

I thought this whole thing was done on a monthly basis. I guess it fetches data over a month (i.e. 30 days).
And of course this should be stored, since one can then see where a user was active and better understand why they were promoted. At least that is how I wanted it in the first place.
About the first paragraph: Yeah, there are very few Platinum- users there. The idea was to 1. give users an easy way to create feedback for Builder/Platinum- users and 2. help Platinum- users get promoted.
So I guess two separate tables would make more sense, but if we put everything in one big list, I don't see much difference either.

The main things that make a big difference are:
- admins for things where admin privileges make a difference (tag implication/aliasing, post changes)
- whether or not they have unlimited upload permission for the uploads - where the top 3 places are all but impossible for those who don't have unlimited uploads, and I'd honestly say Rignak's upload tally is more impressive than even Provence's.

Incidentally, I checked Provence's record for the month before gaining unlimited uploads, and the figures were:
365 uploads
6 deleted

Impressive deletion ratio, but would be at the lower end of those rankings. And that's for the most prolific uploader on the site.

While whether or not someone has builder status is useful information for promotion purposes, it doesn't make much of a difference to the tallies. Or at least I haven't come across ways in which it does.

kuuderes_shadow said:

The main things that make a big difference are:
- admins for things where admin privileges make a difference (tag implication/aliasing, post changes)
- whether or not they have unlimited upload permission for the uploads - where the top 3 places are all but impossible for those who don't have unlimited uploads, and I'd honestly say Rignak's upload tally is more impressive than even Provence's.

Incidentally, I checked Provence's record for the month before gaining unlimited uploads, and the figures were:
365 uploads
6 deleted

Impressive deletion ratio, but would be at the lower end of those rankings. And that's for the most prolific uploader on the site.

That is true, yes. I might report them again to a moderator and discuss whether they could get promoted to Builder level. The big deletion rate is a big obstacle, though. So there is a chance of success, but it is also very uncertain...

But I get what you mean: Lower the rate for Platinum- users in every category. Well, for uploads only, there is already a separate list for Platinum- users. But for every other category....why not^^.


Type-kun said:

@BrokenEagle98, you have some source data to work with, correct? Can you estimate how many rows those reports would have if done weekly and with no cutoff? Last 4 weeks, if you can, it should be illustrative enough.

I gathered 30 days' worth of data for each of the above reports. The following is the number of rows for each category:

posts/uploads: 2213 (it was easier handling these two together for the data gathering stage; they are separated in the data parsing stage)
notes: 524
artist commentary: 320
pools: 558
artist: 184
wiki page: 223
forum topic: 54
forum post: 151
tag implication: 15
tag alias: 13
bulk update request: 22
post appeal: 69
comment: 1821
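For a quick total, the row counts above can be summed. The dictionary keys simply mirror the category names in the list; nothing here comes from the actual report code.

```python
# Row counts per category over the 30-day period, copied from the list above.
rows_30_days = {
    "posts/uploads": 2213,
    "notes": 524,
    "artist commentary": 320,
    "pools": 558,
    "artist": 184,
    "wiki page": 223,
    "forum topic": 54,
    "forum post": 151,
    "tag implication": 15,
    "tag alias": 13,
    "bulk update request": 22,
    "post appeal": 69,
    "comment": 1821,
}

total_rows = sum(rows_30_days.values())
print(total_rows)  # 6167
```

So the 30-day reports come to 6167 rows in total, with posts/uploads and comments making up roughly two thirds of that.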

Also, for issue #2640, there is still no way to tell from the API whether someone has unlimited uploads/approval permissions. I made a comment of it on that issue, but there has still been no response. I'd rather not have to write or use an HTML parser to find that information out if at all possible.

Provence said:

Question about the error category by uploads:
Does that also cover tags that are empty due to an alias request.
For example if I like using the pearl_bracelet tag and use it. But currently there is a BUR that request "alias pearl_bracelet -> beads_bracelet". Does that also cover this?

If the alias was already in effect at the time you tagged the image, then no. If an alias went into effect after you tagged the image, then yes.

I hadn't thought of that case, but I can work on removing those. I just need to gather all of the aliases for that 30 day period and compare and contrast them to any "Tag Errors" that pop up...
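A minimal sketch of that comparison, assuming each tag error is stored with the time it was added and each alias with the time it went into effect. The data shapes, the function name, and the dates are all hypothetical.

```python
from datetime import datetime

def drop_aliased_errors(tag_errors, alias_times):
    """Drop "Tag Errors" that are only empty because of a later alias.

    tag_errors:  list of (tag, tagged_at) pairs
    alias_times: dict mapping an aliased-away tag to when the alias took effect
    """
    kept = []
    for tag, tagged_at in tag_errors:
        alias_time = alias_times.get(tag)
        if alias_time is not None and alias_time > tagged_at:
            continue  # tag was valid when used; the alias emptied it afterwards
        kept.append((tag, tagged_at))
    return kept

# Hypothetical example using the pearl_bracelet case from the question.
errors = [
    ("pearl_bracelet", datetime(2016, 8, 10)),
    ("serious_face", datetime(2016, 8, 12)),
]
aliases = {"pearl_bracelet": datetime(2016, 8, 20)}
remaining = drop_aliased_errors(errors, aliases)
```

In this example only `serious_face` stays in the list, since `pearl_bracelet` was aliased after it was used.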

chodorov said:

I'd like to point out editing the body text for an artist entry counts as a wiki edit. I think this is the only reason I made the ranking on wiki edits since most of mine are related to artists.

They're technically wiki edits, but should those be attributed as artist edits then...?


I've noticed that the removal of copyright/translation/character/commentary requests is not included in the tag error column. This is obviously as it should be, but it might be helpful to have a list of things that are left out?

I also noted that it doesn't seem to have picked out an incorrect character tag here: http://danbooru.donmai.us/post_versions?search%5Bpost_id%5D=2462723
Or is a day or so before the update too late for it to count?


kuuderes_shadow said:

I've noticed that the removal of copyright/translation/character/commentary requests is not included in the tag error column. This is obviously as it should be...

I also noted that it doesn't seem to have picked out an incorrect character tag here: http://danbooru.donmai.us/post_versions?search%5Bpost_id%5D=2462723
Or is a day or so before the update too late for it to count?

Tag errors as displayed above aren't all the tag errors per se, but all of the tags that were added but now have a count of zero, meaning they are no longer populated. This can happen with misspellings, or if the wrong tag is used, e.g. when I was testing this out, I saw the addition of "serious_face" being replaced by "serious".
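In other words, the check is roughly the following. This is a sketch only: the `current_counts` lookup and the example counts are made up, and treating a tag missing from the lookup as empty is an assumption.

```python
def empty_tags(added_tags, current_counts):
    """Return the added tags whose current post count is zero.

    current_counts is a hypothetical {tag: post_count} lookup; a tag that
    is missing from it is treated as having a count of zero.
    """
    return sorted(
        tag for tag in set(added_tags)
        if current_counts.get(tag, 0) == 0
    )

# Made-up counts for illustration.
counts = {"serious_face": 0, "serious": 1200, "1girl": 3000000}
errors = empty_tags(["serious_face", "serious", "1girl"], counts)
```

Here `errors` contains only `serious_face`, the tag that was later replaced by `serious`.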

It doesn't currently count the tagging of the wrong character, or copyright, or artist or anything else like that. I suppose the number of tags removed between the first version of the post and the current version of the post could be counted as a tag error. I'll need to go through the first hundred or so post versions and see if that holds true.
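The idea of counting tags removed between the first and current versions amounts to a set difference. A minimal sketch, assuming each version's tags are available as a space-separated string (the function name and the example tags are made up):

```python
def removed_since_upload(first_version_tags, current_tags):
    """Return tags present in the first post version but not in the current one."""
    first = set(first_version_tags.split())
    current = set(current_tags.split())
    return sorted(first - current)

# Hypothetical post: the uploader's character tag was later corrected.
removed = removed_since_upload(
    "1girl wrong_character smile",
    "1girl right_character smile long_hair",
)
```

This gives `['wrong_character']`, though as noted it would also pick up subjective removals, so it would need checking against real post versions.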

kuuderes_shadow said:

...but it might be helpful to have a list of things that are left out?

Do you mean posts that don't have the translation_request, commentary_request, artist_request, and so forth added by the original uploader?

Edit:

Forgot to mention, but Tag Errors also include unicode tags. Those shouldn't exist, since they're next to impossible for most people to type, and the ones I did find I replaced with a non-unicode version.
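One simple way to spot such tags is to flag anything that is not pure ASCII. This is just a sketch; the thread does not say how the report actually detects them.

```python
def is_unicode_tag(tag):
    """Return True if the tag contains any non-ASCII character."""
    return not tag.isascii()

# Made-up tags for illustration.
flagged = [tag for tag in ["caf\u00e9", "cafe", "long_hair"] if is_unicode_tag(tag)]
```

Only the accented tag ends up in `flagged`.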


BrokenEagle98 said:

I gathered 30 days' worth of data for each of the above reports. The following is the number of rows for each category:

I actually wanted to see a weekly count distribution - a table for week 1, week 2 etc. That said, we can safely assume that there'll be about 6000 rows per week. That's roughly 312k per year (6000 × 52), not a small amount, but not that large either. Given its nature, it would be a well-balanced tree if indexed by date+user and should work fast.
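To make the estimate and the date+user index concrete, here is a small sketch using SQLite as a stand-in. The table name, column names, and sample rows are all assumptions; nothing here reflects how reportbooru actually stores data.

```python
import sqlite3

ROWS_PER_WEEK = 6000                 # the assumed weekly row count
rows_per_year = ROWS_PER_WEEK * 52   # 312,000 rows

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_activity (
        week_start TEXT    NOT NULL,  -- ISO date of the first day of the week
        user_id    INTEGER NOT NULL,
        category   TEXT    NOT NULL,  -- e.g. 'uploads', 'notes', 'comment'
        total      INTEGER NOT NULL
    )
""")
# Composite index on date + user; SQLite stores indexes as B-trees.
conn.execute(
    "CREATE INDEX idx_activity_date_user ON user_activity (week_start, user_id)"
)
conn.executemany(
    "INSERT INTO user_activity VALUES (?, ?, ?, ?)",
    [
        ("2016-08-01", 1, "uploads", 40),
        ("2016-08-01", 1, "notes", 12),
        ("2016-08-01", 2, "comment", 7),
        ("2016-08-08", 1, "uploads", 35),
    ],
)
rows = conn.execute(
    "SELECT category, total FROM user_activity "
    "WHERE week_start = ? AND user_id = ? ORDER BY category",
    ("2016-08-01", 1),
).fetchall()
```

Looking up one user's week then only touches the rows under that (date, user) key rather than scanning the whole year.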

BrokenEagle98 said:

Also, for issue #2640, there is still no way to tell from the API whether someone has unlimited uploads/approval permissions. I made a comment of it on that issue, but there has still been no response. I'd rather not have to write or use an HTML parser to find that information out if at all possible.

Albert's busy ironing out saved searches. I'm currently catching up on RoR and Git to be able to contribute directly rather than with pseudocode and ideas/issues, but it'll take some time 'till I'm ready. The remainder of issue #2640 is quite simple to fix, but it's something that needs testing, so I'm not going to fix it blindly. Once my dev environment is set up properly, small bugs will be squashed faster, hopefully.
