Donmai

category variant_set -> meta

Posted under Tags

I just realized this tag was mass added to posts with only one or two children. That's absurd. At a minimum there should be 3 or 4 variant pics for this tag to qualify, otherwise it's just pointless padding.

As a side note, the issue of children as duplicates vs variants is something better tackled as site code; I'm deeply skeptical of it working as a tag.


nonamethanks said:

I'm deeply skeptical of it working as a tag.

Speaking as the creator of this tag, first some clarification:
The name and definition of the tag are something I've shamelessly copied from E-Hentai. I then padded out the wiki page a little with examples and some common things to look for, as additional clarification.

The main reason I created this tag is that I'm somewhat unhappy with the browsing experience whenever sets like this are uploaded. They usually come in big batches, and whenever I'm searching for a specific post that I know isn't part of a variant_set, being able to cut this unrelated fluff out of the results is very useful, imho.

Side note: I'm absolutely NOT against these posts existing in the first place. Some artists (aki99, mizumizuni, sinensian, to name just a few) make excellent stuff, and I would be devastated to see it removed.

nonamethanks said:

At a minimum there should be 3 or 4 variant pics for this tag to qualify

While the limit is certainly up for debate, I personally would even argue the opposite and say that just 2 posts (1 parent + 1 child) should already be enough to warrant the addition of this tag.

For example, post #5149076: a simple 2-post comic where one post follows the other as a sequential work.

As opposed to post #5569841, which is also a sequential work of 2 posts, but in this case the image itself is virtually identical (save for a few minor changes in her expression and hand).
There is a visible and measurable difference between these two sets, and I therefore argue that the tag should be applied to the latter.

Now I can already hear people groan: "Just stop complaining and use the comic tag in your search." Note that post #5569841/post #5569842 don't have that tag. Besides the browsing point mentioned earlier, there is another thing I'm unhappy with: the current use of parent/child relationships and pools for sequential works. I'm going to go on a minor-ish tangent now, but please bear with me.

Take post #5828417, a sequential work spread over 3 posts: note that none of them have been tagged with comic.
I'm willing to bet this isn't the only case on Danbooru. One cause is that it's far easier to create a parent-child relationship (add the post ID to the "Parent" field -> 🎉) than a pool (Pools -> New -> Name/Description -> Series or Collection -> Create Pool -> Upload Post -> Add post to Pool -> 🎉). The second, and in my opinion far more important, cause is the difference in browsing experience.

Compare these two sets of sequential works:
post #5828417: Thumbnails for all posts that are part of the chain 🌈 The ability to jump wherever you want 🌈 This comic is your treasure 🌈 Where do you want to go today?
post #5667102: Pool name and a <<Prev and Next>> button. Now fuck off.

Thank you for listening to my tangent. Now let's return to the main topic, because I can see people want to know how the f- this ties into variant_set.

I would like to point to one of the biggest parent-child spaghetti monsters here on Danbooru: post #4960073
I kid you not, I've spent a lot of time thinking about how to handle this utter mess of a relationship, and I still don't have a definite answer.
There is a sequence, but there are also 5 different costumes, all on the same base image. Even worse, a censored version that was uploaded earlier has been haphazardly stitched into it, which results in post #4965448. Posts in the parent/child navigation bar are sorted by upload date, so changing their order is currently impossible. I think we can all agree that this is not ideal.
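To make the ordering problem concrete, here is a minimal Python sketch. It assumes, as described above, that the parent/child bar sorts the whole family by upload order, modelled here as ascending post ID; the `Post` class, the helper functions and the post data are hypothetical illustrations, not Danbooru code. A pool, by contrast, stores an explicit order that can be edited.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Post:
    id: int  # assumption: IDs increase with upload time
    label: str
    parent_id: Optional[int] = None


def family_by_upload_order(posts: List[Post], parent_id: int) -> List[Post]:
    """Parent plus its direct children, sorted by ID (i.e. upload order)."""
    family = [p for p in posts if p.id == parent_id or p.parent_id == parent_id]
    return sorted(family, key=lambda p: p.id)


def pool_order(posts: List[Post], post_ids: List[int]) -> List[Post]:
    """A pool keeps whatever order its list of post IDs specifies."""
    by_id = {p.id: p for p in posts}
    return [by_id[i] for i in post_ids]


# Hypothetical family: a censored version uploaded *before* the parent
# gets attached to it as a child afterwards.
posts = [
    Post(150, "censored version", parent_id=200),
    Post(200, "page 1"),
    Post(201, "page 2", parent_id=200),
    Post(202, "costume A", parent_id=200),
    Post(203, "costume B", parent_id=200),
]

# Upload order: the censored version sorts ahead of page 1.
print([p.label for p in family_by_upload_order(posts, 200)])
# Pool order: whatever sequence was chosen when building the pool.
print([p.label for p in pool_order(posts, [200, 201, 202, 203, 150])])
```

In this model the only way to move the censored version in the family view would be to change its ID, while the pool order is just a list that can be rearranged.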

Just as an experiment, when I added post #5409701 and other related posts to pool #16542, I had to take care to upload everything in the right order so as not to mess up the ordering. I think the result is pretty clean and clear for users, but it still isn't ideal in my opinion. So, to make a very long story short:

nonamethanks said:

the issue of children as duplicates vs variants is something better tackled as site code,

I absolutely agree, 100%. That sounds like something that will require quite a bit of work, and I think the variant_set tag will help with it later on by laying down some groundwork beforehand. Will this tag still be needed once that's completed? Probably not.
But for now, I think variant_set is at the very least a useful tag to have around.

nonamethanks said:

BUR #13829 has been rejected.

mass update variant_set child_count:<3 pool:none -> -variant_set

Something is going wrong here: variant_set child_count:<3 pool:none also returns the entire chain of post #5867310, which should most certainly meet the 4-pic minimum.

Yeah, I just realized the BUR won't work because it catches children of long chains. It's not possible to automate it that easily.
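A small Python model shows why the BUR misfires. It assumes `child_count` counts a post's own direct children, so every child in a long chain has a count of 0 and slips under `<3`; the data and function names are hypothetical. Judging the whole set (grouping posts by the root of their chain) gives the intended result, which is roughly the check the BUR would have needed.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical data: post ID -> parent ID (None for a top-level post).
PARENT_OF: Dict[int, Optional[int]] = {
    1: None, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1,  # one 6-post set
    10: None, 11: 10,                       # one 2-post set
}


def child_counts(parent_of: Dict[int, Optional[int]]) -> Dict[int, int]:
    """Number of direct children of each post."""
    counts = Counter(p for p in parent_of.values() if p is not None)
    return {post: counts[post] for post in parent_of}


def root(parent_of: Dict[int, Optional[int]], post: int) -> int:
    """Follow parent links up to the top of the chain."""
    while parent_of[post] is not None:
        post = parent_of[post]
    return post


def per_post_filter(parent_of: Dict[int, Optional[int]], limit: int = 3) -> List[int]:
    """Roughly what `child_count:<limit` selects: each post judged on its own."""
    counts = child_counts(parent_of)
    return sorted(post for post, n in counts.items() if n < limit)


def per_set_filter(parent_of: Dict[int, Optional[int]], min_posts: int = 4) -> List[int]:
    """Posts whose whole set (root and all descendants) is below min_posts."""
    sizes = Counter(root(parent_of, p) for p in parent_of)
    return sorted(p for p in parent_of if sizes[root(parent_of, p)] < min_posts)


print(per_post_filter(PARENT_OF))  # also catches the children 2-6 of the big set
print(per_set_filter(PARENT_OF))   # only the 2-post set
```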

I'm kinda annoyed at the existence of yet another meta tag that doesn't really aid in searches. At best, this would be helpful for blacklisting huge variant sets, but you'd have to exclude parents from it for this to work. I'm also really skeptical that our taggers will use it properly, and I fully expect some random genius to mass-add it to thousands of duplicates without even checking, within a few months at most.

I dislike the idea of using this tag for just two posts, but I want more input from other users on this.

The objection to including sets of less than 4 posts seems completely arbitrary.

Do you have an actual argument for excluding sets of 2 and 3? So far, the only stated reason is "I don't like the idea". There's also a claim that including two-post sets would amount to tag padding, but without a rationale, that is not an argument in and of itself.

One use of the tag is to add something like -(variant_set parent:any) to a search returning many variant sets, limiting results to one post per set. In this context, excluding some sets based on their size makes no sense at all.
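A rough Python sketch of that use case, assuming `parent:any` matches posts that have a parent: the negated group drops every tagged child and keeps the parent as the set's single representative. The post data and the `tag_small_sets` switch are hypothetical; they show that when small sets are left untagged, their children leak back into the results.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Post:
    id: int
    parent_id: Optional[int] = None
    tags: Set[str] = field(default_factory=set)


def build_posts(tag_small_sets: bool) -> List[Post]:
    """Hypothetical results: a 5-post set, a 2-post set and a standalone post."""
    big = [Post(1)] + [Post(i, parent_id=1) for i in range(2, 6)]
    small = [Post(10), Post(11, parent_id=10)]
    for p in big:
        p.tags.add("variant_set")
    if tag_small_sets:
        for p in small:
            p.tags.add("variant_set")
    return big + small + [Post(20)]


def without_variant_children(posts: List[Post]) -> List[int]:
    """Mimic `-(variant_set parent:any)`: drop tagged posts that have a parent."""
    return [p.id for p in posts
            if not ("variant_set" in p.tags and p.parent_id is not None)]


print(without_variant_children(build_posts(tag_small_sets=True)))   # one post per set
print(without_variant_children(build_posts(tag_small_sets=False)))  # post 11 slips through
```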

Long term, I agree that a solution at the software level is better, but that ties into multiple large reworks, like anonymous pools (replacing parent-child relationships, potentially with a type attribute like comic/variant set/etc.) and multiple image assets per post (for duplicates). In the meantime, variant_set can be useful, and it should help with migration once a long-term solution is in place.

tichusu said:

Do you have an actual argument for excluding sets of 2 and 3?

Because the amount of posts that would potentially qualify is large enough that this tag would probably go into the hundreds of thousands of posts, which is not ideal for a temporary solution. There's 1.4M posts with either a parent or a child, I'm not sure what percentage is duplicates but it would still mean a huge amount of posts having to be tagged with this tag, if even a single parent and child set is enough to qualify.

Four as a minimum makes sense to me. The tag is variant_set, not variant - it's for posts with a large number of variations. If it's for any post with more than one version then it effectively serves no purpose; users can't use it to easily search or blacklist posts with 10 children which is essentially its main purpose.

I feel the use case for variant set isn't significant enough to justify its existence, given metatags serve as a reasonable alternative. It'd be better to nuke it now before people invest their time, and the server invests its processing, into adding and later removing it.

nonamethanks said:

Because the amount of posts that would potentially qualify is large enough that this tag would probably go into the hundreds of thousands of posts, which is not ideal for a temporary solution.

Considering that things tend to catch fire, and putting those fires out takes priority over larger planned improvements, "temporary" may well mean months or even years.

There's 1.4M posts with either a parent or a child, I'm not sure what percentage is duplicates but it would still mean a huge amount of posts having to be tagged with this tag, if even a single parent and child set is enough to qualify.

Two-post (and even three-post) sets are unlikely to have an extreme effect on post count. Remember that in terms of post count, a single 10-post set equals five 2-post sets. So while 2-post sets are more common, it evens out to some extent.

Here's some actual data, although the sample size is unfortunately far too small (so it isn't necessarily representative of the whole site) and could easily have been skewed by a single uploader dumping sets. The popularity among artists of producing variant sets, and among users of uploading them, has likely also changed over time. That said, I don't think anyone is interested in categorizing tens of thousands of posts just to gather fully representative statistics.

Out of the first 193 posts from (child:any or parent:any) id:..5000000 limit:200 status:any:

Type | Posts | %
variant_set | 119 | 61.66%
other relation | 46 | 23.83%
revision | 17 | 8.81%
duplicate | 11 | 5.70%

Not included are 6 posts I don't have permission to view and 1 post I missed somewhere (I'm not re-checking all of them to fix it).

"Other relation" includes things like different crops, similar theme, sequence, sketch/finished, still/animation, etc.

If those numbers are even close to representative, that would mean roughly 870k variant_set posts, which, while still large, is significantly less than 1.4M.

Looking at all the unique variant sets among those posts we have:

Child count | Sets | Posts in those sets | % of total posts
1 | 35 | 70 | 46.67%
2 | 10 | 30 | 20.00%
3 | 2 | 8 | 5.33%
4 | 3 | 15 | 10.00%
5 | 1 | 6 | 4.00%
6 | 3 | 21 | 14.00%

Note that the post count in this table is not comparable to the previous table, as it only counts unique sets and also counts posts not among the 193 in the first table. The long tail of parents with high child counts is also underrepresented (at the most extreme end, there's a variant set of 41 posts on the site).
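The weighting point made earlier (one 10-post set counts as much as five 2-post sets) can be checked against the sample table with a few lines of Python: each set contributes its child count plus one parent. The dictionary only re-enters the table's numbers; nothing else is assumed.

```python
# Unique variant sets from the sample: child count -> number of sets.
SAMPLE_SETS = {1: 35, 2: 10, 3: 2, 4: 3, 5: 1, 6: 3}

# Each set is one parent plus `child_count` children.
posts = {c: sets * (c + 1) for c, sets in SAMPLE_SETS.items()}
total_sets = sum(SAMPLE_SETS.values())
total_posts = sum(posts.values())

for c, sets in SAMPLE_SETS.items():
    print(f"{c} child(ren): {sets / total_sets:.2%} of sets, "
          f"{posts[c] / total_posts:.2%} of posts")
```

Single-child sets make up about 65% of the sets in the sample but only about 47% of the posts, which is the evening-out effect in miniature.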

While I don't think it can tell us much (since the distribution of child counts for variant sets may not match the average across all posts), I may as well include the data since I collected it:

Distribution of child counts
Child count | Parents with that child count | Total posts represented
1 | 473775 | 947550
2 | 78016 | 234048
3 | 29785 | 119140
4 | 9504 | 47520
5 | 4951 | 29706
6 | 2387 | 16709
7 | 1544 | 12352
8 | 829 | 7461
9 | 589 | 5890
10 | 327 | 3597
11 | 262 | 3144
12 | 144 | 1872
13 | 94 | 1316
14 | 96 | 1440
15 | 76 | 1216
16 | 40 | 680
17 | 41 | 738
18 | 21 | 399
19 | 25 | 500
20 | 23 | 483
21 | 21 | 462
22 | 11 | 253
23 | 15 | 360
24 | 9 | 225
25 | 9 | 234
26 | 4 | 108
27 | 6 | 168
28 | 4 | 116
29 | 1 | 30
30 | 6 | 186
31 | 3 | 96
32 | 1 | 33
33 | 3 | 102
34 | 1 | 35
35 | 1 | 36
40 | 1 | 41
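Because this covers every parent/child relationship rather than just variant sets, it cannot settle the question, but it does show how strongly each proposed minimum would shrink the pool of candidate posts. A short Python sketch re-entering the table (a parent with n children represents n + 1 posts, and, as in the table, every family is treated as independent); the thresholds correspond to the 2-, 3- and 4-post minimums debated in this thread.

```python
# Distribution table above: child count -> number of parents with that many children.
PARENTS_BY_CHILD_COUNT = {
    1: 473775, 2: 78016, 3: 29785, 4: 9504, 5: 4951, 6: 2387, 7: 1544,
    8: 829, 9: 589, 10: 327, 11: 262, 12: 144, 13: 94, 14: 96, 15: 76,
    16: 40, 17: 41, 18: 21, 19: 25, 20: 23, 21: 21, 22: 11, 23: 15,
    24: 9, 25: 9, 26: 4, 27: 6, 28: 4, 29: 1, 30: 6, 31: 3, 32: 1,
    33: 3, 34: 1, 35: 1, 40: 1,
}


def posts_in_sets(min_children: int) -> int:
    """Posts (parent + children) in families with at least `min_children` children."""
    return sum(parents * (children + 1)
               for children, parents in PARENTS_BY_CHILD_COUNT.items()
               if children >= min_children)


total = posts_in_sets(1)
# min_children 1, 2, 3 correspond to 2-, 3- and 4-post minimums.
for min_children in (1, 2, 3):
    kept = posts_in_sets(min_children)
    print(f"at least {min_children + 1} posts: {kept} posts ({kept / total:.1%})")
```

On these numbers, a 4-post minimum would leave roughly 257k of about 1.44M posts (under 18%), and a 3-post minimum roughly 491k (about 34%).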

And on a tangent, for the curious, here are the parents with the highest child counts on the site (since child_count searches tend to time out):

Parents with the highest child counts
Child count | Parents
40 | id:2917014
35 | id:4736873
34 | id:5586176
33 | id:470889,2268443,1628100
32 | id:1406657
31 | id:1723480,2195749,1472505
30 | id:2340565,4143842,13612,1898077,5460173,2225615
29 | id:1628104
28 | id:4853135,2235298,1969028,4091955
27 | id:4862830,2480047,2050278,150861,897040,2163378
26 | id:154356,2904642,1429783,2462631
25 | id:4893650,5595163,3604907,4475959,1908986,4199224,202255,4971426,2899999
24 | id:1555220,4995048,1440660,3739661,2551948,766713,817833,147625,3770642
23 | id:220096,4570880,4628098,3065003,303048,424215,3143632,4561350,3076823,5544900,5596533,425629,2756787,4869956,566175
22 | id:5749089,1004583,2006030,2664773,1760347,3166146,1908871,1486919,3361054,81599,1677432
21 | id:1183916,3792164,5334340,4127826,377666,440187,1906348,2438807,302152,1363431,42184,5303704,790446,11553,156266,1889296,793593,5588554,1753256,4513445,151481
20 | id:2857830,5763100,4856192,1596484,3139477,2125770,3454706,2517461,850943,1908382,2904521,5009358,5115567,2423955,2001946,3477837,2984392,1577189,1062605,1949422,1233882,4960073,1918101
19 | id:5475582,4778123,1954302,1371614,5152548,888475,241717,1830688,4950638,5863577,5474855,4802160,1060945,411610,5868462,5178529,5430622,1174393,5026255,5274804,3989057,172738,1744264,2527583,5798193
18 | id:2078186,2878116,4670058,633388,609071,5744030,5587867,1982228,3269061,2272685,2089506,2156785,141440,4535038,2496503,5764738,2363935,1385925,4535546,56701,2636524
17 | id:304555,1620044,1621813,4442348,2185764,5266555,2483250,1908957,3492503,1840628,4118384,5357437,3353716,1628693,3472978,167965,4103702,4149624,2707789,5238995,2417288,3950532,3354208,4649062,153608,5208685,3305804,4958447,2818280,1429065,2490316,4819602,4516449,4906413,1977063,2145114,5740746,4076600,2013670,4513729,5588644
16 | id:5373731,1708899,1380807,4596156,5142304,5642766,4835562,4937976,5594675,4714639,4902749,3072284,316112,4883688,4100505,1846513,3246395,2279894,3742341,179089,108270,2220998,4331885,2400519,2451431,3314517,5485144,2611286,1225010,4923270,291050,1591332,531715,4893927,5871377,2224890,2532934,1429610,4307641,3246384

AngryZapdos said:

Four as a minimum makes sense to me. The tag is variant_set, not variant - it's for posts with a large number of variations. If it's for any post with more than one version then it effectively serves no purpose; users can't use it to easily search or blacklist posts with 10 children which is essentially its main purpose.

Why? What actual use case is ruined by including 2- and 3-post sets in the tag?

I previously mentioned a use case (that I'd actually use if the tag was largely populated) for which it would be less useful if 2- and 3-post sets were arbitrarily excluded.

Mexiguy said:

I think the tag has its uses, but adding it to single child posts or even 2-child posts completely defeats the purpose

Why? How does that defeat the purpose?

Obst said:

I feel the use case for variant set isn't significant enough to justify its existence,

That may well be. In the end it's a value judgement for @evazion.

...but:

given metatags serve as a reasonable alternative.

What alternative? So far, the only meta tag being discussed here has been variant_set itself.
