Donmai

Pixiv integration

Posted under General

Pixiv is a major source of art for this site. I'd like to integrate the site better, so I want to hear some ideas. Examples:

- Ability to grab, filter, and upload every image from a particular artist.
- Ability to grab and romanize the artist name, then finding or creating a matching Danbooru artist entry.
- Translation of Pixiv tags.

Updated by Ephyon

#1 Ability to grab, filter, and upload every image from a particular artist.

This is a good idea, but it should be used carefully, and only by vetted members. Not everything on Pixiv is good, and not every artist produces consistently good output. I can see that this would be useful for well-established, high-quality artists though.

#2 Ability to grab and romanize the artist name, then finding or creating a matching Danbooru artist entry.

This would be very useful for laymembers. How is it going to work though? Will it just copy the ASCII name in the image URLs? Sometimes that may not be ideal, especially if it greatly differs from the artist's usual pseudonym or if a real life name is known.

If it romanizes the given kana/kanji name, that would be interesting, and I'd be curious to see the approach, since that is often a difficult problem even for humans.

#3 Translation of Pixiv tags.

I really like this idea, and it's something I've thought about before. It might be challenging since Pixiv tags aren't normalized (there will be a many-to-one mapping to Danbooru's tags). In fact it may be a many-to-many mapping when only given names or surnames are used as tags.
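To make the many-to-one versus many-to-many distinction concrete, here is a minimal sketch. The mapping table, the function names, and the specific Pixiv tags are all illustrative assumptions, not an existing Danbooru feature.

```python
from collections import defaultdict

# Hypothetical mapping table of (pixiv_tag, danbooru_tag) pairs.
# The entries are illustrative examples only.
MAPPINGS = [
    ("綾波レイ", "ayanami_rei"),  # full name
    ("綾波", "ayanami_rei"),      # surname only: many-to-one
    ("レイ", "ayanami_rei"),      # given name only...
    ("レイ", "hino_rei"),         # ...is ambiguous: many-to-many
]


def build_index(pairs):
    """Group the candidate Danbooru tags under each Pixiv tag."""
    index = defaultdict(set)
    for pixiv_tag, danbooru_tag in pairs:
        index[pixiv_tag].add(danbooru_tag)
    return index


def candidates(index, pixiv_tag):
    """Return every Danbooru tag a Pixiv tag could refer to, sorted."""
    return sorted(index.get(pixiv_tag, ()))


index = build_index(MAPPINGS)
print(candidates(index, "綾波レイ"))  # one candidate
print(candidates(index, "レイ"))      # two candidates, a human has to pick
```

A lookup that returns more than one candidate is exactly the case where the system can only suggest, not decide.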

Furthermore, managing these mappings is sure to be much more work than the current alias/implication system, so you might want to open the management beyond admins, but care should be taken since it's more or less the same privilege.

Also we might want to pay special attention to automatically tagged posts, since Pixiv tagging is notoriously sparse compared to Danbooru's.

albert said:
- Ability to grab, filter, and upload every image from a particular artist.

I have yet to find a single artist whose entire account I'd want to upload. Maybe if the output for the ripper looked something like the 1.15.0 mod queue, where you could select the posts you don't want to upload, it could work.

The rest of it sounds good though.

#1 - Yeah, mass unregulated transfer of content is just asking for trouble. Log's idea would work though, especially since we'd have to individually tag each image according to content anyhow.

#2 - Shinjidude brought up my worst complaint with this: even though their Pixiv URL name is useful, artists with homepages will sometimes have completely different names there, which they may use much more frequently.

This site: http://nihongo.j-talk.com/parser/index.php is what I and many people here use for romanizations. It would indeed be useful to have something like this directly integrated into danbooru.

#3 - You lost me. Translating the pixiv tags through an integrated babelfish-like thing to help search pixiv itself is a great idea, but I really hope you're not thinking of a system that would automatically copy them over here. Tagging in Pixiv is notoriously sloppy and messy: you get lucky if the whole bunk of a single franchise's art is under one tag and not two or three, never mind characters, and there's no end to random nonsensical tags used only once. I really can't imagine a system that would be simpler than the current "see what franchise it belongs to and tag it yourself".

albert said: - Ability to grab, filter, and upload every image from a particular artist.

Very bad idea. Not everything an artist does is good (I follow hundreds of artists, and have passed on at least some images from all of them), or relevant to danbooru. I don't think there's a single artist I'd want to upload everything from, and even if I did I'd want to make sure it's being tagged well. If the image is good enough to upload, it's good enough to take the 20 seconds it takes to do so.

Also, for me at least (but not for everyone, I understand), part of the fun of Danbooru is contributing. If we're just going to send a script out to suck down every artist that's had a pic uploaded, who needs uploaders? (Admittedly hyperbole, but still valid, I think.)

Lastly, if an artist is so incredibly good that we can trust everything they upload (unlikely), then there's probably already a dozen danbooru members stalking their page and uploading the image within 5 minutes of it going online.

- Ability to grab and romanize the artist name, then finding or creating a matching Danbooru artist entry.

Potentially good idea, but artist name on pixiv is not necessarily the name we'd use for the tag. I'd rather people use pages like what Ephyon linked on a case by case basis as needed.

- Translation of Pixiv tags.

Bad idea. Pixiv tagging is fairly awful, with almost none of the normalization of names that we spend so much time on here. I often have to search 3-4 different tags to find all the images of a character, and often characters (and even series!) aren't tagged at all. This would also be extremely difficult to implement, I'm sure.

I understand that part of the purpose of integration is to streamline things for the lazy and/or those unfamiliar with pixiv or Danbooru tagging practice, but I think it'd cause more problems than it solves. The amount of after the fact tag correction that would need to be done would probably outweigh the time saved.

The best thing that can be done with Pixiv is simply to stop people from using the wrong URLs in the source field, I feel.

Also, some method to automatically add a sample URL to the artist entry for pixiv artists would help. I'm not sure if this is possible, but the number of completely empty artist entries is a problem. I've seen people use the wrong URL, skip the Artist Find as a result, and wind up creating entirely new artist tags for artists who already exist in Danbooru.

Perhaps instead of a direct transfer of tags, the pixiv tags could work similarly to the related tags button? It would give suggestions based on what pixiv tags are there, from which the uploader can then select those that apply.

The other two ideas sound reasonable, as long as we maintain the current system as well.

An additional feature that might be nice would be a way to directly search pixiv tags using danbooru. So if you're looking up a character or copyright, you could switch to a pixiv search of the same thing.

First, I completely agree with all the objections raised; jxh2154 in particular said it well.

albert said:
Pixiv is a major source of art for this site. I'd like to integrate the site better, so I want to hear some ideas.

Okay. Here's idea #1: avoid automation like the plague. Make it easy to do things right manually (i.e. suggest artists, choke on people pasting wrong URLs, yell at people omitting artists), but do NOT ever commit anything automatically. We're a site for high quality art, pixiv isn't. It's a fundamental incompatibility that cannot be reconciled.

- Ability to grab, filter, and upload every image from a particular artist.

No, no and no. *Maybe* if you made it strictly manual and per-user, i.e. there'd be a (private, per-user) feed with newly uploaded works where you could click to be taken directly to the tagging screen with the artist prefilled and you could click "upload" afterwards. This could help with artist tagging a little bit, but still, it'd make it too easy to just upload everything from the pixiv account to danbooru. We don't want that.

- Ability to grab and romanize the artist name, then finding or creating a matching Danbooru artist entry.

No. The ability to try to find matches amongst existing entries, and if that fails, to present the output of the tools mentioned above as a suggestion, and if that's still hard, to open a thread asking for assistance, would be good. But again, don't automate it, just make it easy to do manually. We have problems with choosing a name sometimes; expecting a machine to do better here is insane.

- Translation of Pixiv tags.

I guess you haven't seen pixiv tags then (I'm not being sarcastic; I realise it's hard to grok when they're all in moonspeak). Lemme explain then: pixiv tags are complete rubbish. It's chock-full of nonsense, one-off tags the artist felt like adding, "lololol" and "kawaiii" garbage, tens of different tags for the same thing, whilst lacking anything even resembling comprehensive tagging for what's actually visible in the pic. It's absolutely useless and impossible to translate to our tags.

Overall, I'm rather skeptical of the idea of integration: yes for making it easier to access the corresponding pixiv data, but do not ever commit anything without someone manually selecting to do so. Otherwise it's just asking for trouble, and if it happens, it *will* overwhelm our capacity to fix things. That would seriously put the entirety of what we have gathered up to now in danger.

albert said:
- Ability to grab, filter, and upload every image from a particular artist.

Note the word "filter" here, albert already has plans to have manual selection for mass artist uploading. Stop suggesting it.

Just as a comment to those talking about the garbage in Pixiv tagging, I don't think the idea was ever to translate every tag there, only to map well behaved tags to Danbooru tags. For example where series or characters with full names are tagged. Obviously things like translating the one-off full sentence tags, or "kawaii" shouldn't be implemented.

Also, with all three suggestions, manual oversight is obviously important; I alluded to that above when I said we need to be careful. Automatic systems are prone to make mistakes, and if left unchecked those mistakes will slip through. That being said, I still think any of these improvements, if sufficiently accurate, could make good tools for the users here.

Suiseiseki said:
Note the word "filter" here, albert already has plans to have manual selection for mass artist uploading. Stop suggesting it.

The problem here is with "mass". It should NOT be a mass upload tool, or ever suggest it could be. Mass uploads are necessarily detrimental to quality; we want quality in preference to quantity.

Part of the reason I don't see the benefit in a mass uploader is because the bulk of the time it takes to upload something is picking the tags. If I see a good pixiv artist and feel like plundering, the actual process of opening the images and uploading them takes a few seconds at most with danbooruup. Right click > Upload Image to Danbooru and click okay.

But before clicking okay, you add the tags. Even this can be done in a matter of seconds if I know what all the tags I need to add are (and the autocomplete dropdown is lovely).

The only time uploading an image takes more than 5-10 seconds is when I have to do research to find out what something is because I'm not familiar with it or it's not on danbooru yet and I need to make sure I make the new tags right. Again, a mass uploader will have no effect on this step.

Adding the tags manually is something that should be done even if one is "mass" uploading. Therefore the total time savings is practically nil, unless there are people using much less efficient ways to upload that I'm not familiar with, like having to type all tags in full, manually without an autocomplete (something I can't imagine doing).

I think I will start off small. Every action will require your consent.

If you upload from a Pixiv page, Danbooru will offer to fetch the artist information/picture tags (this is slow anyway, sometimes 3 seconds). It will then offer to create an artist entry if one doesn't exist already and do a source search to tag existing posts. This will hopefully remove the need for hopping between Pixiv, Danbooru's upload page, and Danbooru's artist interface.
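As a rough reading of that flow, and only a sketch: each step is an offer, and nothing happens unless the uploader accepts it. The function names, the action labels, and the example URL below are hypothetical and are not Danbooru's actual code.

```python
def offered_actions(source_url, artist_entry_exists):
    """List the actions to offer the uploader; nothing here runs by itself."""
    if "pixiv.net" not in source_url:
        return []  # the flow only applies to Pixiv sources
    offers = ["fetch artist information and picture tags"]
    if not artist_entry_exists:
        offers.append("create artist entry and tag existing posts by source search")
    return offers


def accepted_actions(offers, consented):
    """Keep only the offers the uploader explicitly agreed to."""
    return [offer for offer in offers if offer in consented]


# Hypothetical source URL, for illustration only.
offers = offered_actions("http://www.pixiv.net/member_illust.php?illust_id=1", False)
print(accepted_actions(offers, {"fetch artist information and picture tags"}))
```

With an empty consent set the second function returns nothing, which is the point of "every action will require your consent".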

albert said:
If you upload from a Pixiv page, Danbooru will offer to fetch the artist information/picture tags (this is slow anyway, sometimes 3 seconds). It will then offer to create an artist entry if one doesn't exist already and do a source search to tag existing posts. This will hopefully remove the need for hopping between Pixiv, Danbooru's upload page, and Danbooru's artist interface.

This sounds very reasonable and would be a time saver for me.

Getting information tags from Pixiv is only useful if the Japanese names are linked to our romanisations. I've wanted to suggest that this is a good idea before (although not specifically Pixiv--but generally, artists tend to give the name of the character they drew if it's not an original work).

Then again, this gives issues when the tag doesn't exist yet, resulting in Japanese tags, something we've gone to great lengths to avoid. I don't mind them that much, especially if they're only temporary, though.

Also, this would allow us to search Japanese names, so you could copy-paste something from a JP blog to see if it was the name of the character, and what its romanisation is.

albert said:
It will then offer to create an artist entry if one doesn't exist already and do a source search to tag existing posts.

Define "if one doesn't exist"--there are several levels at which an artist tag "doesn't exist":
1) There's no artist tag at all
2) There's an artist tag but no artist entry
3) There's an artist tag and entry, but the entry doesn't include a correctly formatted Pixiv source URL, so it doesn't get detected by the "find artist" system. (Or the artist only has his homepage in the artist entry, which causes the same behavior.)

With situation 1 your system would be excellent, but 2 and 3 could create one artist with two tags.
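The three situations can be told apart with a small check before anything is created. This is a sketch under assumptions: the function name, the status labels, and the "/img/<account>/" URL form are illustrative, not how Danbooru's artist finder is actually written.

```python
def artist_status(tag_exists, entry_urls, pixiv_account):
    """Classify an artist before offering to create a new entry.

    entry_urls is None when there is no artist entry at all,
    otherwise the list of URLs stored in the entry.
    """
    if not tag_exists:
        return "1: no artist tag"
    if entry_urls is None:
        return "2: tag but no entry"
    needle = "/img/" + pixiv_account + "/"  # assumed Pixiv URL form
    if any(needle in url for url in entry_urls):
        return "findable"
    return "3: entry without a matching Pixiv URL"


# Hypothetical data: an entry that only lists the artist's homepage.
print(artist_status(True, ["http://example.com/home/"], "pad"))
```

Only situation 1 is safe for creating a tag outright; for 2 and 3 the uploader would need to be shown the existing tag first.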

I suspect that for a large number of 2 and 3, there are images with pixiv-sources that also have an artist tag. Perhaps it can suggest these? You'd have to double-check yourself as uploader, to see if the tag is actually the correct artist.

albert said: It will then offer to create an artist entry if one doesn't exist already and do a source search to tag existing posts.

This sounds good, if it works right. But before anything like this is done, I think this Trac needs to be revisited: http://trac.donmai.us/ticket/443

It was marked fixed, but the problem still happens. I think you need to include the slash after the pixiv account name. Currently it's excluded.

Do an artist find on post #415586 and you get three hits: 2_(artist), 147, and diodio

That's because their URLs are being checked as:
 http://img*.pixiv.net/img/pad*
 http://img*.pixiv.net/img/padtyo*
 http://img*.pixiv.net/img/padiotheworld109*

So the first meets the criteria of all three. If it were /img/pad/* instead, it would be fine.
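The difference is easy to reproduce with shell-style wildcard matching. This is only a sketch: the three account names come from the patterns above, but the stored URLs, the image host numbers, and the function names are made up for illustration, and Danbooru's real matching code may work differently.

```python
from fnmatch import fnmatchcase

# Stored artist URLs keyed by Pixiv account name.
# The host numbers (img01, ...) are invented for the example.
ARTIST_URLS = {
    "pad": "http://img01.pixiv.net/img/pad/",
    "padtyo": "http://img02.pixiv.net/img/padtyo/",
    "padiotheworld109": "http://img03.pixiv.net/img/padiotheworld109/",
}


def search_pattern(account, with_slash):
    """Build the wildcard pattern used to look up an account."""
    suffix = "/*" if with_slash else "*"
    return "http://img*.pixiv.net/img/" + account + suffix


def matching_accounts(account, with_slash):
    """Return the accounts whose stored URL matches the search pattern."""
    pattern = search_pattern(account, with_slash)
    return sorted(
        name for name, url in ARTIST_URLS.items() if fnmatchcase(url, pattern)
    )


print(matching_accounts("pad", with_slash=False))  # all three accounts match
print(matching_accounts("pad", with_slash=True))   # only "pad" matches
```

Without the slash, "pad*" is just a prefix of the other two account names, so anchoring the pattern at the directory boundary is what removes the false hits.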

I can reopen the ticket if you'd prefer.

スラッシュ said:
Getting information tags from Pixiv is only useful if the Japanese names are linked to our romanisations.
...
Then again, this gives issues when the tag doesn't exist yet, resulting in Japanese tags, something we've gone to great lengths to avoid.

I don't see why the romanization or language matters. Unless I'm misinterpreting Albert, this would be very much like the current alias system with established mappings. For example if the system sees a Pixiv post with the tag "綾波レイ", it will suggest "ayanami_rei", if it sees "赤毛", it will suggest "red_hair", if it sees "遠坂凛" it will suggest "tohsaka_rin" or "toosaka_rin" (whichever we happen to feel like using at that point).

I would urge against a system that attempts to make its own mappings to new tags using its own romanization system. If we did that we'd be liable to run into the issues 葉月 brought up above with garbage tags. A linked-to stand-alone romanizer could be useful for trying to figure out new tags, but I wouldn't recommend we have the system even suggest new tags on its own. New artists could be a separate matter though, as they are highly unlikely to be garbage.
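A minimal sketch of that alias-style lookup, assuming a hand-maintained table: the table contents are the examples from this post, while the function name and the idea of returning the unmapped tags separately are my own assumptions.

```python
# Hand-approved Pixiv tag -> Danbooru tag aliases (examples from the post).
ALIASES = {
    "綾波レイ": "ayanami_rei",
    "赤毛": "red_hair",
    "遠坂凛": "tohsaka_rin",
}


def suggest_tags(pixiv_tags):
    """Split a post's Pixiv tags into suggestions and ignored tags.

    Only tags with an established alias produce a suggestion; nothing is
    romanized or invented for the rest.
    """
    suggestions, ignored = [], []
    for tag in pixiv_tags:
        if tag in ALIASES:
            if ALIASES[tag] not in suggestions:
                suggestions.append(ALIASES[tag])
        else:
            ignored.append(tag)
    return suggestions, ignored


suggested, ignored = suggest_tags(["綾波レイ", "赤毛", "かわいい"])
print(suggested)  # suggestions for the uploader to confirm
print(ignored)    # unmapped tags are simply dropped
```

The uploader still chooses which suggestions to apply; the table only decides what is allowed to be suggested.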

I think you're misinterpreting my post or I phrased it unclearly, since that's exactly what I'm suggesting as well, haha.

An additional thing I thought of--one thing that we could do to avoid the many retarded Japanese tags is only allow tags that are already aliased. A Pixiv-tag would only "work" if it's already aliased to a romanised equivalent, giving it a sort of "approved" status as a tag. That way all the garbage tags on Pixiv will simply drop by the wayside. It'd also mean we'd have to go through all our character tags and add Japanese aliases (if applicable), but that's true anyway if we want to make this system useful.

スラッシュ said: An additional thing I thought of--one thing that we could do to avoid the many retarded Japanese tags is only allow tags that are already aliased. A Pixiv-tag would only "work" if it's already aliased to a romanised equivalent, giving it a sort of "approved" status as a tag.

Yes, I would consider this an absolute requirement. No automatically tagging anything that hasn't been explicitly reviewed and approved.

albert said:
If you upload from a Pixiv page, Danbooru will offer to fetch the artist information/picture tags (this is slow anyway, sometimes 3 seconds). It will then offer to create an artist entry if one doesn't exist already and do a source search to tag existing posts. This will hopefully remove the need for hopping between Pixiv, Danbooru's upload page, and Danbooru's artist interface.

This sounds much better. The page-hopping is currently one of the most annoying things about uploading from pixiv.

The jp->en alias mapping sounds like a hell of a lot of work for whoever's in charge of that, but it may eventually be a handy feature.
