Are people here aware of the new stable diffusion based model called Waifu-Diffusion?
They've been training a model to produce anime and manga focused images and they claim that they procured those images and image/text pairs by scraping danbooru.
I was wondering if this kind of thing is permitted / known of.
I think what they meant was "stance on using Danbooru as a model for AI training" rather than "stance on uploading WaifuDiffusion images on Danbooru". As for that though, no idea.
Yeah, my question was more about people's stance on using Danbooru as a source of training data. It seems to be getting more popular. NovelAI has apparently used it too.
well when danbooru's a site that mainly operates on unauthorized reuploads, some artists may get angry that their art is being used for AI art and request to be removed from the site. in fact i already saw a tweet from a japanese person about this.
i'm not sure how this kind of stuff can be prohibited, but it's not looking good for the site...
I don't think the site can really do anything at all to prevent it, other than try to make it known that they're not partnered with NovelAI, so artists don't think this was done with Danbooru's knowledge/support. And even if Danbooru did manage to somehow block them from training their AI here, they'd just do it on another imageboard.
There's no way for us to stop people from scraping our database. Any statement we'd make would be void of meaning, and trying to implement measures against that would only affect normal users. If they don't get the pics from us, then it's gelbooru, sankaku, zerochan, tbib, safebooru or the million other imageboards out there.
yeah i wasn't expecting there to be any way to restrict that. maybe if they didn't name drop danbooru specifically so there's at least not a specific booru to pin this all on. if they would even agree to that.
Yeah, preventing scraping is practically impossible, but instead of combating it via technical measures it could be dealt with another way.
A meta tag could be created denoting the artist’s wishes on their art being used for AI training, say do_not_train or something. I guess the opposite could also be created for artists that want to explicitly allow their art to be used for training, but I don’t know how much use that would really get. It could be useful for the dataset creators or those creating the AI models if they care or are forced to care about having explicit permission in the future.
If an artist makes it known to Danbooru (or otherwise?) that they don’t want their art used in AI training sets then their artist tag could be updated to implicate the do_not_train tag. This could even be an incentive for those who would otherwise wish to keep their art on Danbooru vs opting to be a banned_artist.
Obviously it would be up to the dataset scrapers to filter out art with the tag, but I would think they’d prefer that over a bunch of individual DMCA takedown requests. If everyone’s nice about it then it could be a win-win-win for the artists, Danbooru, and the scrapers/AI model creators.
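To make the idea concrete, here is a rough sketch of what honoring the tag could look like on the scraper's side. Everything in it is an assumption for illustration: the do_not_train tag is only the proposal from this post, the artist names and the implication table are made up, and the post records are assumed to carry a space-separated `tag_string` field. On the site itself an implication would normally add the tag to the post directly; the expansion step is only there to show the artist -> do_not_train relationship.

```python
from typing import Iterable

# Hypothetical tag name proposed in this thread; it does not exist on the site.
OPT_OUT_TAG = "do_not_train"

# Hypothetical implication table: artist tag -> tags it implies.
IMPLICATIONS = {
    "example_artist_a": {OPT_OUT_TAG},
}


def expand_tags(tags: Iterable[str]) -> set[str]:
    """Return the tags plus anything they imply (one level only, for simplicity)."""
    expanded = set(tags)
    for tag in list(expanded):
        expanded |= IMPLICATIONS.get(tag, set())
    return expanded


def trainable_posts(posts: Iterable[dict]) -> list[dict]:
    """Keep only posts whose expanded tag set lacks the opt-out tag."""
    kept = []
    for post in posts:
        # Assumed record shape: tags stored as one space-separated string.
        tags = expand_tags(post["tag_string"].split())
        if OPT_OUT_TAG not in tags:
            kept.append(post)
    return kept


# Made-up sample records, not real posts.
posts = [
    {"id": 1, "tag_string": "1girl example_artist_a solo"},
    {"id": 2, "tag_string": "1boy example_artist_b"},
    {"id": 3, "tag_string": "landscape do_not_train"},
]
print([p["id"] for p in trainable_posts(posts)])  # [2]
```

The filter is a single membership check, which is the point: if the opt-out data exists, respecting it costs a dataset builder almost nothing.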
This is a sticky situation. NovelAI's perceived association with Danbooru on Twitter is causing some people to advise artists to submit takedown requests here to avoid being used by their system, which is an explicit for-profit venture. The problem is that if AI models are using offsite database dumps, anything we do will be in vain unless we directly talk to the companies in question. If we do nothing, we'll likely just end up with another rash of banned artists based on nothing but someone else's miscommunication.
Exactly, I came here today because I saw artists on Twitter posting about it. I think that while there’s not much we can do to combat the scraping, we can at least provide a framework for artists to opt out of being used as training data.
Yeah it’d be honor system based, but most legal things are if you think about it. If we have the data of who explicitly doesn’t want to be used for training I’m thinking those for-profit companies would do their best to update their datasets so as to comply with any current or future copyright laws around the subject.
In the end I just want to try to prevent a wave of banned artists due to something Danbooru isn’t even in control of.
I’ve been told on Discord that it’s no longer possible to cross-category implicate, so that throws a bit of a wrench into my suggestion. I’ve created a GitHub issue requesting to add the capability back, at least for artist -> meta tags.
What's more likely to happen (if anything, which I have my doubts about) would be making the tag an artist tag, similar to banned artist.
Hrmm, I guess, but it would be weird having the tag appear as an artist in the tag list. IMO it makes more sense to convert banned_artist into a meta tag since a person named “banned artist” isn’t the author, but I’m sure that would require a fair amount of backend work.
I’m guessing banned_artist is not currently a meta tag for historical reasons, since meta tags are a more recent addition?
Having it as an artist tag also adds visibility to it, since it'll always be at the top of the list for 99% of users.
If an artist makes it known to Danbooru (or otherwise?) that they don’t want their art used in AI training sets then their artist tag could be updated to implicate the do_not_train tag. This could even be an incentive for those who would otherwise wish to keep their art on Danbooru vs opting to be a banned_artist.
Probably easiest to leave banned artist as the only way to opt out for everything, all-in or all-out, without having a separate system. Similar to robots.txt and Do Not Track headers, bad actors and hobbyists could totally just ignore the do not train tag and it might not be very effective. Banned artist is also much more effective because it renders the image totally inaccessible to unprivileged accounts; not everyone can just choose to ignore it.
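As a small illustration of why robots.txt is only advisory, here is a sketch using Python's standard `urllib.robotparser`. The robots.txt content, the URLs, and the `example-scraper` agent name are all made up for the example and are not the site's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; not the site's actual file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /posts
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_fetch_allowed(url: str, agent: str = "example-scraper") -> bool:
    """A well-behaved client asks first; nothing forces it to call this."""
    return parser.can_fetch(agent, url)


print(polite_fetch_allowed("https://example.com/posts/1"))
print(polite_fetch_allowed("https://example.com/wiki"))
```

The check lives entirely in the client. A scraper that never calls it gets the same pages as one that does, and a do not train tag would work the same way.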
Having it as an artist tag also adds visibility to it, since it'll always be at the top of the list for 99% of users.
Hrmm, this is true. Well, considering banned_artist already has special handling in the backend, it could just be made to display differently than other meta tags if such a conversion were ever to happen.
Idk, I think it being an artist tag and not a meta tag should be considered more of a historical quirk than anything.
Talulah said:
What's more likely to happen (if anything, which I have my doubts about) would be making the tag an artist tag, similar to banned artist.
After thinking about this some more, until the functionality to create artist -> meta implications is added, that is if it is agreed to be a useful feature, a do_not_train artist tag could be created in the interim for dealing with the current AI model training issue and converted later.
Probably easiest to leave banned artist as the only way to opt out for everything, all-in or all-out, without having a separate system. Similar to robots.txt and Do Not Track headers, bad actors and hobbyists could totally just ignore the do not train tag and it might not be very effective. Banned artist is also much more effective because it renders the image totally inaccessible to unprivileged accounts; not everyone can just choose to ignore it.
Yeah, I’m not saying we should touch banned artist at the moment. I just think it’s a bit of a historical quirk of the site, but that can be discussed in a separate thread.
I agree that bad actors could ignore the do not train tag, similar to how people ignore copyright laws all the time, and I’m of the opinion that hobbyists/academics training models for non-commercial purposes may even fall under fair use. Seems to be very much a legal grey area right now. Companies that want to monetize things and avoid future problems stemming from this will likely want to follow the artist’s wishes, and a do not train indicator would help them do that.
I cannot believe I am making an account just to make this reply, but if there were any question of something possibly being the death of the site, it would be creating hostility with the artists whose art is posted here. There was some reason to allow the art to be reposted before, but if they actively don't want their art on the site, it would likely be its death. There is no doubt in my mind the site could sustain massive amounts of claims through the DMCA from American artists, and it would not stand up under fair use, fair dealing, or just about any copyright law if it were sued. Given the legally dubious nature of using art as training data retrieved from an unquestionably infringing site, the site would be the easiest target. People like the bots, so it would be a shame if the API had to go outright, but if it must, it must.
Any solution people are theorizing is not going to prevent people scraping for images, it'll just hurt regular users. Bots will find a way to circumvent whatever restrictions we put in place, and regular users who wanted to use the API for good projects will be the ones to suffer.