
API Number of results (Dataset)


Hi,
I'm a student intending to use Danbooru as a dataset for machine learning. I'm currently working on prefiltering the dataset (using Danbooru2018's metadata batch), as I really only need a relatively small, carefully filtered subset of images.

Assuming there's a character limit, I'm leaving out some background on the project for now.

Is it possible to get the number of results of an API query such as "artist_name rating:s"?
The often-linked BigQuery mirror is down, and datasets such as Danbooru2018 are outdated (although I do strongly hope for an update in the coming weeks!).

Thank you!

Seems a character limit it was. So here's some background for those interested:

Background

The project goal is style transfer based on style identification and, further in the future, using this as a basis for stylized rendering in computer graphics.

For the dataset, I try to extract artists who have an established style in art that includes landscapes. I estimate this from the number of images with the desired tags and the total number of images by the artist.

The most lax rules I am considering result in around 130 thousand images. With a rudimentary artist filter, that becomes only 30 thousand. I'd like to increase that last number as much as possible while still ensuring there are at least 10-20 images per artist with the same established style (very important for quality results).
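To make the artist filter concrete, here is a rough sketch of the idea in Python. The input shape (pairs of artist name and tag set) is made up for illustration and is not the Danbooru2018 metadata schema; `select_artists`, `min_matching` and `min_ratio` are hypothetical names, and the 0.5 ratio is an arbitrary placeholder. Only the minimum of 10 images comes from the 10-20 range mentioned above.

```python
from collections import Counter


def select_artists(posts, wanted_tags, min_matching=10, min_ratio=0.5):
    """Keep artists with enough matching images, both absolutely and
    relative to their total output.

    posts: iterable of (artist_name, tags) pairs (hypothetical shape).
    wanted_tags: set of tags an image must carry to count as matching.
    Returns {artist_name: (matching_count, total_count)}.
    """
    wanted = set(wanted_tags)
    total = Counter()
    matching = Counter()
    for artist, tags in posts:
        total[artist] += 1
        # An image matches only if it carries every wanted tag.
        if wanted <= set(tags):
            matching[artist] += 1
    return {
        artist: (matching[artist], total[artist])
        for artist in total
        if matching[artist] >= min_matching
        and matching[artist] / total[artist] >= min_ratio
    }
```

Raising `min_matching` towards 20 or tightening `min_ratio` trades dataset size for style consistency, which is the balance described above.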

Currently I only use the API to get the URLs for images once I have filtered them (caching those URLs to keep traffic down). I thought about using Danbooru2018, but since I am not targeting image classification, I'd much prefer having access to the high-resolution sources, and given that I only need a small subset, getting a few TB of drives is a bit overkill for me right now. So I intend to download and cache in stages as my project advances - I've made sure the images are exactly what I need (and passed hand selection through preview images).

Danbooru2018 is now a year old, so many artists who were below the minimum image count back then would probably pass the test now. So I thought I'd use the API to check how many images an artist has uploaded to date.
Ideally, I'd still want to filter "artist_name rating:s", to filter out artists that have tons of nsfw images and just very few matching images, not enough to stand on their own (plus it can be assumed that nsfw elements/influences can be found in them).

Is it possible to get the number of results of an API query such as "artist_name rating:s"?

https://danbooru.donmai.us/counts/posts?tags=artist_name+rating%3As
https://danbooru.donmai.us/counts/posts.json?tags=artist_name+rating%3As
https://danbooru.donmai.us/counts/posts.xml?tags=artist_name+rating%3As
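
For scripting this, a minimal Python sketch against the `.json` variant could look like the following. The function names are made up for illustration, the `User-Agent` string is a placeholder, and the response shape `{"counts": {"posts": N}}` is an assumption worth checking against an actual response before relying on it.

```python
import json
import urllib.parse
import urllib.request

COUNTS_URL = "https://danbooru.donmai.us/counts/posts.json"


def build_count_url(tags: str) -> str:
    """Build the counts URL for a space-separated tag query."""
    return COUNTS_URL + "?" + urllib.parse.urlencode({"tags": tags})


def parse_count(body: str) -> int:
    """Extract the post count from a JSON response body.

    Assumes the shape {"counts": {"posts": N}}; not verified here.
    """
    return int(json.loads(body)["counts"]["posts"])


def fetch_count(tags: str, timeout: float = 10.0) -> int:
    """Query the counts endpoint once and return the number of posts."""
    request = urllib.request.Request(
        build_count_url(tags),
        headers={"User-Agent": "count-sketch/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling `fetch_count("artist_name rating:s")` requests the same URL as the `.json` link above. Caching the results per artist would keep traffic down.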

Since Danbooru2018 is the second version of the Danbooru201* dataset, there might still be a possibility of it getting an update sometime in the future.

ehh said:

https://danbooru.donmai.us/counts/posts?tags=artist_name+rating%3As
https://danbooru.donmai.us/counts/posts.json?tags=artist_name+rating%3As
https://danbooru.donmai.us/counts/posts.xml?tags=artist_name+rating%3As

Since Danbooru2018 is the second version of the Danbooru201* dataset, there might still be a possibility of it getting an update sometime in the future.

Thanks, should have searched in the API help page with all expanders open...
