Some of my "blue sky" feature requests have already been mentioned, but I'll go ahead and show my support and arguments for them along with some others.
Hierarchical tagging: I think this feature would greatly enhance the semantics we can encode and the power we have to describe and query posts. It would also let us generate more relevant related tags, since it would link tags directly to the tags they are truly related to rather than depending on mere coincidence.
Hierarchical tagging would also provide a means to clean up the somewhat kludgey "combo tags", which dilute concepts and risk a combinatorial explosion of tags. By "combo tags" I mean things like white_thighhighs, striped_panties, or short_twintails. In fact, we could even use this to reify things like hair color, length, and style.
We'd probably need to experiment to find an efficient and usable way of implementing this, but I picture something like the following taglist:
"(ayanami_rei (hair short blue)(eyes red)(plugsuit white))"
Using that you could query characters with short blue hair with (hair blue short) without the need of a short_blue_hair tag.
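A rough sketch of how such a nested taglist could be parsed and queried, just to make the idea concrete. The function names (parse, node_matches, search) are made up for illustration, and the matching rule (same head tag, all requested attributes present in any order) is my assumption about how the query should behave, not a worked-out design.

```python
import re

def tokenize(text):
    """Split a taglist into "(", ")" and bare tag names."""
    return re.findall(r"[()]|[^\s()]+", text)

def parse(text):
    """Parse "(a b (c d))" into nested lists: [['a', 'b', ['c', 'd']]]."""
    stack = [[]]
    for tok in tokenize(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unexpected ')'")
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]

def node_matches(node, pattern):
    """True if node has the same head tag as pattern and contains
    every attribute / sub-group of pattern, in any order."""
    if not node or not pattern or node[0] != pattern[0]:
        return False
    for want in pattern[1:]:
        if isinstance(want, str):
            if want not in node[1:]:
                return False
        elif not any(isinstance(c, list) and node_matches(c, want)
                     for c in node[1:]):
            return False
    return True

def find(tree, pattern):
    """Search every group in the tree, at any depth, for a match."""
    for item in tree:
        if isinstance(item, list):
            if node_matches(item, pattern) or find(item[1:], pattern):
                return True
    return False

def search(taglist, query):
    """query must be a single parenthesized group, e.g. "(hair blue short)"."""
    return find(parse(taglist), parse(query)[0])
```

With this, search("(ayanami_rei (hair short blue)(eyes red)(plugsuit white))", "(hair blue short)") matches regardless of attribute order, with no short_blue_hair tag involved.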
Tag ontologies: I think explicitly defining ontologies based on tag generality or specificity could be very useful. It would provide semantics to a common category of implication, and would allow us to have shorter, less cluttered tag lists. For example, we could define (miniskirt < skirt < bottom < clothing). That way anything tagged "miniskirt" would imply all of the above without having to explicitly state it in the taglist, either via implication or otherwise. We could then use those semantics to, say, query "bottom" to get all instances of miniskirts, pants, bike_shorts, etc. Combined with the above, we could query "(clothing blue)" to get any instance of blue clothing of any type.
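As a minimal sketch of the ontology idea, assuming each tag has at most one more-general parent: the PARENT table below is illustrative only (pants and bike_shorts are placed under "bottom" based on the example query above), and the function names are hypothetical.

```python
# Hypothetical "is a kind of" table: child tag -> more general parent tag.
PARENT = {
    "miniskirt": "skirt",
    "skirt": "bottom",
    "pants": "bottom",
    "bike_shorts": "bottom",
    "bottom": "clothing",
}

def ancestors(tag, parent=PARENT):
    """Return every more-general tag implied by `tag`, nearest first."""
    out = []
    seen = {tag}
    while tag in parent:
        tag = parent[tag]
        if tag in seen:
            raise ValueError("cycle in ontology at %r" % tag)
        seen.add(tag)
        out.append(tag)
    return out

def expand(tags):
    """Add all implied ancestor tags to a post's explicit tags."""
    result = set(tags)
    for tag in tags:
        result.update(ancestors(tag))
    return result

def search(posts, query_tag):
    """posts: dict of post id -> set of explicit tags."""
    return [pid for pid, tags in posts.items() if query_tag in expand(tags)]
```

The taglist itself only stores "miniskirt"; the more general tags are derived at query time (or could be cached), which is what keeps the visible taglist short.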
Expanded tag types: Even if for usability's sake we only color a handful of them, I think categorizing tags might be useful. It could be used to link things in the wiki. Being able to query by the count or existence of tags of a given type might also be useful in various contexts. For example, when automatically tagging exact duplicates, you would often want to exclude tags of a "meta" type, such as resolution size, quality, etc.
Spatial tagging: The ability to associate spatial properties (specifically coordinates) with tags. There are a couple of things I see as being potentially useful with this. One, we could point out the characters tagged (a la Facebook's tagging). This would eliminate the need to use notes for this purpose, which is pretty ugly, and could be very helpful for people identifying characters from uncommon series. Two, it could be used to automatically tag colors, which would be an easy way to automatically increase our semantic information.
This could be done either by a point sample at the coordinates a tag is identified with, or more sophisticated analysis could be done. I don't know if it's something that could be incorporated as-is, but there is an interesting face detecting script I came across a while ago, that automatically detects anime faces and returns their position, hair color, eye color, and skin color. The code is available, and it's something I've been thinking of playing with for a while now.
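To illustrate the simple point-sample variant: the sketch below averages a small window around a tag's coordinates and picks the nearest named color. The palette values, the plain RGB distance, and the list-of-rows image layout are all assumptions made for the example; a real version would read pixels through an image library and probably compare colors in a perceptual color space.

```python
# Hypothetical palette of color names -> representative RGB values.
PALETTE = {
    "red": (220, 40, 40),
    "blue": (60, 90, 220),
    "green": (50, 170, 70),
    "brown": (120, 80, 50),
    "white": (245, 245, 245),
    "black": (20, 20, 20),
}

def nearest_color(rgb):
    """Name of the palette color closest to rgb (squared RGB distance)."""
    def dist(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name]))
    return min(PALETTE, key=dist)

def sample_color(pixels, x, y, radius=1):
    """pixels is a list of rows of (r, g, b) tuples, indexed pixels[y][x].
    Averages a (2 * radius + 1) square around (x, y), clipped to the image."""
    height, width = len(pixels), len(pixels[0])
    window = [
        pixels[j][i]
        for j in range(max(0, y - radius), min(height, y + radius + 1))
        for i in range(max(0, x - radius), min(width, x + radius + 1))
    ]
    if not window:
        raise ValueError("coordinates outside image")
    avg = tuple(sum(p[c] for p in window) // len(window) for c in range(3))
    return nearest_color(avg)
```

So a "hair" tag placed at some coordinates could be turned into a suggested color attribute by calling sample_color at that point.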
Guided semi-automatic tagging: To aid in quickly and easily providing accurate and richly descriptive taglists, the site could automatically provide suggested tags. This could be done by a number of means.
- Cross-referencing synsets of tags scraped or queried from other sites. This could be done with Pixiv uploads (as has been suggested in the past; the Pixiv Translation Plus script already provides a jumping-off point for this). Other *boorus that use tagging schemes similar to ours could easily be queried by IQDB or MD5 hash, and their existing tags suggested.
- Automatically suggesting related tags, especially if the accuracy of this can be improved via tag hierarchies.
- Things such as character could be inferred by hair-color, eye-color, etc, as determined by the above facial detection script or something similar.
- The above methods used in synergy and/or iteratively based on existing input. For example, if a facial detection script detects two characters, one with blue hair and red eyes and another with red hair and blue eyes, and Pixiv provides the tag "エヴァ", and the system knows ayanami_rei is almost always tagged (hair short blue), the system could automatically provide "neon_genesis_evangelion (ayanami_rei (hair short blue)(eyes red))(soryuu_asuka_langley (hair long red)(eyes blue))" as a list of suggested tags.
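The combined case could be sketched roughly like this. Everything here is hypothetical: PROFILES stands in for whatever the site has learned about how characters are usually tagged, SOURCE_TAG_MAP stands in for a translation table from another site's tags, and the face dictionaries stand in for the output of a face detection script.

```python
# Hypothetical: how each character is usually tagged on the site.
PROFILES = {
    "ayanami_rei": {
        "copyright": "neon_genesis_evangelion",
        "hair": ("short", "blue"),
        "eyes": ("red",),
    },
    "soryuu_asuka_langley": {
        "copyright": "neon_genesis_evangelion",
        "hair": ("long", "red"),
        "eyes": ("blue",),
    },
}

# Hypothetical mapping from tags found on the source site to local tags.
SOURCE_TAG_MAP = {"エヴァ": "neon_genesis_evangelion"}

def suggest(faces, source_tags):
    """faces: list of {"hair": color, "eyes": color} from face detection.
    Returns suggested tags only; a human still has to confirm them."""
    copyrights = {SOURCE_TAG_MAP[t] for t in source_tags if t in SOURCE_TAG_MAP}
    suggestions = sorted(copyrights)
    for face in faces:
        for name, prof in sorted(PROFILES.items()):
            if (prof["copyright"] in copyrights
                    and face["hair"] in prof["hair"]
                    and face["eyes"] in prof["eyes"]):
                tag = "(%s (hair %s)(eyes %s))" % (
                    name, " ".join(prof["hair"]), " ".join(prof["eyes"]))
                if tag not in suggestions:
                    suggestions.append(tag)
    return suggestions
```

Restricting candidates to the copyright inferred from the source tag is what keeps "blue hair, red eyes" from matching every such character on the site.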
Of course, as has been mentioned with any sort of automatic tagging in the past, these should be merely suggestions and would need to be manually selected by a human user. But by doing this we could vastly reduce the time and effort it takes to accurately and thoroughly tag posts. This would make it easier to go back through and enrich old posts, as well as encourage new and casual users to better tag what they post.
Semi-automatic duplicate detection: Use IQDB or a clone thereof to automatically search for visual duplicates above a given threshold, optionally comparing resolution, format, filesize, or filesize per area to gauge quality.
This could warn users against uploading potential duplicates, as well as automatically tag the existing duplicate, and provide suggested tags based on the duplicate (maybe excluding things like "Meta" categorized tags).
If a file is detected as matching above a threshold, the user would need to verify this fact and examine any suggestions. If the pic is automatically assessed to be of higher quality, they can then upload without a warning. If it is assessed to be of lower quality, the user would need to override to upload, and it could potentially be flagged for review.
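The decision flow above could look something like the following. The similarity score is assumed to be a number between 0 and 1 coming from IQDB or a similar tool, the 0.9 threshold is arbitrary, and the quality heuristic (larger pixel area wins, then more bytes per pixel) is just one plausible reading of "resolution, filesize, or filesize per area".

```python
from dataclasses import dataclass

@dataclass
class ImageInfo:
    width: int
    height: int
    filesize: int  # in bytes

    @property
    def area(self):
        return self.width * self.height

    @property
    def bytes_per_pixel(self):
        return self.filesize / self.area

def is_higher_quality(new, existing):
    """Assumed heuristic: bigger area wins; ties go to more bytes per pixel."""
    if new.area != existing.area:
        return new.area > existing.area
    return new.bytes_per_pixel > existing.bytes_per_pixel

def upload_decision(similarity, new, existing, threshold=0.9):
    """Return one of "accept", "verify_then_upload", "override_and_flag"."""
    if similarity < threshold:
        return "accept"              # not considered a duplicate
    if is_higher_quality(new, existing):
        return "verify_then_upload"  # user confirms the match, no warning
    return "override_and_flag"       # user must override; flag for review
```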
Clustering based on tags / features / metafeatures: This could be used for sorting, drilling-down, and suggesting similar tags.
Suggested posts: Based on a user's uploads and favorites, and coupled with the above clustering, suggest pics the user might like.
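As a very simple stand-in for real clustering, posts could be ranked by how much their tag set overlaps with any of the user's favorites (Jaccard similarity). This is only a sketch: the function names are made up, and a real system would presumably also weight tags by rarity or use proper clusters.

```python
def jaccard(a, b):
    """Overlap between two tag sets: |a & b| / |a | b|, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def suggest_posts(favorites, candidates, top_n=3):
    """favorites: list of tag sets from the user's favorites / uploads.
    candidates: dict of post id -> tag set.
    Returns up to top_n candidate ids, best match first."""
    scores = {
        pid: max((jaccard(tags, fav) for fav in favorites), default=0.0)
        for pid, tags in candidates.items()
    }
    ranked = sorted(scores, key=lambda pid: (-scores[pid], pid))
    return [pid for pid in ranked if scores[pid] > 0][:top_n]
```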
Automatic thumbnails for flash / video
Built-in auto-suggestion / color coding of tags a la Danbooruup: This would mitigate the need to update the extension. It's pretty much the only feature of it I use anymore.
Checking new tags for spelling errors / alternative variants before creating them (and verifying the correction with the user before continuing): I think this is a large contributor to the huge number of largely useless 1-use tags we have.
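A minimal sketch of the spelling check using Python's standard difflib module. The 0.8 cutoff and the function name are arbitrary choices for the example; the existing tag list would come from the site's tag table.

```python
import difflib

def check_new_tag(tag, existing_tags, cutoff=0.8):
    """Return existing tags that look like what the user meant.
    An empty list means the tag already exists or nothing is close,
    so there is nothing to ask the user about."""
    if tag in existing_tags:
        return []
    return difflib.get_close_matches(tag, existing_tags, n=3, cutoff=cutoff)
```

For example, entering "thighighs" when "thighhighs" already exists would return the existing tag as a suggestion, and the user would be asked to confirm before a new 1-use tag gets created.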
Embedding meta-information (taglists) into images: I'm not sure if I'm fully with this one, but it was mentioned above, and it's something I've thought about in depth before.
I think it would be useful to the end-user who downloads the pics, and if it catches on elsewhere could be another means to help auto-tag here or be used personally as things drift across the Internet, but I'm wary of changing the hash of the file.
Perhaps this could be done optionally on-demand, with the hash being computed only after meta-data has been stripped?
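For PNG files at least, hashing with the metadata stripped could be sketched as below. PNG stores text in its own chunks (tEXt, zTXt, iTXt), so the hash can simply skip them. The "Tags" keyword and the function names are my own inventions for the example; other formats (JPEG, GIF) would need their own handling, and this hash would not equal the plain MD5 of the original file.

```python
import hashlib
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TEXT_CHUNKS = {b"tEXt", b"zTXt", b"iTXt"}

def make_chunk(ctype, body):
    """Build one PNG chunk: length, type, body, CRC over type + body."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)

def iter_chunks(data):
    """Yield (type, body) for every chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC

def embed_tags(data, taglist):
    """Insert a tEXt chunk (hypothetical keyword "Tags") right after IHDR."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    body = b"Tags\x00" + taglist.encode("latin-1")
    split = 8 + 12 + 13  # signature + IHDR chunk (13-byte body)
    return data[:split] + make_chunk(b"tEXt", body) + data[split:]

def content_hash(data):
    """MD5 over all chunks except the text metadata chunks."""
    digest = hashlib.md5()
    for ctype, body in iter_chunks(data):
        if ctype in TEXT_CHUNKS:
            continue
        digest.update(ctype)
        digest.update(body)
    return digest.hexdigest()
```

The plain MD5 of the file changes once tags are embedded, but content_hash stays the same, so duplicate lookups keyed on it would still work.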
Normalizing use of DTEXT / HTML: We should use the same syntax everywhere, including translation notes (which currently use only HTML). I'd argue against disallowing style / formatting tags in notes though.
Better documentation throughout, with the ability for privileged users to edit it. Perhaps rankings could be set such that only mods can change some pages (like rules), whereas privileged users can edit "how-to" lists, etc.
Richer API such that almost any feature can be used by a 3rd party application. Updated documentation on the API.
Better documented & streamlined installation of Danbooru on multiple platforms. The ability to host in a subdirectory or on a shared server.
A polling system to automatically count yays/nays in the forum. This seems to be the most common form a thread takes here and could be useful both for whoever ends up deciding / enforcing the issue as well as for people who have an opinion, but no real commentary.
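Short of a full polling feature, even a crude tally over a thread's posts would cover the common case. The sketch below assumes a convention that does not exist today: a post counts as a vote if it starts with "+1", "-1", "yay", or "nay", and only a user's latest vote counts.

```python
import re

# Hypothetical convention: a vote is a post that starts with one of these.
VOTE_RE = re.compile(r"^\s*(\+1|-1|yay|nay)\b", re.IGNORECASE)

def tally(posts):
    """posts: list of (username, text) in thread order.
    Returns (yays, nays), counting only each user's latest vote."""
    votes = {}
    for user, text in posts:
        match = VOTE_RE.match(text)
        if match:
            votes[user] = match.group(1).lower() in ("+1", "yay")
    yays = sum(1 for v in votes.values() if v)
    return yays, len(votes) - yays
```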
----
Ok, this sort of exploded as I got started. It's already too big, and I could probably keep going with "blue sky" ideas. This post probably needs a page of its own.
Obviously many of these are sort of out-there, and many are probably infeasible for efficiency's sake, but I think they are all ideas that could be very useful and are worth exploring, or at least considering.
Most of them are ideas I've been thinking about for some time, though I've never really done any sort of work on them or looked into particular implementations.