Donmai

About Tumblr Sourcing

Posted under General

Can we please stop using direct image links as sources for images uploaded to tumblr?
The problem is that, if you link directly to the image, it gives no indication of who the artist is or where they are located (other than tumblr.com obviously).
This would not be a problem if people would make proper pages for artists that linked to their websites.
For example: there is an artist at the tag "kia_(tumblr)". All of the images posted for them link to the image directly, and the artist page for this person links to one of the images uploaded here. As such, it's very hard to find the actual artist. Searching tumblr for related key terms ("kia", "kia artist", "kia [character name]") is unlikely to turn up the original source because, as you can guess, "Kia" is a very popular term and hard to search through.
Now, the reason this is a huge issue is because if an artist isn't properly sourced, how will interested parties find them? This could be as simple as a fan but what if someone wanted to purchase art from these artists? They have absolutely no way of getting to them and contacting them. That's lost business. Some artists rely on commissions as their only source of income.
Sure this is just an image board and who cares, right? Who actually cares about the artist as long as we all get to see a cute anime girl, right? All I'm saying is if you're going to say you like someone's work enough to post it on this site, say you like it enough to share this artist with other people.
Images on tumblr.com are posted in a single-post format (much like pixiv), so we could very easily source/link back to the post rather than the image itself. This would fix both issues: images not sourcing to the artist, and people being unable to find artists at all.

To be honest, I thought the site auto-redirected to the artist page.

But anyone can find who an artist is by searching the image through google and finding the original page or retweets/reblogs. It's far from an impossible task; it takes less than a minute to do.

Not only that, if the artist is tagged properly on Danbooru, and they generally will be, one could just go to their wiki page to find their tumblr or twitter or blog, etc.

DoubleT said:

Can we please stop using direct image links as sources for images uploaded to tumblr?

It has already been discussed and agreed that this should be done, and Danbooru even tries to make it as easy as possible by doing it automatically for uploaders that use the official bookmarklet. But some uploaders neither use the bookmarklet nor manually fix the source, so you wind up with posts that still use the direct image url even though they shouldn't.

CodeKyuubi said:

To be honest, I thought the site auto-redirected to the artist page.

The auto redirect you're talking about can only be done for the following sites, listed on howto:upload:

  • Pixiv
  • Nico Seiga
  • Twitpic
  • Twipple
  • Karabako
  • deviantArt
  • Hentai-Foundry

The auto redirect is impossible for other sites. So instead the bookmarklet automatically changes the source for twitter, tumblr, nijie to point to the work page after upload by looking at the referer. But that only works for uploaders that use the bookmarklet.
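
To illustrate the idea, here is a minimal Python sketch of that decision. It is not Danbooru's actual code: the function name `choose_source` and both regular expressions are assumptions made up for this example, and it only covers the tumblr case.

```python
import re
from typing import Optional

# Hypothetical patterns, for illustration only; not Danbooru's actual rules.
DIRECT_IMAGE_RE = re.compile(r'^https?://[^/]+\.media\.tumblr\.com/')
WORK_PAGE_RE = re.compile(r'^https?://[^/]+/(?:post|image)/\d+')

def choose_source(image_url: str, referer: Optional[str]) -> str:
    """Prefer the referring work page over a direct tumblr image URL.

    Falls back to the image URL when the referer is missing, blank,
    or does not look like a work page.
    """
    if (referer
            and DIRECT_IMAGE_RE.match(image_url)
            and WORK_PAGE_RE.match(referer)):
        return referer
    return image_url
```

With a blank referer there is nothing to fall back on, so a sketch like this would leave the direct image URL in place.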

Toks said:
The auto redirect is impossible for other sites. So instead the bookmarklet automatically changes the source for twitter, tumblr, nijie to point to the work page after upload by looking at the referer. But that only works for uploaders that use the bookmarklet.

Ah, well there's the problem. About two years ago the bookmarklet stopped working for me, so I stopped using it. I do everything by hand now.

CodeKyuubi said:

Ah, well there's the problem. About two years ago the bookmarklet stopped working for me, so I stopped using it. I do everything by hand now.

Are you sure it's the official bookmarklet that stopped working for you? It might be the really old unofficial one which is now obsolete as the official one does way more than the unofficial one.

If the official bookmarklet really is broken for you, please explain how it is broken so I can fix it.

FWIW, here's a script to fix tumblr sources (like the one for Twitter at topic #11602):

Script
#!/usr/bin/env python3

import requests
import re
import json
import itertools

# The path part is optional so that URLs without a trailing slash also match.
TUMBLR_URL_RE = r'https?://([^.]+\.tumblr\.com)(?:/.*)?$'
IMAGE_URL_RE = r'https?://[^.]+\.media\.tumblr\.com/(.*)'

def danbooru_api_whatever(method, thing, *args, **kw):
    response = method('https://danbooru.donmai.us/{}.json'.format(thing), *args, **kw)
    response.raise_for_status()
    return response.json()

def danbooru_api_get(thing, params=None):
    p = dict(auth_data['danbooru'])
    if params:
        p.update(params)
    return danbooru_api_whatever(requests.get, thing, params=p)

def danbooru_api_put(thing, data):
    body = dict(auth_data['danbooru'])
    body.update(data)
    return danbooru_api_whatever(requests.put, thing, body)

def get_posts(*tags):
    # An iterator over all posts matching tags, doing pages as
    # necessary. Fetches pages lazily. Makes no attempt to remove
    # duplicates in case a post is added while working.
    params = {
        'tags': ' '.join(tags),
        'page': 1
    }
    for page in iter(lambda: danbooru_api_get('posts', params), []):
        for post in page:
            yield post
        params['page'] += 1

def all_photo_posts(tumblr_domain):
    # Iterator over all photo posts from a given tumblr blog. Fetches
    # posts lazily as necessary. Yields (image_url, post_url) pairs:
    # one pair for every available size of every photo, each paired
    # with the URL of the tumblr post the photo comes from.
    url = 'https://api.tumblr.com/v2/blog/{}/posts'.format(tumblr_domain)
    params = {
        'type': 'photo',
        'offset': 0,
        'api_key': auth_data['tumblr']['api_key']
    }
    posts = True
    while posts:
        response = requests.get(url, params=params)
        response.raise_for_status()
        posts = response.json()['response']['posts']
        for post in posts:
            for photo in post['photos']:
                if 'original_size' in photo:
                    yield (photo['original_size']['url'], post['post_url'])
                for size in photo['alt_sizes']:
                    yield (size['url'], post['post_url'])
        params['offset'] += len(posts)

def main(argv):

    with open('auth.json', 'r') as f:
        globals()['auth_data'] = json.load(f)
    if not ('danbooru' in auth_data and 'tumblr' in auth_data):
        raise RuntimeError('auth stuff not provided')

    if len(argv) < 2:
        return 'usage: {} artist_name [tumblr_url ...]'.format(argv[0])

    artist_url_match = re.match(r'https?://danbooru\.donmai\.us/artists/([^/]+)', argv[1])
    print('Looking up artist...')
    if artist_url_match:
        artist = danbooru_api_get('artists/' + artist_url_match.group(1))
    else:
        artist_name = argv[1].replace(' ', '_')
        for artist in danbooru_api_get('artists', {'search[name]': 'name:' + artist_name}):
            # Look for an exact match
            if artist['name'] == artist_name:
                break
        else: # No break means nothing matched
            return 'No such artist: {!r}'.format(artist_name)

    tumblr_blogs = [
        # From command line (wow this is a mess...)
        match.group(1) if match else re.match(r'(?:https?://)?(.*)',arg).group(1) for match, arg in (
            (re.match(TUMBLR_URL_RE, arg), arg)
            for arg in argv[2:]
        )
    ] or [
        # From artist entry on danbooru
        match.group(1) for match in (
            re.match(TUMBLR_URL_RE, url['normalized_url'])
            for url in artist['urls']
        ) if match
    ]
    if not tumblr_blogs:
        return 'No tumblr blog(s) found.'

    posts_needing_update = {
        re.match(IMAGE_URL_RE, post['source']).group(1): post['id']
        for post in itertools.chain(
            get_posts(artist['name'], 'source:https://*.media.tumblr.com/'),
            get_posts(artist['name'], 'source:http://*.media.tumblr.com/')
        )
        # Notes from the Twitter version of this script (presumably why
        # two separate searches are chained here):
        # ~source:https://pbs.twimg.com/ ~source:http://pbs.twimg.com/ doesn't work
        # source:http*://pbs.twimg.com/ works but is technically wrong and might be harder on the database (?)
    }
    if posts_needing_update:
        print('Found {} posts needing update.'.format(len(posts_needing_update)))
    else:
        print('No posts with tumblr media sources found.')
        return

    print('Using tumblr blog(s):', ', '.join(tumblr_blogs))
    for blog in tumblr_blogs:
        for image_url, tumblr_post_url in all_photo_posts(blog):
            image_url = re.match(IMAGE_URL_RE, image_url).group(1)
            try:
                post_id = posts_needing_update.pop(image_url)
            except KeyError:
                continue
            print('Post #{} -> {}'.format(post_id, tumblr_post_url))
            danbooru_api_put('posts/{}'.format(post_id), {
                'post[source]': tumblr_post_url
            })
            if not posts_needing_update:
                print('All sources fixed.')
                return
    return 'Could not find sources for {} post(s): {}'.format(
        len(posts_needing_update),
        ', '.join(map(str, sorted(posts_needing_update.values(), reverse=True)))
    )

if __name__ == '__main__':
    import sys
    sys.exit(main(sys.argv))

It works pretty much the same way as the Twitter one, and I'm feeling pretty lazy right now, so see the other forum thread for instructions. This one's only dependency is Requests.
Note that if you just specify an artist, it'll only find the blog automatically if it's *.tumblr.com. It does work on custom domains, but you have to specify the URL manually in that case.
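
In case the matching step is hard to pick out of the script, here is a stripped-down sketch of it with no network calls. `IMAGE_URL_RE` is the same pattern as in the script; `image_key`, `match_sources` and the sample data are made up for this illustration.

```python
import re

IMAGE_URL_RE = r'https?://[^.]+\.media\.tumblr\.com/(.*)'

def image_key(url):
    # Drop the scheme and the numbered media host so that the same file
    # served from different hosts (or over http/https) gets the same key.
    match = re.match(IMAGE_URL_RE, url)
    return match.group(1) if match else None

def match_sources(danbooru_posts, tumblr_photos):
    # danbooru_posts: {post_id: current_source}
    # tumblr_photos: iterable of (image_url, tumblr_post_url) pairs
    pending = {}
    for post_id, source in danbooru_posts.items():
        key = image_key(source)
        if key is not None:
            pending[key] = post_id
    fixed = {}
    for image_url, post_url in tumblr_photos:
        post_id = pending.pop(image_key(image_url), None)
        if post_id is not None:
            fixed[post_id] = post_url
    # Whatever is left in pending could not be matched to a tumblr post.
    return fixed, sorted(pending.values(), reverse=True)

# Made-up sample data, not real posts.
posts = {
    1: 'http://40.media.tumblr.com/aaa/tumblr_x_1280.jpg',
    2: 'https://41.media.tumblr.com/bbb/tumblr_y_1280.jpg',
}
photos = [
    ('https://36.media.tumblr.com/aaa/tumblr_x_1280.jpg',
     'http://example.tumblr.com/post/1'),
]
fixed, missing = match_sources(posts, photos)
```

Because the key is everything after the media host, a Danbooru source only matches if tumblr still lists a photo with exactly the same path.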

auth.json template
{
    "danbooru": {
        "login": "your danbooru username goes here",
        "api_key": "your danbooru api key goes here"
    },
    "tumblr": {
        "api_key": "your tumblr api key goes here"
    }
}

I ran it on eight_tohyama and hetza_(hellshock):

Logs
Looking up artist...
Found 4 posts needing update.
Using tumblr blog(s): 08base.tumblr.com
Post #2045133 -> http://08b.tokyo/post/121994810410/4-18
Post #1792405 -> http://08b.tokyo/post/97222104925
Post #1661515 -> http://08b.tokyo/post/80070216005
Post #1661520 -> http://08b.tokyo/post/73991411960
All sources fixed.

Looking up artist...
Found 59 posts needing update.
Using tumblr blog(s): hetza5721.tumblr.com
Post #2016277 -> http://hetza5721.tumblr.com/post/119352127893/m
Post #1956100 -> http://hetza5721.tumblr.com/post/113779403373
Post #1948702 -> http://hetza5721.tumblr.com/post/112697346018
Post #1922144 -> http://hetza5721.tumblr.com/post/110442958788/noob-medic
Post #1948704 -> http://hetza5721.tumblr.com/post/109397708203/black-widow-spec-ops
Post #1888703 -> http://hetza5721.tumblr.com/post/106710704838
Post #1888708 -> http://hetza5721.tumblr.com/post/106639908518
Post #1884130 -> http://hetza5721.tumblr.com/post/106229103428
Post #1877969 -> http://hetza5721.tumblr.com/post/105680788848
Post #1869812 -> http://hetza5721.tumblr.com/post/104918187808/simply-tenryu
Post #1867809 -> http://hetza5721.tumblr.com/post/104653194283/simply-shiranui
Post #1862725 -> http://hetza5721.tumblr.com/post/104219606578
Post #1848070 -> http://hetza5721.tumblr.com/post/102800137508
Post #1840550 -> http://hetza5721.tumblr.com/post/101997764133/standby
Post #1833219 -> http://hetza5721.tumblr.com/post/101423358433/2p-ver
Post #1833218 -> http://hetza5721.tumblr.com/post/101422976258
Post #1833217 -> http://hetza5721.tumblr.com/post/101422520298/npc
Post #1821322 -> http://hetza5721.tumblr.com/post/100234262643/shiranui-dazzle-camo
Post #1806137 -> http://hetza5721.tumblr.com/post/98534571373
Post #1799373 -> http://hetza5721.tumblr.com/post/97900139003
Post #1796665 -> http://hetza5721.tumblr.com/post/97639118353
Post #1806136 -> http://hetza5721.tumblr.com/post/97555972663/heavy-assault
Post #1792281 -> http://hetza5721.tumblr.com/post/97194050698
Post #1781530 -> http://hetza5721.tumblr.com/post/96265035768
Post #1736614 -> http://hetza5721.tumblr.com/post/91854411313
Post #1728114 -> http://hetza5721.tumblr.com/post/90850448453
Post #1716508 -> http://hetza5721.tumblr.com/post/89424238003
Post #1724199 -> http://hetza5721.tumblr.com/post/89027271793
Post #1693526 -> http://hetza5721.tumblr.com/post/86612507783
Post #1724211 -> http://hetza5721.tumblr.com/post/85914829708
Post #1679251 -> http://hetza5721.tumblr.com/post/82898058685
Post #1639010 -> http://hetza5721.tumblr.com/post/79454230582
Post #1675288 -> http://hetza5721.tumblr.com/post/74830680162
Post #1675289 -> http://hetza5721.tumblr.com/post/74810169034
Post #1675286 -> http://hetza5721.tumblr.com/post/74712208010
Post #1675287 -> http://hetza5721.tumblr.com/post/74712197246
Post #1599067 -> http://hetza5721.tumblr.com/post/74228954280
Post #1572562 -> http://hetza5721.tumblr.com/post/69370823627
Could not find sources for 21 post(s): 2039716, 1927449, 1919585, 1907322, 1899640, 1878753, 1857329, 1813291, 1779003, 1768156, 1725657, 1718137, 1689575, 1679249, 1670667, 1648076, 1630691, 1616461, 1609066, 1589992, 1573261

I checked the first one that wasn't found, and it looks like it's because it was revised. Maybe the others are similar, or maybe Tumblr has some limit on total posts like Twitter does.

Unfortunately, this script can't do anything without knowing the artist's tumblr, so it won't help with kia_(tumblr) until someone finds their blog. But for artists where the blog is known, this can help when individual uploaders forget to fix sources.


CodeKyuubi said:

But anyone can find who an artist is by searching the image through google and finding the original page or retweets/reblogs. It's far from an impossible task; it takes less than a minute to do.

While google reverse image search is a useful tool, it's neither as easy nor as consistent as you're making it out to be. Of the three images under kia_(tumblr), two returned no useful results. The third one (post #2057570) yielded this page, which looked like it might contain some information (despite still not being the artist's page), but the page refuses to load for some reason. So I loaded google's cached version of the page, which finally told me that the artist's page is http://rnarch-hare.tumblr.com/.

Even though it worked out in this case, it could easily have failed for a different artist; the first two images I tried didn't return useful results, after all. And even when it does work, it would be very inconvenient for every user who wants to see the artist's commentary or know who the artist is to go through this whole process. It's much simpler to just include the work page as the source so they can click it.

In case anyone is still having trouble getting the Post to Danbooru bookmarklet to work for Tumblr:

Tumblr uploading guide

For single-image Tumblr posts
  • Click the image to open *ARTIST*.tumblr.com/image/xxxxx
  • Right-click -> Open image in new tab
  • Click your "Post to Danbooru" bookmarklet
For multi-image Tumblr posts
  • Click on the Tumblr post page *ARTIST*.tumblr.com/post/xxxxx
  • Click on the image to view it in Tumblr's gallery viewer (you can scroll left-right)
  • Right-click -> Open image in new tab
  • Click your "Post to Danbooru" bookmarklet

Both of these methods work for me 90% of the time.

Toks said:

Are you having an issue with the bookmarklet the other 10% of the time?

I'm not sure why, but yes; sometimes the artist finder / source redirect doesn't work & I have to do it manually.

Can't really seem to figure out when/why it's happening. I'll try to keep a log of images/browsing habits.

Kikimaru said:

I'm not sure why, but yes; sometimes the artist finder / source redirect doesn't work & I have to do it manually.

Can't really seem to figure out when/why it's happening. I'll try to keep a log of images/browsing habits.

I suspect that the referer being set to something incorrect (or blank) is the cause. I'm not sure why though.

The next time it happens, hit the back button in your browser after you upload so that you return to the upload page, then look in the address bar for the &ref= parameter. If you tell me what that &ref= parameter is set to, that might help narrow it down.

For reference, the correct thing it's supposed to be set to is either *ARTIST*.tumblr.com/post/xxxxx or *ARTIST*.tumblr.com/image/xxxxx as you pointed out earlier.
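
If reading the address bar by eye is awkward, a few lines of Python can decode the upload page URL. This is only a convenience sketch: `upload_params` is a made-up helper, and the sample URL is invented rather than taken from a real upload.

```python
from urllib.parse import urlsplit, parse_qs

def upload_params(upload_url):
    # keep_blank_values=True so that an empty "&ref=" shows up as ''
    # instead of being dropped from the result.
    query = parse_qs(urlsplit(upload_url).query, keep_blank_values=True)
    return {
        'url': query.get('url', [''])[0],
        'ref': query.get('ref', [''])[0],
    }

# Invented example of an upload page URL with a blank referer.
sample = ('http://danbooru.donmai.us/uploads/new'
          '?url=http%3A%2F%2F40.media.tumblr.com%2Fabc%2Ftumblr_x_1280.jpg'
          '&ref=')
params = upload_params(sample)
print(params['url'])  # http://40.media.tumblr.com/abc/tumblr_x_1280.jpg
print(repr(params['ref']))  # ''
```

An empty 'ref' here would mean the bookmarklet had no work page to fall back on.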

While we are at it, would it be possible to add a second source field on the upload page for those who prefer to use the standard upload method or wish to upload something from a source not supported by the bookmarklet?

Something along the lines of:

Source Image: http://example.com/data/img/image1.jpg
Source Location: http://example.com/character/person1

It might stimulate people to add the proper source and would save the hassle of uploading the image and manually editing the source afterwards.

post #2062056

http://danbooru.donmai.us/uploads/new?url=http%3A%2F%2F40.media.tumblr.com%2F3e16fc32476419ff91a866a45fb4a322%2Ftumblr_inline_nqre8nZfRf1ryxnbm_1280.jpg&ref=

source:

http://noise-tanker.tumblr.com/post/122879315617/your-cube-pics-are-really-nice-why-not-drawing

post #2062064

http://danbooru.donmai.us/uploads/new?url=http%3A%2F%2F40.media.tumblr.com%2F62aab1c1eeb196ba7bf9371649e55c51%2Ftumblr_inline_nqx6ilf3mL1ryxnbm_1280.jpg&ref=

source:

http://noise-tanker.tumblr.com/post/123141761985/is-there-any-chance-of-a-pic-of-older-sunny

---------------

It's not just because the image is inline; post #2062060 sourced fine:

http://danbooru.donmai.us/uploads/new?url=http%3A%2F%2F40.media.tumblr.com%2F902beaac6f1617c3e3881adee16abd65%2Ftumblr_inline_nqs1one58H1ryxnbm_1280.jpg&ref=http%3A%2F%2Fnoise-tanker.tumblr.com%2Fpost%2F122968847148%2Fim-really-not-familiar-with-all-the-newer-anime

Kikimaru said:

For the two that failed the referer is blank. It must be because of the way you got to the image. If you had done right click -> Open image in new tab then the referrer wouldn't be blank (unless there's some rare weird browser bug).

Did you do any of the following?:

  • Right click -> Copy image URL -> paste URL into address bar and hit enter
  • Click and drag image up to your address bar
  • "Open image in new tab" as normal, but then you manually modify the url in some way and hit enter (e.g. changing the _500.jpg at the end to _1280.jpg manually rather than letting Danbooru do it)

There will be no referrer if you do any of those.

Toks said:

For the two that failed the referer is blank. It must be because of the way you got to the image. If you had done right click -> Open image in new tab then the referrer wouldn't be blank (unless there's some rare weird browser bug).

Did you do any of the following?:

  • Right click -> Copy image URL -> paste URL into address bar and hit enter
  • Click and drag image up to your address bar
  • "Open image in new tab" as normal, but then you manually modify the url in some way and hit enter (e.g. changing the _500.jpg at the end to _1280.jpg manually rather than letting Danbooru do it)

There will be no referrer if you do any of those.

Ah, I think that's it - I have a Chrome extension called "Tumblr Load HiRes Images".

Disabled it and sourcing seems to work perfectly.

A neutral record is just a way of notifying someone of a thing and making sure others know they have been notified of that thing. The bookmarklet isn't currently a rule, but users who are going to regularly upload from tumblr or twitter need to be encouraged to use it.
