u3116254 – Exploring Digital Heritage 2018

Finding the Faces of the Tribune

The final stage of the Machine Tagging and Computer Vision group’s investigations looked at the potential of applying basic facial detection technology (in this case, using OpenCV) to the Tribune collection in an effort to find alternative ways for users to discover and navigate this extensive resource. While our group was not responsible for the technical aspects of this experiment, we were involved in interpreting the resulting data and attempting to identify where there might be advantages to utilising this approach.

While we initially considered the possibilities of this option to be limited, we soon discovered that even simply finding faces in the Tribune collection could prove extremely valuable. This approach had the potential not only to distinguish the photographs with people from those without, but also to reveal a bit about the nature of a particular image through the number of faces it contains.

Although not guaranteed to pick up on every single face, and with varying accuracy across the many different images, the ability of facial detection to return an approximate number could prove useful in determining the likelihood of a photograph being of an individual or a crowd. Such information could be very useful in this particular case, and provide another path for people searching the Tribune images. The option to only show those with, for instance, more than 10 or 20 faces could direct users to photographs of protests or meetings – and there may well be other ways of using the resulting data to assist people through the collection.
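As a rough illustration of how a face count might be produced and then turned into a search filter, here is a minimal sketch in Python. It assumes the frontal-face Haar cascade bundled with OpenCV; we only know that OpenCV was used, so the choice of detector, the function names and the category labels are all assumptions made for illustration. The thresholds of 10 and 20 are borrowed from the example above.

```python
def count_faces(image_path):
    """Return the number of faces OpenCV reports for one image file."""
    # Imported here so the helpers below can be used without OpenCV installed.
    import cv2

    # Assumption: the frontal-face Haar cascade that ships with opencv-python.
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Each detection is an (x, y, width, height) box; only the count is needed.
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(boxes)


def describe_by_face_count(face_count):
    """Map a face count to a hypothetical browsing category."""
    if face_count == 0:
        return "no faces detected"
    if face_count == 1:
        return "individual"
    if face_count <= 10:
        return "small group"
    if face_count <= 20:
        return "large group"
    return "crowd"


def filter_by_min_faces(face_counts, minimum):
    """Keep the image ids whose reported face count exceeds `minimum`."""
    return [image_id for image_id, n in face_counts.items() if n > minimum]
```

Calling filter_by_min_faces with a dictionary of image ids and face counts and a minimum of 20 would return only the likely crowd photographs. Since the counts are approximate, the categories should be read as hints rather than facts.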

Exploring the data generated from this experiment gives us the opportunity to see an overview of faces in the collection, revealing how many images in total were reported to contain faces, as well as the average number of faces per image and a chart demonstrating face frequency across the 60,000 Tribune photographs. The overview also lists those images with a face count that exceeds 100, making it possible to zoom in on them to see exactly what they’re about.
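The kind of overview described here can be sketched in a few lines of Python. This is not the code from the experiment; the function name, the dictionary layout and the sample values are invented for illustration, and the average is taken over all images, which is only one possible reading of "average number of faces per image".

```python
from collections import Counter
from statistics import mean


def summarise_face_counts(face_counts, large_threshold=100):
    """Summarise a mapping of image id -> number of faces detected."""
    counts = list(face_counts.values())
    return {
        "images_total": len(counts),
        "images_with_faces": sum(1 for n in counts if n > 0),
        "total_faces": sum(counts),
        # Assumption: the average is taken over every image, faces or not.
        "mean_faces_per_image": mean(counts) if counts else 0,
        # How many images have 0 faces, 1 face, 2 faces, and so on.
        "frequency": dict(sorted(Counter(counts).items())),
        "over_threshold": sorted(
            image_id for image_id, n in face_counts.items() if n > large_threshold
        ),
    }


# Invented sample data, not real Tribune results.
sample = {"img-001": 0, "img-002": 1, "img-003": 1, "img-004": 4, "img-005": 124}
summary = summarise_face_counts(sample)
```

The "frequency" entry is the data behind a face frequency chart, and "over_threshold" is the list of images to zoom in on.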

Of course, it’s important to consider that the ‘face’ being identified may not always be a face…

In this case, a group portrait of 3 individuals supposedly contains 4 faces, with the computer picking up on something that merely resembles a face in a pattern (something we noticed happening quite frequently). False positives such as these are inevitable and can certainly be filtered out in order to ensure more accurate results, although it’s actually quite interesting to see what mistakes are being made, and consider how the facial detection technology is making its decisions.

In all, 230,677 ‘faces’ were detected, and this experiment demonstrated a few creative ways of using them – by extracting them from their context and making them the focus. In one case, the cropped versions of each identified face became part of a single, enormous image – which can be viewed in detail here.
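To give a sense of scale, here is a small sketch of how cropped faces might be laid out in one image. We don't know how the actual image was assembled, so the roughly square grid, the fixed tile size and the function names are all assumptions.

```python
import math


def grid_shape(face_total):
    """Return (columns, rows) for a roughly square grid holding every face."""
    if face_total <= 0:
        return (0, 0)
    columns = math.isqrt(face_total - 1) + 1  # ceiling of the square root
    rows = math.ceil(face_total / columns)
    return (columns, rows)


def tile_position(index, columns, tile_size):
    """Return the (x, y) pixel offset of the face at `index`, filling row by row."""
    row, column = divmod(index, columns)
    return (column * tile_size, row * tile_size)


columns, rows = grid_shape(230677)
```

With 230,677 faces this gives a grid of 481 columns by 480 rows, so even at a hypothetical 32 pixels per face the combined image would be more than 15,000 pixels on each side.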

Another approach involves randomly selecting a photograph from the Tribune collection which has been determined to contain more than 20 faces and, from it, generating a version that shows the detected faces in isolation. It’s then capable of demonstrating a transition between the two images.

Faces have the power to really draw people in – having initially isolated them from their context, it’s then possible to add the context back in order to see what’s actually going on in a photograph. From what we’ve seen, these approaches could prove to be an extremely effective and engaging way of presenting the Tribune collection.

As noted with our previous experiment, it’s possible to see the potential of basic facial detection even in its unpolished, preliminary form. Even such a simple process as this may provide opportunities for extracting useful metadata, and so could prove an effective means of enriching the collection, as well as opening it up for people to find and experience this resource based solely on the properties of the images themselves.

(If you’re interested, the facial detection notebook for this experiment can be found and tested out here).

Training a Classification Model for the Tribune Collection

While the initial testing phase undertaken by the Machine Tagging and Computer Vision group (outlined in our previous posts) provided some useful insights into what these systems are capable of, it really only affirmed our perspective that, in order to gain applicable results with a certain level of consistency, it would be necessary to train a custom machine learning model specifically on the Tribune photographs. Although available, this option wasn’t open to us through the services we’d tested. However, we did get the opportunity to see how it might work through the use of TensorFlow – an open source machine learning framework.

We were only able to get through the preliminary stages of this experiment, and required a great deal of assistance – the ‘TensorFlow for Poets’ example was followed and the group compiled a very small training set to see how the model might distinguish between images of protests and portraits. Once it was trained, we were able to see how the classifier model responded to photographs drawn at random from the Tribune collection. Despite the limited training set, the results seemed surprisingly accurate in many cases…

However, there were still some incorrect interpretations – even when the images might seem, to us, quite straightforward…

Obviously, given the limited scope of the training set, there would be instances where the photograph being fed through hadn’t been accounted for in the categories we’d chosen.
Although it is neither, in this case the photograph is considered more likely to be an image of a protest than a portrait, and receives a fairly high confidence score as well. Such instances give us the chance to speculate about what is actually being ‘seen’ and how things might actually be working. What elements are being used to distinguish a protest photograph from a portrait? The number or absence of people, perhaps? Or certain lines or shapes? There are no doubt many contributing factors, and the outcome is of course entirely dependent on the quality and scope of the training set being used.
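One likely reason a photograph that is neither a protest nor a portrait can still receive a high confidence score is the way such scores are usually calculated. The sketch below assumes the classifier ends in a softmax step, as retrained image classifiers of this kind commonly do; the raw scores are invented and were not taken from our model.

```python
import math


def softmax(scores):
    """Turn raw class scores into confidences that always add up to 1."""
    highest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["protest", "portrait"]
# Invented raw scores for a photograph that belongs to neither category.
raw_scores = [2.0, 0.0]
confidences = dict(zip(labels, softmax(raw_scores)))
```

Because the two confidences must sum to 1, there is no way for the model to say "neither". A photograph that looks only slightly more like the protest examples still ends up with most of the confidence (about 0.88 for "protest" here).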

Simply being able to assign all the photographs into pre-defined categories would of course be valuable, and we could see many advantages of relying on computer vision for this particular task, although it may be possible to take it even further.

One intriguing possibility we discussed is that image recognition might be capable of identifying the signs specific to protest photographs (I’m not confident that’s the case for the above example, but it’s possible…). If so, it would present the opportunity to find and focus in on the protest signs of the Tribune, bringing them all together to provide users with a more specific point from which to search the collection. Additionally, there may be other approaches to organising the collection that image recognition could assist with, such as by quality or dimensions – there are many possibilities that could be considered and explored further.

The last step of this experiment was to run the classification model across the entire collection of 60,000 Tribune photographs in order to determine its overall accuracy, and whether the confidence scores provided could be used in refining the results. This intensive process wasn’t completed before the project wrapped up, so unfortunately we can’t be certain of the full picture, but from what we’ve seen there’s clearly promise here.
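Had the full run finished, one way to check accuracy and to test whether confidence scores help would be to compare predictions against a hand-labelled sample. The following is a minimal sketch of that idea; the data layout, the function names, the 0.9 cut-off and every value in it are invented for illustration.

```python
def accuracy(predictions, truth):
    """Share of hand-labelled images whose predicted label is correct."""
    checked = [image_id for image_id in predictions if image_id in truth]
    if not checked:
        return None
    correct = sum(1 for i in checked if predictions[i][0] == truth[i])
    return correct / len(checked)


def keep_confident(predictions, minimum_confidence):
    """Drop predictions whose confidence falls below the cut-off."""
    return {
        image_id: (label, confidence)
        for image_id, (label, confidence) in predictions.items()
        if confidence >= minimum_confidence
    }


# Invented predictions: image id -> (label, confidence).
predictions = {
    "img-001": ("protest", 0.97),
    "img-002": ("portrait", 0.91),
    "img-003": ("protest", 0.62),
    "img-004": ("portrait", 0.55),
}
# Invented hand labels for the same images.
truth = {
    "img-001": "protest",
    "img-002": "portrait",
    "img-003": "portrait",
    "img-004": "portrait",
}

overall = accuracy(predictions, truth)
confident_only = accuracy(keep_confident(predictions, 0.9), truth)
```

In this made-up sample, accuracy rises from 0.75 to 1.0 once low-confidence predictions are set aside, at the cost of leaving half the images unclassified. Whether the real scores behave this way is exactly what the full run would have shown.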

(If you’re interested, the notebook can be found and the classification model tested out further here).

Quick Review: Imagga’s Auto-Tagging API

The Machine Tagging and Computer Vision group started out by investigating the effectiveness of some available demo versions of automated tagging services, which meant relying on the default models that these services had been trained on and seeing whether or not they proved to be useful. We attempted to put together a fairly comprehensive test set of images from the State Library of New South Wales’ Tribune collection to run through four programs, one of which was Imagga, and note the results.

The Imagga API is described as a set of image understanding and analysis technologies available as a web service, allowing users to automate the process of analysing, organising and searching through large collections of unstructured images, which is an issue we’re trying to address as part of our class project.

Imagga provides reasonably thorough and easy-to-use demos which are accessible to the public without any sign-up requirements. They include options for the automated tagging, categorisation, colour extraction, cropping and content moderation of image collections. It should be noted that Imagga lacks the facial detection component included in some of the other services we tested. For the purposes of this exercise, only the automated tagging service was trialled.

The service returns a long list of automatically suggested tags (the exact number varies depending on the image), with a confidence percentage assigned to each. A tag may be an object, colour, concept, and so on. The results can be viewed in full here: Machine Tagging and Computer Vision – Imagga Results.

While the huge number of tags may seem promising at first, a closer look at the suggestions reveals that there is a lot of repetition and conflict (both ‘city’ and ‘rural’, ‘transportation’ and ‘transport’, ‘child’ and ‘adult’, ‘elderly’ and ‘old-timer’). Although Imagga doesn’t return as many of the more redundant and predictable tags that some of the other services generated, it goes to the other extreme with some very obscure and specific results, which is interesting: tags such as ‘planetarium’, ‘shower cap’, ‘chemical’ and ‘shoe shop’ were suggested for perfectly standard images of meetings. Protest images resulted in concepts such as ‘jigsaw puzzle’, ‘shopping cart’, ‘cobweb’ and ‘earthenware’ – often receiving a high confidence percentage. Ultimately, we can’t really know what is being ‘seen’ as the computer analyses the images, though I found myself wanting to know.
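Repetition and conflict of this kind could be flagged automatically before anyone reads through the full list. The sketch below uses the pairs we noticed by hand; the (tag, confidence) layout is a simplification chosen for illustration and is not Imagga’s actual response format, and the sample values are invented.

```python
# Pairs noticed by hand in our results: duplicates and contradictions alike.
CONFLICTING_PAIRS = [
    ("city", "rural"),
    ("transportation", "transport"),
    ("child", "adult"),
    ("elderly", "old-timer"),
]


def find_conflicts(tags):
    """Return the known pairs for which both tags were suggested for one image."""
    names = {name.lower() for name, _confidence in tags}
    return [
        pair for pair in CONFLICTING_PAIRS
        if pair[0] in names and pair[1] in names
    ]


# Invented example: (tag, confidence percentage) for a single image.
sample_tags = [("city", 41.2), ("rural", 38.7), ("transport", 30.1), ("people", 25.0)]
conflicts = find_conflicts(sample_tags)
```

Here only the ‘city’ and ‘rural’ pair is flagged, since ‘transportation’ was not suggested alongside ‘transport’. A list like this would need to be built up by hand for each collection, which limits how far the idea can go.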

In many cases the results were wildly inaccurate, but Imagga seems capable to an extent (although the confidence percentages weren’t very useful). Although still not perfect, I’d say it’s more suited to portraits than any other category – but suggesting tags such as ‘attractive’, ‘sexy’, etc. to describe images of people could be considered slightly inappropriate, and it would do this in almost every case.

Even if these services are able to achieve accuracy, the main question to ask is whether or not the results would prove useful. Ultimately, we’re looking to see if any of the tags being generated could provide those searching the Tribune collection with some useful access points from which to do so. There’s a lot to pick over in this case, and there may well be useful tags within those supplied, but on the surface, things don’t look too hopeful. However, as Imagga explains – while it’s possible for these out-of-the-box models to suggest thousands of predefined tags, the potential of auto-tagging technology lies in its ability to be trained. That said, in order to take full advantage of Imagga’s services, including their customisable machine learning technologies, it is necessary to sign up and select an appropriate subscription plan.