The initial testing phase undertaken by the Machine Tagging and Computer Vision group (outlined in our previous posts) provided some useful insights into what these systems are capable of. Mostly, though, it affirmed our view that getting applicable results with any consistency would mean training a custom machine learning model specifically on the Tribune photographs. Although custom training is available, it wasn’t open to us through the services we’d tested. However, we did get the opportunity to see how it might work through the use of TensorFlow – an open source machine learning framework.
We only got through the preliminary stages of this experiment, and required a great deal of assistance. Following the ‘TensorFlow for poets’ example, the group compiled a very small training set to see how a classifier might distinguish between images of protests and portraits. Once the model was trained, we could see how it responded to photographs drawn at random from the Tribune collection. Despite the limited training set, the results seemed surprisingly accurate in many cases…
However, there were still some incorrect interpretations – even when the images might seem, to us, quite straightforward…
Obviously, given the limited scope of the training set, there would be instances where the photograph being fed through hadn’t been accounted for in the categories we’d chosen.
Although it is neither, in this case the photograph is considered more likely to be an image of a protest than a portrait, and receives a fairly high confidence score as well. Such instances give us the chance to speculate about what is actually being ‘seen’ and how things might actually be working. What elements are being used to distinguish a protest photograph from a portrait? The number or absence of people, perhaps? Or certain lines or shapes? There are no doubt many contributing factors, all of them entirely dependent on the quality and scope of the training set being used.
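One plausible reason a photograph that is neither a protest nor a portrait can still receive a high score is sketched below. It assumes the classifier ends in a softmax step over just our two labels, which forces the scores to sum to 1, so there is no way for the model to say ‘neither’. The raw scores used here are made up for illustration and are not taken from our model.

```python
import math


def softmax(raw_scores):
    """Turn raw scores into confidence values that always sum to 1."""
    peak = max(raw_scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["protest", "portrait"]

# Hypothetical raw scores for a photograph that fits neither category:
# it only has to look slightly more like one label than the other.
raw_scores = [1.2, -0.9]

confidence = dict(zip(labels, softmax(raw_scores)))
for label, score in confidence.items():
    print(f"{label}: {score:.2f}")
```

With only two categories to choose from, a modest difference in the raw scores is enough to produce what looks like a confident answer. Adding a third ‘other’ category to the training set would be one way to give such photographs somewhere to go.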
Simply being able to assign all the photographs into pre-defined categories would of course be valuable, and we could see many advantages of relying on computer vision for this particular task, although it may be possible to take it even further.
One intriguing possibility we discussed is that image recognition might be capable of identifying the signs specific to protest photographs (I’m not confident that’s what happened in the example above, but it’s possible…). If so, it would present the opportunity to find and focus in on the protest signs of the Tribune and bring them all together, giving users a more specific point from which to search the collection. There may also be other approaches to organising the collection that image recognition could assist with, such as sorting by quality or dimensions – there are many possibilities that could be considered and explored further.
The last step of this experiment was to run the classification model across the entire collection of 60,000 Tribune photographs in order to determine its overall accuracy, and whether the confidence scores provided could be used in refining the results. This intensive process wasn’t completed before the project wrapped up, so unfortunately we can’t be certain of the full picture, but from what we’ve seen there’s clearly promise here.
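As a rough sketch of how the confidence scores might be used to refine results, the example below splits predictions at a threshold and compares accuracy against a small hand-checked sample. Everything in it is hypothetical: the filenames, scores, hand labels, the 0.9 threshold and the helper names `split_by_confidence` and `accuracy` are ours for illustration, not output from the actual run.

```python
def split_by_confidence(predictions, threshold=0.9):
    """Separate predictions at or above the threshold from the rest."""
    accepted = [p for p in predictions if p[2] >= threshold]
    needs_review = [p for p in predictions if p[2] < threshold]
    return accepted, needs_review


def accuracy(predictions, hand_labels):
    """Share of predictions matching a hand-checked label (None if none were checked)."""
    checked = [(name, label) for name, label, _ in predictions if name in hand_labels]
    if not checked:
        return None
    correct = sum(1 for name, label in checked if hand_labels[name] == label)
    return correct / len(checked)


# Hypothetical classifier output: (filename, predicted label, confidence score)
predictions = [
    ("photo_001.jpg", "protest", 0.97),
    ("photo_002.jpg", "portrait", 0.93),
    ("photo_003.jpg", "protest", 0.88),
    ("photo_004.jpg", "portrait", 0.61),
]

# Hypothetical hand-checked labels for the same photographs
hand_labels = {
    "photo_001.jpg": "protest",
    "photo_002.jpg": "portrait",
    "photo_003.jpg": "portrait",
    "photo_004.jpg": "portrait",
}

accepted, needs_review = split_by_confidence(predictions, threshold=0.9)
print("overall accuracy:", accuracy(predictions, hand_labels))
print("accuracy above threshold:", accuracy(accepted, hand_labels))
print("left for manual review:", [name for name, _, _ in needs_review])
```

If accuracy among the high-confidence predictions turned out to be noticeably better than the overall figure, the scores could be used to accept those labels automatically and send the remainder to a person for checking. As the earlier example shows, though, a high score is no guarantee for photographs that fall outside the trained categories.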
(If you’re interested, the notebook can be found and the classification model tested out further here).