Finding the Faces of the Tribune

The final stage of the Machine Tagging and Computer Vision group’s investigations looked at the potential of applying basic facial detection technology (in this case, using OpenCV) to the Tribune collection, in an effort to determine alternative methods for users to discover and navigate this extensive resource. While our group was not responsible for the technical aspects of this experiment, we were involved in interpreting the resulting data and attempting to identify where there might be advantages to utilising this approach.

While we initially considered the possibilities of this option limited, we soon discovered that even simply finding faces in the Tribune collection could prove extremely valuable. This approach had the potential not only to distinguish the photographs with people from those without, but also to reveal a bit about the nature of a particular image through the number of faces it contains.

Although not guaranteed to pick up on every single face, and with varying accuracy across the many different images, the ability of facial detection to return an approximate number could prove useful in determining the likelihood of a photograph being of an individual or a crowd. Such information could be very useful in this particular case, and provide another path for people searching the Tribune images. The option to only show those with, for instance, more than 10 or 20 faces could direct users to photographs of protests or meetings – and there may well be other ways of using the resulting data to assist people through the collection.

Exploring the data generated from this experiment gives us an overview of faces in the collection, revealing how many images in total were reported to contain faces, the average number of faces per image, and a chart demonstrating face frequency across the 60,000 Tribune photographs. Additionally, listing those images with a face count exceeding 100 makes it possible to zoom in on them to see exactly what they’re about.
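The overview described above can be computed from a simple mapping of image identifiers to face counts. A minimal sketch, assuming a hypothetical `face_counts` dictionary of the kind the detection step would produce:

```python
def summarise_face_counts(face_counts, crowd_threshold=100):
    """Summarise per-image face counts: totals, averages, and crowd shots."""
    with_faces = {img: n for img, n in face_counts.items() if n > 0}
    total_faces = sum(with_faces.values())
    average = total_faces / len(with_faces) if with_faces else 0
    # Images whose face count suggests a crowd scene rather than a portrait
    crowds = sorted(img for img, n in face_counts.items() if n > crowd_threshold)
    return {
        "images_with_faces": len(with_faces),
        "total_faces": total_faces,
        "average_faces": average,
        "crowd_images": crowds,
    }
```

For example, `summarise_face_counts({"a.jpg": 3, "b.jpg": 0, "c.jpg": 150})` would report two images with faces and flag `c.jpg` as a likely crowd scene.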

Of course, it’s important to consider that the ‘face’ being identified may not always be a face…

In this case, a group portrait of 3 individuals supposedly contains 4 faces, with the computer picking up on something that merely resembles a face in a pattern (something we noticed happening quite frequently). False positives such as these are inevitable and can certainly be filtered out in order to ensure more accurate results, although it’s actually quite interesting to see what mistakes are being made, and consider how the facial detection technology is making its decisions.

In all, 230,677 ‘faces’ were detected, and this experiment demonstrated a few creative ways of using them – by extracting them from their context and making them the focus. In one case, the cropped versions of each identified face became part of a single, enormous image – which can be viewed in detail here.

Another approach involves randomly selecting a photograph from the Tribune collection which has been determined to contain more than 20 faces, and from it, generating a different version with the detected faces in isolation. It’s then capable of demonstrating a transition between the two images.

Faces have the power to really draw people in – after initially isolating them from their context, it’s then possible to add the context back in to see what’s actually going on in a photograph. From what we’ve seen, these approaches could prove to be an extremely effective and engaging way of presenting the Tribune collection.

As noted with our previous experiment, it’s possible to see the potential of basic facial detection even in its unpolished, preliminary form. Even such a simple process as this may provide opportunities for extracting useful metadata, and so could prove an effective means of enriching the collection, as well as opening it up for people to find and experience this resource based solely on the properties of the images themselves.

(If you’re interested, the facial detection notebook for this experiment can be found and tested out here).

Training a Classification Model for the Tribune Collection

While the initial testing phase undertaken by the Machine Tagging and Computer Vision group (outlined in our previous posts) provided some useful insights into what these systems are capable of, it really only affirmed our perspective that, in order to gain applicable results with a certain level of consistency, it would be necessary to train a custom machine learning model specifically on the Tribune photographs. Although some of the services we’d tested do offer custom training, this option wasn’t open to us. However, we did get the opportunity to see how it might work through the use of TensorFlow – an open source machine learning framework.

We were only able to get through the preliminary stages of this experiment, and required a great deal of assistance – following the ‘TensorFlow for poets’ example, the group compiled a very small training set to see how the system might distinguish between images of protests and portraits. Once trained, we were able to see how the classifier model responded to photographs drawn at random from the Tribune collection. Despite the limited training set, the results seemed surprisingly accurate in many cases…

However, there were still some incorrect interpretations – even when the images might seem, to us, quite straightforward…

Obviously, given the limited scope of the training set, there would be instances where the photograph being fed through hadn’t been accounted for in the categories we’d chosen.
Although it fits neither category, in this case the photograph is considered more likely to be an image of a protest than a portrait, and receives a fairly high confidence score as well. Such instances give us the chance to speculate about what is actually being ‘seen’ and how things might actually be working. What elements are being used to distinguish a protest photograph from a portrait? The number or absence of people, perhaps? Or certain lines or shapes? There are no doubt many contributing factors, all of them entirely dependent on the quality and scope of the training set being used.

Simply being able to assign all the photographs into pre-defined categories would of course be valuable, and we could see many advantages of relying on computer vision for this particular task, although it may be possible to take it even further.

One intriguing possibility we discussed is that image recognition might be capable of identifying the signs specific to protest photographs (although I’m not confident that’s the case for the above example, it’s possible…). If so, it would present the opportunity to find and focus in on the protest signs of the Tribune, bringing them all together to provide users with a more specific point from which to search the collection. Additionally, there may be other approaches to organising the collection that image recognition could assist with, such as by quality or dimensions – there are many possibilities that could be considered and explored further.

The last step of this experiment was to run the classification model across the entire collection of 60,000 Tribune photographs in order to determine its overall accuracy, and whether the confidence scores provided could be used in refining the results. This intensive process wasn’t completed before the project wrapped up, so unfortunately we can’t be certain of the full picture, but from what we’ve seen there’s clearly promise here.
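While we can’t report full results, the kind of confidence-based refinement described here is straightforward to express. A sketch, assuming hypothetical classifier output as (image, label, confidence) tuples:

```python
def refine_by_confidence(predictions, threshold=0.9):
    """Split classifier predictions into confident and uncertain groups.

    `predictions` is an iterable of (image_id, label, confidence) tuples;
    anything below the threshold is set aside for human review.
    """
    confident, uncertain = [], []
    for image_id, label, confidence in predictions:
        target = confident if confidence >= threshold else uncertain
        target.append((image_id, label, confidence))
    return confident, uncertain
```

Only the confident group would then be trusted for automatic categorisation, with the uncertain remainder queued for manual checking.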

(If you’re interested, the notebook can be found and the classification model tested out further here).

IBM Watson Machine Tagging! (freeware)

First a quick brief on what this is all about:

This was a class project which aimed to investigate, develop, and report on a variety of techniques to visualise, analyse, and enrich metadata describing a collection of images from the State Library of NSW that document a wide range of political and social issues from the 1960s to the 1990s.

The class was divided into six groups, with the one I was part of looking at machine tagging and machine learning.

Our four group members were tasked with looking into machine learning software that is freely available online, so we each decided to play with one before coming back with our assessments.
This is mine.

I decided to look at an application provided by IBM: Watson.

IBM Watson is a visual recognition service available for anyone to use. After creating an account on IBM’s website, a user can sign up for a “Lite” plan, which grants access to the free demo mode of IBM’s different available applications.

You can find a tutorial on how to get started here: https://console.bluemix.net/docs/services/visual-recognition/getting-started.html#getting-started-tutorial

An interesting capability here is the chance to create a custom model, though I did not pursue this. I opted for the general model available, as my team-mates would also be using the standard or general models available through the applications they were testing. (Also, creating a custom model would be quite time-consuming and advanced.)

Running a test set of 32 images sourced from the State Library of New South Wales’ collection of Tribune negatives to see what kind of tags it would produce proved very easy, as multiple images could be uploaded to the application at a time.
We had decided to split the 32 images into sets of 8 pertaining to four categories:

  • Portraits
  • Protests
  • Meetings
  • Miscellaneous

Let it be noted that each member of the group used the same test set, so we could then compare our results.
The following images are screen-captures of IBM Watson’s results.

Portraits:

Protests:

Meetings:

Miscellaneous:

At first I was a little disappointed, though not surprised, that there were so many inaccuracies in the tags that Watson had provided.
However, as can be seen, there are also many accurate tags. The ability the application has to identify crowds, people, buildings, roads, auditoriums, lecture rooms, as well as polo is quite impressive.

There are definitely uses for this machine tagging software. A human sorting through a set of 60,000 images would find the task time-consuming to say the least. The capability to sort and classify images, and to detect faces and people, has definite potential to reduce the time and effort a human spends on that task.

As a final note, creating a custom model might be an interesting step to take beyond what I have undertaken here. Applying facial recognition to the faces already detected may have uses as well: for example, finding people of note, such as politicians and protest leaders, among the detected faces.

Overall, the free application provided by IBM was easy to use and gets a tick from me. How does it compare with the applications used by my team-mates? Well, just check the reviews they’ve uploaded to this blog, and decide for yourself. Have a play, they’re free!

Analyzing Topics and Subjects

Peter Grimmett – u3163211

Martin Ruckschloss – U3114720

Issues encountered with data:

One major problem we had with the data was that words were being grouped together in phrases such as “Aboriginal Australians”. This was problematic for some aspects of visualisation: if we wanted to see the frequency of the term “Australians”, the phrase “Aboriginal Australians” would not be included in the count. To solve this, we modified the Jupyter notebook to split words on the spaces between them. The result was a more workable data set on which we could conduct further analysis.
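The fix amounts to tokenising each phrase on whitespace before counting. A minimal sketch of the idea (the actual notebook modification isn’t reproduced here):

```python
from collections import Counter

def word_frequencies(phrases):
    """Count individual words across a list of phrases, splitting on spaces."""
    counts = Counter()
    for phrase in phrases:
        counts.update(phrase.split())
    return counts
```

With this in place, `word_frequencies(["Aboriginal Australians", "Australians"])` counts “Australians” twice, where a phrase-level count would have seen two unrelated terms.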

Another major issue with the data set is that a large amount of the collection has unique words or phrases as identifiers. The sheer vastness of the data set we were working with became apparent. When split into single words, the statistics of the collection are as follows:

Descriptions:

- 6,600 unique words used
- Approximately 3,080 words with a count of 1 (46.6% of descriptions)
- 1,170 words with a count of 2 (17.7% of descriptions)
- 2,010 words with counts ranging from 3 to 20 (30% of descriptions)

Places:

- 282 unique words used
- 188 places with a count of 1-2 (66.6%)

Topics:

- 845 unique words used
- 482 topics with a count of 1-2 (57%)
- 243 topics with a count of 3-10 (28%)

Titles:

- 3,000 unique words used
- 2,327 with a count of 1-3 (77%)

As can be seen from the above statistics, a majority of the data from the collection uses terms that appear only once or twice in the titles, descriptions, places and topics fields. Such data is difficult to visualize, as there are no further relations between the data points and no categories to group them into.

One solution is to disregard or filter away terms with little to no use and only visualize terms with high counts. An example of this may be to not graph any result with a count of less than 20. Another method utilized was to graph the first segment of results with high usage counts, then separately graph the lesser used terms. The resulting visualizations from such filtering were much more meaningful to the viewer.
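Filtering of this kind can be done in one pass over the frequency counts. A sketch, assuming a dictionary of term counts and the arbitrary cut-off of 20 mentioned above:

```python
def filter_counts(counts, minimum=20):
    """Drop terms whose frequency falls below the minimum before graphing."""
    return {term: n for term, n in counts.items() if n >= minimum}
```

The same helper, called with different `minimum` values, supports the second method of graphing the high-count and low-count segments separately.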

Visualization:

Upon testing various methods of visualization and graphs, we found that the single best method of visualization was a basic bar chart, with the x-axis containing phrases or words and the y-axis measuring frequency. As stated above, various filtering methods were applied in attempts to make the visualization more meaningful. The results are as follows:
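A minimal sketch of such a bar chart with matplotlib (the axis labels, sort order and figure size are just illustrative choices, not the notebook’s actual settings):

```python
import matplotlib
matplotlib.use("Agg")  # render to file without needing a display
import matplotlib.pyplot as plt

def plot_term_frequencies(counts, path="term_frequencies.png"):
    """Plot a bar chart of term frequencies: terms on x, counts on y."""
    terms = sorted(counts, key=counts.get, reverse=True)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(terms, [counts[t] for t in terms])
    ax.set_xlabel("Term")
    ax.set_ylabel("Frequency")
    ax.tick_params(axis="x", rotation=45)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```

Sorting by descending frequency keeps the dominant terms on the left, which makes the long tail of rare terms easy to spot.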

https://drive.google.com/open?id=1fOa4mYCYi3QjEHlTV_M0ALsJxKdJCyLodqLVefwxKlU

Word clouds:

Another useful method of visualization we explored was word clouds. Filtering methods similar to those used for the bar charts were applied, taking various subsets of the data and generating word clouds from them.

https://drive.google.com/open?id=1iMpNghu_cQ4Sm9UDCnNfjveQWhQutNxuEOvp9myq0LQ

Further analysis of data:

The following graph is a good indicator of the disproportionate spread of topics and just how vast that list is. The numbers along the bottom indicate how many times a word appears in the collection, so the bar labelled “5” means there are 28 words that each appear only five times throughout the collection.
With 351 single-use topics (that is, topics that appear only once in the collection), such terms total nearly half of the cleaned topic count.
However, of the total 8,624 words, the 98 to 926 frequency range makes up the largest portion, consisting of 3,670 words even though it covers only 17 unique topics (whereas a count of 1 covers 351 topics, as they are single-use).
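The graph described here is a “count of counts”: for each usage frequency, how many distinct topics occur that many times. A sketch using the standard library, with illustrative data:

```python
from collections import Counter

def frequency_of_frequencies(topic_counts):
    """For each usage count, return how many distinct topics have that count."""
    return Counter(topic_counts.values())
```

For example, with `{"housing": 5, "strikes": 5, "vigil": 1}` the result maps 5 to 2 topics and 1 to 1 topic, which is exactly the shape of data the bars above are drawn from.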

https://drive.google.com/open?id=11954aDiG5ii7vYJ60K4RpdpPu04sRhqdi46rdHIJfFo

These links should provide a more interactive look at both the topics list and subjects list.
https://plot.ly/~PeterGrimm/13/

Topics split by count

https://plot.ly/~PeterGrimm/15/

Subjects split by count