Analysis Topics & Subjects

U3163211, u3086001, u3114720
Analysing Topics and Subjects: Visualisation Methods.


One task our group has worked on is creating visualisations of the topic and subject data.


A simple method we used was to run the topics and subjects individually through Jupyter to get a solid text block. We then took that text block and ran it through Wordcounter.

Wordcounter allows us to apply filters and examine the text. After applying the filters we were left with three categories: a single word list, bigrams and trigrams. These new, cleaned data sets can then be put through visualisation programs to create simple graphs that can be used to compare and contrast the data.
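For anyone who wants to repeat the counting step inside a notebook instead of on a website, here is a minimal sketch of how single words, bigrams and trigrams can be counted. The stop-word list and the sample text are invented for illustration, and the filters we actually applied in Wordcounter may have been different.

```python
import re
from collections import Counter

# Hypothetical stop-word filter; the filters actually used in Wordcounter are not recorded here.
STOP_WORDS = {"the", "of", "and", "in", "a", "to"}


def tokenize(text):
    """Lower-case the text, keep alphabetic words and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (runs may cross sentence boundaries)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Invented sample standing in for the exported text block.
sample = "Aboriginal peoples Australia. Demonstrations in Sydney. Aboriginal peoples Australia."
tokens = tokenize(sample)

single_words = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

print(trigrams.most_common(3))
```

Each of the three counters can then be exported to whichever charting tool is being used.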

In the examples shown there is already an interesting contrast. Although “Demonstrations” is by far the most used single word in the data, this is not consistent across our other lists. In Trigrams, for example, the most prominent series is “Aboriginal Peoples Australia”.

If we examine that Trigram, I believe there are several reasons for its recurrence. Initially I assumed the prominence of this phrase was due to the social issues of the 1960s and 1970s. Were this the case, I would expect the majority of the records to be photos of protests and demonstrations. Instead, what we found was more complicated than that.

In reality the prevalence of these terms is due to a combination of historical notations and contemporaneous activism. While the archive addressed the issues and discussions of its own time, it also holds an impressive collection of settler drawings of Aboriginal Australians going about their day-to-day lives, as well as 19th century breastplate regalia.


And that brings us to a second task we were given: linking this data to the collection. The paths we take to connect our cleaned data to the collection must be informed by its potential use. For example, the issue with our Trigram could be managed by examining the time periods these records come from. To solve this problem we needed a way to connect topics and dates.


A method we are currently exploring and fine-tuning is running the data through a website called Nineteen. With Nineteen we can visualise our data as we isolate individual series from the raw data.

In the example below, the Subject and start dates have been isolated from the rest of the raw data and run through this program.

Although the result is visually cluttered and might be a little hard to read, Nineteen has helped us. It has essentially grouped the subjects by year, so you can see when each subject was most prominently used.
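The same grouping of subjects by year can be sketched in a few lines of Python, which may be easier to read than a cluttered chart. The records below are invented, and the sketch assumes start dates are stored as text beginning with a four-digit year; the real export may be shaped differently.

```python
from collections import Counter, defaultdict

# Hypothetical (subject, start date) pairs; values invented for illustration.
records = [
    ("Demonstrations", "1970-05-08"),
    ("Demonstrations", "1970-09-18"),
    ("Trade unions", "1970-03-02"),
    ("Demonstrations", "1972-01-26"),
]


def subjects_by_year(rows):
    """Build {year: Counter of subjects} from (subject, start date) pairs."""
    by_year = defaultdict(Counter)
    for subject, start_date in rows:
        year = int(start_date[:4])  # assumes the date starts with YYYY
        by_year[year][subject] += 1
    return by_year


grouped = subjects_by_year(records)
for year in sorted(grouped):
    print(year, grouped[year].most_common())
```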


Our current issues with this method are ease of interface and prioritisation of tag linking. We would welcome community input as to which tags would make this tool most useful to browsing parties. For example, what tags would you be most interested in perusing?


Our next step is to further condense our topic list and eliminate duplicate and similar topics. As you can see, there are a lot of similar topics that should be condensed, and we are currently looking at ways to create a standard set of topics and apply them across the collection. To complicate matters, splitting the data in a way that does not lose information is a narrow path to walk.
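One possible first pass at condensing the list is to group topics that differ only in capitalisation or punctuation, while keeping every original spelling so that no information is lost. This is only a sketch with invented topic labels; topics that are similar in meaning but spelled differently would still need to be merged by hand.

```python
import re
from collections import defaultdict


def normalise(topic):
    """Reduce a topic label to a comparison key: lower-case, no punctuation, single spaces."""
    key = re.sub(r"[^a-z0-9 ]+", " ", topic.lower())
    return " ".join(key.split())


def group_variants(topics):
    """Map each comparison key to the set of original labels that share it."""
    groups = defaultdict(set)
    for topic in topics:
        groups[normalise(topic)].add(topic)
    return groups


# Invented topic labels for illustration.
topics = ["Demonstrations", "demonstrations.", "Trade Unions", "Trade-unions", "Housing"]
groups = group_variants(topics)

for key, variants in sorted(groups.items()):
    print(key, "<-", sorted(variants))
```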

In case the photos don’t turn out very well, the links are here.

Making Connections – Standardized method of Linking

Searching on TROVE
Searching the Trove website with the given parameters yielded no results. Other attempts to search for photographs did not work either.

Searching for both Newspaper AND Pictures through the Trove API console
Search Query: q=The+Tribune&zone=newspaper,picture&encoding=json&n=20
Full Link:,picture&encoding=json&n=20
Red = the status of each particular zone
Blue = the results associated with each zone (the actual newspapers/pictures)
Photo of Picture results (in the Zone key identifier under name: Pictures)

Photo of Newspaper results (in the Zone key identifier under name:Pictures)
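The search query above can also be assembled in Python rather than typed into the console. This sketch only builds the query string from the parameters already listed; the endpoint address and API key are deliberately left out, because the full link did not survive in this post.

```python
from urllib.parse import urlencode

# Parameters copied from the search query above.
params = {
    "q": "The Tribune",
    "zone": "newspaper,picture",  # search both zones in one request
    "encoding": "json",
    "n": 20,                      # number of results per zone
}

# safe="," keeps the comma between zones readable instead of encoding it as %2C.
query = urlencode(params, safe=",")
print(query)
```

The resulting string would be appended to the API address, together with an API key, before sending the request.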

The questions given in this task were vague, so I’ll justify my interpretation of the questions and the solution given above.
1. Investigate ways to identify photographs that were published in the Tribune
• I could not think of a fully guaranteed way to return only photo results from The Tribune’s own articles. However, I propose that querying on the “isPartOf” key identifier with “The Tribune” could bring more consistent results.
• I’ve concluded that I couldn’t find a way, but there may be other solutions that were not considered. If there are not, I’m assuming that the way Trove tags these database entries, or the search algorithm itself, is not as efficient as it could be when searching photographs.
2. Develop a Standardized me

For the 3rd question

Adding comments and tags

Links to my search through:

Making Connections:

One of the tasks for our group Making Connections was to look into the coverage of the Tribune and how it covers particular events and topics. Below I’ll be talking about just a couple of the events covered by the Tribune.

Using Trove searches I was able to see the number of articles published by the Tribune, particularly articles on Vietnam. The searches showed that the Tribune had a total of 2,142 digitized articles on Vietnam for the period 1970-1979. In 1970, 643 articles were found, and the number slowly dropped off over the years; in 1976 there were only 152 articles published by the Tribune on Vietnam.

This change could be due to the end of the war in 1975, but it is interesting that a fair number of articles about Vietnam were still published in the post-war period. Of these 2,142 articles, only 721 have photos according to Trove. This is interesting because there was a large media presence in Vietnam, yet only 721 articles have photos.


Another interesting search topic is the anti-war demonstrations of this time. Of the 1,344 articles on anti-war demonstrations published by the Tribune, 507 have photos.

These numbers are really good for the time and show the use of negatives in these papers. Interestingly, the largest number of articles with photos was in 1970, probably because the war was still going, which had a massive impact on the number of articles written about it.
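To make the two searches easier to compare, the counts quoted above can be turned into percentages. The figures are the ones from the Trove searches in this post; only the small helper function is new.

```python
# Counts taken from the Trove searches described above.
vietnam_total, vietnam_with_photos = 2142, 721
protest_total, protest_with_photos = 1344, 507


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


vietnam_share = share(vietnam_with_photos, vietnam_total)
protest_share = share(protest_with_photos, protest_total)

print(f"Vietnam articles with photos: {vietnam_share}%")
print(f"Anti-war demonstration articles with photos: {protest_share}%")
```

On these numbers roughly a third of the Vietnam articles and a little over a third of the demonstration articles carry photos.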


The coverage of these topics and events changed over time due to the end of the war, but also due to the view that Vietnam was a horrible idea that created a lot of unrest in the community. Seeing the war televised also had a big impact on the number of articles written, because people everywhere were seeing what was going on in this war.

Quick Review: Imagga’s Auto-Tagging API

The Machine Tagging and Computer Vision group started out by investigating the effectiveness of some available demo versions of automated tagging services, which meant relying on the default models that these services had been trained on and seeing whether or not they proved to be useful. We attempted to put together a fairly comprehensive test set of images from the State Library of New South Wales’ Tribune collection to run through four programs, one of which was Imagga, and note the results.

The Imagga API is described as a set of image understanding and analysis technologies available as a web service, allowing users to automate the process of analysing, organising and searching through large collections of unstructured images, which is an issue we’re trying to address as part of our class project.

Imagga provides reasonably thorough and easy to use demos which are accessible to the public without any sign-up requirements. They include options regarding the automated tagging, categorisation, colour extraction, cropping and content moderation of image collections. It should be noted that Imagga is lacking the facial detection component included in some of the other services we tested. For the purposes of this exercise, only the automated-tagging service was trialed.

Imagga’s Auto-Tagging demo in practice.

Returned is a list of many automatically suggested tags (the exact number varies depending on the image) with a confidence percentage assigned to each. The tags generated may be an object, colour, concept, and so on. The results can be viewed in full here: Machine Tagging and Computer Vision – Imagga Results.

While the huge number of tags may seem promising at first, a closer look at the suggestions reveals that there is a lot of repetition and conflict (both ‘city’ and ‘rural’, ‘transportation’ and ‘transport’, ‘child’ and ‘adult’, ‘elderly’ and ‘old-timer’). Although Imagga doesn’t return as many of the more redundant and predictable tags that some of the other services generated, it’s going to the other extreme with some very obscure and specific results, which is interesting. Things such as ‘planetarium’, ‘shower cap’, ‘chemical’, ‘shoe shop’ for perfectly standard images of meetings. Protest images resulted in concepts such as ‘jigsaw puzzle’, ‘shopping cart’, ‘cobweb’ and ‘earthenware’ – often receiving a high confidence percentage. Ultimately, we can’t really know what is being ‘seen’ as the computer analyses the images, though I found myself wanting to know.

In many cases the results were wildly inaccurate, but Imagga seems capable to an extent (although the confidence percentages weren’t very useful). Although still not perfect, I’d say it’s more suited to portraits than any other category – but suggesting tags such as ‘attractive’, ‘sexy’, etc. to describe images of people could be considered slightly inappropriate, and it would do this in almost every case.
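If we wanted to tidy a list of suggested tags before using it, one approach would be to drop low-confidence tags and fold obvious duplicates together. The sketch below uses invented tag and confidence pairs, a hand-made synonym map and an arbitrary 30% cut-off; none of these come from Imagga itself, and the real API response is structured differently from this simple list.

```python
# Hypothetical (tag, confidence %) pairs; not actual Imagga output.
suggested = [
    ("people", 62.0),
    ("transport", 41.5),
    ("transportation", 40.2),
    ("cobweb", 35.0),
    ("city", 12.3),
]

# Hypothetical hand-made map from duplicate tags to a preferred form.
SYNONYMS = {"transportation": "transport", "old-timer": "elderly"}


def clean_tags(tags, min_confidence=30.0):
    """Drop tags below the cut-off, merge synonyms and keep the best confidence for each."""
    best = {}
    for tag, confidence in tags:
        if confidence < min_confidence:
            continue
        canonical = SYNONYMS.get(tag, tag)
        best[canonical] = max(confidence, best.get(canonical, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


cleaned = clean_tags(suggested)
print(cleaned)
```

Note that an odd tag like ‘cobweb’ survives the cut-off in this example, which matches our experience that the confidence percentages alone were not enough to weed out strange results.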

Even if these services are able to achieve accuracy, the main question to ask is whether or not the results would prove useful. Ultimately, we’re looking to see if any of the tags being generated could provide those searching the Tribune collection with some useful access points from which to do so. There’s a lot to pick over in this case, and there may well be useful tags within those supplied, but on the surface, things don’t look too hopeful. However, as Imagga explains – while it’s possible for these out-of-the-box models to suggest thousands of predefined tags, the potential of auto-tagging technology lies in its ability to be trained. Although, in order to take full advantage of Imagga’s services, including their customisable machine learning technologies, it is necessary to sign up and select an appropriate subscription plan.

Tracing protesters’ steps through Sydney in March 1966

Another of the Knightlab’s products the Events team have tested is ‘Storymap JS’.  Loading a selection of photos from the State Library of NSW’s Tribune collection into the Storymap program, we can trace the route taken by protesters in March 1966 from the Sydney Opera House to the State Parliament of NSW, annotating the photos along the route to complete the story.

Protesters march from Sydney Opera House construction site to State Parliament

The first slide of the Storymap provides room for source references and an introduction to the story.  We included links back to the Tribune articles from the day on Trove, and the State Library’s catalogue, so that the full set of photography from the day is easily available.

Storymap JS suggests a maximum of 20 photos for a storyline.  Aspects of the production such as font, colours and mapping style can be modified in Storymap; however, we chose to replicate the visuals used by our Mapping team in their earlier post.
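We built our Storymap in the online editor, but a storyline can also be described as data. The sketch below assembles an opening slide and two located slides and checks the 20-photo guide. The field names follow my understanding of the StoryMapJS JSON format and should be checked against Knightlab’s documentation; the coordinates are approximate and the slide text is invented.

```python
import json

MAX_SLIDES = 20  # the guide figure mentioned above


def make_slide(headline, text, lat=None, lon=None, media_url=""):
    """Build one slide; a slide without coordinates is treated as the opening overview slide."""
    slide = {
        "text": {"headline": headline, "text": text},
        "media": {"url": media_url, "caption": "", "credit": ""},
    }
    if lat is None or lon is None:
        slide["type"] = "overview"
    else:
        slide["location"] = {"lat": lat, "lon": lon}
    return slide


def build_storymap(slides):
    """Wrap the slides in the assumed top-level structure, enforcing the slide guide."""
    if len(slides) > MAX_SLIDES:
        raise ValueError(f"{len(slides)} slides exceeds the guide of {MAX_SLIDES}")
    return {"storymap": {"slides": slides}}


slides = [
    make_slide("March 1966 protest", "Introduction and source references go here."),
    # Approximate coordinates, added for illustration only.
    make_slide("Sydney Opera House construction site", "Start of the march.", -33.857, 151.215),
    make_slide("State Parliament of NSW", "End of the march.", -33.867, 151.213),
]

story = build_storymap(slides)
print(json.dumps(story, indent=2))
```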

Making Connections:

One of the tasks for the Making Connections group was to investigate ways of identifying photographs that were published in the Tribune.  

I started off by focussing on one event to narrow down the results, using key words to pinpoint a particular photo from the Tribune negative collection in the newspaper article results on Trove. The first key words I started with were Aboriginal Tent Embassy and 1972. From there I was able to search different events and years and match them to the Tribune’s negative collections.

Doing this enabled me to find certain photos that matched the ones used in the articles. However, this was a lot more difficult than I thought, as most of the photos used in the articles were cropped and, due to the scanning quality, very heavily contrasted, making it hard to tell whether it was a correct match.

At first, I thought it would be easy to also narrow down the search from year to month; however, that was also quite difficult. Even though some collections of photos were taken in January of 1972, for example, they were not used until later that same year, or not used at all. After matching the negatives to the articles, I created a spreadsheet, Making Connections 1.0, which holds the metadata about the collection of negatives, the linking process through Trove using specific tags, and the metadata of the articles from Trove as well.
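Because a photo might be printed any time after it was taken, one way to narrow the candidates is to keep only articles from the same year, published on or after the date of the negative, that mention all of the key words. This is a sketch of that idea only: the article titles and dates below are invented, and the field names are placeholders rather than the columns of the spreadsheet.

```python
from datetime import date


def candidate_articles(negative_date, keywords, articles):
    """Keep same-year articles published on or after the negative's date that mention every keyword."""
    wanted = [k.lower() for k in keywords]
    end_of_year = date(negative_date.year, 12, 31)
    return [
        article
        for article in articles
        if negative_date <= article["date"] <= end_of_year
        and all(k in article["title"].lower() for k in wanted)
    ]


# Invented example records, for illustration only.
articles = [
    {"title": "Example article about the Aboriginal Tent Embassy", "date": date(1972, 2, 1)},
    {"title": "Example article about the Aboriginal Tent Embassy", "date": date(1973, 1, 30)},
    {"title": "Example article about a strike", "date": date(1972, 3, 7)},
]

matches = candidate_articles(date(1972, 1, 26), ["Aboriginal", "Tent Embassy"], articles)
print(matches)
```

The photos in the remaining articles would still need to be compared by eye, since cropping and contrast make automatic matching unreliable.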


Posted By: u3174032

Machine tagging and computer vision

For our class project my team and I have been reviewing facial recognition and tagging software. We each reviewed one program by putting through photographs from four different categories – portraits, meetings, protest and miscellaneous.

The software I used was called Clarifai and my photo grouping was ‘miscellaneous’. Below are the photographs and the tags I got from Clarifai.


I also put through two photos from the other three categories shown below –



Review of Clarifai

  • Can import multiple photos
  • Very slow
  • Tags are fairly accurate
  • Sometimes the tags wouldn’t load

Clarifai is a fairly basic tagging program. It’s very simple to use, and the best thing about it compared to the other programs is that you can import multiple photos at one time. However, importing multiple photos at a time did mean that sometimes you would be sitting there for ages, and sometimes the tags wouldn’t load at all, so you’d have to refresh and import again. The tags were pretty accurate; the tags shown were the ones I had guessed would appear. This site would suit people who want to use it for quick tasks, but people wanting to do complex tasks might want to try different software.



Week 12: Mapping Places

This week, with the help of Tim, the group was able to experiment with Jupyter. The idea was to place images on a map at their location (whether it be a specific street, city or area). When we ran the notebook, the photos were placed at their locations and could be viewed by clicking on them. Here is an example:

However, we discovered that the data may need to be cleaned up in order to return more accurate results for the map.

Jupyter was also used to create a heat map, as you can see below:

As for the next week, the group is looking to clean up data that didn’t return correctly and potentially make an interface for the maps. Additionally we are also looking at other ideas to present the photos.
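For readers curious about what sits behind a heat map like the one mentioned above, the basic idea is to count how many photos fall into each small cell of a grid. The sketch below does this with plain Python, without assuming which mapping library was used in the notebook; the coordinates are illustrative points in central Sydney, not real photo locations.

```python
import math
from collections import Counter


def grid_counts(points, cell_size=0.01):
    """Count (lat, lon) points per grid cell; cell_size is in degrees."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts


# Illustrative coordinates only.
points = [
    (-33.8688, 151.2093),
    (-33.8685, 151.2099),
    (-33.8570, 151.2150),
]

counts = grid_counts(points)
for cell, total in counts.most_common():
    print(cell, total)
```

Cells with higher counts are the ones that would show up as the hottest areas on the map, so records with missing or wrong locations directly distort the picture.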