It was human vs hard-drive in Rawhiti, Northland, when researchers matched volunteer bird enthusiasts of various ages and levels of experience against Autonomous Recording Units (ARUs). Senses were pitted against sensors to see who (or what) was best at detecting and identifying calls. Both humans and machines had their strengths and weaknesses, the researchers found.
Monitoring birds is central to assessing their presence and abundance and to measuring the success of conservation management efforts. Even when you can’t see birds, you can often hear them, so listening for birdsong is a great monitoring technique. The average person is pretty good at it, the researchers write.
“Humans are capable of identifying birds aurally with reasonable accuracy: The average person can recognize birdcalls in their backyard, while experts can identify hundreds of bird species by their song alone. It is therefore not surprising that birdcall surveys are a common method of assessing populations of birds and conservation managers have turned to some of these methods to monitor species for conservation purposes.”
But we have our limitations, both in expertise and in the time available to do the work. Human error happens too, and birds don’t always behave normally in our presence.
“Surveys carried out by humans have been shown to have issues arising from varying ability to detect and identify species, changes in behaviour of birds due to human presence, misclassification of species and varying hearing ability of observers. Additionally, human surveys can be logistically challenging and costly. Furthermore, most of the methods used for measuring bird populations are not well suited and/or are unaffordable for species in low numbers.”
Enter the machine…
“Advances in technology have seen an increase in the use of autonomous recording units (ARUs) for monitoring of bird populations. This technology has been recognized for having the potential to overcome some of the human issues, and for having some extra advantages. For example, ARUs are less likely to affect birds’ behaviour, and their sampling can be scheduled in advance and carried out at selected times of day and night over long periods allowing these devices to be placed in remote locations and minimizing temporal biases in sound recording. Further, ARUs produce archival records that allow the listener to replay and verify identifications of species (or ask other listeners to do so) and can be deployed by people with limited bird knowledge.”
So which is best – human or machine – and why might it matter?
“Given that it is likely that ARU recordings will increasingly replace, or at least supplement, human listening, the key question is to what extent the recordings are comparable to human hearing. This is particularly important as one of the first steps to make this technology useful to conservation and/or research is to develop protocols, which requires knowledge of the strengths and limitations of the ARUs for capturing sounds under a range of conditions. This knowledge is also important for the development of methods of analysis of the data collected via ARUs, and to judge the validity of abundance estimates obtained from ARUs surveys.”
To date there has been very little systematic comparison of human and machine detection ability.
“In this study, we compare humans and ARUs by presenting them simultaneously with birdcalls broadcast at various distances and locations. We then look at (a) the effect of distance, sound direction, relative altitude, and line of sight on the capacity of ARUs and people to record bird sounds, and (b) the effect of age, experience, and gender on the ability of observers to hear bird sounds.”
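The paper’s exact statistical model isn’t reproduced here, but the comparison amounts to modelling detection probability against those factors. Below is a minimal logistic-regression sketch with invented data and variable names, just to show the shape of such an analysis; it is not the authors’ actual model.

```python
import numpy as np
import statsmodels.api as sm

# Minimal sketch of a detection-probability analysis. All data and effect
# sizes below are invented for illustration; the study's real model may differ.
rng = np.random.default_rng(42)
n = 300
distance_m = rng.uniform(30, 300, n)        # speaker-to-listener distance
line_of_sight = rng.integers(0, 2, n)       # 1 = unobstructed path to listener
rel_altitude_m = rng.normal(0, 20, n)       # speaker altitude minus listener altitude

# Simulate outcomes: farther, obstructed, off-level broadcasts are missed more.
logit = 2.0 - 0.015 * distance_m + 1.0 * line_of_sight - 0.02 * np.abs(rel_altitude_m)
detected = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([distance_m, line_of_sight, np.abs(rel_altitude_m)]))
result = sm.Logit(detected, X).fit(disp=False)
print(result.summary(xname=["const", "distance_m", "line_of_sight", "abs_rel_altitude"]))
```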
The study was carried out between 9pm and 11.30pm and used recorded calls of three nocturnal species: little spotted kiwi, brown kiwi and ruru.
“Based on sound theory (Forrest, 1994), we predicted that:
(a) Calls broadcasted from speakers in locations relatively lower than listening stations would be captured by recorders and humans while those broadcast from higher sites would not, as sound would travel above the recorders/people;
(b) speakers located in line of sight of autonomous recorders/human observers would be heard better and there would be less obstruction of the sound waves;
(c) low‐frequency calls would be recorded more/better than high‐frequency calls as the latter attenuate more in the forest environment; and
(d) shorter distances between speaker and autonomous recorder/ human observer would result in better recordings.”
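Predictions (c) and (d) fall out of basic sound propagation: level drops with spherical spreading as distance grows, and an excess attenuation term that increases with frequency takes a further bite out of high-pitched calls. Here is a rough back-of-the-envelope sketch, using illustrative absorption values rather than anything measured in the study:

```python
import math

def received_level(source_db, distance_m, alpha_db_per_100m, ref_m=1.0):
    """Source level minus spherical spreading loss minus excess attenuation."""
    spreading_loss = 20 * math.log10(distance_m / ref_m)   # 6 dB per doubling
    excess_loss = alpha_db_per_100m * distance_m / 100     # grows with frequency
    return source_db - spreading_loss - excess_loss

# Alpha values are illustrative placeholders: higher frequencies attenuate more.
for label, alpha in (("low-frequency call", 0.5), ("high-frequency call", 3.0)):
    levels = [f"{d} m: ~{received_level(95, d, alpha):.0f} dB" for d in (50, 100, 200)]
    print(f"{label}: " + ", ".join(levels))
```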
The pre-recorded calls were broadcast from six sites and recorded by humans and machines at seven listening stations, allowing detection ability to be compared directly.
“Each human observer carried out the listening exercise at all seven listening stations, resulting in seven trials. This enabled us to compare the effects of location without the confounding factors of differences between human observers.”
Shotgun blasts were used to communicate with the human observers – a somewhat unusual experimental technique, but effective in the night-time forest setting.
“Human observers were initially deployed to their first listening station. Each trial then followed the same format: Based on a sound signal (a shotgun blast), a series of bird calls were played from six broadcasting stations. At the end of the broadcast, another shotgun blast informed human observers of the end of the trial. The observers then had 10 min to move to their next listening station, and the next trial commenced. A double shot was fired at the end of the experiment to indicate the time to return to base.”
The broadcast sites were unknown to the human observers, but they visited the seven listening stations during the day before the experiment so that they knew where each was located along the track before having to find them in the dark.
“Experimenters, with their broadcast equipment, were deployed to their locations before the human observers started the experiment to prevent observers knowing the locations of the broadcasts. Speakers were activated by experimenters at fixed times after the start of each trial (gunshot signal). Each speaker broadcast the calls of three nocturnal birds known to the observers: two species of kiwi, which were not known to exist in the area, and ruru, which exist in low density. For kiwi, we used one male and one female call for each of the two species, and for ruru, we used a combination of trill and weow calls resulting in five calls being broadcast.”
Calls were broadcast at natural volume, with each birdcall sequence lasting 88 seconds (1.47 minutes). The songs were played in a different predefined random order at each speaker so that observers could not predict which bird would call next, and the order in which the speakers broadcast was also randomized so that observers could not predict where sounds would come from. Speakers were placed on the ground facing upwards at 45 degrees to simulate a kiwi calling from the forest floor.
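To get a feel for the design, here is a hypothetical reconstruction of how such a randomized playback schedule might be generated. The call labels and speaker names are invented, but the counts (five calls, six speakers, seven trials) come from the paper:

```python
import random

# Five calls per speaker: two kiwi species x male/female, plus the ruru
# trill-and-weow combination. Labels and speaker names are invented.
CALLS = ["little spotted kiwi male", "little spotted kiwi female",
         "brown kiwi male", "brown kiwi female", "ruru trill and weow"]
SPEAKERS = [f"speaker_{i}" for i in range(1, 7)]

random.seed(1)  # fixed seed so the schedule can be written out in advance

def trial_schedule():
    # Shuffle which speaker broadcasts when, and the call order at each speaker.
    speaker_order = random.sample(SPEAKERS, len(SPEAKERS))
    return [(s, random.sample(CALLS, len(CALLS))) for s in speaker_order]

for trial in range(1, 8):  # seven trials, one per listening station
    print(f"Trial {trial}:")
    for speaker, call_order in trial_schedule():
        print(f"  {speaker}: {call_order}")
```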
“Two observers with different level of expertise were located 2–4 metres apart at each of the seven listening stations. The two observers were out of sight of each other to prevent them from copying from each other, and to ensure that they were independent in their listening. Each recording station had an autonomous acoustic recorder mounted on a tree at head height above the human observer.”
Overall, the researchers report, human hearing and computer technology were similar in their detection of sound.
“Human observers were relatively homogeneous in their detection probability, with very little variability between individuals; this is despite wide differences in age and experience between human observers. In contrast, ARUs had more variability in detection probability, with some ARUs having detection probabilities significantly higher than any of the human observers in the study and some significantly lower. The individual contribution of each human observer to detection probability was also less variable than that of recorders. It is possible that less homogeneity of the ARUs resulted from the fact that the ARUs are highly susceptible to the surrounding objects in the environment, for example, different forest densities and obstacles.”
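Detection probability here is just detections divided by broadcast opportunities, so the homogeneity finding is about the spread of those per-listener proportions. A toy example with invented numbers makes the contrast concrete:

```python
import statistics

# Invented detection counts out of 210 opportunities each
# (e.g. 5 calls x 6 speakers x 7 trials). Humans cluster tightly; ARUs vary.
def detection_probs(detections, opportunities=210):
    return [d / opportunities for d in detections]

humans = detection_probs([150, 155, 148, 152, 149, 153])
arus = detection_probs([190, 120, 175, 95, 160, 140])

for label, probs in (("humans", humans), ("ARUs", arus)):
    print(f"{label}: mean={statistics.mean(probs):.2f}, sd={statistics.stdev(probs):.2f}")
```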
Machines were more affected by distance than humans, with calls broadcast farther away usually having a lower detection probability.
“ARUs have been found to have a smaller hearing radius than humans do and this probably explains the greater effect of distance on ARUs found in this study.”
The effect of relative altitude between ‘bird’ and recorder was also tested.
“To our knowledge, no other study has examined the effect of relative altitude between bird and recorder, and within the landscape (valley vs. hilltop) in the detection probability of humans and ARUs. In New Zealand, this is of special importance, as survey stations aimed at detecting kiwi are located at hilltops, assuming that this improves detection. Our results suggest that generally speaking, birds calling from hillsides and those relatively higher or lower from recording sites are less likely to be detected by ARUs, and to a lesser extent by human observers, than those at a similar altitude to listening stations. ARUs had better detection probability if broadcast was line of sight of the location of the ARU.”
Humans do have some advantages over machines – not least mobility and our ability to turn towards a sound.
“These differences between ARUs and humans are probably due to the immobility of the ARUs and human’s directional filtering ability. As well as being able to move their heads, humans locate sound sources (above, below, front, and back) using different stimulus cues, such as interaural level difference, interaural time difference, and spectral cues, something ARUs cannot do.”
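To put a scale on the interaural time difference the authors mention, Woodworth’s classic spherical-head approximation is a useful back-of-the-envelope. The head radius below is a typical assumed value, not a figure from the study:

```python
import math

# Woodworth's spherical-head approximation: ITD(theta) = (a / c) * (theta + sin(theta)),
# where theta is the sound's azimuth in radians, a the head radius, c the speed of sound.
a = 0.0875  # assumed head radius in metres
c = 343.0   # speed of sound in air, m/s

for deg in (0, 30, 60, 90):
    theta = math.radians(deg)
    itd_ms = (a / c) * (theta + math.sin(theta)) * 1000
    print(f"azimuth {deg:>2} deg -> ITD ~ {itd_ms:.2f} ms")  # ~0.66 ms at 90 deg
```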
‘Station 6’ proved a challenge for both human and machine.
“Station 6 was located in a deep valley close to a small stream. Both humans and ARUs had difficulties detecting calls from this station. The sound of the stream was not enough to prevent ARUs and humans from recording the broadcast calls, so the depth of the valley was probably the feature that prevented sound reaching ARUs/humans. Overall, we conclude that listening stations would have better detection probability if located like station 4, in a hill overlooking and central to an area to be surveyed.”
The ARUs were also found to have a frequency bias, something other researchers had previously observed, finding that some recorders performed better at higher frequencies while others did better at lower ones. In this study, low-frequency calls were an issue, with female brown kiwi and ruru trill and weow calls having lower detection probabilities.
So how did human compare with machine overall?
“We found that human detection probability is more uniform between observers (despite big differences in age and experience of observers) than ARUs’, but ARUs can have higher detection probabilities if positioned properly. The variables measured acted differently on ARUs and human observers.”
“Despite recorders being affected significantly more than people by distance, altitude, and line of sight, their overall detection probability was higher. The specific location of recorders seems to be the most important factor determining what they record, and we suggest that for best results more than one recorder (or at least, microphone) is needed at each station to ensure all bird sounds of interest are captured.”
The research was carried out by Isabel Castro et al., from the Wildlife and Ecology Group, Massey University, and the School of Mathematics and Statistics, Victoria University of Wellington. It is published in the journal Ecology and Evolution and is freely available online.