Julian Sandler, Senior Quantitative Analyst
Thursday 9 March 2023
In February, Crest Advisory published the findings of a major piece of work analysing offending patterns of domestic abusers in the West Midlands, together with a report on the implications for practitioners. We thought it would be helpful for people to understand how we conducted the research, so we asked analyst Julian Sandler, who was part of the team which worked on the project, to outline the methods that were used.
Last year Crest received a portion of the Home Office’s domestic abuse perpetrators research fund to investigate:
The relationship between domestic abuse-related offending and general offending
The possibility of identifying perpetrators earlier by using an improved understanding of this relationship
For our investigation we combined qualitative research methods - such as interviews with perpetrators and the practitioners who work with them - with an in-depth quantitative analysis. The quantitative approach we took was more sophisticated and technical than in many of our other projects - it’s known as ‘cluster analysis’.
Why cluster analysis - and what is it exactly?
Cluster analysis, or clustering, is a data science method that is used to put observations of data (eg, individual people) into groups with other observations (eg, other people) that have similar data to them (eg, their characteristics). Clustering can be used to assign observations into a predetermined number of groups, or it can tell you how many groups the data naturally fall into and, crucially, what each grouping is driven by.
In our case we wanted to determine if there were typical patterns in the offending behaviour of domestic abuse perpetrators which could then be used as an indicator of other potential perpetrators. Our research partner, West Midlands Police, provided us with anonymised data on 660,000 criminal incidents in their system between 2011 and 2021 that were linked to 145,000 suspected domestic abuse perpetrators.
For each incident the data included information on:
when and where it happened
when it was reported
what the suspected offence was
the suspected offender (eg, age and gender)
the victim(s) (eg, age and gender)
the relationship between offender and victim(s) (eg, partner, ex-partner, parent etc.)
whether substance misuse was involved
whether the incident was flagged as domestic abuse and the result of the subsequent domestic abuse risk assessment
With this data we could link together all incidents associated with each suspected perpetrator, giving us, in effect, the offending histories of 145,000 perpetrators in data form. Applying the clustering process would enable us to group together perpetrators with similar offending patterns and to understand how many types of perpetrator the data suggest and what those types are.
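As a rough illustration of the linking step, the sketch below groups incident records by a perpetrator identifier and orders each history by date. The field names (`perpetrator_id`, `incident_date`, `is_domestic_abuse`) and the sample records are hypothetical and are not taken from the West Midlands Police data.

```python
from collections import defaultdict
from datetime import date

# Made-up incident records; the field names are assumptions for illustration.
incidents = [
    {"perpetrator_id": "P1", "incident_date": date(2015, 6, 1), "is_domestic_abuse": True},
    {"perpetrator_id": "P2", "incident_date": date(2013, 2, 10), "is_domestic_abuse": True},
    {"perpetrator_id": "P1", "incident_date": date(2012, 3, 4), "is_domestic_abuse": False},
]


def build_histories(records):
    """Group incidents by perpetrator and sort each history chronologically."""
    histories = defaultdict(list)
    for record in records:
        histories[record["perpetrator_id"]].append(record)
    for history in histories.values():
        history.sort(key=lambda r: r["incident_date"])
    return dict(histories)


histories = build_histories(incidents)
```

Each value in `histories` is then one perpetrator's offending history, ready to be summarised.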
How did we do our cluster analysis?
The first step in any data science project is to prepare your data for analysis, and it was no different here. A good amount of time was spent understanding the data, cleaning it (detecting and correcting or removing inaccurate or duplicate records) and, finally, linking perpetrators’ incidents together to create the offending histories dataset. Once this was ready, the histories were analysed to calculate statistics for each perpetrator that summarised their offending, including:
The number of (recorded) domestic abuse-related incidents they had been involved with
How many victims there were of their domestic abuse incidents
The average time of day of their incidents
The average duration of their domestic abuse incidents
The average amount of time between their separate domestic abuse incidents (if they had more than one)
How many days their recorded domestic abuse incidents spanned in total
Their average age at the time of their domestic abuse incidents
The average difference in age between them and their domestic abuse victims
The total and average (per incident) severity of their domestic abuse (using Office for National Statistics severity scores of different offences)
What proportion of their domestic abuse incidents were flagged as involving substance misuse
How many times they themselves had been a victim in a domestic abuse incident
The total and average (per incident) severity of domestic abuse they had been victim of
The number of recorded non-domestic abuse or general offending incidents linked to them (as the suspect)
The timespan of their general offending
The total and average severity of their general offending incidents
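To give a flavour of how a few of these statistics might be derived from a single offending history, here is a minimal sketch. The record layout (`date`, `domestic_abuse`, `victim_id`, `substance_misuse`) and the values are invented for the example, and only a handful of the statistics listed above are shown.

```python
from datetime import date

# Hypothetical history for one perpetrator; field names and values are assumptions.
history = [
    {"date": date(2014, 1, 1), "domestic_abuse": True, "victim_id": "V1", "substance_misuse": True},
    {"date": date(2014, 1, 11), "domestic_abuse": True, "victim_id": "V2", "substance_misuse": False},
    {"date": date(2014, 1, 31), "domestic_abuse": True, "victim_id": "V1", "substance_misuse": False},
    {"date": date(2016, 5, 5), "domestic_abuse": False, "victim_id": "V3", "substance_misuse": False},
]


def summarise(history):
    """Compute a few per-perpetrator summary statistics from one history."""
    da = sorted((r for r in history if r["domestic_abuse"]), key=lambda r: r["date"])
    general = [r for r in history if not r["domestic_abuse"]]
    # Days between consecutive domestic abuse incidents (empty if fewer than two).
    gaps = [(b["date"] - a["date"]).days for a, b in zip(da, da[1:])]
    return {
        "n_da_incidents": len(da),
        "n_da_victims": len({r["victim_id"] for r in da}),
        "mean_gap_days": sum(gaps) / len(gaps) if gaps else None,
        "da_span_days": (da[-1]["date"] - da[0]["date"]).days if da else 0,
        "substance_misuse_share": sum(r["substance_misuse"] for r in da) / len(da) if da else None,
        "n_general_incidents": len(general),
    }


stats = summarise(history)
```

Running `summarise` over every history would produce one row of statistics per perpetrator, which is the kind of table a clustering algorithm takes as input.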
We then used these perpetrator statistics to perform the cluster analysis. There was an initial pre-processing stage where the perpetrator statistics dataset was scaled and then run through a principal component analysis algorithm. We won’t delve into the details of how this works, but, in essence, it helps the clustering to run smoothly.
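For readers who want a feel for this stage, the sketch below standardises a table of statistics and projects it onto its leading principal components using NumPy. It is an illustration under assumptions rather than the project's actual pipeline: the data are synthetic, and the choice of two components is arbitrary. Scaling matters because distance-based clustering would otherwise be dominated by whichever statistic has the largest numeric range.

```python
import numpy as np


def standardise(X):
    """Scale each column to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # constant columns become all zeros instead of dividing by zero
    return (X - mu) / sigma


def pca(X, n_components):
    """Project centred data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)  # share of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(0)
# Synthetic stand-in for the perpetrator statistics: 200 rows, 5 columns on very different scales.
X = rng.normal(size=(200, 5)) * np.array([1, 10, 100, 1000, 0.1])
Z = standardise(X)
scores, explained = pca(Z, n_components=2)
```

Here `scores` holds the reduced representation that would be passed to the clustering step, and `explained` gives the share of variance captured by each retained component.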
Our pre-processed data were then fed into the CLARA (Clustering Large Applications) algorithm, which clusters using an approach called k-medoids. Again, without going into the precise details of how CLARA works, the results indicated that the optimal number of groups of perpetrators was either two (those who had been involved in general offending and those who had not), or 29. We chose to use CLARA to assign each of our perpetrators into one of 29 groups because that offered much more potential insight than putting them into one of two categories.
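The general idea behind CLARA can be sketched in a few lines: draw several small random samples, run k-medoids on each sample, assign every observation to its nearest medoid, and keep the set of medoids with the lowest total distance. The code below is a simplified illustration on synthetic data, not the implementation used in the project. The swap-based k-medoids routine, the function names and the default of five samples of size 40 + 2k are assumptions made for the example.

```python
import numpy as np


def pairwise_dist(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))


def pam(D, k, rng, max_iter=100):
    """Swap-based k-medoids on a distance matrix D; returns medoid indices."""
    n = len(D)
    medoids = list(rng.choice(n, size=k, replace=False))
    cost = D[:, medoids].min(axis=1).sum()
    for _ in range(max_iter):
        best = (cost, None, None)
        # Try replacing each medoid with each non-medoid and keep the best swap.
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = h
                c = D[:, trial].min(axis=1).sum()
                if c < best[0]:
                    best = (c, i, h)
        if best[1] is None:  # no swap lowers the cost, so stop
            break
        cost, i, h = best
        medoids[i] = h
    return np.array(medoids)


def clara(X, k, n_samples=5, sample_size=None, seed=0):
    """Cluster all rows of X using medoids found on small random samples."""
    rng = np.random.default_rng(seed)
    sample_size = min(len(X), sample_size or 40 + 2 * k)
    best_cost, best_medoids = np.inf, None
    for _ in range(n_samples):
        idx = rng.choice(len(X), size=sample_size, replace=False)
        sample = X[idx]
        medoids = sample[pam(pairwise_dist(sample, sample), k, rng)]
        # Judge each candidate set of medoids against the full dataset.
        cost = pairwise_dist(X, medoids).min(axis=1).sum()
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    labels = pairwise_dist(X, best_medoids).argmin(axis=1)
    return labels, best_medoids, best_cost


# Synthetic data: three well-separated blobs of 100 points each.
data_rng = np.random.default_rng(1)
centres = np.array([[0, 0], [10, 10], [-10, 10]])
X = np.vstack([c + data_rng.normal(scale=0.5, size=(100, 2)) for c in centres])
labels, medoids, cost = clara(X, k=3)
```

Because the expensive k-medoids step only ever sees a small sample, the approach scales to datasets far larger than k-medoids could handle directly. Medoids are also real observations, so each group is represented by an actual row of the data rather than an abstract average.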
Final stages of the research
The final step of our cluster analysis was to build an understanding of the 29 groups of perpetrators. What types of domestic abuse perpetrators were in them? Why were they grouped together? What was it about their offending history that was similar enough to group them together? How were they different from the perpetrators in other groups?
To do this we calculated the group-level averages of the perpetrator statistics and compared them to each other. Many groups contained perpetrators who were involved in only one incident of domestic abuse and whose characteristics differed from those in other groups only on matters such as the time of day of the incident and the age of the perpetrator.
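A comparison of this kind can be sketched as follows: compute the mean of each statistic within each group, then look at how far each group sits from the overall average. The feature names, values and group labels below are invented for the example.

```python
import numpy as np

# Hypothetical perpetrator statistics and cluster labels, for illustration only.
feature_names = ["n_da_incidents", "n_da_victims", "mean_age"]
X = np.array([
    [1, 1, 25.0],
    [1, 1, 27.0],
    [4, 1, 40.0],
    [6, 1, 44.0],
    [3, 3, 33.0],
    [5, 2, 35.0],
])
labels = np.array([0, 0, 1, 1, 2, 2])


def group_profiles(X, labels):
    """Mean of each statistic per group, plus its difference from the overall mean."""
    overall = X.mean(axis=0)
    profiles = {}
    for g in np.unique(labels):
        means = X[labels == g].mean(axis=0)
        profiles[int(g)] = {"mean": means, "diff_from_overall": means - overall}
    return profiles


profiles = group_profiles(X, labels)
for g, p in profiles.items():
    print(g, dict(zip(feature_names, p["mean"].round(2))))
```

In this toy data, group 1 stands out for repeat incidents and group 2 for multiple victims, which is the sort of contrast that group labels can be built on.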
Some groups had perpetrators who were involved in more than one domestic abuse incident - repeat domestic abuse perpetrators. In other groups, perpetrators had more than one victim - these were serial domestic abuse perpetrators. One group’s domestic abuse involved serious and very severe offences - we labelled them catastrophic offenders. In another group, the perpetrators typically had only one incident of domestic abuse recorded against them, but the duration of the incident was calculated to be many years; these perpetrators had been reported after their victims had suffered years of abuse.
Our cluster analysis enabled us to draw a series of insights from the perpetrator groups, with the results corroborated by the qualitative data we collected. If you want to know more, please read our blog post on the research or explore our two published reports.