Big data and the benefits of starting small

The NHSCFA’s data analytics work on invoicing fraud shows how big ambitions can be achieved step by step, says David Dixon (Information Analytics Lead).

Published: 30 October 2018


In the world of big data, the emphasis for information analytics seems to be on the vast. The subject is characterised by data explorations across a huge number of records and fields. However, this sense of scale can cloud an important truth: starting small is sometimes best.

A good example of this is provided by our work on procurement and commissioning fraud. This is an issue of concern across the whole of the NHS, with indications from intelligence and criminal investigations that procurement rules are not being adhered to. In 2017 NHS Protect, the NHSCFA’s predecessor organisation, determined the loss to fraud in the NHS procurement and commissioning business process to be approximately £252m (the figure has since been updated to £266m). This formed a strong basis for the decision to make invoice fraud a priority project for the organisation at that time, and for the continued focus on this area to date.

Proactive data analysis

When work on this priority area started in April 2017, we recognised a need to carry out proactive data analysis exercises on NHS procurement/invoicing data.

This task posed some key challenges for us in the Information Analytics team:

  • The wide scope of the field of procurement and commissioning within the NHS and the need to identify an appropriate scope for analytics to focus on.
  • The reliance for analysis on data that is not held centrally by the NHSCFA.
  • The (at the time) limited organisational knowledge of the procurement process in terms of data capture and transfer, particularly electronic information captured at point of payment.

The first issue and some possible solutions are touched upon in my recent article about our approach to data analytics. However, the two remaining issues were addressed by our choice of approach: a pilot data analysis exercise with two NHS organisations in Somerset, Taunton & Somerset NHS Foundation Trust and Somerset Partnership NHS Foundation Trust.

The Somerset pilot

One might question how such a wide issue could be tackled in this manner. Two NHS organisations only? How much data would that provide? Surely the temptation would be to gather as much data as possible from as many sources as possible?

However, the evidence supports our approach. Thanks in no small part to the enthusiasm of the organisations' Local Counter Fraud Specialist and their colleagues, and to our shared vision for the work, the project moved swiftly. The scope was small and tidy, and engagement with the relevant experts was direct and regular. The small scale allowed swift data sharing, rapid exploration of the data and continued collaboration throughout the project. The work was agile and dynamic, and the close working allowed queries and questions to be raised and addressed quickly.

We examined a sample of invoices drawn from two separate datasets: one reflected the organisations' submission to the National Fraud Initiative and the other was a direct extract from their own invoice system. We examined each dataset to identify where the data indicated that payment systems may be vulnerable to fraud. We then created data models and logic-based approaches which allowed us to identify where gaps in the data, inconsistencies or duplication may give rise to concern. Such findings are not necessarily indicative of fraudulent behaviour directly, but they indicate where the controls in place might fail to record or highlight it if it occurred.
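To make the idea of logic-based checks concrete, the sketch below shows how gaps, inconsistencies and duplication might be flagged in a small set of invoice records. It is an illustration only and not the models built during the pilot: the field names (supplier, invoice_no, po_number and so on), the sample records and the three checks are all assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical invoice records; every field name and value is an assumption.
invoices = [
    {"id": 1, "supplier": "ACME LTD", "invoice_no": "INV-001", "amount": 1200.00,
     "invoice_date": date(2017, 4, 3), "paid_date": date(2017, 4, 20),
     "po_number": "PO-77"},
    {"id": 2, "supplier": "Acme Ltd ", "invoice_no": "inv-001", "amount": 1200.00,
     "invoice_date": date(2017, 4, 3), "paid_date": date(2017, 5, 2),
     "po_number": None},
    {"id": 3, "supplier": "Beta Supplies", "invoice_no": "B-42", "amount": 310.50,
     "invoice_date": date(2017, 5, 10), "paid_date": date(2017, 5, 1),
     "po_number": "PO-81"},
]

# Fields that this illustrative check expects to be populated.
REQUIRED_FIELDS = ("supplier", "invoice_no", "amount", "po_number")


def _norm(value):
    """Normalise text so that case and stray spaces do not hide a match."""
    return (value or "").strip().lower()


def find_gaps(records):
    """Map record id to the required fields that are empty or missing."""
    gaps = {}
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "")]
        if missing:
            gaps[rec["id"]] = missing
    return gaps


def find_inconsistencies(records):
    """Return ids of records that appear to be paid before they were invoiced."""
    return [
        rec["id"]
        for rec in records
        if rec.get("paid_date") and rec.get("invoice_date")
        and rec["paid_date"] < rec["invoice_date"]
    ]


def find_duplicates(records):
    """Group ids sharing the same supplier, invoice number and amount."""
    groups = defaultdict(list)
    for rec in records:
        key = (_norm(rec.get("supplier")), _norm(rec.get("invoice_no")),
               round(rec["amount"], 2))
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


print(find_gaps(invoices))
print(find_inconsistencies(invoices))
print(find_duplicates(invoices))
```

As in the pilot, a flagged record in a sketch like this is a prompt for a closer look at the controls, not evidence of fraud in itself.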

What we learnt and what’s next

In terms of scale, the pilot was small. However, the results were very encouraging and changed everything about the longer-term project and its objectives. As space allows only an outline of the project, which continues to date, I can only summarise the outcomes briefly:

  • We produced reports detailing our findings, key areas of best practice and areas for improvement, including evidence for all findings.
  • We built rule-based models which enable the use of business logic to highlight discrepancies in invoice data – the models can be adapted for use at larger scale and across different types of organisations.
  • We learnt the value of linking the analytics capability of our team with the expertise and know-how of NHS providers – this is invaluable as we review the lessons from this work and explore wider applications.
  • The findings have informed and assisted the Fraud Prevention team in progressing their own pilot to develop recommendations to enhance preventative measures on procurement and commissioning fraud.
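One way to picture how rule-based models can be adapted across different types of organisations is to treat each rule as a named check and to build the rule set from parameters that vary by organisation. The sketch below assumes two hypothetical rules and a hypothetical approval_limit parameter; none of these are taken from the actual models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A named business-logic check; check returns True when an invoice is flagged."""
    name: str
    check: Callable[[dict], bool]


def make_rules(approval_limit):
    """Build a rule set for one organisation (approval_limit is an assumed parameter)."""
    return [
        Rule("missing_po_number", lambda inv: not inv.get("po_number")),
        Rule("over_approval_limit", lambda inv: inv["amount"] > approval_limit),
    ]


def apply_rules(invoices, rules):
    """Map invoice id to the names of the rules it triggers, skipping clean invoices."""
    flagged = {}
    for inv in invoices:
        hits = [rule.name for rule in rules if rule.check(inv)]
        if hits:
            flagged[inv["id"]] = hits
    return flagged


# Hypothetical sample data, used here for both organisations.
sample = [
    {"id": "A1", "amount": 4800.0, "po_number": "PO-10"},
    {"id": "A2", "amount": 250.0, "po_number": ""},
    {"id": "A3", "amount": 900.0, "po_number": "PO-11"},
]

large_org = apply_rules(sample, make_rules(approval_limit=5000.0))
small_org = apply_rules(sample, make_rules(approval_limit=1000.0))
print(large_org)
print(small_org)
```

Keeping the rules separate from the code that applies them means the same logic can be rerun with different thresholds, or with extra rules, as the data grows in scale.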

However, the best way to summarise the benefits is this: a year and a half later, as the NHSCFA prepares for its largest analysis of invoice data to date, everything that has happened since, from the strategic direction of analysis activity to the development of each component of the data models and logic-based analysis, stems from the Somerset pilot.

References to the "findings of the Somerset pilot" have been made across the organisation and beyond and, as we define objectives for our 2019-20 business plan, we move forward from a strong position and with a clear indication of how to proceed, what we want to look at and how we would approach it.

In hindsight, we recognise that giving in to the temptation to 'go big' would have carried a high risk of failure. Quite apart from the danger of drowning in datasets too vast to explore, with too little understanding of the meaning behind them, the difficulty of engagement, of scoping and of collaborating would have been too great.

So the lesson is clear – keep your ambitions big, but always start small.
