Mining Financial Data Without Actually Seeing It Can Detect Fraud

A representation of secure Multiparty Computation.

Large-scale data sharing is a potential goldmine for research, health, and security, but until recently this goldmine was largely inaccessible due to privacy concerns. Now, banks are starting to use secure Multiparty Computation (MPC) to detect potentially fraudulent transactions while protecting the privacy of their customers.

MPC distributes computations on data between several parties in such a way that none of the parties can see the raw data, yet the desired result can still be computed. Software to achieve this has been developed in recent years. A related concept is homomorphic encryption, which guarantees that certain classes of computations performed on encrypted data yield, once decrypted, the same result as the same computations performed on the raw data.
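To make the idea concrete, here is a minimal Python sketch of additive secret sharing, one of the simplest building blocks used in MPC. It is an illustrative choice: the article does not say which techniques the banks' software uses. Each party splits its private number into random shares that add up to it, so no single share reveals anything, yet the sum of everyone's inputs can still be reconstructed. The modulus, the function names, and the input values are hypothetical.

```python
import secrets

# Hypothetical modulus for this toy example; all arithmetic is done modulo it.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine shares; only the full set reveals the underlying value."""
    return sum(shares) % MODULUS


def secure_sum(private_inputs: list[int]) -> int:
    """Compute the sum of all parties' inputs without revealing any single input."""
    n = len(private_inputs)
    # Party i splits its input and (conceptually) sends share j to party j.
    all_shares = [share(x, n) for x in private_inputs]
    # Party j adds up the shares it received; each looks like a random number.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Publishing the partial sums reveals only the total.
    return reconstruct(partial_sums)


if __name__ == "__main__":
    balances = [1200, 560, 9030]  # hypothetical private inputs of three parties
    print(secure_sum(balances))  # 10790
```

In this toy version all parties are simulated in one process and are assumed to follow the protocol honestly; a real deployment would exchange the shares over authenticated channels between separate organizations.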

TNO, the Netherlands organization for applied scientific research, is working closely with two large Dutch banks, ABN AMRO and Rabobank, on a pilot project to detect suspicious financial transactions using MPC and an algorithm inspired by Google's PageRank algorithm. The basic idea is that networks of financial transactions can be analyzed in a similar fashion to the way a search engine determines the importance, or rank, of a website. A website is 'important' if other 'important' websites link to it; although this definition is self-referential, the PageRank algorithm can, after a number of iterations, produce a consistent ranking of websites.
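The iterative idea behind PageRank can be sketched in a few lines of Python. This is a textbook power-iteration version on a made-up three-page web; the damping factor of 0.85 is the conventional choice from the literature rather than something stated in the article, and the function name is hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Textbook PageRank by power iteration.

    `links` maps each page to the pages it links to. Pages without outgoing
    links spread their rank evenly over all pages, so the ranks always sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform ranking
    for _ in range(iterations):
        # Every page gets a small baseline share ("random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split equally over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(web).items()):
        print(page, round(score, 3))
```

Page "c" ends up with the highest rank because both other pages link to it: the self-referential definition settles into a consistent ranking after repeated rounds.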

In this case, bank accounts are the nodes of the network, and two accounts are linked if a money transfer has taken place between them. Unlike in web page ranking, a link can have a weight that depends on how often, and how much, money was transferred. An account gets a high risk score (for instance, for money laundering) if another high-risk account transferred money to it.
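Applied to accounts, the same iteration can propagate risk along weighted money flows. The sketch below is hypothetical and is not TNO's algorithm: it assumes that an edge's weight is the total amount transferred (which grows with both the frequency and the size of transfers), that some accounts start with seed risk scores because they were already flagged, and a damping factor of 0.85. All account names and amounts are made up.

```python
from collections import defaultdict


def build_weights(transfers):
    """Aggregate (sender, receiver, amount) records into edge weights.

    Assumption: an edge's weight is the total amount transferred, so both
    frequent and large transfers make the link stronger.
    """
    weights = defaultdict(float)
    for sender, receiver, amount in transfers:
        weights[(sender, receiver)] += amount
    return dict(weights)


def propagate_risk(weights, seed_risk, damping=0.85, iterations=50):
    """Spread risk from senders to receivers, PageRank style.

    Each round, an account keeps (1 - damping) times its seed score and
    receives damping times each sender's risk, split over that sender's
    outgoing transfers in proportion to their weight.
    """
    accounts = set(seed_risk)
    for sender, receiver in weights:
        accounts.update((sender, receiver))
    out_total = defaultdict(float)  # total outgoing weight per sender
    for (sender, _), w in weights.items():
        out_total[sender] += w
    risk = {a: seed_risk.get(a, 0.0) for a in accounts}
    for _ in range(iterations):
        new_risk = {a: (1 - damping) * seed_risk.get(a, 0.0) for a in accounts}
        for (sender, receiver), w in weights.items():
            new_risk[receiver] += damping * risk[sender] * w / out_total[sender]
        risk = new_risk
    return risk


if __name__ == "__main__":
    transfers = [
        ("flagged", "acct_x", 6000.0),
        ("flagged", "acct_x", 3000.0),
        ("flagged", "acct_y", 1000.0),
        ("acct_x", "acct_z", 500.0),
        ("clean", "acct_y", 200.0),
    ]
    scores = propagate_risk(build_weights(transfers), {"flagged": 1.0})
    for account, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{account}: {score:.4f}")
```

Here "acct_x" ends up riskier than "acct_y" because it received most of the flagged account's outgoing money, "acct_z" inherits risk one step further down the chain, and "clean" stays at zero because nothing risky flows into it.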

Each bank can create such a 'risk propagation network' for the accounts of its own clients because it has their financial transaction data, but many transactions happen between different banks. Risk scoring would improve significantly if the algorithm could add those external accounts to the network, but banks are hesitant to share these data because of their potential impact on privacy. Said Tjebbe Tauber, business developer for innovation and design at ABN AMRO's Detect Financial Crime unit, "We are carefully looking at what is, and what is not possible under the European privacy law."

Tauber is working on this project with Daniël Worm and Thomas Attema of TNO. So far, the risk propagation analysis has been run only on simulated data, with three banks each holding up to 100,000 accounts. Each bank has access to its own data, and to the data of the two other banks in encrypted form, using the Paillier homomorphic encryption scheme. This scheme is only additively homomorphic: encrypted values can be added together or multiplied by a known constant, but two encrypted values cannot be multiplied with each other. That limits the type of computation that can be done on encrypted data, but with a few adaptations it is sufficient to execute the PageRank algorithm. Each bank learns only the risk scores of its own accounts, but those scores are based on all the transaction data.
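The additive property of Paillier encryption can be demonstrated with a toy implementation in Python. Multiplying two ciphertexts yields an encryption of the sum of the plaintexts, and raising a ciphertext to a known integer power yields an encryption of the plaintext times that integer; together these allow a weighted sum, the core step of a PageRank-style update, to be computed on encrypted scores. This is a sketch only: the primes are far too small to be secure, risk scores are assumed to be scaled to non-negative integers, and the function names are hypothetical. The protocol TNO actually uses, including who holds the private key, is not described in the article.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Toy Paillier key generation from two distinct primes (insecure sizes)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1  # common simple choice of generator
    # mu = L(g^lam mod n^2)^-1 mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    """Encrypt 0 <= m < n with fresh randomness r coprime to n."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c: int) -> int:
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add(pub, c1: int, c2: int) -> int:
    """E(m1) * E(m2) mod n^2 decrypts to m1 + m2 mod n."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def scale(pub, c: int, k: int) -> int:
    """E(m)^k mod n^2 decrypts to k * m mod n, for a known plaintext k."""
    n, _ = pub
    return pow(c, k, n * n)


if __name__ == "__main__":
    pub, priv = keygen(1009, 1013)  # toy primes, insecure
    print(decrypt(pub, priv, add(pub, encrypt(pub, 30), encrypt(pub, 12))))  # 42

    # Encrypted risk scores from another bank (scaled to integers) combined with
    # plaintext weights known to the computing bank; all values are made up.
    encrypted_scores = [encrypt(pub, s) for s in (30, 5, 70)]
    weights = [2, 1, 3]
    acc = encrypt(pub, 0)
    for c, w in zip(encrypted_scores, weights):
        acc = add(pub, acc, scale(pub, c, w))
    print(decrypt(pub, priv, acc))  # 275 = 2*30 + 1*5 + 3*70
```

In this demo the same process holds the private key and decrypts the result; in a multiparty setting, ensuring that no bank can decrypt another bank's data is precisely the job of the protocol built around the encryption scheme.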

The experiments have shown that the computational burden (always a potential issue with MPC, which usually involves many rounds of communication between parties) is acceptable, and that it scales linearly with the number of accounts, which would allow the system to be expanded to many millions of bank accounts.

Tauber can't say yet when the risk propagation method will be applied to real-world data. "We want to innovate our capacity to detect financial crime, and these first results look promising."

Recently, Worm and Attema published a white paper on MPC on behalf of TNO; the goal, Worm explains, is to "crank up the flywheel" of this new technology by generating publicity for it. MPC has been a theoretical possibility for many years, but only now are real-world applications coming within reach. The first Dutch start-ups offering MPC services to customers, Roseman Labs and Linksight, opened in 2020.

However, the potential benefits of secure information sharing are far from being fully exploited, because so many stakeholders, many of whom have never heard of the concept, need to get involved. In the case of medical data, this would involve not only the researchers, but also patient organizations, hospitals, doctors, and the health ministry.

The value of secure information sharing is also acknowledged by the EU, which set up the HEAT (Homomorphic Encryption Application and Technology) project to produce a library of open source software for the application of this technology. TNO also makes its MPC software freely available on GitHub.

Ultimately, MPC and homomorphic encryption could become as deeply embedded in some software as public-key encryption now is in browsers. However, cautions Worm, "It is important to demonstrate that this can be done safely. At the end of the day, you want people to accept this."

Do MPC and homomorphic encryption take the personal out of personal data to such a degree that privacy laws like the European General Data Protection Regulation (GDPR) don't apply? After all, getting flagged by a black box algorithm as a high-risk account holder can have serious consequences. The Dutch privacy watchdog Autoriteit Persoonsgegevens — notoriously understaffed and passive — has not yet looked into this matter.

A blog post on the Roseman Labs website sings the praises of MPC as a universal "privacy-preserving tool," and backs this claim up with an EU-sponsored study.

Whether this is generally true remains to be decided, says Francien Dechesne, assistant professor in the Center for Law and Digital Technologies at the Netherlands' Leiden University. "If the result of the MPC can be traced back to an individual person, then you are not outside the framework of the GDPR," Dechesne said.

Dechesne said she expects that striking a balance between mining big data with MPC and protecting privacy will become a contentious issue, one that privacy watchdogs will need to study and discuss.

Arnout Jaspers is a freelance science writer based in Leiden, the Netherlands.