Neo4J Graph Database Anti-Fraud Analysis in Practice (IV) - Risk Scoring


1 Preface

Earlier articles in this Neo4J graph database anti-fraud series identified risky users. This article explains how to assign a risk score to each client.

2 Finding Similar Nodes

Last time, we used the Weakly Connected Components algorithm to cluster the data. Next, we can look for similar customers in each cluster.

WCC clustering results

This step measures similarity with the Node Similarity algorithm, which is based on the Jaccard coefficient. The algorithm compares nodes of one type by the nodes of another type they connect to, so the graph must contain two types of nodes. We therefore project a graph named Similarity that contains Client nodes, the identifier nodes (Email, Phone, SSN), and the relationships between them.

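// Project the in-memory graph 'Similarity' from two Cypher queries:
// the first returns the nodes, the second returns the relationships.
// Only clients that already belong to a WCC cluster are included.
CALL gds.graph.project.cypher('Similarity',
'MATCH (c:Client)
    WHERE c.firstPartyFraudGroup IS NOT NULL
    RETURN id(c) AS id, labels(c) AS labels
UNION
MATCH (n)
    WHERE n:Email OR n:Phone OR n:SSN
    RETURN id(n) AS id, labels(n) AS labels',
'MATCH (c:Client)-[:HAS_EMAIL|HAS_PHONE|HAS_SSN]->(ids)
    WHERE c.firstPartyFraudGroup IS NOT NULL
    RETURN id(c) AS source, id(ids) AS target')
YIELD graphName, nodeCount, relationshipCount;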

After creating the graph, we can run the node similarity algorithm (note: the algorithm supports weighting, see the documentation for details):

CALL gds.nodeSimilarity.stream('Similarity',{topK:15})
YIELD node1,node2,similarity
RETURN gds.util.asNode(node1).id AS client1,
    gds.util.asNode(node2).id AS client2,similarity
ORDER BY similarity;
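
To make the similarity value concrete, here is a minimal Python sketch of the Jaccard coefficient for two clients, each represented by the set of identifiers it is linked to. The `jaccard` helper and the sample identifiers are hypothetical and only illustrate the formula; they are not part of GDS or of the dataset used in this series.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; returns 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical identifier sets for two clients (illustrative values only).
client_a = {"email:a@example.com", "phone:555-0100", "ssn:001"}
client_b = {"email:a@example.com", "phone:555-0100", "ssn:002"}

# The two clients share 2 identifiers out of 4 distinct ones.
score = jaccard(client_a, client_b)
print(score)
```

Two clients that share every identifier score 1, and two clients with nothing in common score 0.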

Use mutate mode to store the results in the in-memory graph as a new relationship type, SIMILAR_TO. Each SIMILAR_TO relationship carries a jaccardScore property that holds the similarity of the pair of clients it connects:

CALL gds.nodeSimilarity.mutate('Similarity',{topK:15,
  mutateProperty:'jaccardScore', mutateRelationshipType:'SIMILAR_TO'});

// Write the results from the in-memory graph to the database.
// Newer GDS releases name this procedure gds.graph.relationship.write.
CALL gds.graph.writeRelationship('Similarity','SIMILAR_TO','jaccardScore');

3 Creating Risk Scores

Next, we use the Degree Centrality algorithm, weighted by the jaccardScore generated above, to produce a firstPartyFraudScore for each client. The score is the sum of the jaccardScore values on a client's SIMILAR_TO relationships, so a higher score means the client shares identifying information with more clients in its cluster, or shares it more closely, and is therefore riskier.

CALL gds.degree.write('Similarity',{nodeLabels:['Client'],
    relationshipTypes:['SIMILAR_TO'],
    relationshipWeightProperty:'jaccardScore',
    writeProperty:'firstPartyFraudScore'});

Risk score distribution
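
As a rough illustration of what the weighted degree computes, the Python sketch below sums the jaccardScore of each client's outgoing SIMILAR_TO relationships. The relationship list and the `weighted_degree` helper are hypothetical, and the sketch assumes the default behaviour of counting outgoing relationships only.

```python
from collections import defaultdict

# Hypothetical SIMILAR_TO relationships: (source, target, jaccardScore).
similar_to = [
    ("c1", "c2", 0.5),
    ("c1", "c3", 0.25),
    ("c2", "c1", 0.5),
    ("c3", "c1", 0.25),
]


def weighted_degree(relationships):
    """Sum the relationship weights per source node (outgoing relationships only)."""
    scores = defaultdict(float)
    for source, _target, weight in relationships:
        scores[source] += weight
    return dict(scores)


# One score per client, analogous to firstPartyFraudScore.
fraud_scores = weighted_degree(similar_to)
print(fraud_scores)
```

Here c1 is similar to two other clients, so it ends up with the highest score.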

Finally, we take the 80th percentile of firstPartyFraudScore as the threshold and label every client that scores above it as FirstPartyFraudster. In practice, you can also use firstPartyFraudScore as a standalone feature in an anti-fraud strategy or model.

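// Compute the 80th percentile over all scored clients.
MATCH (c:Client)
WHERE c.firstPartyFraudScore IS NOT NULL
WITH percentileCont(c.firstPartyFraudScore, 0.8)
    AS firstPartyFraudThreshold
// Label every client whose score exceeds the threshold.
MATCH (c:Client)
WHERE c.firstPartyFraudScore > firstPartyFraudThreshold
SET c:FirstPartyFraudster;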
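
The thresholding step can be sketched in Python as follows. The `percentile_cont` helper is a hypothetical stand-in that uses linear interpolation between the two nearest ranks, which is an assumption about how percentileCont behaves rather than a copy of its implementation; the scores are made up for illustration.

```python
import math


def percentile_cont(values, p):
    """Percentile with linear interpolation between the two closest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = p * (len(ordered) - 1)   # fractional position in the sorted list
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical firstPartyFraudScore values per client.
scores = {"c1": 0.2, "c2": 0.5, "c3": 0.75, "c4": 1.0, "c5": 1.5}

threshold = percentile_cont(list(scores.values()), 0.8)

# Clients strictly above the threshold would receive the FirstPartyFraudster label.
flagged = sorted(client for client, s in scores.items() if s > threshold)
print(threshold, flagged)
```

Because the comparison is strict, a client whose score equals the threshold is not flagged.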

4 Summary

This concludes the Neo4J graph database anti-fraud series. Over the course of these articles, we covered the following topics:

  • Basics of graph structure
  • Common commands in Cypher, the Neo4J graph database query language
  • Common algorithms in anti-fraud applications