Neo4J Graph Database Anti-Fraud Analysis in Practice (I) - Set Up Analysis Environment
1 Introduction
In the article An Introduction to Graph Databases, we introduced the basic components of a graph, such as nodes, properties, and relationships, and also briefly described the advantages of graph databases over traditional relational databases. Starting from this article, we will use anti-fraud analysis as an example to practice using graph databases in real-world applications!
In a fraud scenario, bad actors typically obtain real customer identity information through channels such as phishing, malware, and the dark web. With this information, they can take over accounts by tampering with verification information or directly using this information for fraudulent transactions. Moreover, bad actors usually have a certain number of devices, identity cards, emails, phone numbers, and other pieces of information to pass service provider verification checks. Therefore, there is a certain degree of connection between the hijacked account and the information of these bad actors. This practical series focuses on using graph databases to mine potential connections behind fraud, and this article will introduce the basic knowledge of the analysis environment and graph database queries.
2 Neo4J Graph Database
Neo4J is a graph database solution provider, and its official website provides a series of tutorials and research environments for readers to learn. This practical exercise uses the sample data and computing resources provided by its Sandbox, which provides free instances for researchers to experiment with.
To get started, click "Free Use" to create a database, and find Fraud Detection
to start exploring (remember the password)!
3 Cypher Query Language
Just as SQL (Structured Query Language) is used to query relational databases, Neo4J Graph Database also has its own query language - Cypher. In Cypher, parentheses () are used to query nodes, such as (p:Person)
, where p
is a variable and Person
is the type of the node.
Square brackets should be used to query relationships, such as [w:WORKS_FOR]
, where w
is a variable and WORKS_FOR
is the type of relationship. The combination of the two can query the graph that meets the conditions, such as querying the actors who appear in a movie:
MATCH (p:Person)-[relatedTo]-(m:Movie {title: "Cloud Atlas"})
RETURN p, m, relatedTo
Here, MATCH
defines the matching rules, and RETURN
defines what to return. Properties are defined using curly braces {}, such as querying people who have acted in movies with Tom Hanks:
MATCH (tom:Person {name: 'Tom Hanks'})-[a:ACTED_IN]->(m:Movie)<-[rel:ACTED_IN]-(p:Person)
return p, a, rel, m, tom
Cypher does not require table joins, so its query statements are as smooth as natural language. Neo4J provides a complete Cypher operation manual, which is also a good reference.
4 Using Graph Databases
After understanding the basics above, you can officially enter the graph database. Click on the Fraud Detection
graph database and open it in the browser to see the operation interface. The sidebar displays information about the database, and the main interface on the right is the area where queries are constructed. The anti-fraud practical exercise can now officially begin!