Neo4j is a high-performance NoSQL graph database that stores structured data in a network (mathematically, a graph) rather than in tables. It is an embedded, disk-based Java persistence engine with full transactional support. Neo4j can also be regarded as a high-performance graph engine with all the features of a mature database. Programmers work with a flexible, object-oriented network structure rather than strict, static tables, yet still get all the benefits of an enterprise-class database with full transactional features.
(1) After installing the Neo4j database, go to its bin directory, start the database from the console, and then open its address in a browser to access it.
(2) Import data using Python.
The relevant code is as follows:
1. Connect to the database
# coding:utf-8
from py2neo import Graph, Node, Relationship, NodeMatcher
import numpy as np
import pandas as pd

person_list = pd.read_csv('testdata1.txt', sep=' ')
X = person_list[["id", "child_id", "relation"]]  # Fetch the required columns
list = []
nodes = []
categories = []
# Connect to the Neo4j database with address, user name and password
graph = Graph('http://localhost:7474', auth=("neo4j", "root"))
2. Deduplicate the data and create nodes
# Create nodes
# Unique values of the id and child_id columns
# (DataFrame.iteritems() was removed in pandas 2.0, so the columns are read directly)
uniqueId = np.unique(X["id"].values)
uniqueChildId = np.unique(X["child_id"].values)
nodes.append(uniqueId)
nodes.append(uniqueChildId)
print(uniqueId)
print(uniqueChildId)
# Globally unique ids after merging id and child_id
globalUnique = np.unique(np.append(uniqueId, uniqueChildId))
print(globalUnique)
j = 0
for row in globalUnique:
    graph.create(Node('Consanguinity', name=int(row)))
    j += 1
print('node:' + str(j))
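The deduplication step above can also be sketched without pandas or NumPy, using a plain Python set. This is only an illustration: the sample rows and the helper name unique_person_ids are made up for the example and are not part of the original script.

```python
def unique_person_ids(rows):
    """Return the sorted, de-duplicated ids found in either id column.

    rows is an iterable of (id, child_id, relation) tuples, mirroring the
    three columns read from testdata1.txt.
    """
    ids = set()
    for person_id, child_id, _relation in rows:
        ids.add(int(person_id))
        ids.add(int(child_id))
    return sorted(ids)


# Hypothetical sample rows, not taken from the real data set
sample_rows = [(1, 2, "s"), (1, 3, "s"), (2, 3, "m"), (4, 1, "f")]
print(unique_person_ids(sample_rows))
```

Each id in the returned list would then become one node, exactly as globalUnique does in the script.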
3. Create relationships
# Create relationships
matcher = NodeMatcher(graph)  # reuse one matcher for all lookups
k = 0
for row in X.itertuples():
    list.append({"source": str(row.id), "target": str(row.child_id)})
    relation = ""
    if row.relation == 's':
        relation = "Son"
    elif row.relation == 'm':
        relation = "mother"
    elif row.relation == 'f':
        relation = "father"
    # Look up both end nodes by name; int() matches the type stored at creation
    start = matcher.match(name=int(row.id)).first()
    end = matcher.match(name=int(row.child_id)).first()
    rel = Relationship(start, relation, end)
    print(rel)
    graph.create(rel)
    k += 1
print('res:' + str(k))
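The if/elif chain that turns relation codes into relationship types can be written as a dictionary lookup. The sketch below runs without a database and only builds (start, type, end) triples. The helper name to_edges, the sample rows, and the choice to set aside rows with an unknown code are assumptions made for the example.

```python
# Relation codes and relationship types as used in the import script
RELATION_TYPES = {"s": "Son", "m": "mother", "f": "father"}


def to_edges(rows):
    """Turn (id, child_id, relation) rows into (start, type, end) triples.

    Rows with an unknown relation code are collected separately instead of
    producing a relationship with an empty type.
    """
    edges, skipped = [], []
    for person_id, child_id, code in rows:
        rel_type = RELATION_TYPES.get(code)
        if rel_type is None:
            skipped.append((person_id, child_id, code))
        else:
            edges.append((int(person_id), rel_type, int(child_id)))
    return edges, skipped


# Hypothetical sample rows; the last one carries an unknown code
sample_rows = [(1, 2, "s"), (2, 3, "m"), (4, 1, "x")]
edges, skipped = to_edges(sample_rows)
print(edges)
print(skipped)
```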
The whole import took about 2 hours for 100,000 records.
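A likely reason for the long running time is that every row triggers several separate requests to the server (two node lookups and one create). A commonly used alternative is to send rows in batches and let Cypher's UNWIND iterate over them. The sketch below only prepares the batches and the query text. The batch size, the helper name batches, and the sample rows are assumptions, and the query has not been run against the data set described here.

```python
from itertools import islice

# One query per relationship type; this one covers the "Son" type.
# The label and property names follow the import script.
CREATE_SON_QUERY = """
UNWIND $rows AS row
MATCH (a:Consanguinity {name: row.source})
MATCH (b:Consanguinity {name: row.target})
CREATE (a)-[:Son]->(b)
"""


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical rows; in the real script they would come from X.itertuples()
son_rows = [{"source": i, "target": i + 1} for i in range(5)]

for batch in batches(son_rows, size=2):
    # With a live connection this would be:
    #     graph.run(CREATE_SON_QUERY, rows=batch)
    print(len(batch), "rows in this batch")
```

With py2neo, the keyword argument rows=batch is passed to the query as the $rows parameter, so each batch costs one request instead of several per row.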
(3) The visualized data is as follows:
As the figure above shows, the logical relationships in the data are now visible, but some of them are inconsistent (for example, the same person is both a son and a mother).
In a big-data setting, data cannot be expected to be 100% correct and has to be corrected according to specific parameters, but the correction rules cannot be chosen arbitrarily. We have no further information here, so the logical errors are left uncorrected for now.
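Even without correcting anything, the inconsistent records can at least be listed. The following sketch flags ids that occur with both the 's' and the 'm' relation code. It assumes that the relation code describes the person in the id column, which the source data does not confirm. The helper name conflicting_ids and the sample rows are hypothetical.

```python
from collections import defaultdict


def conflicting_ids(rows, pair=("s", "m")):
    """Return ids that appear with both relation codes in `pair`.

    Assumption: the relation code describes the role of the person in the
    id column. If it describes child_id instead, group by that column.
    """
    roles = defaultdict(set)
    for person_id, _child_id, code in rows:
        roles[person_id].add(code)
    return sorted(pid for pid, codes in roles.items() if set(pair) <= codes)


# Hypothetical sample rows: person 1 is recorded as both son and mother
sample_rows = [(1, 2, "s"), (1, 3, "m"), (2, 3, "f"), (4, 1, "s")]
print(conflicting_ids(sample_rows))
```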
(4) Querying nodes
Query nodes by a specified property:
MATCH (n)-->(b) WHERE b.name = 1296
RETURN b
Query by a specified relationship type:
MATCH p=()-[r:Son]->() RETURN p
Query nodes by a specified label:
MATCH (n:Consanguinity) RETURN n
Query nodes by specified degree:
MATCH (k)
WITH k, size((k)--()) as degree
WHERE degree = 1
MATCH (k)--(n)
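RETURN n,k,degree
Note: on Neo4j 5 and later, size() no longer accepts a pattern; COUNT { (k)--() } can be used there instead.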
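To make clear what the degree query returns, here is a small in-memory sketch of the same idea: count the relationships touching each node, ignoring direction as (k)--() does, and keep the nodes with exactly one. The helper name degree_one_nodes and the sample edges are made up for illustration.

```python
from collections import Counter


def degree_one_nodes(edges):
    """Return {node: neighbour} for nodes touched by exactly one relationship.

    edges is a list of (start, end) pairs; direction is ignored.
    """
    degree = Counter()
    for start, end in edges:
        degree[start] += 1
        degree[end] += 1
    result = {}
    for start, end in edges:
        if degree[start] == 1:
            result[start] = end
        if degree[end] == 1:
            result[end] = start
    return result


# Hypothetical edges: nodes 2 and 4 each have a single relationship
sample_edges = [(1, 2), (1, 3), (3, 4)]
print(degree_one_nodes(sample_edges))
```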
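Find the shortest path between two nodes:
MATCH (p1:Consanguinity),(p2:Consanguinity),
p=shortestPath((p1)-[*..10]->(p2))
RETURN p
Note: [*..10] limits the search to paths of at most 10 relationships. In practice p1 and p2 would normally be restricted with property filters, since matching every pair of nodes is expensive.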
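The effect of the path-length limit can be illustrated with a breadth-first search over a plain edge list. This is a sketch of the general technique, not of how Neo4j implements shortestPath. The function name shortest_path, the max_hops parameter, and the sample edges are assumptions.

```python
from collections import deque


def shortest_path(edges, start, goal, max_hops=10):
    """Return the shortest directed path from start to goal as a list of
    nodes, or None if no path with at most max_hops relationships exists."""
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, []).append(dst)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 == max_hops:
            continue  # hop limit reached on this branch
        for nxt in neighbours.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical directed edges with two routes from 1 to 3
sample_edges = [(1, 2), (2, 3), (1, 4), (4, 5), (5, 3)]
print(shortest_path(sample_edges, 1, 3))
print(shortest_path(sample_edges, 1, 3, max_hops=1))
```

With the default limit the two-hop route through node 2 is found. With max_hops=1 no route qualifies and the function returns None, just as the query returns no path when the limit is too small.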