Neo4j: an easy-to-use graph database

Data visualization with Neo4j

Neo4j is a high-performance NoSQL graph database that stores structured data in a network (mathematically, a graph) rather than in tables. It is an embedded, disk-based Java persistence engine with full transaction support. Neo4j can also be regarded as a high-performance graph engine with all the features of a mature database. Programmers work with a flexible, object-oriented network structure instead of strict, static tables, yet they still get all the benefits of an enterprise-class database with full transactional features.

 

    (1) After installing the Neo4j database, go to its bin directory and start the database from the console, then open the database's address in a browser to access it.

    (2) Import the data using Python.

The relevant code is as follows:

1. Connect to the database

# coding:utf-8
from py2neo import Graph, Node, Relationship, NodeMatcher
import numpy as np
import pandas as pd

person_list = pd.read_csv('testdata1.txt', sep=' ')
X = person_list[["id", "child_id", "relation"]]  # Fetch the required data

list = []
nodes = []
categories = []

# Connect to the Neo4j database: address, user name and password
graph = Graph('http://localhost:7474', auth=("neo4j", "root"))
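The contents of testdata1.txt are not shown in this post. Judging from the read_csv call, the script expects a space-separated text file whose header line names at least the columns id, child_id and relation. The sketch below parses a made-up three-row sample with the standard csv module, only to illustrate that layout; the sample values and the helper name read_triples are invented, not taken from the real data set.

```python
import csv
import io

# Hypothetical sample in the layout the import script appears to expect:
# a header line, then one "id child_id relation" triple per line.
SAMPLE = """id child_id relation
1 2 f
3 2 m
2 4 s
"""

def read_triples(text):
    """Parse space-separated rows into (id, child_id, relation) tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter=' ')
    return [(int(r["id"]), int(r["child_id"]), r["relation"]) for r in reader]

triples = read_triples(SAMPLE)
print(triples)
```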

 

2. Deduplicate the data and create the nodes

# Create nodes

# Unique values of the id column and of the child_id column
# (DataFrame.iteritems() was removed in pandas 2.0, so the columns are read directly)
uniqueId = np.unique(X["id"].values)
nodes.append(uniqueId)
print(uniqueId)
uniqueChildId = np.unique(X["child_id"].values)
nodes.append(uniqueChildId)
print(uniqueChildId)

# Merge both columns and deduplicate again, so that every person gets exactly one node
globalUnique = np.unique(np.append(uniqueId, uniqueChildId))
print(globalUnique)
j = 0
for row in globalUnique:
    graph.create(Node('Consanguinity', name=int(row)))
    j += 1
    print('node:' + str(j))

 

 

3. Create relationships

 

# Create relationships
matcher = NodeMatcher(graph)  # reuse one matcher instead of building two per row
k = 0
for row in X.itertuples():
    list.append({"source": str(row.id), "target": str(row.child_id)})
    # Map the one-letter code in the data to a relationship type
    relation = ""
    if row.relation == 's':
        relation = "Son"
    elif row.relation == 'm':
        relation = "mother"
    elif row.relation == 'f':
        relation = "father"
    # Look up both existing nodes by name; the names were stored as plain ints
    rel = Relationship(matcher.match(name=int(row.id)).first(),
                       relation,
                       matcher.match(name=int(row.child_id)).first())
    print(rel)
    k += 1
    print('res:' + str(k))
    graph.create(rel)

 

The whole import took about 2 hours for 100,000 records.
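Much of that time is probably spent on round trips: every node and relationship is created with its own call, and every relationship also needs two node lookups. One common way to reduce this is to group the rows by relationship type and send them in batches through a Cypher UNWIND query. This is a sketch under assumptions and was not measured on the original data. The helper names, the batch size and the whitelist are hypothetical; only the label and the relationship type names come from the script above.

```python
from collections import defaultdict

# Relationship types used by the import script, keyed by the code in the data.
REL_TYPES = {"s": "Son", "m": "mother", "f": "father"}

def build_query(rel_type):
    """Cypher that creates one relationship per entry of the $rows parameter.

    Relationship types cannot be passed as query parameters, so the type is
    formatted into the text; it must come from the REL_TYPES whitelist.
    """
    if rel_type not in REL_TYPES.values():
        raise ValueError("unexpected relationship type: " + rel_type)
    return (
        "UNWIND $rows AS row "
        "MATCH (a:Consanguinity {name: row.source}) "
        "MATCH (b:Consanguinity {name: row.target}) "
        "CREATE (a)-[:`" + rel_type + "`]->(b)"
    )

def make_batches(triples, batch_size=1000):
    """Yield (query, rows) pairs with at most batch_size rows each."""
    grouped = defaultdict(list)
    for source, target, code in triples:
        if code in REL_TYPES:  # rows with an unknown code are skipped
            grouped[REL_TYPES[code]].append(
                {"source": int(source), "target": int(target)})
    for rel_type, rows in grouped.items():
        query = build_query(rel_type)
        for start in range(0, len(rows), batch_size):
            yield query, rows[start:start + batch_size]

# Invented example rows, batch size 1 to make the batching visible
batches = list(make_batches([(1, 2, "f"), (3, 2, "m"), (5, 6, "f")], batch_size=1))
for query, rows in batches:
    print(len(rows), query)
```

With py2neo, each pair could then be sent as graph.run(query, rows=rows). An index on the name property of the Consanguinity nodes would probably speed up the MATCH lookups as well.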

(3) The visualized data looks as follows:

 

 

 

The figure above shows the logical relationships in the data, but it also reveals a problem: some of these relationships are inconsistent (the same person is both a son and a mother).

In a big-data setting the data can never be 100% correct. It has to be corrected according to specific rules, and those rules cannot be chosen arbitrarily. We have no further information here, so we leave the logical errors uncorrected for now.
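Even without correcting anything, the conflicting records can at least be listed. The sketch below assumes that the relation code describes the role of id towards child_id, so that 's' marks id as a son and 'm' as a mother. That reading is a guess based on the import script, and the function name and example rows are made up for illustration.

```python
from collections import defaultdict

def find_role_conflicts(triples, conflicting=(("s", "m"),)):
    """Return the ids that appear with both codes of any conflicting pair."""
    roles = defaultdict(set)
    for person_id, _child_id, code in triples:
        roles[person_id].add(code)
    return sorted(
        pid for pid, codes in roles.items()
        if any(a in codes and b in codes for a, b in conflicting)
    )

# Invented example rows: person 7 is recorded both as a son and as a mother.
rows = [(7, 1, "s"), (7, 2, "m"), (8, 3, "f"), (9, 4, "s")]
print(find_role_conflicts(rows))
```

Further pairs, such as ("m", "f"), can be passed through the conflicting argument if they should count as errors too.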

(4) Query the nodes

Query nodes by a specified property:

MATCH (n)-->(b) WHERE b.name = 1296
RETURN b

Query by a specified relationship type (Son is the type created by the import script):

MATCH p=()-[r:Son]->() RETURN p

Query nodes with a specified label (Consanguinity is the label created by the import script):

MATCH (n:Consanguinity) RETURN n

Query nodes by a specified degree:

MATCH (k)
WITH k, size((k)--()) AS degree
WHERE degree = 1
MATCH (k)--(n)
RETURN n, k, degree
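The degree of a node is the number of relationships attached to it, regardless of direction. To make the query above concrete, here is a small plain-Python sketch that computes the same quantity from an edge list. The edge values are invented, and the sketch ignores special cases such as self-loops.

```python
from collections import Counter

def degrees(edges):
    """Count how many relationships touch each node, ignoring direction."""
    counts = Counter()
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return counts

edges = [(1, 2), (3, 2), (2, 4)]
deg = degrees(edges)
# Nodes with exactly one relationship, like WHERE degree = 1 in the query
leaves = sorted(n for n, d in deg.items() if d == 1)
print(leaves)
```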

 

Find the shortest path between two nodes:

MATCH (p1:Consanguinity {name:9311}), (p2:Consanguinity {name:365}),
p=shortestPath((p1)-[*..10]->(p2))
RETURN p

Note: [*..10] limits the search to paths of at most 10 relationships.
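Conceptually, a shortest path with a hop limit can be found with a breadth-first search that follows the direction of the relationships and stops expanding once the limit is reached. The following plain-Python sketch illustrates that idea on an invented edge list. It is not a description of how Neo4j implements shortestPath internally, and the function name and max_hops parameter are assumptions for this example.

```python
from collections import deque

def shortest_path(edges, start, goal, max_hops=10):
    """Breadth-first search along edge direction, giving up after max_hops."""
    outgoing = {}
    for source, target in edges:
        outgoing.setdefault(source, []).append(target)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue  # this path already uses max_hops relationships
        for nxt in outgoing.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no path within the hop limit

edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(shortest_path(edges, 1, 4))
```

Because the search respects direction, asking for a path from 4 back to 1 in this example finds nothing, just as the directed pattern in the query would.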

 

Tags: neo4j

Posted on Tue, 30 Nov 2021 08:44:02 -0500 by Alexhoward