Introducing a new project! ng_ai: NebulaGraph’s graph algorithm suite, a user-friendly high-level Python Algorithm API for NebulaGraph. Its goal is to enable data scientist users of NebulaGraph to perform graph-related algorithmic tasks with minimal code.
1 NebulaGraph AI Suite
NebulaGraph 3.5.0 was released this week, and @whitewum suggested that we make ng_ai, a new project launched in the NebulaGraph community, public. This post is the first to introduce ng_ai!
1.1 What is ng_ai
ng_ai stands for NebulaGraph AI Suite. As the name suggests, it is a Python suite for running algorithms on NebulaGraph. Its goal is to give data scientists who use NebulaGraph a natural, concise high-level API for graph algorithm tasks with minimal code.
1.2 Features
Simplifying things in surprising ways.
To provide a smooth algorithmic experience for NebulaGraph community users, ng_ai has the following features:
Tight integration with NebulaGraph.
Support for multiple engines and backends, currently supporting Spark (NebulaGraph Algorithm) and NetworkX, with plans to support DGL and PyG in the future.
User-friendly and intuitive API design.
Seamless integration with NebulaGraph’s UDF, allowing ng_ai tasks to be called from queries.
Friendly custom algorithm interface, making it easy for users to implement their own algorithms (WIP).
One-click playground setup (based on Docker Extension).
2 Demos
2.1 Run PageRank
We can run distributed PageRank with the Spark (NebulaGraph Algorithm) backend:
    from ng_ai import NebulaReader

    # read data with spark engine, scan mode
    reader = NebulaReader(engine="spark")
    reader.scan(edge="follow", props="degree")
    df = reader.read()

    # run pagerank algorithm
    pr_result = df.algo.pagerank(reset_prob=0.15, max_iter=10)
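To make the two parameters concrete, here is a minimal pure-Python sketch of PageRank on a toy edge list. It is not the implementation that ng_ai or NebulaGraph Algorithm runs. The function name pagerank_sketch and the toy edges are hypothetical, and the sketch assumes that reset_prob is the random-jump probability (one minus the usual damping factor) and that max_iter is the number of iterations.

```python
from collections import defaultdict


def pagerank_sketch(edges, reset_prob=0.15, max_iter=10):
    """Power-iteration PageRank over a list of (src, dst) edges."""
    nodes = sorted({v for edge in edges for v in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Every node first receives its share of the random jump.
        new_rank = {v: reset_prob / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = (1.0 - reset_prob) * rank[v] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = (1.0 - reset_prob) * rank[v] / n
                for u in nodes:
                    new_rank[u] += share
        rank = new_rank
    return rank


# Hypothetical "follow" edges between three vertices.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
ranks = pagerank_sketch(toy_edges)
print(sorted(ranks, key=ranks.get, reverse=True))
```

The scores always sum to 1, so they can be read as the share of attention each vertex receives. A larger reset_prob flattens the differences between vertices.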
2.2 Write Algo Result to NebulaGraph
Assuming we want to run a label propagation algorithm and write the results back to NebulaGraph, we can do the following:
First, make sure that the schema of the TAG to write to has been created. Here the result is written to the label_propagation.cluster_id field:
    from ng_ai import NebulaWriter
    from ng_ai.config import NebulaGraphConfig

    config = NebulaGraphConfig()
    # df_result is assumed to hold the label propagation result
    writer = NebulaWriter(
        data=df_result, sink="nebulagraph_vertex", config=config, engine="spark"
    )

    # map column lpa into property cluster_id
    properties = {"lpa": "cluster_id"}

    writer.set_options(
        tag="label_propagation",
        vid_field="_id",
        properties=properties,
        batch_size=256,
        write_mode="insert",
    )
    # write back to NebulaGraph
    writer.write()
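As a rough illustration of what options such as properties, batch_size and write_mode="insert" could translate to, the sketch below renames result columns to tag properties and groups rows into batched nGQL INSERT VERTEX statements. This is an assumption about the general approach, not ng_ai's actual writer. The helpers format_value and build_insert_statements and the sample rows are hypothetical, and string VIDs are assumed.

```python
def format_value(value):
    """Render a Python value as an nGQL literal (strings are not escaped here)."""
    if isinstance(value, str):
        return '"{}"'.format(value)
    return str(value)


def build_insert_statements(rows, tag, vid_field, properties, batch_size):
    """Yield one INSERT VERTEX statement per batch of rows.

    rows: list of dicts, e.g. {"_id": "player100", "lpa": 1}
    properties: mapping from result column name to tag property name
    """
    columns = list(properties)  # result columns, in a fixed order
    prop_names = ", ".join(properties[c] for c in columns)
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        values = ", ".join(
            "{}:({})".format(
                format_value(row[vid_field]),
                ", ".join(format_value(row[c]) for c in columns),
            )
            for row in batch
        )
        yield "INSERT VERTEX {}({}) VALUES {};".format(tag, prop_names, values)


# Hypothetical label propagation output for three vertices.
sample_rows = [
    {"_id": "player100", "lpa": 1},
    {"_id": "player101", "lpa": 1},
    {"_id": "player102", "lpa": 2},
]
statements = list(
    build_insert_statements(
        sample_rows,
        tag="label_propagation",
        vid_field="_id",
        properties={"lpa": "cluster_id"},
        batch_size=2,
    )
)
for statement in statements:
    print(statement)
```

Batching keeps each statement at a bounded size, which is the usual reason a writer exposes a batch_size option.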
Finally, we can verify the results:
    USE basketballplayer;
    MATCH (v:label_propagation)
    RETURN id(v), v.label_propagation.cluster_id LIMIT 3;
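For readers new to label propagation, the idea behind the cluster_id values can be sketched in plain Python: every vertex starts in its own cluster and repeatedly adopts the most common label among its neighbors. This is a simplified, deterministic variant written for illustration (edges treated as undirected, ties broken by the smallest label), not the Spark implementation that ng_ai calls. The function name and toy edges are hypothetical.

```python
from collections import Counter, defaultdict


def label_propagation_sketch(edges, max_iter=10):
    """Assign a cluster label to every vertex that appears in edges."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)  # treat edges as undirected

    labels = {v: v for v in neighbors}  # each vertex starts as its own cluster
    for _ in range(max_iter):
        changed = False
        for v in sorted(neighbors):
            counts = Counter(labels[u] for u in neighbors[v])
            best = max(counts.values())
            # Tie-break: smallest label among the most frequent ones.
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break  # labels are stable
    return labels


# Two hypothetical triangles with no edge between them.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
]
clusters = label_propagation_sketch(toy_edges)
print(clusters)
```

On this toy graph each triangle ends up with a single shared label, which is the kind of value that lands in the cluster_id property.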
Since NebulaGraph 3.5.0, users can write their own UDFs and call them from nGQL. ng_ai uses this capability to provide an ng_ai() function that runs ng_ai algorithms from nGQL, for example:
    -- Prepare the write schema
    USE basketballplayer;
    CREATE TAG IF NOT EXISTS pagerank(pagerank string);
    :sleep 20;
    -- Call with ng_ai()
    RETURN ng_ai("pagerank", ["follow"], ["degree"], "spark", {space: "basketballplayer", max_iter: 10}, {write_mode: "insert"})
In a local environment, ng_ai supports running algorithms based on NetworkX, for example:
Read the graph as an ng_ai object:
    from ng_ai import NebulaReader
    from ng_ai.config import NebulaGraphConfig

    # read data with nebula/networkx engine, query mode
    config_dict = {
        "graphd_hosts": "graphd:9669",
        "user": "root",
        "password": "nebula",
        "space": "basketballplayer",
    }
    config = NebulaGraphConfig(**config_dict)
    reader = NebulaReader(engine="nebula", config=config)
    reader.query(edges=["follow", "serve"], props=[["degree"], []])
    g = reader.read()
Write an algorithm result back to NebulaGraph (pr_result here is assumed to be a PageRank result already computed on g):
    from ng_ai import NebulaWriter

    writer = NebulaWriter(
        data=pr_result,
        sink="nebulagraph_vertex",
        config=config,
        engine="nebula",
    )

    # properties to write
    properties = ["pagerank"]

    writer.set_options(
        tag="pagerank",
        properties=properties,
        batch_size=256,
        write_mode="insert",
    )
    # write back to NebulaGraph
    writer.write()
Other algorithms are similar, for example:
    # get all algorithms
    g.algo.get_all_algo()

    # get help of each algo
    help(g.algo.node2vec)

    # call the algo
    g.algo.node2vec()
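One plausible way to obtain an API shaped like g.algo.get_all_algo() is a small accessor object whose public methods are the algorithms, so that the list of algorithms can be derived by introspection. The sketch below only illustrates that pattern. AlgoAccessor, ToyGraph and the two toy algorithms are hypothetical and are not ng_ai's real classes.

```python
class AlgoAccessor:
    """Hypothetical namespace object holding the algorithms for one graph."""

    def __init__(self, adjacency):
        self._adjacency = adjacency

    def get_all_algo(self):
        """List public algorithm names by introspection."""
        return sorted(
            name
            for name in dir(self)
            if not name.startswith("_")
            and name != "get_all_algo"
            and callable(getattr(self, name))
        )

    def degree(self):
        """Number of neighbors per vertex."""
        return {v: len(nbrs) for v, nbrs in self._adjacency.items()}

    def isolated(self):
        """Vertices without any neighbor."""
        return [v for v, nbrs in self._adjacency.items() if not nbrs]


class ToyGraph:
    """Hypothetical graph wrapper exposing algorithms under .algo."""

    def __init__(self, adjacency):
        self.algo = AlgoAccessor(adjacency)


toy = ToyGraph({"a": ["b"], "b": ["a"], "c": []})
print(toy.algo.get_all_algo())   # ['degree', 'isolated']
print(toy.algo.degree.__doc__)   # docstrings are what help() would show
print(toy.algo.degree())
```

With this layout, adding a method to the accessor is enough for it to show up in get_all_algo() and in help().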
The results can also be drawn with matplotlib, coloring nodes by Louvain community and sizing them by PageRank score:
    import matplotlib.pyplot as plt
    import networkx as nx
    from matplotlib.colors import ListedColormap


    def draw_graph_louvain_pr(
        G,
        pr_result,
        louvain_result,
        colors=[
            "#1984c5", "#22a7f0", "#63bff0", "#a7d5ed", "#e2e2e2",
            "#e1a692", "#de6e56", "#e14b31", "#c23728",
        ],
    ):
        # Define positions for the nodes
        pos = nx.spring_layout(G)

        # Create a figure and set the axis limits
        fig, ax = plt.subplots(figsize=(35, 15))
        ax.set_xlim(-1, 1)
        ax.set_ylim(-1, 1)

        # Create a colormap from the colors list
        cmap = ListedColormap(colors)

        # Draw the nodes and edges of the graph
        node_colors = [louvain_result[node] for node in G.nodes()]
        node_sizes = [70000 * pr_result[node] for node in G.nodes()]
        nx.draw_networkx_nodes(
            G, pos=pos, ax=ax, node_color=node_colors, node_size=node_sizes,
            cmap=cmap, vmin=0, vmax=max(louvain_result.values()),
        )
        nx.draw_networkx_edges(
            G, pos=pos, ax=ax, edge_color='gray', width=1,
            connectionstyle='arc3, rad=0.2', arrowstyle='-|>', arrows=True,
        )

        # Extract edge labels as a dictionary
        edge_labels = nx.get_edge_attributes(G, 'label')

        # Add edge labels to the graph
        for edge, label in edge_labels.items():
            ax.text(
                (pos[edge[0]][0] + pos[edge[1]][0]) / 2,
                (pos[edge[0]][1] + pos[edge[1]][1]) / 2,
                label, fontsize=12, color='black', ha='center', va='center',
            )

        # Add node labels to the graph
        node_labels = {
            n: G.nodes[n]['label'] if 'label' in G.nodes[n] else n
            for n in G.nodes()
        }
        nx.draw_networkx_labels(
            G, pos=pos, ax=ax, labels=node_labels, font_size=12, font_color='black'
        )

        # Add colorbar for community colors
        sm = plt.cm.ScalarMappable(
            cmap=cmap,
            norm=plt.Normalize(vmin=0, vmax=max(louvain_result.values())),
        )
        sm.set_array([])
        cbar = plt.colorbar(
            sm, ax=ax, ticks=range(max(louvain_result.values()) + 1), shrink=0.5
        )
        cbar.ax.set_yticklabels(
            [f'Community {i}' for i in range(max(louvain_result.values()) + 1)]
        )

        # Show the figure
        plt.show()


    # G is assumed to be the NetworkX graph behind g, and pr_result and
    # louvain_result dicts keyed by node (score and community id)
    draw_graph_louvain_pr(G, pr_result=pr_result, louvain_result=louvain_result)
Finally, we can visualize the results in Jupyter Notebook with %ng_draw!
    %ngql match p=(:player)-[]->() return p LIMIT 5
    %ng_draw
%ng_draw renders the result of the preceding query as a graph in the notebook.
3 Future Work
ng_ai is still under development, and there is a lot of work left to do:
Improve the reader modes: the NebulaGraph/NetworkX engine currently supports only query mode, and scan mode needs to be supported as well
Implement link prediction, node classification and other algorithms based on DGL (GNN), for example:
    model = g.algo.gnn_link_prediction()
    result = model.train()
    # query src, dst to be predicted

    model.predict(src_vertex, dst_vertices)
UDA, custom algorithm
Deployment tool
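Until the GNN-based API exists, the train/predict call pattern planned for link prediction can be illustrated with a non-GNN stand-in. The sketch below scores candidate links with the Jaccard coefficient of the two vertices' neighbor sets, a classical heuristic rather than a graph neural network. The class JaccardLinkPredictor, its methods and the toy edges are hypothetical and only mirror the shape of the planned interface.

```python
from collections import defaultdict


class JaccardLinkPredictor:
    """Hypothetical stand-in for a link prediction model."""

    def __init__(self, edges):
        self._edges = list(edges)
        self._neighbors = None

    def train(self):
        """'Training' here only indexes the neighbor set of each vertex."""
        neighbors = defaultdict(set)
        for src, dst in self._edges:
            neighbors[src].add(dst)
            neighbors[dst].add(src)
        self._neighbors = neighbors
        return self

    def predict(self, src_vertex, dst_vertices):
        """Return {dst: score}, where score is a Jaccard similarity in [0, 1]."""
        if self._neighbors is None:
            raise RuntimeError("call train() before predict()")
        src_nbrs = self._neighbors.get(src_vertex, set())
        scores = {}
        for dst in dst_vertices:
            dst_nbrs = self._neighbors.get(dst, set())
            union = src_nbrs | dst_nbrs
            scores[dst] = len(src_nbrs & dst_nbrs) / len(union) if union else 0.0
        return scores


# Hypothetical edges: "a" and "d" share both of their neighbors.
toy_edges = [("a", "b"), ("a", "c"), ("d", "b"), ("d", "c"), ("e", "f")]
model = JaccardLinkPredictor(toy_edges).train()
scores = model.predict("a", ["d", "e"])
print(scores)
```

A GNN-based model would replace the neighbor-set overlap with learned vertex embeddings, while the calling code could stay the same.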
ng_ai is completely built in public, and we welcome everyone in the community to participate in it and improve ng_ai together, making AI algorithms on NebulaGraph easier to use!
4 Try ng_ai
We have prepared a one-click deployment of NebulaGraph + Studio + ng_ai in a Jupyter environment. To try it out, you only need to search for NebulaGraph in the Extensions marketplace of Docker Desktop.
Search for NebulaGraph in the Docker Extensions marketplace and install it.
Install ng_ai playground
Go to the NebulaGraph extension and click Install NX Mode to install ng_ai’s NetworkX playground. The installation usually takes a few minutes.
Enter NetworkX playground
Click Jupyter NB NetworkX to enter NetworkX playground.
5 ng_ai Architecture
ng_ai is a Python library mainly composed of the following modules:
Reader: responsible for reading data from NebulaGraph
Writer: responsible for writing data to NebulaGraph
Engine: responsible for adapting different runtimes, such as Spark, DGL, NetworkX, etc.
Algo: algorithm module, such as PageRank, Louvain, GNN_Link_Predict, etc.
In addition, to support calls from nGQL, there are two more modules:
ng_ai-udf: responsible for registering the UDF in NebulaGraph, accepting ng_ai() calls from queries, and calling the ng_ai API
ng_ai-api: the API module of ng_ai, which is responsible for receiving requests from ng_ai-udf and calling the corresponding algorithm module
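The request path described above (query, then ng_ai-udf, then ng_ai-api, then the algorithm module) can be pictured as a small dispatcher. The sketch below is a hypothetical stand-in and not ng_ai's code: the registry, handle_udf_call and its argument names are assumptions modelled on the ng_ai(...) call shown earlier, and the "algorithm" is a trivial out-degree count instead of a real engine call.

```python
ALGORITHMS = {}


def register(name):
    """Decorator that adds a function to the algorithm registry."""
    def wrapper(func):
        ALGORITHMS[name] = func
        return func
    return wrapper


@register("out_degree")
def out_degree(edges, **_options):
    """Toy algorithm: count outgoing edges per source vertex."""
    counts = {}
    for src, _dst in edges:
        counts[src] = counts.get(src, 0) + 1
    return counts


def handle_udf_call(algo_name, edge_types, edge_props, mode,
                    algo_context, write_context, load_edges):
    """Hypothetical API entry point receiving the arguments of an ng_ai() call.

    load_edges is injected so the sketch runs without a database.
    """
    if algo_name not in ALGORITHMS:
        return {"error": "unknown algorithm: {}".format(algo_name)}
    # Everything in the context except the space name is an algorithm option.
    options = {k: v for k, v in algo_context.items() if k != "space"}
    edges = load_edges(algo_context["space"], edge_types, edge_props)
    result = ALGORITHMS[algo_name](edges, **options)
    # A real service would pass `result` to a writer configured by write_context.
    return {
        "algo": algo_name,
        "mode": mode,
        "write_mode": write_context.get("write_mode"),
        "result": result,
    }


def fake_load_edges(space, edge_types, edge_props):
    """Stand-in for a reader: returns hard-coded 'follow' edges."""
    return [
        ("player100", "player101"),
        ("player100", "player102"),
        ("player101", "player100"),
    ]


response = handle_udf_call(
    "out_degree", ["follow"], ["degree"], "networkx",
    {"space": "basketballplayer"}, {"write_mode": "insert"},
    fake_load_edges,
)
print(response)
```

Keeping the registry separate from the entry point mirrors the split between the API module and the algorithm module: new algorithms can be registered without touching the request handling.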