In [ ]:
%pip install nebula_llm ipython-ngql pyvis
Build KG
In [1]:
import os, sys
from nebula_llm.llms import OpenAI
from nebula_llm.graphs import NebulaGraph3 as Graph
import logging
# logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
Embedding & LLM Model
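Here we use qwen:14b, served locally with ollama. The cell below requests the qwen:14b-chat-v1.5-fp16 tag, so pull whichever tag you set in model_name:
ollama pull qwen:14b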
In [ ]:
from nebula_llm.models import HuggingFaceEmbeddingModel
embed = HuggingFaceEmbeddingModel(language="zh")
from nebula_llm.llms import OpenAI
llm = OpenAI(
    api_base="http://127.0.0.1:11434/v1",
    openai_api_key="none",
    system_prompt=None,
    model_name="qwen:14b-chat-v1.5-fp16",
    max_tokens=8192,
)
Prepare for NebulaGraph
In [3]:
%load_ext ngql
%ngql --address 127.0.0.1 --port 9669 --user root --password nebula
Connection Pool Created
Out[3]:
| | Name |
|---|---|
| 0 | nba |
| 1 | news |
In [4]:
space_name = "news"
In [41]:
# or recreate space
#%ngql DROP SPACE `{space_name}`;
Out[41]:
In [42]:
%ngql CREATE SPACE IF NOT EXISTS `{space_name}` (replica_factor = 1, vid_type=FIXED_STRING(256), partition_num=1);
Out[42]:
In [18]:
# uncomment this to clear all data in the KG
#%ngql CLEAR SPACE `{space_name}`;
In [6]:
%ngql USE `{space_name}`;
Out[6]:
In [45]:
%%ngql
CREATE TAG IF NOT EXISTS Person (name string, description string);
CREATE TAG IF NOT EXISTS Device (name string, description string);
CREATE TAG IF NOT EXISTS Equipment (name string, description string);
CREATE TAG IF NOT EXISTS Location (name string, description string);
CREATE TAG IF NOT EXISTS Event (name string, description string, `date` string);
CREATE TAG IF NOT EXISTS Cause (name string, description string);
CREATE TAG IF NOT EXISTS Impact (name string, description string);
CREATE TAG IF NOT EXISTS Organization (name string, description string);
CREATE EDGE IF NOT EXISTS occurred_at (`date` string, original_info string);
CREATE EDGE IF NOT EXISTS involves (original_info string);
CREATE EDGE IF NOT EXISTS leads_to (original_info string);
CREATE EDGE IF NOT EXISTS reflects (original_info string);
CREATE EDGE IF NOT EXISTS belongs_to (original_info string);
CREATE EDGE IF NOT EXISTS supports (original_info string);
CREATE EDGE IF NOT EXISTS opposes (original_info string);
CREATE EDGE IF NOT EXISTS collaborates_with (original_info string);
CREATE EDGE IF NOT EXISTS originates_from (original_info string);
CREATE EDGE IF NOT EXISTS affiliated_with (original_info string);
CREATE EDGE IF NOT EXISTS conflict_with (original_info string);
# wait for 10 seconds for the TAG to be created, then create the index
Out[45]:
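The comment in the cell above asks for a pause before the indexes are created, since new schema in NebulaGraph takes a moment to become usable. A minimal sketch of doing that wait from a Python cell instead of by hand; `wait_for_schema` is a hypothetical helper (not part of nebula_llm or ipython-ngql), and the 10-second default only mirrors the comment.

```python
import time


def wait_for_schema(seconds: float = 10.0, sleep=time.sleep) -> float:
    """Pause so newly created tags and edge types can take effect.

    `sleep` is injectable so the helper can be exercised without
    actually waiting (hypothetical convenience, assumed for this sketch).
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    sleep(seconds)
    return seconds
```

Calling `wait_for_schema()` in its own cell between the schema cell and the index cell keeps "Run All" from racing ahead of the schema.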
In [46]:
%%ngql
CREATE TAG INDEX IF NOT EXISTS person_name_index ON Person(name(256));
CREATE TAG INDEX IF NOT EXISTS device_name_index ON Device(name(256));
CREATE TAG INDEX IF NOT EXISTS equipment_name_index ON Equipment(name(256));
CREATE TAG INDEX IF NOT EXISTS location_name_index ON Location(name(256));
CREATE TAG INDEX IF NOT EXISTS event_name_index ON Event(name(256));
CREATE TAG INDEX IF NOT EXISTS cause_name_index ON Cause(name(256));
CREATE TAG INDEX IF NOT EXISTS impact_name_index ON Impact(name(256));
CREATE TAG INDEX IF NOT EXISTS organization_name_index ON Organization(name(256));
Out[46]:
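The eight index statements above all follow one pattern, so they can be generated from the tag list rather than typed out. A small sketch under that assumption; `name_index_statements` is a hypothetical helper, and it only builds the nGQL strings without executing them.

```python
TAGS = ["Person", "Device", "Equipment", "Location",
        "Event", "Cause", "Impact", "Organization"]


def name_index_statements(tags, length=256):
    """Return one CREATE TAG INDEX statement per tag, indexing its `name` property."""
    return [
        f"CREATE TAG INDEX IF NOT EXISTS {tag.lower()}_name_index "
        f"ON {tag}(name({length}));"
        for tag in tags
    ]


statements = name_index_statements(TAGS)
print("\n".join(statements))
```

The printed text can be pasted into a `%%ngql` cell; adding a tag then only requires extending `TAGS`.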
In [ ]:
# cleanup local files
#!rm -fr 127.0.0.1_news
In [48]:
os.environ["NEBULA_USER"] = "root"
os.environ["NEBULA_PASSWORD"] = "nebula"
os.environ["NEBULA_ADDRESS"] = "127.0.0.1:9669"
Create a NebulaLLM Graph instance.
Here we set is_property_graph=True and explicitly provide the tags, edge types, and their property names.
With a Graph, we can create knowledge readers such as Text2Cypher, SubGraph RAG, or CoE to read knowledge from NebulaGraph, or KG builders to write knowledge into NebulaGraph, enabling graph queries or LLM-augmented queries.
For instance, let's build a KG from unstructured data as a property graph, using builder_type="property_graph_builder":
In [50]:
graph = Graph(
    space=space_name,
    llm=llm,
    embedding_model=embed,
    with_vector_store=True,
    is_property_graph=True,
    tags=["Person", "Device", "Equipment", "Location", "Event", "Cause", "Impact", "Organization"],
    tag_prop_names=["name,description", "name,description", "name,description", "name,description", "name,description,date", "name,description", "name,description", "name,description"],
    edge_types=["occurred_at", "involves", "leads_to", "reflects", "belongs_to", "supports", "opposes", "collaborates_with", "originates_from", "affiliated_with", "conflict_with"],
    rel_prop_names=["date,original_info", "original_info", "original_info", "original_info", "original_info", "original_info", "original_info", "original_info", "original_info", "original_info", "original_info"],
    verbose=True,
)
Vector Store created at 127.0.0.1_news/vector_store.json
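The constructor call above passes tags and their properties as parallel lists, which are easy to misalign by one position. One way to avoid that, sketched here in plain Python, is to keep a single mapping per schema and derive the lists from it. `TAG_SCHEMA`, `EDGE_SCHEMA`, and `to_parallel_lists` are hypothetical names for illustration; only the resulting list shapes come from the cell above.

```python
TAG_SCHEMA = {
    "Person": ["name", "description"],
    "Device": ["name", "description"],
    "Equipment": ["name", "description"],
    "Location": ["name", "description"],
    "Event": ["name", "description", "date"],
    "Cause": ["name", "description"],
    "Impact": ["name", "description"],
    "Organization": ["name", "description"],
}

EDGE_SCHEMA = {
    "occurred_at": ["date", "original_info"],
    "involves": ["original_info"],
    "leads_to": ["original_info"],
    "reflects": ["original_info"],
    "belongs_to": ["original_info"],
    "supports": ["original_info"],
    "opposes": ["original_info"],
    "collaborates_with": ["original_info"],
    "originates_from": ["original_info"],
    "affiliated_with": ["original_info"],
    "conflict_with": ["original_info"],
}


def to_parallel_lists(schema):
    """Split {name: [props]} into (names, comma-joined props), keeping order."""
    names = list(schema)
    props = [",".join(schema[name]) for name in names]
    return names, props


tags, tag_prop_names = to_parallel_lists(TAG_SCHEMA)
edge_types, rel_prop_names = to_parallel_lists(EDGE_SCHEMA)
```

The four resulting lists can then be passed as `tags=`, `tag_prop_names=`, `edge_types=`, and `rel_prop_names=` exactly as in the cell above.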
In [51]:
# Extraction
graph.build_from_docs(
    doc_paths=["data/news/"],
    chunk_size=512,
    with_vector_store=True,
    verbose=True,
    builder_type="property_graph_builder",
    max_relationships_per_chunk=6,
    # extra_prompt=extra_prompt,
    # extraction_prompt_tmpl_str=DEFAULT_PROP_GRAPH_EXTRACTION_PROMPT_TMPL_STR_ZH,
)
Parsing nodes: 0%| | 0/2 [00:00<?, ?it/s]
Prompt: [Property Graph Extraction] ================= Output: ================= Response: [Property Graph Extraction] ================= raw_property_graph_str: ```json { "relationships": [ { "type": "conflict_with", "properties": { "original_info": "刚果民主共和国东部的反叛武装‘3·23’运动与政府军之间的冲突" }, "from_id": "联合国驻刚果民主共和国特派团维和人员", "to_id": "刚果民主共和国东部政府军" }, { "type": "involves", "properties": { "original_info": "八名联合国驻刚果民主共和国特派团的维和人员周六遭遇袭击并受伤" }, "from_id": "事件发生地附近", "to_id": "八名联合国维和人员" }, { "type": "occurred_at", "properties": { "date": "2024-03-17", "original_info": "周六遭遇袭击并受伤的事件" }, "from_id": "八名联合国维和人员受伤的时间点", "to_id": "2024年3月17日" } ], "entities": [ { "type": "Cause", "properties": { "name": "反叛武装‘3·23’运动与政府军之间的冲突", "description": "刚果民主共和国东部的持续冲突" }, "id": "冲突原因" }, { "type": "Person", "properties": { "name": "八名联合国驻刚果民主共和国特派团的维和人员", "description": "周六遭遇袭击并受伤的维和人员" }, "id": "受伤维和人员" }, { "type": "Location", "properties": { "name": "刚果民主共和国东部", "description": "冲突发生地,距离北基伍省首府戈马仅20公里的萨凯附近" }, "id": "冲突地点" } ] } ``` calling: qwen:14b-chat-v1.5-fp16 token_count: 3123 ================= Extracted Property Graph from ['data/news/']: Response: ================= raw_data: 2024年3月17日 联合国新闻报道 刚果民主共和国(简称刚果(金))东部的反叛武装“3·23”运动与政府军之间的冲突持续不断。在双方的敌对行动中,八名联合国驻刚果民主共和国特派团的维和人员周六遭遇袭击并受伤。联合国秘书长安东尼奥·古特雷斯对此表示谴责。 事件发生在距离北基伍省首府戈马仅20公里的萨凯附近。 受伤的维和人员正在执行去年11月发起的旨在保护该地区平民的跳羚行动。“3·23”运动与政府军之间不断发生战斗,联合国维和部队一直在协助政府军保护脆弱的平民,并因此而受伤。 联合国秘书长古特雷斯在通过发言人发表的声明中,以严厉的措辞谴责了这次袭击,并强调根据国际法,这可能构成战争罪。 edges: [{'type': 'conflict_with', 'properties': {'original_info': '刚果民主共和国东部的反叛武装‘3·23’运动与政府军之间的冲突'}, 'from_id': '联合国驻刚果民主共和国特派团维和人员', 'to_id': '刚果民主共和国东部政府军'}, {'type': 'involves', 'properties': {'original_info': '八名联合国驻刚果民主共和国特派团的维和人员周六遭遇袭击并受伤'}, 'from_id': '事件发生地附近', 'to_id': '八名联合国维和人员'}, {'type': 'occurred_at', 'properties': {'date': '2024-03-17', 'original_info': '周六遭遇袭击并受伤的事件'}, 'from_id': '八名联合国维和人员受伤的时间点', 'to_id': '2024年3月17日'}] vertices: [{'type': 'Cause', 'properties': {'name': '反叛武装‘3·23’运动与政府军之间的冲突', 
'description': '刚果民主共和国东部的持续冲突'}, 'id': '冲突原因'}, {'type': 'Person', 'properties': {'name': '八名联合国驻刚果民主共和国特派团的维和人员', 'description': '周六遭遇袭击并受伤的维和人员'}, 'id': '受伤维和人员'}, {'type': 'Location', 'properties': {'name': '刚果民主共和国东部', 'description': '冲突发生地,距离北基伍省首府戈马仅20公里的萨凯附近'}, 'id': '冲突地点'}] ================= Edges and Vertices written to graph: Response: ================= edges: [{'type': 'conflict_with', 'properties': {'original_info': '刚果民主共和国东部的反叛武装‘3·23’运动与政府军之间的冲突'}, 'from_id': '联合国驻刚果民主共和国特派团维和人员', 'to_id': '刚果民主共和国东部政府军'}, {'type': 'involves', 'properties': {'original_info': '八名联合国驻刚果民主共和国特派团的维和人员周六遭遇袭击并受伤'}, 'from_id': '事件发生地附近', 'to_id': '八名联合国维和人员'}, {'type': 'occurred_at', 'properties': {'date': '2024-03-17', 'original_info': '周六遭遇袭击并受伤的事件'}, 'from_id': '八名联合国维和人员受伤的时间点', 'to_id': '2024年3月17日'}] vertices: [{'type': 'Cause', 'properties': {'name': '反叛武装‘3·23’运动与政府军之间的冲突', 'description': '刚果民主共和国东部的持续冲突'}, 'id': '冲突原因'}, {'type': 'Person', 'properties': {'name': '八名联合国驻刚果民主共和国特派团的维和人员', 'description': '周六遭遇袭击并受伤的维和人员'}, 'id': '受伤维和人员'}, {'type': 'Location', 'properties': {'name': '刚果民主共和国东部', 'description': '冲突发生地,距离北基伍省首府戈马仅20公里的萨凯附近'}, 'id': '冲突地点'}] ================= Adding 3 new entities to vector store Generated Embeddings with shibing624/text2vec-base-chinese Response: ================= entities: ['刚果民主共和国东部', '反叛武装‘3·23’运动与政府军之间的冲突', '八名联合国驻刚果民主共和国特派团的维和人员'] calling: shibing624/text2vec-base-chinese token_count: 69 ================= Vectors written to graph: Response: ================= entities: ['刚果民主共和国东部', '反叛武装‘3·23’运动与政府军之间的冲突', '八名联合国驻刚果民主共和国特派团的维和人员'] ================= Prompt: [Property Graph Extraction] ================= Output: ================= Response: [Property Graph Extraction] ================= raw_property_graph_str: ```json { "relationships": [ { "type": "originates_from", "from_id": "华夏机械", "to_id": "超级反应器X1000" }, { "type": "occurred_at", "from_id": "蓝天救援队", "to_id": "事故现场", "properties": { "date": "2024-03-24", "original_info": 
"应急响应小组启动" } }, { "type": "involves", "from_id": "蓝天救援队", "to_id": "绿色守望", "properties": { "original_info": "与环保机构紧密合作" } }, { "type": "leads_to", "from_id": "化学品泄漏", "to_id": "严重的环境污染", "properties": { "original_info": "事故后果" } } ], "entities": [ { "type": "Cause", "id": "华夏机械", "properties": { "name": "华夏机械", "description": "国内知名的设备制造商" } }, { "type": "Event", "id": "事故现场", "properties": { "name": "工业事故", "description": "化工企业发生的严重事故", "date": "2024-03-24" } }, { "type": "Organization", "id": "蓝天救援队", "properties": { "name": "蓝天救援队", "description": "负责应急响应的救援组织" } }, { "type": "Organization", "id": "绿色守望", "properties": { "name": "绿色守望", "description": "环保机构,关注环境保护" } }, { "type": "Impact", "id": "严重的环境污染", "properties": { "name": "环境污染", "description": "化工事故导致的长期环境问题" } } ] } ``` calling: qwen:14b-chat-v1.5-fp16 token_count: 3158 ================= Extracted Property Graph from ['data/news/']: Response: ================= raw_data: 2024年3月24日,位于江苏省的某化工企业发生了一起严重的工业事故。据初步调查,事故原因可能与一台名为“超级反应器X1000”的设备故障有关。该设备由国内知名的设备制造商“华夏机械”生产,其故障导致了化学品泄漏,造成了严重的环境污染。 事故发生后,应急响应小组迅速启动,由“蓝天救援队”牵头,与当地消防部门和环保机构“绿色守望”紧密合作,共同应对此次危机。事故现场已被封锁,附近的居民被紧急疏散至安全地点。 目前,事故的具体原因仍在调查中。环保部已经介入,将对“华夏机械”的安全生产记录进行彻底审查。同时,事故的影响评估正在进行,以确定对周边环境和居民健康的长期影响。 此次事故再次引发了公众对工业安全和环境保护的关注。政府已经表示,将加强监管,确保类似事件不再发生。 edges: [{'type': 'originates_from', 'from_id': '华夏机械', 'to_id': '超级反应器X1000'}, {'type': 'occurred_at', 'from_id': '蓝天救援队', 'to_id': '事故现场', 'properties': {'date': '2024-03-24', 'original_info': '应急响应小组启动'}}, {'type': 'involves', 'from_id': '蓝天救援队', 'to_id': '绿色守望', 'properties': {'original_info': '与环保机构紧密合作'}}, {'type': 'leads_to', 'from_id': '化学品泄漏', 'to_id': '严重的环境污染', 'properties': {'original_info': '事故后果'}}] vertices: [{'type': 'Cause', 'id': '华夏机械', 'properties': {'name': '华夏机械', 'description': '国内知名的设备制造商'}}, {'type': 'Event', 'id': '事故现场', 'properties': {'name': '工业事故', 'description': '化工企业发生的严重事故', 'date': '2024-03-24'}}, {'type': 'Organization', 'id': '蓝天救援队', 'properties': {'name': '蓝天救援队', 
'description': '负责应急响应的救援组织'}}, {'type': 'Organization', 'id': '绿色守望', 'properties': {'name': '绿色守望', 'description': '环保机构,关注环境保护'}}, {'type': 'Impact', 'id': '严重的环境污染', 'properties': {'name': '环境污染', 'description': '化工事故导致的长期环境问题'}}] ================= Edges and Vertices written to graph: Response: ================= edges: [{'type': 'originates_from', 'from_id': '华夏机械', 'to_id': '超级反应器X1000'}, {'type': 'occurred_at', 'from_id': '蓝天救援队', 'to_id': '事故现场', 'properties': {'date': '2024-03-24', 'original_info': '应急响应小组启动'}}, {'type': 'involves', 'from_id': '蓝天救援队', 'to_id': '绿色守望', 'properties': {'original_info': '与环保机构紧密合作'}}, {'type': 'leads_to', 'from_id': '化学品泄漏', 'to_id': '严重的环境污染', 'properties': {'original_info': '事故后果'}}] vertices: [{'type': 'Cause', 'id': '华夏机械', 'properties': {'name': '华夏机械', 'description': '国内知名的设备制造商'}}, {'type': 'Event', 'id': '事故现场', 'properties': {'name': '工业事故', 'description': '化工企业发生的严重事故', 'date': '2024-03-24'}}, {'type': 'Organization', 'id': '蓝天救援队', 'properties': {'name': '蓝天救援队', 'description': '负责应急响应的救援组织'}}, {'type': 'Organization', 'id': '绿色守望', 'properties': {'name': '绿色守望', 'description': '环保机构,关注环境保护'}}, {'type': 'Impact', 'id': '严重的环境污染', 'properties': {'name': '环境污染', 'description': '化工事故导致的长期环境问题'}}] ================= Adding 5 new entities to vector store Generated Embeddings with shibing624/text2vec-base-chinese Response: ================= entities: ['蓝天救援队', '环境污染', '工业事故', '绿色守望', '华夏机械'] calling: shibing624/text2vec-base-chinese token_count: 45 ================= Vectors written to graph: Response: ================= entities: ['蓝天救援队', '环境污染', '工业事故', '绿色守望', '华夏机械'] =================
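In the extraction log above, some relationships point at ids that never appear in the `entities` list (for example `超级反应器X1000` and `化学品泄漏` in the second document). A quick way to spot these is to compare relationship endpoints with the declared entity ids. The sketch below does that on a trimmed copy of the second extraction, with properties omitted for brevity; `dangling_endpoints` is a hypothetical helper, not something nebula_llm provides.

```python
def dangling_endpoints(entities, relationships):
    """Return (edge_type, endpoint_id) pairs whose endpoint is not a declared entity id."""
    known = {entity["id"] for entity in entities}
    missing = []
    for rel in relationships:
        for key in ("from_id", "to_id"):
            if rel[key] not in known:
                missing.append((rel["type"], rel[key]))
    return missing


# Ids and types copied from the second extraction in the log above.
entities = [
    {"type": "Cause", "id": "华夏机械"},
    {"type": "Event", "id": "事故现场"},
    {"type": "Organization", "id": "蓝天救援队"},
    {"type": "Organization", "id": "绿色守望"},
    {"type": "Impact", "id": "严重的环境污染"},
]
relationships = [
    {"type": "originates_from", "from_id": "华夏机械", "to_id": "超级反应器X1000"},
    {"type": "occurred_at", "from_id": "蓝天救援队", "to_id": "事故现场"},
    {"type": "involves", "from_id": "蓝天救援队", "to_id": "绿色守望"},
    {"type": "leads_to", "from_id": "化学品泄漏", "to_id": "严重的环境污染"},
]

missing = dangling_endpoints(entities, relationships)
for edge_type, endpoint in missing:
    print(f"{edge_type}: endpoint {endpoint!r} has no matching entity")
```

Endpoints reported here end up as vertices without tag properties, so they may be worth reviewing when tuning the extraction prompt or `max_relationships_per_chunk`.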
Inspect the Graph
Let's query it:
In [8]:
%%ngql
MATCH ()-[e]->()
RETURN e LIMIT 10
Out[8]:
| | e |
|---|---|
| 0 | ("事件发生地点")-[:involves@0{original_info: "八名联合国驻... |
| 1 | ("事件发生地附近")-[:involves@0{original_info: "八名联合国... |
| 2 | ("化学品泄漏")-[:leads_to@0{original_info: "事故后果"}]... |
And draw it!
In [9]:
%ng_draw
Out[9]:
<class 'pyvis.network.Network'> |N|=5 |E|=3
In [55]:
rag_embedding = graph.as_rag_helper(verbose=True, retriver_mode="embedding")
In [58]:
response = rag_embedding.answer(
"蓝天救援队有哪些新闻?"
)
Generated Embeddings with shibing624/text2vec-base-chinese Response: ================= topics: ['蓝天救援队有哪些新闻?'] calling: shibing624/text2vec-base-chinese token_count: 19 ================= Top 3 similar entities: Response: ================= query: 蓝天救援队有哪些新闻? similar_entities: {'ids': ['蓝天救援队', '工业事故', '环境污染'], 'scores': [0.6119248135568768, 0.3694027557186987, 0.3432763502802278]} ================= Relation Map with Entities: Entities: ================= entities: ['蓝天救援队', '工业事故', '环境污染'] Response: ================= {description: 负责应急响应的救援组织, name: 蓝天救援队}: ['{description: 负责应急响应的救援组织, name: 蓝天救援队} -[occurred_at:{date: 2024-03-24, original_info: 应急响应小组启动}]-> {date: 2024-03-24, description: 化工企业发生的严重事故, name: 工业事故}', '{description: 负责应急响应的救援组织, name: 蓝天救援队} -[involves:{original_info: 与环保机构紧密合作}]-> {description: 环保机构,关注环境保护, name: 绿色守望}'] ================= Retrieved Knowledge: Response: ================= question: 蓝天救援队有哪些新闻? key_entities: ['蓝天救援队', '工业事故', '环境污染'] relation_map: {'{description: 负责应急响应的救援组织, name: 蓝天救援队}': ['{description: 负责应急响应的救援组织, name: 蓝天救援队} -[occurred_at:{date: 2024-03-24, original_info: 应急响应小组启动}]-> {date: 2024-03-24, description: 化工企业发生的严重事故, name: 工业事故}', '{description: 负责应急响应的救援组织, name: 蓝天救援队} -[involves:{original_info: 与环保机构紧密合作}]-> {description: 环保机构,关注环境保护, name: 绿色守望}']} description: Knowledge Sequence from Knowledge Graph with key entities of the question. context_type: knowledge_sequence is_empty: False context: The following knowledge is extracted from knowledge graph in max 2 hops. It's directed graph in form like: (entity1{propA:foo})-[relationship]->(entity2{propB:bar}). 
----- {description: 负责应急响应的救援组织, name: 蓝天救援队} -[occurred_at:{date: 2024-03-24, original_info: 应急响应小组启动}]-> {date: 2024-03-24, description: 化工企业发生的严重事故, name: 工业事故} {description: 负责应急响应的救援组织, name: 蓝天救援队} -[involves:{original_info: 与环保机构紧密合作}]-> {description: 环保机构,关注环境保护, name: 绿色守望} ----- ================= Prompt: [Answer Synthesis] ================= Context information is below. --------------------- The following knowledge is extracted from knowledge graph in max 2 hops. It's directed graph in form like: (entity1{propA:foo})-[relationship]->(entity2{propB:bar}). ----- {description: 负责应急响应的救援组织, name: 蓝天救援队} -[occurred_at:{date: 2024-03-24, original_info: 应急响应小组启动}]-> {date: 2024-03-24, description: 化工企业发生的严重事故, name: 工业事故} {description: 负责应急响应的救援组织, name: 蓝天救援队} -[involves:{original_info: 与环保机构紧密合作}]-> {description: 环保机构,关注环境保护, name: 绿色守望} ----- --------------------- Given the context information and not prior knowledge, answer the query with natrual language in a coherent way. Only provide the answer, do not give explanations on context or apologies. Query: 蓝天救援队有哪些新闻? Answer: ================= Response: [Answer Synthesis] ================= question: 蓝天救援队有哪些新闻? answer: 蓝天救援队近期的新闻包括2024年3月24日参与了一起化工企业发生的严重事故的应急响应。他们还与环保机构绿色守望紧密合作,共同关注环境保护。 calling: qwen:14b-chat-v1.5-fp16 token_count: 384 =================
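The log above shows the embedding retriever ranking stored entities against the question and keeping the top three by score. How nebula_llm computes those scores is not shown in the output; the sketch below assumes plain cosine similarity, and the 2-dimensional vectors are invented toy values, not real text2vec embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, entity_vecs, k=3):
    """Return the k (name, score) pairs most similar to query_vec, best first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in entity_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors, assumed purely for illustration.
toy_vectors = {
    "蓝天救援队": [1.0, 0.0],
    "工业事故": [0.6, 0.8],
    "环境污染": [0.0, 1.0],
    "绿色守望": [-1.0, 0.0],
}
result = top_k([1.0, 0.0], toy_vectors, k=3)
for name, score in result:
    print(f"{name}: {score:.3f}")
```

The names returned by such a ranking are what the log calls `key_entities`, the starting points for the 2-hop graph traversal that builds the context.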
In [57]:
response
Out[57]:
{'question': '蓝天救援队有哪些新闻?', 'retrieval_result': [{'question': '蓝天救援队有哪些新闻?', 'key_entities': ['蓝天救援队', '工业事故', '环境污染'], 'relation_map': {'{description: 负责应急响应的救援组织, name: 蓝天救援队}': ['{description: 负责应急响应的救援组织, name: 蓝天救援队} -[occurred_at:{date: 2024-03-24, original_info: 应急响应小组启动}]-> {date: 2024-03-24, description: 化工企业发生的严重事故, name: 工业事故}', '{description: 负责应急响应的救援组织, name: 蓝天救援队} -[involves:{original_info: 与环保机构紧密合作}]-> {description: 环保机构,关注环境保护, name: 绿色守望}']}, 'description': 'Knowledge Sequence from Knowledge Graph with key entities of the question.', 'context_type': 'knowledge_sequence', 'is_empty': False, 'context': "\nThe following knowledge is extracted from knowledge graph in max 2 hops.\nIt's directed graph in form like: (entity1{propA:foo})-[relationship]->(entity2{propB:bar}).\n-----\n{description: 负责应急响应的救援组织, name: 蓝天救援队} -[occurred_at:{date: 2024-03-24, original_info: 应急响应小组启动}]-> {date: 2024-03-24, description: 化工企业发生的严重事故, name: 工业事故}\n{description: 负责应急响应的救援组织, name: 蓝天救援队} -[involves:{original_info: 与环保机构紧密合作}]-> {description: 环保机构,关注环境保护, name: 绿色守望}\n-----"}], 'description': 'Answer based on Knowledge from Knowledge Graph', 'answer': '蓝天救援队近期的新闻包括2024年3月24日参与了一起化工企业发生的严重事故的应急响应。他们还与环保机构绿色守望紧密合作,共同关注环境保护。\n'}
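The returned dictionary nests the retrieved graph paths under `retrieval_result`, next to the final `answer`. A small sketch of pulling out the parts that are usually of interest, run against a trimmed copy of the output above; `summarize` is a hypothetical helper, and only keys visible in that output are used.

```python
# Trimmed copy of the response shown above (the "context" and "description" fields are omitted).
response = {
    "question": "蓝天救援队有哪些新闻?",
    "retrieval_result": [
        {
            "key_entities": ["蓝天救援队", "工业事故", "环境污染"],
            "relation_map": {
                "{description: 负责应急响应的救援组织, name: 蓝天救援队}": [
                    "{description: 负责应急响应的救援组织, name: 蓝天救援队} -[occurred_at:{date: 2024-03-24, original_info: 应急响应小组启动}]-> {date: 2024-03-24, description: 化工企业发生的严重事故, name: 工业事故}",
                    "{description: 负责应急响应的救援组织, name: 蓝天救援队} -[involves:{original_info: 与环保机构紧密合作}]-> {description: 环保机构,关注环境保护, name: 绿色守望}",
                ]
            },
            "context_type": "knowledge_sequence",
            "is_empty": False,
        }
    ],
    "answer": "蓝天救援队近期的新闻包括2024年3月24日参与了一起化工企业发生的严重事故的应急响应。他们还与环保机构绿色守望紧密合作,共同关注环境保护。\n",
}


def summarize(resp):
    """Collect the answer, key entities, and number of graph paths from non-empty retrievals."""
    hits = [r for r in resp["retrieval_result"] if not r["is_empty"]]
    entities = [name for r in hits for name in r["key_entities"]]
    path_count = sum(len(paths) for r in hits for paths in r["relation_map"].values())
    return {
        "answer": resp["answer"].strip(),
        "key_entities": entities,
        "path_count": path_count,
    }


summary = summarize(response)
print(summary["answer"])
print(summary["key_entities"], summary["path_count"])
```

Checking `path_count` alongside the answer is a cheap way to see whether the answer was grounded in retrieved paths or produced from an empty context.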