Creating a Knowledge Graph usually involves specialized and complex tasks. However, by utilizing the LLM together with LlamaIndex's KnowledgeGraphIndex and GraphStore, we can facilitate the creation of a relatively effective Knowledge Graph from any data source supported by LlamaHub.
Furthermore, querying a Knowledge Graph often requires domain-specific knowledge of the storage system's query language, such as Cypher. But with the assistance of the LLM and the LlamaIndex KnowledgeGraphQueryEngine, this can be accomplished using natural language!
In this demonstration, we will guide you through the steps to:

- Extract a Knowledge Graph from Wikipedia data and store it in NebulaGraph with the KnowledgeGraphIndex
- Explore and visualize the resulting graph with nGQL queries
- Query the Knowledge Graph with natural language via the KnowledgeGraphQueryEngine

Let's first take care of the basic preparation for LlamaIndex.
# For OpenAI
import os
os.environ['OPENAI_API_KEY'] = "INSERT OPENAI KEY"
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO) # logging.DEBUG for more verbose output
from llama_index import (
    KnowledgeGraphIndex,
    LLMPredictor,
    ServiceContext,
    SimpleDirectoryReader,
)
from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore
from langchain import OpenAI
from IPython.display import Markdown, display
# define LLM
# NOTE: at the time of demo, text-davinci-002 did not have rate-limit errors
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-002"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)
# For Azure OpenAI
import os
import json
import openai
from langchain.llms import AzureOpenAI
from langchain.embeddings import OpenAIEmbeddings
from llama_index import LangchainEmbedding
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    KnowledgeGraphIndex,
    LLMPredictor,
    ServiceContext,
)
from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore
import logging
import sys
from IPython.display import Markdown, display
logging.basicConfig(stream=sys.stdout, level=logging.INFO) # logging.DEBUG for more verbose output
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
openai.api_type = "azure"
openai.api_base = "INSERT AZURE API BASE"
openai.api_version = "2022-12-01"
os.environ["OPENAI_API_KEY"] = "INSERT OPENAI KEY"
openai.api_key = os.getenv("OPENAI_API_KEY")
llm = AzureOpenAI(
    deployment_name="INSERT DEPLOYMENT NAME",
    temperature=0,
    openai_api_version=openai.api_version,
    model_kwargs={
        "api_key": openai.api_key,
        "api_base": openai.api_base,
        "api_type": openai.api_type,
        "api_version": openai.api_version,
    },
)
llm_predictor = LLMPredictor(llm=llm)
# You need to deploy your own embedding model as well as your own chat completion model
embedding_llm = LangchainEmbedding(
    OpenAIEmbeddings(
        model="text-embedding-ada-002",
        deployment="INSERT DEPLOYMENT NAME",
        openai_api_key=openai.api_key,
        openai_api_base=openai.api_base,
        openai_api_type=openai.api_type,
        openai_api_version=openai.api_version,
    ),
    embed_batch_size=1,
)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embedding_llm,
)
Before moving on to creating the Knowledge Graph, let's make sure we have a running NebulaGraph cluster with the data schema defined.
# Create a NebulaGraph cluster with:
# Option 0 for machines with Docker: `curl -fsSL nebula-up.siwei.io/install.sh | bash`
# Option 1 for Desktop: NebulaGraph Docker Extension https://hub.docker.com/extensions/weygu/nebulagraph-dd-ext
# Then, if the llamaindex space does not exist yet, create it with the following commands from NebulaGraph's console:
# CREATE SPACE llamaindex(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);
# :sleep 10;
# USE llamaindex;
# CREATE TAG entity(name string);
# CREATE EDGE relationship(relationship string);
# :sleep 10;
# CREATE TAG INDEX entity_index ON entity(name(256));
%pip install ipython-ngql nebula3-python
os.environ['NEBULA_USER'] = "root"
os.environ['NEBULA_PASSWORD'] = "<password>"  # default is "nebula"
os.environ['NEBULA_ADDRESS'] = "127.0.0.1:9669"  # assumes NebulaGraph is running locally
space_name = "llamaindex"
edge_types, rel_prop_names = ["relationship"], ["relationship"]  # defaults; can be omitted when creating from an empty KG
tags = ["entity"]  # default; can be omitted when creating from an empty KG
Prepare the StorageContext with graph_store as a NebulaGraphStore.
graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
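As an optional sanity check, the graph store can report the schema it discovers in the space. This is a small sketch; `get_schema` is part of LlamaIndex's graph store interface, but the exact output format may vary by version.

```python
# Print the tags, edge types, and properties the graph store sees in the `llamaindex` space.
print(graph_store.get_schema())
```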
With LlamaIndex and the LLM defined, we can build a Knowledge Graph from the given documents.
If we already have a Knowledge Graph in the NebulaGraphStore, this step can be skipped.
from llama_index import download_loader
WikipediaReader = download_loader("WikipediaReader")
loader = WikipediaReader()
documents = loader.load_data(pages=['Guardians of the Galaxy Vol. 3'], auto_suggest=False)
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    service_context=service_context,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    include_embeddings=True,
)
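As a quick check, the freshly built index can also be queried directly through its own query engine. The sketch below uses LlamaIndex's KG-based retrieval path (the retriever settings are illustrative); this is separate from the text-to-nGQL KnowledgeGraphQueryEngine demonstrated later.

```python
# Query the index with KG-based retrieval (no nGQL generation involved).
kg_rag_query_engine = kg_index.as_query_engine(
    include_text=False,
    retriever_mode="keyword",
    response_mode="tree_summarize",
)
response = kg_rag_query_engine.query("Who is Peter Quill?")
display(Markdown(f"<b>{response}</b>"))
```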
Now we have a Knowledge Graph about the movie 'Guardians of the Galaxy Vol. 3' on the NebulaGraph cluster, under the space named llamaindex. Let's play with it a little.
# install related packages, password is nebula by default
%pip install ipython-ngql networkx pyvis
%load_ext ngql
%ngql --address 127.0.0.1 --port 9669 --user root --password <password>
# Query some random Relationships with Cypher
%ngql USE llamaindex;
%ngql MATCH ()-[e]->() RETURN e LIMIT 10
# draw the result
%ng_draw
The rendered relationship graph is saved as nebulagraph_draw.html.
Finally, let's demonstrate how to query the Knowledge Graph with natural language! Here, we will leverage the KnowledgeGraphQueryEngine, with NebulaGraphStore as the storage_context.graph_store.
from llama_index.query_engine import KnowledgeGraphQueryEngine
from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore
query_engine = KnowledgeGraphQueryEngine(
    storage_context=storage_context,
    service_context=service_context,
    llm=llm,
    verbose=True,
)
response = query_engine.query(
    "Tell me about Peter Quill?",
)
display(Markdown(f"<b>{response}</b>"))
INFO:llama_index.query_engine.knowledge_graph_query_engine:Graph Store Query:
MATCH (e1:`entity`)-[r:`relationship`]->(e2:`entity`) WHERE e1.`entity`.`name` == 'Peter Quill' RETURN e2.`entity`.`name`;
INFO:llama_index.query_engine.knowledge_graph_query_engine:Graph Store Response:
{'e2.entity.name': ['grandfather', 'alternate version of Gamora', 'Guardians of the Galaxy']}
Peter Quill is the protagonist of the Marvel Cinematic Universe, and is known for being a member of the Guardians of the Galaxy. He is the grandson of a Celestial being, and an alternate version of Gamora.
graph_query = query_engine.generate_query(
    "Tell me about Peter Quill?",
)
display(Markdown(f"""
```cypher
{graph_query}
```
"""))
MATCH (e1:`entity`)-[r:`relationship`]->(e2:`entity`) WHERE e1.`entity`.`name` == 'Peter Quill' RETURN e2.`entity`.`name`;
We can see that it helps generate the graph query:
MATCH (e1:`entity`)-[r:`relationship`]->(e2:`entity`)
WHERE e1.`entity`.`name` == 'Peter Quill'
RETURN e2.`entity`.`name`;
And synthesize an answer to the question based on its result:
{'e2.entity.name': ['grandfather', 'alternate version of Gamora', 'Guardians of the Galaxy']}
Of course, we can still query the graph directly, too! And this query engine could then serve as our best Graph Query Language learning bot :).
%%ngql
MATCH (p:`entity`)-[e:relationship]->(m:`entity`)
WHERE p.`entity`.`name` == 'Peter Quill'
RETURN p.`entity`.`name`, e.relationship, m.`entity`.`name`;
INFO:nebula3.logger:Get connection to ('127.0.0.1', 9669)
| | p.entity.name | e.relationship | m.entity.name |
|---|---|---|---|
| 0 | Peter Quill | reunites with | grandfather |
| 1 | Peter Quill | affected by | alternate version of Gamora |
| 2 | Peter Quill | is leader of | Guardians of the Galaxy |
And change the query so that the result can be rendered as a graph:
%%ngql
MATCH (p:`entity`)-[e:relationship]->(m:`entity`)
WHERE p.`entity`.`name` == 'Peter Quill'
RETURN p, e, m;
INFO:nebula3.logger:Get connection to ('127.0.0.1', 9669)
| | p | e | m |
|---|---|---|---|
| 0 | ("Peter Quill" :entity{name: "Peter Quill"}) | ("Peter Quill")-[:relationship@-13852479784452... | ("grandfather" :entity{name: "grandfather"}) |
| 1 | ("Peter Quill" :entity{name: "Peter Quill"}) | ("Peter Quill")-[:relationship@252722756429155... | ("alternate version of Gamora" :entity{name: "... |
| 2 | ("Peter Quill" :entity{name: "Peter Quill"}) | ("Peter Quill")-[:relationship@410484627924794... | ("Guardians of the Galaxy" :entity{name: "Guar... |
%ng_draw
The rendered graph of Peter Quill's relationships is saved as nebulagraph_draw_quill.html.
The results of this knowledge-fetching query could not be clearer in the rendered graph.