Building a Knowledge Graph with LlamaIndex¶

Creating a Knowledge Graph usually involves specialized and complex tasks. However, by combining an LLM with LlamaIndex's KnowledgeGraphIndex and GraphStore, we can build a reasonably effective Knowledge Graph from any data source supported by Llama Hub.

Furthermore, querying a Knowledge Graph often requires domain-specific knowledge related to the storage system, such as Cypher. But, with the assistance of the LLM and the LlamaIndex KnowledgeGraphQueryEngine, this can be accomplished using Natural Language!

In this demonstration, we will guide you through the steps to:

  • Extract and Set Up a Knowledge Graph using LlamaIndex
  • Query a Knowledge Graph using Cypher
  • Query a Knowledge Graph using Natural Language

Let's first take care of the basic preparation for LlamaIndex.

In [ ]:
# For OpenAI

import os
os.environ['OPENAI_API_KEY'] = "INSERT OPENAI KEY"

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO) # logging.DEBUG for more verbose output

from llama_index import (
    KnowledgeGraphIndex,
    LLMPredictor,
    ServiceContext,
    SimpleDirectoryReader,
)
from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore


from langchain import OpenAI
from IPython.display import Markdown, display


# define LLM
# NOTE: at the time of demo, text-davinci-002 did not have rate-limit errors
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-002"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)
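Optionally, before wiring anything else up, a quick sanity check (just a sketch, not part of the original flow) can confirm that the OpenAI key and model name are valid by sending a trivial prompt through the same LangChain LLM:

In [ ]:
# Optional sanity check (a sketch): call the LangChain OpenAI LLM directly with a
# trivial prompt to confirm the API key and model name work before building the graph.
print(OpenAI(temperature=0, model_name="text-davinci-002")("Say hello in one word."))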
In [ ]:
# For Azure OpenAI
import os
import json
import openai
from langchain.llms import AzureOpenAI
from langchain.embeddings import OpenAIEmbeddings
from llama_index import LangchainEmbedding
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    KnowledgeGraphIndex,
    LLMPredictor,
    ServiceContext
)

from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore

import logging
import sys

from IPython.display import Markdown, display

logging.basicConfig(stream=sys.stdout, level=logging.INFO) # logging.DEBUG for more verbose output
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

openai.api_type = "azure"
openai.api_base = "INSERT AZURE API BASE"
openai.api_version = "2022-12-01"
os.environ["OPENAI_API_KEY"] = "INSERT OPENAI KEY"
openai.api_key = os.getenv("OPENAI_API_KEY")

llm = AzureOpenAI(
    deployment_name="INSERT DEPLOYMENT NAME",
    temperature=0,
    openai_api_version=openai.api_version,
    model_kwargs={
        "api_key": openai.api_key,
        "api_base": openai.api_base,
        "api_type": openai.api_type,
        "api_version": openai.api_version,
    }
)
llm_predictor = LLMPredictor(llm=llm)

# You need to deploy your own embedding model as well as your own chat completion model
embedding_llm = LangchainEmbedding(
    OpenAIEmbeddings(
        model="text-embedding-ada-002",
        deployment="INSERT DEPLOYMENT NAME",
        openai_api_key=openai.api_key,
        openai_api_base=openai.api_base,
        openai_api_type=openai.api_type,
        openai_api_version=openai.api_version,
    ),
    embed_batch_size=1,
)

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embedding_llm,
)
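Similarly for Azure OpenAI, an optional sanity check (a sketch) can confirm that the embedding deployment responds; text-embedding-ada-002 should return a 1536-dimensional vector:

In [ ]:
# Optional sanity check (a sketch): request one embedding from the wrapped model;
# text-embedding-ada-002 returns a 1536-dimensional vector.
print(len(embedding_llm.get_text_embedding("hello")))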

Prepare for NebulaGraph¶

Before the next step of creating the Knowledge Graph, let's make sure we have a running NebulaGraph cluster with the data schema defined.

In [ ]:
# Create a NebulaGraph cluster with:
    # Option 0 for machines with Docker: `curl -fsSL nebula-up.siwei.io/install.sh | bash`
    # Option 1 for Desktop: NebulaGraph Docker Extension https://hub.docker.com/extensions/weygu/nebulagraph-dd-ext

# If the space and schema do not exist yet, create them with the following commands from NebulaGraph's console:
    # CREATE SPACE llamaindex(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);
    # :sleep 10;
    # USE llamaindex;
    # CREATE TAG entity(name string);
    # CREATE EDGE relationship(relationship string);
    # :sleep 10;
    # CREATE TAG INDEX entity_index ON entity(name(256));

%pip install ipython-ngql nebula3-python

os.environ['NEBULA_USER'] = "root"
os.environ['NEBULA_PASSWORD'] = "<password>" # default is "nebula"
os.environ['NEBULA_ADDRESS'] = "127.0.0.1:9669" # assuming NebulaGraph is installed locally

space_name = "llamaindex"
edge_types, rel_prop_names = ["relationship"], ["relationship"] # default; can be omitted when creating from an empty kg
tags = ["entity"] # default; can be omitted when creating from an empty kg

Prepare the StorageContext, with NebulaGraphStore as the graph_store.

In [3]:
graph_store = NebulaGraphStore(space_name=space_name, edge_types=edge_types, rel_prop_names=rel_prop_names, tags=tags)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

(Optional) Build the Knowledge Graph with LlamaIndex¶

With LlamaIndex and the LLM defined, we can build a Knowledge Graph from the given documents.

If we already have a Knowledge Graph in the NebulaGraphStore, this step can be skipped (see the reload sketch below).
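For that skip-path, here is a minimal reload sketch. It assumes the index was persisted to a local directory in an earlier run via storage_context.persist(); the directory name ./storage_graph is just an example.

In [ ]:
# A sketch of the skip-path: reload a previously persisted KnowledgeGraphIndex instead
# of rebuilding it. Assumes an earlier run called
# storage_context.persist(persist_dir="./storage_graph"); the directory name is an example.
from llama_index import load_index_from_storage

storage_context = StorageContext.from_defaults(
    persist_dir="./storage_graph", graph_store=graph_store
)
kg_index = load_index_from_storage(
    storage_context=storage_context,
    service_context=service_context,
    max_triplets_per_chunk=10,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)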

Step 1, load data from Wikipedia for "Guardians of the Galaxy Vol. 3"¶

In [ ]:
from llama_index import download_loader

WikipediaReader = download_loader("WikipediaReader")

loader = WikipediaReader()

documents = loader.load_data(pages=['Guardians of the Galaxy Vol. 3'], auto_suggest=False)
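A quick optional peek (a sketch) at what was loaded: the reader returns one Document per Wikipedia page, and its raw text is what the triplet extraction below will run over.

In [ ]:
# Optional peek at the loaded data (a sketch): one Document per requested page.
print(len(documents))
print(documents[0].text[:200])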

Step 2, Generate a KnowledgeGraphIndex with NebulaGraph as graph_store¶

Then, we will create a KnowledgeGraphIndex to enable Graph-based RAG (see here for details). Apart from that, we also get a Knowledge Graph up and running for other purposes!

In [ ]:
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    service_context=service_context,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    include_embeddings=True,
)

Now we have a Knowledge Graph about the 'Guardians of the Galaxy Vol. 3' movie on the NebulaGraph cluster, in the space named llamaindex. Let's play with it a little bit.
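Since the index was built with include_embeddings=True, one simple way to play with it is Graph-based RAG directly on kg_index. The following is a sketch; the retriever and response settings shown are just example choices.

In [ ]:
# Graph RAG over the new index (a sketch): parameter values here are examples,
# not the only reasonable choices.
kg_rag_query_engine = kg_index.as_query_engine(
    include_text=False,
    retriever_mode="keyword",
    response_mode="tree_summarize",
)
response = kg_rag_query_engine.query("Tell me about Peter Quill?")
display(Markdown(f"<b>{response}</b>"))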

In [ ]:
# install related packages, password is nebula by default
%pip install ipython-ngql networkx pyvis
%load_ext ngql
%ngql --address 127.0.0.1 --port 9669 --user root --password <password>
In [ ]:
# Query some random Relationships with Cypher
%ngql USE llamaindex;
%ngql MATCH ()-[e]->() RETURN e LIMIT 10
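As another small sketch, we can count how many entities were extracted; this MATCH relies on the entity_index created during schema setup:

In [ ]:
# A quick sketch: count the entities in the graph (relies on entity_index).
%ngql MATCH (n:`entity`) RETURN count(n);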
In [9]:
# draw the result

%ng_draw
nebulagraph_draw.html
Out[9]:

Asking the Knowledge Graph¶

Finally, let's demo how to query the Knowledge Graph with natural language!

Here, we will leverage the KnowledgeGraphQueryEngine, with NebulaGraphStore as the storage_context.graph_store.

In [4]:
from llama_index.query_engine import KnowledgeGraphQueryEngine

from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore

query_engine = KnowledgeGraphQueryEngine(
    storage_context=storage_context,
    service_context=service_context,
    llm=llm,
    verbose=True,
)
In [5]:
response = query_engine.query(
    "Tell me about Peter Quill?",
)
display(Markdown(f"<b>{response}</b>"))
INFO:llama_index.query_engine.knowledge_graph_query_engine:Graph Store Query: 
MATCH (e1:`entity`)-[r:`relationship`]->(e2:`entity`) 
  WHERE e1.`entity`.`name` == 'Peter Quill' 
RETURN e2.`entity`.`name`;
INFO:llama_index.query_engine.knowledge_graph_query_engine:Graph Store Response: {'e2.entity.name': ['grandfather', 'alternate version of Gamora', 'Guardians of the Galaxy']}

Peter Quill is the protagonist of the Marvel Cinematic Universe, and is known for being a member of the Guardians of the Galaxy. He is the grandson of a Celestial being, and an alternate version of Gamora.

In [6]:
graph_query = query_engine.generate_query(
    "Tell me about Peter Quill?",
)

display(Markdown(f"""
```cypher
{graph_query}
```
"""))
MATCH (e1:`entity`)-[r:`relationship`]->(e2:`entity`) WHERE e1.`entity`.`name` == 'Peter Quill' RETURN e2.`entity`.`name`;

We can see it helps generate the graph query:

MATCH (e1:`entity`)-[r:`relationship`]->(e2:`entity`)
  WHERE e1.`entity`.`name` == 'Peter Quill'
RETURN e2.`entity`.`name`;

And synthesize the answer based on its result:

{'e2.entity.name': ['grandfather', 'alternate version of Gamora', 'Guardians of the Galaxy']}

Of course, we can still query the graph directly, too! And this query engine could well be our best Graph Query Language learning bot :).

In [10]:
%%ngql 
MATCH (p:`entity`)-[e:relationship]->(m:`entity`)
  WHERE p.`entity`.`name` == 'Peter Quill'
RETURN p.`entity`.`name`, e.relationship, m.`entity`.`name`;
INFO:nebula3.logger:Get connection to ('127.0.0.1', 9669)
Out[10]:
   p.entity.name   e.relationship   m.entity.name
0  Peter Quill     reunites with    grandfather
1  Peter Quill     affected by      alternate version of Gamora
2  Peter Quill     is leader of     Guardians of the Galaxy

And change the query so that the result can be rendered:

In [11]:
%%ngql
MATCH (p:`entity`)-[e:relationship]->(m:`entity`)
  WHERE p.`entity`.`name` == 'Peter Quill'
RETURN p, e, m;
INFO:nebula3.logger:Get connection to ('127.0.0.1', 9669)
Out[11]:
p e m
0 ("Peter Quill" :entity{name: "Peter Quill"}) ("Peter Quill")-[:relationship@-13852479784452... ("grandfather" :entity{name: "grandfather"})
1 ("Peter Quill" :entity{name: "Peter Quill"}) ("Peter Quill")-[:relationship@252722756429155... ("alternate version of Gamora" :entity{name: "...
2 ("Peter Quill" :entity{name: "Peter Quill"}) ("Peter Quill")-[:relationship@410484627924794... ("Guardians of the Galaxy" :entity{name: "Guar...
In [17]:
%ng_draw
nebulagraph_draw_quill.html
Out[17]:

The results of this knowledge-fetching query could not be clearer in the rendered graph.