Building a Knowledge Graph of “Journey to the West” with Neo4j

EpiK Protocol
3 min read · Aug 22, 2024

“Journey to the West” features numerous characters and complex relationships, making it well-suited for a knowledge graph. Organizing the relationships among characters in a graph makes the storyline easier to follow. Building a knowledge graph involves ontology construction, information extraction, and knowledge fusion.

Using DeepKE, we can extract character entities, relationships, and attributes. This article covers the next step: storing the resulting knowledge graph in Neo4j.

1. Neo4j

Neo4j is a graph database. While we commonly use RDBMS (relational database management systems), what exactly is a graph database? If we already have relational databases, why do we need graph databases?

In simple terms, a graph database (also known as a graph database management system, or GDBMS) stores and queries data using a “graph” data structure. The name refers to graphs in the mathematical sense, not to images. Its data model consists primarily of nodes and relationships (edges), both of which can carry key-value properties. The advantage is that questions about highly connected data can be answered quickly, because relationships are stored directly rather than reconstructed through joins at query time.

2. Data Structure

A graph primarily contains two types of data:

  • Nodes: Represent entities.
  • Relationships: Represent connections between entities.

Both nodes and relationships can carry key-value properties. Relationships connect nodes to one another, forming a network structure.
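To make the two data types concrete, here is a minimal in-memory sketch of a property graph in Python. It is not how Neo4j stores data internally; the class names (`Node`, `Relationship`), the `DISCIPLE_OF` relationship type, and the property keys are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """An entity in the graph, with a label and key-value properties."""
    node_id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed connection between two nodes, with its own properties."""
    start: int
    end: int
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


# Illustrative data only: labels, types, and property keys are assumptions.
nodes = {
    1: Node(1, "Character", {"name": "Tang Sanzang"}),
    2: Node(2, "Character", {"name": "Sun Wukong", "weapon": "Ruyi Jingu Bang"}),
    3: Node(3, "Character", {"name": "Zhu Bajie"}),
}
relationships = [
    Relationship(2, 1, "DISCIPLE_OF", {"order": 1}),
    Relationship(3, 1, "DISCIPLE_OF", {"order": 2}),
]


def describe(rel: Relationship) -> str:
    """Render one relationship as a readable (start)-[:TYPE]->(end) string."""
    start = nodes[rel.start].properties["name"]
    end = nodes[rel.end].properties["name"]
    return f"({start})-[:{rel.rel_type}]->({end})"


for rel in relationships:
    print(describe(rel))
```

Note that the relationship itself holds a property (`order`), which is what distinguishes a property graph from a plain list of edges.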

3. Features of Neo4j

  • Query Language: Where relational databases use SQL, Neo4j uses Cypher, often abbreviated CQL (Cypher Query Language), a declarative language that is similarly simple to read.
  • Property Graph Model: Follows the property graph data model.
  • Indexing Support: Utilizes Apache Lucene for indexing.
  • UNIQUE Constraints: Supports UNIQUE constraints.
  • User Interface: Includes a UI for executing CQL commands: the Neo4j Data Browser.
  • ACID Compliance: Supports full ACID (Atomicity, Consistency, Isolation, Durability) rules.
  • Native Graph Storage: Uses native graph storage with a native GPE (Graph Processing Engine).
  • Data Export: Supports exporting queried data to JSON and XLS formats.
  • REST API: Provides a REST API that can be called from any programming language (e.g., Java, Scala) or framework (e.g., Spring).
  • JavaScript Access: Can be accessed from JavaScript, whether from a UI MVC framework or from Node.js.
  • Java APIs: Supports two Java APIs: Cypher API and Native Java API for developing Java applications.
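As a rough sketch of how the query language and UNIQUE constraints come together when storing extracted triples, the Python snippet below turns (head, relation, tail) triples into parameterized Cypher `MERGE` statements. The triples, the `Character` label, the function name `triple_to_cypher`, and the constraint name are all assumptions for illustration, and the constraint is written in the syntax of recent Neo4j releases (older versions use a different form). The snippet only builds query text; it does not connect to a database.

```python
import re

# Cypher cannot take a relationship type as a query parameter, so the type is
# interpolated into the query text after a strict identifier check.
_REL_TYPE = re.compile(r"[A-Z_][A-Z0-9_]*")

# Uniqueness constraint on character names (syntax of recent Neo4j releases;
# the constraint name "character_name" is an assumption).
UNIQUE_NAME = (
    "CREATE CONSTRAINT character_name IF NOT EXISTS "
    "FOR (c:Character) REQUIRE c.name IS UNIQUE"
)


def triple_to_cypher(head: str, rel_type: str, tail: str):
    """Return a (query, parameters) pair that upserts one triple."""
    if not _REL_TYPE.fullmatch(rel_type):
        raise ValueError(f"unsafe relationship type: {rel_type!r}")
    query = (
        "MERGE (h:Character {name: $head}) "
        "MERGE (t:Character {name: $tail}) "
        f"MERGE (h)-[:{rel_type}]->(t)"
    )
    return query, {"head": head, "tail": tail}


# Hypothetical triples, in the shape an extraction step might produce.
triples = [
    ("Sun Wukong", "DISCIPLE_OF", "Tang Sanzang"),
    ("Zhu Bajie", "DISCIPLE_OF", "Tang Sanzang"),
    ("Sha Wujing", "DISCIPLE_OF", "Tang Sanzang"),
]

statements = [triple_to_cypher(*t) for t in triples]
for query, params in statements:
    print(query, params)
```

`MERGE` is used instead of `CREATE` so that running the same triple twice does not duplicate nodes or relationships. With the official Neo4j Python driver, each pair could then be passed to `session.run(query, params)`.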

4. Advantages of Neo4j

  • Connection Representation: Easily represents connected data.
  • Fast Retrieval: Retrieves, traverses, and navigates highly connected data quickly.
  • Semi-Structured Data: Easily represents semi-structured data.
  • Human-Readable Queries: Neo4j’s CQL commands are in a human-readable format, making them easy to learn.
  • Simple Data Model: Uses a straightforward yet powerful data model.
  • Efficient Data Retrieval: No complex joins needed to retrieve related data, making it easy to access adjacent nodes or relationship details.
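The “no complex joins” point can be illustrated with a small in-memory sketch: when each node holds direct references to its neighbours, following a relationship is a lookup rather than a join across tables. The adjacency list and the function name `within_hops` below are assumptions for illustration, not Neo4j's actual storage or API.

```python
from collections import deque

# Illustrative adjacency list: each character maps directly to the
# characters it points to, so one hop is a single dictionary lookup.
graph = {
    "Sun Wukong": ["Tang Sanzang"],
    "Zhu Bajie": ["Tang Sanzang"],
    "Sha Wujing": ["Tang Sanzang"],
    "Tang Sanzang": ["Guanyin"],
    "Guanyin": [],
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: map each reachable node to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(current, []):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    del distance[start]  # report only the other nodes
    return distance


print(within_hops(graph, "Sun Wukong", 2))
```

In Cypher, a comparable traversal could be written as a variable-length pattern such as `MATCH (a:Character {name: 'Sun Wukong'})-[*1..2]->(b) RETURN b`, with no join conditions spelled out.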

5. Disadvantages or Limitations of Neo4j

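  • Node, Relationship, and Attribute Limits: Older versions (such as 2.1.3) have limits on the number of nodes, relationships, and attributes they support. Check the documentation for the version you are running.
  • Sharding Support: Older versions do not support sharding. Neo4j 4.0 and later offer sharding through the Fabric feature.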
