Knowledge Graph of Chinese Cinema

EpiK Protocol
Dec 20, 2024

Film is a comprehensive modern art form that offers audiences a powerful visual and auditory experience. Over the course of Chinese cinema's development, many excellent films, producers, and actors have emerged. This project uses knowledge extraction techniques to pull structured facts out of descriptive text about Chinese films found online. It then uses those facts to build a knowledge graph of films and the people associated with them.

First, we used web scraping to collect the original descriptive text for films by several well-known Chinese directors. We then used the OpenUE toolkit to extract entities and relations from that text, and stored the extracted triples in a Neo4j database.

Data Acquisition

I first compiled information on 116 films from websites such as Douban and Baidu Baike, together with each film's Baidu Baike URL, as shown in the image below:

I then used the Scrapy framework to scrape the descriptive text from each of these pages. The results of the scraping are shown below:
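The post does not show the spider itself. The parser below is a minimal sketch of the extraction step. It pulls the summary text out of a downloaded page using only Python's standard library, so it runs without Scrapy. Several details are assumptions. The class name `lemma-summary` is a guess about Baidu Baike's markup, which changes over time. `SummaryExtractor` and `extract_summary` are hypothetical names. Inside a Scrapy spider, the same idea would usually be written as a CSS selector on the response, such as `response.css("div.lemma-summary ::text").getall()`.

```python
import re
from html.parser import HTMLParser


class SummaryExtractor(HTMLParser):
    """Collect the text inside the first <div> whose class list contains target_class.

    The default class name is an ASSUMPTION about Baidu Baike's page markup.
    """

    def __init__(self, target_class: str = "lemma-summary") -> None:
        super().__init__()
        self.target_class = target_class
        self.depth = 0       # > 0 while inside the target div (also counts nested divs)
        self.done = False    # set once the first matching div has closed
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth > 0:
            self.depth += 1  # a div nested inside the summary
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace (newlines, indentation) into single spaces.
        return re.sub(r"\s+", " ", "".join(self.chunks)).strip()


def extract_summary(html: str, target_class: str = "lemma-summary") -> str:
    """Return the summary text of a page, or "" if no matching div exists."""
    parser = SummaryExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.text()
```

Tracking nested `<div>` depth, rather than stopping at the first closing tag, keeps paragraphs that are wrapped in their own inner `<div>` elements.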

Entity Relationship Extraction

This project uses the OpenUE toolkit for entity and relation extraction, with a model trained on its default ske dataset. After modifying main.py, I ran scripts/interactive.sh to extract entity-relation triples from the scraped text. The processed results are as follows:
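The post does not show the exact format written by scripts/interactive.sh. The sketch below therefore assumes a hypothetical format of one tab-separated `subject`, `relation`, `object` line per triple. It cleans these lines before import: malformed lines are dropped, and exact duplicates are removed, since these are likely when several sentences restate the same fact. In the ske schema, the 导演 (director) relation is expected to have the film as subject and the person as object, for example (卧虎藏龙, 导演, 李安). `Triple` and `parse_triples` are illustrative names, not part of OpenUE.

```python
from collections.abc import Iterable
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


def parse_triples(lines: Iterable[str]) -> list[Triple]:
    """Parse HYPOTHETICAL 'subject<TAB>relation<TAB>object' lines.

    Lines without exactly three non-empty fields are skipped, and exact
    duplicates are dropped while keeping first-seen order.
    """
    seen: set[Triple] = set()
    triples: list[Triple] = []
    for line in lines:
        parts = [part.strip() for part in line.split("\t")]
        if len(parts) != 3 or not all(parts):
            continue  # malformed line: wrong field count or an empty field
        triple = Triple(*parts)
        if triple not in seen:
            seen.add(triple)
            triples.append(triple)
    return triples
```

Keeping first-seen order makes the later import deterministic, which helps when comparing node and relationship counts across runs.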

Graph Construction and Storage

The entities and relations are stored in a Neo4j database and imported using the Cypher query language. The results are shown below:

The final graph contains 292 nodes and 318 relationships.
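The post does not list the Cypher it used. The sketch below shows one plausible scheme, which is an assumption. Each entity is merged as a node with a hypothetical `Entity` label keyed by `name`, and each triple becomes a relationship whose type is the relation name. Cypher does not accept relationship types as parameters. The sketch therefore backtick-quotes the relation name (doubling any backtick inside it) and parameterizes only the entity names. Because `MERGE` does not create duplicates, the resulting graph has one node per distinct entity name and one relationship per distinct triple. `expected_counts` computes both numbers, which is one way to cross-check totals such as the 292 nodes and 318 relationships reported above.

```python
# ASSUMED import scheme: every entity is an :Entity node keyed by `name`,
# and each relation name becomes a relationship type.
Triple = tuple[str, str, str]  # (subject, relation, object)


def merge_statement(triple: Triple) -> tuple[str, dict[str, str]]:
    """Build one parameterized Cypher MERGE statement for a triple."""
    subject, relation, obj = triple
    # Relationship types cannot be query parameters, so quote the name as an
    # identifier; a backtick inside a quoted identifier is escaped by doubling it.
    rel_type = "`" + relation.replace("`", "``") + "`"
    query = (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $obj}) "
        f"MERGE (s)-[:{rel_type}]->(o)"
    )
    return query, {"subject": subject, "obj": obj}


def expected_counts(triples: list[Triple]) -> tuple[int, int]:
    """Node and relationship counts the MERGE import above would produce."""
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    relationships = set(triples)  # MERGE makes repeated triples idempotent
    return len(nodes), len(relationships)
```

Each `(query, params)` pair could then be executed with the official `neo4j` Python driver, for example by calling `session.run(query, params)` inside a `with driver.session() as session:` block.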

The graph can also be queried directly in Neo4j. For example, the films directed by Ang Lee can be retrieved as follows:
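The query appears in the post only as a screenshot. The function below is a hedged sketch of what it computes: it answers the same question over the extracted triples in memory. It relies on three assumptions. First, triples follow the ske-style direction, with the film as subject and the director as object. Second, the director is stored under his Chinese name, 李安. Third, the Cypher string only holds if relation names were imported as relationship types and entity names as a `name` property. `films_directed_by` is a hypothetical helper name.

```python
# HYPOTHETICAL Cypher equivalent, valid only if relation names were imported as
# relationship types and entity names as a `name` property.
FILMS_BY_DIRECTOR_CYPHER = (
    "MATCH (f)-[:`导演`]->(p {name: $director}) RETURN f.name AS film"
)


def films_directed_by(
    triples: list[tuple[str, str, str]], director: str, relation: str = "导演"
) -> list[str]:
    """Return the distinct films linked to `director`, sorted.

    Assumes ske-style triples: (film, 导演, director).
    """
    return sorted({s for s, r, o in triples if r == relation and o == director})
```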

Written by EpiK Protocol

The World’s First Decentralized Protocol for AI Data Construction, Storage and Sharing. https://www.epik-protocol.io/ | https://twitter.com/EpikProtocol
