Knowledge Graph of Chinese Cinema
Film is a comprehensive modern art form that offers audiences a powerful visual and auditory experience. Over the course of its development, Chinese cinema has produced many excellent films, producers, and actors. This project uses information extraction techniques to collect descriptive information about Chinese films from the internet and build a knowledge graph of films and the people associated with them.
First, I used web scraping to gather the original descriptive text for films by several well-known Chinese directors. I then used the OpenUE toolkit for entity and relationship extraction, and stored the extracted triples in a Neo4j database.
Data Acquisition
First, I compiled information on 116 films from websites such as Douban and Baidu Baike, together with each film's Baidu Baike URL, as shown in the image below:
I then used the Scrapy framework to scrape the descriptive text from those pages. The scraping results are shown below:
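As a rough illustration of the text-extraction step (not the scraped results themselves, and not the project's Scrapy spider), the sketch below pulls paragraph text out of an HTML page using only the Python standard library. In a Scrapy spider the same work would typically happen inside the `parse` callback using Scrapy's selectors. The class name `ParagraphExtractor`, the function `extract_description`, and the assumption that the description lives in `<p>` elements are all hypothetical.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the visible text of every <p> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self._inside = False   # True while the parser is inside a <p>
        self._buffer = []      # text fragments of the current paragraph
        self.paragraphs = []   # finished paragraphs, in document order

    def _flush(self):
        # Join the fragments collected so far and keep non-empty text.
        text = "".join(self._buffer).strip()
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()      # tolerate a missing </p> before this <p>
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)


def extract_description(html):
    """Return the page's paragraph text, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


if __name__ == "__main__":
    # Hypothetical page fragment, used only to exercise the function.
    page = "<h1>Title</h1><p>A film <b>directed</b> by Ang Lee.</p>"
    print(extract_description(page))
```

The extracted text is what would then be passed on to the entity and relationship extraction step.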
Entity Relationship Extraction
This project uses the OpenUE toolkit for entity and relationship extraction, with training on the default ske dataset. After modifying main.py, I ran scripts/interactive.sh to extract entity relationships. The processed results are as follows:
Graph Construction and Storage
This project stores the entities and relationships in a Neo4j database and imports them using the Cypher query language. The results are shown below:
The final graph contains 292 nodes and 318 relationships.
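The import step can be sketched as follows. This is a minimal illustration rather than the project's actual import script. It assumes each extracted triple is a (subject, predicate, object) tuple, and the `Entity` label, the `name` property, and the helper name `triple_to_cypher` are hypothetical choices made for this example.

```python
def triple_to_cypher(subject, predicate, obj):
    """Build a parameterized Cypher statement that upserts one triple.

    Returns a (query, params) pair. Node names are passed as parameters;
    the relationship type is written into the query text inside backticks,
    because classic Cypher does not accept a parameter in that position.
    """
    # Escape backticks so the predicate is safe as a quoted identifier.
    rel_type = predicate.replace("`", "``")
    query = (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        f"MERGE (s)-[:`{rel_type}`]->(o)"
    )
    return query, {"subject": subject, "object": obj}


if __name__ == "__main__":
    # Illustrative triple; not taken from the project's data.
    query, params = triple_to_cypher(
        "Crouching Tiger, Hidden Dragon", "director", "Ang Lee"
    )
    print(query)
    print(params)
```

Using MERGE rather than CREATE means an entity that appears in several triples ends up as a single node, which matters when one director is linked to many films.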
Neo4j also supports simple queries over the graph, for example retrieving the films directed by Ang Lee:
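A query of this kind might look like the following sketch. It assumes films and people are nodes with a `name` property and that a film points to its director through a relationship of type `director`. The actual relationship type depends on the predicates produced by the extraction step, and the function name `films_by_director_query` is hypothetical.

```python
def films_by_director_query(director):
    """Return a (query, params) pair listing the films of one director."""
    query = (
        "MATCH (film)-[:`director`]->(person {name: $name}) "
        "RETURN film.name AS title"
    )
    # The director's name is passed as a parameter, not spliced into the text.
    return query, {"name": director}


if __name__ == "__main__":
    query, params = films_by_director_query("Ang Lee")
    print(query)
    print(params)
```

With the official `neo4j` Python driver, the pair could be executed with `session.run(query, params)`. The same Cypher text can also be pasted into the Neo4j Browser after replacing `$name` with a quoted string.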