In the era of large-scale models, is a knowledge graph still necessary?

EpiK Protocol
3 min read · Feb 24, 2024


A knowledge graph (KG) is a method of modeling and managing data using a graph structure, knowledge semantics, and logical dependencies. It enables storage, inference, and querying of factual knowledge. Initially, KGs were primarily extracted from publicly available corpora to construct static knowledge graphs, improving search and recommendation efficiency. In recent years, KGs have found increasingly wide applications in finance, healthcare, public security, and energy fields. The global market for knowledge graphs is expected to reach 8 billion USD by 2026, with finance and public security being the main driving forces.
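At its simplest, the triple model underlying a knowledge graph can be sketched in a few lines of Python. The entities and relations below are purely illustrative, not taken from any real dataset:

```python
# A minimal sketch of a knowledge graph as subject-predicate-object triples,
# with a naive pattern query. All entity and relation names are illustrative.
triples = {
    ("AliceCorp", "locatedIn", "Singapore"),
    ("AliceCorp", "industry", "Finance"),
    ("BobLtd", "locatedIn", "Singapore"),
}

def query(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Which entities are located in Singapore?
print(sorted(t[0] for t in query(p="locatedIn", o="Singapore")))
# → ['AliceCorp', 'BobLtd']
```

Real systems replace the set scan with indexed triple stores or graph databases, but the storage-and-query pattern is the same.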

Domain-specific knowledge graphs must be comprehensive, accurate, and interpretable, and their data sources have shifted from textual corpora to enterprises' multi-source heterogeneous data, including user-generated content, professionally produced content, structured basic profiles, transaction records, log records, and the experience of business experts. Building a complete profile of customers, materials, channels, and so on requires deep collaboration across the many pieces of information related to each entity. Merchants are no longer bound to static physical stores: anyone with a payment code can become a merchant, which also raises the difficulty of risk prevention and control. Relying solely on text-based concept labels for risk control is insufficient; factual relationships such as transactions and social interactions must be added. On this foundation, knowledge graphs can support inference tasks such as product recommendation and electronic Know Your Business (eKYB) verification.

Furthermore, knowledge graphs enable structure-aware controlled text generation, for example in anti-money-laundering intelligent case review and automated warning calls to fraud victims. In merchant operation and risk control, knowledge management needs to be context-aware: common-sense knowledge graphs cannot perceive individual differences, which compromises the effectiveness of downstream inference. As a result, the expectations of enterprise vertical domains for knowledge graphs have changed significantly, and knowledge representation has evolved from static structures to spatiotemporal, multidimensional dynamics.

At the end of 2022, ChatGPT sparked a global frenzy and triggered a worldwide "big model" race. However, because an LLM is a black-box probabilistic model, it struggles to capture factual knowledge reliably, leading to hallucinations and logical errors. The factual accuracy, timeliness, and logical rigor of knowledge graphs complement this limitation, and the LLM+KG application paradigm has attracted widespread attention from researchers, giving rise to numerous lines of exploration and research.

In application scenarios such as merchant operation and risk control, algorithmic tasks fall into five categories: interactive applications, business management, risk prevention and control, knowledge construction, and knowledge mining. By combining LLMs and KGs so that each enhances the other, a range of practical applications can be realized, helping businesses achieve better results in merchant operation and risk control.

Overall, LLM and KG applications in merchant operation and risk control fall into three types: LLM-only, LLM+KG dual-driven, and KG-only. There are currently few feasible scenarios for using an LLM alone in this domain. LLM+KG dual-driven approaches apply mainly to user-interactive scenarios such as knowledge-based question answering and report generation, while KG-only applications suit inference-based decision-making, query analysis, knowledge mining, and other scenarios that do not require complex language interaction or intent understanding. Through synergy, LLMs and KGs can achieve cross-modal knowledge alignment, logic-guided knowledge inference, and natural-language knowledge querying. This places higher demands on the engine framework: it must provide a unified representation of KG knowledge semantics and support cross-scenario migration.

The technological framework of knowledge graphs therefore needs to keep pace with the times, meeting the expectations of new knowledge-data management paradigms and the dual drive of large-scale models.

Therefore, EpiK Protocol has set higher requirements for the organization and processing of knowledge graphs. It constructs a semantic framework based on property graphs that integrates the structural strengths of Labeled Property Graphs (LPG) with the semantic expressiveness of the Resource Description Framework (RDF). This design overcomes the difficulty of applying RDF/OWL in industrial scenarios while retaining the simplicity and big-data compatibility of LPG.
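The contrast between the two representations can be sketched with one fact expressed both ways. The merchant names, labels, and property values below are purely illustrative:

```python
# Sketch: the same fact in RDF style vs. LPG style. All names are illustrative.

# RDF style: everything is a flat triple; attaching data to a relationship
# requires extra modeling (e.g., reification) rather than a direct property.
rdf_triples = [
    ("merchant_1", "rdf:type", "Merchant"),
    ("merchant_1", "hasName", "Corner Cafe"),
    ("merchant_1", "transactedWith", "merchant_2"),
]

# LPG style: nodes and edges each carry a label plus a property map,
# so edge attributes such as amount and timestamp attach naturally.
lpg_node = {"id": "merchant_1", "label": "Merchant",
            "properties": {"name": "Corner Cafe"}}
lpg_edge = {"from": "merchant_1", "to": "merchant_2",
            "label": "TRANSACTED_WITH",
            "properties": {"amount": 120.0, "ts": "2023-09-01"}}

print(lpg_edge["properties"]["amount"])  # → 120.0
```

A hybrid framework of the kind described aims to keep the LPG shape for storage and computation while retaining RDF-style semantics for machine-understandable knowledge.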

On this basis, the knowledge engine framework can integrate seamlessly with big data architectures, transforming data into knowledge and adapting to property-graph storage and computation. It also provides machine-understandable symbolic representations and supports rule-based inference, neural-symbolic fusion learning, and linkage with LLM-based knowledge extraction and inference.
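Rule-based inference over such symbolic representations can be sketched as forward chaining: repeatedly applying a rule until no new facts emerge. The "controls" relation and its transitivity rule below are hypothetical examples, not rules from any production system:

```python
# Minimal forward-chaining sketch of rule-based inference over triples.
# Illustrative rule: if X controls Y and Y controls Z, then X controls Z.
facts = {("A", "controls", "B"), ("B", "controls", "C")}

def infer(facts):
    """Apply the transitivity rule until a fixed point is reached."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        new = {(x, "controls", z)
               for (x, p1, y) in derived if p1 == "controls"
               for (y2, p2, z) in derived if p2 == "controls" and y2 == y
               if x != z and (x, "controls", z) not in derived}
        if new:
            derived |= new
            changed = True
    return derived

# The indirect ownership chain A -> B -> C is made explicit:
print(("A", "controls", "C") in infer(facts))  # → True
```

This kind of derived relationship (e.g., indirect control between companies) is exactly what inference-based decision-making in risk control relies on.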
