Wikidata Adopts Vector Search: How This Will Change AI Information Processing

The new Wikidata Embedding Project aims to curb chatbots' fabricated facts by grounding their answers in verified data.

Wikimedia Deutschland has introduced a new project that simplifies the use of data from Wikipedia and its sister platforms in artificial intelligence systems.
The system is called the Wikidata Embedding Project. It is built on vector-based semantic search, a technique that represents words and entries as numeric vectors so that computers can compare their meanings and relationships rather than only match exact keywords. The technology covers nearly 120 million entries and makes working with the information more flexible and precise.
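To make the idea of vector search concrete, here is a minimal sketch in Python. The three-dimensional vectors and the `nearest` helper are invented for illustration. They are not taken from the Wikidata Embedding Project, whose real embeddings come from a trained model and have far more dimensions.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# A real embedding model would produce these vectors from text.
TOY_EMBEDDINGS = {
    "scientist": [0.9, 0.8, 0.1],
    "researcher": [0.85, 0.75, 0.2],
    "scholar": [0.7, 0.9, 0.15],
    "banana": [0.1, 0.05, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, embeddings, k=2):
    """Return the k terms whose vectors are most similar to the query term."""
    query_vec = embeddings[query]
    scored = [
        (cosine_similarity(query_vec, vec), term)
        for term, vec in embeddings.items()
        if term != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [term for _, term in scored[:k]]


print(nearest("scientist", TOY_EMBEDDINGS))
```

With these toy vectors, "researcher" and "scholar" rank closest to "scientist" while "banana" ranks far away, even though none of the words share any characters with the query.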
A key component is support for the Model Context Protocol (MCP), a standard that allows models to interact directly with knowledge bases and process natural language queries.
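For readers unfamiliar with MCP, the sketch below shows roughly what a tool call looks like on the wire. MCP messages are JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The tool name `search_items` and its `query` argument are hypothetical placeholders, not the names exposed by the Wikidata server.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP tool-call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_name,       # hypothetical tool name
            "arguments": arguments,  # hypothetical argument names
        },
    }
    return json.dumps(message)


# "search_items" and "query" are assumptions made for this example.
request = build_tool_call(1, "search_items", {"query": "famous scientists"})
print(request)
```

A model's client would send a message like this to an MCP server and receive a structured result in reply, which is what lets the model pose a natural language query instead of writing a database query by hand.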
The project was developed by Wikimedia's German chapter in collaboration with the neural search company Jina.AI and with DataStax, a structured training data company that is part of IBM.
Wikidata already offered machine-readable information, but search was limited to keyword matching and specialized SPARQL queries. The new format is designed for modern Retrieval-Augmented Generation (RAG) systems, which pull in external sources before generating an answer. This improves accuracy and lets models draw on Wikipedia data verified by editors.
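The RAG pattern described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's implementation: the `retrieve` function uses simple word overlap as a stand-in for real vector search, the passages are a hand-written sample, and the final call to a language model is left out.

```python
import re

# A tiny hand-written corpus standing in for an external knowledge base.
PASSAGES = [
    "Marie Curie won Nobel Prizes in Physics (1903) and Chemistry (1911).",
    "Berlin is the capital of Germany.",
    "Wikidata is a free knowledge base that anyone can edit.",
]


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, passages, k=1):
    """Return the k passages sharing the most words with the question.

    Word overlap is a stand-in here; a real system would rank by
    embedding similarity instead.
    """
    question_words = tokenize(question)
    ranked = sorted(
        passages,
        key=lambda p: len(question_words & tokenize(p)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context):
    """Combine the retrieved passages and the question into one prompt."""
    sources = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer using only the sources below.\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}"
    )


question = "Which prizes did Marie Curie win?"
print(build_prompt(question, retrieve(question, PASSAGES)))
```

The point of the pattern is that the model answers from the retrieved passage rather than from memory alone, which is where the accuracy gain comes from.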
The data in the Embedding Project is structured to provide semantic context. For example, a query for the word "scientist" can return a list of famous scientists, translations of the word into different languages, images from Wikimedia libraries, and related terms like "researcher" or "scholar."
The database is already publicly available on Toolforge.
Philipp Saade, Wikidata AI Project Manager, said the initiative underscores Wikimedia's independence: "This launch shows that powerful artificial intelligence does not necessarily have to be under the control of a few companies. It can be open, collaborative, and built for everyone."