How AI Search Engines Work


Retrieval-Augmented Generation (RAG)

AI search can seem mysterious, but the idea behind it is simple. The core technology is still the AI model, not the search. Even without search, an AI model can answer a question on its own; search only supplies auxiliary context that helps the model give a more accurate answer.

The technique behind AI search is called Retrieval-Augmented Generation (RAG). As described above, it gives the AI context that it may not already know, so that it can come up with a more accurate answer.

Let's take a look at the following prompt, which comes from the search with lepton project and was modified in the Infinite Search project:

private val ragText = """
        You are a large language AI assistant built by VLINX Software. You are given a user question, and please write clean, concise and accurate answer to the question. You will be given a set of related contexts to the question.

        Your answer must be correct, accurate and written by an expert using an unbiased and professional tone. Please limit to 1024 tokens. Do not give any information that is not related to the question, and do not repeat. Say "information is missing on" followed by the related topic, if the given context do not provide sufficient information.

        your answer must be written in the same language as the question.

        Here are the set of contexts:


        Remember, don't blindly repeat the contexts verbatim. And here is the user question:
        """

As we can see, this prompt is not much different from an ordinary prompt. The key lies in how the contexts are provided.
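To make the role of the contexts concrete, here is a minimal sketch of how retrieved snippets might be slotted into such a prompt. The function name `buildRagPrompt`, its parameters, and the `[1]`, `[2]` numbering of the contexts are assumptions made for illustration; they are not taken from search with lepton or Infinite Search.

```kotlin
// Hypothetical helper (not from either project): fills the "contexts" slot of a
// RAG prompt with retrieved snippets and appends the user question at the end.
fun buildRagPrompt(instructions: String, contexts: List<String>, question: String): String {
    // Number each snippet so that the model can tell the contexts apart (assumed format).
    val contextBlock = contexts
        .mapIndexed { i, text -> "[${i + 1}] ${text.trim()}" }
        .joinToString("\n\n")

    return buildString {
        appendLine(instructions.trim())
        appendLine()
        appendLine("Here are the set of contexts:")
        appendLine()
        appendLine(contextBlock)
        appendLine()
        appendLine("Remember, don't blindly repeat the contexts verbatim. And here is the user question:")
        append(question.trim())
    }
}

fun main() {
    val prompt = buildRagPrompt(
        instructions = "You are a large language AI assistant.",
        contexts = listOf("RAG retrieves related text first.", "The model then answers using it."),
        question = "What is RAG?"
    )
    println(prompt)
}
```

The resulting string is what gets sent to the model; everything else in RAG is about choosing which snippets go into `contexts`.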

How to give AI contextual information

RAG is not limited to AI search. It can be applied to many fields that require information retrieval. For example, you can feed it a book so that the AI finds the information you need in that book, or feed it a body of medical knowledge so that the AI becomes your personal medical assistant. The key to retrieval is how to supply the AI with the context described above, and the key technology used for that is the vector database.

The word "vector" in "vector database" refers to a text vector (an embedding): each piece of text, or each token, is converted into a vector representation. For example, "hello" might correspond to the vector "[1,1,1,1]" and "world" to "[2,2,2,2]". What is the point of converting text into vectors? These vectors are obtained through pre-training, so that texts with similar meanings have numerically similar vectors. For example, the vector of "like" may be "[36,37,38,39]", the vector of "love" may be "[33,36,39,38]", and the vector of "hate" may be "[99,88,77,99]". "like" and "love" are close together in vector space, while "hate" is much farther from both. Therefore, when a user searches for keywords, we can use the vector database to find the most similar texts and pass them to the AI as context. The AI then combines its own capabilities with the context we provide to give an answer.
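The "like" / "love" / "hate" example can be checked with a few lines of code. The sketch below uses Euclidean distance as the closeness measure, which is an assumption on my part (real vector databases often use cosine similarity or other metrics), and the names `euclideanDistance`, `toyEmbeddings`, and `nearestWord` are made up for this illustration. The vectors are the toy values from the text, not real embeddings.

```kotlin
import kotlin.math.sqrt

// Euclidean distance between two vectors of the same dimension.
fun euclideanDistance(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "vectors must have the same dimension" }
    var sum = 0.0
    for (i in a.indices) {
        val d = a[i] - b[i]
        sum += d * d
    }
    return sqrt(sum)
}

// Toy vectors taken from the example in the text (not produced by a real model).
val toyEmbeddings = mapOf(
    "like" to doubleArrayOf(36.0, 37.0, 38.0, 39.0),
    "love" to doubleArrayOf(33.0, 36.0, 39.0, 38.0),
    "hate" to doubleArrayOf(99.0, 88.0, 77.0, 99.0)
)

// Returns the other word whose vector is closest to the query word's vector.
fun nearestWord(query: String): String? {
    val q = toyEmbeddings[query] ?: return null
    return toyEmbeddings
        .filterKeys { it != query }
        .minByOrNull { (_, vector) -> euclideanDistance(q, vector) }
        ?.key
}

fun main() {
    println(nearestWord("like")) // prints "love"
}
```

With these numbers, the distance from "like" to "love" is about 3.5, while the distance from "like" to "hate" is about 108, which is why "love" is returned.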

In short, RAG works in three steps:

  1. Convert textual information into vectors and store them in a vector database.
  2. Based on the keywords provided by the user, retrieve a certain number of the most similar text entries from the vector database and provide them to the AI as contextual information.
  3. The AI combines its own capabilities with the contextual information to give an answer.
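The first two steps above can be sketched with a tiny in-memory store. Everything here is an assumption for illustration: `toyEmbed` is a stand-in that builds a sparse word-count vector instead of calling a pre-trained embedding model, `InMemoryVectorStore` is a hypothetical class rather than a real vector database, and cosine similarity is just one common choice of similarity measure. Step 3, sending the retrieved contexts to the AI, is left out.

```kotlin
import kotlin.math.sqrt

// Stand-in "embedding": a sparse word-count vector keyed by word.
// A real system would call a pre-trained embedding model here instead.
fun toyEmbed(text: String): Map<String, Double> =
    Regex("[a-z0-9]+").findAll(text.lowercase())
        .groupingBy { it.value }
        .eachCount()
        .mapValues { it.value.toDouble() }

// Cosine similarity between two sparse vectors (0.0 if either is empty).
fun cosine(a: Map<String, Double>, b: Map<String, Double>): Double {
    val dot = a.entries.sumOf { (word, weight) -> weight * (b[word] ?: 0.0) }
    val normA = sqrt(a.values.sumOf { it * it })
    val normB = sqrt(b.values.sumOf { it * it })
    return if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
}

// Hypothetical minimal vector store: keeps (text, vector) pairs in memory.
class InMemoryVectorStore {
    private val entries = mutableListOf<Pair<String, Map<String, Double>>>()

    // Step 1: convert the text into a vector and store it.
    fun add(text: String) {
        entries.add(text to toyEmbed(text))
    }

    // Step 2: return the topK stored texts most similar to the query.
    fun search(query: String, topK: Int): List<String> {
        val q = toyEmbed(query)
        return entries
            .sortedByDescending { cosine(q, it.second) }
            .take(topK)
            .map { it.first }
    }
}

fun main() {
    val store = InMemoryVectorStore()
    store.add("Kotlin runs on the JVM.")
    store.add("Paris is the capital of France.")
    store.add("Vector databases store embeddings.")

    // Step 3 would pass these contexts, plus the question, to the AI.
    println(store.search("capital of France", topK = 1))
}
```

Because the stand-in only counts shared words, it cannot tell that "like" and "love" are related; that is exactly what a pre-trained embedding model adds.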

From the analysis above, we can see that RAG is well suited to knowledge retrieval, especially in fields that the AI has not been trained on. However, it also gives AI search its two main limitations: timeliness and accuracy.

If a large body of domain knowledge can be prepared in advance and stored in the vector database, then at retrieval time the most similar information can be found in the database and passed to the AI for analysis. In this scenario, RAG can play to its full strengths.

However, for AI search we cannot predict what users will search for, so we cannot prepare the material in advance. Collecting text and storing it in the vector database has to happen during the search itself: crawl a certain number of web pages, split and analyze their content, store it in the vector database, and then retrieve similar content from the database and pass it to the AI. For search, where users expect a fast response, this takes a lot of time and feels quite different from the traditional search experience. To save time, we can skip crawling and pass the summaries returned by the search engine API directly to the AI as context. However, this loses a large amount of information, so the results are inaccurate, or only slightly better than asking the AI directly.
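The "cut" step mentioned above usually means splitting each crawled page into smaller pieces before embedding them. The sketch below shows one simple way to do that, using fixed-size word windows with a small overlap. The function name `chunkText` and the default sizes are assumptions for illustration and are not taken from either project.

```kotlin
// Splits text into chunks of at most chunkSize words, where consecutive chunks
// share `overlap` words so that a sentence cut at a boundary is not lost entirely.
fun chunkText(text: String, chunkSize: Int = 200, overlap: Int = 40): List<String> {
    require(overlap >= 0 && chunkSize > overlap) { "chunkSize must be larger than overlap" }

    val words = text.split(Regex("\\s+")).filter { it.isNotBlank() }
    val step = chunkSize - overlap
    val chunks = mutableListOf<String>()

    var start = 0
    while (start < words.size) {
        val end = minOf(start + chunkSize, words.size)
        chunks.add(words.subList(start, end).joinToString(" "))
        if (end == words.size) break // the last chunk reached the end of the text
        start += step
    }
    return chunks
}

fun main() {
    // Small sizes chosen only to make the overlap visible.
    chunkText("a b c d e f g h i j", chunkSize = 4, overlap = 1).forEach(::println)
}
```

Each chunk would then be embedded and stored as its own entry, so that retrieval can return only the relevant part of a long page rather than the whole page.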

The search with lepton project takes the approach of submitting the summary information directly.

Infinite Search improves on this process. It crawls the web page content of the first two search results, adds it to the vector database together with the summaries of the other search results, and then retrieves the content most relevant to the user's keywords from the vector database to provide to the AI as context.
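The mixed strategy described here can be sketched as follows. This is not Infinite Search's actual code: the `SearchResult` type, the `collectContextTexts` function, the `fetchPageText` callback, and the fallback to the summary when a page cannot be fetched are all assumptions made for illustration.

```kotlin
// Hypothetical shape of one search engine API result.
data class SearchResult(val url: String, val snippet: String)

// For the first crawlTopN results, use the full page text; for the rest, use the
// summary (snippet) returned by the search API. If a page cannot be fetched
// (fetchPageText returns null), fall back to its snippet (an assumed behavior).
fun collectContextTexts(
    results: List<SearchResult>,
    fetchPageText: (String) -> String?,
    crawlTopN: Int = 2
): List<String> =
    results.mapIndexed { index, result ->
        if (index < crawlTopN) fetchPageText(result.url) ?: result.snippet
        else result.snippet
    }

fun main() {
    val results = listOf(
        SearchResult("https://example.com/a", "summary A"),
        SearchResult("https://example.com/b", "summary B"),
        SearchResult("https://example.com/c", "summary C")
    )
    // Fake fetcher for the demo; a real one would download and clean the page.
    val texts = collectContextTexts(results, fetchPageText = { url -> "full text of $url" })
    texts.forEach(::println)
}
```

The returned texts would then be stored in the vector database, and the entries most similar to the user's keywords retrieved as context. Crawling only a couple of pages keeps the added latency bounded while still giving the AI more than summaries alone.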