One of the most exciting applications of language models in the past 12 years has been text embeddings. These started with word embeddings in 2012 (if you want more of an overview of these, check out this talk or this blog post), and progressed to the much more context-sensitive embedding LLMs we have today.
Text embeddings convert text from a string to a vector: a coordinate in n-dimensional space based on its meaning. Texts with similar meanings are close together in this space, while those that are dissimilar are further apart. This allows us to retrieve similar texts using measures such as Euclidean distance, Manhattan distance and (my favourite) cosine similarity. To make this search efficient, we can then put these embeddings in a vector database, whose indexing algorithms avoid having to compare the query against every embedding to find the closest ones.
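To make the similarity idea concrete, here's a minimal sketch of cosine similarity over toy vectors. The vectors and their values are made up purely for illustration; real embedding models produce hundreds or thousands of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors:
    1.0 for identical directions, -1.0 for opposite ones."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings" (illustrative only)
cat = np.array([0.9, 0.1, 0.2])
kitten = np.array([0.85, 0.15, 0.25])
invoice = np.array([0.1, 0.9, 0.7])

print(cosine_similarity(cat, kitten))   # high: similar meaning
print(cosine_similarity(cat, invoice))  # lower: dissimilar meaning
```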
A nice application of this is being able to convert long documents into embeddings, pop these into a vector database, and find the most similar ones. Traditional ways of searching through these documents are often rather ineffective, relying on exact word matches that may fail to turn up what you're looking for. You can expand on this by using the vector database as part of a RAG (retrieval-augmented generation) system - passing the results of the vector search to an LLM that is good at summarisation, and getting it to answer questions based on the search results. A sketch of the chunk-embed-retrieve pattern follows.
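The project has its own setup, but as a hedged sketch of the general pattern, here's how chunking, embedding and retrieval might look with ChromaDB, which embeds documents with a built-in default model if you don't supply your own. The chunker and query text here are illustrative placeholders, not taken from the project:

```python
import chromadb

# A deliberately naive chunker: real documents need smarter splitting
# (overlap, sentence boundaries, token limits) - one of the levers
# mentioned in the caveats later.
def chunk(text: str, size: int = 500) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

client = chromadb.Client()  # in-memory client for demonstration
collection = client.create_collection("document_chunks")

document = "..."  # the long PDF's extracted text would go here
chunks = chunk(document)
collection.add(
    documents=chunks,
    ids=[f"chunk-{i}" for i in range(len(chunks))],
)  # ChromaDB embeds each chunk with its default embedding model

# Retrieve the chunks closest in meaning to the query
results = collection.query(
    query_texts=["What does the document say about X?"],
    n_results=3,
)
```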
In this project, I've built a simple RAG application which converts a long (almost 2000 page!) PDF into chunks, which are then embedded and placed in a vector database. For a given query, I retrieve the relevant chunks from the database and use an LLM (GPT-3.5 in this case) to summarise them and present the answer to the user. As GPT-3.5 is capable of translation, I also show how you can use the model to work across languages: taking in a user query in a different language to the original document, retrieving the relevant chunks, and then giving a summary in the language of the user query.
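The exact prompt and wiring in the project differ, but here's a minimal sketch of the summarise-and-answer step using the OpenAI chat completions API. The model name, prompt wording and `answer` helper are illustrative assumptions, not the project's actual code:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(query: str, retrieved_chunks: list[str]) -> str:
    """Answer a query using only the retrieved context, replying in
    the language of the query (illustrative prompt, not the project's)."""
    context = "\n\n".join(retrieved_chunks)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer using only the context below. "
                    "Reply in the same language as the question.\n\n"
                    + context
                ),
            },
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```

Because the instruction to match the query's language lives in the prompt, the same pipeline handles cross-language queries without any separate translation step.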
You can access the project code here, and see it in action in the final section of this talk. As in the talk, I note a number of caveats with RAG systems - while they are very powerful, they're tricky to get right, and there are a number of levers you need to tune based on your input documents and use case.
