Recommendation systems have come a long way in the last 10 years, and one of the biggest recent developments is semantic recommenders: recommenders that are based on the meaning of the content being recommended. In this project, I explore one way of leveraging semantic recommendations by using them to build a book recommendation application.
In this project:
- I take a dataset of books, and apply classic data and text cleaning techniques to remove any issues;
- Convert the description of the books into document embeddings using an encoder LLM (`text-embedding-ada-002`), and pop them into a Chroma vector database for efficient retrieval;
- Add the predicted category of the book (fiction versus non-fiction) using zero shot classification (using `bart-large-mnli`). I verify the accuracy of the model predictions using a subset of the data which already has a fiction/non-fiction label;
- Add the probability that the book description expresses each of Ekman's 6 basic emotions. I do this using a fine-tuned encoder model (`emotion-english-distilroberta-base`);
- Create a Gradio dashboard which allows users to search for books based on descriptions, filtered by category and emotion.
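
To give a feel for how the last step can combine the pieces above, here is a minimal, dependency-free sketch of a search that is filtered by category and ordered by emotion. It is an illustration under stated assumptions, not the project's actual code: the three-dimensional vectors are hand-made stand-ins for real description embeddings, a plain Python list stands in for the Chroma vector database, and the names `Book`, `BOOKS`, and `recommend`, as well as the order of the filter, retrieve, and re-rank steps, are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Book:
    title: str
    category: str               # e.g. the label predicted by zero-shot classification
    embedding: list[float]      # toy stand-in for a real description embedding
    emotions: dict[str, float]  # emotion label -> predicted probability


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recommend(query_embedding, books, category=None, emotion=None, top_k=2):
    """Filter by category, keep the top_k most similar books, re-rank by emotion."""
    # 1. Optional metadata filter on the predicted category.
    candidates = [b for b in books if category is None or b.category == category]
    # 2. Semantic retrieval: most similar descriptions first.
    candidates.sort(
        key=lambda b: cosine_similarity(query_embedding, b.embedding),
        reverse=True,
    )
    shortlist = candidates[:top_k]
    # 3. Optional re-ranking by the probability of the requested emotion.
    if emotion is not None:
        shortlist.sort(key=lambda b: b.emotions.get(emotion, 0.0), reverse=True)
    return shortlist


# Invented toy data, purely for illustration.
BOOKS = [
    Book("A Quiet Harbour", "Fiction", [0.9, 0.1, 0.0], {"joy": 0.7, "sadness": 0.1}),
    Book("Storm Over the Bay", "Fiction", [0.8, 0.2, 0.1], {"joy": 0.1, "sadness": 0.8}),
    Book("Tides Explained", "Nonfiction", [0.7, 0.3, 0.0], {"joy": 0.2, "sadness": 0.1}),
    Book("Desert Engines", "Fiction", [0.0, 0.1, 0.9], {"joy": 0.5, "sadness": 0.2}),
]

query = [1.0, 0.0, 0.0]  # pretend this is the embedding of the user's search text
results = recommend(query, BOOKS, category="Fiction", emotion="sadness", top_k=2)
print([b.title for b in results])
# ['Storm Over the Bay', 'A Quiet Harbour']
```

In a real application, the query vector would come from the same embedding model as the book descriptions, and the similarity search would be delegated to the vector database rather than computed in a Python loop.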
If you want to check out the full code, you can access it in this repo. I also put together a (long!) tutorial going over the whole project for freeCodeCamp, where you can see in detail how to do this project yourself and get some theory behind the techniques used.
