Skip to main content

Retrieval-augmented generation (RAG)

RAG is a technique used in generative text models.

A text generator model can only create a response based on the data it has been trained on. This limitation can lead to restricted responses or hallucinations.

RAG is a method of injecting new information into the model without modifying the model itself.

The context is typically provided in the system prompt, and the model is instructed to use this information when generating a response to a question.

Text models have limitations regarding context, especially when used via an API, which means that costs can increase with larger contexts. Vector databases can be employed to optimize context selection.