Let's build a pipeline to evaluate and monitor a RAG application, using a 100% open-source tool:
Before we start, here's a quick demo of what we're building:
Tech Stack:
- @Cometml's Opik for eval and observability
- @Llama_Index to build a RAG app
Track everything, from LLM calls to chunking, embedding, generation, and evaluation!
The architecture diagram presented below illustrates some of the key components & how they interact with each other!
It will be followed by detailed descriptions & code for each component:

1️⃣ Configuration and setup
First we configure everything to:
- Trace all LLM calls
- Trace all RAG steps
Note: You can also easily use Ollama LLMs. I have shared an example in the GitHub repo below.
The fundamentals remain the same.
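
A minimal sketch of that setup, assuming the `opik` and `llama-index` packages are installed and that Opik's LlamaIndex callback handler is what does the tracing. The wrapper name `configure_tracing` is made up for illustration and is not from the thread.

```python
def configure_tracing():
    """Send LlamaIndex events (LLM calls and RAG steps) to Opik as traces."""
    # Third-party imports live inside the function so this sketch can be
    # imported even where the packages are not installed.
    import opik
    from llama_index.core import Settings
    from llama_index.core.callbacks import CallbackManager
    from opik.integrations.llama_index import LlamaIndexCallbackHandler

    # May prompt for an API key or server URL if Opik is not configured yet.
    opik.configure()

    # Register Opik as a global LlamaIndex callback so every step is traced.
    handler = LlamaIndexCallbackHandler()
    Settings.callback_manager = CallbackManager([handler])
    return handler
```

Call `configure_tracing()` once, before building the index, so the chunking and embedding steps are captured too.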

2️⃣ Create a simple RAG app
This is more of a didactic example, but you can always make it more sophisticated.
Here's a simple RAG setup:
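
A rough sketch of what such a setup could look like with LlamaIndex defaults. The function name `build_query_engine` and the `./data` folder are assumptions for illustration. Note that the default LlamaIndex LLM and embedding model expect an OpenAI API key in the environment.

```python
def build_query_engine(data_dir="./data"):
    """Load documents, index them, and return a query engine over the index."""
    # Imported lazily so the sketch loads without llama-index installed.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    # Read every supported file in the folder (hypothetical path).
    documents = SimpleDirectoryReader(data_dir).load_data()

    # Chunk and embed the documents, then store them in an in-memory index.
    index = VectorStoreIndex.from_documents(documents)

    # The query engine retrieves relevant chunks and generates an answer.
    return index.as_query_engine()
```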

3️⃣ LLM app and Evaluation task
Next we need to create an LLM application function and define an evaluation task.
Here's how we do it 👇
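
One way to sketch the two functions, assuming dataset items carry `input` and `context` keys (an assumption, not confirmed by the thread). The factory `make_evaluation_task` is a hypothetical helper that takes any object with a `query` method, such as a LlamaIndex query engine.

```python
def make_evaluation_task(query_engine):
    """Build an evaluation task around any object exposing .query(text)."""

    def my_llm_application(question: str) -> str:
        # Run the RAG pipeline and return the answer as plain text.
        response = query_engine.query(question)
        return str(response)

    def evaluation_task(item: dict) -> dict:
        # Map one dataset item to the fields the metrics will score.
        return {
            "input": item["input"],
            "output": my_llm_application(item["input"]),
            "context": item["context"],
        }

    return evaluation_task
```

If you want the application call to show up as its own trace, Opik's `track` decorator can wrap `my_llm_application`.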

4️⃣ Prep eval dataset
We need triples of the following:
- Questions
- Their answers
- The relevant context for each QA pair
Here's our sample dataset 👇
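
The triples above can be sketched as plain dictionaries. The key names (`input`, `expected_output`, `context`) and both rows are placeholders chosen for illustration, not the dataset from the thread.

```python
REQUIRED_KEYS = ("input", "expected_output", "context")


def make_item(question, answer, context):
    """Bundle one question, its reference answer, and its supporting context."""
    return {
        "input": question,
        "expected_output": answer,
        "context": list(context),
    }


def validate_items(items):
    """Raise ValueError if any item is missing one of the three fields."""
    for position, item in enumerate(items):
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            raise ValueError(f"item {position} is missing {missing}")
    return items


# Placeholder rows for illustration only.
dataset_items = validate_items([
    make_item(
        "What does RAG stand for?",
        "Retrieval-augmented generation.",
        ["RAG stands for retrieval-augmented generation."],
    ),
    make_item(
        "Which tool handles evaluation and observability in this stack?",
        "Opik.",
        ["Opik is used for eval and observability."],
    ),
])
```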

5️⃣ Load the dataset into Opik
Next we load this dataset into Opik so that everything is tracked and can be used for evaluation.
Check this out 👇
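
A small sketch of the loading step using the Opik client's `get_or_create_dataset` and `insert` calls. The helper `load_into_opik`, the dataset name, and the optional `client` argument are assumptions added for illustration and easier testing.

```python
def load_into_opik(items, dataset_name="rag-eval-demo", client=None):
    """Create (or reuse) an Opik dataset and insert the evaluation items."""
    if client is None:
        # Imported lazily so the sketch loads without opik installed.
        from opik import Opik

        client = Opik()

    # Reuses the dataset if one with this (hypothetical) name already exists.
    dataset = client.get_or_create_dataset(name=dataset_name)

    # Each item is a dict, e.g. with input / expected_output / context keys.
    dataset.insert(items)
    return dataset
```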

6️⃣ Load the dataset into Opik
Next we load this dataset into Opik so that everything is tracked and can be used for evaluation.
Check this out 👇

7️⃣ Define Evaluation metrics
Opik provides out-of-the-box support for all the popular LLM/RAG evaluation metrics.
Check this out 👇
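
As a sketch, here is one possible selection of Opik's built-in metrics. Which metrics the thread actually uses is not shown, so this particular list is an assumption. These are LLM-as-judge metrics, so scoring needs access to a judge model.

```python
def build_metrics():
    """Return a list of built-in Opik metrics suited to RAG evaluation."""
    # Imported lazily so the sketch loads without opik installed.
    from opik.evaluation.metrics import (
        AnswerRelevance,
        ContextPrecision,
        ContextRecall,
        Hallucination,
    )

    return [
        Hallucination(),      # is the answer supported by the context?
        AnswerRelevance(),    # does the answer address the question?
        ContextPrecision(),   # judges the answer against the expected one, given the context
        ContextRecall(),      # does the answer cover what the context provides?
    ]
```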

8️⃣ Evaluate
Finally, it's time to put everything together and run evaluation.
Check this out 👇
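
Putting the pieces together could look roughly like this, using `evaluate` from `opik.evaluation`. The wrapper `run_evaluation` and the config contents are hypothetical; the dataset, task, and metrics are whatever you built in the earlier steps.

```python
def run_evaluation(dataset, task, metrics, experiment_config=None):
    """Run the task over every dataset item and score it with the metrics."""
    # Imported lazily so the sketch loads without opik installed.
    from opik.evaluation import evaluate

    return evaluate(
        dataset=dataset,            # an Opik dataset object
        task=task,                  # function: dataset item -> dict of outputs
        scoring_metrics=metrics,    # list of Opik metric instances
        # Free-form metadata stored with the experiment (assumed contents).
        experiment_config=experiment_config or {},
    )
```

The results then show up as an experiment in the Opik dashboard, next to the traces for each call.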

You can find all the code and everything you need here!
Don't forget to star the repo: github.com/patchy631/ai-e…
If you're interested in:
- Python 🐍
- ML/AI Engineering ⚙️
Find me → @akshay_pachaar ✔️
Every day, I share tutorials on the above topics!
