# Retrieval Augmented Generation
RAG with ollama (and optionally cohere) and qdrant. This is basically a glorified
(bloated) `grep`.
## Usage
### Setup
#### 1. Environment Variables
Create a `.env` file or export the following environment variables:
```.env
CHUNK_SIZE=4096
CHUNK_OVERLAP=256
ENCODER_MODEL=nomic-embed-text
EMBEDDING_DIM=768
RETRIEVER_TOP_K=15
RETRIEVER_SCORE_THRESHOLD=0.5
RERANK_MODEL=mixedbread-ai/mxbai-rerank-large-v1
RERANK_TOP_K=5
GENERATOR_MODEL=llama3
DOCUMENT_DB_NAME=rag
DOCUMENT_DB_USER=aktersnurra
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION_NAME=knowledge-base
COHERE_API_KEY=<COHERE_API_KEY> # OPTIONAL
COHERE_RERANK_MODEL=rerank-english-v3.0
```
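`CHUNK_SIZE` and `CHUNK_OVERLAP` control how documents are split before
embedding. A minimal sliding-window chunker might look like this (an
illustrative sketch, not necessarily the splitter this repo uses):

```python
def chunk(text: str, chunk_size: int = 4096, overlap: int = 256) -> list[str]:
    """Split text into windows of chunk_size characters, where each
    window starts overlap characters before the previous one ended."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]
```

The overlap keeps a sentence that straddles a chunk boundary fully inside at
least one chunk, at the cost of embedding some text twice.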
#### 2. Install Python Dependencies
```sh
poetry install
```
#### 3. Ollama
Make sure ollama is running:
```sh
ollama serve
```
Download the encoder and generator models with ollama:
```sh
ollama pull $GENERATOR_MODEL
ollama pull $ENCODER_MODEL
```
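Once pulled, the models are served over ollama's local HTTP API (port 11434
by default). A sketch of embedding a chunk through the `/api/embeddings`
endpoint, with the model name coming from the `.env` values above:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # ollama's default address


def embed_request(model: str, prompt: str) -> dict:
    """Build the JSON body for ollama's /api/embeddings endpoint."""
    return {"model": model, "prompt": prompt}


def embed(model: str, prompt: str) -> list[float]:
    """POST the prompt to ollama and return the embedding vector."""
    body = json.dumps(embed_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]
```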
#### 4. Qdrant
Qdrant is used to store the embeddings of the chunks from the documents.
Download and run qdrant.
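The collection has to be sized to match `EMBEDDING_DIM`. A sketch of creating
it through qdrant's REST API (the project may use the `qdrant-client` package
instead, but the shape of the config is the same):

```python
import json
import urllib.request


def collection_config(embedding_dim: int) -> dict:
    """Vector parameters for a qdrant collection: one cosine-distance
    vector of embedding_dim components per point."""
    return {"vectors": {"size": embedding_dim, "distance": "Cosine"}}


def create_collection(qdrant_url: str, name: str, embedding_dim: int) -> None:
    """PUT /collections/{name} creates the collection on the qdrant server."""
    req = urllib.request.Request(
        f"{qdrant_url}/collections/{name}",
        data=json.dumps(collection_config(embedding_dim)).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req).close()
```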
#### 5. Postgres
Postgres is used to store a hash of each document so that the same document is
not added to the vector db more than once.
Download and run Postgres.
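The dedup logic amounts to hashing each document and skipping it if the hash
is already stored. A sketch with an in-memory set standing in for the Postgres
table:

```python
import hashlib


def doc_hash(content: bytes) -> str:
    """Stable fingerprint of a document's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def add_if_new(content: bytes, seen: set[str]) -> bool:
    """Record the hash and return True the first time a document is
    seen; return False for later copies so they can be skipped."""
    h = doc_hash(content)
    if h in seen:
        return False
    seen.add(h)
    return True
```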
#### 6. Cohere
Cohere reranking is optional; to use it, get an API key from their website.
### Running
Activate the poetry shell:
```sh
poetry shell
```
Use the cli:
```sh
python rag/cli.py
```
or the ui using a browser:
```sh
streamlit run rag/ui.py
```
### Notes
Yes, it is inefficient/dumb to use ollama when you can just load the models with python
in the same process.
### TODO
- [x] Rerank history if it is relevant.
- [x] message ollama/cohere
- [x] create db script
- [x] write a general model for cli/ui
- [ ] ~~use huggingface instead of ollama~~
- Huggingface is too slow...
- [ ] Refactor messages
- [ ] Rewrite in functional style
- [ ] Try out nemotron-mini
- [ ] Try out llama3-chatqa
### Inspiration
I took some inspiration from these tutorials:
- [rag-openai-qdrant](https://colab.research.google.com/github/qdrant/examples/blob/master/rag-openai-qdrant/rag-openai-qdrant.ipynb)
- [building-rag-application-using-langchain-openai-faiss](https://medium.com/@solidokishore/building-rag-application-using-langchain-openai-faiss-3b2af23d98ba)
- [knowledge_gpt](https://github.com/mmz-001/knowledge_gpt)