# Retrieval Augmented Generation
RAG with ollama (and optionally cohere) and qdrant. This is basically a glorified
(bloated) `grep`.
## Usage
### Setup
#### 1. Environment Variables
Create a `.env` file or set the following environment variables:
```.env
CHUNK_SIZE=4096
CHUNK_OVERLAP=256
ENCODER_MODEL=nomic-embed-text
EMBEDDING_DIM=768
RETRIEVER_TOP_K=15
RETRIEVER_SCORE_THRESHOLD=0.5
RERANK_MODEL=mixedbread-ai/mxbai-rerank-large-v1
RERANK_TOP_K=5
GENERATOR_MODEL=llama3
DOCUMENT_DB_NAME=rag
DOCUMENT_DB_USER=aktersnurra
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION_NAME=knowledge-base
# Optional: only needed when using Cohere
COHERE_API_KEY=<COHERE_API_KEY>
COHERE_RERANK_MODEL=rerank-english-v3.0
```
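The retriever and reranker settings describe a two-stage selection. The following is a rough sketch under stated assumptions: the helper names are hypothetical, Qdrant is replaced by plain cosine similarity, and the rerank model is replaced by a caller-supplied scoring function. The retriever keeps at most `RETRIEVER_TOP_K` chunks whose score is at least `RETRIEVER_SCORE_THRESHOLD`, and the reranker then keeps the best `RERANK_TOP_K` of those.

```python
import math
import os
from typing import Callable, Sequence

# Hypothetical sketch: read the retrieval settings from the environment,
# falling back to the example values from the .env file above.
RETRIEVER_TOP_K = int(os.environ.get("RETRIEVER_TOP_K", "15"))
RETRIEVER_SCORE_THRESHOLD = float(os.environ.get("RETRIEVER_SCORE_THRESHOLD", "0.5"))
RERANK_TOP_K = int(os.environ.get("RERANK_TOP_K", "5"))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: Sequence[float],
    chunks: list[tuple[str, Sequence[float]]],
    top_k: int = RETRIEVER_TOP_K,
    threshold: float = RETRIEVER_SCORE_THRESHOLD,
) -> list[str]:
    """Stage 1: stand-in for a vector search with a score threshold."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    kept = [(score, text) for score, text in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return [text for _, text in kept[:top_k]]


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_k: int = RERANK_TOP_K,
) -> list[str]:
    """Stage 2: rescore candidates against the query text, keep the best top_k."""
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)[:top_k]
```

In this reading the threshold only prunes weak matches before reranking, so a high `RETRIEVER_SCORE_THRESHOLD` can leave the reranker with fewer than `RERANK_TOP_K` candidates.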
#### 2. Install Python Dependencies
```
poetry install
```
#### 3. Ollama
Make sure ollama is running:
```sh
ollama serve
```
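Download the encoder and generator models with ollama (this assumes the variables above are exported in your shell; otherwise use the model names directly):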
```sh
ollama pull $GENERATOR_MODEL
ollama pull $ENCODER_MODEL
```
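Once the models are pulled, `ollama serve` exposes them over a local HTTP API, by default on port 11434. As a standard-library sketch of what an embedding request looks like, the example below targets Ollama's `/api/embed` endpoint. The helper names and the `OLLAMA_URL` variable are hypothetical, and this is not necessarily how this project talks to Ollama.

```python
import json
import os
import urllib.request
from typing import Optional

# Default address of a local `ollama serve`; OLLAMA_URL is a hypothetical override.
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")


def build_embed_request(texts: list[str], model: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/embed endpoint (hypothetical helper)."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: list[str], model: Optional[str] = None) -> list[list[float]]:
    """Return one embedding per input text; requires a running Ollama server."""
    model = model or os.environ.get("ENCODER_MODEL", "nomic-embed-text")
    with urllib.request.urlopen(build_embed_request(texts, model)) as resp:
        return json.loads(resp.read())["embeddings"]


# Usage (needs `ollama serve` and `ollama pull nomic-embed-text`):
#   vectors = embed(["hello world"])
#   len(vectors[0])  # should match EMBEDDING_DIM
```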
#### 4. Qdrant
Qdrant is used to store the embeddings of the chunks from the documents.
Download and run Qdrant so that it is reachable at `QDRANT_URL`.
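Before the embeddings are stored, each document is split into chunks. A character-based sliding window is one simple way to interpret `CHUNK_SIZE` and `CHUNK_OVERLAP`. This is an assumption: the project may count tokens or use a library splitter instead, and `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int = 4096, overlap: int = 256) -> list[str]:
    """Split text into windows of chunk_size characters, where consecutive
    windows share `overlap` characters (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the example settings, each new chunk starts 4096 - 256 = 3840 characters after the previous one, so text near a chunk boundary appears in both neighbouring chunks.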
#### 5. Postgres
Postgres is used to store hashes of the documents, which prevents the same
document from being added to the vector DB more than once.
Download and run Postgres, with a database and user matching `DOCUMENT_DB_NAME`
and `DOCUMENT_DB_USER`.
#### 6. Cohere
Optional. Get an API key from their website and set `COHERE_API_KEY`.
### Running
Activate the poetry shell:
```sh
poetry shell
```
Use the cli:
```sh
python rag/cli.py
```
or the UI in a browser:
```sh
streamlit run rag/ui.py
```
### Notes
Yes, it is inefficient/dumb to use ollama when you can just load the models with python
in the same process.
### TODO
- [x] Rerank history if it is relevant.
- [x] message ollama/cohere
- [x] create db script
- [x] write a general model for cli/ui
- [ ] use huggingface instead of ollama
- [ ] Refactor messages
### Inspiration
I took some inspiration from these tutorials:
- [rag-openai-qdrant](https://colab.research.google.com/github/qdrant/examples/blob/master/rag-openai-qdrant/rag-openai-qdrant.ipynb)
- [building-rag-application-using-langchain-openai-faiss](https://medium.com/@solidokishore/building-rag-application-using-langchain-openai-faiss-3b2af23d98ba)
- [knowledge_gpt](https://github.com/mmz-001/knowledge_gpt)