# Retrieval Augmented Generation
RAG with Ollama and Qdrant.
## Usage
### Setup
#### Environment Variables
Create a `.env` file or export the following environment variables:
```.env
CHUNK_SIZE = <CHUNK_SIZE>
CHUNK_OVERLAP = <CHUNK_OVERLAP>
ENCODER_MODEL = <ENCODER_MODEL>
EMBEDDING_DIM = <EMBEDDING_DIM>
GENERATOR_MODEL = <GENERATOR_MODEL>
DOCUMENT_DB_NAME = <DOCUMENT_DB_NAME>
DOCUMENT_DB_USER = <DOCUMENT_DB_USER>
QDRANT_URL = <QDRANT_URL>
QDRANT_COLLECTION_NAME = <QDRANT_COLLECTION_NAME>
```
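These settings can be read from the environment in Python. A minimal sketch (the variable names match the `.env` above; the `load_settings` helper and the split into integer/string keys are illustrative, not part of the project):

```python
import os


def load_settings() -> dict:
    """Read the RAG settings from the environment.

    CHUNK_SIZE, CHUNK_OVERLAP and EMBEDDING_DIM are integers;
    the remaining settings are plain strings.
    """
    int_keys = {"CHUNK_SIZE", "CHUNK_OVERLAP", "EMBEDDING_DIM"}
    str_keys = {
        "ENCODER_MODEL", "GENERATOR_MODEL",
        "DOCUMENT_DB_NAME", "DOCUMENT_DB_USER",
        "QDRANT_URL", "QDRANT_COLLECTION_NAME",
    }
    settings = {k: int(os.environ[k]) for k in int_keys}
    settings.update({k: os.environ[k] for k in str_keys})
    return settings
```

A library such as `python-dotenv` could load the `.env` file into the environment first; the sketch above assumes the variables are already exported.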
### Ollama
Download the encoder and generator models with ollama:
```sh
ollama pull $GENERATOR_MODEL
ollama pull $ENCODER_MODEL
```
### Qdrant
Qdrant is used to store the embeddings of the document chunks.
Download and run Qdrant.
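One way to run Qdrant locally is via Docker, on its default port:

```sh
docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
```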
### Postgres
Postgres is used to store hashes of the document chunks, so the same chunk is not
added to the vector DB more than once.
Download and run Postgres.
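Postgres can likewise be run via Docker. The database name and user below come from the environment variables defined earlier; the password is illustrative:

```sh
docker run -d \
  -e POSTGRES_DB="$DOCUMENT_DB_NAME" \
  -e POSTGRES_USER="$DOCUMENT_DB_USER" \
  -e POSTGRES_PASSWORD=changeme \
  -p 5432:5432 \
  postgres
```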
### Running
Build a script and/or a frontend for adding PDFs and retrieving information.
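An ingestion script would chunk each PDF and skip chunks that were already added. The hash-based check described under Postgres can be sketched with the stdlib only (the function names are hypothetical, and the `seen_hashes` set stands in for the hash table kept in Postgres):

```python
import hashlib


def chunk_hash(chunk: str) -> str:
    """Stable SHA-256 fingerprint of a document chunk."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def filter_new_chunks(chunks: list[str], seen_hashes: set[str]) -> list[str]:
    """Return only chunks whose hash has not been seen before.

    `seen_hashes` is updated in place, so chunks repeated across
    calls (or within one call) are embedded only once.
    """
    new_chunks = []
    for chunk in chunks:
        h = chunk_hash(chunk)
        if h not in seen_hashes:
            seen_hashes.add(h)
            new_chunks.append(chunk)
    return new_chunks
```

In the real pipeline, only the chunks returned here would be embedded with the encoder model and upserted into the Qdrant collection, and their hashes inserted into Postgres.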
### Frontend (Low priority)
[streamlit](https://github.com/streamlit/streamlit)
### Notes
Yes, it is inefficient to go through Ollama when you could just load the models with Python
in the same process.
### Inspiration
I took some inspiration from these tutorials.
[rag-openai-qdrant](https://colab.research.google.com/github/qdrant/examples/blob/master/rag-openai-qdrant/rag-openai-qdrant.ipynb)
[building-rag-application-using-langchain-openai-faiss](https://medium.com/@solidokishore/building-rag-application-using-langchain-openai-faiss-3b2af23d98ba)