# Text Recognizer
Implementation of the text recognizer project from the ["Full Stack Deep Learning Course"](https://fullstackdeeplearning.com/march2019) (FSDL) in PyTorch, built to learn best practices for structuring a deep learning project. I have expanded on the project with additional features and ideas from Claudio Jolowicz's ["Hypermodern Python"](https://cjolowicz.github.io/posts/hypermodern-python-01-setup/).
## Prerequisites
- [pyenv](https://github.com/pyenv/pyenv) (or similar) and Python 3.9.\* installed.
- [nox](https://nox.thea.codes/en/stable/index.html) for linting, formatting, and testing.
- [Poetry](https://python-poetry.org/), a project and dependency manager for Python.
## Installation
With pyenv and Poetry installed, set the local Python version and install the project's dependencies:
```sh
pyenv local 3.9.*
make install
```
## Generate Datasets
Download and generate datasets by running:
```sh
make download
make generate
```
## Train
Use, modify, or create an experiment config under `training/conf/experiment/`.
To run an experiment, first enter the virtual environment:
```sh
poetry shell
```
Then we can train a new model by running:
```sh
python main.py +experiment=conv_transformer_paragraphs
```
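The `+experiment=...` flag uses [Hydra](https://hydra.cc)'s override syntax to compose the base config with the chosen experiment file from `training/conf/experiment/`. As a hypothetical sketch of how such an entry point is typically wired up (the config keys and trainer here are assumptions, not the project's actual `main.py`):

```python
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="training/conf", config_name="config", version_base="1.3")
def main(cfg: DictConfig) -> None:
    """Build the network, data, and trainer from the composed config, then fit."""
    network = hydra.utils.instantiate(cfg.network)        # assumed config key
    datamodule = hydra.utils.instantiate(cfg.datamodule)  # assumed config key
    trainer = hydra.utils.instantiate(cfg.trainer)        # assumed config key
    trainer.fit(network, datamodule=datamodule)


if __name__ == "__main__":
    main()
```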
## Network
*(TODO: create a picture of the network and place it here.)*
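In the meantime, here is a very rough sketch of the overall shape, inferred from the experiment name (`conv_transformer_paragraphs`) and the notes below; the class, arguments, and shape annotations are illustrative guesses, not the project's actual code:

```python
import torch
from torch import nn


class ConvTransformer(nn.Module):
    """Hypothetical outline: a convolutional encoder produces a feature map,
    and an autoregressive transformer decoder emits one character per step."""

    def __init__(self, encoder: nn.Module, decoder: nn.Module, hidden_dim: int, num_classes: int) -> None:
        super().__init__()
        self.encoder = encoder  # e.g. a ConvNext-style CNN (see the Graveyard notes)
        self.decoder = decoder  # transformer decoder attending over image features
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, images: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        memory = self.encoder(images)                # (B, hidden_dim, H', W')
        memory = memory.flatten(2).permute(0, 2, 1)  # (B, H'*W', hidden_dim)
        out = self.decoder(tokens, memory)           # (B, T, hidden_dim)
        return self.head(out)                        # (B, T, num_classes) character logits
```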
## Graveyard
Ideas of mine that unfortunately did not work:
* EfficientNet turned out to be a poor choice of encoder.
  - A ConvNext module, heavily borrowed from lucidrains' [x-unet](https://github.com/lucidrains/x-unet), encoded the images into a far better representation (see the sketch after this list).
* Use a VQ-VAE to pre-train a good latent representation.
  - Tests with various compression rates showed no performance increase over training directly end-to-end; if anything, performance decreased.
  - This is very unfortunate, as I really hoped this idea would work :(
  - I still really like this idea, and I might not have given up just yet...
  - I have now given up... :( ConvNext ftw.
* Axial Transformer Encoder
  - Added a lot of extra parameters with no gain in performance.
  - Cool idea, but not practical on a single GPU.
* Word Pieces
  - Might have worked better, but I liked the idea of single-character recognition more.
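As referenced above, here is a minimal sketch of a ConvNext-style residual block in the spirit of the x-unet version; the kernel sizes, normalization, and expansion factor are assumptions rather than a copy of either implementation:

```python
import torch
from torch import nn


class ConvNextBlock(nn.Module):
    """Sketch of a ConvNext-style residual block: a large depthwise conv
    followed by a normalized pointwise expand-and-project MLP."""

    def __init__(self, dim: int, mult: int = 4) -> None:
        super().__init__()
        self.ds_conv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.net = nn.Sequential(
            nn.GroupNorm(1, dim),  # channel-wise LayerNorm equivalent
            nn.Conv2d(dim, dim * mult, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(dim * mult, dim, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(self.ds_conv(x)) + x  # residual connection


# A 64-channel feature map passes through with its shape unchanged.
block = ConvNextBlock(dim=64)
out = block(torch.randn(1, 64, 28, 28))  # -> (1, 64, 28, 28)
```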
## Todo
- [ ] Try to remove einops
- [ ] Tests
- [ ] Evaluation
- [ ] Wandb artifact fetcher
- [ ] Fix linting
- [ ] Modularize the decoder
- [ ] Add a KV cache
- [ ] Fix stems
- [ ] Residual attention