Update README with installation

author: Gustaf Rydholm <gustaf.rydholm@gmail.com> 2021-10-01 00:03:38 +0200
committer: Gustaf Rydholm <gustaf.rydholm@gmail.com> 2021-10-01 00:03:38 +0200
commit: 99c61a0b45a6f613f97fae94cab401e097da3118 (patch)
tree: dbe2c5d8004af63c2fdd1dd45ded0ab0f3d2c6f4 /README.md
parent: 6adcf85afc71a6f276370c86f32b36b15603c9f5 (diff)
1 files changed, 21 insertions, 17 deletions
diff --git a/README.md b/README.md
index d1f09bc..96cab40 100644
--- a/README.md
+++ b/README.md
@@ -1,30 +1,26 @@
 # Text Recognizer
 Implementing the text recognizer project from the course ["Full Stack Deep Learning Course"](https://fullstackdeeplearning.com/march2019) (FSDL) in PyTorch in order to learn best practices when building a deep learning project. I have expanded on this project by adding additional feature and ideas given by Claudio Jolowicz in ["Hypermodern Python"](https://cjolowicz.github.io/posts/hypermodern-python-01-setup/).
 
+## Installation
 
-## Setup
+Install poetry and pyenv.
 
-TBC
+```sh
+pyenv local 3.9.1
+make install
+```
 
+## Generate Datasets
 
-### Build word piece dataset
+Download and generate datasets by running:
 
-Extract text from the iam dataset:
-TODO: Fix these!
-```
-python extract-iam-text --use_words --save_text train.txt --save_tokens letters.txt
+```sh
+make download
+make generate
 ```
 
-Create word pieces from the extracted training text:
-```
-python make-wordpieces --output_prefix iamdb_1kwp --text_file train.txt --num_pieces 100
-```
 
-Optionally, build a transition graph for word pieces:
-```
-python build-transitions --tokens iamdb_1kwp_tokens_1000.txt --lexicon iamdb_1kwp_lex_1000.txt --blank optional --self_loops --save_path 1kwp_prune_0_10_optblank.bin --prune 0 10
-```
-(TODO: Not working atm, needed for GTN loss function)
+## TODO
 
 ## Todo
 - [ ] Local attention for target sequence
@@ -41,8 +37,10 @@ python build-transitions --tokens iamdb_1kwp_tokens_1000.txt --lexicon iamdb_1kw
 - [ ] Train with Smoothloss
 - [ ] Train with SWA
 - [ ] VqTransformer without the quantization
+- [ ] VqTransformer with extra layer
+
 
-## Run Sweeps
+## Run Sweeps (old stuff)
  Run the following commands to execute hyperparameter search with W&B:
 
 ```
@@ -51,3 +49,9 @@ export SWEEP_ID=...
 wandb agent $SWEEP_ID
 
 ```
+
+(TODO: Not working atm, needed for GTN loss function)
+Optionally, build a transition graph for word pieces:
+```
+python build-transitions --tokens iamdb_1kwp_tokens_1000.txt --lexicon iamdb_1kwp_lex_1000.txt --blank optional --self_loops --save_path 1kwp_prune_0_10_optblank.bin --prune 0 10
+```
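
The new Installation section assumes `poetry` and `pyenv` are already available before `make install` runs. A minimal sketch of a pre-flight check for that assumption follows. It is not part of this commit or the repository, and `missing_tools` is a hypothetical helper name chosen for illustration.

```sh
#!/bin/sh
# Hypothetical helper (not in the repository): print every named command
# that cannot be found on PATH, one per line. Prints nothing if all exist.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage sketch (commented out so sourcing this file has no side effects):
#   missing="$(missing_tools poetry pyenv make)"
#   [ -z "$missing" ] && pyenv local 3.9.1 && make install
```

If the helper prints nothing, the two commands from the README can be run as written. Otherwise each printed name is a tool that still needs to be installed.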