author | Gustaf Rydholm <gustaf.rydholm@gmail.com> | 2021-10-01 00:03:38 +0200 |
---|---|---|
committer | Gustaf Rydholm <gustaf.rydholm@gmail.com> | 2021-10-01 00:03:38 +0200 |
commit | 99c61a0b45a6f613f97fae94cab401e097da3118 | |
tree | dbe2c5d8004af63c2fdd1dd45ded0ab0f3d2c6f4 | |
parent | 6adcf85afc71a6f276370c86f32b36b15603c9f5 |
Update README with installation
-rw-r--r-- | README.md | 38 |
1 files changed, 21 insertions, 17 deletions
@@ -1,30 +1,26 @@
 # Text Recognizer
 Implementing the text recognizer project from the course ["Full Stack Deep Learning Course"](https://fullstackdeeplearning.com/march2019) (FSDL) in PyTorch in order to learn best practices when building a deep learning project. I have expanded on this project by adding additional feature and ideas given by Claudio Jolowicz in ["Hypermodern Python"](https://cjolowicz.github.io/posts/hypermodern-python-01-setup/).
+## Installation
-## Setup
+Install poetry and pyenv.
-TBC
+```sh
+pyenv local 3.9.1
+make install
+```
+## Generate Datasets
-### Build word piece dataset
+Download and generate datasets by running:
-Extract text from the iam dataset:
-TODO: Fix these!
-```
-python extract-iam-text --use_words --save_text train.txt --save_tokens letters.txt
+```sh
+make download
+make generate
 ```
-Create word pieces from the extracted training text:
-```
-python make-wordpieces --output_prefix iamdb_1kwp --text_file train.txt --num_pieces 100
-```
-Optionally, build a transition graph for word pieces:
-```
-python build-transitions --tokens iamdb_1kwp_tokens_1000.txt --lexicon iamdb_1kwp_lex_1000.txt --blank optional --self_loops --save_path 1kwp_prune_0_10_optblank.bin --prune 0 10
-```
-(TODO: Not working atm, needed for GTN loss function)
+## TODO
 ## Todo
 - [ ] Local attention for target sequence
@@ -41,8 +37,10 @@ python build-transitions --tokens iamdb_1kwp_tokens_1000.txt --lexicon iamdb_1kw
 - [ ] Train with Smoothloss
 - [ ] Train with SWA
 - [ ] VqTransformer without the quantization
+- [ ] VqTransformer with extra layer
+
-## Run Sweeps
+## Run Sweeps (old stuff)
 Run the following commands to execute hyperparameter search with W&B:
 ```
@@ -51,3 +49,9 @@ export SWEEP_ID=...
 wandb agent $SWEEP_ID
 ```
+
+(TODO: Not working atm, needed for GTN loss function)
+Optionally, build a transition graph for word pieces:
+```
+python build-transitions --tokens iamdb_1kwp_tokens_1000.txt --lexicon iamdb_1kwp_lex_1000.txt --blank optional --self_loops --save_path 1kwp_prune_0_10_optblank.bin --prune 0 10
+```