From 99c61a0b45a6f613f97fae94cab401e097da3118 Mon Sep 17 00:00:00 2001
From: Gustaf Rydholm
Date: Fri, 1 Oct 2021 00:03:38 +0200
Subject: Update README with installation

---
 README.md | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/README.md b/README.md
index d1f09bc..96cab40 100644
--- a/README.md
+++ b/README.md
@@ -1,30 +1,26 @@
 # Text Recognizer
 Implementing the text recognizer project from the course ["Full Stack Deep Learning Course"](https://fullstackdeeplearning.com/march2019) (FSDL) in PyTorch in order to learn best practices when building a deep learning project. I have expanded on this project by adding additional feature and ideas given by Claudio Jolowicz in ["Hypermodern Python"](https://cjolowicz.github.io/posts/hypermodern-python-01-setup/).
 
+## Installation
 
-## Setup
+Install poetry and pyenv.
 
-TBC
+```sh
+pyenv local 3.9.1
+make install
+```
 
+## Generate Datasets
 
-### Build word piece dataset
+Download and generate datasets by running:
 
-Extract text from the iam dataset:
-TODO: Fix these!
-```
-python extract-iam-text --use_words --save_text train.txt --save_tokens letters.txt
+```sh
+make download
+make generate
 ```
 
-Create word pieces from the extracted training text:
-```
-python make-wordpieces --output_prefix iamdb_1kwp --text_file train.txt --num_pieces 100
-```
 
-Optionally, build a transition graph for word pieces:
-```
-python build-transitions --tokens iamdb_1kwp_tokens_1000.txt --lexicon iamdb_1kwp_lex_1000.txt --blank optional --self_loops --save_path 1kwp_prune_0_10_optblank.bin --prune 0 10
-```
-(TODO: Not working atm, needed for GTN loss function)
+## TODO
 
 ## Todo
 - [ ] Local attention for target sequence
@@ -41,8 +37,10 @@ python build-transitions --tokens iamdb_1kwp_tokens_1000.txt --lexicon iamdb_1kw
 - [ ] Train with Smoothloss
 - [ ] Train with SWA
 - [ ] VqTransformer without the quantization
+- [ ] VqTransformer with extra layer
+
 
-## Run Sweeps
+## Run Sweeps (old stuff)
 
 Run the following commands to execute hyperparameter search with W&B:
 ```
@@ -51,3 +49,9 @@ export SWEEP_ID=...
 wandb agent $SWEEP_ID
 ```
 
+
+(TODO: Not working atm, needed for GTN loss function)
+Optionally, build a transition graph for word pieces:
+```
+python build-transitions --tokens iamdb_1kwp_tokens_1000.txt --lexicon iamdb_1kwp_lex_1000.txt --blank optional --self_loops --save_path 1kwp_prune_0_10_optblank.bin --prune 0 10
+```
-- 
cgit v1.2.3-70-g09d2