From 5108e57aad2427c7e47061f7251ebfdc4bb6eedc Mon Sep 17 00:00:00 2001
From: Gustaf Rydholm
Date: Sun, 2 May 2021 22:57:17 +0200
Subject: Updated todos in readme

---
 README.md                            | 34 ++++++++--------------------
 notebooks/00-testing-stuff-out.ipynb | 14 +++++++-------
 2 files changed, 15 insertions(+), 33 deletions(-)

diff --git a/README.md b/README.md
index dac7e98..ed93955 100644
--- a/README.md
+++ b/README.md
@@ -11,44 +11,26 @@
 TBC
 
 Extract text from the iam dataset:
 ```
-poetry run extract-iam-text --use_words --save_text train.txt --save_tokens letters.txt
+poetry run python extract-iam-text --use_words --save_text train.txt --save_tokens letters.txt
 ```
 
 Create word pieces from the extracted training text:
 ```
-poetry run make-wordpieces --output_prefix iamdb_1kwp --text_file train.txt --num_pieces 100
+poetry run python make-wordpieces --output_prefix iamdb_1kwp --text_file train.txt --num_pieces 100
 ```
 
 Optionally, build a transition graph for word pieces:
 ```
-poetry run build-transitions --tokens iamdb_1kwp_tokens_1000.txt --lexicon iamdb_1kwp_lex_1000.txt --blank optional --self_loops --save_path 1kwp_prune_0_10_optblank.bin --prune 0 10
+poetry run python build-transitions --tokens iamdb_1kwp_tokens_1000.txt --lexicon iamdb_1kwp_lex_1000.txt --blank optional --self_loops --save_path 1kwp_prune_0_10_optblank.bin --prune 0 10
 ```
 (TODO: Not working atm, needed for GTN loss function)
 
 ## Todo
-- [x] create wordpieces
-  - [x] make_wordpieces.py
-  - [x] build_transitions.py
-  - [x] transform that encodes iam targets to wordpieces
-  - [x] transducer loss function
-- [ ] Train with word pieces
-  - [ ] Pad word pieces index to same length
-- [ ] Local attention in first layer of transformer
-- [ ] Halonet encoder
-- [ ] Implement CPC
-  - [ ] https://arxiv.org/pdf/1905.09272.pdf
-  - [ ] https://pytorch-lightning-bolts.readthedocs.io/en/latest/self_supervised_models.html?highlight=byol
-
-
-- [ ] Predictive coding
-  - https://arxiv.org/pdf/1807.03748.pdf
-  - https://arxiv.org/pdf/1904.05862.pdf
-  - https://arxiv.org/pdf/1910.05453.pdf
-  - https://blog.evjang.com/2016/11/tutorial-categorical-variational.html
-
-
-
-
+- [ ] Reimplement transformer from scratch
+- [ ] Implement Nyström attention (for efficient attention)
+- [ ] Dino
+- [ ] Efficient-net b0 + transformer decoder
+- [ ] Test encoder pre-training ViT (CvT?) with Dino, then train decoder in a separate step
 
 ## Run Sweeps
diff --git a/notebooks/00-testing-stuff-out.ipynb b/notebooks/00-testing-stuff-out.ipynb
index 92faaf7..12c5145 100644
--- a/notebooks/00-testing-stuff-out.ipynb
+++ b/notebooks/00-testing-stuff-out.ipynb
@@ -420,7 +420,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 24,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -478,7 +478,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 32,
+   "execution_count": 4,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -487,26 +487,26 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 35,
+   "execution_count": 5,
    "metadata": {},
    "outputs": [],
    "source": [
-    "patch_size=4\n",
+    "patch_size=16\n",
     "p = rearrange(x, 'b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1 = patch_size, p2 = patch_size)"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 36,
+   "execution_count": 6,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "torch.Size([1, 1440, 16])"
+       "torch.Size([1, 1440, 256])"
       ]
      },
-     "execution_count": 36,
+     "execution_count": 6,
      "metadata": {},
      "output_type": "execute_result"
     }
-- 
cgit v1.2.3-70-g09d2
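
For reference, the notebook hunk above changes the patch size used when flattening an image into a sequence of patches with einops. Below is a minimal, self-contained sketch of that step; the input tensor shape is an assumption (the patch does not show how `x` is created), chosen as a single-channel 576 x 640 image so the result matches the `torch.Size([1, 1440, 256])` seen in the updated cell output.

```python
# Sketch of the patchify step exercised in the notebook diff, not the repo's actual code.
# Assumed input: (batch=1, channels=1, height=576, width=640), so that
# (576 / 16) * (640 / 16) = 1440 patches of 16 * 16 * 1 = 256 values each.
import torch
from einops import rearrange

patch_size = 16
x = torch.randn(1, 1, 576, 640)  # assumed shape; the patch only shows patch_size and the output shape

# Split the image into non-overlapping patch_size x patch_size patches
# and flatten each patch into a vector, as in the notebook cell.
p = rearrange(x, "b c (h p1) (w p2) -> b (h w) (p1 p2 c)", p1=patch_size, p2=patch_size)
print(p.shape)  # torch.Size([1, 1440, 256])
```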