A Mind for Language

Dec 10, 2019

Fun with vocab-tools: vocab info for a book

More fun with James Tauber's vocabulary-tools. I'm trying to read the whole NT in Greek, and Titus is next. I started reading it, but there was a lot of unfamiliar vocab, or at least vocab I didn't feel certain of. Vocabulary-tools to the rescue again. Sure, I could buy a reader's Greek New Testament, but where's the fun in that? Also, using vocabulary-tools lets me customize which words are added to the list.

from collections import Counter
from gnt_data import get_tokens, get_tokens_by_chunk, TokenType, ChunkType
from abott_glosser import Glosser
from ref_tools import get_book
import sys

# Get all lemmas in GNT
gnt_lemmas = Counter(get_tokens(TokenType.lemma))

# Get lemmas for the book being read (here, Titus)
NEW_CHAPTER = Counter(get_tokens(TokenType.lemma, ChunkType.book, get_book("TIT", 60)))

# Look up each target lemma's frequency in the whole GNT, rather than its frequency in the current book
def getNTFreq(nt, tgt):
    out = {}
    for lemma in tgt:
        if lemma in nt:
            out[lemma] = nt[lemma]
    return out

# Map each lemma in the book to its frequency in the GNT as a whole
ACT_NT_FREQ = getNTFreq(gnt_lemmas, NEW_CHAPTER)

# Keep only lemmas that occur fewer than LIM times in the GNT as a whole, sorted by lemma
LIM = 10
freq = lambda x: int(x[1]) < LIM
TGT = sorted(filter(freq, ACT_NT_FREQ.items()), key=lambda x: x[0])

# setup glosser
glosser = Glosser("custom-glosses.tab")

# output results
for l in TGT:
    print(f"{l[0]}\t{l[1]}\t{glosser.get(l[0])}")

By running py get_chatper.py > titus_vocab.txt, I now have a vocab list that I can print and stick in my GNT for easy access. In theory, I could also keep track of this list and filter these words out when I move on to the next book, or filter out those that I have only seen a certain number of times. Also, by tweaking the print line to print(f"{l[0]}\t{glosser.get(l[0])}"), the file could be imported into Anki and boom! Instant flashcards.
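
Here is a rough sketch of those last two ideas: dropping words already covered by an earlier list, and writing lemma/gloss pairs as tab-separated text that Anki can import. It is not part of vocabulary-tools. The helper names drop_known and to_anki_tsv are made up for illustration, and the rows are placeholders standing in for the (lemma, GNT frequency, gloss) values the script above prints.

```python
import csv
import io


def drop_known(rows, known):
    """Return only the (lemma, count, gloss) rows whose lemma is not in `known`."""
    return [row for row in rows if row[0] not in known]


def to_anki_tsv(rows):
    """Render rows as tab-separated 'lemma<TAB>gloss' lines, one card per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for lemma, _count, gloss in rows:
        writer.writerow([lemma, gloss])
    return buf.getvalue()


# Placeholder rows (hypothetical values, not real GNT data)
rows = [
    ("lemma_a", 3, "gloss a"),
    ("lemma_b", 7, "gloss b"),
    ("lemma_c", 1, "gloss c"),
]

# Lemmas already covered by a previous book's list (also hypothetical)
known = {"lemma_b"}

new_rows = drop_known(rows, known)
anki_text = to_anki_tsv(new_rows)
print(anki_text, end="")
```

In practice the known set could be loaded from the first column of a previously saved list such as titus_vocab.txt, so each new book's list only contains words that haven't shown up before.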