A Mind for Language

Nov 19, 2019

More fun with JTauber's vocab tools: Finding verses and pericopes with shared vocab

Vocabulary acquisition requires repeated exposure: we need to encounter a given word again and again before our brains acquire it. Reading texts that cover similar topics is a great way to get that exposure, since similar topics make it more likely that vocabulary will repeat across the texts.

For those of us interested in New Testament Greek and acquiring vocabulary, reading the GNT is a good way to do this: read the whole thing and you will certainly acquire a good deal of vocab. But biblical texts don't always address the same topic with enough repetition for us to naturally get the repeated exposure we need to acquire a word within a short period of time.

What if we could read passages that have a high degree of shared vocab? That should provide the repetition. But how do we find these passages?

Enter the dragon... I mean, enter James Tauber's vocabulary tools for the GNT.

The code

The following code loops through each verse in the GNT and gets the set of all lemmas found there. It then loops through every other verse in the GNT and figures out which lemmas are not common to the two verses. If the number of lemmas that aren't shared is below a given limit (in this case 5), it saves the pair to be output.

from gnt_data import get_tokens, get_tokens_by_chunk, TokenType, ChunkType
from greekutils.verse_ref import bcv_to_verse_ref

# Turn a BCV code into a readable reference like "Matt 4:14".
reffer = lambda x: bcv_to_verse_ref(x, start=61)

# Dict mapping each verse to the list of lemmas that occur in it.
gnt_verses = get_tokens_by_chunk(TokenType.lemma, ChunkType.verse)

commons = dict()
LIM = 5  # maximum number of unshared lemmas allowed
for verse, lemma in gnt_verses.items():
    print(reffer(verse))  # progress indicator
    verse_set = set(lemma)
    for v, l in gnt_verses.items():
        if v == verse:
            continue
        vset = set(l)
        # Everything in either verse minus everything in both = the unshared lemmas.
        u = verse_set.union(vset)
        intr = verse_set.intersection(vset)
        not_common = u - intr
        if len(not_common) < LIM:
            if verse in commons:
                commons[verse].append(v)
            else:
                commons[verse] = [v]

with open("common_list_verses.txt", 'w') as g:
    for k, v in commons.items():
        print(reffer(k), file=g)
        for i in v:
            print("\t" + reffer(i), file=g)
print("DONE!")

Here's a snippet of the results:

Matt 4:14
    Matt 2:17
    Matt 12:17
    Matt 21:4

Now let's compare them (Greek text taken from [1]):

Matt 4:14 is ἵνα πληρωθῇ τὸ ῥηθὲν διὰ Ἠσαΐου τοῦ προφήτου λέγοντος·

  • Matt 2:17 – τότε ἐπληρώθη τὸ ῥηθὲν ⸀διὰ Ἰερεμίου τοῦ προφήτου λέγοντος
  • Matt 12:17 – ⸀ἵνα πληρωθῇ τὸ ῥηθὲν διὰ Ἠσαΐου τοῦ προφήτου λέγοντος·
  • Matt 21:4 – Τοῦτο ⸀δὲ γέγονεν ἵνα πληρωθῇ τὸ ῥηθὲν διὰ τοῦ προφήτου λέγοντος·
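To spot-check a pair like this, you can pull the two lemma lists straight out of gnt_verses and compare the sets yourself. The sketch below assumes the verse keys are zero-padded BCV strings such as "610414" (book 61 = Matthew, chapter 4, verse 14), which is what the start=61 argument to bcv_to_verse_ref suggests; check the actual keys in your copy of the data.

# Assumes BCV-style keys like "610414"; adjust if the chunk ids differ.
matt_4_14 = set(gnt_verses["610414"])   # Matt 4:14
matt_12_17 = set(gnt_verses["611217"])  # Matt 12:17

print("shared lemmas:", matt_4_14 & matt_12_17)
print("unshared lemmas:", matt_4_14 ^ matt_12_17)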

What about larger units of text?

Ok, but who wants to skip around reading random verses? By making a few tweaks to the code above, we can compare pericopes instead.

# Chunk by pericope instead of verse.
gnt_verses = get_tokens_by_chunk(TokenType.lemma, ChunkType.pericope)
...

# Allow more unshared lemmas, since pericopes are longer than single verses.
LIM = 10
...

# Pericope ids aren't BCV codes, so print them as-is rather than through reffer.
with open("common_list_pericope.txt", 'w') as g:
    for k, v in commons.items():
        print(k, file=g)
        for i in v:
            print("\t" + i, file=g)

This returns the following passages. I had to write some extra code to convert the pericope codes into normal passage references, so you'll want this file and this file if you want to run this part yourself. (A sketch of what that conversion might look like follows the first list below.)

Mark 10:13 - Mark 10:16
    Luke 18:15 - Luke 18:17
Luke 18:15 - Luke 18:17
    Mark 10:13 - Mark 10:16
Eph 1:1 - Eph 1:2
    Col 1:1 - Col 1:2
Col 1:1 - Col 1:2
    Eph 1:1 - Eph 1:2
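The conversion from pericope codes to readable ranges could look something like this. The file name pericope_ranges.tsv and its three-column layout (pericope code, first verse BCV, last verse BCV) are hypothetical stand-ins for the files linked above, not the actual code or data, and the snippet reuses commons and reffer from the code earlier in the post.

# Hypothetical mapping file: pericope code <TAB> first verse BCV <TAB> last verse BCV.
pericope_refs = {}
with open("pericope_ranges.tsv") as f:
    for line in f:
        code, first_bcv, last_bcv = line.strip().split("\t")
        pericope_refs[code] = reffer(first_bcv) + " - " + reffer(last_bcv)

# Print each pericope and its matches as verse-range references.
with open("common_list_pericope.txt", 'w') as g:
    for k, v in commons.items():
        print(pericope_refs.get(k, k), file=g)
        for i in v:
            print("\t" + pericope_refs.get(i, i), file=g)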

By changing LIM to 15, we get the following list.

Mark 10:13 - Mark 10:16
    Luke 18:15 - Luke 18:17
Luke 18:15 - Luke 18:17
    Mark 10:13 - Mark 10:16
Eph 1:1 - Eph 1:2
    Phil 1:1 - Phil 1:2
    Col 1:1 - Col 1:2
    2 Thess 1:1 - 2 Thess 1:2
    2 Tim 1:1 - 2 Tim 1:2
Phil 1:1 - Phil 1:2
    Eph 1:1 - Eph 1:2
    Col 1:1 - Col 1:2
    2 Thess 1:1 - 2 Thess 1:2
Col 1:1 - Col 1:2
    Eph 1:1 - Eph 1:2
    Phil 1:1 - Phil 1:2
    2 Thess 1:1 - 2 Thess 1:2
    2 Tim 1:1 - 2 Tim 1:2
2 Thess 1:1 - 2 Thess 1:2
    Eph 1:1 - Eph 1:2
    Phil 1:1 - Phil 1:2
    Col 1:1 - Col 1:2
    1 Tim 1:1 - 1 Tim 1:2
    2 Tim 1:1 - 2 Tim 1:2
    Phlm 1:1 - Phlm 1:3
1 Tim 1:1 - 1 Tim 1:2
    2 Thess 1:1 - 2 Thess 1:2
    2 Tim 1:1 - 2 Tim 1:2
2 Tim 1:1 - 2 Tim 1:2
    Eph 1:1 - Eph 1:2
    Col 1:1 - Col 1:2
    2 Thess 1:1 - 2 Thess 1:2
    1 Tim 1:1 - 1 Tim 1:2
Phlm 1:1 - Phlm 1:3
    2 Thess 1:1 - 2 Thess 1:2

κ.τ.λ.

ChunkType could also be changed to chapter if you'd like to compare chapters.
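For example (assuming the enum value is spelled ChunkType.chapter, in line with the other chunk types used above):

gnt_verses = get_tokens_by_chunk(TokenType.lemma, ChunkType.chapter)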

All of the above uses lemmas. If you are interested in forms, then simply replacing TokenType.lemma with TokenType.form in this line will do the trick.

gnt_verses = get_tokens_by_chunk(TokenType.form, ChunkType.pericope)

I doubt this will change your life as a student or as a teacher, but it is certainly interesting to know which verses or passages share vocabulary. It could help us develop better reading assignments for students or point us to passages that would make interesting reading for growing our own vocabulary.


[1]: Michael W. Holmes, The Greek New Testament: SBL Edition (Lexham Press; Society of Biblical Literature, 2011–2013)