Uploaded yet another version of the Frequentizer, which has an improved syllable model that supports full and null onsets or codas, intervocalic consonants or consonant clusters, and can restrict the analysis to syllables in a certain position in the word. This means you can now ask for things like “vowels in word-medial syllables” or “syllable onsets in non-final syllables”. The program also presents the results in a shiny visual diagram now (made with the Chart.js library), and it has a proper license (CC-BY-SA).
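To illustrate what the position restriction amounts to, here is a minimal Python sketch. It is not the Frequentizer's actual code: it assumes words have already been split into syllables, and the function name `select_syllables` and the position labels are made up for this example. Treating the only syllable of a monosyllabic word as both initial and final is likewise an assumption.

```python
def select_syllables(word, position):
    """Return the syllables of `word` (a list of syllable strings) that sit
    in the requested position. The labels are hypothetical, not the
    Frequentizer's own option names."""
    if position == "initial":
        return word[:1]
    if position == "final":
        return word[-1:]
    if position == "medial":
        # Everything except the first and last syllable.
        return word[1:-1]
    if position == "non-final":
        return word[:-1]
    raise ValueError(f"unknown position: {position!r}")


# A made-up, pre-syllabified word, not taken from any real corpus.
word = ["ba", "na", "na"]
print(select_syllables(word, "medial"))
print(select_syllables(word, "non-final"))
```

A request like "vowels in word-medial syllables" would then run the vowel count only over the syllables this kind of filter returns.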
Fanael wrote: Why all the h's are replaced by x's in the syllable onsets/codas/rhymes and consonant clusters reports?
Because the sample data is from my conlang Buruya Nzaysa, where there's a single phoneme /x/ which is written <h> word-finally and <x> everywhere else. More technically, because the sample consonant definitions (which are, of course, specifically tailored to B.Nz.) contain a row "x h", which tells the program that <h> is another way to write the segment /x/.
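As a rough picture of how a row like "x h" could work, here is a small Python sketch. It is only a guess at the idea, not the program's real implementation: the names `parse_definitions` and `normalize` are invented, it assumes the first entry in each row is the segment and the rest are alternative spellings, and it only handles single-character spellings.

```python
def parse_definitions(text):
    """Map every spelling in a row to that row's first entry (the segment).
    Assumes one whitespace-separated row per segment."""
    aliases = {}
    for row in text.splitlines():
        spellings = row.split()
        if not spellings:
            continue  # skip blank rows
        segment = spellings[0]
        for spelling in spellings:
            aliases[spelling] = segment
    return aliases


def normalize(word, aliases):
    """Replace each character by its segment; unknown characters pass through.
    Single-character spellings only, to keep the sketch short."""
    return [aliases.get(ch, ch) for ch in word]


# Hypothetical definitions and a made-up test string.
aliases = parse_definitions("x h\nm\nn")
print(normalize("nah", aliases))
```

With a mapping like this, every <h> is counted as /x/ before any statistics are computed, which is why the reports show only x.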
Version 0.4 of the Frequentizer is up, now under an open-source license suitable for software (FreeBSD). The program can now give some word-level statistics, restrict the analysis to words of a certain length, determine the most commonly used bi- and trigrams within words, and report the frequency of syllable shapes of the type CV, CCV, CVC, etc. It also supports comments in the text corpus; everything from // to the end of the line will be ignored.
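For anyone curious what the comment handling and n-gram counting boil down to, here is a minimal Python sketch. It is an illustration under assumptions, not the Frequentizer's source: the function names are invented, words are taken to be whitespace-separated, every character counts as one segment, and the vowel set passed to `syllable_shape` is a placeholder.

```python
from collections import Counter


def strip_comment(line):
    """Drop everything from // to the end of the line."""
    return line.split("//", 1)[0]


def ngrams_within_words(corpus, n):
    """Count n-grams that lie entirely inside a word (never across a space)."""
    counts = Counter()
    for line in corpus.splitlines():
        for word in strip_comment(line).split():
            for i in range(len(word) - n + 1):
                counts[word[i:i + n]] += 1
    return counts


def syllable_shape(syllable, vowels="aeiou"):
    """Reduce a syllable to a C/V pattern; the vowel set is an assumption."""
    return "".join("V" if ch in vowels else "C" for ch in syllable)


# A made-up two-line corpus with one comment.
corpus = "banana // this part is ignored\nnab"
print(ngrams_within_words(corpus, 2).most_common(2))
print(syllable_shape("stra"))
```

Counting shapes per syllable would additionally need a syllabifier, which is left out here.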