
Phoneme Inventories in the Goeman-Taeldeman Project

Posted on Saturday, August 28, 2004 6:25 PM
The Goeman-Taeldeman-Van Reenen Project (GTRP) is a wonderful source of material that could support many studies, e.g. of phoneme inventories. Unfortunately, the Fonologische Atlas van de Nederlandse Dialecten (Phonological Atlas of Dutch Dialects), which is based on the GTRP data, is mainly historical in focus and therefore does not exploit these possibilities. Fortunately, all of the material is available as clean text files, both on CD-ROM and through the project website.

Today, I wrote a small AWK script to demonstrate that it is quite easy to extract interesting data:

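#! /sw/bin/awk -f
# Expects a file with whitespace-separated fields:
#   {kloeke code} {question #} {data}

{
    # index() matches "h}" literally, so KIPA codes containing regex
    # metacharacters (e.g. "." or "?") can be substituted safely
    if (index($3, "h}")) {
        list[$1] = $3    # remember the last word per Kloeke code
                         # that contains the search item
    }
}

END {
    for (item in list) {
        print item, ", " list[item]    # Kloeke code, data
    }
}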

This script searches the raw data file for pharyngeals. For each Kloeke code, it remembers the last word it encounters that contains one. A Kloeke code is a standard dialectological code that denotes places in the Dutch language area. (It is easy to adapt the script to other segments as well, provided you know their so-called KIPA code.) Feeding the output of this program to the Kloeketabel application automatically generates a map:

What you cannot see on this JPEG screenshot is that many different words are involved: some dialects have the segment in one word, others in a different one. This is visible on the dynamic map generated by Kloeketabel, and also in the raw output of my script:

I047p , da8.h}6n
C079p , h}o2:~
I095p , te.`h}6n
L052p , h}6t
I049p , ikrih}l>a2>st
G015q , n;h}a2u_t#?a2l2d6r26vr2o.u_h}6_~n_
I076p , sisa2ta2no5:r2!h}ei_v6
I099p , s6za2tano5:rh}e.i2_v#6_~
K043p , h}r7a:r3
I036p , ziza2tw2a2la2nd6rh}ei_vi2_
C105p , duw2is.nit?o2(2fhe2i_6t?a2)?n,;
I042p , h}au_6
K180p , leh}6
I125p , ?ikr2ih}l2a2stva2nd6we2r26_mt6
I067p , te.`h}6n
I106p , h}r2uo2_tst
I069p , h}lo:`v6n
I108p , u_6h}6_r!
I073p , h}i2st6r6
I052p , i2k_s|ah}6t
F047p , h}a2l6v6
C192p , h}o2j6pi?m;
I033p , a2s|t_{6_ta2~/jo5l>d6r6vro(2u_h}6
L204p , myh}6r
I062a , te.ih}6n
I058p , wa2ni:rh}a8:~
E076p , ?6n?o:h}
We can clearly see on the map that almost all dialects have the relevant segment. The list, however, shows that this is true only if we accumulate all potentially relevant words (i.e. the whole list) rather than looking at any single word.
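This point can be checked directly by doing two counts and comparing them. The first count is, for each question (i.e. word), how many locations have the segment. The second is how many locations have it in at least one word.

What follows is a minimal sketch under these assumptions:

- The input has the same whitespace-separated layout as the script above (Kloeke code, question number, data), with no spaces inside the data field.
- `coverage` is a hypothetical shell function that wraps an AWK program.
- The segment is passed as a parameter and matched literally with `index()`, so any KIPA code can be searched for, not just `h}`.

```sh
# coverage SEGMENT [FILE...]
# Hypothetical helper: for each question number, print how many
# locations (Kloeke codes) have SEGMENT in their transcription, then
# how many locations have it in at least one question.
# Input lines: {kloeke code} {question #} {data}
coverage() {
    seg=$1
    shift
    # Pass the segment through the environment rather than -v, so that
    # backslashes in KIPA codes are not treated as escape sequences.
    SEG="$seg" awk '
        index($3, ENVIRON["SEG"]) {          # literal substring match
            if (!(($2, $1) in seen)) {       # count a location once per question
                seen[$2, $1] = 1
                per_question[$2]++
            }
            anywhere[$1] = 1                 # location has it in some word
        }
        END {
            for (q in per_question)
                print "question " q ": " per_question[q]
            n = 0
            for (loc in anywhere) n++
            print "any question: " n
        }
    ' "$@"
}
```

For example, `coverage 'h}' gtrp.txt` (the file name is a placeholder) prints the per-question counts in no particular order, followed by the total. Suppose the total is much higher than every per-question count. Then the segment's wide distribution depends on pooling several words, which is the pattern described above.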
