For Allen Ginsberg, pioneer of the concept of the "American Sentence"--similar
to the Japanese haiku but without the 5-7-5 structure.
One poem, one sentence, seventeen syllables--often by surprise.
Tompkins Square Lower East Side N.Y.
Four skin heads stand in the streetlight rain chatting under an umbrella.
(1987)
"Put on my tie in a taxi, short of breath, rushing to meditate."
(November 1991 New York)
WARNING:
May contain spoilers for these literary classics.
Also note that the ones from "My Secret Life" tend to be very, very dirty.
The original dictionary was inadequate: it did not recognize common inflected forms (e.g. bar but not bars, drop but not dropped). Its source was http://hindson.com.au/info/free/free-english-language-hyphenation-dictionary/
Switched to the CMU Pronouncing Dictionary (cmudict-0.7b) from http://www.speech.cs.cmu.edu/cgi-bin/cmudict. Note that some words have alternate pronunciations with more or fewer syllables; currently only the first one listed is used.
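As a rough illustration of the "first pronunciation only" rule, here is a minimal sketch, not the code in this repo. It assumes the cmudict-0.7b layout: a word followed by its phonemes, alternates written as WORD(1), WORD(2), and comment lines starting with ";;;". The method name syllable_counts and the sample lines are made up for the example.

```ruby
# Sketch: build a word => syllable-count table from cmudict-style lines.
# Vowel phonemes carry a trailing stress digit (0, 1 or 2), so counting
# the tokens that end in a digit gives the syllable count.
def syllable_counts(lines)
  counts = {}
  lines.each do |line|
    line = line.strip
    next if line.empty? || line.start_with?(";;;")   # skip blanks and comments
    word, *phonemes = line.split
    next if word.match?(/\(\d+\)\z/)                 # skip alternates such as FIRE(1)
    counts[word.downcase] ||= phonemes.count { |p| p.match?(/\d\z/) }
  end
  counts
end

sample = [
  ";;; sample lines in cmudict style",
  "TAXI  T AE1 K S IY0",
  "FIRE  F AY1 ER0",
  "FIRE(1)  F AY1 R",
]
counts = syllable_counts(sample)
puts counts["taxi"]   # prints 2
puts counts["fire"]   # prints 2 (the one-syllable alternate is ignored)
```

Skipping the alternates keeps the lookup simple, at the cost of miscounting lines where the poet intended the other pronunciation.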
Also note that, as with any dictionary, it is good but far from complete, so some words are missing.
UPDATE: I added code that builds a dictionary combining the two above with a third dictionary (fakecmudict.txt), which holds user-submitted words to flesh out the other two. Entries may be added to fakecmudict.txt by mimicking the format in the original. For current purposes this is not striving to be a pronunciation dictionary, only a way to add unrecognized words with a syllable count (and an approximation of meter). For each syllable, simply add a SYL0, SYL1, or SYL2 token: SYL0 is unstressed, SYL1 is stressed, and SYL2 has secondary stress. The important part for the code and for syllable recognition is that each token is letters followed by a number, but the closer you get to the right stress, the better the results of further analysis will be.
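To make the "letters followed by a number" rule concrete, here is a small sketch of how such an entry could be read. It assumes an entry is a word followed by whitespace-separated tokens, as in cmudict; the sample entries and the method name stress_pattern are hypothetical and not taken from fakecmudict.txt or parsedict.rb.

```ruby
# Sketch: turn one dictionary entry into a list of stress levels.
# A token made of letters followed by a digit counts as one syllable;
# the digit is its stress (0 = unstressed, 1 = stressed, 2 = secondary).
def stress_pattern(entry)
  _word, *tokens = entry.split
  tokens.map { |t| t[/\A[A-Za-z]+(\d)\z/, 1] }.compact.map(&:to_i)
end

# A hypothetical user-submitted entry using the SYL placeholders.
pattern = stress_pattern("SKINHEADS  SYL1 SYL2")
puts pattern.length      # prints 2 (the syllable count)
puts pattern.join(" ")   # prints 1 2 (the stress pattern)
```

The same method also reads a full cmudict-style entry, because consonant phonemes have no trailing digit and are dropped.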
Running parsedict.rb prints JSON that combines the three. Note that the dictionary it produces is also used by the analyze_poem project.
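The combining step might look something like the sketch below. This is not the contents of parsedict.rb: the JSON layout (word mapped to syllable count), the precedence (earlier sources win, later ones only fill gaps), the sample data, and the method name combine are all assumptions made for illustration.

```ruby
require "json"

# Sketch: merge several word => syllable-count hashes into one.
# When a word appears in more than one source, the earliest source wins.
def combine(*sources)
  sources.reduce({}) do |merged, source|
    merged.merge(source) { |_word, existing, _incoming| existing }
  end
end

# Hypothetical sample data standing in for the three dictionaries.
cmu         = { "bars" => 1, "dropped" => 1, "drop" => 1 }
hyphenation = { "bar" => 1, "drop" => 1 }
user        = { "skinheads" => 2 }

puts JSON.pretty_generate(combine(cmu, hyphenation, user))
```

Letting earlier sources win means a user-submitted entry can add a missing word but cannot override one that is already known; reversing the argument order would give the opposite behavior.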