
ahmetlekesiz/sentence-generator-tr

sentence-generator-tr

Turkish sentence generator using Zemberek

We require a system that generates Turkish words and sentences according to certain calculations and rules. Assume that the letters of the Turkish alphabet, taken in alphabetical order, are assigned numerical values starting from 1 (e.g. letter_values = {'a':1, 'b':2, 'c':3, 'ç':4, 'd':5, 'e':6, ...}).
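As a minimal sketch of this letter-value mapping, the snippet below builds the table from the standard 29-letter Turkish alphabet and sums the values of a text. The names `TURKISH_ALPHABET`, `turkish_lower`, and `letter_sum` are illustrative and not taken from the repository; treating characters outside the alphabet (spaces, punctuation, q, w, x) as 0 is an assumption.

```python
# Standard 29-letter Turkish alphabet in alphabetical order.
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"

# 'a' -> 1, 'b' -> 2, ..., 'z' -> 29
LETTER_VALUES = {letter: index for index, letter in enumerate(TURKISH_ALPHABET, start=1)}


def turkish_lower(text):
    """Lowercase with Turkish casing rules: 'I' -> 'ı' and 'İ' -> 'i'."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def letter_sum(text):
    """Sum the letter values of a text; unknown characters count as 0 (assumption)."""
    return sum(LETTER_VALUES.get(ch, 0) for ch in turkish_lower(text))


print(letter_sum("şirket"))
print(letter_sum("yabancılar"))
```

The explicit `turkish_lower` step matters because Python's default `str.lower()` maps "I" to "i" rather than to the dotless "ı", which would change the sum.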

MODULE 1: Generate a given number of words whose letter values sum to a given number (e.g. the letter values of "yabancılar" and of "şirket" each sum to 100).
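One possible way to approach Module 1 is to filter an existing vocabulary by letter sum and return a random sample of the matches. The sketch below assumes the vocabulary is already available as a list of lowercase words (for example, extracted from a corpus); `words_with_sum` and its parameters are hypothetical names, not the repository's API.

```python
import random

ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
VALUES = {ch: i for i, ch in enumerate(ALPHABET, start=1)}


def letter_sum(word):
    """Sum of letter values; expects an already lowercased word."""
    return sum(VALUES.get(ch, 0) for ch in word)


def words_with_sum(vocabulary, target, count, seed=None):
    """Return up to `count` distinct words whose letter values sum to `target`."""
    matches = sorted({word for word in vocabulary if letter_sum(word) == target})
    rng = random.Random(seed)  # seeded for reproducible sampling
    rng.shuffle(matches)
    return matches[:count]


# Toy vocabulary, assumed for illustration only.
vocabulary = ["şirket", "yabancılar", "ali", "topu", "tut"]
print(words_with_sum(vocabulary, target=100, count=2, seed=0))
```

If fewer than `count` words match, the function returns only the matches it found instead of raising an error.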

MODULE 2: Generate a sentence whose letter values sum to a given number. The sentence should obey the grammatical rules of Turkish. From this step on, you should work with a corpus of Turkish documents. You can parse sentences and words, assign Part of Speech (POS) tags to the words, create dictionaries for the different POS labels, and choose appropriate words from these dictionaries to form a sentence (e.g. "Ali" from the subject dictionary, "topu" from the object dictionary, and "tut" from the verb dictionary --> "Ali topu tut"). For this module, I don't expect to see meaningful sentences, but the sentences should be syntactically correct at a basic level (at a minimum, they should obey subject-object-verb order).
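The dictionary-based selection described above can be sketched as a search over subject and object pairs, with verbs indexed by their letter sum so the remaining value can be looked up directly. The three word lists here are hand-written placeholders; in the actual module they would come from POS tagging a corpus (for example with Zemberek), which this sketch does not attempt. `find_sov_sentence` is a hypothetical name.

```python
from itertools import product

ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
VALUES = {ch: i for i, ch in enumerate(ALPHABET, start=1)}


def letter_sum(text):
    """Sum of letter values using Turkish lowercasing; other characters count as 0."""
    lowered = text.replace("I", "ı").replace("İ", "i").lower()
    return sum(VALUES.get(ch, 0) for ch in lowered)


def find_sov_sentence(subjects, objects, verbs, target):
    """Return the first subject-object-verb sentence whose letter sum equals target."""
    verbs_by_sum = {}
    for verb in verbs:
        verbs_by_sum.setdefault(letter_sum(verb), []).append(verb)

    for subject, obj in product(subjects, objects):
        remaining = target - letter_sum(subject) - letter_sum(obj)
        if remaining in verbs_by_sum:
            return f"{subject} {obj} {verbs_by_sum[remaining][0]}"
    return None  # no combination reaches the target


# Placeholder dictionaries, assumed for illustration only.
subjects = ["Ali", "Ayşe"]
objects = ["topu", "kitabı"]
verbs = ["tut", "oku"]
print(find_sov_sentence(subjects, objects, verbs, target=188))
```

Indexing the verbs by sum avoids iterating over every subject, object, and verb triple, which helps once the dictionaries grow beyond toy size.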

MODULE 3: Generate a sentence whose letter values sum to a given number. This sentence should be both syntactically and semantically correct (as much as possible!). You should devise a methodology, i.e. an algorithm, for combining words and suffixes in a meaningful order. To do so, you can derive basic statistics from the corpus, such as word co-occurrence frequencies, and when combining words to form a sentence you can check whether two neighboring words co-occur frequently in the corpus sentences. This may be a good indicator of semantic relatedness. You may extend this beyond neighboring words to words that co-occur anywhere in the same sentence or the same document.
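A minimal sketch of the neighboring-word co-occurrence idea: count adjacent word pairs in the corpus, then prefer the candidate sentence whose adjacent pairs are most frequent. The function names and the three-sentence corpus are assumptions for illustration. In practice the candidates would already have been filtered to the target letter sum, and suffix handling is out of scope here. The toy corpus is lowercase, so plain `str.lower()` is sufficient.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count how often each ordered pair of neighboring words occurs."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def cooccurrence_score(sentence, counts):
    """Sum of corpus frequencies over the sentence's neighboring word pairs."""
    tokens = sentence.lower().split()
    return sum(counts[pair] for pair in zip(tokens, tokens[1:]))


def best_candidate(candidates, counts):
    """Pick the candidate sentence with the highest co-occurrence score."""
    return max(candidates, key=lambda s: cooccurrence_score(s, counts))


# Toy corpus, assumed for illustration only.
corpus = ["ali topu tut", "ali topu at", "ayşe kitabı oku"]
counts = bigram_counts(corpus)
print(best_candidate(["ali kitabı tut", "ali topu tut"], counts))
```

Extending the score to sentence-level or document-level co-occurrence would mean counting unordered word pairs within each sentence or document instead of only adjacent pairs.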
