User:Laurier Rochon/notes/artifical language research: Difference between revisions
== Random additives ==
Regular expressions : http://en.wikipedia.org/wiki/Regular_expression (to visualize ranges, patterns, etc.)
Sigur Ros : http://en.wikipedia.org/wiki/Sigur_R%C3%B3s#Vonlenska
Latest revision as of 19:09, 28 May 2011
Je note, tu notes, il note, nous notons, vous notez, ils notent.
Rudolf Carnap
Rudolf Carnap (May 18, 1891 – September 14, 1970) was an influential German-born philosopher who was active in Europe before 1935 and in the United States thereafter. He was a leading member of the Vienna Circle and a prominent advocate of logical positivism.
src = http://en.wikipedia.org/wiki/Rudolf_Carnap
Constructed language (ConLang)
A planned or constructed language—known colloquially as a conlang—is a language whose phonology, grammar, and/or vocabulary has been consciously devised by an individual or group, instead of having evolved naturally. There are many possible reasons to create a constructed language: to ease human communication (see international auxiliary language and code); to give fiction or an associated constructed world an added layer of realism; for linguistic experimentation; for artistic creation; and for language games.
The expression planned language is sometimes used to mean international auxiliary languages and other languages designed for actual use in human communication. Some prefer it to the term "artificial", as that term may have pejorative connotations in some languages. Outside the Esperanto community, the term language planning means the prescriptions given to a natural language to standardize it; in this regard, even "natural languages" may be artificial in some respects. Prescriptive grammars, which date to ancient times for classical languages such as Latin, Sanskrit, and Chinese are rule-based codifications of natural languages, such codifications being a middle ground between naive natural selection and development of language and its explicit construction. The term glossopoeia, coined by J. R. R. Tolkien, is also used to mean language construction, particularly construction of artistic languages.
src = http://en.wikipedia.org/wiki/Constructed_language
Engineered language
Engineered languages (sometimes abbreviated to engilangs or engelangs) are constructed languages devised to test or prove some hypotheses about how languages work or might work. There are at least three subcategories, philosophical languages (or ideal languages), logical languages (sometimes abbreviated as loglangs), and experimental languages. Raymond Brown describes engineered languages as "languages that are designed to specified objective criteria, and modeled to meet those criteria".
Some engineered languages have been considered candidate global auxiliary languages, and some languages intended as international auxiliary languages have certain "engineered" aspects (in which they are more regular and systematic than their natural language sources).
src = http://en.wikipedia.org/wiki/Engineered_language
Artistic language
An artistic language (artlang) is a constructed language designed for aesthetic pleasure. Unlike engineered languages or auxiliary languages, artistic languages usually have irregular grammar systems, much like natural languages. Many are designed within the context of fictional worlds, such as J. R. R. Tolkien's Middle-earth. Others represent fictional minority languages in a world not patently different from the real world, or have no particular fictional background attached. There are several different schools of artistic language construction. The most prominent is the naturalist school, which seeks to imitate the complexity and historicity of natural languages. Others do not attempt to imitate the natural evolution of languages, but follow a more abstract style.
src = http://en.wikipedia.org/wiki/Artistic_language
First-order logic
First-order logic is a formal logical system used in mathematics, philosophy, linguistics, and computer science. It goes by many names, including: first-order predicate calculus, the lower predicate calculus, quantification theory, and predicate logic (a less precise term). First-order logic is distinguished from propositional logic by its use of quantifiers; each interpretation of first-order logic includes a domain of discourse over which the quantifiers range. The adjective "first-order" is used to distinguish first-order theories from higher-order theories in which there are predicates having other predicates or functions as arguments or in which predicate quantifiers or function quantifiers are permitted or both.[1] In interpretations of first-order theories predicates are associated with sets. In interpretations of higher order theories they may be also associated with sets of sets.
src = http://en.wikipedia.org/wiki/First-order_logic
Human-usable engineered languages
- An Essay towards a Real Character and a Philosophical Language by John Wilkins
- aUI
- Characteristica universalis
- Ilaksh
- Isotype
- Ithkuil
- Láadan (ldn)
- Loglan
- Logopandecteision
- Lojban (jbo)
- Ro
- Unilingua
src = http://en.wikipedia.org/wiki/List_of_constructed_languages#Engineered_languages
Lojban
Lojban (pronounced [ˈloʒban]) is a constructed, syntactically unambiguous human language based on predicate logic, succeeding the project of Loglan. The name "Lojban" is a combination of loj and ban, which are short forms of logji (logic) and bangu (language), respectively. Development of the language began in 1987 by The Logical Language Group (LLG), who intended to realize Loglan's purposes as well as further complement the language by making it more usable, and freely available (as indicated by its official full English name "Lojban: a realization of Loglan"). After a long initial period of debating and testing, the baseline was completed in 1998 with the publication of The Complete Lojban Language. In an interview in 2010 with the New York Times, Arika Okrent, the author of In the Land of Invented Languages, stated: "The constructed language with the most complete grammar is probably Lojban – a language created to reflect the principles of logic." The main sources of its basic vocabulary were the six (at the time) most widely spoken languages: Mandarin, English, Hindi, Spanish, Russian, and Arabic, chosen to reduce the unfamiliarity or strangeness of the root words to people of diverse linguistic backgrounds. The language has drawn on other constructed languages' components, a notable instance of which is Láadan's set of indicators.
The computer-tested, unambiguous rules also include grammar for 'incomplete' sentences e.g. for narrative, quotational, or mathematical phrases. Lojbanic expressions are modular; smaller constructs of words are assembled into larger phrases so that all incorporating pieces manifest as a possible grammatical unity. This mechanism allows for simple yet infinitely powerful phrasings; "a more complex phrase can be placed inside a simple structure, which in turn can be used in another instance of the complex phrase structure". Its typology can be said to be basically Subject Verb Object and Subject Object Verb. However, it can practically be anything:
- mi prami do (SVO) (I love you)
- mi do prami (SOV) (By me, you are loved)
- do se prami mi (OVS) (You are loved by me)
- do mi se prami (OSV) (You, I love)
- prami fa mi do (VSO) (Loved by me, you are)
- prami do fa mi (VOS) (Love you, I do)
src = http://en.wikipedia.org/wiki/Lojban
src = http://www.lojban.org/
Esperanto
Esperanto is the most widely spoken constructed international auxiliary language. Its name derives from Doktoro Esperanto (Esperanto translates as "one who hopes"), the pseudonym under which L. L. Zamenhof published the first book detailing Esperanto, the Unua Libro, in 1887. Zamenhof's goal was to create an easy-to-learn and politically neutral language that would foster peace and international understanding between people with different regional and/or national languages.
Estimates of Esperanto speakers range from 10,000 to two million active or fluent speakers. Esperanto has native speakers, that is, people who learned Esperanto from their parents as one of their native languages. Esperanto is spoken in about 115 countries. Usage is particularly high in Europe, eastern Asia and North & South America.[1] The first World Congress of Esperanto was organized in France in 1905. Since then congresses have been held in various countries every year with the exception of years in which there were world wars. Although no country has adopted Esperanto officially, Esperanto was recommended by the French Academy of Sciences in 1921 and was recognized by UNESCO in 1954. In 2007 Esperanto was the 32nd language that adhered to the "Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR)".[2] When counting Wikipedia articles, Esperanto is the 26th language.[3] Worldwide, there are 6,912 recognized languages. Esperanto is currently the language of instruction of the International Academy of Sciences in San Marino. There is evidence that learning Esperanto may provide a superior foundation for learning languages in general, and some primary schools teach it as preparation for learning other foreign languages.
aUI (artificial language)
aUI is a constructed language credited to John W. Weilgart, created in the beginning of the 1960s. Because of its structure it is classified as a logical language or philosophical language.
aUI has 42 phonemes (including nasalized variations on the vowels for numbers), each with an associated meaning:
- a /ǝ/: 'space'
- e /ɛ/: 'movement'
- i /ɪ/: 'light'
- u /ʊ/: 'human'
- o /ɔ/: 'life'
- y /y/: 'negative'
- q /œ/: 'condition'
- A /a/: 'time'
- E /e/: 'matter'
- I /i/: 'sound'
- U /u/: 'mind'
- O /o/: 'feeling'
- b: 'together'
- c /ʃ/: 'being'
- d: 'through'
- f: 'this'
- g: 'inside'
- h: 'question'
- j /ʒ/: 'equal'
- k: 'above'
- l: 'round'
- m: 'quality'
- n: 'quantity'
- p: 'before'
- r: 'positive'
- s: 'thing'
- t: 'toward'
- v: 'active'
- w: 'power'
- x /x/: 'relation'
- z: 'part'
The language was designed so that, ideally, the features of each phoneme would represent its meaning. The phoneme ‹b›, for instance, meaning "together", is pronounced with the lips pressed together. The short ‹i›, which means "light", takes the brightest, highest-frequency sound, while the long ‹I›, which means "sound", takes longer to say, because sound travels more slowly than light. Each phoneme also has a character that represents its meaning. The symbol for ‹a›, meaning "space", for instance, is a circle to enclose an open space. The symbol for ‹e›, meaning "movement", follows the movement of a spiral nebula. The ‹u›, meaning "human", is a caret shape, suggesting two legs. The ‹o›, meaning "life", is represented by the shape of a leaf, plants and photosynthesis forming the basis of all life. The ‹v›, meaning "active", is represented by a lightning bolt, the most active thing in nature. The character for ‹g›, meaning "inside", is a dot inside a circle. The character for ‹t›, meaning "toward", is a split arrow shape pointing towards the right. aUI attempts oligosynthesis. Short nasal vowels (marked with an asterisk) are used for numerals: ‹y*› 0, ‹a*› 1, ‹e*› 2, ‹i*› 3, ‹u*› 4, ‹o*› 5, ‹A*› 6, ‹E*› 7, ‹I*› 8, ‹U*› 9, ‹O*› 10.
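Since every aUI phoneme carries a meaning, a word can be glossed by looking up its letters one by one. The following is only a minimal sketch of that idea: the table is copied from the list above, while the function names `gloss` and `numeral` are made up for illustration and are not part of any aUI tool.

```python
# Phoneme meanings copied from the list above (transliteration letters only).
AUI_MEANINGS = {
    "a": "space", "e": "movement", "i": "light", "u": "human", "o": "life",
    "y": "negative", "q": "condition",
    "A": "time", "E": "matter", "I": "sound", "U": "mind", "O": "feeling",
    "b": "together", "c": "being", "d": "through", "f": "this", "g": "inside",
    "h": "question", "j": "equal", "k": "above", "l": "round", "m": "quality",
    "n": "quantity", "p": "before", "r": "positive", "s": "thing",
    "t": "toward", "v": "active", "w": "power", "x": "relation", "z": "part",
}

# Nasal vowels for 0..10, in the order given in the text.
NUMERAL_VOWELS = ["y", "a", "e", "i", "u", "o", "A", "E", "I", "U", "O"]


def gloss(word):
    """Return the list of meanings for each letter of an aUI word (hypothetical helper)."""
    return [AUI_MEANINGS[letter] for letter in word]


def numeral(n):
    """Return the asterisk-marked nasal vowel for a number from 0 to 10."""
    if not 0 <= n <= 10:
        raise ValueError("the text only lists numerals for 0 through 10")
    return NUMERAL_VOWELS[n] + "*"


print(gloss("aUI"))   # the language's own name, letter by letter
print(numeral(3))
```

The name of the language itself is a handy test word, since it is built from three of the listed phonemes.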
src = http://en.wikipedia.org/wiki/AUI_(language)
src = http://home.centurytel.net/languageofspace/ (check this one out... it's worth it)
Rotokas language
Rotokas is a language (part of the East Papuan language phylum) spoken by some 4000 people in Bougainville, an island to the east of New Guinea, part of Papua New Guinea. There are at least three dialects of the language: Central Rotokas ("Rotokas Proper"), Aita Rotokas, and Pipipaia. Central Rotokas is most notable for its extremely small phonemic inventory and for having perhaps the smallest modern alphabet.
Grammar
Typologically, Rotokas is a fairly typical verb-final language, with adjectives and demonstrative pronouns preceding the nouns they modify, and postpositions following. Although adverbs are fairly free in their ordering, they tend to precede the verb, as in the following example:
osirei-toarei avuka-va iava ururupa-vira tou-pa-si-veira eye-MASC.DL old-FEM.SG POST closed-ADV be-PROG-2.DL.MASC-HABITUAL "The old woman's eyes are shut."
Orthography
The alphabet is perhaps the smallest in use. The letters are A E G I K O P R S T U V. T and S both represent the phoneme /t/, written with S before an I and in the name 'Rotokas', and with T elsewhere. The V is sometimes written B.
src = http://en.wikipedia.org/wiki/Rotokas_language
src = http://www.archive.org/stream/rosettaproject_roo_morsyn-1#page/n3/mode/2up
Speedtalk (the main point of interest)
Speedtalk
Speedtalk is an idea for a new language put forth by Robert A. Heinlein in his novella, Gulf (1949). Speedtalk was defined as an entirely logic-based language, and it was a key plot device. The basic concept was that the conlang would utilize a complex syntax with a minimal vocabulary and a phonemically extensive alphabet (including such letters as œ, ħ, ø, and ʉ), and it was therefore considered extremely efficient. A single phoneme indicated a word, so a "word" indicated a sentence. In the only example given, a single word meant "The far horizons draw no nearer." Many of these ideas have been incorporated into the Ithkuil language.
Case study 1 : Ithkuil
Ithkuil is a constructed language marked by outstanding grammatical complexity, expressed with a rich phonemic inventory or through an original, graphically structured, system of writing. The language’s author, John Quijada, presents Ithkuil[1] as a cross between an a priori philosophical and a logical language designed to express deeper levels of human cognition overtly and clearly, particularly in regard to human categorization, yet briefly. It also strives to minimize the ambiguities and semantic vagueness found in natural human languages. The many examples from the original grammar book[1] show that a message, like a meaningful phrase or a sentence, can usually be expressed in Ithkuil with fewer sounds, or lexically distinct speech-elements, than in natural human languages. J. Quijada deems his creation too complex and strictly regular a language to have developed “naturally”, but nonetheless a language suited for human conversation. No person is hitherto known to be able to speak Ithkuil fluently; Quijada, for one, does not.
In 2004 — and again in 2009 with its offshoot, Ilaksh — Ithkuil was featured in the Russian-language science magazine Computerra. Quijada plans for a major revision of Ithkuil to be released in summer 2011.
Interesting essay 1 on the topic (re-pasted here as it is only available through archive.org's 'wayback machine', and I am afraid it will disappear eventually. This is from ~1998.)
In the 1940s Robert A. Heinlein wrote a science fiction story named Gulf, which described the exploits of a society of supermen who used a language named Speedtalk. The premise, as Heinlein described it, was that every word in the language consisted of only a single phoneme, and thus each sentence would be only as long as a single English word. Heinlein argued that people who spoke such a language would be able to think more quickly as well, by virtue of the fact that their thoughts would all be in Speedtalk. As a result, they would be able to squeeze centuries of experience into a few decades of calendar time and would experience a longevity of the mind, if not of the body.
There are a number of problems with Heinlein's original scheme. Aside from the ruthlessly efficiency-oriented view of life inherent in his claims for the life-expanding quality of Speedtalk (does thinking ten times faster enable you to enjoy ten times as many sunsets?) his technical claims were glib, to say the least. Firstly, he claimed that one can express any idea in English in combinations of 850 basic vocabulary words. While two linguists did indeed make this claim, it unfortunately rests on the fact that two English speakers will share a great number of idiomatic expressions, and thus can communicate using information which goes beyond the meanings of the 850 basic words. Would a native speaker of Chinese, dropped into the middle of Manhattan with an 850 word English vocabulary, really know that one can say "succeed" by using the expression "make good"? Furthermore, is it really so slow to say "watermelon" when the alternative is to say "sweet green large egg-shaped fruit with pink flesh"- even if each of those words is a single phoneme?
Secondly, Heinlein claims that since there are roughly 100 phonemes in the International Phonetic Alphabet, and since one can vary each of them in terms of length and pitch, one can thereby construct the 900 or so phonemes one needs for Speedtalk. But how does one make a long "k" sound? A rising "d"?
Nonetheless, Speedtalk is an intriguing idea- one writer speculated that everyone who is interested in constructed languages as a hobby sooner or later tries to create Speedtalk. I myself have been interested in Speedtalk ever since I read Gulf, in part because the claim that Speedtalk would make someone think faster is an interesting variant on the Sapir-Whorf hypothesis (it is also an interesting contrast to Marilyn vos Savant's advice to think pictorially, rather than verbally, with much the same goals in mind as Heinlein,) and also because the human vocal apparatus is so wonderfully flexible that it is interesting to wonder how much bandwidth, exactly, one could squeeze out of it.
All of the proposed Speedtalk schemes which I have seen, however, suffer from the same flaw: not enough phonemes. Typically, people will establish an alphabet of a large number of phonemes (oftentimes being quite imaginative in the process,) but the final number will only be around 100, and in the end each word will be composed of two phonemes. Since part of the fun of Speedtalk lies in Heinlein's original plan to have every word be a single phoneme, such schemes leave one wondering if one can push the human vocal apparatus further. After reading a bit about linguistics, I have devised one possible way to create a language not only in which one phoneme equals one word, but also in which one has far more phonemes than Heinlein's original 850.
My basic scheme involves a number of modifications (in terms of length, stress, tone, etc.) to a basic set of four vowels. While one could start with more vowels, and my scheme does not even come close to exhausting the number of possible modifications (some languages have five lengths for their phonemes, for example, while my Speedtalk only has three,) I think this demonstration will serve to show how one could go about creating a "fast," high-bandwidth language like Speedtalk (no doubt it will also contain mistakes that will demonstrate the flaws in my knowledge of phonetics!) As I go through the explanation I will gradually introduce the transliteration scheme for writing down the phoneme-words; ideally one would want a writing system for Speedtalk which would be easy to speedread, and I will demonstrate one such system later.
First, start with four basic vowels, each of which can be tense or lax, and all of which should be pronounced with the lips unrounded (try pronouncing them while smiling, if that helps):
a as in "bat," or A as in "bay."
e as in "bit" or E as in "beet."
o as in the "a" of "father" or O as in "boat," but with the lips unrounded.
u as in "book" or U as in "boot," again with the lips unrounded (there is no English equivalent for the last 3 phonemes.)
One can then make the vowels rounded or unrounded:
o as in "father" or o) as in "sport" or "horse." Try pronouncing a continuous "o" sound while alternating between smiling and rounding the lips, and the difference should become clear.
The vowels can also be nasalized or non-nasalized. Think of French words such as "bon" or "Laurent." Nasalized phonemes are indicated with a caret: a)^ would be "a" as in "bat," rounded and nasalized.
The vowels can also be pressured or non-pressured. This is a distinction drawn in the Khoisan languages, which need an extremely large number of phonemes due to some of the quirks of their grammar. A "pressured" vowel is pronounced by straining against a partly closed throat, making a croaky voice like storytellers use when trying to sound like a talking frog or goat. Danny Torrance in "The Shining" talks in "pressured" vowels when his invisible friend speaks through him. Pressurization is indicated by an asterisk: o)* would be the "o" in the word "frog" as spoken by a frog.
Each vowel can have one of three lengths: short, medium, and long. Short would be the length of the "u" in "but," medium would be the "u" of "dune," and long would be the "oo" of "moon," or slightly longer. Length is indicated by s, m, or l: U)l would be the "u" of "moon."
Each vowel can also have one of seven tones: high (H), basic (B), low (L), rising (R), falling (F), mountain (M), and valley (V). Basic is the normal, middle pitch of English speech, and high and low are, reasonably enough, higher and lower in pitch, respectively. Rising tone means that the tone rises from low to high pitch: AmR sounds like "Eh?" Falling tone falls from high to low. Mountain goes from low to high to low (like Yoda saying "hmmMMMmmm,") whereas Valley goes from high to low to high, a bit like someone saying "Hrm?" in an extremely querulous voice. usH usB usH ulL should sound a bit like Beethoven's Fifth. u*sH u*sB u*sH u*lL is, of course, Beethoven's Fifth as sung by a goat.
Lastly, one can have stressed, unstressed, and heavily stressed vowels. "Stress" refers to the amplitude or loudness of the vowel- be careful, since in English stress also tends to change the length and pitch of a sound, whereas in Speedtalk it should only change the amplitude.
As one can see, this gives us a total of 8x2x2x2x3x7x3=4032 words. This more than covers the realm of basic vocabulary- even in Heinlein's scheme, most words would be phrases composed of simpler ones. In general, if I know 3,000 or so words of a language I can get the gist of a text written in that language, so this is an excellent start. Remember, too, that we can stretch things further- the "u" of "but" or the "e" of "get" are not in the basic vowels, and we can also increase the number of lengths to four or five. If we allowed "rising from low to basic" and "rising from basic to high" as well as the equivalent falling tones, we would have four extra tones, not to mention variants on the mountain and valley tones such as "medium-high-low," etc. Including all of these could give us 20,000 phonemes. If we also include the possibility of off-glides (imagine the difference between "bay" and an abrupt "ba," or "bow" and "bo,") and ending vowels with or without glottal stops, one could have well over 100,000 basic phonemes- more than enough for an entire language, and we still haven't exhausted all of the possibilities.
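The count of 4032 can be checked by enumerating every combination of the features described above. This is a rough sketch rather than the essay author's own method: the vowel letters, `)`, `^`, `*`, the length letters and the tone letters follow the essay's transliteration, but the essay gives no notation for stress, so the two stress markers below (`'` and `!`) are invented for the sake of the example.

```python
from itertools import product

VOWELS = ["a", "A", "e", "E", "o", "O", "u", "U"]  # 4 basic vowels, lax or tense
ROUNDING = ["", ")"]                               # unrounded / rounded
NASAL = ["", "^"]                                  # plain / nasalized
PRESSURE = ["", "*"]                               # plain / pressured
LENGTHS = ["s", "m", "l"]                          # short / medium / long
TONES = ["H", "B", "L", "R", "F", "M", "V"]        # the seven tones
STRESS = ["", "'", "!"]                            # ASSUMED markers: unstressed / stressed / heavy


def all_phonemes():
    """Build the transliteration of every feature combination."""
    features = (VOWELS, ROUNDING, NASAL, PRESSURE, LENGTHS, TONES, STRESS)
    return ["".join(parts) for parts in product(*features)]


phonemes = all_phonemes()
print(len(phonemes))         # size of the inventory
print("AmR" in phonemes)     # the essay's "Eh?" example is one of them
```

Adding a fourth length or extra tones, as the essay suggests, only means extending the corresponding list.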
There are a number of approaches one can take in creating a grammar for Speedtalk. Most people seem to try to create a new grammar for Speedtalk out of whole cloth (Heinlein, thinking again in terms of the Sapir-Whorf hypothesis, had the grammar of Speedtalk be based on General Semantics, so that his supermen would be super-rational as well.) My personal preference is to think of Speedtalk as a sort of "compression algorithm" in which Speedtalk phonemes are code for words in, say, English. Ideally, one would use an isolating language like Chinese, in which words are unmodified by prefixes and suffixes, and a single phoneme really could stand for a single word in its entirety. Alternately, the idea of using Speedtalk phonemes as code for the basic vocabulary of Esperanto is intriguing- one could then use consonants as the Esperanto prefixes and suffixes that modify words. Since Esperanto contains a small number of basic words which are then turned into other words using prefixes and suffixes ("left" is "unright" in Esperanto,) it would be well suited to use with a small number (i.e., 4,000 or so) of Speedtalk vowels. However, this would draw one away from the basic idea of one word=one phoneme. Since I don't know Chinese and my Esperanto is a bit rusty, I like the idea of using Speedtalk to encode English (which also would make it much more accessible to the Anglophone public.) Basic words like "right," "dog," or "to eat" would be single vowels which could then be modified by consonants- while "left" would have its own word (no Esperantisms here!) "unopened" would be a compound.
Here are some possible Speedtalk prefixes and suffixes:
d- past tense "ed"
s- plural "s"
z- possessive "s"
n- opposition, "un"
ng- "ing" ("ng" is one phoneme, not two.)
sh- "ish"
l- "ly"
ch- "tion"
etc.
Thus if "My" were o)*H, "open" were e^M, and "letter" were "u^M", then "My unopened letters" would read: o)*H/n/e^M/d/u^M/z. As with the vowels, there is much room for creativity in creating consonants- many words could be considered compounds of basic words with prefixes and suffixes, if one wished, and one could easily create a list of at least 60 consonants. One should also consider the question of how to choose which vowel goes with which word. The best plan would most likely be to have the most common words in English be represented by those phonemes which are easiest to pronounce. In particular, one would want short phonemes with M and V tones to be used for the most difficult words, since it is hard to articulate the down-up-down and up-down-up patterns quickly. Alternately, one could take the reverse approach and make the most common words be represented by the most difficult phonemes, so that Speedtalkers would have the most practice with them. Lastly, opposites should be represented by very different-sounding phonemes, in order to make sure that they can be understood distinctly in speech- a particularly important consideration for words like "safe" and "dangerous"!
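The "compression algorithm" view can be sketched as a lookup: each root word maps to a vowel phoneme, each affix to a consonant, and the pieces are joined with slashes. The three-word lexicon comes from the essay's example and the affix table from its list; the function name `encode` and the pre-analysed input format are assumptions made for this illustration. Note that the affix list gives `s` for the plural and `z` for the possessive, so this sketch ends the phrase in `/s`, whereas the essay's worked example writes `/z`.

```python
# Root words -> vowel phonemes (only the three words the essay defines).
LEXICON = {"my": "o)*H", "open": "e^M", "letter": "u^M"}

# Affixes -> consonants, copied from the essay's list.
AFFIXES = {
    "past": "d",        # "ed"
    "plural": "s",      # plural "s"
    "possessive": "z",  # possessive "s"
    "un": "n",          # opposition
    "ing": "ng",
    "ish": "sh",
    "ly": "l",
    "tion": "ch",
}


def encode(phrase):
    """Encode a pre-analysed phrase.

    `phrase` is a list of (prefixes, root, suffixes) tuples; this input
    format is hypothetical, since the essay does not describe a parser.
    """
    pieces = []
    for prefixes, root, suffixes in phrase:
        pieces.extend(AFFIXES[p] for p in prefixes)
        pieces.append(LEXICON[root])
        pieces.extend(AFFIXES[s] for s in suffixes)
    return "/".join(pieces)


my_unopened_letters = [
    ([], "my", []),
    (["un"], "open", ["past"]),
    ([], "letter", ["plural"]),
]
print(encode(my_unopened_letters))
```

A real encoder would also need a morphological analyser for English, which is the hard part and is left out here.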
One can also give some thought to what sort of writing system would be best for Speedtalk (Speedwrite?) Such a system would probably be ideographic, so that readers could take in each word as a whole unit, but should also be systematic in order that people would be able to learn it easily and can translate new ideograms easily into their spoken equivalents. One way to do this would be to have separate elements of the ideogram represent different variables used in creating the vowels. A box could represent "A", a dot inside the box would mean it was rounded, a dash above the box would mean that it is a high tone, etc. I have devised one such system, but unfortunately do not have time to put it on the web right now.
Interesting essay 2 : taken from -> http://a-new-world-language.blogspot.com/2010/01/heinlens-speedtalk-and-existing-similar.html
Heinlein's SpeedTalk and existing similar languages.
Speedtalk is an idea for a new language put forth by Robert A. Heinlein in his novella, Gulf (1949). Speedtalk was defined as an entirely logic-based language, and it was a key plot device. The basic concept was that the conlang would utilize a complex syntax with a minimal vocabulary and a phonemically extensive alphabet (including such letters as œ, ħ, ø, and ʉ), and it was therefore considered extremely efficient. A single phoneme indicated a word, so a "word" indicated a sentence. In the only example given, a single word meant "The far horizons draw no nearer."
The story invokes the notions of the General Semantics of Alfred Korzybski and the work of Samuel Renshaw to explain the nature of thought and how people could be trained to think more rapidly and accurately.
The supermen communicate in an arcane language, a form of English called Speedtalk, which is both unintelligible and unlearnable by outsiders. Speedtalk is founded upon two principles: a reduced lexicon, and an enlarged phonology. Any English sentence can be composed from a small vocabulary, such as the word set of Basic English. Also, although the human vocal tract can produce hundreds of different sounds, no existing human language normally makes use of more than a few dozen of them. In Speedtalk, each word from the Basic English set is assigned to a different sound. A sentence in Basic English can therefore be pronounced in perhaps one fourth the normal time.
D: Oh. Been done. A designed language that explores this. http://en.wikipedia.org/wiki/Ithkuil_language
Ithkuil’s phonological system of 65 consonants and 17 vowels is based on sounds from a variety of languages, including Chechen and Abkhaz. It is often difficult for a monolingual speaker to pronounce, or even to distinguish, some of the sounds.
No person is hitherto known to be able to speak Ithkuil fluently; Quijada, for one, does not.[1]
D: Hmm. A language so difficult that it cannot be learned as an adult. Meaning only those raised on it from birth could possibly use it. That narrows down its utility...
I suppose, ultimately, that the brain's inherent limit, its Universal Grammar, provides a hard cap on how complex a language can be.
The documented living language with the most tones is currently Ai-Cham (錦話), a member of the Kam-Sui languages in the Tai-Kadai language family. It has a total of 11 tones[citation needed]; Pinghua has 10 tones[citation needed]. However, preliminary linguistic work being done in the Chatino family of languages in southern Mexico suggests that some Chatino dialects may phonologically distinguish as many as 14 tones.
Language with the most words: English, approx. 250,000 distinct words
Language with the largest alphabet: Khmer (74 letters). This Austro-Asiatic language is the official language of Cambodia, where approx. 12 million people speak it. Minority speakers live in a handful of other countries.
The language with the most sounds (phonemes): !Xóõ (112 phonemes). Approx. 4200 speak !Xóõ, the vast majority of whom live in the African country of Botswana.
Language with the most consonant sounds: Ubykh (81 consonants). This language of the North Caucasian language family, once spoken in the Haci Osman village near Istanbul, has been extinct since 1992. Among living languages, !Xóõ has the most consonants (77).
Language with the most vowel sounds: !Xóõ (31 vowels)
D: OK, let's look at a fairly basic CVC (consonant-vowel-consonant) form syllable: 1) about 30 vowels 2) about 80 consonants 3) about 10 tones 4) 3-way vowel gemination? Like Thok Reel. D: so... 80 x 30 x 80 x 10 x 3? That's ignoring diphthongs. About... 5,000,000. Five million possible 1-syllable words? Gack.
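The back-of-the-envelope product above is easy to check. This is just a sketch using the rounded figures from the note (80 consonants, 30 vowels, 10 tones, 3 vowel lengths); the dictionary keys and the function name `syllable_count` are only labels chosen for illustration.

```python
from math import prod

# Rounded inventory sizes from the note above, for one CVC syllable.
CVC_INVENTORY = {
    "onset consonant": 80,
    "vowel": 30,
    "coda consonant": 80,
    "tone": 10,
    "vowel length": 3,
}


def syllable_count(inventory):
    """Number of distinct syllables if every slot varies independently."""
    return prod(inventory.values())


total = syllable_count(CVC_INVENTORY)
print(f"{total:,}")  # exact value of 80 x 30 x 80 x 10 x 3
```

The exact product comes out at 5,760,000, so "five million" is a fair rounding, and that is before diphthongs or any restrictions on which sounds may combine.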
So if this can be done, why is it not? Diminishing returns. Design tradeoffs.
Take the simplest of natural languages, Rotokas.
Rotokas possesses one of the world's smallest phoneme inventories and its alphabet is perhaps the smallest in use. (The Pirahã language has been claimed to have fewer speech sounds, but it is not written.) The alphabet consists of twelve letters, representing eleven phonemes. The alphabet characters are A E G I K O P R S T U V
D: one natural language - a pidgin - Taki Taki - has a vocabulary of 340 words.
Sranan (also Sranan Tongo "Surinamean tongue", Surinaams, Surinamese, Suriname Creole, Taki Taki) is a creole language spoken as a lingua franca by approximately 400,000 people in Suriname.[1] It is the mother tongue of the Creoles. Sranan was previously called nengre or negerengels (Dutch, "negroenglish").
D: the benefit of fewer sounds that are used as significant is that they are more clear. The difference between normal, nasalized and velarized is pretty subtle. Toss in less than ideal conditions, such as background noise, and I bet Rotokas is still crystal clear. That would not be true of !Xóõ. Or SpeedTalk.
I also have another objection to SpeedTalk as a concept. It is so busy trying to break the record for per-syllable info density that it ignores per-time. By this I mean that complex tones (rising/falling) and vowel gemination (1, 2, 3 time duration) extend the time duration of a syllable. The result is that a simpler system, with fewer possible meanings per syllable may very well have a HIGHER info density per second. And then there is the chance of pronunciation and hearing/processing mistakes.
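The per-syllable versus per-second argument can be made concrete with a small Python sketch. Everything numeric here is an assumption for illustration only: the syllable durations are invented, not measured, the 2,000-syllable inventory is a stand-in for a simple language, and the helper name `bits_per_second` is hypothetical. It also assumes every syllable is equally likely, which gives an upper bound rather than a realistic rate.

```python
import math


def bits_per_second(distinct_syllables: int, syllable_seconds: float) -> float:
    """Upper bound on information rate if all syllables are equally likely."""
    bits_per_syllable = math.log2(distinct_syllables)
    return bits_per_syllable / syllable_seconds


# Hypothetical figures, chosen only to illustrate the tradeoff.
# "dense": huge inventory, but tones and long vowels stretch each syllable.
dense = bits_per_second(distinct_syllables=5_760_000, syllable_seconds=0.45)
# "simple": small inventory, short and quick syllables.
simple = bits_per_second(distinct_syllables=2_000, syllable_seconds=0.20)

print(f"dense:  {dense:.1f} bits/s")
print(f"simple: {simple:.1f} bits/s")
```

Under these made-up durations the simpler system comes out slightly ahead per second, even though each of its syllables carries less than half the information. Different duration assumptions would shift the result, and the sketch says nothing about mishearing, which would penalize the dense system further.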
I'm sure Spanish vowels are more distinct than English ones, for example.
bit, bet, bat, but, beat, bait, boot, boat, bought, bite, and bout.
Versus AEIOU. Ah, eh, ee, oh, oo.
Aside. Yup, don't think I'll be making a carbon copy of Lojban. Being so unambiguous and logical is, well, downright unnatural.
http://www.joerg-rhiemeier.de/Conlang/auxlang-design.html
D: good primer. Too bad langmaker folded.
Logical languages are also unsuitable
The disciples of logical languages (loglangs, such as Loglan or Lojban) often propose using such a language as an international auxiliary language. However, loglangs are poorly equipped for this purpose. The loglangers tend to overlook the simple fact that language and formal logic do not serve the same purposes. Language is not primarily about making propositions that can be mathematically proven or disproven; its purpose lies in communicating ideas and emotions. There are many facets of language use which logical languages do not cover well. Language is language and logic is logic; they are different things.
It is also such that logical languages are very difficult to learn and use. Most people are not acquainted with the intricacies of formal logic; they cannot be expected to learn it in order to learn a language. I once tried to learn Lojban; I quickly abandoned that attempt because I simply could not understand how the language works. The language does not even have the same kind of parts of speech as human languages have. Instead of nouns and verbs, phrases and clauses, Lojban has things bearing such fanciful names as brivla, cmene or selbri. Those are Lojban words which cannot easily be translated into any other language; it tells a lot that Lojbanists use these words rather than English equivalents when talking about Lojban in English. I haven't met the same difficulties in any grammar of a natural language, no matter how exotic. Logical languages may be absolutely neutral - but only because they work in a way completely different from how human languages work, and are thus exceedingly difficult to master.
D: I found vast lists of alphabetical Lojban vocabulary. Of no use whatsoever in learning it. Is there a useful software primer?
D: the Decimese logic/ethics vocabulary will be terse and concise. But I won't be slapping logic in its core. It would cease to be a viable IAL.
Lojban, having been designed to test the Sapir-Whorf hypothesis, was NOT designed as an IAL. Its fans seem to think they can tack on 'oh yeah- and it could be a great IAL!' as an afterthought. Language design doesn't work that way. If the principle is not central during design, it certainly won't appear in the final product.
Notably, I strongly part ways with all cultural neutrality premises. There have been 2 major approaches. 1) Euroclones. Esp-o, et al. Europe rocks- suck it up. 2) Neutral. Nobody opposes it... and nobody supports it. And I suppose, the usual North American refrain that everybody should/does use English.
So I suppose, in order, Decimese is designed to: 1) be easy 2) be powerful 3) suck up to the English (now) and Chinese (later).
A language that is made to be learned by adults better be much more Rotokas than Xoo. As many overt and redundant indicators of word boundaries and grammatical components should be present as possible. I default to subject-verb-object for the reason of 3). Plus there are two ways to bypass word order: 1) Europe rocks- suck it up. Latinate. Heavily infixed. Hard in itself for most of the world. 2) same trick, but using isolated word particles. Like Lojban.
As I've said before, there IS a reason to use a different word order: SOV. Most 'natural', based on intuitive hand gesturing. Also, most clearly parsed by a computer.
On a related note, I think I can have an optional 'long vowel' system for external word particles. Not of any immediate use though. Not in Mandarin? Then don't bother.
http://nextbigfuture.com/2007/05/chinas-economy-could-pass-usa-in-2020.html
D: do I feel a bit like a prostitute, sucking up to English and Mandarin? Yep. But if nothing before has worked, then it's time for a new sales pitch. IALs remain a "solution without a problem" in the eyes of most of the world. So I need to implement a game-changing strategy if I expect to not get lost in obscurity like every other effort.
src = http://web.archive.org/web/20000618191251/http://fatmac.ee.cornell.edu/~ben/speedtalk3.html
src = http://en.wikipedia.org/wiki/Speedtalk
src = http://www.ithkuil.net/ithkuil-intro.htm
src = http://en.wikipedia.org/wiki/Ithkuil
Random additives
Regular expressions : http://en.wikipedia.org/wiki/Regular_expression (to visualize ranges, patterns, etc.)
Sigur Ros : http://en.wikipedia.org/wiki/Sigur_R%C3%B3s#Vonlenska