While neither Traditional Chinese nor cuneiform turned out to help much in organizing ideas into elemental symbols, studying them did teach me a few tricks that will make implementing a machine natural language easier.

Languages written in cuneiform have a nifty system of marking case by adding a suffix to the root word. Recently I was trying to implement a Mad Libs-style story generator. But you can't just take random subjects and verbs and glue them together with boilerplate text. You have to take into account whether the subject is male, female, something in between, several people, an inanimate object, and so on.
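To make the agreement problem concrete, here is a minimal sketch in Python of letting the subject's properties pick the pronoun before the template is filled. The `Entity` class, the `PRONOUNS` table, and the `compliment` helper are all hypothetical names for illustration, and the table only covers a few common English forms.

```python
from dataclasses import dataclass

# Hypothetical pronoun table. "neutral" doubles as the plural and the
# gender-neutral choice; "neuter" covers inanimate objects.
PRONOUNS = {
    "male":    {"subject": "he",   "object": "him",  "possessive": "his"},
    "female":  {"subject": "she",  "object": "her",  "possessive": "her"},
    "neutral": {"subject": "they", "object": "them", "possessive": "their"},
    "neuter":  {"subject": "it",   "object": "it",   "possessive": "its"},
}

@dataclass
class Entity:
    name: str
    gender: str = "neutral"  # one of the PRONOUNS keys
    plural: bool = False

    def pronoun(self, case: str) -> str:
        # A group always takes "they/them/their", whatever its members' genders.
        key = "neutral" if self.plural else self.gender
        return PRONOUNS[key][case]

def compliment(speaker: Entity, listener: Entity, thing: str) -> str:
    # Fill the Mad Libs slots, letting the listener pick the possessive.
    return (f"{speaker.name} mentioned to {listener.name} that "
            f"{listener.pronoun('possessive')} {thing} looked nice.")

john = Entity("John", "male")
becky = Entity("Becky", "female")
twins = Entity("the twins", plural=True)
print(compliment(john, becky, "shirt"))   # John mentioned to Becky that her shirt looked nice.
print(compliment(becky, twins, "shirts"))  # Becky mentioned to the twins that their shirts looked nice.
```

The same boilerplate sentence now reads correctly for a single woman and for a group, because the choice of "her" or "their" lives in the data rather than in the template.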

Let us take an example:

John mentioned to Becky that her shirt looked nice.

We have two people, one object, two verbs, and an adjective. This is how it would look in the past tense, from a third-person perspective. Here is that same sentence in a screenplay:

John (to Becky): Your shirt looks nice.
John: Hey Becky! That shirt looks nice!
John: Becky, that's a nice shirt.

We also have some options with prose:

John said, "Becky, that is a nice shirt."
Becky heard John say: "That is a nice shirt."
Becky said, "John told me my shirt looked nice."

Each of those forms carries roughly the same information, and each has a context in which it would be more or less correct to use. The challenge for Eostric is first to develop a set of rules that lets a computer create those sentences from some internal data structure. Later, the challenge will be to take text from humans and turn it into that internal data structure. But I am focusing on composition first.
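As a rough illustration of "one internal data structure, many sentences", here is a minimal Python sketch that renders the same record as prose, as a direct quotation, and as a screenplay line. The `Compliment` record and the three renderer functions are hypothetical; they are not Eostric's actual design, just one way the idea could look.

```python
from dataclasses import dataclass

@dataclass
class Compliment:
    # Hypothetical internal record: one speaker remarks on a listener's belonging.
    speaker: str
    listener: str
    listener_possessive: str  # "her", "his", "their", ...
    thing: str
    quality: str

def article(word: str) -> str:
    # Crude heuristic: "an" before a written vowel (misses "an hour", "a unicorn").
    return "an" if word.lower().startswith(("a", "e", "i", "o", "u")) else "a"

def as_prose(c: Compliment) -> str:
    # Third-person narration: reported speech in the past tense.
    return (f"{c.speaker} mentioned to {c.listener} that "
            f"{c.listener_possessive} {c.thing} looked {c.quality}.")

def as_quote(c: Compliment) -> str:
    # Direct quotation: the possessive becomes "that" and the tense is present.
    return (f'{c.speaker} said, "{c.listener}, that is '
            f'{article(c.quality)} {c.quality} {c.thing}."')

def as_screenplay(c: Compliment) -> str:
    # Screenplay: the speaker tag replaces "said"; the line itself is second person.
    return f"{c.speaker} (to {c.listener}): Your {c.thing} looks {c.quality}."

record = Compliment("John", "Becky", "her", "shirt", "nice")
for render in (as_prose, as_quote, as_screenplay):
    print(render(record))
```

Each renderer makes its own choices about tense, person, and possessives, while the record stays the same. This sketch doesn't yet handle plural things ("Your shoes look nice"), which is the same agreement problem described earlier.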

Part of the complexity of English is that written English more or less follows the patterns of spoken English. Written English follows the Phoenician model: an alphabet designed to record the sounds a speaker would have made, in the order they would have made them. It's like reading sheet music. If you don't understand spoken English, reading the written form is virtually no help. [1]

Not all forms of writing do that. In the ancient world, written language tended to use abstract symbols for big ideas and only fell back on sounding things out for names. The effect was that two parties, even if they spoke different languages, could exchange ideas in cuneiform. That is still true today with Chinese: the character set is used in multiple countries that speak very different languages. A writer who speaks only Mandarin can communicate with a reader who speaks only English, provided both understand written Chinese.

Learning written Chinese

If I take the string "John mentioned to Becky that her shirt looked nice" and run it through an online translator, I get the following: 約翰向貝基提到她的襯衫看起來不錯。

That phrase breaks down to:
約翰 - John
向 - to
貝基 - Becky
提到 - mention
她的 - her
襯衫 - shirt
看起來 - looks
不錯 - good
。 - (end sentence)
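Written Chinese doesn't put spaces between words, so a breakdown like the one above depends on knowing where each word ends. A standard baseline for this is forward maximum matching: scan left to right and always take the longest dictionary word that fits. Here is a minimal Python sketch; the tiny dictionary holds only the words glossed above (an assumption for illustration, not a real lexicon), and `segment` is a hypothetical helper rather than part of any library.

```python
# Toy dictionary built only from the glosses in this post (an assumption;
# a real segmenter would use a full dictionary plus statistics).
LEXICON = {
    "約翰": "John", "向": "to", "貝基": "Becky", "提到": "mention",
    "她的": "her", "襯衫": "shirt", "看起來": "looks", "不錯": "good",
    "。": "(end sentence)",
}
MAX_WORD = max(len(w) for w in LEXICON)

def segment(text: str) -> list[tuple[str, str]]:
    """Forward maximum matching: at each position take the longest known word."""
    out, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD, len(text) - i), 0, -1):
            word = text[i:i + size]
            if word in LEXICON or size == 1:
                # Unknown single characters pass through with a "?" gloss.
                out.append((word, LEXICON.get(word, "?")))
                i += size
                break
    return out

for word, gloss in segment("約翰向貝基提到她的襯衫看起來不錯。"):
    print(word, "-", gloss)
```

This reproduces the breakdown above. Greedy matching is known to stumble on ambiguous character runs, which is one reason real segmenters layer statistics on top of a dictionary.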

Changing the sentence to "John told Becky that her shirt looked good" yields 約翰告訴貝基說她的襯衫看起來不錯。, which breaks down to:
約翰 - John
告訴 - tell
貝基 - Becky
說 - say
她的 - her
襯衫 - shirt
看起來 - looks
不錯 - good

If I tell the story from Becky's perspective and run the phrase "Becky heard John say her shirt looked nice" through the translator, we get 貝琪聽到約翰說她的襯衫看起來不錯. That breaks down to:
貝琪 - Betsy [2]
聽到 - hear
約翰 - John
說 - say
她的 - her
襯衫 - shirt
看起來 - looks
不錯 - good

And the phrase "Becky heard John say that her shirt looked nice" translates to 貝琪聽到約翰說她的襯衫看起來不錯, which is identical. Essentially, the word "that" is lost in the wash when converting from English to Chinese in this case.

And the phrase "Becky heard John say that her shirt looks nice" translates to 貝琪聽到約翰說她的襯衫看起來不錯, which is also identical. The English marking of past or present tense on the verb inside the reported speech is lost in the wash too.

And if we state that something will happen in the future with "Becky will her John say that her shirt looks nice", we get 貝基會讓她的約翰說她的襯衫看起來不錯, which the back translator renders as "Becky's going to get let John say her shirt looks good." Um, not what I was expecting at all. I had made a typo and entered "her" instead of "hear". But interestingly enough, the translator happily made sense of a phrase that made no sense in English. The back end of the phrase, 她的襯衫看起來不錯, is identical. The front of the phrase, 貝基會讓她的約翰說, actually translates to "Becky's going to get her John to say" when I run it through by itself. So there is a bit of context reading applied across the entire sentence to distinguish between someone listening passively and someone eliciting a response.

What I meant to translate was the phrase "Becky will hear John say that her shirt looks nice", which yields 貝基會聽到約翰說她的襯衫看起來不錯:
貝基 - Becky
會 - will
聽到 - hear
約翰 - John
說 - say
她的 - her
襯衫 - shirt
看起來 - looks
不錯 - good
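The tense experiments can be summarized as a toy rule: English inflects both the main verb and the verb inside the reported speech, while the Chinese output only changes the main verb, by adding 會 for the future. Below is a minimal Python sketch of that observation. The rule is inferred solely from the translator output above (an assumption, not a grammar of Mandarin), the function names are hypothetical, and it uses 貝基 throughout rather than reproducing the translator's Becky/Betsy switch.

```python
# Toy rule inferred from the translator output above (an assumption, not a
# grammar of Mandarin): the embedded clause ignores tense entirely, and the
# future only adds 會 ("will") in front of the main verb.
ZH_MAIN = {"past": "聽到", "future": "會聽到"}
EN_MAIN = {"past": "heard", "future": "will hear"}
EN_EMBEDDED = {"past": "looked", "present": "looks"}

def render_zh(main_tense: str, embedded_tense: str) -> str:
    # embedded_tense is accepted but deliberately unused: 看起來 never changes.
    return f"貝基{ZH_MAIN[main_tense]}約翰說她的襯衫看起來不錯"

def render_en(main_tense: str, embedded_tense: str) -> str:
    # English inflects both the main verb and the verb in the reported speech.
    return (f"Becky {EN_MAIN[main_tense]} John say that her shirt "
            f"{EN_EMBEDDED[embedded_tense]} nice.")

combos = [(m, e) for m in ZH_MAIN for e in EN_EMBEDDED]
print(len({render_en(m, e) for m, e in combos}))  # 4 distinct English sentences
print(len({render_zh(m, e) for m, e in combos}))  # 2 distinct Chinese sentences
```

Four English sentences collapse into two Chinese ones, which is the "lost in the wash" effect in code form.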

Let's change it up a bit. How about "John said Becky's shirt looked nice"? We get 約翰說貝基的襯衫看起來不錯:
約翰 - John
說 - Say
貝基的 - Becky's
襯衫 - shirt
看起來 - looks
不錯 - good

Interesting. Let's take that phrase and change the adjective to red. "John said Becky's shirt is red" yields 約翰說貝基的襯衫是紅色的:
約翰 - John
說 - Say
貝基的 - Becky's
襯衫 - shirt
是 - is
紅 - red
色的 - in color

And what if I pick a different kind of adjective, perhaps a flavor? "John said Becky's shirt is strawberry flavored" translates to 約翰說貝基的襯衫是草莓味的:
約翰 - John
說 - Say
貝基的 - Becky's
襯衫 - shirt
是 - is
草莓 - strawberry
味的 - in flavor

I should point out, though, that the back translation actually displayed as "John said Becky's shirt smelled of strawberry." So 味 apparently covers both flavor and smell, and the reader is supposed to tell which from context.

What if her shirt had a sound? "John said Becky's shirt sounded like crickets" translates to 約翰說貝基的襯衫聽起來像蟋蟀:
約翰 - John
說 - Say
貝基的 - Becky's
襯衫 - shirt
聽起來像 - sounds like
蟋蟀 - cricket
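The last three examples share one frame (owner 的 thing, then a predicate) and differ only in the predicate pattern chosen for the kind of attribute. Here is a minimal Python sketch of that observation; the `PATTERNS` table and `describe` helper are hypothetical, and the patterns are copied from the translator output above rather than from a grammar reference.

```python
# Hypothetical mapping from the kind of attribute to the predicate pattern
# the translator used for it in the examples above.
PATTERNS = {
    "color":  "是{value}色的",   # 是紅色的: "is red in color"
    "flavor": "是{value}味的",   # 是草莓味的: "is strawberry in flavor"
    "sound":  "聽起來像{value}",  # 聽起來像蟋蟀: "sounds like crickets"
}

def describe(owner: str, thing: str, sense: str, value: str) -> str:
    # 的 marks possession; the predicate always follows the possessed thing.
    return f"{owner}的{thing}" + PATTERNS[sense].format(value=value)

for sense, value in (("color", "紅"), ("flavor", "草莓"), ("sound", "蟋蟀")):
    print("約翰說" + describe("貝基", "襯衫", sense, value))
```

English needs a similar tag ("is red", "is strawberry flavored", "sounds like crickets"), so either way the internal structure has to record which sense an adjective belongs to, not just the adjective itself.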

What if the shirt was missing? "John said Becky's shirt was missing" translates to 約翰說, 貝基的襯衫不見了, which breaks down to:
約翰 - John
說 - Say
貝基的 - Becky's
襯衫 - shirt
不見了 - is gone

Let's take John out of the picture, because he doesn't seem to alter our grammar output much, and focus on the relationship between Becky (or Betsy) and the state of her shirt:

Becky's shirt is missing: 貝琪的襯衫不見了
貝琪的 - Becky's
襯衫 - shirt
不見了 - is gone

Becky took off her shirt: 貝基脫下她的襯衫
貝基 - Becky
脫下 - take off
她的 - her
襯衫 - shirt

Becky didn't wear a shirt: 貝琪沒穿襯衫
貝琪 - Becky
沒穿 - Not wear
襯衫 - shirt

And let's jumble it up a little bit: "Becky, not wearing a shirt, entered the shower." Here we have inserted a parenthetical. The main action is "entered the shower"; that she wasn't wearing a shirt at the time is extra information. We get 貝基沒穿襯衫, 就進了淋浴:
貝基 - Becky
沒穿 - Not wear
襯衫 - shirt
, - (break)
就 - then
進了 - go into the
淋浴 - Shower

Let's remove the "not": "Becky, wearing a shirt, entered the shower." translates to 貝基穿著襯衫, 沖進了淋浴間:
貝基 - Becky
穿著 - wear
襯衫 - shirt
, - (break)
沖進了 - Rushed into the
淋浴間 - shower room

If I simplify the translator's job by splitting it into two sentences, "Becky was not wearing a shirt. She entered the shower." translates to 貝基沒穿襯衫。她進了淋浴間。
貝基 - Becky
沒穿 - Not wear
襯衫 - shirt
。 - (end sentence)
她 - she
進了 - entered
淋浴間 - shower room
。 - (end sentence)
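The shower examples suggest that a background state and a main action can be stored separately and only combined at rendering time: English tucks the state into a parenthetical, while the two-sentence Chinese version gives it a sentence of its own and resumes the subject with 她. Here is a minimal Python sketch of that idea. The `Event` frame and both renderer names are hypothetical, and the Chinese strings are copied from the translator output above.

```python
from dataclasses import dataclass

@dataclass
class Event:
    # Hypothetical frame: one subject, a background state, and the main action.
    subject_en: str
    subject_zh: str
    pronoun_zh: str  # e.g. 她 ("she"), used to resume the subject
    state_en: str    # participle phrase, e.g. "not wearing a shirt"
    state_zh: str    # e.g. 沒穿襯衫
    action_en: str   # past-tense verb phrase, e.g. "entered the shower"
    action_zh: str   # e.g. 進了淋浴間

def english_parenthetical(e: Event) -> str:
    # English tucks the background state between the subject and the verb.
    return f"{e.subject_en}, {e.state_en}, {e.action_en}."

def chinese_two_sentences(e: Event) -> str:
    # The state becomes its own sentence; the action follows, resumed by a pronoun.
    return f"{e.subject_zh}{e.state_zh}。{e.pronoun_zh}{e.action_zh}。"

shower = Event("Becky", "貝基", "她", "not wearing a shirt", "沒穿襯衫",
               "entered the shower", "進了淋浴間")
print(english_parenthetical(shower))  # Becky, not wearing a shirt, entered the shower.
print(chinese_two_sentences(shower))  # 貝基沒穿襯衫。她進了淋浴間。
```

Keeping the state and the action as separate fields means the generator, not the data, decides whether the extra information becomes a parenthetical or a sentence of its own.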

What did we learn

From the examples above, you can see that written Chinese does a fairly decent job of putting the ideas in a sentence in a consistent place. A quick read of the rules of Chinese grammar confirms as much. I can't say our final thought data structure is going to mimic Chinese exactly, but I think it will be a decent start.
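One way to read "a consistent place" is as a fixed sequence of slots that a thought fills in, with empty slots simply dropping out. Here is a minimal Python sketch under that assumption. The `Thought` class and its slot names are hypothetical and cover only the simple examples above (the 向 construction, which puts the listener before the verb, would need another slot).

```python
from dataclasses import dataclass, fields

@dataclass
class Thought:
    # Hypothetical slot order, read off the Chinese examples above:
    # topic first, then any modal such as 會, then the verb, then what follows.
    topic: str
    modal: str = ""
    verb: str = ""
    content: str = ""

    def linearize(self, sep: str = "") -> str:
        # Empty slots drop out; the remaining slots never change order.
        parts = (getattr(self, f.name) for f in fields(self))
        return sep.join(p for p in parts if p)

zh = Thought("貝基", "會", "聽到", "約翰說她的襯衫看起來不錯")
en = Thought("Becky", "will", "hear", "John say her shirt looks good")
print(zh.linearize())     # 貝基會聽到約翰說她的襯衫看起來不錯
print(en.linearize(" "))  # Becky will hear John say her shirt looks good
print(Thought("貝基", verb="脫下", content="她的襯衫").linearize())  # 貝基脫下她的襯衫
```

For these sentences the English gloss happens to line up slot for slot with the Chinese, which is part of why a slot-based structure looks like a reasonable starting point.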

In my next post (Simplifying language structure), I'll discuss how a language system could tear English text apart, and what sorts of things English grammar rules find important that Chinese largely ignores, and vice versa.


[1] - In point of fact, teaching the deaf to read requires some special instructional techniques.

[2] - Yes, it made Becky into Betsy for some reason.