By studying the development of ancient written language, and working up from the simple to the sublime, I might actually come up with a representation of ideas that machines can manipulate.
The text you are reading right now is in the tradition of the Phoenicians. Each letter, or group of letters, indicates a sound, and the reader sounds out the words in their own mind (or out loud) to understand the text. It's the dominant language form today because one only has to learn a few dozen symbols.
Ancient languages were structured differently. Every "idea" had a symbol. Ideas that worked together to make a more complex idea were arranged close together, or inside a box or column, to show their relation. Different languages had different rules, and the rules changed over time as people used the language, developed more complex thoughts, and made shorthand for bits of language they repeated over and over again.
I'll probably run into some obstacle that makes this effort fruitless, but at least it is a start.
So my plan is to learn Cuneiform, and in particular to learn how the language evolved from its earliest days to when it splintered off to form the basis for other forms of communication. I want to enumerate all of the ideas and build a database of the rules for how they connected. My thinking is that organizing thoughts like this may make it easier to build a machine that can develop verbal communication on its own.
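As a rough sketch of what such a database could look like, here is a minimal Python version. Everything in it is hypothetical: the names `Idea`, `IdeaDatabase`, `add_rule`, and `combine` are invented for illustration, and the entries (`MOUTH`, `FOOD`, "eat") are placeholders rather than verified Cuneiform sign readings.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Idea:
    """One entry in the inventory: a single 'idea' with a plain-English gloss."""
    name: str
    gloss: str


@dataclass
class IdeaDatabase:
    """Hypothetical store of ideas plus the rules for combining them."""
    ideas: dict = field(default_factory=dict)
    rules: dict = field(default_factory=dict)  # (name, name, ...) -> combined meaning

    def add_idea(self, name, gloss):
        idea = Idea(name, gloss)
        self.ideas[name] = idea
        return idea

    def add_rule(self, parts, meaning):
        # Refuse rules that mention ideas we have not enumerated yet.
        unknown = [p for p in parts if p not in self.ideas]
        if unknown:
            raise KeyError(f"unknown ideas: {unknown}")
        self.rules[tuple(parts)] = meaning

    def combine(self, *parts):
        # Order matters here; returns None when no rule is recorded.
        return self.rules.get(tuple(parts))


# Placeholder entries, not real sign readings.
db = IdeaDatabase()
db.add_idea("MOUTH", "mouth")
db.add_idea("FOOD", "food")
db.add_rule(["MOUTH", "FOOD"], "eat")

print(db.combine("MOUTH", "FOOD"))
print(db.combine("FOOD", "MOUTH"))
```

Keying rules on an ordered tuple is only one possible choice. A real version would probably also need to record the period a rule belongs to, since the rules changed over time.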
Trying to write Natural Language Parsing rules for English is frightfully difficult. English is a language built by taking several other languages, throwing all of their rules in a blender, and then piecing the aftermath back together into a language while very drunk and over the course of an all-nighter. On one hand, it's great for literature because there are so many ways to express the same idea.
On the other, it's a pain to write a parser for because there are so many ways to express the same idea. I'm hoping that this pictograph language database idea of mine allows me to develop a repeatable system for representing the gist of a statement. And then, in my heart of hearts, I'm hoping to take this representation and turn it back into English. Or French. Or Chinese. The pictographs aren't specific to any one spoken language.
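To make the "gist in, sentence out" idea concrete, here is a toy Python sketch. It assumes a gist can be reduced to an actor, an action, and a target, which is a big simplification, and the names `Gist`, `LEXICON`, and `render` are made up for this example. It also leans on the fact that English and French happen to share subject-verb-object order for this sentence; a real system would need per-language grammar rules, not just a word lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gist:
    """A language-neutral statement: who did what to what (hypothetical shape)."""
    actor: str
    action: str
    target: str


# Tiny hand-written lexicon mapping each concept to a phrase per language.
LEXICON = {
    "en": {"FARMER": "the farmer", "EAT": "eats", "BREAD": "the bread"},
    "fr": {"FARMER": "le fermier", "EAT": "mange", "BREAD": "le pain"},
}


def render(gist, lang):
    """Look up each concept and join the phrases in actor-action-target order."""
    words = LEXICON[lang]
    sentence = " ".join(words[c] for c in (gist.actor, gist.action, gist.target))
    return sentence[0].upper() + sentence[1:] + "."


gist = Gist("FARMER", "EAT", "BREAD")
print(render(gist, "en"))  # The farmer eats the bread.
print(render(gist, "fr"))  # Le fermier mange le pain.
```

The point is only that the `Gist` itself carries no English or French; the language enters at the last step.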
Traditional Chinese would be a good starting point too. My hope with Cuneiform is that, because it represents the language of a more primitive time, the ideas it encodes will themselves be simpler. Also, if I completely mangle something, there aren't a billion or so living Sumerians who will correct me incessantly online. And if it works, there is nothing to say the same methodology couldn't be applied to Traditional Chinese. In fact, it would be fascinating to compare the results to see if my theory that big ideas are written first holds up in light of actual history.
As they say in the film business, more on this as it develops.