that I could pause time so I'd have plenty of time to finish the first post.
Pause... Now!
Re:The only problem is... on The Social Web · Score: 1
I'm currently using the LEDA C++ package, which provides many functions for computation on graphs. It has a feature-rich interface to display and manipulate graphs. You can change the layout (spring, random, etc.) on the fly within that interface. If you are programming in C++, you'll find LEDA a great help. LEDA is platform-independent (I've used the Win32 and Linux versions), but obviously you cannot display your results within browsers.
In China, the reward for telling the police the location of an underground CD/VCD manufacturing line is 300,000 RMB, a lifetime's earnings for an average person. The government really puts effort into this.
But on the other hand, there are dozens of companies in China making VCD players, which presumably could only be useful if the effort fails. And the government more or less knows the situation. So in a way, you are right.
With simple techniques (no learning or any kind of intelligence), such as regular expressions, for extracting or labelling content from web pages, you can't expect good coverage across pages written in all kinds of templates and with so many types of errors.
Right now I'm writing a Java program to extract links from Google search results (easy, don't shoot! Academic use only). What I'm using is OROMatcher, one of the best regular expression packages for Java. I'd say it's still mission impossible to get 100% recall and be error-free, even for this simple task.
The formal name for such a program (one that labels and extracts content) is a "wrapper". Probably the only way to improve the effectiveness of a wrapper is to apply machine learning techniques. A well-trained wrapper with a good learning algorithm could be smart enough to adapt to HTML coding formats with small variations. A good example is in this paper.
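To make the recall problem concrete, here is a minimal sketch of a naive regex-based link extractor. It is written in Python with the standard `re` module rather than Java and OROMatcher, and the pattern, the function name `extract_links`, and the sample HTML are all made up for illustration; they are not the program described above.

```python
import re

# Naive pattern (an assumption for illustration): expects a double-quoted
# href attribute somewhere inside an <a ...> tag.
LINK_RE = re.compile(r'<a\s+[^>]*href="([^"]+)"', re.IGNORECASE)

def extract_links(html):
    """Return every href value the naive pattern manages to find."""
    return LINK_RE.findall(html)

# Four links, written in four slightly different (all common) styles.
sample = '''
<a href="http://example.com/one">one</a>
<A HREF='http://example.com/two'>two</A>
<a class="r" href=http://example.com/three>three</a>
<a class="r" href="http://example.com/four">four</a>
'''

links = extract_links(sample)
print(links)
print("recall: %d of 4" % len(links))
```

The single-quoted and unquoted links are missed, so this hand-written pattern finds only two of the four. Each new template variation needs another manual patch to the pattern, which is the kind of upkeep a learned wrapper is meant to avoid.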
Basically you need a pair of 8-bit bytes for one Chinese character (in each byte, the first bit marks it as Chinese and the other 7 carry the actual encoding). So each character takes two bytes.
However, Chinese is commonly known to be more concise than English or other languages with a small character set. There are thousands of commonly used characters, each of which has the function of a word in English. Many characters have more than one meaning, and their combinations (2 characters in most cases) make new words. And don't forget the amazing flexibility of the grammar (e.g. fewer stop words like "the")! We are not even talking about ancient Chinese, which is much SHORTER.
Give me any sentence with more than 10 English words (with no words like Yugoslavia, of course), and I guarantee I can rewrite it in Chinese in less space.
You see, this is the basic rule of information. You increase the complexity of the encoding scheme, you get more density.
How complex is it? Well, I have to say that 12 years of Chinese class are a painful memory.
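A quick way to check both claims is sketched below in Python. It assumes the two-byte scheme meant here is GB2312 (EUC-CN), which the comment does not name, and the English sentence and its Chinese rendering are my own illustration, not taken from the thread.

```python
# Assumption: the two-byte encoding described above is GB2312 (EUC-CN).
ENCODING = "gb2312"

def byte_layout(ch):
    """Return the encoded bytes of one character as 8-bit binary strings."""
    return [format(b, "08b") for b in ch.encode(ENCODING)]

# Hypothetical sentence pair, chosen only for illustration.
english = "I will go to the library to read books tomorrow afternoon."
chinese = "我明天下午去图书馆看书。"

english_size = len(english.encode(ENCODING))  # ASCII stays one byte per character
chinese_size = len(chinese.encode(ENCODING))  # two bytes per Chinese character

print(byte_layout("中"))  # two bytes, each with its leading bit set
print("English: %d bytes, Chinese: %d bytes" % (english_size, chinese_size))
```

Even at two bytes per character, the Chinese version of this particular sentence takes well under half the space of the English one.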
If it ain't broke, don't fuck with it. :)
I won't touch it. It's too ugly.
the lack of artistic vision in our mind.
I broke my Thunderbird with only a loosely-mounted fan.
I'm currently using the LEDA C++ package, which provides many functions for computation on graphs. It has a feature-rich interface to display and manipulate graphs. You can change the layout (spring, random, etc.) on the fly within that interface. If you are programming in C++, you'll find LEDA a great help. LEDA is platform-independent (I've used the Win32 and Linux versions), but obviously you cannot display your results within browsers.
The address:
http://www.algorithmic-solutions.com/
Unfortunately, LEDA has just gone commercial :(
LEDA uses the GML format for graph representation. See the link below:
http://www.infosun.fmi.uni-passau.de/Graphlet/GML
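For a rough idea of what GML looks like, here is a small Python sketch that writes a graph as GML text. It uses only the basic keys from the GML format (`graph`, `directed`, `node`, `id`, `label`, `edge`, `source`, `target`); the helper name `to_gml` is hypothetical, and whether LEDA expects any additional attributes is not something this sketch tries to cover.

```python
def to_gml(nodes, edges, directed=True):
    """Serialize (id, label) nodes and (source, target) edges as GML text."""
    lines = ["graph [", "  directed %d" % (1 if directed else 0)]
    for node_id, label in nodes:
        lines.append('  node [ id %d label "%s" ]' % (node_id, label))
    for source, target in edges:
        lines.append("  edge [ source %d target %d ]" % (source, target))
    lines.append("]")
    return "\n".join(lines)

# A three-node directed cycle as example input.
gml_text = to_gml(
    nodes=[(1, "a"), (2, "b"), (3, "c")],
    edges=[(1, 2), (2, 3), (3, 1)],
)
print(gml_text)
```

Because the format is plain nested key-value text, it is easy to generate from other tools and then load into a graph editor for layout.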
to tell Pioneer 6 that John Lennon died 20 years ago.