It's a good thing that Slashdot told us that Android Phones Sell Like Hotcakes In Kenya. As a firm believer in press releases, I
for one welcome the use of firewood for recharging smartphones twice a day...
Vocabulary lookup alone isn't enough in general. There's no 1-1
correspondence between word meanings in different languages, but the
correspondence is strong when the same word occurs in both, which is
my point.
If you think about a thesaurus, you immediately see the problem with pure dictionary lookup in general. You can map a word to a general idea, but which word
do you output (in the other language) for that general idea? Also, there are ideas which exist in one language and not in another, so that's a problem too.
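As a rough sketch of the thesaurus problem in Python: the concept labels and word lists below are invented purely for illustration (they are not taken from any real dictionary), and `candidates` is a hypothetical helper name.

```python
# Toy illustration only: concept labels and word lists are made up.
EN_TO_CONCEPT = {
    "big": "LARGE",
    "large": "LARGE",
    "great": "LARGE",
    "homesick": "HOMESICKNESS",
}

CONCEPT_TO_FR = {
    "LARGE": ["grand", "gros", "vaste"],
    # Deliberately no entry for HOMESICKNESS in this toy table; it stands in
    # for an idea the other language has no single word for.
}


def candidates(word):
    """Return every target-language word a pure lookup could output."""
    concept = EN_TO_CONCEPT.get(word)
    if concept is None:
        return []
    return CONCEPT_TO_FR.get(concept, [])


print(candidates("big"))       # several candidates; lookup alone can't choose
print(candidates("homesick"))  # nothing: the concept is missing on the other side
```

The lookup happily returns a list, but nothing in the table says which entry fits the sentence at hand, and a missing concept returns nothing at all.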
One, the Normans were of Viking origin and spoke a peculiar dialect of
French. Two, even if they'd spoken the standard French of the time[1], it and
modern French have had a thousand years to diverge.
Of course they did not speak modern French, but your criticism doesn't
follow. You might have a point if both French and English had evolved completely
independently of the Old French spoken during the Middle Ages (and of each other), but this isn't true even approximately.
While the peasant vocabulary you mention -
family members, body parts, domestic animals - is indeed Germanic, the vocabulary used for legal matters and abstract thought is overwhelmingly of French and Latin origin. This is not surprising since the English ruling classes have had strong connections with France for
most of the intervening period, and for at least half that period, Latin was the lingua franca of abstract thought in Europe. This explains why French and English haven't diverged radically in their common French vocabulary.
Finally, the modern world is industrial: the Germanic agricultural vocabulary is not as relevant as it once was, whereas the implanted French words haven't lost their comparative importance.
French is one of the easier (easiest?) languages to translate into English.
After the colonization of England by the French, the language was left with
many words which haven't changed substantially in meaning from the original French, enough to form a fairly complete vocabulary. In fact, one can get by
quite well in English without using too many Anglo-Saxon words. Moreover, the logical structure of French grammar is a bonus for machine translation algorithms. It's harder to translate English into French, actually.
Yes, and the search ranking is supposed to rank by relevance. So if the 3rd result is the most relevant, then the search ranking is incorrect and should be fixed.
For me, the third suggested search is "MS Antivirus malware". Without having that there,
the search results for "MS Antivirus" that declare it as malware are all below the fold. The results for "MS Antivirus
malware" have the Wikipedia entry for the malware itself as the first result.
Sounds like a limitation of Google's ranking algorithm to me. Shouldn't they fix the ranking, rather than rely on
an extra UI layer (aka "suggested search", which may or may not be turned on for a user)?
If you notice incorrect rankings, you should probably report them to Google, so they can tweak the signal weights.
I'm not
sure that's really a good argument for getting rid of patents as it doesn't
really speak to whether patents help or hinder innovation; it only shows that
any nation not at the top of the patent pyramid has a vested interest in
ignoring them.
You've just answered your own question, haven't you? If all the competitors (aka not at the top) have an interest in ignoring them, then the top nation is competing with one hand tied behind its metaphorical back. That may be ok provided it's winning, but otherwise it's suboptimal.
You are dreaming. Murdoch's newspapers are the *customers*, not the producers (mostly).
The way it works is (very roughly): 1) news agencies have people on the ground taking pictures and writing the facts. 2) The news agencies sell the facts to newspapers and TV. 3) The aggregators republish the news from the online versions of newspapers.
Cut out 3), and 1) + 2) is the same as it's always been, even before the internet existed. Even if you cut out 2), say if Murdoch goes belly up, then 1) can still sell the facts to 3), which is what TFA is about.
You're dreaming. The sources already have paying customers, like the newspapers and TV stations. They can easily survive if the aggregators delist them "completely". The other way around spells death (for the aggregators).
I was assuming most of the authors would be dead already (i.e. 18th/19th century works - there's a lot of religious material written during that time). IIRC in the US, anything before 1923 is fair game.
If you scan all the works and make them available as electronic books (*), then you don't have to bother with patrons returning their copy at all. Saves a lot of bureaucratic busywork and data entry.
As is often the case, embracing a technology completely to its logical end brings new advantages.
(*) This will make Jesus cry. Don't do it. Unless you live in a drought affected area. Then you should probably do it for the good of the neighbouring farmers, but he'll kick your ass when you hit the pearly gates.
Personally the whole fanatical thing seems a bit silly
That's not silly! There are two reasons for silliness. Surprise and
fear. Fear and surprise... and ruthless efficiency. There are *three*
reasons for silliness, these being fear, surprise, and ruthless
efficiency... and an almost fanatical devotion to the Pope. *Amongst* the reasons for silliness are such elements as fear, surprise, ruthless efficiency and... Ok, you're right, fanatical is silly after all.
We need to start from the fundamental assumption that the data is out there and will be collected, and figure out from that what we
need to do.
With those constraints, there's only one thing we can do. That's a technique called poisoning the well. It's a repurposing of spamming technology for Good(TM). The idea is to add sufficient contradictory information all over the place that the real information, which is also out there, cannot be separated from it, and thereby becomes too unreliable to be used.
Good lecturers can be left to do their thing, but I question whether
the majority of average or even bad lecturers should be encouraged to
attempt more interactive sessions. In theory that sounds fine, but
anyone who's tried to keep a meeting on track knows how difficult it
is to stay focused and move through items at the required rate.
IMHO, an average lecturer who sat around talking about problems with
students rather than lecturing would let time slip badly, and simply
not get through all the material in the allotted semester
hours. That's bad, because the students will either be left to learn
the missing topics without supervision, and/or the lecturer will spend
countless office hours answering questions from students knocking at
the door when the exam approaches. For all its faults, the structured
lecture with students listening and maybe asking questions at the end
is not so bad.
Exactly. My point is it shouldn't be up to them. I have a computer
that can spider the articles like a regular user, including the ads if
that's what it takes, and I have processing tools on my machine to
mine the content. What I don't have (but *should*, IMHO) are tools
that make this pipeline so effortless that I can use them regularly
during web surfing.
If what you are interested in doesn't work this way already it is very likely that it is intentional on the part of the
content holders.
But if the information is there in my machine/browser I ought to have
tools to do what I want with it, irrespective of what the content
holders designed for me to see. You seem to argue that what the
content holder wants should be good enough for me. It usually is, but
only because the effort to extract/repurpose the bits I'm interested
in is too high. The current web experience is a bit like looking at a
raw log file, without grep. We can find stuff, but only if we look at
every irrelevant line as well and concentrate real hard.
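To make the grep analogy concrete, here is a minimal sketch using only Python's standard `html.parser`. The class and function names (`ParagraphGrep`, `grep_paragraphs`) and the sample page are made up for illustration, and it assumes every `<p>` is explicitly closed, which real pages don't guarantee.

```python
from html.parser import HTMLParser


class ParagraphGrep(HTMLParser):
    """Collect the text of <p> elements that mention a keyword."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.matches = []
        self._buf = None  # None means we are outside any <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buf is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._buf).split())
            if self.keyword in text.lower():
                self.matches.append(text)
            self._buf = None


def grep_paragraphs(html, keyword):
    """Return the paragraphs of `html` that contain `keyword` (case-insensitive)."""
    parser = ParagraphGrep(keyword)
    parser.feed(html)
    parser.close()
    return parser.matches


# Hypothetical page fragment, standing in for whatever the browser already has.
PAGE = """
<div class="ad">Buy now!</div>
<p>Our reporter spent a week in <b>Lisbon</b> last spring.</p>
<p>Meanwhile, markets were flat.</p>
"""
print(grep_paragraphs(PAGE, "lisbon"))
```

Even this much takes a page of code and some knowledge of the markup, which is rather the point: it is nowhere near as effortless as typing `grep`.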
I really have no idea where you are going with the second paragraph... Either documents have explicit metadata attached or
they do not. In the former case, my original response applies. In the latter you are going into a discussion that is
completely orthogonal to the "semantic web".
Correct me if I'm wrong but the semantic web idea is that publishers (or users) tag bits of a page so that it becomes easy to
retrieve the bits by that tag. Ideally the tag should be meaningful, but practically the meanings can't be standardized.
Publishers do very little tagging themselves, but users could tag pieces too (for private consumption, say) if they had software that made it painless. User tagging doesn't depend on publishers' willingness to cooperate or do work; it only requires the community to come up with good tools that work reliably to describe/refer to/extract/tag bits of information contained in a web page.
I don't think that's what the grandparent is talking about at all.
Let's say I find a web page that I like, and maybe it has a form on it somewhere with a dropdown containing a list of countries. I'd like to scrape that list and do some kind of throwaway mashup for myself. It's painful. Or maybe I'd like to sift through a list of articles on a magazine website, and I care only about some paragraphs which talk about a city I've been to. And I'd like to display those paragraphs on a private dashboard. Again, it's throwaway stuff, I just want it to last for a few hours starting right now.
There are no tools that make this kind of stuff painless. There are not even any *adequate* tools for this. The semantic web *should* make it possible at least. We ought to have ways of extracting the pieces that belong to a web page, not by generic component type (that's what DOM does), but by referring to the content we want in more human-friendly ways. We ought to be able to extract the pieces, and recombine them into something else with minimal technical complications. And the result should itself be easy to mine for the information it contains, in case I or someone else wants to refine/extract some more.
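For a sense of what the country-dropdown case looks like by hand today, here is a minimal sketch with Python's standard `html.parser`. The names `OptionScraper` and `scrape_options`, the `name="country"` attribute and the sample form are all assumptions for illustration; a real page will differ, and you have to read its markup first to find out how.

```python
from html.parser import HTMLParser


class OptionScraper(HTMLParser):
    """Collect (value, label) pairs from the <select> with a given name."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.options = []
        self._in_select = False
        self._in_option = False
        self._value = None
        self._text = []

    def _flush_option(self):
        # Store the option currently being read, if any.
        if self._in_option:
            label = " ".join("".join(self._text).split())
            self.options.append((self._value, label))
            self._in_option = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("name") == self.select_name:
            self._in_select = True
        elif tag == "option" and self._in_select:
            self._flush_option()  # the </option> end tag may be omitted
            self._in_option = True
            self._value = attrs.get("value")
            self._text = []

    def handle_data(self, data):
        if self._in_option:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option":
            self._flush_option()
        elif tag == "select":
            self._flush_option()
            self._in_select = False


def scrape_options(html, select_name):
    parser = OptionScraper(select_name)
    parser.feed(html)
    parser.close()
    return parser.options


# Hypothetical form, standing in for the page I happened to like.
FORM = """
<form>
  <select name="country">
    <option value="KE">Kenya</option>
    <option value="FR">France
    <option value="GB">United Kingdom</option>
  </select>
  <select name="lang"><option value="en">English</option></select>
</form>
"""
print(scrape_options(FORM, "country"))
```

It works, but only because I already know the select is called `country`. That is selection by component type and attribute, not by "the list of countries on this page", which is the kind of reference I'd actually like to make.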
The point being that open source allows experts to audit the software on behalf of the users, as well as the users themselves, or contractors who are paid by the users, or...
That's always a substantial improvement over closed software that can't be audited by experts, or users, or contractors paid by the users or...
Moreover, in the case of malware, having the source of the virus/trojan already available in the repository speeds up the response and the release of fixes if it wasn't caught earlier. It also lets users easily check for themselves whether they have the malware, and fix it in a variety of ways if they need or want to.
None of this is an option with closed software containing malware, but nice try.
It's the slash-set of servers as the internet population rotates around the shiny. Give it a few hours, and the server will rise again.
Whoa! So you're saying, when God created the penis, he was thinking about wiggling, crawling super ants? Thanks, that explains a lot!
I don't know about you, but I don't think my "boys" like that much UV...
Nope. Different kind of ring. You'll have to pucker up if you want to succeed in this game.
I was hoping they would develop a solar powered torchlight!
Yup, popular is one of the more decepticonly versatile words in the English language, but I believe it's being used optimusly in this case, though.
Whatever the reason, the silver lining is they didn't sell to Dell. That would have been the worst possible outcome for the Thinkpad line. *shudder*