smallfries · Slashdot Mirror

Re:Is that so? on Some Developers Leaving Google For Microsoft · 2008-07-01 19:50 · Score: 1

Why would you need unique labels? This is precisely the limitation that you posited that I was asking you about. In your example why not just use:
May && Work && SmallProject ... etc

What do you gain by separating Work from uses of Work in other contexts? Sure, if you just look for things tagged Work you'll get a union over many folders, but if you then refine it with May, or with May and SmallProject, you'll get the same result as the nested folder.

Yet no one suggests this is a good idea anywhere but email.

Well, I did mention this a couple of posts ago. Microsoft have been promising to put this in WinFS since Cairo, and Reiser spent a long time looking at how this would work in a filesystem. If you're familiar with the newer search interfaces on the Mac, you'll have seen them heading in this direction.

Re:Is that so? on Some Developers Leaving Google For Microsoft · 2008-07-01 19:43 · Score: 1

If you give tags something -equivalent- to a tree, then you have a tree, and we are just arguing semantics. ;)

Yes I think we are :) Especially now that I've read how you would implement a "tree" for filing. In fact I think that we're both describing almost the same thing. The only difference between the two seems to be whether or not you want to preserve the ordering for inclusion. I'd argue that without strict ordering it is easier to find information. Your argument about reducing the scope during navigation / filing is quite persuasive, though.

Yes, you're right that Venn diagrams would break down quickly for some combinations. But there are other ways to visualise that type of information. In particular, if you allow the regions for each label to be non-contiguous then you can get around the layout issue quite easily. As a tradeoff, deciding where to split each category becomes very hard. Then you would need some form of dimensionality reduction like PCA or Sammon mapping.

Some interesting ideas about filing, though. Given the amount of thought you've put into it, I'll expect to see a demo soon :)

Re:Is that so? on Some Developers Leaving Google For Microsoft · 2008-07-01 00:12 · Score: 1

The difference between tags and a tree is ordering. Despite your claims that it is a simple UI issue, if you organise information in a tree then the multiple "tags" that each level applies cannot be retrieved out of order efficiently. For large collections this becomes an issue as there are O(n!) possible arrangements of a set of applied tags into possible paths in the tree that they could be stored at.
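To make the n! concrete: an item carrying n tags could sit at any of the n! orderings of those tags as a folder path, and a tree commits it to exactly one of them. A minimal Python sketch; `folder_paths` is a hypothetical helper and the tag names are just ones used elsewhere in the thread.

```python
from itertools import permutations
from math import factorial


def folder_paths(tags):
    """All folder paths an item with these tags could have been filed under (one per ordering)."""
    return ["/".join(order) for order in permutations(tags)]


tags = ["May", "Work", "SmallProject"]
paths = folder_paths(tags)
print(len(paths), factorial(len(tags)))  # 6 6
print(paths)
```

With five tags that is already 120 candidate paths; with tags, every ordering is the same query.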

You claim that this is beneficial because deeply nested folders have further distance from one another, but also that it "speeds up" applying collections of tags. You forget that you have to navigate to that depth before you can apply all of the tags together, so the amount of work is the same.

Your claims about using leaves as tagged objects and performing database queries of tree structures ignore the complexity of these operations.

I'm aware that the ordering info is extra, hence the "(almost)" in my post. But the tags do add something that is not explicit in the tree: easy access to unions and intersections. In a tree representation these are very costly and don't scale very well.
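A rough illustration of the cost difference, using a hypothetical folder tree shaped like the months/categories example in the thread. Finding everything under any folder called Work means walking the whole tree, whereas a label index (label -> set of messages) answers unions and intersections with plain set operations. The data and function names are made up for this sketch.

```python
# Hypothetical folder tree: month -> category -> message ids.
tree = {
    "May": {"Work": ["m1", "m2"], "Play": ["m3"]},
    "June": {"Work": ["m4"], "Org": ["m5"]},
    "July": {"Play": ["m6"], "Org": ["m7"]},
}


def all_messages(node):
    """Every message id stored at or below `node`."""
    if isinstance(node, list):
        return list(node)
    found = []
    for child in node.values():
        found += all_messages(child)
    return found


def find_in_tree(node, name):
    """Messages under any folder called `name`: has to visit every branch."""
    found = []
    for key, child in node.items():
        if key == name:
            found += all_messages(child)
        elif isinstance(child, dict):
            found += find_in_tree(child, name)
    return found


def to_index(node, path=()):
    """Turn the tree into labels: each folder name on a message's path becomes a label."""
    index = {}
    for key, child in node.items():
        here = path + (key,)
        if isinstance(child, dict):
            for label, msgs in to_index(child, here).items():
                index.setdefault(label, set()).update(msgs)
        else:
            for label in here:
                index.setdefault(label, set()).update(child)
    return index


index = to_index(tree)
print(sorted(find_in_tree(tree, "Work")))    # ['m1', 'm2', 'm4']
print(sorted(index["Work"]))                 # ['m1', 'm2', 'm4']
print(sorted(index["Work"] | index["Org"]))  # union across all months
print(sorted(index["June"] & index["Work"]))  # ['m4']
```

Building the index is a one-off cost; after that each query is a dictionary lookup plus a set operation rather than a tree walk.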

Now I'm going to do what you've done and posit that some hypothetical UI for tags that doesn't exist is better than using folders :) The main issue that you (and others) seem to have with tags is that they are not as rigid as folders, so it may be harder to organise information according to them. This really is a UI issue. Instead of a flat list of tags there needs to be something equivalent to a tree to give the collection a spatial metaphor. Something similar to a Venn diagram would work, so that the "outer", larger regions correspond to the nodes high in the tree, and the small "inner" intersections between regions are the subfolders.

Re:Is that so? on Some Developers Leaving Google For Microsoft · 2008-06-30 22:17 · Score: 1

No it isn't. It's equivalent to putting two labels on the same message and promising yourself never to use the "sublabel" in any other context.

Why? The context is separated by the set of labels that you are finding the intersection of. No such limitation exists. If you are going to argue otherwise then provide an example.

Re:Is that so? on Some Developers Leaving Google For Microsoft · 2008-06-30 18:39 · Score: 1

Now that you've finished arguing with the other guy about what you both think are labels, here is a simple question for you:

Do you realise that labels (and what you can do with them) are (almost) a superset of folders?

If you are having difficulty getting your head around hierarchy then stop using the box on the left to get into your labels and start using the search dialog. When you organise things hierarchically you put one folder inside another. This is equivalent to putting two labels on the same message. The "nested" group of messages is now the intersection between the two labels. In crappy ASCII art:

Folders:
  May
    Work
    Play
  June
    Work
    Org
  July
    Play
    Org

Labels:
[Bunch of messages labelled Work + May]
[Bunch of messages labelled Play + May]
[Bunch of messages labelled Work + June] ... etc

The reason that I said almost above is that ordering is strict in a tree but sloppy in labels. Most people consider this to be a benefit rather than a limitation as it means that I can find the first group above as either:
May/Work
Work/May

The reason that people see the above sloppiness as a benefit is that it takes less work at organisation time to get the same results at retrieval time. Note that the hierarchy is done just using intersection; you can of course use union to merge folders together (without moving them).
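The same point in a few lines of Python, assuming a hypothetical mailbox where each message id maps to its set of labels. A "nested folder" is just an intersection, so May/Work and Work/May are the same query, and a union merges "folders" without moving anything. This sketch scans every message per query; a real mail client would keep an index instead.

```python
# Hypothetical mailbox: message id -> set of labels on that message.
labels = {
    "m1": {"May", "Work"},
    "m2": {"May", "Play"},
    "m3": {"June", "Work"},
    "m4": {"June", "Org"},
    "m5": {"July", "Play"},
}


def tagged(*wanted):
    """Messages carrying every label in `wanted` (an intersection), so order is irrelevant."""
    return {msg for msg, tags in labels.items() if set(wanted) <= tags}


def either(*queries):
    """Union of several label queries: 'merging folders' without moving any messages."""
    result = set()
    for query in queries:
        result |= tagged(*query)
    return result


print(tagged("May", "Work") == tagged("Work", "May"))  # True
print(sorted(tagged("Work")))                          # ['m1', 'm3']
print(sorted(either(("May", "Work"), ("June", "Org"))))  # ['m1', 'm4']
```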

These benefits are the reason that Microsoft have been promising a database filesystem for a decade and Reiser hyped the crap out of his v4 for so long. The interface (what you guys started arguing about) is simply easier to use, and more powerful. You don't get that combination very often.

Re:Encryption on Brightnets are Owner Free File Systems · 2008-06-30 09:31 · Score: 1

It is an interesting argument, and one question that it raises is what happens if you distribute the map, but not the chunks? This is interesting from a legal angle because one of the block sources could be a public text, perhaps a chunk of Gutenberg or another easily acquired (and legal) text.

The other extension is what happens if we collude offline to make some blocks and a map. Then I distribute one, and you distribute the other. In this case the geek argument shouts "Aha, we're both in the clear!", but legally that wouldn't wash. It would be obvious that we intended to violate copyright, and cases are largely decided by people's interpretation of the law.

One thing that it does make clear is that copyright laws are broken, and don't make sense once everything is represented digitally.

Re:Encryption on Brightnets are Owner Free File Systems · 2008-06-30 05:02 · Score: 4, Insightful

The point that I was making was that the law operates on intent rather than action. If the sole purpose of your XOR of Britney and someone's holiday photos is to allow reconstruction of the mp3 then it is a defense that the court would see through.

Re:Encryption on Brightnets are Owner Free File Systems · 2008-06-30 05:01 · Score: 1

Personally I think that's an interesting defense. But does it work out so well at the moment for cases involving BitTorrent?

Re:Short answer: no on Fresh Air For Windows? · 2008-06-30 04:59 · Score: 1

Depends on which prediction :) Originally he said it would fail at the end of the 70s, ... then the 80s ... It is supposed to have failed quite a few times so far. But, yeah. It is hard to see how they can keep it up for more than another decade.

Re:Encryption on Brightnets are Owner Free File Systems · 2008-06-30 02:27 · Score: 5, Insightful

It's not a form of encryption: the purpose is not to hide the data but to share representations of it. The basic idea: say that I have files/blocks A, B, C. Instead of storing them directly I compute shares that merge the information into a new set of blocks. None of the new blocks will contain copyrighted info, or if one does, who owns it, given that there are competing copyright claims? To get file A back out I take a selection of the shares and XOR them together.
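A minimal sketch of the XOR arithmetic, in Python. It assumes fixed-size blocks and uses random bytes for the other blocks being mixed in; it is not the actual Owner Free File System block or URL format, only the idea that the published share is meaningless on its own but rebuilds the original exactly when XORed with the right companions. `xor_blocks` is a hypothetical helper.

```python
import os
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of any number of equal-sized blocks."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("blocks must all be the same size")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


original = b"stand-in for one block of file A"
r1 = os.urandom(len(original))  # e.g. blocks already sitting on the network
r2 = os.urandom(len(original))
share = xor_blocks(original, r1, r2)  # what actually gets published

# XOR is its own inverse, so share ^ r1 ^ r2 == original.
rebuilt = xor_blocks(share, r1, r2)
print(rebuilt == original)  # True
```

Which is exactly the prosecution's point: the set of blocks is enough to reconstruct the file bit-for-bit.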

It's an interesting technical approach, but a classic FAIL. Geeks never understand the law; they assume that it is a mechanical system that can be gamed (well, because they're geeks). But no matter how the law is written, it is interpreted by people. The first time that it was tried in court would go something like this:

Pros: Could you explain to the court what you uploaded to Brightnet?
Def: It was a non-linear combination of the xor of .... .... .... in several parts.
Pros: Did you upload Britney Spears - Chart Slag.mp3?
Def: No, that was never on my computer.
Pros: Did you upload something that allowed the mp3 to be constructed exactly?
Def: Yes
Pros: Copyright infringement through unauthorised distribution, the prosecution rests.
Def: WTF?

Re:Short answer: no on Fresh Air For Windows? · 2008-06-30 02:16 · Score: 1

What you say is true in theory, although this is an area where the theory and the practice don't line up exactly. Increasing the clock speed will increase performance across all programs (assuming that everything gets clocked up: processor, memory, interconnect...). Increasing the number of cores only helps parallelised software.

But the practice is interesting. Because we don't clock everything up uniformly (memory is hard, interconnect is harder still) increasing the processor speed doesn't help certain types of programs (i.e. streaming processes that are memory bound).

We've now reached the point where a single core is fast enough to do most everyday jobs by itself. The things that require more performance tend to be parallelisable by their nature; they're replicating copies of streaming processes. If you throw enough cores at these problems then you can turn them into memory-bound problems.

Of course it isn't easy to do so, and the games industry has already complained that it can't use lots of cores. But that will pass relatively quickly, as most of the hard work inside a game can be parallelised, and there is an installed base to take advantage of software that does.

I'm quite surprised by how much software has already taken the low-hanging fruit. On my dual-core laptop it is rare to see one core idle. Most (if not all) of the software that I run has enough threads, with a big enough division of work, that it distributes nicely. I was really surprised to see that Civ4 partitions onto the two cores really well, as I don't think that was intentional design.

As you say, it gets harder each time the number of cores doubles, but the software that needs the performance the most is the most parallelisable in the first place.
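The posts don't name it, but Amdahl's law is the usual way to put numbers on this: if a fraction p of the work parallelises perfectly and the rest is serial, the speedup on N cores is 1 / ((1 - p) + p / N). A small Python sketch; the fractions below are illustrative, not measurements of any real program.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core when only `parallel_fraction` of the work scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0 or cores < 1:
        raise ValueError("parallel_fraction must be in [0, 1] and cores >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


for p in (0.5, 0.9, 0.99):
    speedups = [round(amdahl_speedup(p, n), 2) for n in (1, 2, 4, 8, 16)]
    print(f"p={p}: {speedups}")
```

Each doubling of cores buys less and less unless p is close to 1, which fits the point that the workloads needing the speed are the ones that parallelise well.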

Re:Short answer: no on Fresh Air For Windows? · 2008-06-29 14:35 · Score: 5, Insightful

Erm, when did CPUs stop showing exponential growth in performance? Was that a memo that nobody sent to Intel?

Although clock speeds are stuck because it is no longer economical to raise them, performance and transistor density are still scaling at the same rate. If anything we are in a period of performance increases that is slightly above trend, because now that the horrific NetBurst microarchitecture has been killed off, the Core 2 replacement is rather lovely. Clock-for-clock it runs twice as fast as the old design because of a shorter pipeline that has reduced instruction latency, and so far Intel have doubled the number of cores every 18 months. Given that they are ready to scale up to new fabs that can handle 2B transistors, I would assume that they can continue to do so for the near future.

It would be a seismic shift for the industry if processor performance flatlined, but I don't see that happening for a long time. What we are seeing with the introduction of the Eee PC et al. is actually a trend that has been going on for decades. Roughly every ten years a new form factor is introduced at the bottom of the market, with the same performance, but with the price halving each time.

So although your analysis of what changes are happening is way off, your final paragraph is quite accurate about what it means. The amount of performance that people actually require for most day-to-day tasks was exceeded when processors passed the GHz mark. Now we are seeing cheaper and cheaper devices that deliver that (roughly) constant power. The effect on Microsoft is likely to be as you predict.

Re:Check out the Central Limit Theorem and be amaz on "Wisdom of Crowds" Works For Individuals Too · 2008-06-27 13:53 · Score: 1

Well spotted, and a very good explanation.

*bows*

Re:Ars Technia users fail math on Bell's Own Data Exposes P2P As a Red Herring · 2008-06-27 11:48 · Score: 1

Thank you. It's nice to know that I'm still too rational for Slashdot. I'll work on it some...

Re:In comparison... on Bell's Own Data Exposes P2P As a Red Herring · 2008-06-27 06:41 · Score: 1

It struck me as amusing enough that I had to point it out. No offense intended. I thought you'd get that from the Canada comment (it's in your reply) :)

Re:In comparison... on Bell's Own Data Exposes P2P As a Red Herring · 2008-06-27 04:05 · Score: 1

The GP said:

In comparison, the tiny Netherlands with all that cheese and those cows seems to have a lot of consumer ISPs to choose from.

You replied:

You live in Europe, going by the names of those ISPs and the address you used. There's more competition there. The netherlands perhaps?

Seriously, is the education system where you live really that bad? Going by the names of those ISPs and the address you used. Canada perhaps?

Re:How funny on Bell's Own Data Exposes P2P As a Red Herring · 2008-06-27 03:58 · Score: 1

Not really. It's been a long time since I worked in telecoms so if my explanation is really far off I'm sure someone will jump in and call me a tool before correcting me :)

You have two types of resources where you will measure capacity. One type of resource is a simple point-to-point link. You want to run this resource as close to 100% as you can*. Your intuition is correct that not using the link is wasted capacity. * = modulo what I'm about to say.

But you can't make a very effective network out of point-to-point links; they need some form of routing between them. Wherever these traffic flows merge you need to arbitrate for access to the outgoing link. This may be some kind of ring network running ATM over something like SONET (as I said, it's been a long time) or a router.

In either case, when you get a collision you have to decide what to do. Let's say my router has three links, A, B and C. If links A and B are both 100% saturated with incoming traffic bound for link C, then what can I do? Assuming the links have the same capacity, some of the traffic needs to be dropped. This is why delivery is not guaranteed at the low levels.

I can use memory on the router to buffer some of the traffic so that if usage drops I can squeeze it onto the outgoing link. But once that fixed amount of memory is used I have to drop packets. If I hold a packet in memory for too long then I may as well drop it anyway, because it will be stale.

So when the transmitting end realizes that packets went missing it retransmits - which increases the usage on the link. To get the optimal amount of traffic through the network you need to leave enough slack to reduce the probability of collisions, and thus the amount of retransmission needed. Somebody who has more knowledge of queuing theory could probably explain why that leads to using 50%, but this is the basic handwaving behind why there is a tradeoff that needs to be optimised.
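For a rough feel of the shape of the tradeoff, here is the textbook M/M/1 model (Poisson arrivals, exponential service times, one outgoing link). Real router traffic does not follow these assumptions exactly, and the model does not by itself produce the 50% figure. Mean time in the system grows as 1 / (1 - rho) with utilisation rho, and with a finite buffer holding K packets (the M/M/1/K model) the drop probability climbs steeply as rho approaches 1. The function names are hypothetical.

```python
def mm1_mean_delay(utilisation: float, service_rate: float = 1.0) -> float:
    """M/M/1 mean time in system, W = 1 / (mu - lambda), with lambda = rho * mu."""
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("an infinite-buffer M/M/1 queue is only stable below 100% load")
    return 1.0 / (service_rate - utilisation * service_rate)


def mm1k_drop_probability(utilisation: float, capacity: int) -> float:
    """M/M/1/K blocking probability: chance an arriving packet finds all `capacity` slots full."""
    rho, k = utilisation, capacity
    if rho == 1.0:
        return 1.0 / (k + 1)
    return (1 - rho) * rho**k / (1 - rho ** (k + 1))


for rho in (0.5, 0.8, 0.95, 0.99):
    delay = mm1_mean_delay(rho)  # in units of one packet's service time
    drop = mm1k_drop_probability(rho, capacity=10)
    print(f"load {rho:.0%}: mean delay {delay:.0f}x service time, drop rate {drop:.2%}")
```

Drops then feed back as retransmissions, which is the extra load the post describes.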

In order to reach the desired capacity at the routing points you need to reduce the utilisation of the links, which contradicts the intuition that they should run at 100%. If you could model the collision rate at the routers accurately enough then the whole thing would reduce to a flow equation and you could probably solve it using something like Kirchhoff's laws. QoS isn't my field, but from what I've read the situation is far too complex to model like that, so you end up with engineering rules of thumb about how much capacity to use.

Re:Ars Technia users fail math on Bell's Own Data Exposes P2P As a Red Herring · 2008-06-27 03:42 · Score: 2, Interesting

But the Japanese don't live in a world where the files they are sharing scale up with the bandwidth. Just because Comcast bandwidth is 100x lower doesn't mean that the Japanese are using some sort of super internet where every file has been bulked up by a factor of 100. It's not teh beefcake-interweb.

These caps are equal to 30GB a day. Yes, it's a small percentage of their burst speed, but that is because they have huge bandwidth at the edge of the network, and a much smaller ratio between the capacity of the backbone and the capacity of the leaf nodes.

As a comparison, if I lived close enough to my exchange to get perfect ADSL2 I could have a 2.6Mbit/s uplink. If I saturated that link 24 hours a day I could still only upload 28GB a day. Are you trying to tell me that the deal in Japan is worse because it's a smaller percentage of the peak rate?
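The arithmetic, checked in Python, assuming decimal units (1GB = 10^9 bytes), that "Mb" here means megabits, and reading the daily cap as 30 gigabytes. Working backwards, the cap corresponds to a sustained rate slightly above a saturated 2.6Mbit/s uplink. The helper names are just for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gb_per_day(mbit_per_s: float) -> float:
    """Volume moved by a link saturated all day, in decimal gigabytes."""
    return mbit_per_s * 1e6 * SECONDS_PER_DAY / 8 / 1e9


def mbit_per_s_for(daily_cap_gb: float) -> float:
    """Average rate, in megabits per second, that exactly uses up a daily cap."""
    return daily_cap_gb * 1e9 * 8 / SECONDS_PER_DAY / 1e6


print(round(gb_per_day(2.6), 1))     # 28.1
print(round(mbit_per_s_for(30), 2))  # 2.78
```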

Re:In related news... on "Wisdom of Crowds" Works For Individuals Too · 2008-06-27 03:22 · Score: 1

Sorry, I was a bit unclear in the way that I phrased it. I did mean the literal comparison between a "Gaussian distribution about the correct answer" and just a uniform distribution, i.e. not about the correct answer. Of course you are correct that any symmetric distribution would work as long as it was centered on the correct solution.

It is with an image of Ferris Bueller playing the clarinet that I say: and without a single stats class ever :)

Re:In related news... on "Wisdom of Crowds" Works For Individuals Too · 2008-06-27 03:11 · Score: 1

That's an interesting idea. It would be a good follow up to measure the short-term attention span of the subjects and see if there was a correlation between memory and accuracy.

In general though, doesn't short-term memory decrease with age? Or don't you remember what we were talking about :)

Re:In related news... on "Wisdom of Crowds" Works For Individuals Too · 2008-06-27 02:14 · Score: 4, Insightful

Not quite... but you are close. It sounds like you're pointing out that anyone will get lucky if given enough chances. These guys are claiming that the average will converge to the ground truth over time. That would require the guesses to follow some Gaussian distribution about the correct answer.

If the guesses were uniformly distributed then the average wouldn't tend to the correct answer over time. Of course, what is described in the summary has nothing to do with the wisdom of crowds as it is commonly thought of (i.e. in markets), where shared information is vital. Instead it is simply an artifact of sampling (which is why the longer gaps are necessary for better accuracy).
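A quick simulation of the distinction, in Python. The true value, spreads and sample sizes are invented for illustration, and this only shows the law of large numbers at work, not the study's actual method. The "uniform" case is read as uniform over a range not centred on the answer; a uniform range centred on the answer is included to show that symmetry about the truth is what matters.

```python
import random
import statistics

TRUTH = 100.0  # hypothetical correct answer


def centred_gaussian(rng):
    return rng.gauss(TRUTH, 20)  # symmetric and centred on the truth


def centred_uniform(rng):
    return rng.uniform(TRUTH - 50, TRUTH + 50)  # also symmetric about the truth


def off_centre_uniform(rng):
    return rng.uniform(0, 400)  # uniform, but its mean (200) is not the truth


def average_guess(draw, n, seed=0):
    """Mean of n independent guesses from `draw`, seeded so runs are repeatable."""
    rng = random.Random(seed)
    return statistics.fmean(draw(rng) for _ in range(n))


for draw in (centred_gaussian, centred_uniform, off_centre_uniform):
    print(draw.__name__, [round(average_guess(draw, n), 1) for n in (10, 1_000, 100_000)])
```

The off-centre case converges just as reliably, only to its own mean rather than to the right answer.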

Re:I feel dirty on NASA Tests Hypersonic Blackswift · 2008-06-27 00:33 · Score: 1

They'll never even make it to the Louvre!

Re:wrong wrong wrong on The World's Nine Largest Science Projects · 2008-06-27 00:14 · Score: 1

Sounds like you've made a big difference already. As you point out, heating is likely to be your largest usage at the moment. I know from experience that electric heating sucks. It is very uncommon in the UK (outside of rented student accommodation) because people prefer gas. We found it expensive, it lacked the control of gas (it takes a long time to heat up or cool down), and it never actually got the flat warm.

Re:wrong wrong wrong on The World's Nine Largest Science Projects · 2008-06-26 11:34 · Score: 1

No, like the Dutch example that I was comparing with, we use gas. We did previously live in a horrific flat that had Economy 7 storage heaters. They doubled our usage to about 3500-4000 kWh a year.

Re:Slashdot can finally be what it wants on ICANN Board Approves Wide Expansion of TLDs · 2008-06-26 07:23 · Score: 1

Oblig: http:///./

User: smallfries

Comments · 2,506