Doesn't OLPC XO-1 use 802.11s for ad-hoc/mesh networking?
look at those small files in /lib - they're symlinks to larger files
The command line that I ran dereferenced symlinks (du -L -b -a), as I've previously mentioned. Due to the command I ran, the small files in /lib are either files or directories. If they are directories, then the number is the total size of the files within the directories, so the files within the directories are no larger than that.
Following your prompting, my previous posts have looked at a number of locations on my computer(s) and in all cases found a substantial proportion of small files. I don't claim to be a normal user, but suggest that based on my evidence, your analysis and interpretation of results may not be statistically sound.
Then again, go into any user's desktop directory ... most have LOTS of big files there.
Desktop? really? Okay:
$ du -L -b -a
318	./Konsole.desktop
4508	./Home.desktop
659	./Braid.lnk
73	./.directory
197	./trash.desktop
5963	.
They look like pretty small files to me.
Or do like I did -go look in /lib, where most of your programs actually live. The only files at 4k or under are symlinks and directory entries
Fine, if you want:
du -L -b -a /lib | awk '{print $1}' > ~/libfiles.txt
[-L: dereference symlinks, -b: apparent size in bytes, -a: all files]
[analysis using R]:
> a <- read.table("libfiles.txt")
> mean(a$V1)
[1] 76875.4
> sd(a$V1)
[1] 2258044
> median(a$V1)
[1] 3776
> sum(a$V1 <= 4096)
[1] 8875
> sum(a$V1 > 4096)
[1] 8428
> 8875 / (8875 + 8428)
[1] 0.5129168
This reports directory sizes as the total size of the files they contain, which will skew the numbers towards larger sizes than the actual file sizes. Despite this, in /lib, 51% of my files are 4kiB or smaller (median file size 3776 bytes, mean 76875.4 bytes). This is probably due to the linux kernel tree being in there on that computer. So I'll try the eee PC that I have (stripped down to a pretty minimal system):
> b <- read.table("libfiles.txt")
> mean(b$V1)
[1] 155920.5
> sd(b$V1)
[1] 2595565
> median(b$V1)
[1] 13651.5
> sum(b$V1 <= 4096)
[1] 350
> sum(b$V1 > 4096)
[1] 3318
> 350 / (350+3318)
[1] 0.09541985
So now the proportion of files of 4096 bytes or less is 9.5% of the total. Quite different from my other desktop, but I'll still stick with my statement that the frequency of small files on my computer(s) is not insignificant -- even when looking at /lib, there are still a reasonable number of files of 4kiB or less.
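For anyone without R to hand, the same kind of summary can be sketched in Python. This is only a rough equivalent of the R session above, assuming input lines in the `du -L -b -a` shape (size in bytes, whitespace, path); the function name `summarise_du` and the sample data are made up for illustration and are not measurements from either machine.

```python
from statistics import mean, median

def summarise_du(lines, threshold=4096):
    """Summarise `du -L -b -a` style lines ("<bytes> <path>")."""
    # The first whitespace-separated field on each line is the size in bytes.
    sizes = [int(line.split(None, 1)[0]) for line in lines if line.strip()]
    small = sum(size <= threshold for size in sizes)
    return {
        "count": len(sizes),
        "mean": mean(sizes),
        "median": median(sizes),
        "small": small,
        "small_fraction": small / len(sizes),
    }

# Hypothetical sample input, only to show the shape of the data.
sample = """\
120\t./a.conf
900\t./b.txt
2048\t./c
4096\t./d
5000\t./e.so
1048576\t./big.bin
"""
stats = summarise_du(sample.splitlines())
print(stats)
```

The `threshold` default of 4096 mirrors the `<= 4096` comparison used in the R session; change it to look at a different cut-off.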
From the article:
A path that follows a circle turned out to be a whopping 25 percent faster.
That's a pretty big performance boost. It'd need to get to 33% faster to turn a 3rd base run into a home run every time, but there may be times when 25% is all you need.
No - I said average because I meant average.
Sure, but which average did you mean?
If you're talking about the average function in Excel/Calc, then that's the arithmetic mean, which is not useful for explaining how many of your files are under a particular file size (as I mentioned previously). To reiterate an often-mentioned issue with the mean, in the case where you have a small number of really large files (e.g. ISOs, DVD rips), the mean will be affected to a large degree.
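As a rough illustration of that point, here is a minimal Python sketch. The sizes are invented (nine smallish files plus one DVD-sized ISO), not taken from any real directory; it only shows how a single huge file drags the mean far away from the median.

```python
from statistics import mean, median

# Hypothetical file sizes in bytes: nine small files plus one DVD-sized ISO.
sizes = [512, 1024, 2048, 3000, 3500, 4096, 8192, 16384, 65536, 4_700_000_000]

mean_size = mean(sizes)
median_size = median(sizes)
# Proportion of files that fit in a single 4096-byte chunk.
share_small = sum(s <= 4096 for s in sizes) / len(sizes)

print(f"mean:   {mean_size:,.0f} bytes")
print(f"median: {median_size:,.0f} bytes")
print(f"share <= 4096 bytes: {share_small:.0%}")
```

With these made-up numbers the mean lands in the hundreds of megabytes even though more than half of the files are 4096 bytes or smaller, which is why the median (or a direct count) is the better answer to "how many of my files are under this size".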
So you're not so happy about the home directory usage because it's an "exception", let's try lsof (the currently open files on my computer, lsof -s -b -F ns0 > usedfiles.txt, analysed using R). Here are some statistics:
mean file size: 456807 bytes (~450kiB)
SD of file size: 2551370 bytes (i.e. ~2MiB!)
median file size: 56536 bytes (~50kiB)
The mean and median, in this case, are quite different, and suggest a substantial skew towards low-size files. So 50% of the files currently in use on my computer are more than 50kiB. Hence it is likely that "most" of the files are over 4kiB. I can verify this with counts:
number of open files with file size > 4kiB: 2972
number of open files with file size <= 4kiB: 475 (13.8%)
13% is less than 50%, granted, but it's not insignificant. Your comment was "Almost NO file on your file system is under 4k in size", and again I suggest that at least on my computer, this statement is incorrect.
my home directory, the average file size is 19,065,740 bytes
Given that you said "average", I presume you mean mean, which is not a good indicator of the most frequently present file. Median would be better, if you want to say "50% of my files are under this size".
... and then I re-read the summary to see this confusing statement:
...may have the clout to shift the market away from hard drives, even if they're still an order of magnitude cheaper
SSD drives are "hard drives". Arguably, they're harder than HDDs because they can have less air in them (required for moving parts to move).
Well, I was going to whisper into the cacophony, "can we please assert that SSDs are also HDDs?" Then, just before writing this out, I expanded the acronyms and realised that "solid state drives" are not "hard disc drives". No doubt this will not be realised by most consumers -- I talk about bad computer memory and they get confused, or ask me if the files were backed up; another common confusion is hard drive == case + motherboard.
That evokes an interesting question: does a person lose if they hint at the famous dictator, but don't mention him specifically?
No, small random reads are NOT the primary pattern in desktop usage. Almost NO file on your file system is under 4k in size, which is the "chunk" size for most 8mb to 64mb hd caches.
I differ in that respect. Not sure if my use is typical, but here's a dump of the counts for the smallest file sizes in my home directory:
~$ du .* --apparent-size -a 2>/dev/null | awk '{print $1}' | sort | uniq -c | sort -n -k 2,2 | head -n 10
40006 1
11237 2
6862 3
4831 4
3554 5
2964 6
2783 24
2619 7
2477 8
2229 22
In other words, the highest frequency file size is 1kB (blocks are 1kB in my version of du), next highest 2kB, and so on. I get an odd jump at 24kB and 22kB (and FWIW 0kB comes in at #18), but in general the smaller a file is, the more frequent it is.
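The same kind of ranking (sizes ordered by how often they occur, most frequent first) can be sketched in Python with `collections.Counter`. This is an illustration only: the helper name `most_common_sizes` and the input list are invented, and the sizes are assumed to be in the 1kB blocks that du reports.

```python
from collections import Counter

def most_common_sizes(sizes_kib, top=10):
    """Return (size, count) pairs, most frequent size first."""
    return Counter(sizes_kib).most_common(top)

# Hypothetical per-file sizes in 1kB blocks, not real du output.
sizes_kib = [1, 1, 1, 1, 2, 2, 2, 3, 3, 24, 0]

# Print in the same "count size" layout as `uniq -c`.
for size, count in most_common_sizes(sizes_kib, top=3):
    print(f"{count:7d} {size}")
```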
I admit I'm impressed if you actually make that work in the UK. If it does work just like that for you, then sure, no reason to change it. Sadly, experience has shown it's not nearly so well done in the US
Seems to work in New Zealand as well. When I was a polling officer, it was just one person from one party who was there, but they were sitting next to our table recording the names (actually page/row numbers) of the people who voted. No talking to them -- that could get the scrutineers kicked out.
It's a bit trickier with postal ballots (and I'm not quite sure how they're scrutinised). However, we've just had a bit of an upset in Wellington with the underdog green candidate ending up as mayor because too many preferred her over the current mayor (our mayor is voted under an STV system). The difference was some small number of votes (fewer than 500, I think), but the incumbent team doesn't seem to be crying foul over the election itself.
Oh, and take a snapshot when (if) it gets to +5 troll. If enough people did that it might be believable -- surely /. wouldn't commit voter fraud.
Don't you mean 1/4 every month? Remember! Always simplify your fractions.
It's 1 /7 every month. Computer maths is a little stranger than normal maths.
Its up to ten million and it hasn't found any more one digit UIDs, just the first ten.
Have you checked to make sure that there aren't any in the vast space between two whole numbers? That sounds like it could be quite a complicated exercise.
If I get a 5mm screw from "Scott's screws" and decide to one day switch to "Sam's Screws" I don't have to worry about retraining staff for how to use them
Yeah, but what if "Scott's screws" had a little kink just near the head that gave the screw a little extra bite, so your staff got used to tightening those screws a little less than most other screws?
That's not quite how our immune system works, but I agree with the idea.
I consider the whitelist to be equivalent to the process of selection against autoimmune antibodies, mentioned at the end of this section. B cells won't ordinarily progress through to maturation if they generate antibodies with affinity for self signatures.
If you want to model how our body recognises and deals with disease, you need to concentrate on whitelists, rather than blacklists. Vaccinations are similar to a community blacklist, but for most pathogens our own immune system can work out what things are appropriate to reject.
No where was it mentioned about creating one. Ever.... actually read the summary ffs.
I think you may have missed this part of the summary:
do I try to write one my self
Why? Because for small files (as I expect most software updates would be), downloading directly is quicker and safer.
We will be a mystery to archaeologists of the future.
You people from the future are a mystery to us here in New Zealand. 9/11 hasn't happened for us yet.
Took me a bit of time to find, but here's the link to the actual research paper (requires a Nature subscription):
http://www.nature.com/nature/journal/vaop/ncurrent/full/nature09410.html
From the abstract:
Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation)
The introduction of the paper states that "80% of the variation [for height] within a given population is estimated to be attributable to additive genetic factors, but over 40 previously published variants explain less than 5% of the variance." While this paper pushes that to 16%, it's nowhere near the limit of what can be detected.
I find it interesting that they've got a sample size of around 100,000 individuals for this study (actually a meta-analysis of summary statistics from 46 GWAS of 133,653 individuals), but still claim a need for more individuals. I suspect that'll still be said when a study is done on 10 million individuals, or a billion.
Don't worry, a couple of minutes with the HDCP Encryption/Decryption Code, and everyone will be able to see it again.
a utility-maximising foray into language improvement optimisation techniques to obviate the degredation of A) core procedural goals and B)reconstruction of enlightened creative thought processes
FTFY
It's a good thing they just used onmouseover rather than onload. That would have been quite a chaotic mess.
You can open the login link in a new tab (or window, if that takes your fancy). Then when you preview/submit, you'll be logged in.