It was also the sampling frequency they used. Others use a higher sampling frequency for better results.
As for FLAC/SHN, as soon as MP3 players can read them and play them back I'm sure people will start serving them, but for now they are a very niche market.
Check out the HTML source for most of the download-detail pages on download.com. Halfway through you'll see a comment... 'rotter rocks my world' http://download.com.com/3000-2150-10119668.html
There used to be a 'leo loves you' one on all CNET pages.. but that got removed a while ago.
I just surfed over there...
They seem to keep track of whether you have clicked an ad or not.
Once you have clicked an ad, you don't get the jumpthrus anymore.
If this is what it takes to keep the site alive,
then I'm all for it.
The following is a graph of the response time of OZEmail's front door, measured from ~30 different places around the world.
[Graph: OZEmail's Response Time]
Thought people might be interested.
Gartner makes hundreds of these types of reports, and the people writing them are different, so if you get a pro-Microsoft analyst, then you'll see that viewpoint come across.
I've read enough Gartner reports to know that most of what they say is
a. common sense
b. too general to be of any use to anyone (of course you can actually hire the analyst who wrote the report for big $$$$ to get a more specific report for your company/industry)
c. tied up in complex diagrams
Working for a large web site, I know that:
a) web logs get audited
b) each line in the log gets a magic number added to it, so auditors can check whether the line is faked, or whether lines are missing from or have been added to the log
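For anyone wondering how a per-line magic number could catch faked or missing lines, here is a minimal sketch. It is only my assumption of one way to do it (a keyed hash chained from line to line), not a description of what any real site or auditor uses. The key and the function names are made up for illustration.

```python
import hashlib
import hmac

SECRET_KEY = b"auditor-shared-secret"  # hypothetical key, for illustration only


def sign_lines(lines, key=SECRET_KEY):
    """Append a chained HMAC tag (the "magic number") to each log line."""
    signed = []
    prev_tag = ""  # tag of the previous line; empty for the first line
    for line in lines:
        # The tag covers the previous tag too, so removing or inserting
        # a line breaks the tag of whatever line comes next.
        tag = hmac.new(key, (prev_tag + line).encode(), hashlib.sha256).hexdigest()[:16]
        signed.append(f"{line} {tag}")
        prev_tag = tag
    return signed


def first_bad_line(signed, key=SECRET_KEY):
    """Return the index of the first line whose tag does not verify, or None."""
    prev_tag = ""
    for i, entry in enumerate(signed):
        line, _, tag = entry.rpartition(" ")
        expected = hmac.new(key, (prev_tag + line).encode(), hashlib.sha256).hexdigest()[:16]
        if not hmac.compare_digest(tag, expected):
            return i
        prev_tag = tag
    return None
```

Calling `first_bad_line(sign_lines(log_lines))` should come back with `None` for an untouched log. Note that this sketch does not notice lines chopped off the very end; you would also need to record a line count or the final tag somewhere else for that.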
Why shouldn't they be able to do this? Who said that the internet has to be the same speed for everyone? Certain people pay top $$$ per month to their ISP for high-speed connections, while other people are willing to pay nickels and dimes and get shitty service. I think it is a good idea, and a good revenue stream for them.
This was mentioned at the JavaOne conference yesterday. They mentioned that Rich Pettit was busily porting it to Linux. It seems like a very useful tool: http://www.sun.com/sun-on-net/performance/se3/
If your firm has the $$, and most do, then Solaris 2.x would be a superior choice for a LARGE system. You will find a lot more sysadmins with experience running large Solaris systems.
On the other hand, if the database is a small one (1G), then a Linux machine would be sufficient.
Just think about why you chose Linux instead of Solaris in the first place. Is it because Linux is 'cool' and flavor of the month? Or is it for cost reasons?
Try not to pick something just because it will be 'fun' and 'cool' to say you are working on it.
Another question you should ask is what other machines the site has experience with, because you will probably be costing them more on support than you would save by implementing on Linux.
I am not saying that Linux is bad, just make sure you choose it for the right reasons.
Honestly... try not to store it.
You need to examine why you actually need the data, and if you can't think of a good reason (other than 'it might be valuable in the future'), then don't store it.
If you do need it for analysis, machine learning apps, etc., try to anonymize it as early as possible, and don't keep raw data longer than you need it (say, raw data for 3 months, then just store aggregate info).
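A rough sketch of what "anonymize early, keep raw data for 3 months, then only aggregates" might look like. Everything here is an assumption for illustration: the `PEPPER` secret, the 90-day window, the event layout and both function names are made up. Also, a keyed hash is strictly pseudonymization rather than true anonymization, so treat it as a starting point.

```python
import hashlib
import hmac
from collections import Counter
from datetime import date, timedelta

PEPPER = b"rotate-me"               # hypothetical secret, kept out of the data store
RAW_RETENTION = timedelta(days=90)  # roughly "raw data for 3 months"


def pseudonymize(user_id: str) -> str:
    """Replace the real identifier with a keyed hash before anything is stored."""
    return hmac.new(PEPPER, user_id.encode(), hashlib.sha256).hexdigest()[:12]


def roll_up(events, today):
    """Keep raw rows inside the retention window; reduce older rows to per-day counts."""
    cutoff = today - RAW_RETENTION
    raw = [e for e in events if e["day"] >= cutoff]
    aggregates = Counter((e["day"], e["action"]) for e in events if e["day"] < cutoff)
    return raw, aggregates
```

Each event is assumed to be a dict with a `day` (a `datetime.date`) and an `action`. After `roll_up`, the old rows only survive as counts, with no user key attached at all.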
Also, for behavior, you don't need years of information. Studies have shown people change, so make sure the things people did recently count for more, and the old stuff gradually decays.
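One simple way to get that "recent counts more, old stuff fades" effect is an exponential decay on the age of each event. This is just a sketch under my own assumptions: the 30-day half-life is an arbitrary number, and the function names and the `(topic, age_days)` event format are made up for the example.

```python
HALF_LIFE_DAYS = 30.0  # hypothetical: an event loses half its weight every 30 days


def decayed_weight(age_days: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """Weight of an event that happened age_days ago (1.0 for something that just happened)."""
    return 0.5 ** (age_days / half_life)


def interest_scores(events, half_life: float = HALF_LIFE_DAYS):
    """Sum decayed weights per topic; events are (topic, age_days) pairs."""
    scores = {}
    for topic, age_days in events:
        scores[topic] = scores.get(topic, 0.0) + decayed_weight(age_days, half_life)
    return scores
```

With this, one click from yesterday outweighs a couple of clicks from over a year ago, and anything old enough contributes so little that you can drop it without changing the result much.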
you make me feel so young!
yeah.. well done Paul
Australia is not 373t3 enough to have warez people?
sob..
and here I was about to go and say how technologically advanced we are
or out of jobs completely.
MS brought computers to people's desktops.
Without DOS & Windows we would probably still be on green screens.
This is a Covalent thing, not an Apache thing.
You will have to pay $$$ for this.
For medium-to-advanced Unix things builder.com is great. They recently had an article about how sendfile works, and the differences between select and poll.
http://builder.com.com
Yeah.. ISS learnt from their F-up on the Apache hole they pre-announced a week ago.
Sega will now be a software company, and could write S/W for all consoles including indremeda...
SF.net provides excellent hosting services for anyone who needs them.
They are responsive, and very willing to accommodate most projects.
I've had a project running for ~3 months now which could not have been started without SF.net.
Thanks SF.net guys...
Are you sure.... the ISO is dated March 9th
Sell it to them for 500K or some absurd number and donate that to drug rehab (and register coke-addicts.ch to do your stuff on).
Akamai does something similar with their image caching... check out Yahoo's front page and the images on it.
So much for the apology... I still get the refused page. Maybe if they had more Linux people on their staff they might know how to fix it.
Most of the stuff that I see developed is for the web, where Swing is not used.
Does anyone develop Swing-based apps?
Every customer we go to says web-based HTML, no applets.
I have seen him running around SF since I got here. Does anybody know his story... That would be interesting... ..Another Laid Back Aussie ;-)