the+frizz · Slashdot Mirror

Unique visitors from web logs - Detailed analysis on Hits or Misses: Who is Your Website's Audience? · 2004-06-21 06:22 · Score: 2, Informative

It was my task to develop log analysis software to count visitors for large web sites. I was not only surprised to find out how inaccurate the art was, but also had difficulty convincing other web-experienced colleagues how impossible it was. "All the other web analysis programs display the number of visitors," they said. Well, the answer is that they all make guesses. The current best practice is to count unique login names, but most sites don't use authenticated logins, and even then one person can have many Hotmail accounts. Here's the disclaimer I eventually wrote for my site's unique visitor stats.

The number of visitors displayed does not accurately represent the number of actual people visiting your site. Many people can appear as a single IP address by sharing proxies, caches, NAT firewalls or even simply sharing the same computer at home. One person can also appear as many IP addresses by using dynamic IP addressing (most dialup and PPPoE users), being load balanced across proxies and caches or simply using multiple computers (e.g., at home and at work.)

Other reasons for overcounting include: robots; rogue client software that keeps changing its ID string; users that delete cookies, upgrade software or use multiple client software agents.
Other reasons for undercounting are clients that don't (or have been set not to) accept cookies or that operate through anonymizers. If authenticated logins are used, the number of real people may be best derived from server-side logs via a cookie that is set only after an authenticated login and holds only a value which uniquely corresponds to the user (e.g., a user name or account number).
When the statistics for more than one day are selected, the peak daily number of visitors is displayed rather than a sum of the daily visitors.
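To illustrate the last point of the disclaimer, here is a minimal sketch of reporting the peak daily count instead of the sum. The disclaimer doesn't say why; presumably it is because a sum counts a returning visitor once per day. The (day, visitor id) pair format and the function names are assumptions made for illustration, not the format of any real log.

```python
from collections import defaultdict

def daily_unique_visitors(hits):
    """Map each day to its number of distinct visitor ids.

    hits is an iterable of (day, visitor_id) pairs (hypothetical format).
    """
    per_day = defaultdict(set)
    for day, visitor in hits:
        per_day[day].add(visitor)
    return {day: len(ids) for day, ids in per_day.items()}

def peak_daily_visitors(hits):
    """Return the largest single-day unique visitor count."""
    counts = daily_unique_visitors(hits)
    return max(counts.values(), default=0)

# Visitors "a" and "b" come back on the second day.
hits = [
    ("2004-06-19", "a"), ("2004-06-19", "b"),
    ("2004-06-20", "a"), ("2004-06-20", "b"), ("2004-06-20", "c"),
]
print(peak_daily_visitors(hits))                  # peak over the two days
print(sum(daily_unique_visitors(hits).values()))  # naive sum counts a and b twice
```

With this data only three distinct ids ever appear, which matches the peak, while the sum comes out larger.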

Since the above was written I discovered that a common practice of sysadmins and help desks is to suggest manually deleting all cookies (since you can't do it selectively with MS-IE) to get over site bugs. And now the increasingly popular spyware removal tools (e.g., Spybot) remove 3rd party cookies used just to count unique visitors, in the name of removing spyware and viruses from your computer.

Originally I thought of defining a visitor for HTTP domains as the cookie if it exists, and the client IP address otherwise. But the flaw in this is that it will double count first time HTTP visitors: once for the log line of their first hit with no cookie, and again for the subsequent hit. With streaming logs, using the GUID (effectively a cookie these days) and the client IP address is more useful as a unique visitor. The log lines in streaming are actually the summary of a sequence of request/reply transactions, and so the first "hit" log line does have a GUID/cookie logged.
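A minimal sketch of the double-counting flaw described above. The dict-based hit records and the field names "ip" and "cookie" are hypothetical, chosen only for illustration; real log formats differ.

```python
def visitor_key(hit):
    """Use the cookie if the hit carried one, else fall back to the client IP."""
    return hit["cookie"] or hit["ip"]

def count_visitors(hits):
    """Count distinct visitor keys across all hits."""
    return len({visitor_key(h) for h in hits})

# One person, two hits: the first request has no cookie yet,
# the second carries the cookie set by the first reply.
hits = [
    {"ip": "192.0.2.10", "cookie": None},
    {"ip": "192.0.2.10", "cookie": "abc123"},
]
print(count_visitors(hits))
```

The single person above produces two different keys (the IP address, then the cookie value), so a first time visitor is counted twice under this definition.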

What follows is additional research I turned up:

ABCi (a web traffic auditor)
says: `` A visitor is defined as "a unique IP address with heuristic." To properly account for visits, the Web site needs to identify a "visitor" so that visitor activity is properly tracked. Registration and/or cookies are the best way to track a visitor's activity through the Web site. Unfortunately, a lot of Web sites do not require registration, nor do they use cookies [and browsers can disable cookies] If cookies are used, it is the clients' responsibility to provide the auditor with details on how the server sets the cookie, the cookie format and how the cookies are used. An alternative that has been suggested is to use the IP address AND user-agent in combination, to identify a unique visitor. The interaction with the site by this "visitor" is then analyzed to determine the number of visits which should be recorded. Using only the IP address to identify a visitor is not acceptable due to the number of visitors that may not be accurately reported because they are operating behind a proxy server or firewall. ''

Re:Time to check out other providers. on Akamai DNS Outage Messes up Net · 2004-06-15 07:10 · Score: 1

Actually, last month's flop was not a DNS issue, but it also effectively shut down websites for one and a half hours.

And if we're pushing dual sourcing, don't overlook Speedera. You can even see today's outage on the Speedrank performance page. BTW, C & W USA declared chapter 11 a while ago and got acquired by Savvis. Seems like they change company name every couple of years.

Re:Apple down, Microsoft up on Akamai Having Problems? · 2004-05-24 07:11 · Score: 2, Informative

See the Speedrank index for the effects this has had on 100 popular web sites.

Disclaimer: I work for Speedera, an Akamai competitor.

POSIX Reference on Advanced Unix Programming, 2nd Ed. · 2004-04-29 09:26 · Score: 5, Informative

AUP really is a classic. I may buy it just for sentimental reasons, even though I don't need the tutorial introduction to Unix anymore.

Nowadays though, my definitive reference for writing portable unix programs is the merged IEEE POSIX and Open Group's Single Unix Specification. Registration is free.

Re:Not really NEW technology..... on A Black Box for People · 2004-04-08 04:24 · Score: 1

Agree that this is not new to the world of cardiology. 10 years ago the pacemakers already logged heart rates, breath rate and volume, and motion detection (running vs walking), all in something smaller than a box of matches, and the battery lasted for 5 years with no recharging. I wouldn't be surprised to see this device become much smaller.

An interesting sensor to add would be a GPS receiver. Some prisoners are released into public with GPS ankle bracelets with this kind of technology. There are plenty of non-nefarious spying uses for this too. A GPS combined with the CPOD could be used by athletes to track their performance.

And I agree there are many reasons why you wouldn't want to permanently wear one. When I was working for a pacemaker company 10 years ago we were developing a new kind of motion sensor to detect running vs. walking. I strapped on the prototype for a day to collect some data. That night my wife refused to have sex with me, because she didn't want anyone at the office to annotate the trace with "they did it here" and stick it up on the cubicle wall.

Re:Can we set up a competition? Can it be measured on Google Traffic Takes Down Web Site · 2004-02-04 15:24 · Score: 2, Informative

"A NASA guy [That was me, but I don't work for NASA directly, but for Speedera who delivers their traffic] says ... Slashdot was a drop in the bucket compared to links from mainstream news web sites". I said it here. The Slashdot load depends on the size of the objects downloaded of course, but a reasonable generalization is that the traffic from a top 10 portal is about five to ten times higher.

Re:NASA isn't concerned with being slashdotted the on Mars Landers - Opportunity, Bedrock, Aerosmith? · 2004-01-28 13:25 · Score: 5, Informative

No they're not.

I work at Speedera who is delivering their content and NASA TV. At 6pm EST when slashdot posted this story the traffic increased only about 100Mbps. Articles posted on AOL, MSN and Yahoo home pages increase the traffic much more. The NASA TV live stream when Opportunity landed was 4 Gbps. There are lots of other sources that are bigger than the slashdot effect.

See the press release for more details on the traffic and our SpeedRank index for historical performance and availability of NASA's site.

Re:Freedom? on Windows Services For Unix Now Free Of Charge · 2004-01-15 12:46 · Score: 1

Let me explain with an example symlink (which works fine on unixen):

/mnt/server1/symlink -> /mnt/server2/target

where the server1 and server2 dirs are NFS mounted. I want the equivalent thing (a symlink on one remote file system targeted at another remote filesystem) to work on Windows.

Re:Freedom? on Windows Services For Unix Now Free Of Charge · 2004-01-14 09:05 · Score: 4, Informative

I too liked the fact that SFU has more access to the Windows core. E.g., some per-process stuff can be seen via ps and /proc, and the cmd.exe shell executes many of the utilities. But that's still not enough for me to kick cygwin off my system. The cygwin bash shell default setup beats ksh.

Here are some features that would have excited me, but that I didn't find in SFU.

I was hoping to be able to truss(1) the native Windows executables, but I didn't have any luck with that.
A list of file descriptors in use under /proc/PID/fd/...
The SFU NFS client did follow symlinks when the target was on the same device, but it didn't seem to follow a symlink to another device. I tried making targets of c:\temp and \\host\share, but even though Windows Explorer could see the target directly, when Windows Explorer browsed the remote NFS network the symlink target did not resolve. (A trace shows the NFS server returning the right target name to the SFU NFS client.)

Re:NFS client for win! (summary) on Windows Services For Unix Now Free Of Charge · 2004-01-14 08:43 · Score: 2, Informative

Microsoft has had this PC-NFS client out for a while now. I see knowledge base article 324084 was last updated on 6/6/2003, and my MSDN Aug 2002 Windows Services for UNIX 3.0 CD included this too.

And it seems that cheap DOS/Windows NFS clients have been available for a long time. In 1994, this summary mentioned XFS (a shareware NFS client from Germany, not the SGI filesystem), TSoft and Sun's PC-NFS.

Nowadays you also have at least these options, and you are right, many are not cheap.

HummingBird $300 My past impressions were always of good quality and features.
Reflection $88 I know this name.
ProNFS $40 (shareware?)
DiskAccess $179
SuperNFS $160 Found with google.

I had only heard of the first two. The rest I found with Google.

GPL is present, along with sources. on Windows Services For Unix Now Free Of Charge · 2004-01-14 07:17 · Score: 1

condition-label-red wrote:

One is licensed under GPL, and the other isn't....

Actually, the August 2002 20.1 MSDN Windows Services for UNIX 3.0 CD I used does contain the GPL. And in section 1.e of the Microsoft EULA it says:

Component Products. The Product includes certain components licensed to Microsoft from third parties (each, a "Component Product"). A Component Product may contain its own license agreement and/or copyright notice (each, a "Component Agreement"). The Component Agreements are located on the Product media at \PUBS\CPYRIGHT.TXT and \PUBS\GPL.TXT. In the event of inconsistencies between this EULA and any Component Agreement, the terms of the Component Agreement shall control solely with respect to that Component Product

And on this CD I also see the sources for all the GNU software I checked.

NAT firewalls are a huge factor and a problem on Dispelling the IPv4 Address Shortage Myth · 2003-11-04 05:39 · Score: 1

And here's a comprehensive list of Things that NATs break

My pet peeve is not being able to use NetMeeting without a server in the middle when both ends are behind a NAT. This happens all the time from one work place to another work place. Doesn't the same problem affect all p2p applications?

Re:actually, Telstra broke their mail software on Spam Slows Australian Net Traffic · 2003-10-13 13:19 · Score: 1

This article gets quotes from all three big Australian ISPs to show that the slowdowns were due to four different reasons.

BigPond - Software upgrade affects some user names, plus an unrelated email software fault.
OptusNet - incoming spam
OzEmail - DOS attack on its SMTP (outgoing) mail server.

Non-transparent proxies are coming on Is Comcast Intercepting Packets? · 2002-02-11 18:08 · Score: 2, Informative

While Comcast and other ISPs may be running a transparent proxy, note that non-transparent proxies are coming. The Open Pluggable Edge Services (OPES) group is working on a standard framework for non-transparent proxies.

Personally I approve of this because it will allow for more efficient operation of many useful web services like content filtering, virus checking and ad stripping. An important part of this work will also be to define a standard way for conforming OPES software to only invoke edge services after authorization from end-users and/or content providers.

Worlds Tallest Christmas Tree on Christmas is Coming · 2001-12-21 10:01 · Score: 1

The Tasmanian Wilderness Society has decorated an 80 metre (262 ft) tall Eucalyptus tree as a way of attracting attention to the plight of their tall native forests.

And here's a link more likely to survive the slashdot effect.

Is eMFORCE part of the problem or solution? on Crazy Stats on Spam · 2001-12-19 07:18 · Score: 1

I notice that eMFORCE's main business seems to be in sending "targeted" email. Some quotes from their service plan are below, with my comments.

Capable of sending differentiated and personalized 3,000,000 emails to each customers based on transactions, preferences, and demographics data within one hour by effective targeting tools and a high-tech assembling solution.
Score 1 part of the problem.
Maintaining appropriate email sending speed and considering effective speed with stability. Utilization of perfect gradual email sending, considering spam regulations of the email service provider.
Is this in order to be nice or to avoid tripping the automated spam detectors? The latter, I think.
Analysis of the co-relationships between targeting variables and reaction rates. Based on analysis and customer scoring algorithms, such as Recency, Frequency, and Monetary Value (RFM), segment and score their customer base.
I think I'm prepared to give up some privacy for fewer and better targeted ads, but am skeptical of ever seeing less spam. In the ideal world, the only spam I get will just be news and ads for stuff that interests me but that I didn't know existed. Unfortunately, given the cost structures and the unenforceability of global regulations on the net being the way they are, I don't see less spam becoming a reality.

Resource limits are needed by hosting companies on One-Machine Linux Cluster · 2001-11-06 19:58 · Score: 3, Insightful

My particular interest was to find virtual hosting solutions that would (1) not allow one runaway virtual server to deny the others at least a predefined minimum level of CPU, RAM and I/O (disk and network) resources and (2) give any one virtual server extra resources if they were available. From my reading of other slashdotters' postings and the info on the web, I've summarized below the various virtual server hosting solutions mentioned. Someone who has actually used these products should correct me.

Linux can natively be configured to enforce disk quotas and (with more difficulty) manage network bandwidth without any special virtual server software. Also the native unix process scheduling algorithm does reduce the priority of CPU bound tasks. The setrlimit(2) system call (the counterpart of getrlimit(2)) can be used to set various limits per process (not per virtual server, unless the virtual server runs as one process I guess). I know of no way to specifically limit disk bandwidth on Linux.
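As a rough sketch of the per-process limits mentioned above: Python's standard resource module (Unix only) wraps getrlimit(2) and setrlimit(2). The helper name lower_soft_limit is made up for this example, and the choice of RLIMIT_CORE is only because lowering it is harmless to try. The limit applies to the calling process and anything it forks later, not to a whole virtual server.

```python
import resource

def lower_soft_limit(which, new_soft):
    """Lower the soft limit for one resource, never exceeding the hard limit.

    Returns the (soft, hard) pair reported after the change.
    """
    _, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process may not raise the soft limit above the hard one.
        new_soft = min(new_soft, hard)
    resource.setrlimit(which, (new_soft, hard))
    return resource.getrlimit(which)

# Example: forbid core dumps for this process and its future children.
print(lower_soft_limit(resource.RLIMIT_CORE, 0))
```

The same helper could be pointed at resource.RLIMIT_CPU or resource.RLIMIT_AS in a child process before it execs the real workload, but which limits make sense for a hosting setup is a judgment call this sketch does not try to make.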

Freeware such as s_context and user mode linux provides no control over how many resources one virtual server gets over another, besides disk usage. Other limited resources like CPU, disk and network bandwidth (RAM?) are shared just like they would be shared by separate processes under a single Linux system.

FreeVSD is not a virtual server, but a collection of scripts, binaries and multiple copies of hard-linked read-only filesystems for the common system environment. It has the best chance of winning the total performance award but has no extra features for resource limits between systems.

True virtual machines (e.g., VMware) provide very good isolation, but this leads to little sharing of excess unused resources between virtual servers, I believe. They also have poorer performance in general because so much emulation is done.

The commercial, proprietary Private Server product from Ensim seems good from the marketing blurbs which say that they have "their own guaranteed share of the servers resources, including CPU, memory and bandwidth". I wonder what the performance penalty for this is and how much does it cost? Can anyone comment?

Re:Duh...Ever heard of Akamai? on Net: Now Our Most Serious News Medium? · 2001-10-11 07:09 · Score: 2, Informative

leviramsey writes:

Yeah, but afaik, akamai doesn't cache the actual html pages, just flash, images, videos, and so forth. Kinda difficult for those to be useful when no one can get CNN's index.html file, eh?

I don't know about Akamai, but other CDNs such as my employer, Speedera Networks, can cache HTML pages. We can even provide the raw logs back to the content provider so you don't lose your statistics. E.g., we do this for the PGA, HP, our own page www.speedera.com and some news portals.

As for CNN on Sept 11th, they never delivered their HTML base page via a CDN, which would have made for seamless handling of the traffic. Instead they solved the immediate congestion problem (after 3 hours and 40 minutes) by creating a single stripped down static page that used fewer resources for the site. Here is a timeline of the www.cnn.com home page as seen by our Site Analyser service.

08:50 EDT - Base page errors started occurring, presumably due to lots of requests generating too high a load on CNN's servers. This resulted in end users not being able to see any of the site's content.
12:00 to 13:30 - Base page errors fluctuate with embedded content errors and a few seconds of DNS response time to 205.188.214.121 which nslookup calls tswebsys2.ptn.aol.com
13:30 - Successful, sub-second delivery of a stripped down 2915 byte index.html page from www.cnn.com with only a single 14144 byte image from akamai.net.

Re:Technical solution - fair queueing on Robo-chattel? New Legal Challenge to 'Bots · 2001-01-12 05:18 · Score: 1

Requests from recently seen IP addresses should go behind requests from new ones.

I like it, I'll get as much of a share of the resource as ALL of the AOL users behind one of the 12 AOL proxies :-)
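A minimal sketch of the quoted scheduling idea, to show why everyone behind one proxy ends up sharing a single address's turn. The class name RecencyQueue is hypothetical, and "recently seen" is simplified to "number of requests seen so far from that IP", with no time decay.

```python
import heapq
from collections import Counter
from itertools import count

class RecencyQueue:
    """Serve requests from rarely seen IP addresses before frequently seen ones."""

    def __init__(self):
        self._seen = Counter()   # requests seen so far per IP
        self._heap = []
        self._arrival = count()  # tie-breaker: earlier arrival first

    def push(self, ip, request):
        # Priority is how often this IP has already been seen (lower is served first).
        heapq.heappush(self._heap, (self._seen[ip], next(self._arrival), ip, request))
        self._seen[ip] += 1

    def pop(self):
        _, _, ip, request = heapq.heappop(self._heap)
        return ip, request

q = RecencyQueue()
for user in ("user1", "user2", "user3"):
    q.push("proxy", user)        # three different people behind one proxy IP
q.push("home", "me")             # one person on their own IP
print([q.pop() for _ in range(4)])
```

In this toy run the request from the lone "home" address is served ahead of the second and third proxy users, even though it arrived last.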

Re:It's been here for a while on Two-Way Satellite Internet Is Here! · 2000-11-07 04:11 · Score: 1

Okay you've convinced me that it is currently as bad as you report. Especially for the first page at a site. (Subsequent pages probably result in significant browser cache hits of site-wide navigation images).

Theoretically, if you send all the requests at the beginning of the connection, you can reduce that latency. However you still have a minimum added latency of 2.25 seconds, ...

There's the big win. The HTTP/1.1 spec (RFC 2616 Section 8.1.2.2) already explicitly allows for such pipelining.

... and it's questionable whether or not you can do that with current browsers (I personally don't know).

Me neither. But maybe if two-way satellite delivery becomes popular enough, more browsers and proxies will make use of this.

Re:It's been here for a while on Two-Way Satellite Internet Is Here! · 2000-11-06 10:06 · Score: 1

Signe says:

However, for things like web surfing where you're setting up lots of connections (up to 30 or 40 per page sometimes), it's unbearable.

It probably isn't as bad as you suggest. You shouldn't be seeing 30 to 40 connections to pages with modern (or at least future) browsers and servers. After getting the HTML, the popular browsers usually open up 4 to 7 concurrent connections which incur the round trip times in parallel and then reuse them for subsequent requests when possible.

Browsers could also send all the requests in advance (pipelining) and then await all the replies. But I don't think any popular browsers do any request pipelining.
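A back-of-the-envelope model of the three strategies discussed here, counting round trips only. Everything in it is an assumption for illustration: one round trip to open a connection, one round trip per request/reply, no bandwidth or server time, and an illustrative 0.5 second round trip time rather than a measured satellite figure. It covers only the embedded objects fetched after the HTML has arrived.

```python
import math

def serial_new_connections(n_objects, rtt):
    """A fresh connection per object, one after another: connect + request each time."""
    return n_objects * 2 * rtt

def parallel_persistent(n_objects, n_connections, rtt):
    """k persistent connections opened in parallel, requests sent one at a time on each."""
    return rtt + math.ceil(n_objects / n_connections) * rtt

def pipelined(rtt):
    """One connection, all requests sent up front, replies awaited together."""
    return 2 * rtt

rtt = 0.5  # seconds, illustrative assumption
print(serial_new_connections(40, rtt))
print(parallel_persistent(40, 4, rtt))
print(pipelined(rtt))
```

Under these assumptions the gap between the first and second strategy is the "not as bad as you suggest" part, and the gap between the second and third is what pipelining would buy.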

Re:Are they riding coattails or have I misread thi on Explaining The Symbiosis Between QNX RtP & Linux · 2000-11-01 05:29 · Score: 1

They are making most of their source code available. That's giving something back. Their Business FAQ says:

Which portions of the QNX platform can I access in source form?
You can download source for most components, including driver toolkits, OS utilities, TCP/IP stacks, startup code, media players, Internet applications, games, and so on. Components that will remain protected include the OS kernel, core OS modules (e.g. QNX process manager, QNX file system manager), and software licensed from third parties.

Also they say:

Why doesn't QNX provide source to the kernel and other core OS modules?
Because QNX developers don't need kernel source to extend the OS. With QNX's advanced architecture, most OS-level services (drivers, file systems, and so on) exist as user programs that run outside the kernel, just like regular applications. As a result, developing OS extensions doesn't require kernel source - or for that matter, kernel debuggers (tricky) and kernel programmers (expensive). You just use the same tools as for developing user applications.

I completely agree with them here. I used QNX for 3 years and never needed to see how the kernel was implemented. Sometimes, to help with writing my own applications, I needed to request the source code of their system utilities under a Non-Disclosure Agreement. Now these should be freely available without an NDA.

While our embedded customers want the flexibility provided by source code, they also demand a stable, high-performance core of technology that they can rely on. With our approach, they can enjoy both. Put simply, we can offer OEMs key benefits of an open source OS, but without the drawbacks.

But I think this is a bogus excuse. If a customer wants a stable, high-performance core of technology, they could choose to use QNX Software Systems' core only, but that doesn't stop QSS from open sourcing it.

I think the real reason why they don't open source it is a commercial one. Maybe they would be better off open sourcing the lot and offering their services for hire, but that's a risky decision which is theirs to make. At this point in time I'd rather thank them for the code they are making available, instead of chastising them for the code they aren't making available.

Re:I have used QNX on serious projects on Explaining The Symbiosis Between QNX RtP & Linux · 2000-11-01 04:42 · Score: 1

Omnigeek wrote

On the minus side, QNX (at least then) did NOT let you create a bootable floppy, something that annoys me no end. We had sufficient licences for all nodes (at $hundreds per node), but ya still needed those double-damned fingerprinted floppies to make it work.

When I used QNX 5 years ago I didn't need to use a floppy to boot them. In fact, I often didn't use a hard disk either.

Comments · 48