Slashdot Mirror


The Setup Behind Microsoft.com

Toreo asesino writes "Jeff Alexander gives an insight into how Microsoft runs its main sites. Interesting details include having no firewall, having to manage 650 GB of IIS logs every day, and the use of their as-yet-unreleased Windows Server 2008 in a production environment."

5 of 412 comments

  1. Re:Beta in production environment. by EvanED · · Score: 5, Informative

    Vista was never meant as a server. Same as XP isn't used as a server, it's Server 2003.

  2. Re:Firewall Schmirewall by great_snoopy · · Score: 5, Informative

    Of course they have a firewall, just watch the difference between a tcptraceroute to a public port (like 80) and tcptraceroute to the same ip but some other port (like 110 pop3 for example). You'll see that packets get dropped at some point indicating a firewall. It's not a RST (port closed) it's just dropping packets for nonpublic services. That is a packet filtering firewall.
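The distinction drawn above (an RST means the port is closed, silence means something on the path is dropping packets) can be approximated with a plain TCP connect. This is only a rough sketch, not a replacement for tcptraceroute: it cannot show at which hop the packets disappear, and a timeout can also mean the host is down. The names `classify_error` and `probe` and the 3-second timeout are assumptions made for the example.

```python
import socket

def classify_error(exc):
    """Map a failed connect() to what it suggests about the port."""
    if isinstance(exc, ConnectionRefusedError):
        # The remote end answered with RST: reachable, nothing listening.
        return "closed (RST received)"
    if isinstance(exc, socket.timeout):
        # No answer at all: consistent with a filter silently dropping packets.
        return "filtered (no reply, packets likely dropped)"
    return "error: " + type(exc).__name__

def probe(host, port, timeout=3.0):
    """Try a TCP connect and describe the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except OSError as exc:
        return classify_error(exc)
```

Calling `probe(host, 80)` and `probe(host, 110)` against the same address and comparing the two strings gives the same kind of evidence the comment describes, just without the per-hop detail.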

  3. Re:Beta in production environment. by schnikies79 · · Score: 5, Informative

    Funny, but you're wrong. Pro is for networking environments where you need RDP, policies, ability to join a domain, file encryption, etc. Home lacks these.

    --
    Gone!
  4. Re:Swimming in acronym soup... by Anonymous Coward · · Score: 5, Informative

    GFS: Global Foundation Services. Microsoft's big internal network management thing. It's the people who keep the servers up and running for everything facing outward.

    HBI: High Business Impact. Social Security numbers, Passport accounts, etc.

    NLB: Network Load Balancer.

    AV: AntiVirus.
    DoS: Denial of Service.
    IIS: Internet Information Services. 'httpd' for Windows.

  5. Re:Firewall Schmirewall by lena_10326 · · Score: 5, Informative

    My question is why are the logs in ASCII text format? When all you want is say the IP [4 bytes], time of day [4 bytes], URI, referrer and return code [do you really care about their browser strings? You are MS after all, just assume it's IE]. Storing an IP as text requires on average 15 bytes, so right there you can shave off 11 bytes with a binary IP. Time of day is worse, a date+time string is like 25 chars. Doesn't seem like much, but multiply the 32 bytes per entry you save by say 50 million hits and that's 1.5Gbyte you saved. That's not counting the white space you can remove, and a simple huffman code you could apply to the URL/referrer.

    Logging in fixed format is not more efficient than variable format text files (unless we're talking about transactions, but we're not). Let's assume you're logging the basics: IP address, timestamp, return code, and URI. We'll look at logging in fixed format, then variable format.

    [abcd]   [timestamp]  [code]  [URI]
    4 bytes  8 bytes      1 byte  50 bytes  (you actually need 2 bytes for the HTTP return code, but let's ignore that)

    Every record will require 63 bytes (and we'll round up to 64 for proper word alignment). So, if we log 1000 messages, we will consume 64,000 bytes total.
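As a quick check of the 63-byte and 64-byte figures, here is a minimal sketch of that fixed layout using Python's `struct` module. The field order follows the layout above; the name `pack_record`, the sample address and URI, and the choice of network byte order are assumptions for illustration. The one-byte code field only holds values up to 255, which is why a real HTTP status would need the second byte mentioned above.

```python
import socket
import struct

# "!" = network byte order with no implicit padding.
# 4s = IPv4 address, Q = 8-byte timestamp, B = 1-byte code,
# 50s = URI field (NUL-padded if shorter, truncated if longer),
# x = one explicit pad byte to round 63 up to 64.
RECORD = struct.Struct("!4sQB50sx")

def pack_record(ip, timestamp, code, uri):
    """Pack one log entry into the fixed 64-byte layout."""
    return RECORD.pack(socket.inet_aton(ip), timestamp, code, uri.encode("ascii"))

rec = pack_record("192.0.2.1", 1197572382, 200, "/index.html")
print(struct.calcsize("!4sQB50s"), RECORD.size, len(rec))  # 63 64 64
```

Note that the short URI still occupies all 50 bytes of its field, which is the cost the rest of this comment is about.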

    Ok. Now for text logging with space delimiters. We have 3 options below, each requiring slightly less space than the previous. We'll run totals for each.

    123.567.890.123 YYYYMMDDHHMMSS x URI...............\n
    16 bytes 15 bytes 2 bytes 50 bytes 1 byte

    123.567.890.123 1197572382 x URI...............\n (UNIX time)
    16 bytes 11 bytes 2 bytes 50 bytes 1 byte

    1235678901231197572382xURI...............\n (UNIX time)
    12 bytes 10 bytes 1 byte 50 bytes 1 byte

    16 + 15 + 2 + 50 + 1 = 84 bytes * 1000 = 84,000 bytes
    16 + 11 + 2 + 50 + 1 = 80 bytes * 1000 = 80,000 bytes
    12 + 10 + 1 + 50 + 1 = 74 bytes * 1000 = 74,000 bytes

    Wow. Fixed binary format kicks variable text format's ass. Wrong. This assumes the URI (or message) block will always occupy 50 bytes. It will not. Let's go right down the middle and assume it averages 25 bytes and we'll recalculate.

    16 + 15 + 2 + 25 + 1 = 59 bytes * 1000 = 59,000 bytes
    16 + 11 + 2 + 25 + 1 = 55 bytes * 1000 = 55,000 bytes
    12 + 10 + 1 + 25 + 1 = 49 bytes * 1000 = 49,000 bytes
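The two sets of totals can be reproduced with a few lines of Python. This is just the arithmetic from the tables restated; the names `TEXT_LAYOUTS`, `text_total`, and `break_even_uri` are made up for the sketch, and the per-field widths are the ones given above.

```python
FIXED_RECORD = 64  # 63 bytes, rounded up for alignment

# Per-record overhead of each text layout: (IP, timestamp, code, newline)
TEXT_LAYOUTS = {
    "dotted IP + YYYYMMDDHHMMSS": (16, 15, 2, 1),
    "dotted IP + UNIX time": (16, 11, 2, 1),
    "no delimiters + UNIX time": (12, 10, 1, 1),
}

def text_total(overhead, uri_len, records=1000):
    """Total bytes for `records` text lines with the given URI length."""
    return (sum(overhead) + uri_len) * records

def break_even_uri(overhead):
    """URI length at which a text line is as large as a fixed record."""
    return FIXED_RECORD - sum(overhead)

print("fixed:", FIXED_RECORD * 1000)
for name, overhead in TEXT_LAYOUTS.items():
    print(name, text_total(overhead, 50), text_total(overhead, 25),
          break_even_uri(overhead))
```

Under these numbers the text layouts come out ahead whenever the average URI is shorter than 30, 34, or 40 bytes respectively, so the result depends entirely on how full the 50-byte field usually is.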

    Variable text format almost always beats fixed binary format for logging. That's why Microsoft (and the rest of the world) stores log files as text. Plus, it's far easier to manage and debug when you can slice and dice the files with standard command line tools.

    One more thing. I know what you might be thinking. We're logging URLs, which will probably consume the majority of the 50 byte allotment. Most developers will calculate an average width and double it, so no matter what we'll still be filling only about 50% of the message section.

    Last point. If I were to use your example, the savings with text logging would be even greater. Two URLs would be stored, both consuming about 50% of their data block: IP address, timestamp, URI, referrer URI, return code. There's also a bunch of other little optimizations you can do, such as storing the domain, year, month, and day in the filename rather than in the data, or dropping the least significant byte in the HTTP return code.

    --
    Camping on quad since 1996.