jon_c · Slashdot Mirror

Re:TCP/IP for MMORPG on Anarchy Online - The Perils Of Pushing Products · 2001-07-13 05:34 · Score: 1

True, but about 98% of the time the data will be different. There is little or no time when you are not getting input from the server. When playing, you will always have a monster, another player, or a status to know about.

I'm not defending them, I just want a good reason why they made a mistake.

-Jon

mod this moron down. on Lossy Music Formats Compared · 2001-07-13 05:16 · Score: 1

Anyone who promotes linux and "license free software" and still promotes mp3 is an idiot. Fraunhofer-IIS has massive licensing and patents on MP3 and MP3Pro.

Next time you promote your hippy ways, remember to push Vorbis; that is the correct response.

-Jon

MS mp3 encoding not bad. on Lossy Music Formats Compared · 2001-07-13 05:09 · Score: 1

...it's crippled to 56kbit. It uses the Fraunhofer encoder, you know, the guys who made MP3?

Fraunhofer's MP3 encoder is the best one out there; however, one should note that MP3 is very bad at 56kbit compared to WMA, which would explain your ignorance.

-Jon

TCP/IP for MMORPG on Anarchy Online - The Perils Of Pushing Products · 2001-07-12 12:31 · Score: 1

In case anyone cares, Graeme basically starts off by saying "hey, I'm really good at this stuff, I think you did it wrong" and then goes into a subjective analysis of how he thinks the game works, followed by slamming them for using TCP instead of UDP.

Personally I'm unsure about the TCP/UDP problem. After I read his plan I researched TCP loads on servers and found that many OSes cannot handle more than 10k simultaneous TCP sessions. Still, it seems that with the right load balancing it wouldn't be a problem.

For instance, in AO there are hard zones. Hard zones could be server transfers; if crossing one is a physical server transfer, then you could scale per server. It is very safe to say you won't have more than 10k people in one zone at a time, which would seem to make the UDP vs. TCP discussion pointless. If there was only one physical server for the entire game I could see the problem. If the routers couldn't handle the traffic I could see the problem (btw their routers did have some major problems the first few days). Seeing that you can scale it by zone, I really don't see the issue.

Also, if one were to use UDP, wouldn't you have to add some type of sequencing and error correction yourself, much like RPC does when it goes over UDP? I can understand them not wanting to buffer the data (the Nagle algorithm, I believe); however, you can turn that off.
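
For what it's worth, here is a minimal Python sketch of both points. It is not anything taken from AO: it just shows turning Nagle off on a TCP socket with the standard TCP_NODELAY option, and the kind of sequence header you would have to bolt onto UDP yourself. The 4-byte header layout and the function names are my own assumptions.

```python
import socket
import struct
from typing import Tuple


def make_low_latency_tcp_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY asks the stack to send small writes right away
    # instead of buffering them while earlier data is unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


# Hypothetical datagram header: one unsigned 32-bit sequence number,
# network byte order. A real protocol would also need acks and resends.
HEADER = struct.Struct("!I")


def pack_datagram(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with its sequence number."""
    return HEADER.pack(seq) + payload


def unpack_datagram(datagram: bytes) -> Tuple[int, bytes]:
    """Split a datagram back into (sequence number, payload)."""
    (seq,) = HEADER.unpack_from(datagram)
    return seq, datagram[HEADER.size:]
```

With UDP the receiver would use the sequence number to drop duplicates and spot gaps; TCP already does all of that for you, which is kind of my point.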

So what's the problem with TCP?

-Jon

I don't agree on the age thing. on How To Deal With (Techie) Prima Donnas · 2001-07-10 08:51 · Score: 1

I'm not completely sure, but I think a co-worker of mine would fit pretty well. He won't admit a mistake, tries to take over other projects, and thinks he is very (very) smart.

Does this fit? I don't know. He has a large ego, he's never wrong, he thinks he's smarter than everyone. But he also talks to other programmers about his work (hell, that's all he does) and is older.

Is he a prima donna? Maybe. I think this article tries to cast a narrow stereotype where there are a lot of grays, but that's just my opinion.

Anyway, thanks for the copy. I couldn't get the link to work either.

-Jon

marketing esr and rms on Ask IBM's Linux Marketing Director · 2001-06-27 00:37 · Score: 1

Is Eric S. Raymond (esr) an asset in marketing? How do people in your line of work see him? How about Richard M. Stallman (rms)? Does it concern you that some of the most vocal people in the linux world could be considered by some to be nuts?

-Jon

mod up on Unix: A Component Architecture? · 2001-06-26 12:29 · Score: 1

My thoughts exactly! I went to the link thinking I would hear about how unix lacks a real component object model. Instead I read a bunch of BS about how piping streams of data from one app to another is the ultimate tool.

Unix badly needs something like COM. I would also agree that COM is flawed. I've spent the last 6 months at my job learning all the ins and outs, and you are right: it's very badly implemented.

-Jon

damn, you're getting screwed. on MSDN Subscriber Forced to use Passport · 2001-06-26 11:35 · Score: 1

Most devs make a hell of a lot more than that. I just hope that's after taxes.

-Jon

Re:What group were you interning for? on Proudly Serving My Corporate Masters · 2001-06-21 03:34 · Score: 1

Are you in Seattle?

Re:[ot]Google's data structure? on Interview With Google's Director of Research · 2001-06-21 02:58 · Score: 1

Google's data structures are optimized so that a large document collection can be crawled, indexed, and searched with little cost. Although CPUs and bulk input/output rates have improved dramatically over the years, a disk seek still requires about 10 ms to complete. Google is designed to avoid disk seeks whenever possible, and this has had a considerable influence on the design of the data structures.

BigFiles
BigFiles are virtual files spanning multiple file systems and are addressable by 64 bit integers. The allocation among multiple file systems is handled automatically. The BigFiles package also handles allocation and deallocation of file descriptors, since the operating systems do not provide enough for our needs. BigFiles also support rudimentary compression options.

Repository
Figure 2. Repository Data Structure
The repository contains the full HTML of every web page. Each page is compressed using zlib (see RFC1950). The choice of compression technique is a tradeoff between speed and compression ratio. We chose zlib's speed over a significant improvement in compression offered by bzip. The compression rate of bzip was approximately 4 to 1 on the repository as compared to zlib's 3 to 1 compression. In the repository, the documents are stored one after the other and are prefixed by docID, length, and URL as can be seen in Figure 2. The repository requires no other data structures to be used in order to access it. This helps with data consistency and makes development much easier; we can rebuild all the other data structures from only the repository and a file which lists crawler errors.
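
To make that concrete, here is a rough Python sketch of a repository like the one described: records stored one after the other, each prefixed by docID, lengths, and URL, with the page compressed by zlib. The exact header layout (field widths and order) is my assumption, not something the quoted text specifies.

```python
import struct
import zlib
from typing import Iterator, Tuple

# Assumed record header: docID (uint64), URL length (uint16),
# compressed page length (uint32), little-endian with no padding.
HEADER = struct.Struct("<QHI")


def pack_record(doc_id: int, url: str, html: bytes) -> bytes:
    """Build one repository record: header, URL, zlib-compressed page."""
    url_bytes = url.encode("utf-8")
    compressed = zlib.compress(html)
    header = HEADER.pack(doc_id, len(url_bytes), len(compressed))
    return header + url_bytes + compressed


def read_records(repository: bytes) -> Iterator[Tuple[int, str, bytes]]:
    """Walk the repository front to back, yielding (docID, URL, HTML)."""
    offset = 0
    while offset < len(repository):
        doc_id, url_len, page_len = HEADER.unpack_from(repository, offset)
        offset += HEADER.size
        url = repository[offset:offset + url_len].decode("utf-8")
        offset += url_len
        html = zlib.decompress(repository[offset:offset + page_len])
        offset += page_len
        yield doc_id, url, html
```

Because every record carries its own lengths, the whole thing can be scanned with no other index, which is the "requires no other data structures" property the text mentions.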

Document Index
The document index keeps information about each document. It is a fixed width ISAM (Index sequential access mode) index, ordered by docID. The information stored in each entry includes the current document status, a pointer into the repository, a document checksum, and various statistics. If the document has been crawled, it also contains a pointer into a variable width file called docinfo which contains its URL and title. Otherwise the pointer points into the URLlist which contains just the URL. This design decision was driven by the desire to have a reasonably compact data structure, and the ability to fetch a record in one disk seek during a search.
Additionally, there is a file which is used to convert URLs into docIDs. It is a list of URL checksums with their corresponding docIDs and is sorted by checksum. In order to find the docID of a particular URL, the URL's checksum is computed and a binary search is performed on the checksums file to find its docID. URLs may be converted into docIDs in batch by doing a merge with this file. This is the technique the URLresolver uses to turn URLs into docIDs. This batch mode of update is crucial because otherwise we must perform one seek for every link which assuming one disk would take more than a month for our 322 million link dataset.
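
The URL-to-docID lookup above can be sketched in a few lines of Python. This is only an illustration: CRC-32 stands in for the checksum (the text does not say which one is used), and the function names are made up.

```python
import bisect
import zlib
from typing import List, Optional, Tuple


def url_checksum(url: str) -> int:
    # CRC-32 is a stand-in; the text does not name the checksum function.
    return zlib.crc32(url.encode("utf-8"))


def build_checksum_file(urls_with_ids: List[Tuple[str, int]]) -> List[Tuple[int, int]]:
    """Return (checksum, docID) pairs sorted by checksum."""
    return sorted((url_checksum(url), doc_id) for url, doc_id in urls_with_ids)


def lookup_doc_id(checksum_file: List[Tuple[int, int]], url: str) -> Optional[int]:
    """Binary-search the sorted checksum list for the URL's docID."""
    target = url_checksum(url)
    # (target, -1) sorts before any real entry with the same checksum,
    # so bisect_left lands on the first matching entry if there is one.
    i = bisect.bisect_left(checksum_file, (target, -1))
    if i < len(checksum_file) and checksum_file[i][0] == target:
        return checksum_file[i][1]
    return None
```

The batch mode the text describes would sort the incoming link checksums too and walk both sorted lists in one pass, instead of doing one search (one seek) per link.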

Lexicon
The lexicon has several different forms. One important change from earlier systems is that the lexicon can fit in memory for a reasonable price. In the current implementation we can keep the lexicon in memory on a machine with 256 MB of main memory. The current lexicon contains 14 million words (though some rare words were not added to the lexicon). It is implemented in two parts -- a list of the words (concatenated together but separated by nulls) and a hash table of pointers. For various functions, the list of words has some auxiliary information which is beyond the scope of this paper to explain fully.
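
A toy version of that two-part lexicon, as a hedged Python sketch: one blob of null-separated words plus a hash table of offsets into it. CRC-32 as the hash and the helper names are my assumptions; the auxiliary information the text mentions is left out.

```python
import zlib
from typing import Dict, List, Optional, Tuple


def build_lexicon(words: List[str]) -> Tuple[bytes, Dict[int, List[int]]]:
    """Return (blob, table): null-separated words and hash -> offsets."""
    blob = bytearray()
    table: Dict[int, List[int]] = {}
    for word in words:
        encoded = word.encode("utf-8")
        # Record where this word starts in the blob, keyed by its hash.
        table.setdefault(zlib.crc32(encoded), []).append(len(blob))
        blob += encoded + b"\x00"
    return bytes(blob), table


def find_word(blob: bytes, table: Dict[int, List[int]], word: str) -> Optional[int]:
    """Return the blob offset of the word, or None if it is not present."""
    encoded = word.encode("utf-8")
    for offset in table.get(zlib.crc32(encoded), []):
        end = blob.index(b"\x00", offset)
        # Compare the stored bytes so hash collisions cannot give a false hit.
        if blob[offset:end] == encoded:
            return offset
    return None
```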

Hit Lists
A hit list corresponds to a list of occurrences of a particular word in a particular document including position, font, and capitalization information. Hit lists account for most of the space used in both the forward and the inverted indices. Because of this, it is important to represent them as efficiently as possible. We considered several alternatives for encoding position, font, and capitalization -- simple encoding (a triple of integers), a compact encoding (a hand optimized allocation of bits), and Huffman coding. In the end we chose a hand optimized compact encoding since it required far less space than the simple encoding and far less bit manipulation than Huffman coding. The details of the hits are shown in Figure 3.
Our compact encoding uses two bytes for every hit. There are two types of hits: fancy hits and plain hits. Fancy hits include hits occurring in a URL, title, anchor text, or meta tag. Plain hits include everything else. A plain hit consists of a capitalization bit, font size, and 12 bits of word position in a document (all positions higher than 4095 are labeled 4096). Font size is represented relative to the rest of the document using three bits (only 7 values are actually used because 111 is the flag that signals a fancy hit). A fancy hit consists of a capitalization bit, the font size set to 7 to indicate it is a fancy hit, 4 bits to encode the type of fancy hit, and 8 bits of position. For anchor hits, the 8 bits of position are split into 4 bits for position in anchor and 4 bits for a hash of the docID the anchor occurs in. This gives us some limited phrase searching as long as there are not that many anchors for a particular word. We expect to update the way that anchor hits are stored to allow for greater resolution in the position and docIDhash fields. We use font size relative to the rest of the document because when searching, you do not want to rank otherwise identical documents differently just because one of the documents is in a larger font.
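
Here is how the two-byte hit could be packed, as a Python sketch. The field widths come from the text above; the bit order (capitalization in the top bit, then font size, then the rest) is my assumption, and since a 12-bit field cannot actually hold 4096, this sketch clamps large positions to the largest value that fits.

```python
FANCY_FONT = 0b111  # font-size value reserved as the fancy-hit flag


def encode_plain_hit(capitalized: bool, font_size: int, position: int) -> int:
    """Pack a plain hit: 1 cap bit, 3 font bits, 12 position bits."""
    if not 0 <= font_size < FANCY_FONT:
        raise ValueError("plain hits use font sizes 0..6")
    position = min(position, 0xFFF)  # assumption: clamp to the 12-bit maximum
    return (int(capitalized) << 15) | (font_size << 12) | position


def encode_fancy_hit(capitalized: bool, hit_type: int, position: int) -> int:
    """Pack a fancy hit: 1 cap bit, font=7, 4 type bits, 8 position bits."""
    if not 0 <= hit_type <= 0xF:
        raise ValueError("fancy hit type must fit in 4 bits")
    position = min(position, 0xFF)  # assumption: clamp to the 8-bit maximum
    return (int(capitalized) << 15) | (FANCY_FONT << 12) | (hit_type << 8) | position


def decode_hit(hit: int) -> dict:
    """Unpack a 16-bit hit into its fields."""
    capitalized = bool(hit >> 15)
    font_size = (hit >> 12) & 0b111
    if font_size == FANCY_FONT:
        return {"fancy": True, "capitalized": capitalized,
                "type": (hit >> 8) & 0xF, "position": hit & 0xFF}
    return {"fancy": False, "capitalized": capitalized,
            "font_size": font_size, "position": hit & 0xFFF}
```

For an anchor hit, the 8 position bits would be split again into 4 bits of position in the anchor and 4 bits of docID hash, as the text describes; that split is left out here.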

The length of a hit list is stored before the hits themselves. To save space, the length of the hit list is combined with the wordID in the forward index and the docID in the inverted index. This limits it to 8 and 5 bits respectively (there are some tricks which allow 8 bits to be borrowed from the wordID). If the length is longer than would fit in that many bits, an escape code is used in those bits, and the next two bytes contain the actual length.

Forward Index
The forward index is actually already partially sorted. It is stored in a number of barrels (we used 64). Each barrel holds a range of wordID's. If a document contains words that fall into a particular barrel, the docID is recorded into the barrel, followed by a list of wordID's with hitlists which correspond to those words. This scheme requires slightly more storage because of duplicated docIDs but the difference is very small for a reasonable number of buckets and saves considerable time and coding complexity in the final indexing phase done by the sorter. Furthermore, instead of storing actual wordID's, we store each wordID as a relative difference from the minimum wordID that falls into the barrel the wordID is in. This way, we can use just 24 bits for the wordID's in the unsorted barrels, leaving 8 bits for the hit list length.
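
A hedged Python sketch of one forward-barrel entry header: a 24-bit wordID stored relative to the barrel's minimum, plus 8 bits of hit-list length, with an escape code and a two-byte length when the list is too long. Using 255 as the escape code and little-endian packing are my assumptions.

```python
import struct
from typing import Tuple

ESCAPE = 0xFF  # assumed escape code: all 8 length bits set


def pack_forward_entry(word_id: int, barrel_min_word_id: int, hit_count: int) -> bytes:
    """Pack (relative wordID, hit-list length) into 4 bytes, or 6 with escape."""
    relative = word_id - barrel_min_word_id
    if not 0 <= relative < (1 << 24):
        raise ValueError("relative wordID must fit in 24 bits")
    if not 0 <= hit_count <= 0xFFFF:
        raise ValueError("hit count must fit in two bytes")
    if hit_count < ESCAPE:
        return struct.pack("<I", (relative << 8) | hit_count)
    # Too long for 8 bits: store the escape code, then the real length.
    return struct.pack("<IH", (relative << 8) | ESCAPE, hit_count)


def unpack_forward_entry(data: bytes, barrel_min_word_id: int) -> Tuple[int, int]:
    """Return (absolute wordID, hit-list length)."""
    (packed,) = struct.unpack_from("<I", data)
    word_id = (packed >> 8) + barrel_min_word_id
    length = packed & 0xFF
    if length == ESCAPE:
        (length,) = struct.unpack_from("<H", data, 4)
    return word_id, length
```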

Inverted Index
The inverted index consists of the same barrels as the forward index, except that they have been processed by the sorter. For every valid wordID, the lexicon contains a pointer into the barrel that wordID falls into. It points to a doclist of docID's together with their corresponding hit lists. This doclist represents all the occurrences of that word in all documents.
An important issue is in what order the docID's should appear in the doclist. One simple solution is to store them sorted by docID. This allows for quick merging of different doclists for multiple word queries. Another option is to store them sorted by a ranking of the occurrence of the word in each document. This makes answering one word queries trivial and makes it likely that the answers to multiple word queries are near the start. However, merging is much more difficult. Also, this makes development much more difficult in that a change to the ranking function requires a rebuild of the index. We chose a compromise between these options, keeping two sets of inverted barrels -- one set for hit lists which include title or anchor hits and another set for all hit lists. This way, we check the first set of barrels first and if there are not enough matches within those barrels we check the larger ones.
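
The "quick merging" of docID-sorted doclists and the two-tier lookup can be sketched like this in Python. The barrels are plain dicts from word to a sorted list of docIDs, and the names and the `enough` threshold are illustrative assumptions, not details from the text.

```python
from functools import reduce
from typing import Dict, List


def intersect_doclists(a: List[int], b: List[int]) -> List[int]:
    """Merge two docID-sorted lists, keeping docIDs present in both."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def match_all(words: List[str], barrels: Dict[str, List[int]]) -> List[int]:
    """Return docIDs that contain every query word, per the given barrels."""
    doclists = [barrels.get(word, []) for word in words]
    if not doclists:
        return []
    return reduce(intersect_doclists, doclists)


def search(words: List[str],
           title_anchor_barrels: Dict[str, List[int]],
           full_barrels: Dict[str, List[int]],
           enough: int = 10) -> List[int]:
    """Check the small title/anchor barrels first, then fall back to all hits."""
    hits = match_all(words, title_anchor_barrels)
    if len(hits) >= enough:
        return hits
    return match_all(words, full_barrels)
```

Each intersection is a single linear pass over both lists, which is why sorting by docID makes multi-word queries cheap.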

Re:why *I* like google on Interview With Google's Director of Research · 2001-06-21 02:53 · Score: 1

Who else has BSD-only searches? And not only that, a cool BSD Google logo!

http://www.google.com/bsd

-jon

Re:[ot]Google's data structure? on Interview With Google's Director of Research · 2001-06-21 01:01 · Score: 1

I would seriously doubt they have a SQL interface for their DB. I would also bet your mom's poop that they don't use a commercial database for the website indexing.

I figure it's something derived from a B-Tree (like a binary tree, but better for databases), distributed across a cluster of boxen (linux, right?).

I'm sure there's a hell of a lot more to it than that. A hell of a lot more. Hell, let's ask him.

begin question
Hey Google guy, how is the webpage index data stored and retrieved? What data structures and what algorithms are used? How many boxes do you have for indexing?
end question

maybe he'll answer.

-Jon

What group were you interning for? on Proudly Serving My Corporate Masters · 2001-06-21 00:09 · Score: 1

When I was over at MS Press I didn't know of any coders (of the two) that smoked. There was this guy who was a pseudo content/VB guy... I think he smoked on occasion. The only dev I respected there didn't smoke.

Later, when I was at MS Research (in Seattle you just end up working at MS; half of your calls will be for them), I was one of the only devs that smoked. I remember a tester for Allegiance (a game from MS Research), he smoked... This one kinda scary-looking FM for the database division smoked. That was about it.

I've never worked on main campus so I can't say. But I never saw too many smokers there. Maybe it's a thing of the past. Most of the devs in Research were PhD types (usually also actual PhDs). I didn't know one that smoked.

Something that will happen to you if you live in Seattle and work at MS: you'll eventually end up at a bar one weekend, and you'll try to pick up a girl (try is emphasized), and you'll find out that she also works at MS, or at least her brother does. And then you have the dumbest exchange of words:

"oh my brother/sister/friend works at MS, what group are you with"

"um.. Research"
"oh, he works with Exchange IP, maybe you know him"
"uh, maybe." (ya.. right)
"his name is , you know him?"
"nope"

"huh"

Did I mention that I couldn't get laid in Seattle? They generally all know MS people, and as a whole don't like them. There's the rich new blood that made living on the east side an upper middle class ordeal. You can't find cheap housing anymore, at least nowhere near Redmond or Bellevue.

-Jon

karma whoring on Proudly Serving My Corporate Masters · 2001-06-20 23:47 · Score: 5

Funny enough, I was just reading about the author and some of his columns. Here are some links:

columns
home page
comments posted at kuro5hin.org
stories posted to kuro5hin.org. One I like is where he talked about NT's TCP/IP stack history and why it's not from BSD
He's no MS shill; he was the one who, a while back, proposed that we use the Xbox as a cheap web farm

anyway interesting stuff.

-Jon

he was up there on Proudly Serving My Corporate Masters · 2001-06-20 23:40 · Score: 1

He worked as an NT kernel developer, at least during part of it.

-Jon

troll? on Kernel Configuration As An Adventure · 2001-06-20 02:38 · Score: 1

Sheesh. I think someone really doesn't like me.

mod down on WSJ Reports On MS Using Open Source · 2001-06-18 04:03 · Score: 1

This is a lie. As others have said, your story seems like some kind of anti-MS crap, basically so you'll get modded up for, you know, "tellin it like it is", while also telling these linux zealots what they want to hear.

it actually sickens me.

why would I run a stable OS under an unstable one?

What kind of bullshit is that? You want some quotes from MS employees I worked with?

"It's a miracle Windows 9x boots"

"Win9x code has comments in it that say 'we don't know what this does, but if we take it out, it breaks'"

"We're using linux for our webserver, just don't tell ITG that"

"NT's tight, but we're bitches to anyone with cash. We have code in the kernel that actually checks if Adobe Photoshop is running, just to quick fix a problem"

I worked at MS for two years. MS employees are geeks. They have some pride in what they do, so maybe they'll use WinCE instead of Palm, or something like that. But I've never met one who was a bigot against linux; most thought it was very cool, and did run it at home.

-Jon

OT Nov 8th????? on Microsoft Gets XBox Name · 2001-06-17 23:31 · Score: 1

Microsoft remains on target for a successful North American launch of its Xbox video game system on Nov. 8.

That's not the target I remember. Just back at E3 it was going to be like late September, or pretty close to that. The big deal was how Nintendo decided they were not only going to release their shiznit a WEEK earlier, but also for 100 bucks less.

Anyone know what's up with them changing the date?

-Jon

dice seems to agree on Former Dot-Com Workers Crowd Homeless Shelters · 2001-06-17 04:46 · Score: 1

C++: 17673
Java: 9835
perl: 4060
Assembly: 3241
VB: 2473
asp: 2465
JavaScript: 2186
VBScript: 496
php: 140

If you only know something that's easy, you're shit out of luck. Learn C++ if you want a job; they don't need any more php/javascript/asp hacks.

What I find interesting is the number of "embedded systems" developers they need w/ assembly exp. Last time I went looking for a job it was all about the server-side scripting stuff. I have to say I'm glad those jobs are drying up (I know I'm preaching to the choir). But the number of people I saw reading "ASP in 24 hours" to get a high-paid job astounded me.

We have a guy at my current job who's doing VB stuff. Before this job he did a lot of ASP/VBScript stuff and a little VB. He's a jock, he doesn't really care about tech, and he writes HORRIBLE code: things like copying and pasting the same freaking 6 lines 30 times, using Variants, no Hungarian notation, global variables, not using Option Explicit. Not surprisingly, his section of our app has been called "a quivering piece of dog shit" in confidence by another senior dev, and not surprisingly he gets the lion's share of the bugs reported to him.

Naturally my company's pretty clueless about this. Then again, the majority of our "devs" only know VB... sigh.

-Jon

don't do it by just 'hits' on Ask Dan Kusnetzky About Linux Server Counts · 2001-06-14 04:21 · Score: 1

There is a lot more to being an HTTP server than just fetching files off the file system and dumping them over a socket. Most websites incorporate server-side scripting and databases.

For example, if Slashdot were a static site I would guess that it would be much faster and more available. However, Slashdot is not a static site; it's one of the most dynamic sites on the web. It constantly works the hell out of MySQL, caches the results, and serves up static pages (i.e. the home page).

I think most people here are really interested in the nebulous question "what OS is the _best_", which of course you really can't answer. However, I think if you were going to use the amount of traffic an OS can handle as a measuring stick, then it's only fair to talk about constant load and the complexity of that load rather than just the number of 'hits'.

The problem with calculating that metric is that it's WAY more complicated and subjective than just hits, which may make it damn near unattainable. I mean, how would you compare the complexity of a mod_perl+ssh+MySQL site vs. a WebSphere+Apache+DB2 or IIS+MTS+COM+ one? You would have to come up with a number saying, "ok, this is a complexity of 8, whereas this is a 5", and you really can't do that.

-Jon

that's liGNUx on Gartner Claims Less Linux Than IDC · 2001-06-12 08:11 · Score: 1

RMS actually wanted Linux+GNU to go by the clever name liGNUx. However, some people didn't think it had quite the "ring" it should have, so he decided GNU/Linux was good enough.

-Jon

ya but the drivers suck or are non-existent. on OSX/Win2K Deathmatch · 2001-06-09 03:22 · Score: 1

I'm typing this on my win2k install, which is on a 20 gig ATA-RAID partition made out of two 20 gig drives.

I left the rest of the room for linux, or whatever else. Unfortunately I overestimated the ATA driver support in linux and cannot install it on this drive.

My controller is from HighPoint Technologies and came with my motherboard. HighPoint does supply a driver in the form of a 2.2 kernel patch; however, it does not support mirroring, which is what I am using.

I subsequently added another drive to my regular IDE-33 chain so I could install linux, but it's a much slower drive and is only 6 gigs.

A while back I ran a live video webcam from my apartment, on a win2k P-III 500. I really wanted to move to linux so I could telnet/ssh into it remotely, maybe run some scripts for it, etc. Unfortunately my webcam was not supported in linux, and even if it was, there isn't any real-time video encoding software I know of that will work with Real or Windows Media.

An OS is a piece of software that interfaces with the hardware so other programs can use it. Most of the comparison was on those apps, not the OS. The apps that ship with an OS are usually not the best for the job, and IMO should not be the measuring stick for the OS itself. The usefulness of an OS is the number of quality apps it can efficiently run, and the amount of quality hardware it can efficiently interface with. Nothing else.

-Jon

mod up on Just For Fun · 2001-06-05 00:29 · Score: 1

thanks, i've been wanting to listen to this.

one thing got me:

interviewer: So it's licenced under the "General Public Licence"?

Linus: Yes, the "General Public Licence"

WTF?

-Jon

data sorting... on NSA Tapping Underwater Fiber Optics · 2001-05-23 10:04 · Score: 1

While reading the article I started thinking about how to sort all that data. If you were looking for something specific from somewhere in particular, it doesn't *seem* like it would be that hard.

Just filter for an IP/subnet and record that. Then later try to break the crypto or whatever.
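
Something like this Python sketch is all I mean by "filter for an ip/subnet". The packet shape (source IP, destination IP, payload) is made up for illustration, and it obviously says nothing about how you would keep up with the traffic on a real fiber tap.

```python
import ipaddress
from typing import Iterable, Iterator, Tuple

# Made-up record shape for illustration: (source IP, destination IP, payload).
Packet = Tuple[str, str, bytes]


def filter_by_subnet(packets: Iterable[Packet], subnet: str) -> Iterator[Packet]:
    """Yield only the packets whose source or destination is in the subnet."""
    network = ipaddress.ip_network(subnet)
    for src, dst, payload in packets:
        if ipaddress.ip_address(src) in network or ipaddress.ip_address(dst) in network:
            yield (src, dst, payload)
```

Everything that passes the filter gets recorded, and the expensive part (breaking the crypto) happens offline, later.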

-Jon

it's in assembly on First Legal Test of the GPL · 2001-05-23 02:54 · Score: 1

This would be true; however, the guy who made VirtualDub did a good deal of his best work in x86 assembly, so it's very clear from looking at a debugger if they stole his code.

-Jon


Comments · 532