Domain: bitwizard.nl
Stories and comments across the archive that link to bitwizard.nl.
Comments · 31
-
Matt's TraceRoute
-
Some more tools
Wireshark was already mentioned, so I'll list some other tools I've found useful:
Mtr is better than traceroute. It has ncurses and graphical versions.
For persistent ping tests, I can recommend SmokePing.
Any modern network should have SNMP monitoring capability in the switches and routers. Ask for permission to get read-only access on the devices and there's a wealth of information to be gathered: from basic information like port status and packet/byte counters to more advanced data like topologies learned by MAC learning and neighbor discovery protocols (CDP, LLDP). Or you can just buy one for the class. 100M 24-port managed switches are not that expensive, and a Linux server can be used as an SNMP-enabled router (install and configure snmpd).
To actually act on that data, you can try single-purpose tools like Cacti for traffic monitoring and NetDisco for device and topology discovery, or a huge does-it-all tool like OpenNMS.
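For the class, it may help to show what a traffic grapher does with the byte counters mentioned above. The sketch below is only an illustration, not how Cacti or OpenNMS are implemented: it assumes two polls of a 32-bit octet counter (such as ifInOctets) and that the counter wrapped at most once between them. The function names and sample values are made up.

```python
COUNTER32_MODULUS = 2 ** 32  # 32-bit SNMP counters wrap around at this value


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Return how far a wrapping counter advanced between two polls.

    Assumes the counter wrapped at most once between the two samples.
    """
    return (current - previous) % modulus


def bits_per_second(previous: int, current: int, interval_seconds: float) -> float:
    """Convert two octet-counter samples into an average bit rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(previous, current) * 8 / interval_seconds


# Hypothetical samples taken 60 seconds apart; the second one has wrapped.
rate = bits_per_second(previous=4_294_000_000, current=1_032_704, interval_seconds=60)
print(f"{rate:.0f} bit/s")
```

On fast links a 32-bit counter can wrap more than once per polling interval, in which case the result is silently too low; polling more often or passing modulus=2 ** 64 for 64-bit counters avoids that.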
Managed network devices can also dump traffic, using "monitoring ports" (which mirror traffic from other ports), sFlow (a sampled stream of packets; unless sampling is 1:1, only useful for statistical traffic measurements) or NetFlow/IPFIX (aggregated flows).
I'm especially fond of NetFlow, in addition to the previous tools. Flow records can be used to analyze, post-mortem, who contacted whom, how much data was transferred, and in what approximate pattern. This kind of data can be dug out of a full dump, but it's usually infeasible to dump _everything_ to disk. I've used flow-tools.
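As a rough idea of the post-mortem analysis described above, here is a minimal sketch that totals bytes per source/destination pair. It does not use flow-tools or parse any real export format: the Flow record, its fields and the sample addresses are all hypothetical, and a real collector would supply many more fields (timestamps, ports, protocol).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Flow(NamedTuple):
    """One aggregated flow record (hypothetical, simplified field set)."""
    src: str
    dst: str
    dst_port: int
    octets: int


def top_talkers(flows: Iterable[Flow], limit: int = 10) -> List[Tuple[Tuple[str, str], int]]:
    """Sum bytes per (source, destination) pair, largest totals first."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for flow in flows:
        totals[(flow.src, flow.dst)] += flow.octets
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]


# Made-up records using documentation address ranges.
sample_flows = [
    Flow("192.0.2.10", "198.51.100.5", 443, 4000),
    Flow("192.0.2.11", "198.51.100.5", 80, 1500),
    Flow("192.0.2.10", "198.51.100.5", 443, 3000),
]
for (src, dst), octets in top_talkers(sample_flows):
    print(f"{src} -> {dst}: {octets} bytes")
```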
-
Re:Stoopid.
A bit dated, but...
http://www.bitwizard.nl/sig11/
Short answer: Build Linux Kernels. Lots of them.
There's also something to be said for md5sum of large files. And memtest86. And recently I noticed a rare southbridge/RAM bug that flipped 1 bit every 500GB or so of data read in. That was hell to isolate! Turned out the RAM was not performing to spec.
-
Re:So let me get this straight:
Alright. The source files for the code in my citation are WinMTRNet.cpp (see also WinMTRNet.h) and net.c in the mtr-0.80.tar.gz. Both include the ICMPHeader struct with identical field names (not very strong evidence), the struct sequence with identical field names, and the structs nethost and s_nethost which share half a dozen field names precisely--when they do they are in the same order. These three structs are in the same order in each file.
The functions after new_sequence in the MTR source are, in order, net_send_query, net_process_ping, and net_process_return; the functions after GetNewSequence are SendQuery, ProcessPing, and ProcessReturn.
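The ordering observation above can be checked mechanically. This is only a sketch using the function names quoted in this comment; the normalization rules (lowercase, drop underscores, strip a leading "net" or "get") are an assumption chosen to fit those names, and matching names alone say nothing about the function bodies.

```python
import re


def normalize(name: str) -> str:
    """Reduce an identifier to a comparable core.

    Lowercases, drops underscores, then strips a leading "net" or "get".
    These rules are an assumption that fits the names quoted above.
    """
    core = name.replace("_", "").lower()
    return re.sub(r"^(net|get)", "", core)


# Function names as listed in the comment, in source order.
mtr_functions = ["new_sequence", "net_send_query", "net_process_ping", "net_process_return"]
winmtr_functions = ["GetNewSequence", "SendQuery", "ProcessPing", "ProcessReturn"]

same_order = [normalize(n) for n in mtr_functions] == [normalize(n) for n in winmtr_functions]
print("same names in the same order:", same_order)
```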
These are things I noticed looking at those two source files briefly; I didn't have to hunt around at all. It's reasonable to expect a fair amount of difference since WinMTR was started ~8 years ago and both sources have diverged somewhat since then. Expecting large blocks of identical code seems unreasonable because of the time frame, though I wouldn't be at all surprised to find numerous snippets like the above in a detailed analysis.
There appear to be large blocks of very suggestively similar code. Ordering and name conventions alone are enough for me that I would swear in court that the two are derivatives.
-
Re:All it does is Traceroute and Ping?
No. The current maintainer of MTR, Roger Wolff, does care. He's quite explicit in his email, reproduced in TFA. He's been maintaining the software since before WinMTR existed as well according to the MTR page.
-
Re:Expect problems and bugs with OS software?
In the early days many machines would run fine with DOS and Windows but would crash with signal 11 on Linux, particularly when running gcc. As strange as it sounds this was usually a hardware problem - bad memory. There was even a FAQ on the signal 11 problem. Saying 'but it works with Windows' does not really excuse bad hardware. Similarly, if hardware is generating BSODs on Windows, and you have good reason to believe they're not caused by Windows kernel bugs, then most likely the hardware is faulty and Linux just doesn't push it as hard, or perhaps masks the problem rather than trapping it and dying immediately (which is the safest course of action).
I can't rule out that Windows prints a meaningless complaint about IRQ levels when the real cause is a bug somewhere else.
-
Re:If you're stuck with one of these...
Please remember that these kinds of programs only correctly identify faults in some cases. In other words: Don't rely 100% on them.
Clicky (go way down, you'll see questions relating to this)
-
Re:bad ram a common problem
I've had RAM which could pass all day long on a so-called memory tester; put it into a PC and the thing couldn't even finish POST.
I used to use Linux kernel compiles with gcc to thrash-test memory: start enough of them that the machine just starts to swap, and let them run in a loop overnight. If there are no signal 11s in the morning, it'll probably survive anything else.
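A loop like that can be scripted. The sketch below is an assumption-laden simplification: it runs one command at a time rather than enough parallel builds to force swapping, and it treats either a process killed by signal 11 or the text "signal 11" in the error output as a hit. The helper name is made up, and the demo command is a harmless placeholder rather than a kernel build.

```python
import signal
import subprocess
import sys
from typing import List, Sequence


def run_repeatedly(command: Sequence[str], iterations: int) -> List[str]:
    """Run `command` `iterations` times and describe every failed run."""
    failures = []
    for attempt in range(1, iterations + 1):
        result = subprocess.run(command, capture_output=True, text=True, errors="replace")
        if result.returncode == -signal.SIGSEGV:
            # On POSIX a negative return code means "killed by that signal".
            failures.append(f"run {attempt}: killed by signal 11")
        elif "signal 11" in result.stderr:
            # gcc reports a crashed cc1 in its error output and exits non-zero.
            failures.append(f"run {attempt}: a child process reported signal 11")
        elif result.returncode != 0:
            failures.append(f"run {attempt}: exit status {result.returncode}")
    return failures


# Placeholder command; for a real test pass the build command instead,
# for example ["make", "bzImage"] from inside a kernel source tree.
print(run_repeatedly([sys.executable, "-c", "pass"], iterations=3))
```

Failures that appear at different points on each run are the pattern to look for; a build that fails at the same place every time is more likely a software problem.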
-
First 10 on a unix box (Solaris/Linux mainly)
Here are my first ten on my unix workstation:
- OpenSSL - support program
- OpenSSH - connections in and out
- Mutt - email
- nmap - scanning tool
- libpcap - support library
- Ethereal - network sniffer
- mtr (Matt's TraceRoute) - trace problems
- whois (ARIN compatible) - find where the problems are
- tf (tinyfugue) - BBS client
- mangband - multiplayer ascii game
-
Free equivalent
You're looking for the excellent mtr.
Believe me, there isn't anything you can do on a network in Windows that you can't do better in Linux.
-
Re:Emperor's New Clothes test...
No, memtest86 isn't that good for memory issues. It lets loads of things through. Try this sig11 stuff. I've found memory issues that memtest86 doesn't pick up. X and kernel compiles are good for bringing memory issues to light.
-
Re:Stay calm, this is a thread hijack. X11 on OS X
feh
gtk-gnutella
lopster
dc_gui w/ dctc
mtr
gkrellm (Not sure how well this would work...do OS X systems have a compatible /proc?)
xmms
-
Signal 11 FAQ
I don't know if this will directly address your problem, but I found it helpful once for diagnosing a bad FPU. There's lots of good tidbits talking about bad hardware and its symptoms.
-
Re:Get a booting -from CD distro + Compile the ker
~$ gcc explanation_of_signal11.c
gcc: Internal compiler error: program cc1 got fatal signal 11
~$ damn. maybe he/she will follow that link and understand it, then.
sh: damn.: command not found
~$ exit
logout
-
Censored: My experiences, memory problems...
Censorship appears alive and well on Slashdot. Some low-life decided to mark my post down (-1), despite the fact that it actually *IS* relevant to the topic at hand. So I'm reposting...
BTW: NASA uses computers with multiple, redundant CPU's to detect problems. Do you use multiple CPU's as real-time backups? Where do you think a bit is more likely to become corrupted? In the CPU or in the RAM? (Well, neither, unless the heatsink fails...)
---
In my experience, ECC is not worthwhile. There are too many ways the data can get corrupted before it ever hits the memory stick. ECC only helps if the information is accurately present on the memory data lines attached to the RAM module, and then only when the RAM module itself fails. Otherwise you are just recording, with error-correction, incorrect data. And let's face it: if the memory module itself is fried, ECC ain't going to help.
Testing: I had some rather painful experiences with a FIC-503+ motherboard. Turned out to have a design defect that caused problems when both DIMM slots were utilized, regardless of the RAM type.
To test it, under linux (of course), with a minimal boot, running as few processes as possible, I created a large file (${FILE}) of non-uniform data by cat'ing (combining) several arbitrary convenient large files. About 2x - 3x the total size of all my RAM. I then did:
repeat 100 cksum ${FILE} | uniq -c
Any problems showed up right away. (Cksum returned different numbers.)
This was a simpler approach, though not quite as good, than the usual "make 100 Linux kernels and diff the make logs" test.
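For anyone without a shell that has a repeat builtin, the same idea can be sketched in a few lines of script. This is not the command above: it swaps cksum for SHA-256 from the standard library, and the function names are made up. The demo uses a tiny temporary file only so that it runs anywhere; as described above, a real test needs a file a few times larger than RAM.

```python
import hashlib
import os
import tempfile
from collections import Counter


def digest_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_digests(path: str, passes: int) -> Counter:
    """Hash the same file `passes` times and count each distinct digest."""
    return Counter(digest_file(path) for _ in range(passes))


# Demo on a small temporary file of random bytes.
with tempfile.NamedTemporaryFile(delete=False) as sample:
    sample.write(os.urandom(1 << 16))
counts = repeated_digests(sample.name, passes=5)
os.unlink(sample.name)
print("distinct digests:", len(counts))
```

More than one distinct digest for an unchanged file means something between the disk and the CPU is corrupting data, which is the same signal that uniq -c exposes in the original command.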
You might also look at: http://www.bitwizard.nl/sig11/
...Anonymous. Still too lazy to log in...
-
My experiences, memory problems...
In my experience, ECC is not worthwhile. There are too many ways the data can get corrupted before it ever hits the memory stick. ECC only helps if the information is accurately present on the memory data lines attached to the RAM module, and then only when the RAM module itself fails. Otherwise you are just recording, with error-correction, incorrect data. And let's face it: if the memory module itself is fried, ECC ain't going to help.
Testing: I had some rather painful experiences with a FIC-503+ motherboard. Turned out to have a design defect that caused problems when both DIMM slots were utilized, regardless of the RAM type.
To test it, under linux (of course), with a minimal boot, running as few processes as possible, I created a large file (${FILE}) of non-uniform data by cat'ing (combining) several arbitrary convenient large files. About 2x - 3x the total size of all my RAM. I then did:
repeat 100 cksum ${FILE} | uniq -c
Any problems showed up right away. (Cksum returned different numbers.)
This was a simpler approach, though not quite as good, than the usual "make 100 Linux kernels and diff the make logs" test.
You might also look at: http://www.bitwizard.nl/sig11/
...Anonymous. Still too lazy to log in...
-
Re:your cisco?
Wow, a business model built around mtr, isn't that fan-freaking-tastic
-
Maybe hardware
It is possible that it's related to hardware. Check out http://www.bitwizard.nl/sig11/ - grep the page for "K6". If you get a signal 11 (no, not the Signal 11), it's almost always CPU or memory related. I have an AMD-K6 350 that exhibits the same symptoms, unless I disable the CPU cache in the BIOS.
----
-
Re:I smell money...
I've had my share of stupid Linux users, too. Spoke with a Mandrake user this morning who didn't realize I was going to ask him to check his configuration. He didn't have the machine turned on! Linux won't make its users smarter. Then again, it wasn't designed for that - for that matter, Windows wasn't either.
> Have you had 100 unclean shutdowns?
No, I've had many more than that, especially on my laptop, which likes to spontaneously un-suspend while I'm not looking, and it's unplugged. I've had fsck -a bail three times in the last six years, and all three were traced to hardware failure, and none of them were on my laptop.
> 78 days uptime on a box is not a big deal, regardless of the Operating System.
No, it's not a big deal, unless that box is used in common by 15 users accustomed to Windows (AND rebooting). Maybe it's because I haven't told them HOW to reboot the machine. The box has been running Linux for a year and a half, and no one's needed to know how to reboot it, except for me, when I update the kernel.
> These problems are difficult to troubleshoot as NT provides no sure fire way of finding out what is wrong.
By contrast, Linux tells you exactly what's wrong, and sometimes will even tell you how to fix it! Of course, if you have a bizarre and mysterious error message, you should cross reference it against sig11.
---
-
Re:"(a la Windows =))" remark out of line
Linux even appears to be more vulnerable to hardware failure than Windows. See the signal 11 faq (no, not the slashdot user but the error message). I even experienced it myself. Under Windows, the only randomness I experienced was a random crashing of the DOS box. This didn't exactly strike me as something unusual. NT frequently showed me a blue screen, but everyone assured me that that was normal behaviour. Linux gave me a lot of signal 11's, especially when compiling, which the signal 11 faq explained almost certainly indicates a dodgy memory chip.
-
From the other end...
I'm usually at the other end from the person who asked the question here.
I get asked to write Linux drivers for various hardware. The guys who made the chip and their colleagues have the closest feeling with the chip and its interfaces. However, I have intimate knowledge of Linux.
So when you develop "in-house", there is one advantage, if you leave it to the "linux experts" there is another advantage.
So the question is: which way do the scales tip?
Brooks's law doesn't apply if you add programmers at the beginning. Thinking about the project with a medium-sized team will make the design better. This will reduce development time.
If you start with one, two or three programmers, and later start adding more and more programmers because you're running late, you'll find out first hand about Brooks's law.
Also, a company wanting to start supporting Linux should hire us because if you hire a person to "also" do the Linux driver, soon, he'll be doing nothing else. So supporting Linux would cost you a full year-salary per year. We offer MUCH cheaper maintenance and support contracts. And for that money you can have one or two drivers developed every year too!
Roger.
-
Re:Project listing site.
Ok. Site is up!
I'll do the "different professors get to classify projects into their own categories" later on, if there is interest.
Roger.
-
Re:Status updates
Check out http://www.bitwizard.nl/sig11/
"Most likely there is nothing wrong with your installation, your compiler or kernel. It very likely has something to do with your hardware. There are a variety of subsystems that can be wrong, and there is a variety of ways to fix it. Read on, and you'll find out more. There are two exceptions to this "rule". You could be running low on virtual memory, or you could be installing Red Hat 5.x or 6.x. There is more about this near the end."
--
-
Check your hardware if you get signal 11s!
> well, the compilation died on me too but if you keep re-executing the commandline (make bzImage, or whatever) it'll eventually finish building.
If you have to reissue 'make bzImage' commands to finish building the kernel, you most definitely have some faulty hardware, most likely bad memory. I bet that you're seeing signal 11's when compiling.
Check The Signal 11 FAQ for clues on how to debug your problem.
First hint: if you're overclocking, don't!
-
Re:Signal 11
If you are getting Fatal Error 11, this might be similar to the Signal 11 that can happen when compiling the kernel. Maybe you have bad hardware...
(Btw: Check out the signal 11 faq)
-
Reinventing the wheel?
Hasn't this been available for a long time?
BitWizard has had a patch on their page for a long time and it supports drives from several different manufacturers, too!
What is new about Nathan's patch that isn't provided by the BitWizard patch?
-
Re:signal 11 compiling
This site explains it all.
The essence is: your system is unstable when doing large memory transactions (most likely because of overclocking), and this happens when using gcc. Sometimes a bit gets reset accidentally, which leads to the famous GPF: signal 11.
-
Here's one
Project here
-
gcc "internal compiler fatal signal 11 error"
This generally shows some sort of hardware problem, generally memory (main or cache) problems, but there is a FAQ specifically on this topic. One location I found for it was http://www.bitwizard.nl/sig11/
I had this problem, and ended up having a bad SIMM.
-
Microsoft DOES NOT control distribution
> Also the gcc acts real funny on my machine with
> a whole bunch fatal signal 11 errors. I noticed
> when I try to recompile something multiple
> times, the compiler seems to stop at different
> places with the signal 11 error and something
> about program cc. If any of you know what is
> going on, please tell me.
This is most likely a problem with your hardware. Check out the GCC Signal 11 FAQ for more info.