> Microsoft is in the same boat. It won't be until the Blue Screen of Death is really, provably responsible for human fatalities (Think safety control at a power plant, or a crash aboard a military vehicle of some kind) that Microsoft will start being more responsible about their security and program design.
no, they won't make their programs better. they already supply windows-based systems to mission-critical industries, and it is the same there (and I know what I'm talking about).
mr. Dell (I think) said something like this: if MS supplied the OS for cars, it would certainly pop up a confirmation box before releasing an airbag. no confirmation in time, no airbag.
isn't it better for vendors to supply source code & an md5 hash? yes, every tarball of linux is signed so far, as well as some drivers not included in the kernel (yet) and distributed on vendor homepages... just the md5 & source :)
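a minimal sketch of the "md5 & source" check, assuming the vendor publishes the hash as a plain hex string next to the tarball. the function names are made up for illustration, they are not part of any vendor tooling.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_hex):
    """Compare a local file against the hex string the vendor published."""
    return md5_of_file(path) == published_hex.strip().lower()
```

usage would be something like `matches_published_md5("driver.tar.gz", hash_from_homepage)`, where both the file name and the variable are placeholders.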
I'm sure it's a major bug, but I'm unable to reproduce it, and even when it happens I'm unable to get anything useful out of it. I mentioned netconsole being silent, but if you know about something, hand it to me :).
the only report I ever found in the logs is pasted below, even if this is not the best place for it; I don't want to report incomplete info to busy developers, but you seem interested :). the trace below just tells me that this happened just before the crash (remote nagios found the machine dead a few seconds after this happened). all I know is that it happened during a write to a pipe, and that postmaster was the running process while something like "spin_lock(); schedule();" or "preempt_disable(); schedule();" was called, causing "note: postmaster[1138] exited with preempt_count 1". this makes me sure it's rather a bug in the kernel, which is triggered by postgresql. a user-space process should never be able to kill the kernel, but my postmaster does - but this is the only evidence I have and I guess it's not enough for either kernel or pgsql developers.
Aug 8 22:56:17 travel kernel: Unable to handle kernel paging request at virtual address 20646973
Aug 8 22:56:17 travel kernel: printing eip:
Aug 8 22:56:17 travel kernel: c0119d3a
Aug 8 22:56:17 travel kernel: *pde = 00000000
Aug 8 22:56:17 travel kernel: Oops: 0000 [#1]
Aug 8 22:56:17 travel kernel: PREEMPT SMP
Aug 8 22:56:17 travel kernel: Modules linked in: iptable_filter ip_tables ipv6 genrtc dm_mod capability commoncap 8139too mii psmouse ide_cd cdrom ext3 jbd mbcache ide_generic piix ide_disk ide_core raid1 md unix font vesafb cfbcopyarea cfbimgblt cfbfillrect
Aug 8 22:56:17 travel kernel: CPU: 0
Aug 8 22:56:17 travel kernel: EIP: 0060:[task_rq_lock+42/144] Not tainted
Aug 8 22:56:17 travel kernel: EFLAGS: 00010086 (2.6.8-2-686-smp)
Aug 8 22:56:17 travel kernel: EIP is at task_rq_lock+0x2a/0x90
Aug 8 22:56:17 travel kernel: eax: 20646963 ebx: c038bd40 ecx: 00000001 edx: f6f74e70
Aug 8 22:56:17 travel kernel: esi: c038bd40 edi: f6fe8000 ebp: f6fe9e74 esp: f6fe9e64
Aug 8 22:56:17 travel kernel: ds: 007b es: 007b ss: 0068
Aug 8 22:56:17 travel kernel: Process postmaster (pid: 1138, threadinfo=f6fe8000 task=f6830b70)
Aug 8 22:56:17 travel kernel: Stack: c11273a0 00000000 f7a3c204 00000001 f6fe9ebc c011a482 f6f74e70 f6fe9eac
Aug 8 22:56:17 travel kernel: 000000d0 f6ad4c80 00000001 f6830b70 00000010 c02def80 00000000 00000000
Aug 8 22:56:17 travel kernel: 00000040 000000d0 00000082 00000000 f7a3c204 00000001 f6fe9ee4 c011c561
Aug 8 22:56:17 travel kernel: Call Trace:
Aug 8 22:56:17 travel kernel: [try_to_wake_up+34/704] try_to_wake_up+0x22/0x2c0
Aug 8 22:56:17 travel kernel: [__wake_up_common+65/112] __wake_up_common+0x41/0x70
Aug 8 22:56:17 travel kernel: [__wake_up+68/128] __wake_up+0x44/0x80
Aug 8 22:56:17 travel kernel: [pipe_writev+657/800] pipe_writev+0x291/0x320
Aug 8 22:56:17 travel kernel: [pipe_write+56/64] pipe_write+0x38/0x40
Aug 8 22:56:17 travel kernel: [vfs_write+237/352] vfs_write+0xed/0x160
Aug 8 22:56:17 travel kernel: [sys_write+81/128] sys_write+0x51/0x80
Aug 8 22:56:17 travel kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Aug 8 22:56:17 travel kernel: Code: 8b 40 10 8b 0c 85 20 00 39 c0 ff 47 14 01 cb 31 c0 86 03 84
Aug 8 22:56:17 travel kernel: note: postmaster[1138] exited with preempt_count 1
Aug 8 22:56:17 travel kernel: bad: scheduling while atomic!
Aug 8 22:56:17 travel kernel: [schedule+2191/2208] schedule+0x88f/0x8a0
Aug 8 22:56:17 travel kernel: [free_pages_and_swap_cache+113/160] free_pages_and_swap_cache+0x71/0xa0
Aug 8 22:56:17 travel kernel: [unmap_vmas+482/576] unmap_vmas+0x1e2/0x240
Aug 8 22:56:17 travel kernel: [exit_mmap+177/464] exit_mmap+0xb1/0x1d0
Aug 8 22:56:17 travel kernel: [mmput+109/160] mmput+0x6d/0xa0
Aug 8 22:56:17 travel kernel: [do_exit+418/1328] do_exit+0x1a2/0x530
Aug 8 22:56:17 travel kernel:
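two things can be read straight off the numbers in the trace. the sketch below only does arithmetic on values copied from the log. it assumes the Code: line starts at the faulting instruction; if so, `8b 40 10` on x86 is a load from eax+0x10. what it means is a guess: a pointer whose bytes look like printable text may hint that some structure was overwritten with string data, but that alone does not say who did the overwriting.

```python
import struct

FAULT_ADDRESS = 0x20646973  # "virtual address" from the first oops line
EAX = 0x20646963            # eax from the register dump
CODE_PREFIX = bytes.fromhex("8b4010")  # first three bytes of the Code: line


def as_ascii(value):
    """Interpret a 32-bit value as four little-endian bytes of text."""
    return struct.pack("<I", value).decode("ascii", errors="replace")


# distance between eax and the address the kernel failed to read
offset = FAULT_ADDRESS - EAX
print(hex(offset))                # 0x10
print(CODE_PREFIX[2] == offset)   # True: same displacement as in the code bytes
print(repr(as_ascii(EAX)))        # 'cid '
```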
to make it more clear, the disk crash with pg tables on it has nothing to do with the current pgsql installs which bring servers down.. I was repairing some pg tables before and I simply know it's a pain, but it might have gotten better since then.
as for sw misuse, please read the thread started by my post (4 msgs as of now). I have responded to a coward in a way I usually don't, but I think I haven't misused any sw, including pgsql.
as for the memory hog.. current info from what top says: there are 4 postmasters, 90 MB total, serving (du says) 98 MB of databases, while there are two mysqls, 40 MB total, serving >500 MB of dbs, and with 4 times more queries per minute (31555.46 q/m, pg has ~7300). I think it got better in 5 years, but I remember pg ranting about insufficient memory on a 256 MB ram machine, which was almost "the cream of white boxes" when I was deciding between pg/mysql. so it was a memory hog for me...
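the figures above worked out as ratios, as a rough sketch. the numbers are the ones quoted from top and du; the dictionary layout is only for illustration, and since the mysql data size is given as ">500", its ram-per-data figure is an upper bound.

```python
# figures quoted in the post: resident memory, database size, query rate
pg = {"resident_mb": 90, "data_mb": 98, "queries_per_min": 7300}
mysql = {"resident_mb": 40, "data_mb": 500, "queries_per_min": 31555.46}


def ram_per_mb_of_data(server):
    """Megabytes of process memory per megabyte of stored data."""
    return server["resident_mb"] / server["data_mb"]


query_ratio = mysql["queries_per_min"] / pg["queries_per_min"]

print(round(ram_per_mb_of_data(pg), 2))     # 0.92
print(round(ram_per_mb_of_data(mysql), 2))  # 0.08
print(round(query_ratio, 1))                # 4.3
```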
man, I have 25 yrs of work behind me. I have used linux since '93, the 0.9x versions FYI, and I have used linux on my desktop computer since '94. when I was facing the choice, mysql or pgsql, I was quite familiar with the environment I was going to use them in, I was just not familiar with these sql servers.
and no, I didn't change distros randomly. slack was used by the former admin, I have _reinstalled_ to debian woody, and the new iron was reinstalled to sarge. and I know pretty well how to compile a kernel, some of my patches were accepted in the past, and even though I haven't been active in kernel development for about 5 years now, I still know what a processor/mainboard change means.
as for db admin.. I regularly take care of a 2.5 mil record database (small), but that's one of the biggest databases in my small, ~5 mil central european country.
and I am pretty sure about what I write above, so I don't have to post anonymously.
have a nice day.
a few years back (~1998/9?) I was asked to replace now-dead-sql with mysql or pgsql and I ended up with mysql, because it was much better suited for _simple_web_applications_. I didn't need transactions and needed to support less expensive white boxes, where a memory hog like pg simply would not fit or would not perform well. until now, I have never been told it was a bad choice by anyone using the servers, customers, partners etc.
before suggesting other dbs one should carefully pick the replacement. I wouldn't tell the following if I had no experience with pg. I don't want to turn this into a mysql/pgsql flame: my story might not be easy to hear for pg-lovers, feel warned.
a few months ago, I got a job... find out why a simple LAPP (linux-apache-pgsql-postfix :) fails randomly after a few days. I have changed distribution from slack to woody. I have changed hardware once, then even migrated from intel to amd. moved to sarge. checked about 5 different kernel versions and a zillion configuration options when compiling the kernel, and I did use "vendor" stable kernels as well. all machines are dying randomly, the record is 19 days. well over 25 LAMP servers, to which I give much less attention than to the LAPPs, work smoothly with 500+ days uptime, whether you take system or mysql server uptime. I migrated the pgsql to one stable LAMP server (running about one year at the moment) and it went down in 2 days. btw "went down" means that the machine simply becomes unresponsive and even netconsole.ko is silent!! (printk over udp, worth looking at).
have you ever repaired pg databases after a disk crash? and.. how about pg_dump between different versions...? and how do you like the pg docs? lol, don't ever recommend pgsql to me.
I don't need more features. I need a stable server. I'm willing to code something better, use lock-free structures, and I am willing to write more sql queries to implement some "oneliners-in-pgsql". and I build some regular data integrity checks into my apps. but I'm not going to use/rely on constantly failing triggers, corrupted tables and killed servers. thank god that mysql is gpl. and no, I don't care if sco prays to mickos to port mysql back to sco again.
remember that after seeing a line of ms code you can never again write a line of code of your own (you did sign the shared source nda, didn't you???)
I can tell you: open source is a big threat to free source. if tomorrow I code 10 lines 90% similar to what someone committed three days ago into an open source program protected by some see-only-don't-ever-try-to-use copyright/license, my free source sw is in trouble, and some day an SCO-like can sue me because I must have seen their code and copied it.
Quote: "Most IT professionals don't want to be in the business of maintaining system-level software."
Okay, release fewer service packs.
spending money on fighting malware, cleaning it up or making it illegal won't ever work. ever.
the only thing which works is spending money on software quality, not from a look & feel, user interface or features point of view, but from a software design & security view. this is so simple that not many people realise it. imagine: if it is impossible to install malware, a virus or adware because the software is foolproof and secure by design (!), there would be no malware at all. you wouldn't need laws making it illegal or any antivirus software.
certainly, programmers aren't gods and sometimes an error, a buffer overflow or something else makes its way into the code. if business policy is to fix it next-business-day or even asap or.., it can be easy, because good software has a typo at most, a forgotten endless[len] = 0, or... and these can be solved by patching & recompiling. design flaws cannot be fixed fast or completely. windows itself _is_ full of design flaws, what about apps? this is not a flame, if you have coded for windows, you know.
if security holes are regularly closed by vendors in reasonable time, all those who care are fine. those who don't? in my world, there are none. we all use apt-get update && apt-get -y upgrade in a daily crontab. this is possible on windows as well - but why didn't ms invest in such a system? because development of hidden system features or even rewriting some base parts of the code is hard to sell. they need nice looking apps. feature-rich. like html email. what will be next?
if malware or viruses have no easy way of spreading because software is secure and regularly updated, there would be no malware. and be sure there are secure software systems: imagine life support machines in hospitals. or others. why are they not used on the desktop? because systems like qnx or plan9 or tens of others are used there, which are too difficult for the average joe. but ms programmers aren't average joes... are they? so why don't they design & code correctly then?
ms had/has effective marketing people, they would sell a lighter to lucifer. they managed to get a 90%+ uniform environment. we all know that a uniform environment is relatively stable and very good for self-development and productivity. but it is vulnerable. the ability to damage one node of a uniform environment is the ability to destroy the 90%+. this is a beautiful environment for malware. imagine boris veryclever and his ability to destroy 90% of the world.. but if ms only had 20% and the remaining computers were spread between 5-10 more _incompatible_ vendors, boris would be able to destroy only 20% or less. if he were willing to start coding for the 20% at all... who cares about 20%...
...do you know an admin with holiday? :)
> that this blocks their constitutional right to run an infested box on the Internet
it is my constitutional right to block him, isn't it?
if male/female were ever made (or considered) terms offending a person's sex, how would underwear businesses promote men's or women's underwear? or would they be forced to create some kind of uni-sex underwear?
not true. I do know five others who do not get paid for their work. this is not flamebait, it's true. paid/not-paid is about 1:30.
I certainly won't be the first to take the words of Big Bill (ironically) into my mouth...: Nobody will ever need more than 640k RAM. (Curious, see the numbers..?)
One of the major conditions for a licence to build a cellular-phone network in the country where I live was the ability to eavesdrop on phone calls and archive them for two weeks. Selected phones, on demand (e.g. by police, intelligence agency, ...), had to be eavesdropped on for an unlimited period of time.
I do not remember the exact resulting number, but the units were terabytes. Before you start your brains up, a few numbers... 1.2 million cellular phones (5.5 mil. residents total - small country :-)). ...and now guess the numbers for the U.S. ...hey... do you know echelon at all? ;-)
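A back-of-envelope sketch of why the answer lands in terabytes. Only the phone count and the two-week window come from the post. The talk time per phone and the codec bitrate are assumptions made here for illustration: 13 kbit/s is the GSM full-rate speech codec rate, and 10 minutes per phone per day is a pure guess. Calls between two phones on the same network are counted twice, and signalling and storage overhead are ignored.

```python
PHONES = 1_200_000               # from the post
ARCHIVE_DAYS = 14                # "two weeks", from the post
MINUTES_PER_PHONE_PER_DAY = 10   # assumption, not from the post
CODEC_BITS_PER_SECOND = 13_000   # assumption: GSM full-rate speech codec


def archive_bytes(phones, days, minutes_per_day, bits_per_second):
    """Bytes needed to keep every call for the whole retention window."""
    seconds_per_day = minutes_per_day * 60
    bytes_per_phone_per_day = seconds_per_day * bits_per_second // 8
    return phones * days * bytes_per_phone_per_day


total = archive_bytes(PHONES, ARCHIVE_DAYS,
                      MINUTES_PER_PHONE_PER_DAY, CODEC_BITS_PER_SECOND)
print(f"{total / 1e12:.2f} TB")  # 16.38 TB
```

Changing the assumed talk time scales the result linearly, so the order of magnitude stays in terabytes for any plausible value.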
Cpt. Wheel