Slashdot Mirror


User: tweek

tweek's activity in the archive.

Stories
0
Comments
1,183

Comments · 1,183

  1. Re:The "windows way": problem w/ study, or realist on A Continued Look at Linux vs Windows · · Score: 3, Insightful

    Well in our case, we have a full fledged QA environment that mirrors our production environment except for the number of app servers. It's even hosted in our datacenter to mimic connectivity.

    We even restore a copy of our production database before each major release to the QA box.

    Interestingly enough, we do the same thing for our few Windows servers (Navision for instance. Just did an upgrade over the weekend).

    I can't understand who would apply patches to a live system without a qa run first. The other thing that bugs me is that they didn't use the same application stack across the board. A better test would have been something like WebSphere or tomcat talking to a DB2 or Oracle database. Those products would have been better tests.

    The other thing that bugs me is that they did a major OS upgrade for some vendor binary. Would the same vendor binary have required a 2000 to 2003 upgrade?

  2. Re:Goodbye to Oracle ? on Sun Announces Support for PostgreSQL · · Score: 1

    That's the point. We needed to return the space back to the filesystem. At this point I've added another 4 73.4GB spindles to the array so I've got some room to grow now. At the time, I needed to reclaim the dead tuple space. Of course on this new server, I'm allocating even more spindles and space. I've also got a ton more memory and horsepower to work with.

  3. Re:Goodbye to Oracle ? on Sun Announces Support for PostgreSQL · · Score: 1

    That's why I wish I could help. I'm not really looking for us to replace DB2 anyway. I'm just wanting to make PostgreSQL perform the best I can. I also really want it to be the best database it can be. I really spend most of my time dealing with the warehouse when it comes to ETL or BI. To its credit, Actuate has been VERY supportive of us using PostgreSQL as our warehouse. When we had ODBC related problems (mostly with DataDirect), Actuate made application level changes for us to work around the bug. I just needed to be able to translate it for them. Informatica with all its warts worked with us as well. All of this in spite of the fact that neither mysql nor pgsql was a supported database.

    I think those industries are understanding that people aren't going to shell out for a DB2 or an Oracle just to run OLAP. I would argue in most of those cases that they are overkill. I also think pgsql gives a cleaner skill translation from those other databases than mysql does.

  4. Re:Goodbye to Oracle ? on Sun Announces Support for PostgreSQL · · Score: 1

    The full was simply because of the amount of disk space we were reclaiming each night. I mentioned only the two tables but there are probably 4 or 5 fact tables that are close to that size. Disk space is cheap but I don't want to get caught with my pants down in the middle of the night.

    This is also the interesting thing about the opensource database world. In an environment built around an opensource database, your sysadmin will often know just as much about the database as the DBA. Many linux sysadmins have also used and managed various mysql or postgresql databases on the side because they could. Maybe it was for a weblog of some sort or maybe it was because they wanted to learn something.

    Many things about our data warehouse I know nothing about. My job is to manage the systems and the networks but sysadmin positions are rarely constrained in that regard. The first person that people come to is the SA. The SA needs to know enough about what is ON his systems to say "Go to the DBA. He needs an index because I'm seeing so much I/O he must be doing a table scan or something" or "Go to the developer because the database isn't using any system resources right now and I've checked the basics of the database and there is no activity." Many times the DBA will say that the developer needs better predicates. In our case, most often, we have bad SQL but I can't ignore the fact that I'm seeing "X" in the system and maybe I need to work in a bigger maintenance window for the database.

    Either way, we've run into issues where postgresql has had to build a 10GB temp table to get a result set as part of our warehouse load and it blew all of my capacity planning out of the water. We could have a busy Friday (Our business is retail financial) and could have 30k rows on each of 4 or 5 tables updated and we might end up having disk issues.

    With the new hardware and my latest benchmarks, we can do a VACUUM ANALYZE each night and I've overallocated space and spindles so we could put off the FULL to a weekly.
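The nightly/weekly split described above could be scripted along these lines. This is only a sketch: the function name and the choice of Sunday for the weekly FULL are assumptions, not details from the thread.

```python
import datetime


def nightly_maintenance_sql(day: datetime.date, full_weekday: int = 6) -> str:
    """Return the maintenance statement to run for the given date.

    full_weekday uses date.weekday() numbering (Monday=0 ... Sunday=6).
    Sunday is an assumed choice, not something stated in the thread.
    """
    if day.weekday() == full_weekday:
        # Weekly: compact the tables and hand freed space back to the filesystem.
        return "VACUUM FULL ANALYZE;"
    # Nightly: mark dead tuples reusable and refresh planner statistics.
    return "VACUUM ANALYZE;"


if __name__ == "__main__":
    print(nightly_maintenance_sql(datetime.date.today()))
```

The returned statement would still have to be fed to the database by whatever scheduler runs the warehouse load (cron plus psql, for example).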

  5. Re:Goodbye to Oracle ? on Sun Announces Support for PostgreSQL · · Score: 1

    It's situations like this that I wish I could contribute beyond making vague suggestions. Here's hoping a postgres developer gets wind of this thread and can tell me if these were things that weren't thought of or if there are valid reasons for not having these various features. I consider (and my opinion is worth f-all in most situations ;>) these to be enterprise level features. I don't expect them to change the method of data caching but I think the backup idea has some merit.

  6. Re:Goodbye to Oracle ? on Sun Announces Support for PostgreSQL · · Score: 1

    We've actually considered doing a full vacuum analyze during the day after our automated reporting has finished generating in the mornings. This would really help us out because there aren't a lot of ad-hoc queries to our warehouse. We really limit who has access to that because it takes one stupid finance person with a GUI query tool to select every row from one of our largest tables to bring things to a dead standstill.

    Here's a couple of row counts from some of our bigger tables:
    financial_detail_info: 64906276
    loan_account_agg_fact: 48078504

    Also every single active account in our system gets updated every night because every single active account will accrue interest. This means several of these large tables have a large portion of the rows updated.
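For a sense of scale: PostgreSQL writes a new row version on every UPDATE and leaves the old one behind as a dead tuple until VACUUM reclaims it. The back-of-the-envelope estimate below uses the loan_account_agg_fact row count quoted above and the "about a third updated nightly" figure given elsewhere in the thread. The 200-byte average row size is purely an assumed number for illustration.

```python
def dead_tuple_bytes(row_count: int, updated_fraction: float, avg_row_bytes: int) -> int:
    """Estimate bytes of dead row versions left behind by one night's updates."""
    updated_rows = round(row_count * updated_fraction)
    return updated_rows * avg_row_bytes


if __name__ == "__main__":
    ROWS = 48_078_504        # loan_account_agg_fact, from the comment above
    FRACTION = 1 / 3         # rough nightly update share mentioned in the thread
    AVG_ROW_BYTES = 200      # ASSUMPTION: not stated anywhere in the thread
    estimate = dead_tuple_bytes(ROWS, FRACTION, AVG_ROW_BYTES)
    print(f"{estimate / 1024 ** 3:.2f} GiB of dead tuples per night (rough estimate)")
```

Index bloat and the other large fact tables would add to this, so the real nightly figure is likely higher.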

  7. Re:Goodbye to Oracle ? on Sun Announces Support for PostgreSQL · · Score: 1

    Actually I didn't mean for this to be a jab at PostgreSQL from a PostgreSQL sucks and Oracle/DB2 rocks perspective.

    It's all a matter of thinking. One of our new DB2 dbas seems to be dead set against any opensource database. Based on his comments, I understand where he's coming from even if it's illogical.

    In conversations we've been having today, it boiled down to a matter of opinion on the development side. Here's a couple of examples:

    Binary backup vs. SQL backup - From a migration and upgrade standpoint, doing a sql level backup is much more portable. I don't have to do a UNLOAD/LOAD to move from platform to platform or major version to major version. This is what we're doing now.

    But from a daily backup/restore perspective, a binary, block-level backup would win hands down. It's just not as portable. We recently migrated from DB2 on Linux/x86 to DB2 on AIX/Power. We had to go through a very detailed and high-risk process to migrate the database. In all of our trial runs, there were cases where the DDL was missing an index or something similar because of a bug with db2look or the order of object creation. In the case of the migration, a SQL level backup like mysql or pgsql uses would have saved us a TON of time and headache.

    I can't tell someone they have to wait 8 hours if we have a botched warehouse load and have to restore from backup. In those cases, a binary backup would be MUCH faster. In the end a choice of engines would be preferred. Our daily backups can be binary but in the case of this new hardware and version upgrade, we could go with the SQL backup.

    Another area that reminds me of the monolithic kernel debates is memory management and data caching. DB2 (and Oracle as well, I think) uses internal memory segments called bufferpools to cache data for future reuse. Up until recently, I would have said that was a stupid idea because, unless you're using raw volumes and bypassing a file system cache, you're taking a double hit on memory. With DB2 8.2 (stinger), you can create tablespaces with NO FILE SYSTEM CACHING. This avoids that double penalty.

    With postgres, it relies VERY heavily on the OS to cache the data on the filesystem. This is in some cases a poor idea because the OS will cache the blocks read and not the actual row data. In the bufferpool scenario, the database will cache the actual read rows instead of the chunk of disk that held the data. You have less useless data sitting in memory. It's just a bit more intelligent.

    During the above mentioned 8-hour restore, all 16GB of memory was in use and I have a very sneaking suspicion that the majority of that data would never be used again. When you have an OS like linux or AIX that is VERY aggressive at file system caching, this could cause problems.

    I think every database on the market has a place. That includes SQL Server. The only point I was trying to make is that some design decisions on pgsql or mysql, really default it to a different market segment.

  8. Re:Goodbye to Oracle ? on Sun Announces Support for PostgreSQL · · Score: 1

    Actually all of these things are done by pg_restore. It uses COPY and it also creates constraints and indices AFTER the data is loaded. This database is huge and I don't expect it to load in 20 minutes. The joy of it being a warehouse is that we CAN reload it from scratch as long as we haven't purged any data from our production system.

    The only thing I'm going to do next is to disable fsync during the process and make note in our process documentation that it should be done that way for future restores. It'll be interesting to see the results there. I'm not worried about power failure or anything of that nature. This box is fully redundant and the SAN it's attached to is as well. IF we lose power to both circuits on the SAN or the box, then IBM has some explaining to do at the datacenter level.
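Since the plan is to document the fsync step, a small helper like the one below could flip the setting in the text of postgresql.conf before and after a restore. `fsync` is a real postgresql.conf parameter, but the helper name and its behavior (replace the first matching line, drop any trailing comment on it) are assumptions made for this sketch.

```python
import re

# Matches an active or commented-out fsync line, e.g. "#fsync = on".
_FSYNC_LINE = re.compile(r"^[ \t]*#?[ \t]*fsync[ \t]*=.*$", re.MULTILINE)


def set_fsync(conf_text: str, enabled: bool) -> str:
    """Return conf_text with the fsync setting forced to on or off."""
    new_line = "fsync = on" if enabled else "fsync = off"
    if _FSYNC_LINE.search(conf_text):
        # Replace the first existing fsync line in place.
        return _FSYNC_LINE.sub(new_line, conf_text, count=1)
    # No existing line: append one, making sure it starts on its own line.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"


if __name__ == "__main__":
    sample = "shared_buffers = 1000\n#fsync = on\n"
    print(set_fsync(sample, enabled=False))
```

The server still has to pick up the changed file, and the setting needs to go back to `on` once the restore finishes, because running normally with fsync off risks corruption after a crash.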

  9. Re:Goodbye to Oracle ? on Sun Announces Support for PostgreSQL · · Score: 2, Interesting

    Actually that WAS done. The settings are really well defined for a datawarehouse environment.

    Our biggest fact table has 48M rows if I'm not mistaken. It might actually be larger than that. As a side note about 1/3rd of that table gets updated every night as part of our warehouse load. Vacuums are a killer for us.

    One thing I did read is that you could disable fsync for the restore process. We may just make that a normal documented task anyway.

    On yet another note, since we're moving to new hardware, one thing we're doing (which is why we're restoring) is moving to 8.1. Greenplum has contributed some AMAZING changes back to postgres, not the least of which is the table partitioning. Try as you might, there are times when your optimizer will do a table scan no matter what. You simply can't outthink it. And most of the time, it's been right. We ran some EXPLAINs on some of our reporting queries with table scans disabled and they WERE slower doing an index scan. With the table partitioning feature, we can break our tables out into smaller chunks without much extra effort.

    Example:
    Our loan account table could be broken out into loan_account with child tables of active, inactive, bankrupt, and whatever other WHERE criteria we want. At that point, we can actually have MUCH fewer rows to process if we just want bankrupt accounts. For DB2 people, the new PostgreSQL table partitioning is a lot like an MQT. At least from the way I see it.

    I'm also very happy that autovacuum made it into the mainline. DB2 has an autorunstats and autoreorg so this is something we're very interested in. The old autovacuum didn't really work as well.
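In 8.1, partitioning of the kind described in the loan_account example is built from table inheritance plus CHECK constraints, with `constraint_exclusion` turned on so the planner can skip children that cannot match. The sketch below just generates the child-table DDL. The `status` column name is hypothetical, since the thread does not say which column the split would use.

```python
def partition_ddl(parent: str, column: str, values: list[str]) -> list[str]:
    """Build CREATE TABLE statements for one inherited child table per value."""
    statements = []
    for value in values:
        child = f"{parent}_{value}"
        # Each child carries a CHECK constraint describing the rows it holds.
        statements.append(
            f"CREATE TABLE {child} "
            f"(CHECK ({column} = '{value}')) INHERITS ({parent});"
        )
    return statements


if __name__ == "__main__":
    # "status" is an assumed column name used only for illustration.
    for stmt in partition_ddl("loan_account", "status", ["active", "inactive", "bankrupt"]):
        print(stmt)
    print("SET constraint_exclusion = on;")
```

Rows still have to be routed into the right child by the load process, or by rules or triggers on the parent. The DDL alone does not do that.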

  10. Re:Goodbye to Oracle ? on Sun Announces Support for PostgreSQL · · Score: 1

    I'm glad someone does. This was a situation where a former Great Plains/Solomon consultant who had been doing side work for the company for years sold our non-tech savvy CFO on Navision.

    The problem is this:

    While we qualify for a midsize company in terms of staff, our revenue and accounting volume ranks in the Fortune 100 range. Navision was a shitty choice from the start.

  11. Re:Goodbye to Oracle ? on Sun Announces Support for PostgreSQL · · Score: 5, Informative

    I don't think opensource databases are becoming any more of a threat than they were in the past. They really do cater to a different market. This is WHY you see SQL Express and the new Oracle license.

    Here's the deal. The company where I'm the SysAdmin has 3 databases we support - DB2 (Linux and AIX), SQL Server (financial product decision made outside of our department without our consultation) and PostgreSQL.

    DB2 runs our core database for our enterprise application. All databases were investigated at the onset of this project and DB2 came out on top. SQL Server is in house for a shitty financial package (Navision) and another legacy system. PostgreSQL is our data warehouse.

    Because of some issues surrounding our DBA team and the fact that SysAdmins often have to cameo as DBAs in a quick pinch, I've come to learn quite a bit about DB2. It has its warts and bugs but it's 100 times more robust than PostgreSQL and 1000 times more robust than MySQL (which we use for a few self-managed databases here and there - intranet stuff/nagios).

    We're currently migrating our data warehouse to a new hardware set and at the same time upgrading from 8.0.3 to 8.1 of PostgreSQL. This requires a restore of the database to migrate. This 80GB datawarehouse took the better part of a day to restore on a box that was 10 times faster than the original. Reading from different volumes on different controllers on our SAN on an x445 with 8 CPUs and 16GB of memory took 8 hours to restore!

    This box used to run DB2 on Linux (we just migrated to AIX and a new SAN) and could restore a 100GB production database in 45 minutes.

    The box wasn't being used. I/O wait was at 1% the entire time. Each of the 8 CPUs was 90% idle the entire time. Of course memory was maxed out because PostgreSQL uses the OS to cache for it but we weren't using any swap. This was using the native PostgreSQL compressed backup format.

    Oddly enough for PostgreSQL, I had less insight into what the database was doing during that time than I would have with DB2.

    In DB2 I can make memory changes on the fly - db cfg, dbm cfg and speed this process up. I can use db2mtrk to see what my memory is doing. I have things like bufferpools to allocate memory where it's really needed.

    With postgresql, I can change a text file (which I love) but have to restart postgres for a lot of them to take effect. Some db2 changes require an instance restart as well but not many anymore.

    Some of the problem lay with me and I'll admit that but some also lay with PostgreSQL.

    The whole point is that DB2 and Oracle don't normally go after the same market as MySQL and PostgreSQL. Are there companies using those databases in place of DB2 or Oracle? Sure. And I'm sure they're very happy and have a nice humming system. Our warehouse runs wonderfully on PostgreSQL and there are no complaints but more often than not, the markets simply don't intersect.
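Putting the two restore figures quoted above side by side as throughput makes the gap concrete. This is plain arithmetic on the numbers in the comment (80GB in 8 hours, 100GB in 45 minutes). The databases, sizes, and backup formats differ, so treat it as a rough comparison only.

```python
def throughput_mb_per_s(size_gb: float, hours: float) -> float:
    """Average restore throughput in MB/s (1 GB taken as 1024 MB)."""
    return size_gb * 1024 / (hours * 3600)


if __name__ == "__main__":
    pg = throughput_mb_per_s(80, 8)        # PostgreSQL warehouse restore
    db2 = throughput_mb_per_s(100, 0.75)   # DB2 production restore on the same box
    print(f"PostgreSQL: {pg:.2f} MB/s")
    print(f"DB2:        {db2:.2f} MB/s")
    print(f"Ratio:      {db2 / pg:.1f}x")
```

A sustained rate of under 3 MB/s with idle CPUs and 1% I/O wait fits the observation that the box was barely being used during the restore.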

  12. Re:It may be quicker to implement. . . on Microsoft Claims Firms 'Hitting a Wall' With Linux · · Score: 1

    I like most of what you're saying but I have to disagree with the IDE comment. I've been a Zend Studio user for years (since the betas) and even though I'm moving more and more into the RoR camp, I wouldn't think of using anything other than Zend Studio for my development of php. When you combine it with Zend Studio Server, you've got an awesome environment.

    I would agree that time-to-market as it were is much quicker under Windows when using the Microsoft dev tools and a total Microsoft environment but the long term cost is much greater for the reasons you mentioned above. The cost is also greater in terms of licensing and the ability to scale n-way.

  13. Re:Changing the default to GNOME *is* the indicati on Slashback: KDE, Tsunami Hacker, and Image Bugs · · Score: 1

    As nice as the free argument used to be, it's not really the case starting with QT4. QT, as of version 4, is now opensource on all 3 major platforms. Trolltech has shifted to the embedded market for most of its revenue.

    I personally prefer GTK for no other reason than aesthetics. Even with theming, something about QT has always seemed off to me.

  14. Re:Who does actually host the alt.binary.* groups? on GUBA makes Usenet search easy as Google · · Score: 1

    Commercial news servers. Pure and simple. I have an account that has different servers with different retention times for binaries servers vs. text only. I can download up to 1GB a day on each server.

  15. Re:I'd rather see Linux VServer included on Red Hat Wants Xen In Linux Kernel · · Score: 1

    You have a misunderstanding of what Xen really is. Vserver is NOT the same thing. Vservers are nothing more than a fancy chroot and some other minor things. Mind you, I am NOT mocking or disparaging the project in any way. There is a place for it but it does NOT create the same security domain that Xen does. VServer is OS-level virtualization; Xen is paravirtualization.

    Xen isn't TRYING to replace the vserver market. It's trying to compete with the market of an IBM pSeries machine.

    Until Xen came along, the only thing you had in the Intel market was VMWare ESX server or maybe the HP Superdome (not up to snuff on my Superdome)

    Check these links for what Xen is trying to match:
    http://en.wikipedia.org/wiki/Hypervisor
    http://en.wikipedia.org/wiki/IBM_p5

    Let me say this though, I agree that this is the outcome of large commercial interests in Linux (which I don't object to mind you). RedHat wants to beat Microsoft to market with a good virtualization product. Thus they are going to push hard to get this into the kernel so it's much easier to support (i.e. no backports or extra work involved beyond making it RHEL ready).

  16. Main reason on Why Do You Block Ads? · · Score: 1

    The main reason I started blocking ads is because they slowed down the loading of the website. I don't typically block unobtrusive ads that are hosted by the same site as I'm visiting.

    I did, however, go out of my way to find out who was hosting and block that stupid fucking intellitext web stuff. Every article I've ever read that uses that was dog-ass slow in loading and froze my browser while it waited to load this huge ass remote javascript.

    One thing I don't block? Google ads. They're unobtrusive and are typically pretty relevant*

    *Except in cases where nextag wants to help me find the lowest prices on "linux kernel driver"

  17. Re:no sql? on TurboGears: Python on Rails? · · Score: 1

    Finally someone after my own heart.

  18. Re:MySQL != SQL on MySQL Moves to Prime Time · · Score: 1

    True. We actually do that in some places. We do all operations on the database as dirty reads except in several places where we simply grab a connection from Hibernate and hit the database directly. Saves us the hassle of maintaining two connection pools to the database just to get different isolation levels.

    I get the feeling a lot of times that many ORMs lull the developer into a false sense of security. I can't count how many times our DBAs have heard "This part of the application is performing slow" only to get a statement back from the developer like "I don't know what the query is because Hibernate is generating it" when they ask for a copy of the query to run EXPLAIN against.

    It seems almost counter-intuitive to development when the DBA team has to go back and create a view or an sproc just to make it perform.

    Another gripe I have with Hibernate is the mappings. My understanding is that many of the mappings have to be manually maintained or that hibernate just has problems with some data types? One thing I've grown to love about ActiveRecord is how it handles its mappings. Then again I've not thrown our database at it yet ;)

  19. Re:MySQL != SQL on MySQL Moves to Prime Time · · Score: 2, Informative

    As someone who works for a company that uses Hibernate pretty heavily, ORM is not the panacea that everyone claims. I like ORM as much as the next guy but in an effort to write generic SQL, your ORM will usually use a pretty inefficient route.

    Hibernate and ActiveRecord don't run EXPLAIN plans on queries. If you've ever looked at some of the SQL generated by hibernate, it can make you cringe. We created indices to match what hibernate was using only to move the logic into an sproc to get the performance we required.

    ORMs are nice from the developer side of things but can be a bitch from the DBA side of the house.

  20. Re:Just go with Big Iron on Clustering vs. Fault-Tolerant Servers · · Score: 1

    But that only addresses hardware. What about the application layer? What about your database or whatever else you're running on that mainframe? Sure you can partition out the fault-tolerant hardware of the mainframe into N+1 partitions but does your application support it?

  21. Re:Google as an example on Clustering vs. Fault-Tolerant Servers · · Score: 1

    "Shared-nothing clusters simply require high-speed interconnects for transactional applications. Data changes must pushed everywhere, immediately, before the transaction commits. I don't see how you get around that."

    It depends on the implementation. DB2 HADR has three modes - active sync, passive sync and near sync. It all depends on the SLA you have with your enterprise.

    In our case, we're considering a near sync implementation. We recently had a major outage that was not helped by any level of HA implementation we were currently running. We had DB2 write a bad transaction (don't ask me how - they say because of memory corruption in the instance) and thus couldn't even bring the instance online. We couldn't failover because our data was corrupt. In a realtime sync between two shared-nothing nodes, this corruption would have spread to our other server as the transactions got replicated.

    With the near sync option, we have a tunable parameter (20 minutes was all we needed) that says always stay this far behind the primary node. That way we could have had business continuity. In our case, we had to restore from backup and reenter 6 hours of financial transactions. We couldn't determine where the corrupt transaction occurred so we had to go to our last good full backup. We couldn't even roll-forward.

    No amount of HA/Fault-tolerance can fix that problem.
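The "always stay this far behind the primary" idea can be shown with a toy model: the standby receives every transaction but only applies those older than the configured delay, so a bad transaction noticed within the window never reaches the standby's data. The class and method names are hypothetical, and this only illustrates the concept. It is not how DB2 HADR is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class DelayedStandby:
    """Toy standby that applies transactions only after delay_s seconds."""
    delay_s: float
    pending: list = field(default_factory=list)   # (commit_time, txn) received, not applied
    applied: list = field(default_factory=list)   # txns already applied to standby data

    def receive(self, commit_time: float, txn: str) -> None:
        self.pending.append((commit_time, txn))

    def apply_up_to(self, now: float) -> None:
        """Apply every pending transaction committed at or before now - delay_s."""
        cutoff = now - self.delay_s
        ready = [p for p in self.pending if p[0] <= cutoff]
        self.pending = [p for p in self.pending if p[0] > cutoff]
        self.applied.extend(txn for _, txn in ready)

    def discard_pending(self) -> None:
        """Drop everything not yet applied, e.g. after corruption is detected."""
        self.pending.clear()


if __name__ == "__main__":
    standby = DelayedStandby(delay_s=20 * 60)      # 20-minute lag, as in the comment
    standby.receive(0, "txn-1")
    standby.receive(1000, "txn-2")
    standby.receive(1900, "corrupt-txn")
    standby.apply_up_to(now=2000)                  # only txn-1 is old enough
    standby.discard_pending()                      # corruption noticed in time
    print(standby.applied)
```

The trade-off is the one described above: the standby stays clean, but whatever was still pending when it took over has to be re-entered.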

  22. Re:Cluster Fault Tolerant Humans! on Clustering vs. Fault-Tolerant Servers · · Score: 1

    On line item one:

    We actually just went through this same thing but it was an actual employee of the datacenter and not some kid.

    It seems that a water sensor was malfunctioning and reporting water under the floor. The onsite tech didn't follow procedure and instead of checking first, he hit the big red button.

    Boom, datacenter down for 4 hours while they figured out what was going on. And mind you this was an AT&T Global hosting center and POP here in Atlanta.

    Now the next 3 days has the following going on at that datacenter where we're a customer:

    1 - realize that a change control process wasn't in place well enough to document changes that had been made to various switches and routers. Somebody forgot a "write mem" here and it wasn't documented.

    2 - realize that some equipment can occasionally fail after having not been power cycled for a year or two.

    3 - some servers after having uptimes of a year decide that a filesystem check is in order and, oh look, I just lost two drives because no one bothered to check that small server in the back somewhere that was handling DNS as a quick solution.

    4 - realize that some things have to be brought up in a specific order and that the last round of new equipment wasn't added to the checklist.

    I'm sure number 3 didn't happen but I've seen it in other places. I can say for a fact that the extended outage had some traces of 1 and 2 in there.

  23. Re:Well, let's see on Clustering vs. Fault-Tolerant Servers · · Score: 1

    And what does that get you? If you're wanting to cluster/load-balance a web server or an app server, you're fine.

    But what about a database? Does mysql support N-way replication? Postgres? I think you can do it with Oracle RAC?

    Again, a big cabinet full of servers is nice if you're applying it to the right application.

  24. Re:Horses 4 Courses - They are NOT mutually exclus on Clustering vs. Fault-Tolerant Servers · · Score: 1

    Geographic Load balancing is a bitch. Especially if you don't have dark fiber between datacenters/facilities.

    We're just now starting to investigate HAGEO on AIX, geographic mirroring on our SAN and Q-Series replication for DB2. I know it's important but it really adds to the complexity of the environment tenfold.

  25. It really is two different subjects.... on Clustering vs. Fault-Tolerant Servers · · Score: 1

    or maybe it's just semantics.

    I see Clustering as:
    a group of machines sharing workload - i.e. a cluster of app or webservers.

    We have a cluster of WebSphere servers handling our appserver load behind CoyotePoint load balancers.

    The Coyotepoint LBs operate in an HA/FT mode. If one goes down, the other picks up right off the bat including existing state (which client was "stuck" to which server). I call this HA.

    It all depends on the product and the vendor. We have DB2 operating in an Active/Passive HACMP cluster but the workload isn't shared. As far as licensing, we only have to have licenses for the active server (according to IBM).

    There's also the shared-nothing vs shared-everything model. We currently run a shared-everything model for our database although DB2 has a feature inherited from Informix called HADR which is a shared-nothing model. It's still active/passive but the passive box is in a state of concurrency with the active primary based on user-configurable parameters, i.e. update secondary node every 60 seconds or keep secondary node updated in real-time.

    Honestly it all depends on the actual implementation. Look past the vendor cruft and marketspeak to the actual implementation.

    They may be selling you a database cluster but it might require schema changes and can really overcomplicate the problem. What are you trying to achieve? Business continuity? High Availability? Distributed workload? Some products support one but not the other. Some also support all of them with differing levels of complexity.

    If you're looking for a linux-only solution, we're currently running two - SteelEye's LifeKeeper on one configuration and the linux-ha stuff (with drbd) on another.

    I'd be happy to provide answers to any business related experience via email.