Slashdot Mirror


Increasing the Transfer Rate?

Nintendork asks: "I recently started a new job as a resident computer geek and am analyzing the performance of our SQL server. I did quite a bit of research and would like an opinion from the Slashdot community on my proposed solution for increasing the STR (Sustained Transfer Rate) from the server to the workstations. The server (Compaq ProLiant ML530) has sixteen 10,000 RPM drives with an average STR of ~43MB/sec. per drive. Fourteen are used for two RAID 5 logical drives (7 physical drives per logical drive). The remaining 2 drives are spares in case one fails. Currently, they're all connected to a Compaq fibre RA4000 adapter, which runs at 100MB/sec. from what I could find in a jungle of fibre information. Reasoning tells me I have a huge bottleneck at the fibre adapter and the 100baseT NIC. I should also mention that the server has 2 PCI buses. One runs at 64-bit and 66MHz and has 2 PCI slots. My proposed setup would be to back up all the data and create a new array with a few hardware modifications. Take out the fibre adapter and use two dual-channel 64-bit 66MHz Ultra160 adapters in the two 64-bit 66MHz PCI slots (4 drives per channel). Take out the 100baseT NIC and start a gigabit backbone." Would this significantly increase performance? Read on if you'd like to check out the numbers on the new setup.

"From what I've learned thus far, the proposed setup would be a blazingly fast file server approaching ludicrous speed. Let me break it down. Data can be read from the drives at an STR of ~602MB/sec. (~43MB/sec. * 14 drives). Each Ultra160 channel has an STR of 132MB/sec. This is a bearable bottleneck that reduces the overall STR to ~528MB/sec. (132MB/sec. * 4 channels). The 64-bit 66MHz PCI bus has an STR of 528MB/sec. (8 bytes * 66MHz), which is an exact match for the 4 Ultra160 channels! From there, I assume the data goes out the NIC, which is on a gigabit backbone. This would provide an STR of ~528MB/sec. to the workstations. Unless I'm missing something, such as a possible bottleneck between the PCI bus and the NIC, my reasoning makes gosh darned perfect sense!

Thanks in advance for any insight you all can provide on this issue."
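The arithmetic above treats the server as a serial pipeline whose sustained rate is set by its slowest stage. A minimal Python sketch of that reasoning, using only the figures quoted in the question (the stage names are illustrative, and the 132MB/sec. per Ultra160 channel is the asker's figure, taken as given):

```python
def bottleneck(stages):
    """Return (name, MB/s) of the slowest stage in a serial pipeline.

    Every byte has to pass through every stage, so the sustained rate of
    the whole chain can be no higher than its slowest link.
    """
    name = min(stages, key=stages.get)
    return name, stages[name]


# Figures from the question, in MB/s.
stages = {
    "drives": 43 * 14,    # 14 drives at ~43MB/s each = 602
    "ultra160": 132 * 4,  # 4 channels at the asker's 132MB/s each = 528
    "pci_64_66": 8 * 66,  # a 64-bit bus moves 8 bytes per 66MHz clock = 528
}

print(bottleneck(stages))  # ('ultra160', 528)
```

Like the question, this sketch stops at the PCI bus; the NIC stage is not modeled here.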

4 of 34 comments

  1. Sounds like your NIC is the bottleneck by swillden · · Score: 4, Insightful

    It may be too early in the morning, and I'm just not thinking straight yet, but you calculate an STR of 528MB/s through the PCI bus, and you're trying to shunt that data onto a 1000Mb/s network (notice the small b). 1000Mb/s is 125MB/s at most, and that's complete saturation, which Ethernet doesn't handle well. I'd be surprised if you get better than 100MB/s over that network. Sounds nice to me, but then I do fine with an 11Mb wireless network myself :-)

    You also called the box running this stuff a SQL server. Relational databases are very, very slow. I don't know what's under the hood of a ProLiant ML530 (and I'm too lazy to look it up) but I doubt it has the nads to process 528MB/s of SQL queries.

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  2. database? by Anonymous Coward · · Score: 1, Insightful

    Great; you can transfer the data fast, assuming you actually have that much data to transfer.

    Are the queries optimized for the database? Are the right indexes in use? With bad index choices, you can have any disk array and any network configuration you want, and it could still be slow.

    A machine running a database isn't a file server (and shouldn't be). Does it have enough memory to do its work? Speeding up the disks is great, but you have to analyze what the machine is actually doing with the data it reads.

    1. Re:database? by Bryan+Andersen · · Score: 4, Insightful

      Yep.

      Look for the real bottleneck. Check which queries run all the time. Are there indexes for them?

      I've run into query nightmares quite often. One DB I helped tune sort of had an index the DB could use to narrow down the search, but it still left over a thousand records to search through to find the needed one. They were calling this query a few times a minute during business hours. A simple index addition took them from 100% saturation on the disk IO system to less than 1%. Sounds extreme, but it isn't. Bad indexing can be a real killer. So can poor selection criteria in a query. Both were at fault in this case; it's just that the DB's query optimizer could deal with the poor selection criteria once it had a better-fitting index. Bad indexes and queries force the DB to build temporary indexes, which usually can only be used for that specific run of the query. This takes disk IO, memory, and CPU time that could be better spent elsewhere. It also makes for much duplicated effort.

      Another optimization fuckup. One place decided that what they needed was more RAM. Users weren't getting their query results back fast enough, and the network wasn't saturated. It turned out they already had over three times more RAM than the total size of their DB. Net effect: one happy hardware salesman and no DB speedup. They had a CPU bottleneck. Going to an eight-way system and distributing the RAM they already had across the six new CPUs turned their problem into one of network access speed. Adding three more 100Mbit NICs solved the new problem and gave them some breathing room. Disk IO still wasn't an issue, though write speed looked like it would be the next bottleneck: during a few nightly batch jobs they went from about 12% of maximum write throughput to close to 65%, and even then it would only matter if the jobs weren't done by morning. The rest of the time they spent near 0.1% disk utilization. Reads? What reads? After the whole DB got cached in RAM, the system only wrote to the disks. Consider that when you are evaluating benchmarks.

      Most databases I see are horrors when it comes to indexing and physical database layout. These are the ones put together by professionals. The ones put together by amateurs make the professionals look like saints. If you put the before-image, after-image, and data files on the same disk subsystem, they all fight for IO time and you lose speed; you'll be about half as fast. The before-image and after-image files can overlap their writes, but the data file can't be written until the before-image data is on disk, and the after-image data has to wait until the data segment writes have completed. Splitting them onto their own disks allows a significant increase in speed: the before-image writes for one query can now overlap the data writes of the previously processed query, and the after-image writes can be overlapped too. Also get additional disk subsystems for your log files and for your index files. Spread the IO out. This also provides better data integrity: if your data segment's disk subsystem quits, you can rebuild the DB up to the last commit from a backup and the after-image file.

  3. Other considerations by Twylite · · Score: 3, Insightful

    First, because of data locality, you are never going to reach the STR you are pondering. Queries would have to magically occur in such a way that the disk load was perfectly balanced across all drives. Your worst-case STR is therefore 43MB/sec (a single drive), ASSUMING you benchmarked the STR on random access rather than sequential access.

    Second, as many others have said, use RAID 0+1; RAID 5 has overheads that can involve other drives in the array.

    Third, your Gb Ethernet is gigaBIT. That provides a maximum throughput of 125MB/sec on a switched network. To improve on this you could use multiple Gb NICs... but they sit on the PCI bus along with the RAID adapters. This doesn't necessarily halve performance (it could in the worst case), but the degradation depends on the size of the request/response relative to the data that must be retrieved and processed.

    Fourth, you have a couple of more obvious software overheads. Issuing a read or write takes time. Software has to interpret the query, formulate a strategy to answer it, and make read requests. Those requests are processed by file system handlers, which translate them into raw disk operations. Your file system and database software therefore add a lot of overhead and latency that will reduce the STR.

    Basically, you need a heck of a lot of memory and a vast amount of processing power to keep up with the potential of the hard drives.

    --
    i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
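The gigabit point made by swillden and Twylite can be checked with a min-of-stages model, in which the slowest stage sets the sustained rate. A rough sketch, assuming (hypothetically) that every NIC shares the 64-bit/66MHz bus with the Ultra160 adapters, so that in Twylite's worst case each byte crosses that bus twice (disk to memory, then memory to NIC). The raw 125MB/s per NIC is a ceiling, not what Ethernet actually delivers:

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert a line rate in megabits/s to megabytes/s (8 bits per byte)."""
    return mbit_per_s / 8


def served_rate(n_nics, pci_mb_s=8 * 66, shared_bus=True):
    """Upper bound on MB/s delivered to the network (worst-case sketch).

    Assumption: if the NICs sit on the same PCI bus as the disk adapters,
    each byte crosses it twice, halving the usable bus bandwidth.
    """
    stages = {
        "drives": 43 * 14,
        "ultra160": 132 * 4,
        "pci_bus": pci_mb_s / 2 if shared_bus else pci_mb_s,
        "gigabit_nics": n_nics * mbit_to_mbyte(1000),
    }
    return min(stages.values())


for n in (1, 2, 3):
    print(n, served_rate(n))
# 1 125.0
# 2 250.0
# 3 264.0
```

With one NIC the network caps delivery at 125MB/s, far below the ~528MB/s in the question. Adding NICs helps only until the shared bus becomes the limit. Putting the NICs on the server's second PCI bus would change the `pci_bus` stage; the sketch leaves that case out.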
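The indexing advice from the Anonymous Coward and Bryan Andersen can be checked on most databases by asking the planner how it will run a query. The thread doesn't name the server's database product, so this sketch swaps in Python's built-in `sqlite3` module for a self-contained demo. The `orders` table, its columns, and the index name `idx_orders_customer` are hypothetical. Without an index SQLite scans the whole table; after `CREATE INDEX` it searches the index instead. The exact plan wording varies between SQLite versions:

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each row.
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
# 10,000 hypothetical rows spread over 1,000 customers.
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10_000)],
)

sql = "SELECT * FROM orders WHERE customer_id = ?"
before = query_plan(conn, sql, (42,))

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, sql, (42,))

print("before:", before)  # a full-table SCAN of orders
print("after: ", after)   # a SEARCH using idx_orders_customer
```

The query returns the same rows either way. Only the work needed to find them changes, which is the kind of difference Bryan Andersen describes seeing on the disk IO system.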
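Twylite's first point is that uneven load keeps the array well below 14 x 43MB/s. It can be expressed as a simple model. The sketch assumes (a simplification, not a measurement) that each drive streams its share of the requested bytes in parallel at the same per-drive STR. Under that assumption, the busiest drive finishes last and sets the aggregate rate:

```python
import math


def array_rate(per_drive_mb_s, load_shares):
    """Aggregate MB/s when drives serve their shares of the bytes in parallel.

    If the busiest drive holds a fraction max_share of all the bytes, the
    transfer takes max_share * total / per_drive seconds, so the array
    delivers per_drive / max_share MB/s overall.
    """
    if not math.isclose(sum(load_shares), 1.0):
        raise ValueError("load shares must sum to 1")
    return per_drive_mb_s / max(load_shares)


n = 14
balanced = [1 / n] * n                        # perfectly even load
hot_spot = [0.5] + [0.5 / (n - 1)] * (n - 1)  # half the bytes on one drive
single = [1.0] + [0.0] * (n - 1)              # everything on one drive

print(round(array_rate(43, balanced)))  # 602
print(round(array_rate(43, hot_spot)))  # 86
print(round(array_rate(43, single)))    # 43
```

Only the perfectly balanced case reaches the question's ~602MB/s. Concentrating half the traffic on one drive already cuts the array to twice a single drive's rate.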
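Twylite's second point refers to the RAID 5 small-write penalty. Changing one data block also means updating the stripe's parity, which costs two reads (old data, old parity) and two writes (new data, new parity). A mirrored RAID 0+1 pair needs only two writes. The sketch below shows the parity read-modify-write and its effect on random-write capacity. The 100 IOPS per drive is a made-up round number, not a figure from the thread:

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equal-length blocks; RAID 5 parity is the XOR of a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rmw_parity(old_parity, old_data, new_data):
    """New parity after a small write, without reading the rest of the stripe.

    Needs the old data and old parity (2 reads) before writing the new data
    and new parity (2 writes).
    """
    return xor_blocks(old_parity, old_data, new_data)


def random_write_iops(n_drives, per_drive_iops, write_penalty):
    """Array-wide random-write IOPS given the back-end I/Os per logical write."""
    return n_drives * per_drive_iops / write_penalty


stripe = [b"\x0f\x00", b"\xf0\x01", b"\x33\x02"]  # three data blocks
parity = xor_blocks(*stripe)

new_block = b"\xaa\x03"
new_parity = rmw_parity(parity, stripe[1], new_block)
stripe[1] = new_block
assert new_parity == xor_blocks(*stripe)  # matches a full recompute

print(random_write_iops(14, 100, 4))  # RAID 5: 350.0
print(random_write_iops(14, 100, 2))  # RAID 0+1: 700.0
```

The model ignores controller caching, which can hide part of the RAID 5 penalty. It also leaves out capacity: RAID 0+1 gives up half the raw space to mirroring.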