Slashdot Mirror


Large File Problems in Modern Unices

david-currie writes "Freshmeat is running an article that talks about the problems with the support for large files under some operating systems, and possible ways of dealing with these problems. It's an interesting look into some of the kinds of less obvious problems that distro-compilers have to face."

17 of 290 comments

  1. Not really that groundbreaking... by CoolVibe · · Score: 4, Interesting

    The problem is nonexistent in the BSDs, which use the large file (64 bit) versions anyway. And if your OS (like Linux) doesn't use the 64 bit versions by default, you just have to compile with a certain -D flag. Whoopdiedoo. Not so hard. Recompile and be happy.
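For context, the 2 GB ceiling comes from file offsets being held in a signed 32-bit integer. On Linux with glibc the flag meant here is presumably `-D_FILE_OFFSET_BITS=64`, which widens `off_t` to 64 bits. The Python sketch below only illustrates the arithmetic: ctypes integers stand in for the two `off_t` widths, and the function names are made up for illustration.

```python
import ctypes

# Largest offsets representable in signed 32-bit and 64-bit integers.
MAX_OFF32 = 2**31 - 1   # 2 GiB minus one byte
MAX_OFF64 = 2**63 - 1   # roughly 9.2e18 bytes


def as_off32(offset):
    """Store an offset the way a 32-bit off_t would (hypothetical helper)."""
    # ctypes does no overflow checking, so out-of-range values wrap around.
    return ctypes.c_int32(offset).value


def as_off64(offset):
    """Store an offset the way a 64-bit off_t would (hypothetical helper)."""
    return ctypes.c_int64(offset).value


if __name__ == "__main__":
    print(as_off32(MAX_OFF32))      # still fits
    print(as_off32(MAX_OFF32 + 1))  # wraps to a negative offset
    print(as_off64(MAX_OFF32 + 1))  # fits comfortably in 64 bits
```

One byte past the 32-bit maximum, the offset wraps negative, which is why unmodified programs fail on files past that size.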

  2. Re:Why large files by Big+Mark · · Score: 5, Insightful

    Video. Raw, uncompressed, high-quality video with a sound channel is fucking HUGE. Look how big DivX files are, and they're compressed many, many times over.

    And compressing video on-the-fly isn't feasible if you're going to be tweaking it, so that's why people use raw video.

    -Mark

  3. Re:Why large files by Anonymous Coward · · Score: 5, Interesting

    Real analytical work can easily produce files this large. Output for analyses of structures with more than half a million elements and several million degrees of freedom can EASILY produce output of over two gigs. Yes, these results can and should be split, but sometimes it makes sense to keep them together as a matter of convenience. Plus, there IS a small performance hit when dealing with multiple files on most of the major FEA packages.

  4. Re:Why large files by hbackert · · Score: 4, Informative

    vmware uses files as virtual disks. 2GB would be a really, really small disk. UML does the same, using the loop device feature of Linux. Again, a filesystem in a file. Again, 2GB is not much. Simulating 20GB would need 10 files.

    Feels like 64kbyte segments somehow...and I really don't want to have those back.
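The segment comparison can be made concrete. Below is a rough sketch of the bookkeeping needed when one virtual disk is spread over several size-limited files. It assumes a round 2 GiB chunk size (a real 32-bit limit is one byte less), and the helper names are hypothetical, not taken from VMware or UML.

```python
CHUNK_BYTES = 2 * 1024**3  # assumed chunk size: a round 2 GiB


def chunks_needed(disk_bytes, chunk_bytes=CHUNK_BYTES):
    """Number of chunk files required to hold the whole disk (ceiling division)."""
    return -(-disk_bytes // chunk_bytes)


def locate(disk_offset, chunk_bytes=CHUNK_BYTES):
    """Map a byte offset on the virtual disk to (chunk index, offset inside chunk)."""
    return divmod(disk_offset, chunk_bytes)


if __name__ == "__main__":
    disk = 20 * 1024**3
    print(chunks_needed(disk))    # chunk files for a 20 GiB disk
    print(locate(5 * 1024**3))    # where byte offset 5 GiB lives
```

Every read or write has to go through a mapping like `locate`, much like the segment:offset pairs of old.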

  5. data warehouse, and any database for that matter by CrudPuppy · · Score: 5, Insightful

    my data warehouse at work is 600GB and grows at a rate of 4GB per day.

    the production database that drives the sites is like 100GB

    welcome to last week. 2GB is tiny.

    --
    A year spent in artificial intelligence is enough to make one believe in God.

  6. It will happen with time_t, too by wowbagger · · Score: 5, Informative

    We are seeing problems with off_t growing from 32 to 64 bits. We are also going to see this when we start going to a 64 bit time_t (albeit not as badly - off_t is probably used more than time_t is.)

    However, the pain is coming - remember we have only about 35 years before a 64 bit time_t is a MUST.

    I'd like to see the major distro vendors just "suck it up" and say "off_t and time_t are 64 bits. Get over it."

    Sure, it will cause a great deal of disruption. So did the move from aout to elf, the move from libc to glibc, etc.

    Let's just get it over with.
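The deadline follows from a signed 32-bit time_t counting seconds since 1970-01-01 UTC. A small sketch of that arithmetic, assuming the usual Unix epoch; the function names are only for illustration:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TIME_T_MAX_32 = 2**31 - 1  # largest value of a signed 32-bit time_t


def last_32bit_second():
    """Last moment a signed 32-bit time_t can represent."""
    return EPOCH + timedelta(seconds=TIME_T_MAX_32)


def after_wraparound():
    """Where the clock lands if the counter wraps to the most negative value."""
    return EPOCH + timedelta(seconds=-(2**31))


if __name__ == "__main__":
    print(last_32bit_second().isoformat())
    print(after_wraparound().isoformat())
```

The first value falls in January 2038, which is where the "about 35 years" comes from.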

  7. A woman's perspective . . . by pariahdecss · · Score: 5, Funny

    So my wife says to me, "Honey, do I look fat in this filesystem?"
    I replied, "Sweetie, I married you for your trust fund, not your cluster size."

  8. Re:Why large files by CoolVibe · · Score: 5, Interesting

    raw video can easily exceed 2 GB in size. Why raw video? Because (like others said) it's easier to edit. Then you encode to MPEG2, which will shrink the size somewhat (usually still bigger than 2 GB, ever dumped a DVD to disk?), so it'll be "small" enough to burn onto a DVD or somesuch. Oh, editing 3 hours of raw wave data also chews away at the disk size. Also, since you need to READ the data from the media to see if it looks nice, you need to have support for those big files as well. Right, now why don't we need files bigger than 2 GB again? Well?

    Oh, you're still not convinced, well see it this way: when in the future will you ever need to burn a DVD?

    Well? A typical one sided DVD-R holds around 4 GB of data (somewhat more). If you use both sides, you can get more than 8 GB of data on it. That's way bigger than 2 GB, no? Now, how big must your image be before you burn it on there? Well?

    Right...
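To put a number on how quickly raw video reaches the limit, here is a back-of-the-envelope sketch. The frame size, pixel depth and frame rate are assumptions chosen for illustration (PAL-sized frames, 2 bytes per pixel, 25 frames per second, no audio); none of them come from the comment.

```python
LIMIT_BYTES = 2**31 - 1  # largest file a signed 32-bit offset can address


def raw_video_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Data rate of uncompressed video, ignoring audio and container overhead."""
    return width * height * bytes_per_pixel * fps


def seconds_until(limit_bytes, bytes_per_second):
    """How long a stream at the given rate takes to reach the limit."""
    return limit_bytes / bytes_per_second


if __name__ == "__main__":
    # Assumed format: 720x576 pixels, 2 bytes per pixel, 25 fps.
    rate = raw_video_bytes_per_second(720, 576, 2, 25)
    print(rate)                               # bytes per second
    print(seconds_until(LIMIT_BYTES, rate))   # seconds of footage per 2 GB file
```

Under these assumptions a single file holds less than two minutes of footage.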

  9. Re:Wrong point of view. by KDan · · Score: 4, Insightful

    Two words:

    Video Editing

    Daniel

    --
    Carpe Diem

  10. Funny...in AIX... by cshuttle · · Score: 4, Informative

    We don't have this problem: 4 petabyte maximum file size, 1 terabyte tested at present. http://www-1.ibm.com/servers/aix/os/51spec.html

  11. Have you ever seen some people's email? by alen · · Score: 4, Insightful

    On the Windows side many people like to save every message they send or receive to cover their ass just in case. This is very popular among US Government employees. Some people who get a lot of email can have their personal folders file grow to 2GB in a year or less. At this level MS recommends breaking it up since corruption can occur.

    1. Re:Have you ever seen some people's email? by nentwined · · Score: 5, Funny

      I agree with MS on this one. government employees shouldn't be allowed to hold their positions for longer than a year. DOWN WITH GOVERNMENTAL CORRUPTION! ... :)

      --
      heaven

  12. Re:Wrong point of view. by heby · · Score: 5, Funny

    "oh yes, those were the days." - misty eyed smile - "when i was young and filesizes were small. you should have seen it. today's youth is so spoiled that they don't even learn assembly language any more. i tell you, you're all going to die because of your large files, yes, die!" - madly waves his cane in the air - "2gb, that's more than anybody will ever need and you are greedy for even more! the holy bit will punish you for this, it will!" - dies of a heart attack.

  13. Re:Wrong point of view. by cvande · · Score: 5, Insightful

    In a perfect world everything is small and manageable. Unfortunately, some databases need tables BIGGER than 2gb. Even splitting that table into multiple files still finds you with files larger than two gb. Try adding more tables? OK. Now they've grown to over 2gb and the more tables the more complicated everything gets. I still need to back these suckers up and a backup vendor that I won't name can't help me because their software wasn't large file (for Linux) ready. So let's get into the game with this and make it the default so we don't need to worry about these problems in the future. Linux IS an enterprise solution.....(my $.02)

  14. Re:Wrong point of view. by costas · · Score: 4, Insightful

    Maybe in your problem domain that's true. I work with retailer data mines and we've hit the 2GB file limit, oh, 4-5 yrs ago? We've been forced to partition databases causing maintenance issues, scalability issues, and the like, just because of the size of a B-tree index.

    True, it looks like the optimal solution is lower-level partitioning, rather than expanding the index to 64bits (tests showed that the latter is slower), but that still means that the practical limit of 1.5-1.7 GB per file (because you have to have some safety margin) is far too constraining. I know installations who could have 200GB files tomorrow if the tech was there (which it isn't, even with large file support).

    I am also guessing that numerical simulations and bioinformatics apps can probably produce output files (which would then need to be crunched down to something more meaningful to mere humans) in the TB range.

    Computing power will never be enough: there will always be problems that will be just feasible with today's tech that will only improve with better, faster technology.

  15. Re:Wrong point of view. by Yokaze · · Score: 4, Interesting

    I'm not a specialist on this matter, so maybe you can enlighten me as to where I am wrong or have misunderstood you.

    > fragmentation: large files increase the fragmentation of most file systems
    What kind of fragmentation?

    Small files lead to more internal fragmentation.
    Large files are more likely to consist of more fragments, but when splitting this data into small files, those files are fragments of the same data.

    >entropy pollution
    What kind of entropy? Are you speaking of compression algorithms?

    Compression ratios are actually better with large files than small files, because similarities between files across file-boundaries can be found. That is why gzip (or bzip2) is run on a single large tar-file. (Simple test: zip many files with compression, then zip the same files without compression and compress the resulting file.)
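The same test can be sketched in a few lines of Python. zlib stands in for gzip/zip here, and the "files" are made-up, deliberately similar byte strings, so this illustrates the effect rather than benchmarking any real tool.

```python
import zlib


def make_files(count=200):
    """Build many small, similar in-memory 'files' (synthetic sample data)."""
    line = "record {}: status=ok host=example.org path=/var/log/app\n"
    return [(line.format(i) * 5).encode() for i in range(count)]


def compressed_separately(files):
    """Total size when every file is compressed on its own."""
    return sum(len(zlib.compress(f)) for f in files)


def compressed_together(files):
    """Size when the files are concatenated first, then compressed once."""
    return len(zlib.compress(b"".join(files)))


if __name__ == "__main__":
    files = make_files()
    print(compressed_separately(files))
    print(compressed_together(files))
```

With data like this, the single stream comes out smaller because the compressor can reuse matches across file boundaries and pays its per-stream overhead only once.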

    >data pollution
    How should limiting file size improve that situation? Then, people tend to store data in lots of small files. What a success. People will waste space, whether there is a file size limit or not.

    >These limits are there for very good reasons and in my opinion they are even much too big.

    Actually, they are there for historical reasons.
    And should a DB spread all its tables over thousands of files instead of having only one table in one file and mmapping this single file into memory? Should a raw video stream be fragmented into several files to circumvent a file limit?

    >[...] original K&R Unix [...] was much faster than modern systems

    Faster? In what respect?

    --
    "Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"

  16. Re:BeOS Filesystem by Yokaze · · Score: 4, Informative

    Mine is bigger than yours :)

    Linux XFS: 9 exabytes

    Also supports extended attributes.

    --
    "Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"