Large File Problems in Modern Unices
david-currie writes "Freshmeat is running an article that talks about the problems with the support for large files under some operating systems, and possible ways of dealing with these problems. It's an interesting look into some of the kinds of less obvious problems that distro-compilers have to face."
Who needs more than 512k of RAM??
The problem is nonexistent in the BSDs, which use the large file (64 bit) versions anyway. On an OS (like Linux) that doesn't use the 64 bit versions by default, you have to use a certain -D flag. Whoopdiedoo. Not so hard. Recompile and be happy.
Question answered, move along, nothing to see here :)
Cover your eyes and click this link!
Video + Audio >= 2GB
Databases, Movie files, Backup files (think dumps to tapes). Animations, 3D modelling.... Lots of things need a > 2GB file size. Iain
---- "I would be careful in separating your weirdness, a good quirky quantum weirdness, from the disturbed weirdnes
Video. Raw, uncompressed, high-quality video with a sound channel is fucking HUGE. Look how big DivX files are, and they're compressed many, many times over.
And compressing video on-the-fly isn't feasible if you're going to be tweaking with it, so that's why people use raw video.
-Mark
Ever heard of something like movie-editing? You can get huge files really fast.
-- we're dressed in green, and we're feeling mean
Real analytical work can easily produce files this large. Output for analyses of structures with more than half a million elements and several million degrees of freedom can EASILY produce output of over two gigs. Yes, these results can and should be split, but sometimes it makes sense to keep them together as a matter of convenience. Plus, there IS a small performance hit when dealing with multiple files on most of the major FEA packages.
vmware uses files as virtual disks. 2GB would be a really, really small disk. UML does the same, using the loop device feature of Linux. Again, a filesystem in a file. Again, 2GB is not much. Simulating 20GB would need 10 files.
Feels like 64kbyte segments somehow...and I really don't want to have those back.
Come on. Even Bill Gates admitted that half a meg ain't enough.
640K, on the other hand, should be enough for anyone...
-Mark
my data warehouse at work is 600GB and grows at a rate of 4GB per day.
the production database that drives the sites is like 100GB
welcome to last week. 2GB is tiny.
A year spent in artificial intelligence is enough to make one believe in God.
I said this to some unix 'so called experts' in 95, and they said, oh why do you need >2gig
I can just laugh at them now...
Liberty freedom are no1, not dicks in suits.
For when Jaron Lanier decides to update his website with 10,000,000 lines of script
Do not try to read the dupe, thats impossible. Instead, only try to realize the truth
What truth?
There is no dupe
Oh I see now raw video is larger than I thought, oops
Maybe high quality audio+video for say...
making a movie will be larger than that.
I guess a lot of the editing would probably be done scene by scene, and then you could on the fly merge and compress them so that at no point you use more than 2gb, but it seems that if you make a 2 hour dvd it would be nice to keep the 4gb image file on your hard drive if you planned to reburn it.
Not a scattering of scenes from which it would recreate the image on the fly.
It is kind of a dumb question, when we have computers being marketed as home dvd makers, to ask why we would need that big of a file.
Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
I am not agreeing (or disagreeing) with the original post, but having a database > 2 GB has nothing to do with having a single file over 2 GB. A db != a file system (except for MySQL perhaps).
I can think of some:
And that's just without thinking twice...there are probably many more reasons why people would want files >2 GB.
Every expression is true, for a given value of 'true'
We are seeing problems with off_t growing from 32 to 64 bits. We are also going to see this when we start going to a 64 bit time_t, as well (albeit not as badly - off_t is probably used more than time_t is.)
However, the pain is coming - remember we have only about 35 years before a 64 bit time_t is a MUST.
I'd like to see the major distro vendors just "suck it up" and say "off_t and time_t are 64 bits. Get over it."
Sure, it will cause a great deal of disruption. So did the move from aout to elf, the move from libc to glibc, etc.
Let's just get it over with.
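For reference, the deadline mentioned above falls straight out of the width of the counter. A minimal sketch, assuming a signed 32-bit time_t that counts seconds from the Unix epoch (the variable names are just for illustration):

```python
from datetime import datetime, timezone

# Largest value a signed 32-bit time_t can hold (assumption: 31 value bits).
MAX_TIME_T_32 = 2**31 - 1

# The last second such a counter can represent, expressed in UTC.
last_second = datetime.fromtimestamp(MAX_TIME_T_32, tz=timezone.utc)
print(last_second.isoformat())
```

One second later a 32-bit counter wraps, which is why the change has to land well before that date.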
www.eFax.com are spammers
So my wife says to me, "Honey, do I look fat in this filesystem ?"
I replied, "Sweetie, I married you for your trust fund not your cluster size."
Oh, you're still not convinced, well see it this way: when in the future will you ever need to burn a DVD?
Well? A typical one sided DVD-R holds around 4 GB of data (somewhat more), if you use both sides, you can get more than 8 GB of data on it. That's way bigger than 2 GB, no? Now, how big must your image be before you burn it on there? well?
Right...
Dont be the good old fox .)
quote:port 17 udp
Yes. Just like "matrices" is the plural of "matrix". Not that the words have a similar etymology - according to dictionary.com it's, in the authors' words, "A weak pun on Multics".
Switch back to Slashdot's D1 system.
Oh come on, those were fun, when you had to load into memory and uncompress a file larger than that :-)
:-)
Oh the fond memories
Daniel
Carpe Diem
It doesn't give a specific filesize in the article...
sig.
As others have noted, there are plenty of good reasons to have files greater than two gigs including video editing and scientific research. The file size limits aren't there for a very good reason at all. Someone years ago had to weigh whether to make small files take up a huge amount of room by using 64 bit addresses that would allow multi-terabyte files to exist against using 32 bit addresses that would make small files smaller and create a 2 gb file limit. At the time, it made perfect sense because nobody was using files anywhere near 2 gb... But now they are.
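The 2 gb figure follows directly from the width of the offset field. A small sketch, assuming file offsets are stored as signed integers (as off_t is); the helper names are made up for illustration:

```python
import struct

def max_offset(bits):
    """Largest byte offset a signed integer of the given width can hold."""
    return 2**(bits - 1) - 1

def fits(offset, fmt):
    """Return True if offset can be packed with the given struct format."""
    try:
        struct.pack(fmt, offset)
        return True
    except struct.error:
        return False

two_gib = 2**31
print(max_offset(32))        # one byte short of 2 GiB
print(fits(two_gib, "<i"))   # 32-bit signed field
print(fits(two_gib, "<q"))   # 64-bit signed field
```

The first byte past 2 GiB simply cannot be named by a 32-bit signed offset, while a 64-bit field has room to spare.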
Two words:
Video Editing
Daniel
Carpe Diem
I have most all of my older system images available to inspect. The loopback devices under Linux are tailor made for this type of thing.
I am puzzled as to why you mention the seek times. Surely you would agree that the seek time should be only inversely geometrically related to size, the particular factors depending on the filesystem. Any deviation from the theoretical ideal is the fault of a particular OS's implementation. My experience is that this is not significant.
(user dmanny on wife's machine, ergo posting as AC)
We don't have this problem: 4 petabyte maximum file size, 1 terabyte tested at present. http://www-1.ibm.com/servers/aix/os/51spec.html
On the Windows side many people like to save every message they send or receive to cover their ass just in case. This is very popular among US Government employees. Some people who get a lot of email can have their personal folders file grow to 2GB in a year or less. At this level MS recommends breaking it up since corruption can occur.
In a couple of years, will today's large files be considered large? Ten years ago having hundreds of 4MB files on a PC would've been considered crazy. Now everyone with an mp3 player is used to it.
Can anyone give a good reason for needing files larger than 2gb?
Forensic analysis of disk images. And yes, from experience I can tell you that half the file tools on RedHat (like, say, Perl) aren't compiled to support >2GB files.
It's certainly something that George Orwell would have frowned upon, but it's not incorrect sentence construction per se.
PS: Read that Orwell article if you haven't yet, it's really very good
Carpe Diem
Because the sentences mean different things.
/Janne
"It is an interesting problem that some distro-compilers have to face."
talks about the problem facing distro compilers, whereas
"It's an interesting look into some of the kinds of less obvious problems that distro-compilers have to face."
talks about the article addressing these problems.
Trust the Computer. The Computer is your friend.
It has a nice small 1gb filesystem limit. I have partitioned my hard disk into 64 little chunks and it runs very slowly, and unstably, but it's completely open source and I'm happy.
"oh yes, those were the days." - misty eyed smile - "when i was young and filesizes were small. you should have seen it. today's youth is so spoiled that they don't even learn assembly language any more. i tell you, you're all going to die because of your large files, yes, die!" - madly waves his cane in the air - "2gb, that's more than anybody will ever need and you are greedy for even more! the holy bit will punish you for this, it will!" - dies of a heart attack.
Geeks seem to have a weird fascination for strange spellings. "-ces" is the traditional plural ending of Latin words ending in "x". Obviously, "Unix" does not originate from Latin, and "Unices" is thus nothing but a (bad) joke. (The same applies to "emacsen", and there are a few others around as well.)
Because it's not an interesting problem. It's a fucking boring problem if _you_ have to deal with it. But it's interesting to read about because it's the kind of thing you probably haven't thought about if you don't compile distributions. I meant what I wrote.
Many large-scale computing projects easily generate hundreds of gigabytes and even terabytes of data. They are writing to RAID systems and even parallel file systems to improve their IO.
Think beyond the little toy that you use. These projects are using Unix (Solaris, Linux, BSD and even MacOSX) on clusters of hundreds or thousands of nodes.
the use of large files tempts users to store all kinds of redundant, reducible, linear and irrelevant data wasting storage space and I/O time
As opposed to a million 4k files that are each 1k of header?
In a perfect world everything is small and manageable. Unfortunately, some databases need tables BIGGER than 2gb. Even splitting that table into multiple files still finds you with files larger than two gb. Try adding more tables? OK. Now they've grown to over 2gb, and the more tables, the more complicated everything gets. I still need to back these suckers up, and a backup vendor that I won't name can't help me because their software wasn't large file (for Linux) ready. So let's get into the game with this and make it the default so we don't need to worry about these problems in the future. Linux IS an enterprise solution.....(my $.02)
We use a Unidata database here for an ERP system. Each database is more than 2gb apiece (more like 20 gb) of relatively small files; when the directories are tarred for backup they are usually over 2gb, which means that gzip won't compress them. Unless I'm missing something I don't see an alternative to files larger than 2gb in this case. Sure, on the personal computing level the closest thing you probably get is ripping DVDs, but there are other things out there, and I realize this is tiny in comparison to some places.
You obviously have never done any work with video before. Most DV will eat up 2GB easy with 15min of footage or less.
--sdem
It's how you use it.
I e-mailed somebody on the Board of Higher Ed of my State for some answers, and they simply replied
Please call me at #-###-###-###.
Thanks
He has a really good point if mail programs put archives in one big zip-equivalent file, because these CAN get huge.
Cover your eyes and click this link!
I have run into problems trying to compress a tar archive of my home directory which has been around since 1995 when I switched to Linux. The two gig limit runs into trouble here.
The seek times alone within these files must be huge
Who modded that as Insightful? Sure, if you are using a filesystem designed for floppy disks, it might not work well with 2GB files. In the old days, when the metadata could fit in 5KB, a linked list of disk blocks could be acceptable. But any modern filesystem uses tree structures, which make a seek faster than it would be to open another file. Such a tree isn't complicated, even the minix filesystem has it.
If you are still using FAT... bad luck for you. AFAIK Microsoft was stupid enough to keep using linked lists in FAT32, which certainly did not improve the seek time.
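The difference between the two layouts can be sketched in a few lines. This is only an illustration under simplifying assumptions: the block numbers are invented, and a flat list stands in for the tree (a real filesystem walks a few index levels rather than doing a single lookup).

```python
def seek_linked(next_block, first_block, n):
    """Follow a FAT-style chain to the n-th block; returns (block, links followed)."""
    block, steps = first_block, 0
    while steps < n:
        block = next_block[block]
        steps += 1
    return block, steps

def seek_indexed(block_map, n):
    """Look up the n-th block in an index; the cost does not grow with n."""
    return block_map[n], 1

# A hypothetical file of 100,000 blocks.
blocks = list(range(1000, 101000))
chain = {b: b + 1 for b in blocks[:-1]}   # linked list: block -> next block

target = 75_000
linked_result = seek_linked(chain, blocks[0], target)
indexed_result = seek_indexed(blocks, target)
print(linked_result, indexed_result)
```

Both calls land on the same block, but the chain walk touches one table entry per block skipped, so the cost of a seek grows with the offset.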
Do you care about the security of your wireless mouse?
I'd never heard emacsen, but VAXen is commonly used for multiple VAX machines, I believe.
Bitmap files for image setters can easily become huge. Think of 500x100(cm)x1000x1000(pixels).
I just wonder why we don't learn from past (limits) and remove these limits "forever". E.g. 1 month ago I received a question about the possibility of building a 10 TB Linux cluster (physicists are crazy ;-)).
;-)
There surely MUST be some way to do this - I just imagine some file (e.g. defined in LSB) which would define these limits for the COMPLETE system (from kernel, filesystems, utils to network daemons). I know there are efforts toward things like this, but if we'd say (for example) that a distribution in 2004 won't be marked "LSB compatible" if ANY of its programs use any other limits, I think it will create enough pressure on Linux vendors.
Just a crazy idea
1) Splitting up a big file turns an elegant solution into an inelegant nightmare.
2) Instead of 10 different applications writing code to support splitting up an otherwise sound model, why not have 1 operating system have provisions for dealing with large files.
3) You are going to need the bigger files with all those 32 bit wchar_ts and 64 bit time_ts you got!
This is my sig.
Maybe in your problem domain that's true. I work with retailer data mines and we hit the 2GB file limit, oh, 4-5 yrs ago? We've been forced to partition databases, causing maintenance issues, scalability issues, and the like, just because of the size of a B-tree index.
True, it looks like the optimal solution is lower-level partitioning, rather than expanding the index to 64bits (tests showed that the latter is slower), but that still means that the practical limit of 1.5-1.7 GB per file (because you have to have some safety margin) is far too constraining. I know installations who could have 200GB files tomorrow if the tech was there (which it isn't, even with large file support).
I am also guessing that numerical simulations and bioinformatics apps can probably produce output files (which would then need to be crunched down to something more meaningful to mere humans) in the TB range.
Computing power will never be enough: there will always be problems that will be just feasible with today's tech that will only improve with better, faster technology.
the datafile size averages 8GB in the warehouse.
A year spent in artificial intelligence is enough to make one believe in God.
"of the kinds" really adds nothing to the meaning here, nor does "have to"
Thus we have:
The same sentence, but much cleaner!
Thanks! I'll be here all week.
My Ass hurts.
Lmao...
Your other trolls are nice too, but this one is hilarious... "entropy pollution", hehe :)
"Linux of Windows XP bootloader", this one is amazing. I wonder whether it's a typo, or intentional...
Virus comes from latin.
Science Data usually consist of huge multidimensional arrays. I have seen satellite data in huge netcdf files that are very close if not slightly larger than that.
database dumps - one of our smaller database dumps is 2.3 GB compressed. The dumps are the easiest method of backup and distribution - locally and (very) remotely.
Over Christmas and New Years, I helped my wife run a simulation of 1000 different patients for an academic pharmacokinetics paper. The run took ten days and had an input file of about 1.5 GB. If her computer was faster, or she had access to more computers, she would have wanted to simulate more patients and would easily have needed support for files larger than 4 GB. As CPUs get faster and hard disks get larger, there will be much more demand for these large files as well as more than 4 GB per process.
What a fool believes, he sees, no wise man has the power to reason away.
Who are we to tell them what they have to accommodate?
Don't like the way a particular *NIX works? Don't use it.
Try something else.
Another example of the usefulness of large files is database files. At my job, the DB machine (Solaris) doesn't have sufficient disk space to generate the DB dump. The biggest dump is 11GB and I wasn't able to put it on a Linux box (RH 6.2), so I used FreeBSD 4.2 with success.
I remember reading in the BeOS Bible that the BeOS filesystem could contain files as large as 18 petabytes. Makes you wonder two things: What's the biggest filesystem that you could use with a BeOS machine? And why don't other OSs have a filesystem like this? Especially with those awesome extended attributes. I weep for the loss of the BeOS filesystem...
*slight crashing sound*
Yes. Sometimes you need to store a lot of data. Even DVDs have 4.3 GB of data these days. But that's not even much compared to the amount of data we handle in seismic research. I would believe astronomers, particle physicists and lots of other people also routinely handle ridiculous amounts of data.
By the way, in producing the DVD, you would naturally work with uncompressed data. How would you handle that?
The seek times alone within these files must be huge, and it smacks a bit of inefficiency
And because it is inefficient, we should not support it? As a matter of fact, any file larger than one disk-block is inefficient. Maybe we should stop supporting that as well?
sure its just as bad to have an app use hundreds of say 4kb files or so, but two GIGABYTES???
As I've said, it's not really that much, depending on the application.
Now this I can accept. I promise to think about what I write next time. ;)
Just like "matrices" is the plural of "matrix".
"Matrices" is a plural form of "matrix." The other one is "matrixes."
I'm not a specialist on this matter, so maybe you can enlighten me, where I am wrong or misunderstood you.
> fragmentation: large files increase to fracmentation of most file systems
What kind of fragmentation?
Small files lead to more internal fragmentation.
Large files are more likely to consist of more fragments, but when splitting this data into small files, those files are fragments of the same data.
>entropy pollution
What kind of entropy? Are you speaking of compression algorithms?
Compression ratios are actually better with large files than small files, because similarities between files across file-boundaries can be found. Therefore, gzip (bzip2) compresses a single large tar-file. (Simple test: try zip on many files, and then zip without compression followed by compression of the resulting file.)
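The simple test suggested above is easy to approximate in memory. A minimal sketch using zlib (the same deflate algorithm gzip and zip use); the "files" here are invented stand-ins for many small, similar files:

```python
import zlib

# Hypothetical stand-ins for many small, similar files (e.g. log fragments).
files = [
    (f"host=node{i:03d} status=ok path=/var/log/app.log\n" * 4).encode()
    for i in range(200)
]

# Compress each file on its own, as zip does for every archive member.
separate = sum(len(zlib.compress(data)) for data in files)

# Concatenate first, then compress once, as tar followed by gzip does.
combined = len(zlib.compress(b"".join(files)))

print(separate, combined)
```

Deflate only looks back 32 KB, so the gain depends on similar data sitting close together in the combined stream; compressors with larger windows can exploit redundancy across greater distances.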
>data pollution
How should limiting file size improve that situation? Then, people tend to store data in lots of small files. What a success. People will waste space, whether there is a file size limit or not.
>These limits are there for very good reasons and in my opinion they are even much to big.
Actually, they are there for historical reasons.
And should a DB spread all its tables over thousands of files instead of having only one table in one file and mmapping this single file into memory? Should a raw video stream be fragmented into several files to circumvent a file limit?
>[...] original K&R Unix [...] was much faster than modern systems
Faster? In what respect?
"Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
In my previous job we regularly processed credit data files >2 GB. All the data is processed serially (as someone else mentioned), so seek time is not an issue (nor is it an issue in a binary data file - seek to 1.4GB. Done. Next.).
The real issue we ran up against was compression... we wanted to have the original and interim data files available on-disk for a while in case of reprocessing. The processing would generally take up 10x as much space as the original data file, so you compressed everything. Except that gzip can't handle files >2GB (at the time an alpha could, but we didn't want to touch it). Nor can zip. So we had to use compress. Yay. (bzip could handle it, but was decided against by the powers that be).
Compression of large files is still an issue, unless you want to split them up. Unless you download a beta version gzip still can't handle it. As I understand it zip won't ever be able to do it. There are some fringe compressors that can handle large files, but, well, they're fringe.
Getting back on topic, maybe the plural for Unix should be Unixen, like the plural for Vax is Vaxen?
What a fool believes, he sees, no wise man has the power to reason away.
The computer aided design databases for an automobile, when you have 3D models for the parts, the tooling, plant layout, etc. is in the low terabyte range. As another example, Boeing dedicates about 14 terabytes to commercial airplane geometry data storage.
Or Astronomy. A planning document talks about a project generating 300 terabytes per year.
No electrons were harmed creating this post, though some may have been subjected to electrical and/or magnetic fields.
Last time I wrote a 7 gig file it was an image of a hard disk. Lots of other stuff (video) can get large too. Anyway, there is an error in the headline. 2 gigs is not a limit in modern unices, only in ancient or otherwise really crappy unices.
-D_FILE_OFFSET_BITS=64 and -D_LARGEFILE_SOURCE
This forces all file access calls to their 64-bit variants, and you'll explicitly need to use structs like off64_t instead of off_t where needed. And I believe most large file support is really available only past glibc 2.2
Additionally you need to use O_LARGEFILE with open etc. So legacy applications that use glibc fs calls have to be recompiled to take advantage of this, and may need source level changes. Won't work on older kernels either.
maybe the plural for Unix should be Unixen
Sudden thought of "Linuxen the HOOOOOUUUSSSSE, bizzach!"
If Mr. Edison had thought smarter he wouldn't sweat as much. --Nikola Tesla
I sure hope that was a joke. Because otherwise it would be one of the most clueless comments I have seen.
Sure, splitting data into a lot of smaller files is going to reduce the fragmentation slightly, but it is not going to improve your performance, because the price of accessing different files is going to be higher than the price of the fragmentation.
In the next two arguments you managed to make two opposite statements both incorrect. That is actually quite impressive.
First you say large files increase the entropy of the data stored on the disk. Which is wrong as long as you compare to the same data stored in different files. Of course if the number of files on the disk is constant smaller files will lead to less entropy, but most people actually want to store some data on their disks.
Then you say large files are highly redundant, which is the opposite of having a large entropy as claimed in your previous argument. And in reality the redundancy does not tend to increase with filesize, but might of course depend on the format of the file.
All in all you are saying that people shouldn't store much data on their disks, and the little data they do store should be as compact as possible, while still allowing it to be compressed even further when doing backups. You might as well have said people shouldn't use their disks at all.
Finally, claiming older Unix versions were faster is ridiculous. First of all, they ran on different hardware, and surely on that hardware they were slower than today's systems. And even if you managed to port an ancient Unix version to modern hardware, I'm sure it wouldn't beat modern systems in today's tasks. Which DVD player would you suggest for K&R Unix?
Do you care about the security of your wireless mouse?
We do too learn assembly... I specifically learned about the MIPS architecture. Hated it but they do still teach it in CS classes. We touched on it a bit in Programming language concepts and then in Systems Architecture I and II, we actually had to write assembly code. I remember the happy day when I got my one assignment to work, we had to grab the keyboard interrupts and display them. None of my non-CS friends could understand why I was so happy to have text that I typed appear on the screen.
-Chris
Some numbers for *uncompressed* video:
NTSC/YUV2/stereo: ~111gb for a cinema movie (1hr 45min)
PAL/YUV2/stereo: ~125gb for same
HDTV/surround: ~908gb for same
With huffyuv (very low CPU usage, lossless) you should be able to cut that by a factor of 2-3. But it's still *huge*
Kjella
Live today, because you never know what tomorrow brings
One of the ways to keep errors from creeping into programs is to put limits on things so high that you can never reach them in the practical world.
The 31 bit limit on time_t overflows in this century - 63 bits outlasts the probable life of the Universe so it is unlikely to run into trouble.
That is the best argument I know for a 64 bit file size; in the long run it is one less thing to worry about.
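To put numbers on the two widths, here is a quick back-of-the-envelope sketch. It assumes a counter of seconds and a 365.25-day year; the function name is only for illustration:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, close enough here

def years_until_overflow(value_bits):
    """Years of seconds a counter with this many value bits can hold."""
    return 2**value_bits / SECONDS_PER_YEAR

years_31 = years_until_overflow(31)
years_63 = years_until_overflow(63)
print(f"31 bits: {years_31:.1f} years")
print(f"63 bits: {years_63:.2e} years")
```

The 31-bit counter lasts about 68 years, while the 63-bit one is good for on the order of 10^11 years, which is many times the current age of the universe.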
Bill Gates now claims that he was misquoted. What he really said was that "640K should be more than enough memory for anybody's toaster."
That tarball of 2002 stock quotes used to feed your stock research system.
The database files themselves, in the system.
Huh?
Other filesystems don't either :
http://www.sgi.com/software/xfs/techinfo.html
"Max. File Size
Designed to scale to 9 million TB with current hardware supporting scalability to 8000 TB on IRIX. Linux-64, 2 TB Max File Size. Solaris and Windows NT undergoing scalability testing"
"Max. File System Size
Designed to scale to 18 million TB with current hardware supporting scalability to 8000 TB on IRIX. Linux-64, 500 file systems of 2 TB each. Solaris and Windows NT undergoing scalability testing."
Unfortunately, it's not just a problem with the filesystem, but also and most often a problem with the applications. So, AIX does have this problem just as much as any other. Unless you've tested all the applications available for AIX.
There is something innate in the education, learning, and daily working of a programmer that makes them not want to use 'too big' of a number for a certain task.
it either
A) Wastes Memory Space
B) Wastes Code Space
C) Wastes Pointer Space
D) Or Violates some other tenet the programmer believes
So, When they go out and create a file structure, or something similar, they don't feel like exceeding some 'built-in' restriction to their way of thinking.
And usually, at the time, it's such a big number that the programmer can't think of an application to exceed it.
Then, one comes along and blows right through it.
I've been amused by all the people jumping on the 'it don't need to be that big' bandwagon. I can think of many applications that ext3 or whatever would need to use to make big files. they include:
A) Database Servers
B) Video Streaming Servers
C) Video Editing Workstations
D) Photo Editing Workstations
E) Next Big Thing (tm) that hasn't come out yet.
As a rock-in-roll Physicist once said, No matter where you go, there you are.
- Backups to a single file (no, I don't want to copy a fscking whole directory structure, thank you very much).
- Video editing.
- Large sound editing (multi-channel).
- Ever tried to create a DVD ISO image? there you go...
- Speaking of DVD's, *you* try dumping one to your harddisk with 2GB files.
- Disk images (ever had to Ghost around a boot-disk or boot-DVD with a disk image?)
- 3D animation files (probably included in the "video editing" section).
want me to go on? the list is bigger...
Please mod this guy up as interesting or informative.
Huh?
I had a problem with HP-UX apparently not wanting to transfer via NFS (when the NFS server is on HP-UX 11.0) files larger than 2GB. I had to backup a Solaris computer's hard disk using DD across NFS. This usually worked when the NFS server is Solaris. However, last friday it failed, when the server was setup on HP-UX. I had to resort to my little Blade 100 as the NFS server, and I had no problems with it.
I have noticed that on the SAME DAY some folks have asked questions about the 2 GB filesize limit in HP-UX on comp.sys.hp.hpux !! Apparently, HP-UX default tar and cpio don't support files over 2 GB, either. Not even in HP-UX 11i. I never thought HP-UX stinked this bad...
How does Linux on x86 stack up? I decided not to use it for this backup, since I had my Blade 100, but would it have worked? Oh, btw, is there finally a command on Linux like "share" (exists in Solaris) to share directories via NFS, or do I still need to edit /etc/exports and then restart the NFS daemon (or send SIGHUP)?
Sigged!
That's three words.
I didn't realize Daniel was so big, though.
Has he considered going lossy?
Keep your packets off my GNU/Girlfriend!
PAL: Max 720x576x25fps interlaced (50 Hz)
NTSC: Max 640x480x29.97fps interlaced (60 Hz)
No, they don't have the same frequency, nor scanlines. Some European TVs will take PAL-60, which is like PAL only at 60Hz, though. Also I don't think the color space works in the same way, but not sure about that one. That was why I used YUV2 (16bit) for both.
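Plugging these frame sizes into a rough calculation reproduces the ballpark of the uncompressed figures quoted earlier in the thread. A sketch under stated assumptions: 2 bytes per pixel for 16-bit YUV2, a 1 hr 45 min runtime, and 48 kHz 16-bit stereo PCM audio (the audio format is my assumption, it is not given in the thread):

```python
def raw_video_bytes(width, height, fps, seconds, bytes_per_pixel=2,
                    audio_bytes_per_second=48_000 * 2 * 2):
    """Size of uncompressed video plus PCM audio, in bytes."""
    video = width * height * bytes_per_pixel * fps * seconds
    audio = audio_bytes_per_second * seconds
    return video + audio

RUNTIME = 105 * 60  # 1 hr 45 min, in seconds
GIB = 2**30

pal = raw_video_bytes(720, 576, 25, RUNTIME) / GIB
ntsc = raw_video_bytes(640, 480, 29.97, RUNTIME) / GIB
print(f"PAL:  {pal:.0f} GiB")
print(f"NTSC: {ntsc:.0f} GiB")
```

This comes out slightly below the ~125gb and ~111gb quoted above, which is about what one would expect from different rounding or audio assumptions; the audio track is a rounding error next to the video either way.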
Kjella
Live today, because you never know what tomorrow brings
There is not a problem with support of large files in Unix systems, there is a problem with incompetent people using too large files in Unix systems.
You are a troll. It is not up to administrators to decide how big a file needs to be. I do scientific research and deal regularly with datasets larger than 300GB. Single files often in the range of 2GB-10GB. For me to split up my data would create an enormous headache, and would be very slow.
-Sean
I remember like 4 or 5 years ago talking to my friend's dad, who works at Kodak, and he would fill an entire 2gb Jaz drive with one picture.
And the amazing thing is, everyone else seems to be taking it seriously.
Is it just me, or is Slashdot getting much less informed as the user count continues to increase ?
Even our Exchange private information store is somewhere around 10GB, and we are a small company by most standards
And that big y2k problem that was supposed to bring down mankind? How many years did it take to fix that? I very much doubt we started in 1965 ;)
Prediction: First distro to "suck it up" will be around 2035 or so. Personally, I think this is as far down on the priority list as you can get. Besides, with open source, is it really that problematic to grep the source for "time_t" and fix it? I don't think so.
Kjella
Live today, because you never know what tomorrow brings
However, I would recommend staying away from > 2GB files in a database environment. Even if your FS supports large files, you still lose performance on the "double-driver": first your kernel provides a partition, then it provides a file-system over it. But if you need files that big, why would you need a file-system? Just use raw partitions!
Of course you still need large files for video, but massive concurrent performance overhead is not a typical problem in that case.
Less is more !
It's not just you.
I'm old enough to remember when discussions on Slashdot were well informed.
Hey everyone, let's keep beating a dead horse and telling him the million and one ways that you need files greater than 2gb. Half of these posts just say "movies" anyway. So stop repeating yourselves.
Don't eat shrimp candy, just a heads up.
hate jar jar
Why'd they even mention DOS? All DOS programs are statically linked. There are no DLLs or anything like them (except overlays). The only thing close would be DOS Extenders. So, what does DOS have to do with it?
... 64-bit addressing before thinking this through. I couldn't see the significant advantage for more than a very tiny fraction of apps in being able to address more than a few gigabytes.
Now I can't wait for OS X to have 64-bit support for the IBM 970 processors (I do realize that it will take several releases before default 64-bit operation is practical).
When compared to clustered 32-bit filesystems, I would think that a "pure" 64-bit filesystem would have a number of very practical advantages.
I could easily see the journalled filesystem becoming one of the first 64-bit subsystems in OS X, right after VM.
A much bigger problem is that Linux filesystems have a capacity limit of 2TB.
Many servers now have the physical capacity of over 2TB on a filesystem storage device.
Unfortunately this is still a very significant limitation.
This problem is much more commonly encountered than file size limitations.
Maurice W. Hilarius Voice: (778) 347-9907
Are these files on a regular partition (i.e., ext2 or somesuch)? It still sounds totally inefficient to me. I have nothing against large files, but I would hope a db would be using something more efficient, or at least using its own filesystem (making the 2GB limit irrelevant).
18 EXAbyte file sizes, real journals, live queries...
*SOB*
J.
Backup files, exporting a huge oracle database to a file. And, when I record divx quality video through my ATI card I can go through the GB like crazy.
A better question is, Who doesn't need largefile support?
As for the seek time... not everything is accessed like a random access file. I imagine that the backup data will be read in sequentially. The video file would mostly be handled sequentially, other than when jumping to a chapter, fast forwarding, or reversing.
-- Good judgement comes with experience. -- Experience comes with bad judgement.
http://froogle.google.com/froogle?q=9.4+dvd-r&btnG=Froogle+Search
Can anyone give a good reason for needing files larger than 2gb?
.zip or .tgz of a collection of big files. Or creating the equivalent of an ISO image of a DVD. And so on.
Video/movie files, for one thing. Even compressed (e.g. DV or MPEG), those things are huge. A 2 GB file at professional DV compression (50 Mb/sec) is about 5 minutes' worth. (DV is similar to MJPEG, so it's still lossy.) Uncompressed or losslessly compressed video (critical for machine vision or image analysis apps) chews up even more space.
I know I've wanted to be able to just dump a mini-DV tape (about 13 GB) directly to a single disk file for later editing.
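A quick back-of-the-envelope check of those sizes, as a rough sketch only. It assumes decimal units (1 GB = 10^9 bytes) and a constant bitrate with no container overhead. The 3.6 MB/s total rate for consumer DV and the 60-minute tape length are assumptions brought in for illustration, and the function name is made up.

```python
def seconds_until_full(limit_bytes, rate_bits_per_sec):
    """How long a constant-bitrate stream takes to reach a file size limit."""
    return limit_bytes * 8 / rate_bits_per_sec

TWO_GB = 2 * 10**9        # decimal gigabytes (assumption)
DV50_RATE = 50 * 10**6    # 50 Mb/s, the professional DV figure quoted above

minutes = seconds_until_full(TWO_GB, DV50_RATE) / 60
print(f"2 GB at 50 Mb/s lasts about {minutes:.1f} minutes")

# Assumed consumer DV stream rate (video + audio): about 3.6 MB/s,
# recorded for an assumed 60-minute mini-DV tape.
tape_bytes = 3.6e6 * 60 * 60
print(f"A 60-minute tape is about {tape_bytes / 1e9:.1f} GB")
```

Either way, a 2 GB ceiling holds only a few minutes of footage, while a full tape needs a single file several times that size.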
Other fields also use huge data sets - seismic data analysis for example. Filesystems designed for supercomputer clusters (eg PVFS) have unlimited size on the total filesystem (tens of terabytes is not unusual) although the individual file size may still be limited by the underlying OS or hardware word size.
Then there's creating a
-- Alastair
The seek times alone within these files must be huge,
Depends on how your inodes are laid out, how big you have to get for triple indirect blocks, etc.
Shouldn't be any worse (and maybe better) than trying to seek through an equivalent collection of smaller files -- you've got to do all those directory searches, etc. (Exact comparisons will depend greatly on the filesystem and parameters chosen when the FS was created.)
-- Alastair
There is a need for a virtualizing filesystem which supports multiple volumes, offline and not, and files stored in segmented form to fit. It would be insanely handy in a clustering environment; The whole cluster could store the file (with some redundancy) and access it in a shared fashion. This would substantially improve the ease of working with inanely large data sets in a clustered scenario.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Anyway, those using an M$ OS which does not support NTFS are fooling themselves. If you are using some form of Windows prior to Windows 2000, then you are getting a terrible experience which is nothing like the real OS -- NT. NTFS is a pretty good filesystem with journaling, ACLs, and implicit support for encryption and compression. FAT32 is shite.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
> NTFS is a pretty good filesystem with journaling,
That's only partially true -- it doesn't journal data, only meta-data.
Are you trolling? That's the biggest load of shit I've read in this thread. You are by far the worst offender of the nitwits coming out of the woodwork whining that nobody needs files bigger than 2GB.
You even mention K&R Unix and claim it was faster than modern systems and use that as some kind of yardstick?
Jesus Christ, that's stupid. It's not 1975 any more, and none of your blathering has any relevance to the modern day. Technology progresses, take your dinosaur ass to a VMS shop and bore us all with your claims of how advanced VMS is, but don't tell people what they need and don't need, and certainly don't bandy about the term "incompetent" when you're so obviously projecting.
I was giving an example because the parent was 0, Offtopic at the time.
The example was that officials do worry about e-mail so they would either save it like he said or avoid typing it like I said. The point is that they would consider it important and that they would save e-mails that were sent.
Cover your eyes and click this link!
Sure, but that's good enough to save people in almost all cases. I've never, EVER lost data on NTFS5 due to a crash (which has happened plenty) or a power failure (only twice since I started using it). FAT32, on the other hand... or ext2 for that matter. A partially journaling filesystem gets the job done well enough for basically any purpose. If it's not good enough for you, perhaps a filesystem is not the best place to store your data in the first place; I'd consider a clustered replicating RDBMS :P
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Once upon a time (prior to 1978) there was no lseek() call in Unix. The offset argument to seek() was 16 bits. Larger seeks were handled by using a different value for "whence" (the third argument to seek()), which caused seeks to occur in 512-byte increments. This resulted in a maximum seek of 16,777,216 bytes, with an arbitrary seek() often requiring two calls: one to get to the right 512-byte block and a second to get to the right byte within the block. (Thank goodness they haven't done any such silliness to break the 2GB barrier.)
When Research Edition 7 Unix came out, it introduced lseek() with a 32-bit offset. 2,147,483,648 bytes should be enough for anyone, hmmm? :-).
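To make the two-call dance concrete, here is a small sketch of the arithmetic in Python. It is not the historical kernel code: the helper name is made up, and it assumes the 16-bit offset was signed, which is what the 16,777,216 figure implies (32768 blocks of 512 bytes).

```python
BLOCK = 512
MAX_BLOCKS_16 = 2**15   # assumption: signed 16-bit offset, magnitude below 32768

def split_seek(target):
    """Split an absolute byte position into (whole blocks, bytes within block)."""
    blocks, remainder = divmod(target, BLOCK)
    if blocks >= MAX_BLOCKS_16:
        raise OverflowError("position not reachable with a 16-bit block offset")
    return blocks, remainder

# First call: seek 1953 blocks in. Second call: 64 more bytes.
print(split_seek(1_000_000))    # (1953, 64)

print(MAX_BLOCKS_16 * BLOCK)    # 16777216, the old ceiling
print(2**31)                    # 2147483648, the signed 32-bit ceiling
```

The same pattern repeats at every step: the ceiling is just the largest value the offset type can hold, times whatever unit it counts in.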
gzip works over 4 GB but loses the ability to accurately report uncompressed file sizes (minor).
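The reason is the trailer: per RFC 1952, a gzip member ends with a 4-byte ISIZE field holding the uncompressed size modulo 2^32. A small sketch using Python's standard gzip module (the helper name is made up, and the 5 GiB case is only computed, not actually compressed):

```python
import gzip
import struct

def stored_isize(gz_bytes):
    """Read the ISIZE trailer: the last 4 bytes, little-endian unsigned."""
    return struct.unpack("<I", gz_bytes[-4:])[0]

data = b"x" * 100_000
assert stored_isize(gzip.compress(data)) == len(data)

# What the trailer would hold for a 5 GiB input: the size wraps modulo 2**32.
five_gib = 5 * 2**30
print(five_gib % 2**32)   # 1073741824, so the file is reported as 1 GiB
```

The compressed data itself is unaffected; only tools that trust ISIZE for the original size get the wrong number.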
At least 2GB is better than the Multics large file support situation! Files were limited to the size of segments, which were at most 255K 36-bit words, roughly one megabyte! The Multics designers didn't expect that most users would ever need files larger than this. The first database product (ever!), MRDS, was severely limited, so Multics programmers created a (kludgy) workaround. Modern operating systems are designed differently and thus aren't limited to such (small) file sizes.
We have conquered this problem before, by redesigning filesystems to allow files bigger than segments, and we can conquer it again by allowing files bigger than the addressable range of a 32-bit processor's full word.
--TheOrangeSquid Is it any wonder things seem so awry? We swim in a sea of confusion and don't have to think to survive
The time_t type must be signed, so that you can represent negative time differences. If you make time_t unsigned, when you try to do things like saying "if this file is older than that file" you will get a very large positive time, rather than a negative time. Not good.
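A small illustration of that wraparound, simulating 32-bit arithmetic in Python. This is only a sketch: the timestamps are made up, and a real time_t may be wider than 32 bits.

```python
MASK32 = 2**32 - 1

def unsigned_diff(a, b):
    """Subtract as a 32-bit unsigned type would: wrap modulo 2**32."""
    return (a - b) & MASK32

def signed_diff(a, b):
    """Subtract as a 32-bit signed (two's complement) type would."""
    d = (a - b) & MASK32
    return d - 2**32 if d >= 2**31 else d

# Hypothetical modification times, 60 seconds apart.
older, newer = 1_000_000_000, 1_000_000_060

print(signed_diff(older, newer))     # -60
print(unsigned_diff(older, newer))   # 4294967236
```

With the unsigned result, a test like "difference < 0" can never be true, so the "is this file older" check silently gives the wrong answer.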
www.eFax.com are spammers
Can anyone give a good reason for needing files larger than 2gb?
.iso images. :)
DVD
If J.K.R wrote Windows: Puteulanus fenestra mortalis!
I would have snapped up puppy.mil in an instant.
Cheers,
Ian
;; signal/noise ratio is getting worse; I now read posts at +3 or above
;)
Heh, how ironic that your post is only at 2 now
Ostensibly your filesystem driver will be caching much of the list information in memory
Caching the tables in physical memory does of course help, but it doesn't remove the linear scan through a linked list. This linear scan takes time even if done in RAM. To improve performance, the Linux driver for this filesystem caches a number of already resolved positions; I think this cache holds 8 entries. I found out about that once I needed simultaneous sequential access to 20 files on the same FAT32 filesystem. Performance was horrible. I had two options: either do access in very large blocks to keep the number of list scans low, or increase the cache size and recompile my kernel. I don't remember which of the two options I chose.
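Here is a toy model of that effect in Python, for anyone who wants to see the numbers. It is a sketch under loud assumptions, not the Linux driver: the class and function names are invented, the cache is a simple shared oldest-first list of resolved positions, and the files are laid out contiguously to keep the table trivial.

```python
from collections import OrderedDict

class FatVolume:
    """Toy FAT: next_cluster[c] is the cluster that follows c in a file."""

    def __init__(self, next_cluster, cache_size):
        self.next_cluster = next_cluster
        self.cache_size = cache_size
        self.cache = OrderedDict()   # (first_cluster, index) -> cluster
        self.links_followed = 0      # cost counter

    def cluster_at(self, first_cluster, index):
        """Find the index-th cluster of the file starting at first_cluster."""
        # Use the furthest cached position for this file at or before index.
        best = max((i for (f, i) in self.cache
                    if f == first_cluster and i <= index), default=None)
        if best is None:
            pos, cluster = 0, first_cluster
        else:
            pos, cluster = best, self.cache[(first_cluster, best)]
        while pos < index:           # the linear scan through the chain
            cluster = self.next_cluster[cluster]
            pos += 1
            self.links_followed += 1
        if self.cache_size > 0:
            self.cache[(first_cluster, index)] = cluster
            self.cache.move_to_end((first_cluster, index))
            while len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)   # evict the oldest entry
        return cluster

def interleaved_read(cache_size, files=20, clusters_per_file=200):
    """Read all files round-robin, one cluster at a time; return links followed."""
    n = clusters_per_file
    # File k occupies clusters k*n .. k*n + n - 1, chained in order.
    next_cluster = [c + 1 for c in range(files * n)]
    vol = FatVolume(next_cluster, cache_size)
    for index in range(n):
        for k in range(files):
            vol.cluster_at(k * n, index)
    return vol.links_followed

print(interleaved_read(cache_size=8))    # cache thrashes: rescans from the start
print(interleaved_read(cache_size=32))   # each file's last position stays cached
```

With 20 interleaved readers and only 8 slots, every file's last position has been evicted by the time that file is read again, so each read walks the chain from the beginning. Once the cache is larger than the number of open files, each read follows a single link.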
Do you care about the security of your wireless mouse?
I recently tried recording a one hour TV show with xawtv, to AVI (MJPEG, 640x480, 15 fps, 16-bit stereo sound). It appeared to record okay, and ended up 5 gigs. But I could only play the first few minutes of it with aviplay. Something (either xawtv, aviplay, or the AVI file format itself) has a 2 (or 4 (unsigned)) GB limit.
I would guess that a router or firewall or any device, maybe even a cable modem would filter that. If you think you're accessing through a firewall, that's probably why.
It works on Win98/Internet Explorer 5 with a direct connection to the cable modem.
Cover your eyes and click this link!
Old news, Solaris 2.6 and 7. Solaris 8 is 64 by default. I hope they are not still developing for 2.6 :)
There is no plural, as virus is a plural word in Latin already
Really? What declension does the word virus belong to?
I seem to recall that some declensions in Latin have both the singular and plural ending in -us, but it's ages since I studied Latin - over a decade ago. I'm not even sure any more how to spell declension.
You make the mistake of thinking you can educate the fundamental stupidity out of people. You can't.
Examples: tax => taxes; sex => sexes; fox => foxes; box => boxes
The word ox, with its plural oxen, is a freak of English grammar. It is the exception, not the rule.
Examples of this bogus pluralization applied to similar words:
Both en and es have the same number of keystrokes and bits. en has no advantage, except the appearance of 1337ness to people who don't know better. So please stop using it and trying to one-up the dictionary. (This goes for virii and Unices too.) I know it's only being used with geeky words so far, but that only makes the rules of pluralization even more complicated. The English language is convoluted enough without deliberately introducing more irregularities.
Oh well. Anyway, I know that a linked list just plain isn't as efficient as a tree, but as you say there are ways to speed things up. I would assume that the windows driver probably throws away quite a bit of memory trying to make fat32 fast, microsoft has always been more than willing to squander memory willy-nilly. In fact, Mechwarrior IV:Vengeance used to have a habit of squandering it permanently, or until the process terminated... From what I hear, Excel still does, but I don't spend much time in there consecutively.
Also, I don't see any reason you couldn't build a tree in memory or in a cache (perhaps you build it in memory and design it so that you can swap most of it out automatically?). That would be a really funky way to do things on Chicago but it would be quite reasonable on any flavor of NT, or of course on your favorite open-source operating system. A non-trivial job to be sure, but obviously not impossible. At least that way it would only be slow once per boot.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
I thought only a few programs used lseek(), e.g. databases. Wouldn't most programs read files sequentially, without using off_t at all?
-- 1.e4 c6 2.d4 d5 3.Sc3 de4: 4.Se4: Sd7 5.Sg5 Sgf6 6.Ld3 e6 7.S1f3 h6 8.Se6:
Don't you mean, increase the cache size, make modules... :)
I think FAT was compiled in my kernel at that time.
perhaps you build it in memory and design it so that you can swap most of it out automatically?
I wonder who really wants to spend a lot of time improving FAT performance when there are so many other filesystems that will always perform better than FAT.
Do you care about the security of your wireless mouse?
Except for scientific calculations, where there will probably never be a reasonable limit on the size or precision of numbers needed, I doubt anyone would need more than 64 bits for any scalar type, be it a char or an int or a double or whatever. Why not use 64 bits for everything and accept the wasted space for storing chars, but not ever have to worry about running out of numbers? Even if you waste 7/8 of the space on your hard drive to store 8-byte-long chars, the available storage has gone up exponentially by using a 64-bit address space. Increasing the size of your data 8 times is negligible, negligible enough to not even bother with 1-byte chars.
Eat at Joe's.
Well, mostly Microsoft, I'm thinking. Also fat32 is a handy filesystem because just about everyone can read it these days. I'm about to set up a PC for my girlfriend's aunt, it's just a K6-2 300. It'll have 256mb ram, and minimal (1.2Gb) disk, because that's what I have lying around. I'm putting Windows 98 SE on the disk, and knoppix will be provided on a CD so she can play with linux, assuming I can get it to stop making idiot assumptions about refresh rates without requiring her to insert a floppy as well. That is god damned idiotic. But anyway I digress, the best FS for that OS is FAT32, so I'm going to use it, all data will be stored on a fat32 volume. I imagine this is becoming a fairly common scenario. Also of course many geeks multiboot to win98 for games, the only filesystem they'll have in their PC readable by all operating systems is FAT32 and they will likely be keeping media there.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
I realise you're just a troll, but I'd like to point out that Win32 also has two forms of file API for most functions - one that can do 64-bit and one limited (or at least which encourages you to use) 32-bit. For 64-bit access to work in any given application, you're relying on the language runtime making the correct mapping, and/or the end developer choosing the right set of functions to use. In many cases the easiest and most obvious ones will limit you to 32-bits - so many applications will not work with such large files.
This is a problem which has affected pretty much every system - even in an OS where *only* 64-bit file APIs exist, you'll still find an occasional app which tries to fit a file location into a 32-bit variable.
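To show what that looks like, here is a small sketch using Python's ctypes to mimic stuffing a file offset into a signed 32-bit variable. The helper name is made up; in C the same thing happens when an off_t or a 64-bit position is assigned to an int or long that is only 32 bits wide.

```python
import ctypes

def as_int32(offset):
    """Value left after storing offset in a signed 32-bit variable."""
    return ctypes.c_int32(offset).value

three_gb = 3 * 10**9

print(as_int32(2**31 - 1))   # 2147483647: the last offset that survives
print(as_int32(2**31))       # -2147483648: one byte further wraps negative
print(as_int32(three_gb))    # -1294967296
```

An app that then seeks to the truncated value either fails outright on the negative offset or, worse, lands somewhere else in the file without noticing.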
This is my World Wide Web of Whatever
Oh, but you do. AIX 4 definitely wants -D_LARGE_FILES, or bad things happen, and watch your longs and long-longs (and their aliases) carefully. (A buddy and I recently had to comb through exactly this problem in an app.)
Yow! I'm supposed to have a plan?
--How about freaking TAR BACKUP FILES, you narrow minded moron?!
:)
--Have you tried backing up your 60GB Windoze partition to a compressed tar file, and gotten stung by that paltry 2Gig file limit under an older distro?? That pissed me right off!!
--Now I use Knoppix, and no worries.
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
I know I've wanted to be able to just dump a mini-DV tape (about 13 GB) directly to a single disk file for later editing.
That's the way I edit. With the size of modern hard drives, it's a waste of time to do a traditional log/capture session. Instead, just dump everything to disk and then break it up from there. FCP even has a feature or two designed towards this direction (Start/stop detection). Hopefully they'll fix the subclip bug in version 4.
I tell FCP to partition my files, though. The only >2GB files I currently have are my Toast DVD images. I try not to use >2GB files in general, though... there are still some mysterious HFS+ bugs floating around that I've been trying to avoid.
-Brett
Your boxen will be shipped in 4-6 weeks.
Sweet! I didn't know I'd get free boxen for reading your post!
This guy would have a field day with "All your boxen are belong to us"
This group produced three notable results:
I still have my T-shirt -- how about you?
I used to be a student admin for Clemson's College of Engr. and Science. We had several CAD tools that the Engr. students would use. One tool let you specify how long the simulation was supposed to last; if the field was blank, it would run forever. On top of that little bit of badness, the field was blank by default, so many an unsuspecting student would start a simulation that ran forever, creating huge output files the student didn't know about.
The killer here is that if you quit the program the wrong way (something like Close instead of Quit), the program would keep going, even after the student logged out.
So now you have N students who are all generating infinite files. However, the files would hit the 2GB limit and stop eating up space. (Thank you.)
The only other nastiness is that once we found the file, simply removing it just let the program (still running after logout) finally add more data. So you had to track down where the program was running and kill it first.
I was in charge of backups, and man oh man was this annoying for them.
"Not knowing when the dawn will come, I open every door." - Emily Dickinson