Large File Problems in Modern Unices
david-currie writes "Freshmeat is running an article that talks about the problems with the support for large files under some operating systems, and possible ways of dealing with these problems. It's an interesting look into some of the kinds of less obvious problems that distro-compilers have to face."
Who needs more than 512k of RAM??
The problem is nonexistent in the BSDs, which use the large file (64 bit) versions anyway. And you just have to use a certain -D flag if your OS (like Linux) doesn't use the 64 bit versions. Whoopdiedoo. Not so hard. Recompile and be happy.
Question answered, move along, nothing to see here :)
Cover your eyes and click this link!
Video + Audio >= 2GB
Databases, Movie files, Backup files (think dumps to tapes). Animations, 3D modelling.... Lots of things need a > 2GB file size. Iain
---- "I would be careful in separating your weirdness, a good quirky quantum weirdness, from the disturbed weirdnes
Video. Raw, uncompressed, high-quality video with a sound channel is fucking HUGE. Look how big DivX files are, and they're compressed many, many times over.
And compressing video on-the-fly isn't feasible if you're going to be tweaking with it, so that's why people use raw video.
-Mark
Ever heard of something like movie-editing? You can get huge files really fast.
-- we're dressed in green, and we're feeling mean
Unices plural for unix?
__
Sig: Marine Stock Photos
Real analytical work can easily produce files this large. Output for analyses of structures with more than half a million elements and several million degrees of freedom can EASILY produce output of over two gigs. Yes, these results can and should be split, but sometimes it makes sense to keep them together as a matter of convenience. Plus, there IS a small performance hit when dealing with multiple files on most of the major FEA packages.
Good answer. A 2gb movie would have to go for nearly 4 hours and that includes audio. Explain?
a 20mb mp3 can go well into an hour. Explain?
If you really need a movie which hits that many hours you would be breaking it up into CD-sized chunks anyway
Title says it all... Who are *YOU* to decide that *we* do not need 2GB files?
vmware uses files as virtual disks. 2GB would be a really, really small disk. UML does the same, using the loop device feature of Linux. Again, a filesystem in a file. Again, 2GB is not much. Simulating 20GB would need 10 files.
Feels like 64kbyte segments somehow...and I really don't want to have those back.
Come on. Even Bill Gates admitted that half a meg ain't enough.
640K, on the other hand, should be enough for anyone...
-Mark
my data warehouse at work is 600GB and grows at a rate of 4GB per day.
the production database that drives the sites is like 100GB
welcome to last week. 2GB is tiny.
A year spent in artificial intelligence is enough to make one believe in God.
I said this to some unix 'so called experts' in 95, and they said, oh why why do you need >2gig
I can just laugh at them now...
Liberty freedom are no1, not dicks in suits.
For when Jaron Lanier decides to update his website with 10,000,000 lines of script
Do not try to read the dupe, that's impossible. Instead, only try to realize the truth
What truth?
There is no dupe
Oh I see now raw video is larger than I thought, oops
Ever heard of something called data processing? Banks, credit card companies, public utilities, etc. all have huge databases that get processed all the time, and involve working as well as final output files in the range of 2 to 20 gigabytes in size.
Seek times are irrelevant in these situations, since the files are processed serially.
Get out into the real world and see what real industry does with real computers.
Maybe high quality audio+video for, say, making a movie will be larger than that.
I guess a lot of the editing would probably be done scene by scene, and then you could merge and compress them on the fly so that at no point you use more than 2GB, but it seems that if you make a 2 hour DVD it would be nice to keep the 4GB image file on your hard drive if you planned to reburn it.
Not a scattering of scenes from which the image would have to be recreated on the fly.
It is kind of a dumb question to ask why we would need that big of a file when we have computers being marketed as home DVD makers.
Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
I am not agreeing (or disagreeing) with the original post, but having a database > 2 GB has nothing to do with having a single file over 2 GB. A db != a file system (except for MySQL perhaps).
It's an old and well known problem that programmers and users tend to keep very large files for laziness and logical errors.
However it's also an old and well known fact that large files are bad for performance per se due to several reasons:
- fragmentation: large files increase the fragmentation of most file systems, at least of any system which uses single indexed trees/B-trees and nonlinear hashes
- entropy pollution: large files increase the overall entropy on the hard disk, leading to worse compression ratios for backup and maintenance
- data pollution: the use of large files tempts users to store all kinds of redundant, reducible, linear and irrelevant data, wasting storage space and I/O time
So I don't see why admins should provide a "work-around" for the filesize limits. These limits are there for very good reasons and in my opinion they are even much too big. You should always remember that the original K&R Unix had only 12 bits for file size storage and was much faster than modern systems; in fact it ran on 2.2 MHz processors and 32 kB of RAM, which wouldn't be sufficient for even a Linux or Windows XP bootloader. Think about it.
Owner of a Mensa membership card.
I can think of some:
And that's just without thinking twice...there are probably many more reasons why people would want files >2 GB.
Every expression is true, for a given value of 'true'
Don't moderate up ignorance.
That's whining... But I see his point--the only reason right now is for video files. If you want to get your video from your camcorder, it's not going to go straight to CDRW or DVD, it's going to your HARD DRIVE storage. You are going to edit it, right?
Since you probably want to have the best quality, a single file will take a lot of space. (No I don't do this video thing, but I did my own research. Many people do have video, and for computer editing there is no reason to cap a file size.)
Ok fine, I guess he kind of has a point in that question....
We are seeing problems with off_t growing from 32 to 64 bits. We are also going to see this when we start going to a 64 bit time_t, as well (albeit not as badly - off_t is probably used more than time_t is.)
However, the pain is coming - remember we have only about 35 years before a 64 bit time_t is a MUST.
I'd like to see the major distro venders just "suck it up" and say "off_t and time_t are 64 bits. Get over it."
Sure, it will cause a great deal of disruption. So did the move from aout to elf, the move from libc to glibc, etc.
Let's just get it over with.
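For what it's worth, the deadline is easy to check. A quick Python sketch, assuming a signed time_t that counts seconds from 1970-01-01 UTC (the function name is just something I made up for illustration):

```python
from datetime import datetime, timedelta, timezone

# Assumption: time_t is a signed integer counting seconds since this epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_representable(bits: int) -> datetime:
    """Latest instant a signed time_t of the given width can hold."""
    max_seconds = 2 ** (bits - 1) - 1
    return EPOCH + timedelta(seconds=max_seconds)


rollover_32 = last_representable(32)
print(rollover_32.isoformat())  # 2038-01-19T03:14:07+00:00
```

One second after that, a 32 bit time_t wraps around to a negative value.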
www.eFax.com are spammers
wtf kind of sentence construction is this:
"It's an interesting look into some of the kinds of less obvious problems that distro-compilers have to face."
Why not:
"It is an interesting problem that some distro-compilers have to face."
So my wife says to me, "Honey, do I look fat in this filesystem ?"
I replied, "Sweetie, I married you for your trust fund not your cluster size."
Oh, you're still not convinced, well see it this way: when in the future will you ever need to burn a DVD?
Well? A typical one sided DVD-R holds around 4 GB of data (somewhat more), if you use both sides, you can get more than 8 GB of data on it. That's way bigger than 2 GB, no? Now, how big must your image be before you burn it on there? well?
Right...
Dont be the good old fox .)
quote:port 17 udp
ought to be
Oh come on, those were fun, when you had to load into memory and uncompress a file larger than that :-)
:-)
Oh the fond memories
Daniel
Carpe Diem
It doesn't give a specific filesize in the article...
sig.
Can anyone give a good reason for needing files larger than 2gb?
A msg-id history file on a newsserver with long retention.
I have most all of my older system images available to inspect. The loopback devices under Linux are tailor made for this type of thing.
I am puzzled as to why you mention the seek times. Surely you would agree that the seek time should be only inversely geometrically related to size, the particular factors depending on the filesystem. Any deviation from the theoretical ideal is the fault of a particular OS's implementation. My experience is that this is not significant.
(user dmanny on wife's machine, ergo posting as AC)
We don't have this problem: 4 petabyte maximum file size, 1 terabyte tested at present. http://www-1.ibm.com/servers/aix/os/51spec.html
On the Windows side many people like to save every message they send or receive to cover their ass just in case. This is very popular among US Government employees. Some people who get a lot of email can have their personal folders file grow to 2GB in a year or less. At this level MS recommends breaking it up since corruption can occur.
Can anyone give a good reason for needing files larger than 2gb?
Forensic analysis of disk images. And yes, from experience I can tell you that half the file tools on RedHat (like, say, Perl) aren't compiled to support >2GB files.
It has a nice small 1GB filesystem limit. I have partitioned my hard disk into 64 little chunks and it runs very slowly and unstably, but it's completely open source and I'm happy.
Well, the applications I support store and interpret Seismic data. One survey can routinely be in the >100GB range. The visualisation apps we make are often asked to load 2-20GB in memory alone (that's why we still use Sun and SGI systems to do it, though we are actively pursuing Linux too). So 64-bit filesystems and files are kinda important to us.
Many large-scale computing projects easily generate hundreds of gigabytes and even terabytes of data. They are writing to RAID systems and even parallel file systems to improve their IO.
Think beyond the little toy that you use. These projects are using Unix (Solaris, Linux, BSD and even MacOSX) on clusters of hundreds or thousands of nodes.
We use a Unidata database here for an ERP system. Each database is more than 2GB (more like 20GB) of relatively small files, and when the directories are tarred for backup they are usually over 2GB, which means that gzip won't compress them. Unless I'm missing something, I don't see an alternative to files larger than 2GB in this case. Sure, on the personal computing level the closest thing you probably get is ripping DVDs, but there are other things out there, and I realize this is tiny in comparison to some places.
You obviously have never done any work with video before. Most DV will eat up 2GB easy with 15min of footage or less.
--sdem
Not funny, troll.
http://www.urbanlegends.com/celebrities/bill.gates/gates_memory.html
Its how you use it.
I e-mailed somebody on the Board of Higher Ed of my State for some answers, and they simply replied
Please call me at #-###-###-###.
Thanks
He has a really good point if mail programs put archives in one big zip-equivalent file, because these CAN get huge.
I have run into problems trying to compress a tar archive of my home directory which has been around since 1995 when I switched to Linux. The two gig limit runs into trouble here.
The seek times alone within these files must be huge
Who modded that as Insightful? Sure, if you are using a filesystem designed for floppy disks, it might not work well with 2GB files. In the old days, when the metadata could fit in 5KB, a linked list of disk blocks could be acceptable. But any modern filesystem uses tree structures, which makes a seek faster than it would be to open another file. Such a tree isn't complicated; even the minix filesystem has it.
If you are still using FAT... bad luck for you. AFAIK Microsoft was stupid enough to keep using linked lists in FAT32, which certainly did not improve the seek time.
Do you care about the security of your wireless mouse?
Bitmap files for image setters can easily become huge. Think of 500x100(cm)x1000x1000(pixels).
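To put a number on that, here is a rough Python sketch. It reads the figures above as a 500 cm by 100 cm sheet at 1000 x 1000 pixels per square centimetre, and it assumes 1 bit per pixel; both the reading and the bit depth are my assumptions, not something the poster stated.

```python
def bitmap_bytes(width_cm: int, height_cm: int,
                 pixels_per_cm: int, bits_per_pixel: int = 1) -> int:
    """Size of an uncompressed bitmap, rounded up to whole bytes."""
    pixels = (width_cm * pixels_per_cm) * (height_cm * pixels_per_cm)
    return (pixels * bits_per_pixel + 7) // 8


# Hypothetical sheet: 500 cm x 100 cm, 1000 pixels per cm, 1 bit per pixel.
sheet = bitmap_bytes(500, 100, 1000)
print(sheet)                # size in bytes
print(sheet > 2 ** 31 - 1)  # does it exceed a signed 32 bit file offset?
```

Even at one bit per pixel the result is several gigabytes; any greyscale or colour depth multiplies it further.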
I'm posting AC on this one.
I can tell you that, to my knowledge, the AutoZone corporation has databases which exceed a terabyte in size. Yes, that's 1 terabyte. When you consider the sheer number of AutoZone retail locations, combined with their giant inhouse catalog, sales records going back umpteen years, customer data, etc. it's not hard to imagine such a large database.
I'm not saying that one huge database is the way to go. But I am saying that AFAIK it's in practice. 2 gigs is nothing when it comes to file size.
I just wonder why we don't learn from past limits and remove these limits "forever". E.g. a month ago I received a question about the possibility of building a 10 TB Linux cluster (physicists are crazy ;-)).
;-)
There surely MUST be some way to do this. I imagine some file (e.g. defined in LSB) which would define these limits for the COMPLETE system (from kernel, filesystems and utils to network daemons). I know there are efforts toward things like this, but if we said (for example) that a distribution in 2004 won't be marked "LSB compatible" if ANY of its programs use any other limits, I think it would create enough pressure on Linux vendors.
Just a crazy idea
1) Splitting up a big file turns an elegant solution into an inelegant nightmare.
2) Instead of 10 different applications writing code to support splitting up an otherwise sound model, why not have 1 operating system have provisions for dealing with large files.
3) You are going to need the bigger files with all those 32 bit wchar_t and 64 time_ts you got!
This is my sig.
the datafile size averages 8GB in the warehouse.
A year spent in artificial intelligence is enough to make one believe in God.
Science Data usually consist of huge multidimensional arrays. I have seen satellite data in huge netcdf files that are very close if not slightly larger than that.
database dumps - one of our smaller database dumps is 2.3 GB compressed. The dumps are the easiest method of backup and distribution - locally and (very) remotely.
Over Christmas and New Years, I helped my wife run a simulation of 1000 different patients for an academic pharmacokinetics paper. The run took ten days and had an input file of about 1.5 GB. If her computer was faster, or she had access to more computers, she would have wanted to simulate more patients and would easily have needed support for files larger than 4 GB. As CPUs get faster and hard disks get larger, there will be much more demand for these large files as well as more than 4 GB per process.
What a fool believes, he sees, no wise man has the power to reason away.
Another example of where large files are useful is database files. In my job, the DB machine (Solaris) doesn't have sufficient disk space to generate the DB dump. The biggest dump is 11GB and I wasn't able to put it on a Linux box (RH 6.2), so I used FreeBSD 4.2 with success.
I remember reading in the BeOS Bible that the BeOS filesystem could contain files as large as 18 petabytes. Makes you wonder two things: What's the biggest filesystem that you could use with a BeOS machine? And why don't other OSs have a filesystem like this? Especially with those awesome extended attributes. I weep for the loss of the BeOS filesystem...
*slight crashing sound*
Yes. Sometimes you need to store a lot of data. Even DVDs have 4.3 GB of data these days. But that's not even much compared to the amount of data we handle in seismic research. I would believe astronomers, particle physicists and lots of other people also routinely handle ridiculous amounts of data.
By the way, in producing the DVD, you would naturally work with uncompressed data. How would you handle that?
The seek times alone within these files must be huge, and it smacks a bit of inefficiency
And because it is inefficient, we should not support it? As a matter of fact, any file larger than one disk-block is inefficient. Maybe we should stop supporting that as well?
sure its just as bad to have an app use hundreds of say 4kb files or so, but two GIGABYTES???
As I've said, it's not really that much, depending on the application.
In my previous job we regularly processed credit data files >2 GB. All the data is processed serially (as someone else mentioned), so seek time is not an issue (nor is it an issue in a binary data file - seek to 1.4GB. Done. Next.).
The real issue we ran up against was compression... we wanted to have the original and interim data files available on-disk for a while in case of reprocessing. The processing would generally take up 10x as much space as the original data file, so you compressed everything. Except that gzip can't handle files >2GB (at the time an alpha could, but we didn't want to touch it). Nor can zip. So we had to use compress. Yay. (bzip could handle it, but was decided against by the powers that be).
Compression of large files is still an issue, unless you want to split them up. Unless you download a beta version gzip still can't handle it. As I understand it zip won't ever be able to do it. There are some fringe compressors that can handle large files, but, well, they're fringe.
The computer aided design databases for an automobile, when you have 3D models for the parts, the tooling, plant layout, etc. is in the low terabyte range. As another example, Boeing dedicates about 14 terabytes to commercial airplane geometry data storage.
Or Astronomy. A planning document talks about a project generating 300 terabytes per year.
No electrons were harmed creating this post, though some may have been subjected to electrical and/or magnetic fields.
Last time I wrote a 7 gig file it was an image of a hard disk. Lots of other stuff (video) can get large too. Anyway, there is an error in the headline. 2 gigs is not a limit in modern unices, only in ancient or otherwise really crappy unices.
-D_FILE_OFFSET_BITS=64 and -D_LARGEFILE_SOURCE
This maps all file access calls to their 64-bit variants and makes off_t itself 64 bits wide. Explicit types like off64_t (instead of off_t) are only needed if you use the transitional interface from -D_LARGEFILE64_SOURCE. And I believe most large file support is really available only past glibc 2.2.
With that transitional interface you also need to pass O_LARGEFILE to open() etc. Either way, legacy applications that use glibc fs calls have to be recompiled to take advantage of this, and may need source level changes. Won't work on older kernels either.
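The reason the flag matters comes down to the range of the offset type. A small Python sketch of the arithmetic, assuming off_t is a signed integer of the stated width (the helper names and the example sizes are hypothetical, picked only for illustration):

```python
def max_offset(bits: int) -> int:
    """Largest byte offset a signed off_t of this width can represent."""
    return 2 ** (bits - 1) - 1


def fits(size_bytes: int, bits: int) -> bool:
    """True if a file of this size can be fully addressed."""
    return size_bytes <= max_offset(bits)


# Hypothetical example sizes: a CD image and a single-layer DVD image.
cd_image = 700 * 1024 * 1024
dvd_image = 4_700_000_000

for name, size in (("cd_image", cd_image), ("dvd_image", dvd_image)):
    print(name, "32-bit:", fits(size, 32), "64-bit:", fits(size, 64))
```

Anything past 2**31 - 1 bytes simply has no valid offset in the 32 bit build, which is why affected tools fail rather than just slow down.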
maybe the plural for Unix should be Unixen
Sudden thought of "Linuxen the HOOOOOUUUSSSSE, bizzach!"
If Mr. Edison had thought smarter he wouldn't sweat as much. --Nikola Tesla
Some numbers for *uncompressed* video:
NTSC/YUV2/stereo: ~111gb for a cinema movie (1hr 45min)
PAL/YUV2/stereo: ~125gb for same
HDTV/surround: ~908gb for same
With huffyuv (very low CPU usage, lossless) you should be able to cut that by a factor of 2-3. But it's still *huge*
Kjella
Live today, because you never know what tomorrow brings
what about next-generation video discs, should they split their scenes into 4k files as well?
Nor double-layer, for that matter. DVD-Rs max out at 4.7GB, period, end of story. There are two-sided DVD-RAMs though. But the two sides are used separately, so no double-sized images. Heck, no images at all since DVD-RAM is fully random access.
My (Windows) machine has no problem with >4GB files, BTW. Stupid WinZip can't expand files to that size (or zip them from that size) though.
One of the ways to keep errors from creeping into programs is to put limits on things so high that you can never reach them in the practical world.
The 31 bit limit on time_t overflows in this century - 63 bits outlasts the probable life of the Universe so it is unlikely to run into trouble.
That is the best argument I know for a 64 bit file size; in the long run it is one less thing to worry about.
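To make the two time spans concrete, here is a back-of-the-envelope Python sketch. It assumes time is counted in whole seconds and uses an average year of 365.25 days; the function name is my own.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: average Julian year


def span_years(value_bits: int) -> float:
    """Years covered by a second counter with this many value bits."""
    return 2 ** value_bits / SECONDS_PER_YEAR


print(f"31 bits: about {span_years(31):.0f} years")
print(f"63 bits: about {span_years(63):.2e} years")
```

The 31 bit case comes out at roughly 68 years after 1970, while the 63 bit case is on the order of hundreds of billions of years.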
We say: "Bill Gates said something criminally stupid and short-sighted."
You say: "Bill Gates says he didn't." (read the link).
Gates said it... along with a great many other moronic things. Get over it.
Bill Gates now claims that he was misquoted. What he really said was that "640K should be more than enough memory for anybody's toaster."
That tarball of 2002 stock quotes used to feed your stock research system.
The database files themselves, in the system.
Huh?
Other filesystems don't either :
http://www.sgi.com/software/xfs/techinfo.html
"Max. File Size
Designed to scale to 9 million TB with current hardware supporting scalability to 8000 TB on IRIX. Linux-64, 2 TB Max File Size. Solaris and Windows NT undergoing scalability testing"
"Max. File System Size
Designed to scale to 18 million TB with current hardware supporting scalability to 8000 TB on IRIX. Linux-64, 500 file systems of 2 TB each. Solaris and Windows NT undergoing scalability testing."
Unfortunately, it's not just a problem with the filesystem, but also and most often a problem with the applications. So, AIX does have this problem just as much as any other. Unless you've tested all the applications available for AIX.
One not-so-everyday reason....
:(
Research.
Right now im doing data-cache research that requires reference traces that are post-processed for various statistics (aka. every load & store is written to a file and then examined by other apps).
These files are HUGE. Some of the benchmarks we're running have well over a billion memory references. For each reference you have 4 to 8 bytes for the address and various additional bytes for additional statistics.
On the low side these files are ~ 4GB
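The low-side figure follows directly from the record size. A minimal Python sketch, assuming exactly one billion references and fixed-size records (the per-record extras are a made-up parameter, since the real trace format isn't described here):

```python
def trace_size_bytes(references: int, address_bytes: int,
                     extra_bytes: int = 0) -> int:
    """Total size of a trace with one fixed-size record per reference."""
    return references * (address_bytes + extra_bytes)


LIMIT_32BIT = 2 ** 31 - 1  # largest offset in a signed 32 bit off_t

low = trace_size_bytes(1_000_000_000, 4)       # addresses only
high = trace_size_bytes(1_000_000_000, 8, 4)   # hypothetical 4 bytes of stats

print(low, low > LIMIT_32BIT)
print(high, high > LIMIT_32BIT)
```

So even the bare 4 byte addresses for a billion references are nearly double what a 2GB-limited tool can read.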
There is something innate in the education, learning, and daily working of a programmer that makes them not want to use 'too big' of a number for a certain task.
it either
A) Wastes Memory Space
B) Wastes Code Space
C) Wastes Pointer Space
D) Or Violates some other tenet the programmer believes
So, When they go out and create a file structure, or something similar, they don't feel like exceeding some 'built-in' restriction to their way of thinking.
And usually, at the time, it's such a big number that the programmer can't think of an application to exceed it.
Then, one comes along and blows right through it.
I've been amused by all the people jumping on the 'it don't need to be that big' bandwagon. I can think of many applications that ext3 or whatever would need to use to make big files. they include:
A) Database Servers
B) Video Streaming Servers
C) Video Editing Workstations
D) Photo Editing Workstations
E) Next Big Thing (tm) that hasn't come out yet.
As a rock-in-roll Physicist once said, No matter where you go, there you are.
- Backups to a single file (no, I don't want to copy a fscking whole directory structure, thank you very much).
- Video editing.
- Large sound editing (multi-channel).
- Ever tried to create a DVD ISO image? there you go...
- Speaking of DVD's, *you* try dumping one to your harddisk with 2GB files.
- Disk images (ever had to Ghost around a boot-disk or boot-DVD with a disk image?)
- 3D animation files (probably included in the "video editing" section).
want me to go on? the list is bigger...
Please mod this guy up as interesting or informative.
Huh?
I had a problem with HP-UX apparently not wanting to transfer files larger than 2GB via NFS (when the NFS server is on HP-UX 11.0). I had to back up a Solaris computer's hard disk using dd across NFS. This usually worked when the NFS server was Solaris. However, last Friday it failed when the server was set up on HP-UX. I had to resort to my little Blade 100 as the NFS server, and I had no problems with it.
I have noticed that on the SAME DAY some folks asked questions about the 2 GB filesize limit in HP-UX on comp.sys.hp.hpux!! Apparently, the default HP-UX tar and cpio don't support files over 2 GB, either. Not even in HP-UX 11i. I never thought HP-UX stank this bad...
How does Linux on x86 stack up? I decided not to use it for this backup, since I had my Blade 100, but would it have worked? Oh, btw, is there finally a command on Linux like "share" (exists in Solaris) to share directories via NFS, or do I still need to edit /etc/exports and then restart the NFS daemon (or send SIGHUP)?
Sigged!
most people recommend breaking them up.
That's three words.
I didn't realize Daniel was so big, though.
Has he considered going lossy?
Keep your packets off my GNU/Girlfriend!
PAL: Max 720x576x25fps interlaced (50 Hz)
NTSC: Max 640x480x29.97fps interlaced (60 Hz)
No, they don't have the same frequency, nor the same scanlines. Some European TVs will take PAL-60, which is like PAL only at 60Hz, though. Also I don't think the color space works in the same way, but I'm not sure about that one. That was why I used YUV2 (16bit) for both.
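If anyone wants to redo the uncompressed numbers, this is roughly the arithmetic as a Python sketch. It uses the frame sizes and rates above at 16 bits per pixel and a 1hr 45min running time; the audio format (48 kHz, 16 bit, stereo PCM) is my assumption, and the function names are invented for the example.

```python
def raw_video_bytes(width: int, height: int, fps: float,
                    seconds: int, bytes_per_pixel: int = 2) -> int:
    """Uncompressed video size: one full frame for every frame period."""
    return round(width * height * bytes_per_pixel * fps * seconds)


def pcm_audio_bytes(seconds: int, rate: int = 48_000,
                    channels: int = 2, bytes_per_sample: int = 2) -> int:
    """Uncompressed PCM audio size (format here is an assumption)."""
    return seconds * rate * channels * bytes_per_sample


MOVIE_SECONDS = 105 * 60  # 1hr 45min

ntsc = raw_video_bytes(640, 480, 29.97, MOVIE_SECONDS) + pcm_audio_bytes(MOVIE_SECONDS)
pal = raw_video_bytes(720, 576, 25, MOVIE_SECONDS) + pcm_audio_bytes(MOVIE_SECONDS)

print(f"NTSC: {ntsc / 2**30:.1f} GiB")
print(f"PAL:  {pal / 2**30:.1f} GiB")
```

The results land in the same ballpark as the figures quoted earlier; the exact value depends on the audio format and on whether you count in powers of 10 or powers of 2.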
Kjella
Live today, because you never know what tomorrow brings
I remember like 4 or 5 years ago talking to my friend's dad, who works at kodak, and he would fill an entire 2gb jazz drive with one picture.
Even our Exchange private information store is somewhere around 10GB, and we are a small company by most standards
And that big y2k problem that was supposed to bring down mankind? How many years did it take to fix that? I very much doubt we started in 1965 ;)
Prediction: First distro to "suck it up" will be around 2035 or so. Personally, I think this is as far down on the priority list as you can get. Besides, with open source, is it really that problematic to grep the source for "time_t" and fix it? I don't think so.
Kjella
Live today, because you never know what tomorrow brings
George Orwell may have wrote some nifty stories but the guy was no linguist mmmkay.
That sounds like fun with backups.
However, I would recommend staying away from > 2GB files in a database environment. Even if your FS supports large files, you still lose performance on the "double driver": first your kernel provides a partition, then it provides a file system over it. But if you need files that big, why would you need a file system? Just use raw partitions!
Of course you still need large files for video, but massive concurrent performance overhead is not a typical problem in that case.
Less is more !
Gates said it...
Why? Because you say so? Maybe he said it, maybe he didn't. Show me the source of that quote. Until you can, your assertion is worth exactly nothing. People who deliberately spread misinformation disgust me, doubly so if they claim to be technical types. I'd have you taken outside and flogged if I could.
-- a random AC
Hey everyone lets keep beating a dead horse and telling him the million and one ways that you need files greater than 2gb. Half of these posts just say "movies" anyway. So stop repeating yourselves.
Don't eat shrimp candy, just a heads up.
hate jar jar
Why'd they even mention DOS? All DOS programs are statically linked. There are no DLLs or anything like them (except overlays). The only thing close would be DOS Extenders. So, what does DOS have to do with it?
... 64-bit addressing before thinking this through. I couldn't see the significant advantage for more than a very tiny fraction of apps in being able to address more than a few gigabytes.
Now I can't wait for OS X to have 64-bit support for the IBM 970 processors (I do realize that it will take several releases before default 64-bit operation is practical).
When compared to clustered 32-bit filesystems, I would think that a "pure" 64-bit filesystem would have a number of very practical advantages.
I could easily see the journalled filesystem becoming one of the first 64-bit subsystems in OS X, right after VM.
A much bigger problem is that Linux filesystems have a capacity limit of 2TB.
Many servers now have the physical capacity of over 2TB on a filesystem storage device.
Unfortunately this is still a very significant limitation.
This problem is much more commonly encountered than file size limitations.
Maurice W. Hilarius Voice: (778) 347-9907
These are files on a regular partition (i.e., ext2 or somesuch)?? It still sounds totally inefficient to me. I have nothing against large files, but I would hope a db would be using something more efficient or at least using its own filesystem (making the 2GB limit irrelevant).
18 EXAbytes file sizes, real journals, life queries...
*SOB*
J.
Backup files, exporting a huge oracle database to a file. And, when I record divx quality video through my ATI card I can go through the GB like crazy.
A better question is, Who doesn't need largefile support?
As for the seek time...not everything is accessed like a random access file. I imagine that the backup data will be read in sequentially. The video file would mostly be handed sequentially other than when jumping to a chapter fast forwarding or reversing.
-- Good judgement comes with experience. -- Experience comes with bad judgement.
Can anyone give a good reason for needing files larger than 2gb?
.zip or .tgz of a collection of big files. Or creating the equivalent of an ISO image of a DVD. And so on.
Video/movie files, for one thing. Even compressed (e.g. DV or MPEG) those things are huge. A 2 GB file at professional DV compression (50 Mb/sec) is about 4 minutes worth. (DV is similar to MJPEG, so it's still lossy.) Uncompressed or losslessly compressed video (critical for machine vision or image analysis apps) chews even more space.
I know I've wanted to be able to just dump a mini-DV tape (about 13 GB) directly to a single disk file for later editing.
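As a rough cross-check of those durations, here is a Python sketch that converts a file size and a bitrate into running time. It only counts the quoted video bitrate; audio and container overhead are ignored (my simplification), so real footage fills a file somewhat faster than this suggests. The function name is made up.

```python
def seconds_of_video(file_bytes: int, megabits_per_second: float) -> float:
    """How long a stream at the given bitrate takes to fill the file."""
    return file_bytes * 8 / (megabits_per_second * 1_000_000)


two_gb_at_dv50 = seconds_of_video(2_000_000_000, 50)
tape_at_dv25 = seconds_of_video(13_000_000_000, 25)

print(f"2 GB at 50 Mb/s:  {two_gb_at_dv50 / 60:.1f} min")
print(f"13 GB at 25 Mb/s: {tape_at_dv25 / 60:.1f} min")
```

Either way, a 2 GB cap is hit within a handful of minutes, and a full tape is several times over the limit.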
Other fields also use huge data sets - seismic data analysis for example. Filesystems designed for supercomputer clusters (eg PVFS) have unlimited size on the total filesystem (tens of terabytes is not unusual) although the individual file size may still be limited by the underlying OS or hardware word size.
Then there's creating a
-- Alastair
The seek times alone within these files must be huge,
Depends on how your inodes are laid out, how big you have to get for triple indirect blocks, etc.
Shouldn't be any worse (and maybe better) than trying to seek through an equivalent collection of smaller files -- you've got to do all those directory searches, etc. (Exact comparisons will depend greatly on the filesystem and parameters chosen when the FS was created.)
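To illustrate the indirect-block point, here is a Python sketch of the classic Unix inode layout. It assumes 12 direct block pointers followed by single, double and triple indirect blocks with 4 byte pointers (an ext2-style layout; the exact numbers for any real filesystem are an assumption here, and other limits may apply before this one does).

```python
def max_file_bytes(block_size: int, pointer_size: int = 4,
                   direct: int = 12) -> int:
    """File size addressable through direct plus 1x/2x/3x indirect blocks."""
    per_block = block_size // pointer_size  # pointers per indirect block
    blocks = direct + per_block + per_block ** 2 + per_block ** 3
    return blocks * block_size


def indirection_level(offset: int, block_size: int,
                      pointer_size: int = 4, direct: int = 12) -> int:
    """Number of indirect blocks to walk to reach the given byte offset."""
    per_block = block_size // pointer_size
    index = offset // block_size
    if index < direct:
        return 0
    index -= direct
    for level in (1, 2, 3):
        capacity = per_block ** level
        if index < capacity:
            return level
        index -= capacity
    raise ValueError("offset beyond what this layout can address")


print(max_file_bytes(1024), max_file_bytes(4096))
print(indirection_level(2 ** 31, 4096))  # lookups needed at the 2 GiB mark
```

The point is that a seek costs at most three extra block lookups no matter how large the file is, so the cost grows very slowly with file size.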
-- Alastair
There is a need for a virtualizing filesystem which supports multiple volumes, offline and not, and files stored in segmented form to fit. It would be insanely handy in a clustering environment; The whole cluster could store the file (with some redundancy) and access it in a shared fashion. This would substantially improve the ease of working with inanely large data sets in a clustered scenario.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Anyway those using a M$ OS which does not support NTFS are fooling themselves. If you are using some form of windows prior to Windows 2000, then you are getting a terrible experience which is nothing like the real OS -- NT. NTFS is a pretty good filesystem with journaling, ACLs, and implicit support for encryption and compression. Fat32 is shite.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
1995 was just after the Alpha was first released IIRC. Back then (and still today to a lesser extent), the phrase "if you need 64-bit pointers (or 64-bit pointers into a file), get a 64-bit machine" made at least some sense.
I've compressed files larger than 2 gb with gzip on linux 2.4 without any problems at all. We have some 20 gig and larger database tables which compress nicely to around 5 gig.
When I encode 120 minutes of video into MPEG2 at around 640x480 it reaches about 3.5G.
> NTFS is a pretty good filesystem with journaling,
That's only partially true -- it doesn't journal data, only meta-data.
I was giving an example because the parent was 0, Offtopic at the time.
The example was that officials do worry about e-mail so they would either save it like he said or avoid typing it like I said. The point is that they would consider it important and that they would save e-mails that were sent.
Sure, but that's good enough to save people in almost all cases. I've never, EVER lost data on NTFS5 due to a crash (which has happened plenty) or a power failure (only twice since I started using it). FAT32, on the other hand... or ext2, for that matter. A partially journaling filesystem gets the job done well enough for basically any purpose. If it's not good enough for you, perhaps a filesystem is not the best place to store your data in the first place; I'd consider a clustered replicating RDBMS :P
At least come up with something funny, like "In Soviet Russia, filesystem seeks YOU."
Once upon a time (prior to 1978) there was no lseek() call in Unix. The value for the offset was 16 bits. Larger seeks were handled by using a different value for "whence" (the third argument to seek()), which caused seeks to occur in 512-byte increments. This resulted in a maximum seek of 16,777,216 bytes, with an arbitrary seek() often requiring two calls, one to get to the right 512-byte block and a second to get to the right byte within the block. (Thank goodness they haven't done any such silliness to break the 2GB barrier.)
When Research Edition 7 Unix came out, it introduced lseek() with a 32-bit offset. 2,147,483,648 bytes should be enough for anyone, hmmm? :-).
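The two-call arithmetic can be sketched like this. It is only an illustration of the numbers above, assuming a signed 16-bit offset; split_seek() is a made-up helper, not the historical API.

```python
BLOCK = 512
MAX_16 = 2**15 - 1        # largest signed 16-bit offset (assumed signed)
MAX_32 = 2**31 - 1        # largest signed 32-bit offset, as with lseek()


def split_seek(offset):
    """Split an absolute byte offset into (block seek, byte seek).

    The first value would go to a block-granular seek, the second to a
    relative byte seek within that block.
    """
    blocks, rest = divmod(offset, BLOCK)
    if blocks > MAX_16:
        raise ValueError("offset not reachable with a 16-bit block seek")
    return blocks, rest


print(split_seek(1000))                 # (1, 488)
print((MAX_16 + 1) * BLOCK)             # 16777216, the old ceiling
print(MAX_32 + 1)                       # 2147483648, the 2GB barrier
```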
The time_t type must be signed, so that you can represent negative time differences. If you make time_t unsigned, when you try to do things like saying "if this file is older than that file" you will get a very large positive time, rather than a negative time. Not good.
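A small sketch of what goes wrong, emulating 32-bit arithmetic with a mask (Python integers don't overflow on their own; the function names are illustrative):

```python
MASK32 = 0xFFFFFFFF


def unsigned_diff(a, b):
    """a - b as an unsigned 32-bit type would compute it."""
    return (a - b) & MASK32


def signed_diff(a, b):
    """a - b as a signed 32-bit (two's complement) type would compute it."""
    d = (a - b) & MASK32
    return d - 2**32 if d >= 2**31 else d


older, newer = 100, 200                  # two timestamps, in seconds
print(unsigned_diff(older, newer))       # 4294967196, a huge "positive" age
print(signed_diff(older, newer))         # -100, what you actually wanted
```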
www.eFax.com are spammers
Can anyone give a good reason for needing files larger than 2gb?
.iso images. :)
DVD
If J.K.R wrote Windows: Puteulanus fenestra mortalis!
Exactly, and MPEG-2 compressed video can still exceed 2GB in size.
I have a friend who does some video work on the side -- he takes people's home movies and such and puts them on DVD (he turned a hobby/present into a small business). Consequently, he uses a Mac. One hour of uncompressed video (compressed only losslessly) at DVD quality (720x480) @ 24.97 fps with stereo audio takes up about 15-16GB. If you then want to bump up to something like 720p, which I think is 1280x720, you're talking about 40-42GB per hour.
Of course, standard TV at 320x240 resolution with stereo would only take up approx 3.5GB uncompressed, which is still larger than 2GB.
I would have snapped up puppy.mil in an instant.
You just made my day :)
I'd rather that the article didn't even bother with the lip service to other OSes; as it is, the article reads:
"The BSDs handle large file support without any problems. Solaris and Linux have some problems. Here's all the problems with linux, with nary a mention of what happens with Solaris, or how other OSes have managed to deal gracefully with it."
If you're gonna write a lignux article, make it a lignux article. Jeez.
Cheers,
Ian
Ostensibly your filesystem driver will be caching much of the list information in memory
Caching the tables in physical memory does of course help, but it doesn't remove the linear scan through a linked list. This linear scan takes time even if done in RAM. To improve performance, the Linux driver for this filesystem caches a number of already resolved positions; I think this cache holds 8 entries. I found out about that once I needed simultaneous sequential access to 20 files on the same FAT32 filesystem. Performance was horrible. I had two options: either do access in very large blocks to keep the number of list scans low, or increase the cache size and recompile my kernel. I don't remember which of the two options I chose.
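For anyone who hasn't looked at how a FAT chain is walked, here is a toy model. It is a sketch only: the class, the dict-based table and the cache policy are all hypothetical, and the 8-entry size is just the figure guessed at above, not something taken from the real driver.

```python
from collections import OrderedDict

END = -1  # hypothetical end-of-chain marker for this toy table


class FatChain:
    """Maps 'n-th cluster of a file' to a cluster number by walking a linked list."""

    def __init__(self, fat, first_cluster, cache_size=8):
        self.fat = fat                  # cluster -> next cluster
        self.first = first_cluster
        self.cache = OrderedDict()      # chain index -> cluster, oldest first
        self.cache_size = cache_size
        self.steps = 0                  # links followed so far

    def cluster_at(self, index):
        # Start from the closest cached position at or before `index`.
        start, cluster = 0, self.first
        for idx, cached in self.cache.items():
            if start < idx <= index:
                start, cluster = idx, cached
        # Walk the rest of the way: this is the linear scan.
        for _ in range(index - start):
            cluster = self.fat[cluster]
            if cluster == END:
                raise IndexError("index past end of chain")
            self.steps += 1
        # Remember the resolved position, evicting the oldest entry if full.
        self.cache[index] = cluster
        self.cache.move_to_end(index)
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)
        return cluster


fat = {2: 5, 5: 3, 3: 7, 7: END}        # file occupies clusters 2, 5, 3, 7
chain = FatChain(fat, first_cluster=2)
chain.cluster_at(3)                     # walks 3 links
chain.cluster_at(3)                     # served from the cache, no walking
```

With 20 files open and a cache shared between them, the cached positions keep getting evicted, so nearly every access starts again from the first cluster.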
Do you care about the security of your wireless mouse?
I recently tried recording a one hour TV show with xawtv, to AVI (MJPEG, 640x480, 15 fps, 16-bit stereo sound). It appeared to record okay, and ended up 5 gigs. But I could only play the first few minutes of it with aviplay. Something (either xawtv, aviplay, or the AVI file format itself) has a 2 (or 4 (unsigned)) GB limit.
It is not already a plural. Read a book (I suggest a dictionary, English or Latin).
That part of writing an OS from scratch is trivial.
By the Unix standard, tar and cpio will never support files bigger than 2 GB. Maybe new utilities called tar64 and cpio64 will. AIX backup/restore supports files > 2 GB. Maybe HP-UX dump/restore can do the same.
Good luck.
I would guess that a router or firewall or any device, maybe even a cable modem would filter that. If you think you're accessing through a firewall, that's probably why.
It works on Win98/Internet Explorer 5 with a direct connection to the cable modem.
" Ever heard of something like movie-editing? You can get huge files really fast."
Heard of it. Live it. That's why the original stays on the source machine (DV), while a lower-quality (preview) copy is loaded into the computer doing the editing. Once editing is done, the software pulls the higher-quality originals from the source, then assembles and processes them appropriately, outputting in the format desired. Keeps file size manageable.
Old news, Solaris 2.6 and 7. Solaris 8 is 64 by default. I hope they are not still developing for 2.6 :)
I don't get why people continue to quote this. I've never ever seen a cite (besides "well, I heard it from my friend"), and Bill Gates has said he didn't say it. Two strikes is enough for me.
"Mod me down, please."
--cyber_rigger
Oh well. Anyway, I know that a linked list just plain isn't as efficient as a tree, but as you say there are ways to speed things up. I would assume that the windows driver probably throws away quite a bit of memory trying to make fat32 fast, microsoft has always been more than willing to squander memory willy-nilly. In fact, Mechwarrior IV:Vengeance used to have a habit of squandering it permanently, or until the process terminated... From what I hear, Excel still does, but I don't spend much time in there consecutively.
Also, I don't see any reason you couldn't build a tree in memory or in a cache (perhaps you build it in memory and design it so that you can swap most of it out automatically?). That would be a really funky way to do things on chicago but it would be quite reasonable on any flavor of NT, or of course on your favorite open-source operating system. A non-trivial job to be sure, but obviously not impossible. At least that way it would only be slow once per boot.
I thought only a few programs used lseek(), e.g. databases. Wouldn't most programs read files sequentially, without using off_t at all?
-- 1.e4 c6 2.d4 d5 3.Sc3 de4: 4.Se4: Sd7 5.Sg5 Sgf6 6.Ld3 e6 7.S1f3 h6 8.Se6:
Don't you mean, increase the cache size, make modules... :)
I think FAT was compiled in my kernel at that time.
perhaps you build it in memory and design it so that you can swap most of it out automatically?
I wonder who really wants to spend a lot of time improving FAT performance when there are so many other filesystems that will always perform better than FAT.
Except for scientific calculations, where there will probably never be a reasonable limit on the size or precision of numbers needed, I doubt anyone would need more than 64 bits for any scalar type, be it a char or an int or a double or whatever. Why not use 64 bits for everything and accept the wasted space for storing chars, but never have to worry about running out of numbers? Even if you waste 7/8 of the space on your hard drive to store 8-byte-long chars, the available storage has gone up exponentially by using a 64-bit address space. Increasing the size of your data 8 times is negligible, negligible enough to not even bother with 1-byte chars.
Eat at Joe's.
Well, mostly Microsoft, I'm thinking. Also fat32 is a handy filesystem because just about everyone can read it these days. I'm about to set up a PC for my girlfriend's aunt, it's just a K6-2 300. It'll have 256mb ram, and minimal (1.2Gb) disk, because that's what I have lying around. I'm putting Windows 98 SE on the disk, and knoppix will be provided on a CD so she can play with linux, assuming I can get it to stop making idiot assumptions about refresh rates without requiring her to insert a floppy as well. That is god damned idiotic. But anyway I digress, the best FS for that OS is FAT32, so I'm going to use it, all data will be stored on a fat32 volume. I imagine this is becoming a fairly common scenario. Also of course many geeks multiboot to win98 for games, the only filesystem they'll have in their PC readable by all operating systems is FAT32 and they will likely be keeping media there.
I realise you're just a troll, but I'd like to point out that Win32 also has two forms of file API for most functions - one that can do 64-bit and one limited (or at least which encourages you to use) 32-bit. For 64-bit access to work in any given application, you're relying on the language runtime making the correct mapping, and/or the end developer choosing the right set of functions to use. In many cases the easiest and most obvious ones will limit you to 32-bits - so many applications will not work with such large files.
This is a problem which has affected pretty much every system - even in an OS where *only* 64-bit file APIs exist, you'll still find an occasional app which tries to fit a file location into a 32-bit variable.
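What "fitting a file location into a 32-bit variable" does in practice can be shown with ctypes, which stores out-of-range values without any overflow check. This is a sketch of the failure mode only, not of any particular application; as_int32() is a made-up helper.

```python
import ctypes


def as_int32(position):
    """Return the value a signed 32-bit variable would hold for `position`."""
    return ctypes.c_int32(position).value


print(as_int32(2**31 - 1))    # 2147483647, last offset that still fits
print(as_int32(2**31))        # -2147483648, one byte further and it goes negative
print(as_int32(5 * 2**30))    # 1073741824, a 5 GiB offset silently becomes 1 GiB
```

The last case is the nasty one: nothing fails, the application just reads or writes at the wrong place.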
This is my World Wide Web of Whatever
Oh, but you do. AIX 4 definitely wants -DLARGE_FILES (sp?), or bad things happen, and watch your longs and long-longs (and their aliases) carefully. (A buddy and I recently had to comb through exactly this problem in an app)
Yow! I'm supposed to have a plan?
is ~3.15 billion nucleotides long. Yeah, you can compress it (2 bits per nucleotide = 4 nucleotides/byte), but it makes it a pain in the ass to work with.
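For reference, the 2-bit packing looks roughly like this. It is a sketch that assumes a plain A/C/G/T alphabet, with no handling for N or other ambiguity codes, which is part of why the packed form is a pain to work with.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a string of A/C/G/T into bytes, 4 nucleotides per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data, n):
    """Recover the first n nucleotides from packed bytes."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(n))


packed = pack("GATTACA")
print(len(packed), unpack(packed, 7))    # 2 GATTACA
print(3_150_000_000 // 4)                # 787500000 bytes when packed
```

At one byte per nucleotide, 3.15 billion of them overshoot the 2^31-byte limit; packed, they come in under 800 MB, but the sequence length has to be stored separately and every lookup needs shifting and masking.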
I know I've wanted to be able to just dump a mini-DV tape (about 13 GB) directly to a single disk file for later editing.
That's the way I edit. With the size of modern hard drives, it's a waste of time to do a traditional log/capture session. Instead, just dump everything to disk and then break it up from there. FCP even has a feature or two designed towards this direction (Start/stop detection). Hopefully they'll fix the subclip bug in version 4.
I tell FCP to partition my files, though. The only >2GB files I currently have are my toast DVD images. I try not to use >2GB files in general, though...there are still some mysterious HFS+ bugs floating around that I've been trying to avoid.
-Brett
Your boxen will be shipped in 4-6 weeks.
Sweet! I didn't know I'd get free boxen for reading your post!
This guy would have a field day with "All your boxen are belong to us"
This group produced three notable results:
I still have my T-shirt -- how about you?
I used to be a student admin for Clemson's College of Engr. and Science. We had several CAD tools that the Engr. students would use. There was this one tool that you could specify a duration the simulation was supposed to last, otherwise if the field was blank it would run forever. Besides that little bit of badness the field was blank by default, so many an unsuspecting student would run their simulations and they would run forever creating these huge output files, which the students also didn't know about.
The killer here is that if you quit the program the wrong way (something like Close instead of Quit) the program would keep going, even after the student logged out.
So now you have N students who are all generating infinite files. However, the files would hit the 2GB limit and stop eating up space. (Thank you.)
The only other nastiness of this is that once we found the file, if you simply removed it, the program (still running after log out) was just able to finally add more data. So you had to track down where the program was running and kill it first.
I was in charge of backups, and man oh man was this annoying for them.
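The "remove the file and the program just keeps writing" behaviour is easy to reproduce on a POSIX system. A minimal sketch using a temporary file (nothing here is specific to the CAD tool in question):

```python
import os
import tempfile

# Stand-in for the runaway simulation's output file.
fd, path = tempfile.mkstemp()
os.write(fd, b"simulation output\n")

# The admin "removes" the file while the program still has it open.
os.unlink(path)
exists_after_unlink = os.path.exists(path)   # the name is gone...

# ...but the open descriptor still works and the file keeps growing.
os.write(fd, b"more output\n")
size = os.fstat(fd).st_size

os.close(fd)   # only now is the disk space actually released
print(exists_after_unlink, size)
```

Which is why the order had to be: find the process, kill it, then remove the file.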
"Not knowing when the dawn will come, I open every door." - Emily Dickinson