Large File Problems in Modern Unices

Re:Why large files by mr.henry · 2003-01-26 02:59 · Score: 3, Funny

Who needs more than 512k of RAM??

Not really that groundbreaking... by CoolVibe · 2003-01-26 02:59 · Score: 4, Interesting

The problem is nonexistent in the BSDs, which use the large file (64 bit) versions anyway. And you have to use a certain -D flag if your OS (like Linux) doesn't use the 64 bit versions. Whoopdiedoo. Not so hard. Recompile and be happy.
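
For context on the flag: on glibc-based Linux it is commonly `-D_FILE_OFFSET_BITS=64`, which widens `off_t` from 32 to 64 bits. The sketch below is only an illustration of why a signed 32-bit offset stops just short of 2 GiB. The helper name `fits_in` is made up for this example, and Python's `struct` module stands in for the C types.

```python
import struct

def fits_in(fmt, value):
    """Return True if value can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False

two_gib = 2**31
# "<i" is a signed 32-bit integer, "<q" a signed 64-bit one.
print(fits_in("<i", two_gib - 1))  # last offset a 32-bit off_t can hold
print(fits_in("<i", two_gib))      # one byte further no longer fits
print(fits_in("<q", two_gib))      # a 64-bit off_t has room to spare
```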

Re:Not really that groundbreaking... by Anonymous Coward · 2003-01-26 09:05 · Score: 1, Informative

What happens when you need more than 2^64 bytes storage? Cheat with granularity? The same problem still exists and isn't solved. Your train of thought is the same which allowed 32-bits to be used in the first place. Recursive expansion would be the only real solution.
Re:Not really that groundbreaking... by statusbar · 2003-01-26 10:31 · Score: 2, Funny

2^64 = 17,179,869,184 gigabytes!

17,179,869,184 gigabytes ought to be enough for ANYBODY!

--jeff++

--
ipv6 is my vpn
Re:Not really that groundbreaking... by Citizen+of+Earth · 2003-01-26 14:49 · Score: 1

17,179,869,184 gigabytes ought to be enough for ANYBODY!

That would be 16 exabytes, but I think that, alas, realistically you will only be getting eight exabytes since you will probably deal with signed integers at some stage in most systems. (Are we allowed to use the term 'exabyte' since it's a registered trademark of a corporation? Does this trademark have any substance given that it is a common English word?)
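
The figures in this subthread can be checked with a few lines of arithmetic. This is a quick sketch using binary units (GiB, EiB), which is an assumption about what the posters meant by "gigabytes" and "exabytes".

```python
GIB = 2**30   # bytes in a binary gigabyte
EIB = 2**60   # bytes in a binary exabyte

unsigned_limit = 2**64  # bytes addressable with an unsigned 64-bit offset
signed_limit = 2**63    # bytes addressable when the offset is signed

print(f"{unsigned_limit // GIB:,} GiB")  # 17,179,869,184 GiB
print(unsigned_limit // EIB, "EiB")      # 16 EiB
print(signed_limit // EIB, "EiB")        # 8 EiB
```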

video, mp3's, even dvds are beyond 2gb by xintegerx · 2003-01-26 02:59 · Score: 2, Informative

Question answered, move along, nothing to see here :)

--

Cover your eyes and click this link!

Re:video, mp3's, even dvds are beyond 2gb by bns_robson · 2003-01-26 10:09 · Score: 2, Funny

Your link doesn't work. I get a DNS failure looking up host 578.291.762.662

Re:Why large files by tgeerts · 2003-01-26 03:00 · Score: 1

Video + Audio >= 2GB

Re:Why large files by voodoopriestess · 2003-01-26 03:01 · Score: 3, Informative

Databases, Movie files, Backup files (think dumps to tapes). Animations, 3D modelling.... Lots of things need a > 2GB file size. Iain

--
---- "I would be careful in separating your weirdness, a good quirky quantum weirdness, from the disturbed weirdnes

Re:Why large files by Big+Mark · 2003-01-26 03:01 · Score: 5, Insightful

Video. Raw, uncompressed, high-quality video with a sound channel is fucking HUGE. Look how big DivX files are, and they're compressed many, many times over.

And compressing video on-the-fly isn't feasible if you're going to be tweaking with it, so that's why people use raw video.

-Mark

Re:Why large files by Ogion · 2003-01-26 03:01 · Score: 2, Insightful

Ever heard of something like movie-editing? You can get huge files really fast.

--
-- we're dressed in green, and we're feeling mean

Re:Why large files by Anonymous Coward · 2003-01-26 03:02 · Score: 5, Interesting

Real analytical work can easily produce files this large. Output for analyses of structures with more than half a million elements and several million degrees of freedom can EASILY produce output of over two gigs. Yes, these results can and should be split, but sometimes it makes sense to keep them together as a matter of convenience. Plus, there IS a small performance hit when dealing with multiple files on most of the major FEA packages.

Re:Why large files by hbackert · 2003-01-26 03:02 · Score: 4, Informative

vmware uses files as virtual disks. 2GB would be a really, really small disk. UML does the same, using the loop device feature of Linux. Again, a filesystem in a file. Again, 2GB is not much. Simulating 20GB would need 10 files.

Feels like 64kbyte segments somehow...and I really don't want to have those back.
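
The "10 files" figure above is a simple ceiling division. A small sketch follows; the helper name `files_needed` is made up, and binary gigabytes are assumed. With a strict signed 32-bit cap of 2^31 - 1 bytes per file, the count comes out one higher.

```python
def files_needed(total_bytes, max_file_bytes):
    """Number of files needed to hold total_bytes, rounding up."""
    return -(-total_bytes // max_file_bytes)

GIB = 2**30
disk = 20 * GIB
print(files_needed(disk, 2 * GIB))    # 10 with a nominal 2 GiB cap
print(files_needed(disk, 2**31 - 1))  # 11 with a strict signed 32-bit cap
```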

Re:Why large files by Big+Mark · 2003-01-26 03:03 · Score: 2, Funny

Come on. Even Bill Gates admitted that half a meg ain't enough.

640K, on the other hand, should be enough for anyone...

-Mark

data warehouse, and any database for that matter by CrudPuppy · 2003-01-26 03:04 · Score: 5, Insightful

my data warehouse at work is 600GB and grows at a rate of 4GB per day.

the production database that drives the sites is like 100GB

welcome to last week. 2GB is tiny.

--
A year spent in artificial intelligence is enough to make one believe in God.

Its funny how some lamers dont listen... by cheekyboy · 2003-01-26 03:05 · Score: 3, Insightful

I said this to some unix 'so called experts' in 95, and they said, oh why why do you need >2gig

I can just laugh at them now...

--
Liberty freedom are no1, not dicks in suits.

Re:Its funny how some lamers dont listen... by FooBarWidget · 2003-01-26 03:27 · Score: 1

No you can't; both Linux and FreeBSD support files > 2 GB. Apparently you've laughed for nothing.
Re:Its funny how some lamers dont listen... by abirdman · 2003-01-26 10:28 · Score: 1

In 1995, IIRC the biggest hard drive in common use was (quite a bit) less than a gig. It's not surprising the designers decided to use the smaller, more efficient 32 bit addressing. As in all other things related to computers in general (didn't billg once ask why anyone would need more than 640K of RAM, or is that urban myth?), needs change over time.

As hardware develops, the software develops to address it. I remember someone who was shocked that Lotus 123 could create a spreadsheet larger than a (320K 5 1/4" DS) diskette, because how could you save it?

We've come a long way. We'll go a lot further. 64 bit file sizes will seem small and quaint to our children's children.

Now, If I could just tar my Linux file system over to my son's spare 60 gig hard drive through SAMBA, I'd have cheap, fast, effective backup. But I have to do it 2 gigs at a time. Grrr... gotta go check up on that BSD stuff.

--
Everything I've ever learned the hard way was based on a statistically invalid sample.
Re:Its funny how some lamers dont listen... by abirdman · 2003-01-26 13:12 · Score: 1

Mine stops with an error at 2 gigs, believe me. Guess I've got to go jiggle the wires or something. The problem might be SAMBA. Or I have to upgrade from RH 7.2

--
Everything I've ever learned the hard way was based on a statistically invalid sample.
Re:Its funny how some lamers dont listen... by Wolfrider · 2003-01-27 11:23 · Score: 1

--I have had the same problem under SuSE 7.3, even with the latest Linus kernel. The problem is fixed in Debian / Knoppix:

www.knopper.net
www.knoppix.net

--
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??

Re:Why large files by Timesprout · 2003-01-26 03:05 · Score: 1

For when Jaron Lanier decides to update his website with 10,000,000 lines of script

--
Do not try to read the dupe, thats impossible. Instead, only try to realize the truth
What truth?
There is no dupe

Re:Why large files by amigaluvr · 2003-01-26 03:06 · Score: 1

Oh I see now raw video is larger than I thought, oops

Re:Why large files by AvitarX · 2003-01-26 03:07 · Score: 1

Maybe high quality audio+video for say...
making a movie will be larger than that.

I guess a lot of the editing would probably be done scene by scene, and then you could merge and compress them on the fly so that at no point you use more than 2gb, but it seems that if you make a 2 hour dvd it would be nice to keep the 4gb image file on your hard drive if you planned to reburn it.

Not a scattering of scenes from which it would recreate the image on the fly.

It is kind of a dumb question, when we have computers being marketed as home dvd makers, to ask why we would need that big of a file.

--
Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg

Re:data warehouse, and any database for that matte by hector13 · 2003-01-26 03:07 · Score: 2, Insightful

my data warehouse at work is 600GB and grows at a rate of 4GB per day. the production database that drives the sites is like 100GB welcome to last week. 2GB is tiny.

And you store this "production database" as one file? didn't think so (or at least I hope you don't).

I am not agreeing (or disagreeing) with the original post, but having a database > 2 GB has nothing to do with having a single file over 2 GB. A db != a file system (except for MySQL perhaps).

Re:Why large files by Idaho · 2003-01-26 03:08 · Score: 2, Insightful

Can anyone give a good reason for needing files larger than 2gb?

I can think of some:

A/V streaming/timeshifting
Backups of large filesystems (since there exist 320 GB harddisks now, I don't think I should create 160 .tgz files just to back it up, do I?)
Large databases. E.g. the slashdot posts table will be easily >2 GB, or so I'd guess. Should the DB cut it in two (or more) files, just...because the OS doesn't understand files >2 GB? I don't think so...

And that's just without thinking twice...there are probably many more reasons why people would want files >2 GB.

--
Every expression is true, for a given value of 'true'

640 K ought to be enough for anybody by cyber_rigger · 2003-01-26 03:10 · Score: 3, Funny

--Bill Gates

Re: 640 K ought to be enough for anybody by Looke · 2003-01-26 03:28 · Score: 1

Yeah, we should all switch to OpenOffice. I had a 20,000 row, 13 MB Excel file, which I resaved in OpenOffice Calc format. It came out at a sweet 640 KB ;-)
Re: 640 K ought to be enough for anybody by commanderfoxtrot · 2003-01-26 10:42 · Score: 1

OpenOffice is good in that all of its files are easily taken apart by hand. They are ZIP archives of XML files, and the compression saves, as you say, a lot of space.

I had a problem with OpenOffice last week when it didn't want to open my saved report. Or the last 6 hourly backups either :-(. Luckily I was able to get the bare text out and redo the report in LaTeX which worked (as usual) like a dream.

Does anyone know when the **MAJOR** OpenOffice multitasking bug will be fixed? Basically AFAIK, OO calls sched_yield all the time so running e.g. seti will stop OO from doing anything.

--
http://blog.grcm.net/
Re: 640 K ought to be enough for anybody by acid_zebra · 2003-01-29 01:30 · Score: 1

Of course, when you moved the files between formats, you lost any and all undo information. Excel stores this undo information in the file, so when you change a lot of things in this file, over time it will grow. Of course, OO also compresses the files, which also helps a lot.

--
-- No Sig is a Good Sig

It will happen with time_t, too by wowbagger · 2003-01-26 03:11 · Score: 5, Informative

We are seeing problems with off_t growing from 32 to 64 bits. We are also going to see this when we start going to a 64 bit time_t, as well (albeit not as badly - off_t is probably used more than time_t is.)

However, the pain is coming - remember we have only about 35 years before a 64 bit time_t is a MUST.

I'd like to see the major distro vendors just "suck it up" and say "off_t and time_t are 64 bits. Get over it."

Sure, it will cause a great deal of disruption. So did the move from aout to elf, the move from libc to glibc, etc.

Let's just get it over with.

--
www.eFax.com are spammers

Re:It will happen with time_t, too by koreth · 2003-01-26 06:49 · Score: 1

First of all, it's a Y2038 problem rather than a Y2106 problem because time_t is signed in many places. Simply switching to an unsigned time_t (who uses time_t to represent pre-1970 values?) will buy us an extra 68 years with minimal application grief, but the underlying problem will still be there.
It boggles my mind that Sun, for example, went to the trouble of building a whole host of interfaces and a porting process for 64-bit file offsets (see the lf64 and lfcompile64 manpages on Solaris) and yet they didn't bother to increase the size of time_t at the same time. If everyone is going to be recompiling their apps anyway, why not fix it all in one go?
On the application side, it should be noted that this isn't a problem for code written in Java, whose equivalent of time_t is already 64-bit (in milliseconds, granted, but that only eats about 10 of the extra 32 bits.) Obviously the Java VM won't be able to make up for the underlying OS not supporting large time values, but at least the applications won't have to change.
First one to start whining about Java's year-584544016 problem gets whacked with a wet noodle.
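
The dates behind "Y2038" and "Y2106" follow directly from the counter width. This is a small sketch; the helper name `last_second` is made up for illustration. The last line checks the remark that millisecond resolution costs about 10 bits.

```python
import math
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def last_second(max_value):
    """UTC time of the last second a counter with this maximum can represent."""
    return EPOCH + timedelta(seconds=max_value)

print(last_second(2**31 - 1))  # signed 32-bit time_t runs out in January 2038
print(last_second(2**32 - 1))  # unsigned 32-bit time_t lasts until February 2106
print(math.log2(1000))         # bits consumed by counting milliseconds
```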
Re:It will happen with time_t, too by Ozric · 2003-01-26 07:03 · Score: 1

Ongoing support, would be my guess.
Re:It will happen with time_t, too by stripes · 2003-01-26 08:09 · Score: 1

First one to start whining about Java's year-584544016 problem gets whacked with a wet noodle.

I remember seeing a Sun press release about Java being Y2K compliant, how long it would last, and that Sun promised to fix it at least 3000 years before it became a problem. Or something like that. It amused me greatly at the time.
Re:It will happen with time_t, too by Spoing · 2003-01-26 09:48 · Score: 1

Only 35 years? Phew! Talk about cutting it close!

--
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
Re:It will happen with time_t, too by John+Sullivan · 2003-01-27 05:57 · Score: 1

A lot of the Y2K problem was caused by code written within the last 35 years. It wasn't a disaster, but it was a big problem, and a huge drain on many companies' resources as they tried to fix things up at the last minute.

--
This is my World Wide Web of Whatever
Re:It will happen with time_t, too by Spoing · 2003-01-27 13:55 · Score: 1

A lot of the Y2K problem was caused by code written within the last 35 years. It wasn't a disaster, but it was a big problem, and a huge drain on many companies' resources as they tried to fix things up at the last minute.
Agreed, somewhat. Unlike the Y2K problems, the time issue in Unix systems is fairly transparent to the implementation. Update the OS, and the problem goes away in many cases. Y2K bugs, in the cases where 2 digits were used instead of 4, were usually program specific.
Because of that, in 35 years few programs will be impacted.

--
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
Re:It will happen with time_t, too by John+Sullivan · 2003-01-28 00:52 · Score: 1

Hmm, not really. The OS and applications often exchange time_t's, and they're often stored as 4-byte quantities in data files. Update the OS and you either break binary compatibility, or have to duplicate the API, which provides no benefit to applications until they are updated too. Update the app and you may still have an enormous data conversion problem. Which defines the scale of the problem - there's an awful *lot* of bad application code out there. In mission critical systems. Even in places where the people using it day to day have forgotten or never knew it exists.
Plus history teaches us that organisations *will* wait until the last minute before updating mission critical systems. Ignore that lesson and you're certainly doomed to a repeat of Y2K.

--
This is my World Wide Web of Whatever
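
To make the data conversion point concrete, here is a hedged sketch of widening a stored timestamp. Both record layouts and the helper name `widen` are hypothetical, invented only for this illustration; real data files will differ.

```python
import struct

OLD_RECORD = struct.Struct("<i")  # hypothetical legacy layout: 4-byte signed timestamp
NEW_RECORD = struct.Struct("<q")  # hypothetical widened layout: 8-byte signed timestamp

def widen(old_bytes):
    """Convert one legacy 4-byte timestamp record to the 8-byte layout."""
    (seconds,) = OLD_RECORD.unpack(old_bytes)
    return NEW_RECORD.pack(seconds)

legacy = OLD_RECORD.pack(2**31 - 1)  # the last second a 4-byte record can store
converted = widen(legacy)
print(len(legacy), len(converted))   # the record size doubles
print(NEW_RECORD.unpack(NEW_RECORD.pack(2**31))[0])  # the next second now fits
```

Every file that embeds the old layout needs a pass like this, which is why the conversion cost scales with the amount of stored data and not with the size of the OS patch.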

A woman's perspective . . . by pariahdecss · 2003-01-26 03:12 · Score: 5, Funny

So my wife says to me, "Honey, do I look fat in this filesystem ?"
I replied, "Sweetie, I married you for your trust fund not your cluster size."

Re:A woman's perspective . . . by egreB · 2003-01-26 12:30 · Score: 1

Now, THAT's about the best thing I've read all day (-8 Thank you, my friend, you just made it to the Quotes-section of my door!

Re:Why large files by CoolVibe · 2003-01-26 03:13 · Score: 5, Interesting

raw video can easily exceed 2 GB in size. Why raw video? Because (like others said) it's easier to edit. Then you encode to MPEG2, which will shrink the size somewhat (usually still bigger than 2 GB, ever dumped a DVD to disk?), so it'll be "small" enough to burn onto a DVD or somesuch. Oh, editing 3 hours of raw wave data also chews away at the disk size. Also, since you need to READ the data from the media to see if it looks nice, you need to have support for those big files as well. Right, now why don't we need files bigger than 2 GB again? Well?

Oh, you're still not convinced, well see it this way: when in the future will you ever need to burn a DVD?

Well? A typical one sided DVD-R holds around 4 GB of data (somewhat more), if you use both sides, you can get more than 8 GB of data on it. That's way bigger than 2 GB, no? Now, how big must your image be before you burn it on there? well?

Right...

Re:Why large grapes by edox. · 2003-01-26 03:13 · Score: 1

Don't be the good old fox .)

--
quote:port 17 udp

Re:Unices? by moonbender · 2003-01-26 03:13 · Score: 2, Informative

Yes. Just like "matrices" is the plural of "matrix". Not that the words have a similar etymology - according to dictionary.com it's, in the authors' words, "A weak pun on Multics".

--
Switch back to Slashdot's D1 system.

64KB memory segments by KDan · 2003-01-26 03:14 · Score: 1

Oh come on, those were fun, when you had to load into memory and uncompress a file larger than that :-)

Oh the fond memories :-)

Daniel

--
Carpe Diem

How large are we talking? by httpamphibio.us · 2003-01-26 03:15 · Score: 1

It doesn't give a specific filesize in the article...

--
sig.

Re:How large are we talking? by voodoopriestess · 2003-01-26 03:18 · Score: 1

2^32 Bytes (aka 2GB).

Iain

--
---- "I would be careful in separating your weirdness, a good quirky quantum weirdness, from the disturbed weirdnes
Re:How large are we talking? by kasperd · 2003-01-26 04:03 · Score: 1

2^32 Bytes (aka 2GB).

Make that 2^31.

--

Do you care about the security of your wireless mouse?
Re:How large are we talking? by NoOneInParticular · 2003-01-26 04:50 · Score: 1

Ah, this 2^31 brings back memories of the time I had a box for scientific work with approx. 4 GB of addressable memory (most of it RAM, but also some swapspace), and wanted to view some kind of lame proprietary video format, with proprietary viewer. When starting up the application it would complain I had less than 4 MB of memory (while in fact I had a thousandfold of that).
Hmm, the programmers seemed to store the information in an int, so by allocating 2 MB of memory (through Matlab, zeros(10000,10000) is quite a chunk), I could finally convince the application that I did not have negative memory, but actually enough to display the movie.
But then the video was lame.
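
The "negative memory" effect described above is what happens when a byte count above 2^31 - 1 is squeezed into a signed 32-bit int. This is a minimal sketch: the helper name `as_int32` and the two sample sizes are assumptions for illustration, not figures from the post.

```python
import ctypes

def as_int32(n):
    """Reinterpret the low 32 bits of n as a signed 32-bit integer."""
    return ctypes.c_int32(n).value

MIB = 2**20
print(as_int32(3000 * MIB))  # about 3 GB free shows up as a negative number
print(as_int32(1500 * MIB))  # under 2 GiB the value survives intact
```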
Re:How large are we talking? by kasperd · 2003-01-26 11:32 · Score: 1

When starting up the application it would complain I had less than 4 MB of memory

I recall some story about a similar bug in MS Basic. If it was used on a PC with more than 512KB of RAM it would say there was not enough RAM. In DOS RAM was measured with a 16bit number counting units of 16 bytes.

--

Do you care about the security of your wireless mouse?
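
The 512KB threshold in the MS Basic story is consistent with a 16-bit count of 16-byte units being read as signed. That reading is an assumption, since the post does not say so explicitly. A quick check of the arithmetic:

```python
PARAGRAPH = 16  # bytes per unit in the 16-bit count described above

unsigned_units = 2**16  # distinct values of an unsigned 16-bit number
signed_units = 2**15    # first count that no longer fits in a signed 16-bit number

print(unsigned_units * PARAGRAPH // 1024, "KB")  # 1024 KB
print(signed_units * PARAGRAPH // 1024, "KB")    # 512 KB
```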

Re:Wrong point of view. by Anonymous Coward · 2003-01-26 03:16 · Score: 1, Interesting

As others have noted, there are plenty of good reasons to have files greater than two gigs including video editing and scientific research. The file size limits aren't there for a very good reason at all. Someone years ago had to weigh whether to make small files take up a huge amount of room by using 64 bit addresses that would allow multi-terabyte files to exist against using 32 bit addresses that would make small files smaller and create a 2 gb file limit. At the time, it made perfect sense because nobody was using files anywhere near 2 gb... But now they are.

Re:Wrong point of view. by KDan · 2003-01-26 03:17 · Score: 4, Insightful

Two words:

Video Editing

Daniel

--
Carpe Diem

Q: Why large files? A: Disk images too by Anonymous Coward · 2003-01-26 03:17 · Score: 2, Interesting

While almost all the examples given are good, I don't think anyone has mentioned complete disk images. I have recently had to do this in order to recover from a hardware issue (drive cable failure resulted in loss of MBR, nasty) and on a TiVo unit that had a bad drive.

I have most all of my older system images available to inspect. The loopback devices under Linux are tailor made for this type of thing.

I am puzzled as to why you mention the seek times. Surely you would agree that the seek time should be only inversely geometrically related to size, the particular factors depending on the filesystem. Any deviation from the theoretical ideal is the fault of a particular OS's implementation. My experience is that this is not significant.

(user dmanny on wife's machine, ergo posting as AC)

Funny...in AIX... by cshuttle · 2003-01-26 03:18 · Score: 4, Informative

We don't have this problem: 4 petabyte maximum file size, 1 terabyte tested at present. http://www-1.ibm.com/servers/aix/os/51spec.html

Re:Funny...in AIX... by n3m6 · 2003-01-26 04:11 · Score: 2, Insightful

whenever something like this comes up. somebody just has to say "we don't have a problem, we use X"

that's just so lame. we have XFS and JFS. you can keep your AIX and your expensive hardware with you.

thanks.
Re:Funny...in AIX... by Lu+Xun · 2003-01-26 04:31 · Score: 1

Ok, but how many Library of Congresses is that?

--
That's not a soda... it's a caffeine delivery device!
Re:Funny...in AIX... by Anonymous Coward · 2003-01-26 05:15 · Score: 1

> we have XFS and JFS. you can keep your AIX and your expensive hardware with you.

whenever something like this comes up. somebody just has to say "we don't have a problem ($), we use X" that's just so lame.

I guess it's also a well known fact that cheap-ass hardware held together with spit and rubber bands has no scaling limits, especially dealing with terabyte+ files.

By the way.... you do know where JFS came from, right????? And XFS? If not, your lamer points just went through the roof.
Re:Funny...in AIX... by SN74S181 · 2003-01-26 13:28 · Score: 1

Expensive hardware?

My only AIX hardware cost $35 at the auction a few weeks ago. Granted it only has 128 MB, and its PPC chip is less than 200 MHz, but hey.

I don't run AIX on it, of course. It runs NetBSD.

Have you ever seen some people's email? by alen · 2003-01-26 03:19 · Score: 4, Insightful

On the Windows side many people like to save every message they send or receive to cover their ass just in case. This is very popular among US Government employees. Some people who get a lot of email can have their personal folders file grow to 2GB in a year or less. At this level MS recommends breaking it up since corruption can occur.

Re:Have you ever seen some people's email? by nentwined · 2003-01-26 03:45 · Score: 5, Funny

I agree with MS on this one. government employees shouldn't be allowed to hold their positions for longer than a year. DOWN WITH GOVERNMENTAL CORRUPTION! ... :)

--
heaven
Re:Have you ever seen some people's email? by sqrlbait5 · 2003-01-26 03:46 · Score: 2, Informative

Yeah, but if you're using NTFS, where there doesn't appear to be a max file size, you still get the 2GB limit on Outlook files. Every damn version of Outlook has had this 2GB limit, but OutlookXP doesn't actually fix the problem, just warns the user at 1.87GB. We have people hitting their limit all the time at work, but that's because they like to send artwork and whatnot and not clear out their folders.

--
LDAA #$80 BITA 0x40 BNE END
Re:Have you ever seen some people's email? by kasperd · 2003-01-26 04:06 · Score: 2, Insightful

2GB in a year or less.

They probably don't write emails but instead write Word documents and attach them to empty emails.

--

Do you care about the security of your wireless mouse?
Re:Have you ever seen some people's email? by sean23007 · 2003-01-26 04:47 · Score: 1

Don't call that just a Windows phenomenon. There are many cases where it is a good idea to save every email you get. Then again, there are others where it is a good idea to destroy all the evidence. Either situation can happen to you regardless of what OS you use.

Just saying, is all.

--

Lack of eloquence does not denote lack of intelligence, though they often coincide.
Re:Have you ever seen some people's email? by spongman · 2003-01-26 04:53 · Score: 1

ouch. you're not using exchange, i take it?
Re:Have you ever seen some people's email? by jonathanbearak · 2003-01-26 05:36 · Score: 1

i thought they don't guarantee anything past 70mb.
of course, if they switched to something like maildir, there goes half the job market for mcse's.
Re:Have you ever seen some people's email? by ediron2 · 2003-01-26 08:34 · Score: 1

This is very popular among US Government employees. Some people who get a lot of email can have their personal folders file grow to 2GB in a year or less. At this level MS recommends breaking it up since corruption can occur.
Hmm... is that the filesystem or the Government that gets corrupt when it gets too fat?
I also laughed at the thought of a convicted monopolist like MS recommending this breakup.
Re:Have you ever seen some people's email? by InfiniteWisdom · 2003-01-26 08:36 · Score: 1

> Have you ever seen some people's email?

Stop being so nosy and stick to reading your OWN e-mail!!!
Re:Have you ever seen some people's email? by benzapp · 2003-01-26 10:29 · Score: 1

Now that MS has file system level encryption, you would think they would simply store each email as a file. I can think of no other reason why storing every email in a single file makes sense.

If NTFS is so good, it should be able to handle tens of thousands of small files no problem.

--
I don't read or respond to AC posts
Re:Have you ever seen some people's email? by egreB · 2003-01-26 12:43 · Score: 1

Do not make jokes about that - it's actually quite true. I've seen a lot of mails with no content or "See the attached file" where an attached MS Word document contains the .. content.

Sometimes, the reason for this is because the content is formatted especially in Word, but most of the time it's just a letter or something.

I've gotten some of these. I just reply and request the information in an open format.
Re:Have you ever seen some people's email? by kasperd · 2003-01-26 12:57 · Score: 2, Funny

Do not make jokes about that - it's actually quite true.

Guess what..... I wasn't joking.

I just reply and request the information in an open format.

So do I. Sometimes I send the reply in a .dvi file. I got a surprise the day a friend of mine managed to read the .dvi file I had attached.

--

Do you care about the security of your wireless mouse?
Re:Have you ever seen some people's email? by egreB · 2003-01-27 03:14 · Score: 1

Guess what..... I wasn't joking.

Heh.. Sorry (-8 That'll teach me to read posts twice before replying to them. English isn't my native language, so I misunderstood the tone. It could have been a joke, though, if people didn't actually send their mails as MS Word attachments. Oh well.

As for sending replies in DVI-format, that's a good idea. I'll do that next time (-8

Re:Wrong point of view. by N1KO · 2003-01-26 03:20 · Score: 1

In a couple of years, will today's large files be considered large? Ten years ago having hundreds of 4MB files on a pc would've been considered crazy. Now everyone with an mp3 player is used to it.

Re:Why large files by bourne · 2003-01-26 03:21 · Score: 2, Interesting

Can anyone give a good reason for needing files larger than 2gb?

Forensic analysis of disk images. And yes, from experience I can tell you that half the file tools on RedHat (like, say, Perl) aren't compiled to support >2GB files.

Re:huh? by KDan · 2003-01-26 03:21 · Score: 1

It's certainly something that George Orwell would have frowned upon, but it's not incorrect sentence construction per se.

PS: Read that Orwell article if you haven't yet, it's really very good

--
Carpe Diem

Re:huh? by JanneM · 2003-01-26 03:21 · Score: 2, Informative

Because the sentences mean different things.

"It is an interesting problem that some distro-compilers have to face."

talks about the problem facing distro compilers, whereas

"It's an interesting look into some of the kinds of less obvious problems that distro-compilers have to face."

Talks about the article addressing these problems. /Janne

--
Trust the Computer. The Computer is your friend.

Switch to gnu/hurd by Anonymous Coward · 2003-01-26 03:22 · Score: 3, Funny

It has a nice small 1gb filesystem limit. I have partitioned my hard disk into 64 little chunks and it runs very slowly, and unstably, but it's completely open source and I'm happy.

Re:Switch to gnu/hurd by /dev/trash · 2003-01-26 06:50 · Score: 1

Is this true? If so I don't think I'll ever try it out.
Re:Switch to gnu/hurd by shepd · 2003-01-26 12:14 · Score: 1

It's 2GB, but yes, it is true, HURD is the pinnacle of what happens when you just let people do what the hell they like without any management whatsoever. All you programmers might hate your managers, but honestly, without them, you'd end up with HURD-like projects -- a decade late, and still half a decade to go.

--
If you could be told what you can see or read, then it follows that you could be told what to say or think - BoC

Re:Wrong point of view. by heby · 2003-01-26 03:22 · Score: 5, Funny

"oh yes, those were the days." - misty eyed smile - "when i was young and filesizes were small. you should have seen it. today's youth is so spoiled that they don't even learn assembly language any more. i tell you, you're all going to die because of your large files, yes, die!" - madly waves his cane in the air - "2gb, that's more than anybody will ever need and you are greedy for even more! the holy bit will punish you for this, it will!" - dies of a heart attack.

Re:Unices? by Looke · 2003-01-26 03:23 · Score: 1

Geeks seem to have a weird fascination for strange spellings. "-ces" is the traditional plural ending of Latin words ending in "x". Obviously, "Unix" does not originate from Latin, and "Unices" is thus nothing but a (bad) joke. (The same applies to "emacsen", and there are a few others around as well.)

Re:huh? by david-currie · 2003-01-26 03:28 · Score: 1

Because it's not an interesting problem. It's a fucking boring problem if _you_ have to deal with it. But it's interesting to read about because it's the kind of thing you probably haven't thought about if you don't compile distributions. I meant what I wrote.

Umm, scientific computing by Anonymous Coward · 2003-01-26 03:29 · Score: 1, Insightful

Many large-scale computing projects easily generate hundreds of gigabytes and even terabytes of data. They are writing to RAID systems and even parallel file systems to improve their IO.

Think beyond the little toy that you use. These projects are using Unix (Solaris, Linux, BSD and even MacOSX) on clusters of hundreds or thousands of nodes.

Re:Wrong point of view. by Anonymous Coward · 2003-01-26 03:30 · Score: 1, Insightful

the use of large files tempts users to store all kinds of redundant, reducible, linear and irrelevant data wasting storage space and I/O time

As opposed to a million 4k files that are each 1k of header?

Re:Wrong point of view. by cvande · 2003-01-26 03:30 · Score: 5, Insightful

In a perfect world everything is small and manageable. Unfortunately, some databases need tables BIGGER than 2gb. Even splitting that table into multiple files still finds you with files larger than two gb. Try adding more tables? OK. Now they've grown to over 2gb and the more tables the more complicated everything gets. I still need to back these suckers up and a backup vendor that I won't name can't help me because their software wasn't large file (for Linux) ready. So let's get into the game with this and make it the default so we don't need to worry about these problems in the future. Linux IS an enterprise solution.....(my $.02)

Re:Why large files by benevold · 2003-01-26 03:31 · Score: 2, Insightful

We use a Unidata database here for an ERP system. Each database is more than 2gb apiece (more like 20 gb) of relatively small files, and when the directories are tarred for backup reasons they are usually over 2gb, which means that gzip won't compress them. Unless I'm missing something I don't see an alternative to files larger than 2gb in this case. Sure, on the personal computing level the closest thing you probably get is ripping DVD's, but there are other things out there, and I realize this is tiny in comparison to some places.

Wrong. by I+Am+The+Owl · 2003-01-26 03:32 · Score: 1

You obviously have never done any work with video before. Most DV will eat up 2GB easy with 15min of footage or less.

--

--sdem

It's not the size of the file... by bananaape · 2003-01-26 03:34 · Score: 1, Funny

It's how you use it.

MOD UP by xintegerx · 2003-01-26 03:34 · Score: 1

I e-mailed somebody on the Board of Higher Ed of my State for some answers, and they simply replied

Please call me at #-###-###-###.

Thanks

He has a really good point if mail programs put archives in one big zip-equivalent file, because these CAN get huge.

--

Cover your eyes and click this link!

Re:MOD UP by DAldredge · 2003-01-26 03:54 · Score: 1, Insightful

That's not why he wanted you to call him. If he answered your questions via email there would have been a record of what he had said.
Re:MOD UP by Webmonger · 2003-01-26 12:33 · Score: 1

I sometimes do tech support, and I often find it much easier to arrive at the right answer over the phone than via email. The immediate feedback, the ability to get clarification, to discuss alternatives all make it my preferred method.

Though your theory may be correct in this case. Who can say?

Re:Why large files by Veteran · 2003-01-26 03:35 · Score: 1

I have run into problems trying to compress a tar archive of my home directory which has been around since 1995 when I switched to Linux. The two gig limit runs into trouble here.

Re:Why large files by kasperd · 2003-01-26 03:35 · Score: 3, Insightful

The seek times alone within these files must be huge

Who modded that as Insightful? Sure, if you are using a filesystem designed for floppy disks, it might not work well with 2GB files. In the old days where the metadata could fit in 5KB a linked list of diskblocks could be acceptable. But any modern filesystem uses tree structures which makes a seek faster than it would be to open another file. Such a tree isn't complicated, even the minix filesystem has it.
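
A toy model of the linked-list versus tree point, as a rough sketch only: the 4 KiB block size and the fanout of 1024 pointers per index block are assumptions chosen for illustration, and real filesystems cache much of this in memory. It counts how many links a FAT-style chain must follow to reach an offset, against how many index levels a tree needs to address the same file.

```python
import math

CLUSTER = 4 * 1024   # assumed block/cluster size in bytes
FANOUT = 1024        # assumed number of block pointers per index block


def fat_chain_hops(offset, cluster=CLUSTER):
    """Links followed from the first cluster to reach `offset` in a linked chain."""
    return offset // cluster


def tree_levels(file_size, cluster=CLUSTER, fanout=FANOUT):
    """Index levels needed so a tree can address every block of the file."""
    blocks = max(1, math.ceil(file_size / cluster))
    levels = 1
    while fanout ** levels < blocks:
        levels += 1
    return levels


two_gib = 2 * 1024**3
print(fat_chain_hops(two_gib - 1))  # links to reach the last byte of a 2 GiB file
print(tree_levels(two_gib))         # index levels for the same file
```

Under these assumptions the chain needs about half a million hops to reach the end of a 2 GiB file, while the tree needs two levels of lookup no matter which offset is requested.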

If you are still using FAT... bad luck for you. AFAIK Microsoft was stupid enough to keep using linked lists in FAT32, which certainly did not improve the seek time.

--

Do you care about the security of your wireless mouse?

Re:Unices? by david-currie · 2003-01-26 03:36 · Score: 1

I'd never heard emacsen, but VAXen is commonly used for multiple VAX machines, I believe.

Re:Why large files by martinschrder · 2003-01-26 03:37 · Score: 1

Bitmap files for image setters can easily become huge. Think of 500x100(cm)x1000x1000(pixels).

Why not to learn from past? by Libor+Vanek · 2003-01-26 03:41 · Score: 2

I just wonder why we don't learn from the past (limits) and remove these limits "forever". E.g. one month ago I received a question about the possibility of building a 10 TB Linux cluster (physicists are crazy ;-)).

There surely MUST be some way to do this. I imagine some file (e.g. defined in the LSB) which would define these limits for the COMPLETE system (from kernel, filesystems, and utils to network daemons). I know there are efforts toward things like this, but if we said (for example) that a distribution in 2004 won't be marked "LSB compatible" if ANY of its programs use any other limits, I think it would create enough pressure on Linux vendors.

Just a crazy idea ;-)

Re:Why not to learn from past? by n3m6 · 2003-01-26 04:15 · Score: 1

there is no spoon and there is always a limit.

the problem is where it's sticking. ;)
Re:Why not to learn from past? by Libor+Vanek · 2003-01-26 04:37 · Score: 1

The point of my posting is to have limits in only one file for complete system and need to just change it there.

The O/S should do it and do it well. by tjstork · 2003-01-26 03:41 · Score: 3, Interesting

1) Splitting up a big file turns an elegant solution into an inelegant nightmare.

2) Instead of 10 different applications writing code to support splitting up an otherwise sound model, why not have 1 operating system have provisions for dealing with large files?

3) You are going to need the bigger files with all those 32 bit wchar_t and 64 bit time_ts you got!

--
This is my sig.

Re:Wrong point of view. by costas · 2003-01-26 03:42 · Score: 4, Insightful

Maybe in your problem domain that's true. I work with retailer data mines and we've hit the 2GB file limit, oh, 4-5 yrs ago? We've been forced to partition databases causing maintenance issues, scalability issues, and the like, just because of the size of a B-tree index.

True, it looks like the optimal solution is lower-level partitioning, rather than expanding the index to 64bits (tests showed that the latter is slower), but that still means that the practical limit of 1.5-1.7 GB per file (because you have to have some safety margin) is far too constraining. I know installations that could have 200GB files tomorrow if the tech was there (which it isn't, even with large file support).

I am also guessing that numerical simulations and bioinformatics apps can probably produce output files (which would then need to be crunched down to something more meaningful to mere humans) in the TB range.

Computing power will never be enough: there will always be problems that will be just feasible with today's tech that will only improve with better, faster technology.

Re:data warehouse, and any database for that matte by CrudPuppy · 2003-01-26 03:42 · Score: 2, Informative

the datafile size averages 8GB in the warehouse.

--
A year spent in artificial intelligence is enough to make one believe in God.

Re:huh? by RumpRoast · 2003-01-26 03:44 · Score: 2, Interesting

Actually you changed the meaning of that sentence. I think really we object to:

"It's an interesting look into some of the kinds of less obvious problems that distro-compilers have to face."

"of the kinds" really adds nothing to the meaning here, nor does "have to"

Thus we have:

"It's an interesting look into some of the less obvious problems that distro-compilers face."

The same sentence, but much cleaner!

Thanks! I'll be here all week.

--

My Ass hurts.

Re:Wrong point of view. by Q+Who · 2003-01-26 03:47 · Score: 1

Lmao...

Your other trolls are nice too, but this one is hilarious... "entropy pollution", hehe :)

"Linux of Windows XP bootloader", this one is amazing. I wonder whether it's a typo, or intentional...

Re:Unices? by N1KO · 2003-01-26 03:47 · Score: 1

Virus comes from Latin.

Re:Why large files by Perl-Pusher · 2003-01-26 03:48 · Score: 1

Scientific data usually consists of huge multidimensional arrays. I have seen satellite data in huge netCDF files that are very close to, if not slightly larger than, that.

Re:Why large files by markz · 2003-01-26 03:49 · Score: 1

database dumps - one of our smaller database dumps is 2.3 GB compressed. The dumps are the easiest method of backup and distribution - locally and (very) remotely.

Re:Why large files by bunratty · 2003-01-26 03:49 · Score: 2, Interesting

Over Christmas and New Year's, I helped my wife run a simulation of 1000 different patients for an academic pharmacokinetics paper. The run took ten days and had an input file of about 1.5 GB. If her computer was faster, or she had access to more computers, she would have wanted to simulate more patients and would easily have needed support for files larger than 4 GB. As CPUs get faster and hard disks get larger, there will be much more demand for these large files as well as more than 4 GB per process.

--
What a fool believes, he sees, no wise man has the power to reason away.

Re:640K is enough for you! by SoSueMe · 2003-01-26 03:51 · Score: 1

Who are we to tell them what they have to accommodate?
Don't like the way a particular *NIX works? Don't use it.
Try something else.

Re:Why large files by gbitten · 2003-01-26 03:51 · Score: 1

Another example of the usefulness of large files is database files. In my job, the DB machine (Solaris) doesn't have sufficient disk space to generate the DB dump. The biggest dump is 11GB and I wasn't able to put it on a Linux box (RH 6.2), so I used FreeBSD 4.2 with success.

BeOS Filesystem by SixArmedJesus · 2003-01-26 03:51 · Score: 2

I remember reading in the BeOS Bible that the BeOS filesystem could contain files as large as 18 petabytes. Makes you wonder two things: What's the biggest filesystem that you could use with a BeOS machine? And why don't other OSs have filesystems like this? Especially with those awesome extended attributes. I weep for the loss of the BeOS filesystem...

--

*slight crashing sound*

Re:BeOS Filesystem by Yokaze · 2003-01-26 04:06 · Score: 4, Informative

Mine is bigger than yours :)

Linux XFS: 9 exabytes

Also supports extended attributes.

--
"Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
Re:BeOS Filesystem by SixArmedJesus · 2003-01-26 07:55 · Score: 1

Hmmmm... yes, yours is bigger. Thanks for the info. As far as the BeOS filesystem, though, I really liked the attributes because they could be any size, AND they could be arbitrary binary data. So, for example, file icons could be kept in the attributes. Or sounds (although how that would be useful, I dunno). Another good example of this was the default BeOS text editor. It could handle colors, bold, italic, and underline, much like a word processor, but it was still a plain text file. The style data was kept in the attributes. Also, I think my real disappointment with the whole XFS thing is that there are no file managers that can handle those extended attributes. In BeOS, the file manager utilized these attributes with ease. Icons were displayed, you could layout a manager window so that it would display the various types of data, and you could basically use the filesystem as a database. And the tools for it were so easy to use and readily available. I just haven't seen it with Linux yet. Although, if I found one, I'd surely give it a go.

Another quick note, I've read somewhere that FreeBSD's UFS2 also has extended attributes. It would be interesting to know how that would compare to both the BeOS FS and XFS as far as file size and what types of attributes it supports.

Thanks again for the info!

--

*slight crashing sound*

Re:Why large files by joto · 2003-01-26 03:51 · Score: 1

Can anyone give a good reason for needing files larger than 2gb?

Yes. Sometimes you need to store a lot of data. Even DVDs hold 4.3 GB of data these days. But that's not even much compared to the amount of data we handle in seismic research. I would believe astronomers, particle physicists and lots of other people also routinely handle ridiculous amounts of data.

By the way, in producing the DVD, you would naturally work with uncompressed data. How would you handle that?

The seek times alone within these files must be huge, and it smacks a bit of inefficiency

And because it is inefficient, we should not support it? As a matter of fact, any file larger than one disk-block is inefficient. Maybe we should stop supporting that as well?

sure its just as bad to have an app use hundreds of say 4kb files or so, but two GIGABYTES???

As I've said, it's not really that much, depending on the application.

Re:huh? by david-currie · 2003-01-26 03:52 · Score: 1

Now this I can accept. I promise to think about what I write next time. ;)

Re:Unices? by Q+Who · 2003-01-26 03:52 · Score: 1

Just like "matrices" is the plural of "matrix".

"Matrices" is a plural form of "matrix." The other one is "matrixes."

Re:Wrong point of view. by Yokaze · 2003-01-26 03:52 · Score: 4, Interesting

I'm not a specialist on this matter, so maybe you can enlighten me, where I am wrong or misunderstood you.

> fragmentation: large files increase the fragmentation of most file systems
What kind of fragmentation?

Small files lead to more internal fragmentation.
Large files are more likely to consist of more fragments, but when splitting this data into small files, those files are fragments of the same data.

>entropy pollution
What kind of entropy? Are you speaking of compression algorithms?

Compression ratios are actually better with large files than with small files, because similarities across file boundaries can be found. That is why gzip (or bzip2) is run on a single large tar file. (Simple test: zip many files with compression, then zip the same files without compression and compress the resulting archive afterwards, and compare the sizes.)
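
The same test can be sketched in a few lines of Python with zlib (the algorithm behind gzip). The 200 small "files" here are made-up sample data that share most of their content, which is an assumption; real results depend on how similar your files actually are.

```python
import zlib

# 200 hypothetical small "files" that share most of their content.
files = [
    (f"record {i}: status=OK host=node{i % 8} payload=" + "abcdefgh" * 16).encode()
    for i in range(200)
]

# Compress each file on its own, as zip does per archive member.
separate = sum(len(zlib.compress(f)) for f in files)

# Concatenate first (what tar does), then compress the single stream.
combined = len(zlib.compress(b"".join(files)))

print("compressed separately:", separate, "bytes")
print("compressed as one stream:", combined, "bytes")
```

The single stream comes out smaller because the compressor can refer back to earlier files, and the per-file header overhead is paid only once.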

>data pollution
How should limiting file size improve that situation? Then, people tend to store data in lots of small files. What a success. People will waste space, whether there is a file size limit or not.

>These limits are there for very good reasons and in my opinion they are even much to big.

Actually, they are there for historical reasons.
And should a DB spread all its tables over thousands of files instead of having only one table in one file and mmapping this single file into memory? Should a raw video stream be fragmented into several files to circumvent a file limit?

>[...] original K&R Unix [...] was much faster than modern systems

Faster? In what respect?

--
"Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"

Re:Why large files by Zathrus · 2003-01-26 03:53 · Score: 2, Interesting

In my previous job we regularly processed credit data files >2 GB. All the data is processed serially (as someone else mentioned), so seek time is not an issue (nor is it an issue in a binary data file - seek to 1.4GB. Done. Next.).

The real issue we ran up against was compression... we wanted to have the original and interim data files available on-disk for a while in case of reprocessing. The processing would generally take up 10x as much space as the original data file, so you compressed everything. Except that gzip can't handle files >2GB (at the time an alpha could, but we didn't want to touch it). Nor can zip. So we had to use compress. Yay. (bzip could handle it, but was decided against by the powers that be).

Compression of large files is still an issue, unless you want to split them up. Unless you download a beta version gzip still can't handle it. As I understand it zip won't ever be able to do it. There are some fringe compressors that can handle large files, but, well, they're fringe.

Re:Unices? by bunratty · 2003-01-26 03:53 · Score: 1

Oh, that brings up a pet peeve of mine -- when people call a matrix a "matricee"! When I hear someone say that word, I roll my eyes and think "this guy has no idea what he's talking about!"

Getting back on topic, maybe the plural for Unix should be Unixen, like the plural for Vax is Vaxen?

--
What a fool believes, he sees, no wise man has the power to reason away.

Re:Why large files by imnoteddy · 2003-01-26 03:55 · Score: 1

Databases.

The computer aided design databases for an automobile, when you have 3D models for the parts, the tooling, plant layout, etc. is in the low terabyte range. As another example, Boeing dedicates about 14 terabytes to commercial airplane geometry data storage.

Or Astronomy. A planning document talks about a project generating 300 terabytes per year.

--
No electrons were harmed creating this post, though some may have been subjected to electrical and/or magnetic fields.

Re:Why large files by Markus+Landgren · 2003-01-26 03:56 · Score: 1

Last time I wrote a 7 gig file it was an image of a hard disk. Lots of other stuff (video) can get large too. Anyway, there is an error in the headline. 2 gigs is not a limit in modern unices, only in ancient or otherwise really crappy unices.

Somewhat cumbersome, even on Linux by topologist · 2003-01-26 03:58 · Score: 2, Informative

To enable LFS (Large File Support) in glibc (which not all filesystems support), you need to recompile your application with
-D_FILE_OFFSET_BITS=64 and -D_LARGEFILE_SOURCE

This maps the ordinary file access calls to their 64-bit variants and makes off_t itself 64 bits wide, so most code only needs a recompile. The explicit types and calls like off64_t and open64 are only needed if you use the transitional interface (-D_LARGEFILE64_SOURCE) instead. And I believe most large file support is really available only past glibc 2.2

With the transitional interface you also need to pass O_LARGEFILE to open etc. Either way, legacy applications that use glibc fs calls have to be recompiled to take advantage of this, and may need source level changes (for example where offsets are kept in an int or long). Won't work on older kernels either.
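
Why the recompile matters can be shown with a small sketch. It is written in Python rather than C, using ctypes fixed-width integers to stand in for a 32-bit and a 64-bit off_t; the helper names are made up for illustration.

```python
import ctypes

MAX_OFF32 = 2**31 - 1   # largest offset a signed 32-bit off_t can hold
MAX_OFF64 = 2**63 - 1   # largest offset a signed 64-bit off_t can hold


def as_off32(offset):
    """Value of `offset` after being stored in a signed 32-bit integer."""
    return ctypes.c_int32(offset).value


def as_off64(offset):
    """Value of `offset` after being stored in a signed 64-bit integer."""
    return ctypes.c_int64(offset).value


two_gib = 2**31
print(as_off32(two_gib - 1))  # last offset that still fits
print(as_off32(two_gib))      # one byte further wraps to a negative number
print(as_off64(two_gib))      # fits easily in 64 bits
```

An offset of exactly 2 GiB turns negative in the 32-bit case, which is why programs built without large file support fail or misbehave at that boundary.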

Re:Somewhat cumbersome, even on Linux by topologist · 2003-01-26 04:04 · Score: 1

Okay, I didn't see the link to the page which talks about all of this at the end of the article. Oops.

snicker by Rhinobird · 2003-01-26 03:59 · Score: 1

maybe the plural for Unix should be Unixen

Sudden thought of "Linuxen the HOOOOOUUUSSSSE, bizzach!"

--
If Mr. Edison had thought smarter he wouldn't sweat as much. --Nikola Tesla

Re:Wrong point of view. by kasperd · 2003-01-26 04:00 · Score: 2, Interesting

I sure hope that was a joke. Because otherwise it would be one of the most clueless comments I have seen.

Sure, splitting data into a lot of smaller files is going to reduce the fragmentation slightly, but it is not going to improve your performance, because the price of accessing different files is going to be higher than the price of the fragmentation.

In the next two arguments you managed to make two opposite statements both incorrect. That is actually quite impressive.

First you say large files increase the entropy of the data stored on the disk. Which is wrong as long as you compare to the same data stored in different files. Of course if the number of files on the disk is constant smaller files will lead to less entropy, but most people actually want to store some data on their disks.

Then you say large files are highly redundant, which is the opposite of having a large entropy as claimed in your previous argument. And in reality the redundancy does not tend to increase with filesize, but might of course depend on the format of the file.

All in all you are saying that people shouldn't store much data on their disks, and the little data they do store should be as compact as possible, while still allowing it to be compressed even further when doing backups. You might as well have said people shouldn't use their disks at all.

Finally, claiming older Unix versions were faster is ridiculous. First of all, they ran on different hardware, and surely on that hardware they were slower than today's systems. And even if you managed to port an ancient Unix version to modern hardware, I'm sure it wouldn't beat modern systems in today's tasks. Which DVD player would you suggest for K&R Unix?

--

Do you care about the security of your wireless mouse?

Re:Wrong point of view. by Daytona955i · 2003-01-26 04:02 · Score: 1

We do too learn assembly... I specifically learned about the MIPS architecture. Hated it but they do still teach it in CS classes. We touched on it a bit in Programming language concepts and then in Systems Architecture I and II, we actually had to write assembly code. I remember the happy day when I got my one assignment to work, we had to grab the keyboard interrupts and display them. None of my non-CS friends could understand why I was so happy to have text that I typed appear on the screen.
-Chris

Yep... by Kjella · 2003-01-26 04:06 · Score: 2, Informative

Some numbers for *uncompressed* video:

NTSC/YUV2/stereo: ~111gb for a cinema movie (1hr 45min)
PAL/YUV2/stereo: ~125gb for same

HDTV/surround: ~908gb for same

With huffyuv (very low CPU usage, lossless) you should be able to cut that by a factor of 2-3. But it's still *huge*

Kjella

--
Live today, because you never know what tomorrow brings

Re:Yep... by kasperd · 2003-01-26 04:10 · Score: 1

NTSC/YUV2/stereo: ~111gb for a cinema movie (1hr 45min)
PAL/YUV2/stereo: ~125gb for same

How did you reach those numbers? AFAIK NTSC and PAL use the same line frequency and have the same number of pixels per line, so that would lead to the same size.

--

Do you care about the security of your wireless mouse?

Error Prevention by Veteran · 2003-01-26 04:13 · Score: 2, Interesting

One of the ways to keep errors from creeping into programs is to put limits on things so high that you can never reach them in the practical world.

The 31 bit limit on time_t overflows in this century - 63 bits outlasts the probable life of the Universe so it is unlikely to run into trouble.
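
The two figures can be checked with a short calculation. This sketch assumes the usual Unix epoch (seconds since 1970-01-01 UTC) and an average year of 365.25 days.

```python
from datetime import datetime, timezone

# Last second representable with 31 value bits (a signed 32-bit time_t).
last_31bit = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)
print(last_31bit.isoformat())

# How many years 63 value bits (a signed 64-bit time_t) would cover.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years_63bit = (2**63 - 1) / SECONDS_PER_YEAR
print(f"{years_63bit:.3e} years")
```

The first value lands in January 2038. The second is on the order of hundreds of billions of years, far beyond current estimates of the age of the Universe.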

That is the best argument I know for a 64 bit file size; in the long run it is one less thing to worry about.

Re:Error Prevention by jhines · 2003-01-26 04:38 · Score: 1

The next significant problem with time will come in the year 9999, when the four digit field that lazy programmers have used for thousands of years overflows. Didn't they learn their lessons the first time around?

Digital took a bug report on this for Vax/VMS and promised a fix, some time in a future release.
Re:Error Prevention by Anonymous Coward · 2003-01-26 08:02 · Score: 1, Insightful

> limits on things so high that you can never reach them in the practical world.

The 2 GByte limit came from a time when 14 inch disks held 30 MByte and disk space and RAM were too precious to waste an extra 32 bits when these would always be all zero for the foreseeable future.

The concept of a hard drive that was as large as 2 GByte was just silly - it would fill the whole computer room, and in any case this is a limit on each file, not on the file system.
Re:Error Prevention by Thing+1 · 2003-01-26 08:44 · Score: 2, Interesting

One of the ways to keep errors from creeping into programs is to put limits on things so high that you can never reach them in the practical world.
Anyone ever thought of a variable-bit filesystem?
Start with 64-bit, but make it 63-bit. If the 64th bit is on, then there's another 64-bit value following which is prepended to the value (making it a 126-bit address -- again, reserve one bit for another 64-bit descriptor).
Chances are it won't ever need the additional descriptors since 64-bits is a lot, but it would solve the problem once-and-for-all.

--
I feel fantastic, and I'm still alive.
Re:Error Prevention by Ben+Hutchings · 2003-01-26 13:34 · Score: 1

This is not a problem for file-systems as stored on disk so much as it is a problem for the file-system API. Passing around and manipulating arbitrarily long numbers in memory is substantially slower than using fixed-length numbers and could result in a big performance penalty for file operations.
Re:Error Prevention by Istealmymusic · 2003-01-26 15:23 · Score: 1

I believe this is known as BER encoding (Perl's unpack uses the "w" format specifier to decode these types of integers). For each byte (or in your example, qword), the MSB is set if another unit follows, unset if not. It compresses quite well, but practically, it's not worth it. Reading a fixed-size integer is an O(1) operation, while BER integers are read much more slowly and mess up alignment.
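
A minimal sketch of the byte-oriented variant described above (7 data bits per byte, most significant group first, high bit set on every byte except the last). The function names are made up for illustration, and the qword-sized units from the parent post would work the same way with 63 data bits per unit.

```python
def ber_encode(n):
    """Encode a non-negative integer as a BER compressed integer."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    out = [n & 0x7F]                      # last byte: continuation bit clear
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)     # earlier bytes: continuation bit set
        n >>= 7
    return bytes(reversed(out))


def ber_decode(data):
    """Decode one integer; return (value, number_of_bytes_consumed)."""
    n = 0
    for i, byte in enumerate(data):
        n = (n << 7) | (byte & 0x7F)
        if not byte & 0x80:               # high bit clear marks the final byte
            return n, i + 1
    raise ValueError("truncated BER integer")


for value in (0, 127, 128, 2**31, 2**64):
    encoded = ber_encode(value)
    print(value, encoded.hex(), ber_decode(encoded))
```

The decoder has to inspect every byte before it knows where the number ends, which is the cost the post above refers to: the length is not known in advance, so the value cannot be loaded with a single aligned read.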

--
"The lesson to be learned is not to take the comments on slashdot too literally." --Vinnie Falco, BearShare

Re:Why large files by perfects · 2003-01-26 04:15 · Score: 1

Bill Gates now claims that he was misquoted. What he really said was that "640K should be more than enough memory for anybody's toaster."

Re:Why large files by wideBlueSkies · 2003-01-26 04:17 · Score: 1

That tarball of 2002 stock quotes used to feed your stock research system.

The database files themselves, in the system.

--
Huh?

Re:Funny...in IRIX/XFS... by Anonymous Coward · 2003-01-26 04:17 · Score: 1, Informative

Other filesystems don't either :

http://www.sgi.com/software/xfs/techinfo.html

"Max. File Size
Designed to scale to 9 million TB with current hardware supporting scalability to 8000 TB on IRIX. Linux-64, 2 TB Max File Size. Solaris and Windows NT undergoing scalability testing"

"Max. File System Size
Designed to scale to 18 million TB with current hardware supporting scalability to 8000 TB on IRIX. Linux-64, 500 file systems of 2 TB each. Solaris and Windows NT undergoing scalability testing."

Unfortunately, it's not just a problem with the filesystem, but also and most often a problem with the applications. So, AIX does have this problem just as much as any other. Unless you've tested all the applications available for AIX.

It's all about efficiency. by OS24Ever · 2003-01-26 04:18 · Score: 2, Insightful

There is something innate in the education, learning, and daily working of a programmer that makes them not want to use 'too big' of a number for a certain task.

it either

A) Wastes Memory Space
B) Wastes Code Space
C) Wastes Pointer Space
D) Or Violates some other tenet the programmer believes

So, when they go out and create a file structure, or something similar, they don't feel like exceeding some 'built-in' restriction to their way of thinking.

And usually, at the time, it's such a big number that the programmer can't think of an application to exceed it.

Then, one comes along and blows right through it.

I've been amused by all the people jumping on the 'it don't need to be that big' bandwagon. I can think of many applications that ext3 or whatever would need to use to make big files. they include:

A) Database Servers
B) Video Streaming Servers
C) Video Editing Workstations
D) Photo Editing Workstations
E) Next Big Thing (tm) that hasn't come out yet.

--

As a rock-in-roll Physicist once said, No matter where you go, there you are.

Re:It's all about efficiency. by dvdeug · 2003-01-26 14:10 · Score: 2, Insightful

There is something innate in the education, learning, and daily working of a programmer that makes them not want to use 'too big' of a number for a certain task.

We have code for infinite precision integers. The problem is, if it were used for filesystem code, you still couldn't do real-time video or DVD burning, because the computer would be spending too long handling infinite precision integers.

As long as you're careful with it, setting a "really huge" number, and fixing it when you reach that limit is usually good enough.

A few more words: by JohnnyBigodes · 2003-01-26 04:18 · Score: 1

- Backups to a single file (no, I don't want to copy a fscking whole directory structure, thank you very much).
- Video editing.
- Large sound editing (multi-channel).
- Ever tried to create a DVD ISO image? there you go...
- Speaking of DVD's, *you* try dumping one to your harddisk with 2GB files.
- Disk images (ever had to Ghost around a boot-disk or boot-DVD with a disk image?)
- 3D animation files (probably included in the "video editing" section).

want me to go on? the list is bigger...

Please mod parent up. by wideBlueSkies · 2003-01-26 04:23 · Score: 1

Please mod this guy up as interesting or informative.

--
Huh?

I can't believe this...superSynchronicity??? by haggar · 2003-01-26 04:23 · Score: 2, Interesting

I had a problem with HP-UX apparently not wanting to transfer via NFS (when the NFS server is on HP-UX 11.0) files larger than 2GB. I had to backup a Solaris computer's hard disk using DD across NFS. This usually worked when the NFS server is Solaris. However, last friday it failed, when the server was setup on HP-UX. I had to resort to my little Blade 100 as the NFS server, and I had no problems with it.

I have noticed that on the SAME DAY some folks asked questions about the 2 GB filesize limit in HP-UX on comp.sys.hp.hpux !! Apparently, HP-UX default tar and cpio don't support files over 2 GB, either. Not even in HP-UX 11i. I never thought HP-UX stank this bad...

How does Linux on x86 stack up? I decided not to use it for this backup, since I had my Blade 100, but would it have worked? Oh, btw, does Linux finally have a command like "share" (exists in Solaris) to share directories via NFS, or do I still need to edit /etc/exports and then restart the NFS daemon (or send SIGHUP)?

--
Sigged!

Re:I can't believe this...superSynchronicity??? by Arethan · 2003-01-26 06:48 · Score: 1

the command that is equivalent to 'share' is 'exportfs', it can usually be found in /usr/sbin/.

It allows you to push NFS exports to the kernel and nfsd without having to edit /etc/exports. Thus, they do not persist across reboots. However, you cannot use exportfs until nfsd is running, and nfsd will auto kill itself if /etc/exports is completely empty. So you must share at least 1 directory tree in /etc/exports before you can use exportfs.

I believe Solaris has this same problem with share though. I don't remember these days, it's been a while since my SCSA cert. (Heh, i guess that's what man pages are for :)
Re:I can't believe this...superSynchronicity??? by haggar · 2003-01-26 06:55 · Score: 1

Thanks.
And no, Solaris doesn't have this kind of problem. In Solaris, you have (a more general) /etc/dfs/* for sharing filesystems. Even if there is no fs shared in /etc/dfs/dfstab, nfsd and mountd will happily run. This autokill thing is really stupid.

--
Sigged!
Re:I can't believe this...superSynchronicity??? by haggar · 2003-01-26 06:58 · Score: 1

Oh yeah, so how does Linux cope with > 2 GB files transferred via NFS TO a Linux server? So far, only Solaris seems to support our solution. I have not tried Linux because the test takes a fairly considerable time, and if transferring large files via NFS isn't supported, I'd better not even try.

--
Sigged!
Re:I can't believe this...superSynchronicity??? by Arethan · 2003-01-26 10:59 · Score: 1

*shrug*
I can't comment on the autokill thing. That's how the NFS implementation in all the distros I've ever used worked. Unix is Unix, except for all the different little quirks. ;)

BTW: I just checked RedHat 8.0 for the autokill "feature". Looks like they fixed it. I just ran "/etc/init.d/nfs start" and it started, didn't bitch about /etc/exports only containing a single '#', and allowed me to then add an export using exportfs and I mounted it successfully afterwards. I even tried it after completely deleting /etc/exports. Still ran fine.

Long story short, the autokill is gone in RedHat 8. That's good in my book. I never liked that behaviour either. As for the >2GB files. I'm running a test right now just for shits and giggles, but I don't see why it wouldn't work. As long as the file system can handle the destination file's size, nfs3 should behave admirably. (BTW, Solaris 8 uses nfs ver3 last I checked.)
Re:I can't believe this...superSynchronicity??? by haggar · 2003-01-26 11:21 · Score: 1

Yes, Unix is Unix, yet time and time again I see that Solaris satisfies me a bit more than the rest of the crop. It's all small things, I know, but the fact that my crappy little Blade 100 turned effectively to be more powerful than a HP L2000 was quite shocking. What a difference an OS makes...
It's really good that this autokill is gone in RH 8.0. Still, if I think about the fact that Solaris 2.7 had it, it feels as if it took Linux really long to get its act together on this little detail. On the other hand, Sun invented NFS, so it's no wonder they have it done right, even in the little details. It's no wonder that Solaris 8 has NFS ver. 3.

I'll be honest with you: if I manage to get a Linux server do the NFS server job for this backup procedure, it'll get a huge plus in my book, and it will get a lot of visibility with some large customers. But last time I had to work with Linux as NFS server, something was wrong with the locking.

--
Sigged!
Re:I can't believe this...superSynchronicity??? by Arethan · 2003-01-26 17:20 · Score: 1

For the record, my little test was successful. I moved a 3.7GB file across NFS between Linux boxes. Filesizes and md5sums match between the original and the NFS copy. The source and destination were both running RedHat 8, though Linux has used the NFS ver3 for quite a while now. Definitely as of the 2.2 kernel, probably back into the 1.x series even.
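
For anyone repeating this kind of check without the md5sum tool, a sketch of hashing a multi-gigabyte file in fixed-size chunks, so the whole file never has to fit in memory. The function name and the 1 MiB chunk size are arbitrary choices for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run it on the original and on the copy reached through the NFS mount, then compare the two digests together with the file sizes.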

I will agree with you, Sun really has their act together when it comes to making a good commercial Unix. They have nice hardware, and their OS is pretty solid. They have a few memory leaks in some libraries (as of 2.8), but nothing that can't be worked around, and most applications that care take the bugs into account. Of course, Sun has been in the biz for quite a while. ;)

Anyhow, always nice chatting with a fellow Slowaris er..Solaris junkie. ;)

Cheers!
Re:I can't believe this...superSynchronicity??? by haggar · 2003-01-26 17:47 · Score: 1

OK, I will then do my test here in the lab. If it works, it will be the second supported solution, apart from Solaris 8 and 9.

probably back into the 1.x series even
Umm... even if that was so (I tend to doubt that.. 1.x? 1.2.x was the first I saw doing anything useful), back then there wasn't anything but ext2, which surely didn't yet support large files. The patch for ext2 for such support came out somewhere near the end of 1999. I think RedHat 6.0 was the first to support it, if I recall correctly. But c'mon... Linux 1.x, I'm almost getting nostalgic. It didn't even have ext2, just the crappy old extfs...

--
Sigged!
Re:I can't believe this...superSynchronicity??? by Wolfrider · 2003-01-29 06:51 · Score: 1

--Eh, it was probably intended as a "security feature." Just ends up being annoying for the REAL root.

--
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
Re:I can't believe this...superSynchronicity??? by haggar · 2003-01-29 23:48 · Score: 1

I see. OK. I am not sure how it would improve security, but I can imagine that it was the thought behind it. It's not like it's some bug, more like a deliberate decision, isn't it.

--
Sigged!

Cripes! by Hubert_Shrump · 2003-01-26 04:25 · Score: 1

That's three words.

I didn't realize Daniel was so big, though.

Has he considered going lossy?

--
Keep your packets off my GNU/Girlfriend!

PAL & NTSC by Kjella · 2003-01-26 04:30 · Score: 2, Informative

PAL: Max 720x576x25fps interlaced (50 Hz)
NTSC: Max 640x480x29.97fps interlaced (60 Hz)

No, they don't have the same frequency, nor the same number of scanlines. Some European TVs will take PAL-60, which is like PAL only at 60Hz, though. Also I don't think the color space works in the same way, but I'm not sure about that one. That was why I used YUV2 (16bit) for both.

Kjella

--
Live today, because you never know what tomorrow brings

Re:Wrong point of view. by smoondog · 2003-01-26 04:32 · Score: 1

There is not a problem with support of large files in Unix system, there is a problem with incompetent people using too large files in Unix systems.

You are a troll. It is not up to administrators to decide how big a file needs to be. I do scientific research and deal regularly with datasets larger than 300GB. Single files often in the range of 2GB-10GB. For me to split up my data would create an enormous headache, and would be very slow.

-Sean

Re:Why large files by Proneax · 2003-01-26 04:32 · Score: 1

I remember like 4 or 5 years ago talking to my friend's dad, who works at Kodak, and he would fill an entire 2GB Jaz drive with one picture.

Re:Wrong point of view. by mickwd · 2003-01-26 04:37 · Score: 1

And the amazing thing is, everyone else seems to be taking it seriously.

Is it just me, or is Slashdot getting much less informed as the user count continues to increase ?

Re:data warehouse, and any database for that matte by SwissCheese · 2003-01-26 04:38 · Score: 1

Even our Exchange private information store is somewhere around 10GB, and we are a small company by most standards

Only 35 years... by Kjella · 2003-01-26 04:41 · Score: 1

And that big y2k problem that was supposed to bring down mankind? How many years did it take to fix that? I very much doubt we started in 1965 ;)

Prediction: First distro to "suck it up" will be around 2035 or so. Personally, I think this is as far down on the priority list as you can get. Besides, with open source, is it really that problematic to grep the source for "time_t" and fix it? I don't think so.

Kjella

--
Live today, because you never know what tomorrow brings

Re:Only 35 years... by Dan+Ost · 2003-01-26 05:26 · Score: 3, Informative

For most programs, it would require little more
than to change the typedef that defines __time_t
in bits/types.h.

For stupidly written programs that assume the
size of __time_t or that use __time_t in unions,
each will need to be addressed individually to
make sure things still work correctly.

--

*sigh* back to work...
Re:Only 35 years... by edhall · 2003-01-26 07:11 · Score: 1

The FreeBSD folks have already done a considerable amount of work on this, even to the point of making time_t 64 bits for both kernel and userland and testing for issues. Enough is known that the main worry now is how to handle the change in ports, some of which need a fair amount of work to move away from 32-bit time_t. But at the rate things are going, I'd expect that they will make the transition to 64-bit time_t for FreeBSD 6.0. I've no idea how they will handle the legacy issues (ports and pre-6.0 binaries) though.

-Ed

row partitions by axxackall · 2003-01-26 04:57 · Score: 1

I agree that the 2 GB limit is obsolete today, especially for projects with large databases and with video editing tasks.

However, I would recommend staying away from > 2GB files in a database environment. Even if your FS supports large files, you still lose performance on the "double driver": first your kernel provides a partition, then it provides a file system over it. But if you need such big files, why would you need a file system? Just use raw partitions!

Of course you still need large files for video, but massive concurrent performance overhead is not a typical problem in such a case.

--

Less is more !

Re:row partitions by lenski · 2003-01-26 06:26 · Score: 1

I agree, 2GB is "inadequate" for current large system applications, and for "new media", etc.
On the other hand, I question whether having >2GB flat files is a reasonable way to organize big data. (Movies have "scenes", music is often divided into "songs", or "movements" for the classically minded, plays have "acts", and so on. Hierarchy and subdivision come naturally in many domains of activity.)
On the gripping hand, as 64-bit CPUs become more common (MMmm... Hammer...), I expect a relatively natural though not necessarily monotonic progression toward 64-bit addressing in flat files.
Re:row partitions by iamacat · 2003-01-26 06:35 · Score: 1

I don't think most database programmers can write better space allocation, I/O buffering or virtual memory code than good OS programmers. Did any of you guys write database buffering code and use something better than a simple LRU list? Like taking physical disk layout into account? If you did, and it performed better than the OS on realistic benchmarks, why not write a reusable device driver that will improve performance of everything, not just the database?
Now it's possible that somehow you have a very good knowledge of your application-specific disk usage pattern and can get a speedup that outweighs user-mode overhead, the system swapping your buffers in and out of memory and so on. In this case, you had better use a dedicated disk rather than just a partition. Otherwise, your I/O scheduling code will have interesting interactions with the system's swapfile and other normal filesystem activity.
Even then you run a risk that OS code will one day improve and outperform your homegrown changes. Most programmers are better off just tuning their code to work well with OS native filesystem, virtual memory and so on.
Re:row partitions by pstemari · 2003-01-26 08:19 · Score: 1

Physical disk layout is no longer available with modern devices. Database layout across multiple physical devices is precisely what a good DBA is trained to handle.
As far as buffer management and filespace allocation inside a tablespace, that's precisely what Oracle or DB2 specialize in, using very sophisticated cross-process buffering techniques and cache hit scoring. None of that is home-grown. It's why you spring the big bucks for a serious database.

Re:Wrong point of view. by Simon+Brooke · 2003-01-26 05:09 · Score: 1

Is it just me, or is Slashdot getting much less informed as the user count continues to increase ?

It's not just you.

--
I'm old enough to remember when discussions on Slashdot were well informed.

Re:Why large files by addps4cat · 2003-01-26 05:18 · Score: 1

Hey everyone lets keep beating a dead horse and telling him the million and one ways that you need files greater than 2gb. Half of these posts just say "movies" anyway. So stop repeating yourselves.

--
Don't eat shrimp candy, just a heads up.

three words by Nick+Mitchell · 2003-01-26 05:20 · Score: 1

hate jar jar

What the hell? by White_Lightning · 2003-01-26 05:38 · Score: 1

Why'd they even mention DOS? All DOS programs are statically linked. There are no DLLs or anything like them (except overlays). The only thing close would be DOS Extenders. So, what does DOS have to do with it?

Re:What the hell? by mabinogi · 2003-01-26 11:46 · Score: 1

the problem is not DLLs specifically, static libraries cause problems too....

when the header file says off_t, and the library thinks off_t is 32 bits and the program linking to the library thinks it's 64 bits, then you have a problem.

The same sort of problem would presumably occur when a DOS library was compiled in large mode, but the program linking to it used small, or vice versa....

--
Advanced users are users too!
Re:What the hell? by White_Lightning · 2003-01-26 15:46 · Score: 1

Wouldn't that be a programmer's error then? Either when writing the header file or using the wrong link library?
Re:What the hell? by mabinogi · 2003-01-26 22:11 · Score: 1

Yes, but if the programmer doesn't know how the library was compiled, then it's the distributor of the library's fault.

Which is why this article is putting the emphasis on getting the distros to ensure that they provide a consistent platform.

--
Advanced users are users too!
Re:What the hell? by White_Lightning · 2003-01-26 22:46 · Score: 1

I suppose I should have read the full article instead of just skimming over it.

Admittedly, I had problems with the need for... by constantnormal · 2003-01-26 05:41 · Score: 1

... 64-bit addressing before thinking this through. I couldn't see the significant advantage for more than a very tiny fraction of apps in being able to address more than a few gigabytes.

Now I can't wait for OS X to have 64-bit support for the IBM 970 processors (I do realize that it will take several releases before default 64-bit operation is practical).

When compared to clustered 32-bit filesystems, I would think that a "pure" 64-bit filesystem would have a number of very practical advantages.

I could easily see the journalled filesystem becoming one of the first 64-bit subsystems in OS X, right after VM.

Large filesystem lack more of a problem by mauriceh · 2003-01-26 05:45 · Score: 3, Interesting

A much bigger problem is that Linux filesystems have a capacity limit of 2TB.
Many servers now have the physical capacity of over 2TB on a filesystem storage device.
Unfortunately this is still a very significant limitation.
This problem is much more commonly encountered than file size limitations.

--
Maurice W. Hilarius Voice: (778) 347-9907

Re:Large filesystem lack more of a problem by Xilman · 2003-01-27 00:27 · Score: 1

A much bigger problem is that Linux filesystems have a capacity limit of 2TB.
Many servers now have the physical capacity of over 2TB on a filesystem storage device.
Unfortunately this is still a very significant limitation.
This problem is much more commonly encountered than file size limitations
An interesting observation, but not one I've ever made. I'm much more likely to want to store over 2GB of data in a single file than to want a 2TB file system. Indeed, I don't have 2TB of disk to make into a file system, but I create large files relatively often.
Paul

--
Lasciate ogne speranza, voi ch'intrate
Re:Large filesystem lack more of a problem by mauriceh · 2003-01-27 00:38 · Score: 2, Informative

Roughly 50% of the servers we build at present have over 1TB of storage.
Roughly 30% have over 2TB.

With a 3Ware 7500-12 IDE RAID card and 11x200GB disks we hit 2.1TB.

This costs about $6,000 in a server, so is a fairly popular option.

Next month Maxtor ships their 300GB drives (MAYBE, Maxtor have been lying about their release schedules lately). Once that happens, it will be a very common problem.

--
Maurice W. Hilarius Voice: (778) 347-9907

Re:data warehouse, and any database for that matte by hector13 · 2003-01-26 05:56 · Score: 1

These are files on a regular partition (i.e., ext2 or somesuch)?? It still sounds totally inefficient to me. I have nothing against large files, but I would hope a db would be using something more efficient, or at least using its own filesystem (making the 2GB limit irrelevant).

I miss BeFS... by jonr · 2003-01-26 06:01 · Score: 1

18 EXAbyte file sizes, real journals, live queries...
*SOB*
J.

Re:Why large files by Sayjack · 2003-01-26 06:11 · Score: 1

Backup files, exporting a huge oracle database to a file. And, when I record divx quality video through my ATI card I can go through the GB like crazy.

A better question is, Who doesn't need largefile support?

As for the seek time...not everything is accessed like a random access file. I imagine that the backup data will be read in sequentially. The video file would mostly be handled sequentially, other than when jumping to a chapter, fast forwarding or reversing.

--

-- Good judgement comes with experience. -- Experience comes with bad judgement.

Re:there is no such thing as a double-sided DVD-R by psm321 · 2003-01-26 06:14 · Score: 1

Sorry, you're wrong.

http://froogle.google.com/froogle?q=9.4+dvd-r&btnG=Froogle+Search

Re:Why large files by AJWM · 2003-01-26 06:22 · Score: 1

Can anyone give a good reason for needing files larger than 2gb?

Video/movie files, for one thing. Even compressed (eg DV or MPEG) those things are huge. A 2 GB file at professional DV compression (50 Mb/sec) is about 4 minutes worth. (DV is similar to MJPEG, so it's still lossy. Uncompressed or losslessly compressed video, critical for machine vision or image analysis apps, chews even more space.)

I know I've wanted to be able to just dump a mini-DV tape (about 13 GB) directly to a single disk file for later editing.

Other fields also use huge data sets - seismic data analysis for example. Filesystems designed for supercomputer clusters (eg PVFS) have unlimited size on the total filesystem (tens of terabytes is not unusual) although the individual file size may still be limited by the underlying OS or hardware word size.

Then there's creating a .zip or .tgz of a collection of big files. Or creating the equivalent of an ISO image of a DVD. And so on.

--
-- Alastair

Re:Why large files by AJWM · 2003-01-26 06:25 · Score: 1

The seek times alone within these files must be huge,

Depends on how your inodes are laid out, how big you have to get for triple indirect blocks, etc.

Shouldn't be any worse (and maybe better) than trying to seek through an equivalent collection of smaller files -- you've got to do all those directory searches, etc. (Exact comparisons will depend greatly on the filesystem and parameters chosen when the FS was created.)

--
-- Alastair

Re:Why large files by drinkypoo · 2003-01-26 06:26 · Score: 1

There is a need for a virtualizing filesystem which supports multiple volumes, offline and not, and files stored in segmented form to fit. It would be insanely handy in a clustering environment; the whole cluster could store the file (with some redundancy) and access it in a shared fashion. This would substantially improve the ease of working with insanely large data sets in a clustered scenario.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Re:Why large files by drinkypoo · 2003-01-26 06:29 · Score: 1

Ostensibly your filesystem driver will be caching much of the list information in memory, thus for the uses to which fat32 is applied, it is still a reasonable method. There's a reason it's called fat32, it's a direct descendant.

Anyway those using a M$ OS which does not support NTFS are fooling themselves. If you are using some form of windows prior to Windows 2000, then you are getting a terrible experience which is nothing like the real OS -- NT. NTFS is a pretty good filesystem with journaling, ACLs, and implicit support for encryption and compression. Fat32 is shite.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Re:Why large files by UnknownSoldier · 2003-01-26 06:48 · Score: 1

> NTFS is a pretty good filesystem with journaling,

That's only partially true -- it doesn't journal data, only meta-data.

Re:Wrong point of view. by ProtonMotiveForce · 2003-01-26 06:58 · Score: 1

Are you trolling? That's the biggest load of shit I've read in this thread. You are by far the worst offender of the nimwits coming out of the woodwork whining that nobody needs files bigger than 2GB.

You even mention K&R Unix and claim it was faster than modern systems and use that as some kind of yardstick?

Jesus Christ, that's stupid. It's not 1975 any more, and none of your blathering has any relevance to the modern day. Technology progresses, take your dinosaur ass to a VMS shop and bore us all with your claims of how advanced VMS is, but don't tell people what they need and don't need, and certainly don't bandy about the term "incompetent" when you're so obviously projecting.

RTFPP by xintegerx · 2003-01-26 07:09 · Score: 1

I was giving an example because the parent was 0, Offtopic at the time.

The example was that officials do worry about e-mail so they would either save it like he said or avoid typing it like I said. The point is that they would consider it important and that they would save e-mails that were sent.

--

Cover your eyes and click this link!

Re:RTFPP by DAldredge · 2003-01-26 07:41 · Score: 1

They have to save it for the same reason they do not like sending it. Open Records laws. It is much easier to take 2 or 3 different stands on an issue if those you talk to have no record...

Re:Why large files by drinkypoo · 2003-01-26 07:11 · Score: 1

Sure, but that's good enough to save people in almost all cases. I've never, EVER lost data on NTFS5 due to a crash (which has happened plenty) or a power failure (only twice since I started using it.) FAT32, on the other hand... Or ext2 for that matter, it doesn't matter. A partially journaling filesystem gets the job done well enough for basically any purpose. If it's not good enough for you, perhaps a filesystem is not the best place to store your data in the first place; I'd consider a clustered replicating RDBMS :P

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

The "l" in lseek() by edhall · 2003-01-26 07:26 · Score: 3, Informative

Once upon a time (prior to 1978) there was no lseek() call in Unix. The value for the offset was 16 bits. Larger seeks were handled by using a different value for "whence" (the third argument to seek()), which caused seeks to occur in 512-byte increments. This resulted in a maximum seek of 16,777,216 bytes, with an arbitrary seek() often requiring two calls, one to get to the right 512-byte block and a second to get to the right byte within the block. (Thank goodness they haven't done any such silliness to break the 2GB barrier.)

When Research Edition 7 Unix came out, it introduced lseek() with a 32-bit offset. 2,147,483,648 bytes should be enough for anyone, hmmm? :-).

-Ed

Re:gzip handles large files fine by Whelkman · 2003-01-26 08:00 · Score: 1

gzip works over 4 GB but loses the ability to accurately report uncompressed file sizes (minor).

Re:Wrong point of view. by orangesquid · 2003-01-26 08:01 · Score: 1

At least 2GB is better than the Multics large file support situation! Files were limited to the size of segments, which were at most 255K 36-bit words, which is equivalent to roughly one megabyte! The Multics designers didn't expect most users ever to need files larger than this. The first database product (ever!), MRDS, was severely limited, so Multics programmers created a (kludgy) workaround. Modern operating systems are designed differently and thus aren't limited to such (small) file sizes.

We have conquered this problem before, by redesigning filesystems to allow files bigger than segments, and we can conquer it again by allowing files bigger than the addressable range of a 32-bit processor's full word.

--
--TheOrangeSquid Is it any wonder things seem so awry? We swim in a sea of confusion and don't have to think to survive

Needs to be signed... by wowbagger · 2003-01-26 08:30 · Score: 1

The time_t type must be signed, so that you can represent negative time differences. If you make time_t unsigned, when you try to do things like saying "if this file is older than that file" you will get a very large positive time, rather than a negative time. Not good.

--
www.eFax.com are spammers

Re:Needs to be signed... by koreth · 2003-01-26 11:08 · Score: 1

No, the type of time_t - time_t must be signed. That doesn't imply that time_t must be signed. For example, (unsigned int) - (unsigned int) is int, not unsigned int.
And anyway, "if (time_t > time_t)" works fine with unsigned values.
Re:Needs to be signed... by Ben+Hutchings · 2003-01-26 13:29 · Score: 2, Informative

No, the type of time_t - time_t must be signed. That doesn't imply that time_t must be signed. For example, (unsigned int) - (unsigned int) is int, not unsigned int.

Wrong. The C99 standard says in section 6.3.1.8 paragraph 1:

Many operators that expect operands of arithmetic type cause conversions and yield result types in a similar way. The purpose is to determine a common real type for the operands and result. For the specified operands, each operand is converted, without change of type domain, to a type whose corresponding real type is the common real type. Unless explicitly stated otherwise, the common real type is also the corresponding real type of the result, whose type domain is the type domain of the operands if they are the same, and complex otherwise.

Here, the common real type is unsigned int, and the description of the addition and subtraction operators (section 6.5.6) does not specify a different type for the result when both operands have arithmetic type.

If you disagree, please cite relevant parts of the standard to support your case.
Re:Needs to be signed... by koreth · 2003-01-26 13:45 · Score: 1

I stand corrected. My assertion was based on the fact that, well, it works:
unsigned int a = 1, b = 2;
int c = a - b;
That consistently results in c == -1, at least on every C compiler I've used for the last 20-odd years. But if that behavior isn't actually part of the standard, then I guess we'd need to define some standard macros for unsigned time_t math to produce correct results portably.
Re:Needs to be signed... by doug363 · 2003-01-26 16:10 · Score: 1

You'd probably want to cast a and b to signed long longs before doing the subtraction to ensure that you got the right answer. Even then, an int wouldn't necessarily be large enough to hold the answer. (i.e. what if a=-1u and b=0? You get -1, when it should be 4 billion or so...)
Re:Needs to be signed... by Ben+Hutchings · 2003-01-27 00:07 · Score: 1

In fact the result is implementation-defined when you convert a value of unsigned type to the corresponding signed type and the value is out of range. Usually you'll just get a negative result though.
Re:Needs to be signed... by ejasons · 2003-01-27 07:31 · Score: 1

unsigned int a = 1, b = 2;
int c = a - b;

That consistently results in c == -1, at least on every C compiler I've used for the last 20-odd years. But if that behavior isn't actually part of the standard, then I guess we'd need to define some standard macros for unsigned time_t math to produce correct results portably.

This works because your system uses two's complement arithmetic. It could fail on a system with a different integer representation. It's unlikely that you'll ever work on such a system, but they have existed (and, more importantly, could exist, which is why the C standard leaves the result of that conversion up to the implementation!).

Re:Why large files by LarsG · 2003-01-26 08:35 · Score: 1

Can anyone give a good reason for needing files larger than 2gb?

DVD .iso images. :)

--
If J.K.R wrote Windows: Puteulanus fenestra mortalis!

obvious by larsl · 2003-01-26 09:52 · Score: 1

I would have snapped up puppy.mil in an instant.

Re:Why large files by mccalli · 2003-01-26 10:17 · Score: 1

Yes, I can give two.

Virtual PC (or VMWare or whatever), whereby various different OS installations are contained within their own virtual file systems (usually a single file of over three gig).
Video capture, whereby raw footage from my digital camcorder is dumped down onto the hard drive ready to be edited. Those files can be pretty vast as well.

Cheers,
Ian

Re:Wrong point of view. by binford2k · 2003-01-26 11:08 · Score: 1

;; signal/noise ratio is getting worse; I now read posts at +3 or above

Heh, how ironic that your post is only at 2 now ;)

Re:Why large files by kasperd · 2003-01-26 11:39 · Score: 1

Ostensibly your filesystem driver will be caching much of the list information in memory

Caching the tables in physical memory does of course help, but it doesn't remove the linear scan through a linked list. This linear scan takes time even if done in RAM. To improve performance the Linux driver for this filesystem caches a number of already resolved positions, I think this cache holds 8 entries. I found out about that once I needed simultaneous sequential access to 20 files on the same FAT32 filesystem. Performance was horrible. I had two options, either do access in very large blocks to keep the number of listscans low, or increase the cachesize and recompile my kernel. I don't remember which of the two options I chose.

--

Do you care about the security of your wireless mouse?

Re:Why large files by Admiral+Burrito · 2003-01-26 13:17 · Score: 1

I recently tried recording a one hour TV show with xawtv, to AVI (MJPEG, 640x480, 15 fps, 16-bit stereo sound). It appeared to record okay, and ended up 5 gigs. But I could only play the first few minutes of it with aviplay. Something (either xawtv, aviplay, or the AVI file format itself) has a 2 (or 4 (unsigned)) GB limit.

you probably use a firewall or something by xintegerx · 2003-01-26 15:36 · Score: 1

I would guess that a router or firewall or any device, maybe even a cable modem would filter that. If you think you're accessing through a firewall, that's probably why.

It works on Win98/Internet Explorer 5 with a direct connection to the cable modem.

--

Cover your eyes and click this link!

Not in Solaris 8 and above by jsimon12 · 2003-01-26 15:57 · Score: 1

Old news, Solaris 2.6 and 7. Solaris 8 is 64 by default. I hope they are not still developing for 2.6 :)

Re:Unices? by yuri+benjamin · 2003-01-26 18:20 · Score: 1

There is no plural, as virus is a plural word in Latin already

Really? What declension does the word virus belong to?
I seem to recall that some declensions in Latin have both the singular and plural ending in -us but it's ages since I studied Latin - over a decade ago. I'm not even sure any more how to spell declension.

--
You make the mistake of thinking you can educate the fundamental stupidity out of people. You can't.

Re:Unices? by superyooser · 2003-01-26 19:33 · Score: 1

Rule of Grammar: When a word ends in x, you make it plural by adding es.
Examples: tax => taxes; sex => sexes; fox => foxes; box => boxes

The word ox, with its plural oxen, is a freak of English grammar. It is the exception, not the rule.

Examples of this bogus pluralization applied to similar words:

I hate doing my taxen.
Both sexen have positive and negative characteristic qualities.
The hunters shot three foxen in the woods.
Your boxen will be shipped in 4-6 weeks.

Both en and es have the same number of keystrokes and bits. en has no advantage, except the appearance of 1337ness to people who don't know better. So please stop using it and trying to one-up the dictionary. (This goes for virii and Unices too.) I know it's only being used with geeky words so far, but that only makes the rules of pluralization even more complicated.

The English language is convoluted enough without deliberately introducing more irregularities.

Re:Why large files by drinkypoo · 2003-01-26 22:16 · Score: 1

Don't you mean, increase the cache size, make modules... :)

Oh well. Anyway, I know that a linked list just plain isn't as efficient as a tree, but as you say there are ways to speed things up. I would assume that the windows driver probably throws away quite a bit of memory trying to make fat32 fast, microsoft has always been more than willing to squander memory willy-nilly. In fact, Mechwarrior IV:Vengeance used to have a habit of squandering it permanently, or until the process terminated... From what I hear, Excel still does, but I don't spend much time in there consecutively.

Also, I don't see any reason you couldn't build a tree in memory or in a cache (perhaps you build it in memory and design it so that you can swap most of it out automatically?). That would be a really funky way to do things on chicago but it would be quite reasonable on any flavor of NT, or of course on your favorite open-source operating system. A non-trivial job to be sure but obviously not impossible. At least that way it would only be slow once per boot.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Why is open() concerned? by dfgdfgdfg · 2003-01-27 02:41 · Score: 1

Why does it make a difference for open() whether the size of off_t is 32 or 64 bits? Shouldn't only lseek() be affected?

I thought only a few programs used lseek(), e.g. databases. Wouldn't most programs read files sequentially, without using off_t at all?

--
-- 1.e4 c6 2.d4 d5 3.Sc3 de4: 4.Se4: Sd7 5.Sg5 Sgf6 6.Ld3 e6 7.S1f3 h6 8.Se6:

Re:Why large files by kasperd · 2003-01-27 03:32 · Score: 1

Don't you mean, increase the cache size, make modules... :)

I think FAT was compiled in my kernel at that time.

perhaps you build it in memory and design it so that you can swap most of it out automatically?

I wonder who really wants to spend a lot of time improving FAT performance when there are so many other filesystems that will always perform better than FAT.

--

Do you care about the security of your wireless mouse?

Except for astronomical calculations.. by A55M0NKEY · 2003-01-27 04:48 · Score: 1

Except for scientific calculations where there will probably never be a reasonable limit on the size or precision of numbers needed, I doubt anyone would need more than 64 bits for any scalar type, be it a char or an int or a double or whatever. Why not use 64 bits for everything and accept the wasted space for storing chars but not ever have to worry about running out of numbers? Even if you waste 7/8 of the space on your hard drive to store 8 byte long chars, the available storage has gone up exponentially by using a 64 bit address space. Increasing the size of your data 8 times is negligible, negligible enough to not even bother with 1 byte chars.

--

Eat at Joe's.

Re:Why large files by drinkypoo · 2003-01-27 05:04 · Score: 1

I wonder who really wants to spend a lot of time improving FAT performance when there are so many other filesystems that will always perform better than FAT.

Well, mostly Microsoft, I'm thinking. Also fat32 is a handy filesystem because just about everyone can read it these days. I'm about to set up a PC for my girlfriend's aunt, it's just a K6-2 300. It'll have 256mb ram, and minimal (1.2Gb) disk, because that's what I have lying around. I'm putting Windows 98 SE on the disk, and knoppix will be provided on a CD so she can play with linux, assuming I can get it to stop making idiot assumptions about refresh rates without requiring her to insert a floppy as well. That is god damned idiotic. But anyway I digress, the best FS for that OS is FAT32, so I'm going to use it, all data will be stored on a fat32 volume. I imagine this is becoming a fairly common scenario. Also of course many geeks multiboot to win98 for games, the only filesystem they'll have in their PC readable by all operating systems is FAT32 and they will likely be keeping media there.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Re:Use Microsoft Windows NT and forget about the by John+Sullivan · 2003-01-27 05:50 · Score: 1

Use Microsoft Windows NT and forget about the little things.

I realise you're just a troll, but I'd like to point out that Win32 also has two forms of file API for most functions: one that can do 64-bit, and one that limits you to (or at least encourages you to use) 32-bit. For 64-bit access to work in any given application, you're relying on the language runtime making the correct mapping, and/or the end developer choosing the right set of functions to use. In many cases the easiest and most obvious ones will limit you to 32 bits, so many applications will not work with such large files.

This is a problem which has affected pretty much every system - even in an OS where *only* 64-bit file APIs exist, you'll still find an occasional app which tries to fit a file location into a 32-bit variable.
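To make the failure mode concrete, here is a minimal sketch in Python. It uses `ctypes.c_int32` to imitate what happens when a C program assigns a file offset to a signed 32-bit variable; the helper name `store_in_int32` is made up for this illustration and is not part of any real API.

```python
import ctypes

def store_in_int32(offset: int) -> int:
    """Imitate assigning a file offset to a signed 32-bit C variable.

    ctypes does no overflow checking, so the value silently wraps,
    much as a narrowing assignment typically does in C.
    """
    return ctypes.c_int32(offset).value

at_limit = 2 ** 31 - 1     # last byte offset a signed 32-bit int can hold
past_limit = 2 ** 31       # first offset beyond the 2 GiB mark
three_gib = 3 * 2 ** 30    # somewhere inside a 3 GiB file

for offset in (at_limit, past_limit, three_gib):
    print(f"{offset:>12} -> {store_in_int32(offset)}")
```

Anything past 2 GiB comes back negative, which is why the symptom is so often a failed seek or a bogus "invalid argument" error rather than a clean "file too large" message.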

--
This is my World Wide Web of Whatever

Re: Painfull, in ACHES by RoboProg · 2003-01-27 09:04 · Score: 1

Oh, but you do. AIX 4 definitely wants -D_LARGE_FILES, or bad things happen, and watch your longs and long longs (and their aliases) carefully. (A buddy and I recently had to comb through exactly this problem in an app.)
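A small sketch of why the longs need watching, using Python's `ctypes` to report the native C type sizes on whatever machine runs it. It says nothing about `off_t` itself, which depends on the compile-time flag; it only shows that `long` cannot be assumed to be wide enough.

```python
import ctypes

# Sizes of the native C integer types on this platform, in bytes.
sizes = {
    "long": ctypes.sizeof(ctypes.c_long),
    "long long": ctypes.sizeof(ctypes.c_longlong),
}

for name, size in sizes.items():
    verdict = "wide enough" if size >= 8 else "too narrow"
    print(f"{name}: {size} bytes, {verdict} for offsets past 2GB")
```

On a 32-bit build `long` comes out at 4 bytes, so any file offset parked in a `long` is at risk even when the large-file flag is set.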

--
Yow! I'm supposed to have a plan?

Re:Wrong point of view. by Wolfrider · 2003-01-27 11:28 · Score: 1

--How about freaking TAR BACKUP FILES, you narrow minded moron?!

--Have you tried backing up your 60GB Windoze partition to a compressed tar file, and gotten stung by that paltry 2Gig file limit under an older distro?? That pissed me right off!!

--Now I use Knoppix, and no worries. :)

--
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??

Re:Why large files by asparagus · 2003-01-27 13:03 · Score: 1

I know I've wanted to be able to just dump a mini-DV tape (about 13 GB) directly to a single disk file for later editing.

That's the way I edit. With the size of modern hard drives, it's a waste of time to do a traditional log/capture session. Instead, just dump everything to disk and then break it up from there. FCP even has a feature or two designed towards this direction (Start/stop detection). Hopefully they'll fix the subclip bug in version 4.

I tell FCP to partition my files, though. The only >2GB files I currently have are my Toast DVD images. I try not to use >2GB files in general, though...there are still some mysterious HFS+ bugs floating around that I've been trying to avoid.

-Brett

Re:Unices? All your boxen are belong to us!!! by andrewjjenkins · 2003-01-27 17:33 · Score: 1

Your boxen will be shipped in 4-6 weeks.
Sweet! I didn't know I'd get free boxen for reading your post!
This guy would have a field day with "All your boxen are belong to us"

Anybody else still have the T-shirt? by Ripsaw · 2003-01-28 08:51 · Score: 1

Way back in January of 1995 a group called the Large File Summit was formed to standardize large file access in Unix systems.

This group produced three notable results:

A specification, which was ultimately submitted to X/Open,
A declaration that 2**64 bytes is a "bubbabyte", and
A really cool T-shirt.

I still have my T-shirt -- how about you?

Benefits of File Size Caps by Flamesplash · 2003-02-01 07:47 · Score: 1

I used to be a student admin for Clemson's College of Engr. and Science. We had several CAD tools that the Engr. students would use. There was this one tool that you could specify a duration the simulation was supposed to last, otherwise if the field was blank it would run forever. Besides that little bit of badness the field was blank by default, so many an unsuspecting student would run their simulations and they would run forever creating these huge output files, which the students also didn't know about.

The killer here is that if you quit the program the wrong way (something like Close instead of Quit), the program would keep going, even after the student logged out.

So now you have N students who are all generating infinite files. However, the files would hit the 2GB limit and stop eating up space. (Thank you!)

The only other nastiness of this is that once we found the file, if you simply removed it, the program (still running after logout) was finally able to add more data. So you had to track down where the program was running and kill it first.
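For anyone who hasn't been bitten by this, a minimal sketch of the underlying behaviour in Python. It assumes a local POSIX filesystem (Windows will normally refuse to delete an open file), and the 1 KB writes stand in for the runaway simulation output.

```python
import os
import tempfile

# The "simulation" opens its output file and writes some data.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 1024)

# The admin removes the file while the program still has it open.
os.unlink(path)

# The program is unaffected: its descriptor is still valid and writable.
os.write(fd, b"y" * 1024)

info = os.fstat(fd)
size_after_unlink = info.st_size    # keeps growing after the delete
links = info.st_nlink               # expected to be 0 on a local POSIX fs
visible = os.path.exists(path)      # the name is gone from the directory

print(size_after_unlink, links, visible)

# Only when the last descriptor is closed (or the process is killed)
# does the filesystem actually release the blocks.
os.close(fd)
```

Removing the name takes the file out of the directory but leaves the data alive for as long as the process holds it open, so killing the process first is the only way to get the space back.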

I was in charge of backups, and man oh man was this annoying for them.

--
"Not knowing when the dawn will come, I open every door." - Emily Dickinson
