EvanED · Slashdot Mirror

Re:Why not both? on The Most Expensive One-Byte Mistake · 2011-08-03 02:21 · Score: 1

Um, so do that for strings too.

Let me rephrase: why aren't you passing by const reference except where you need a copy?

Re:register starvation on The Most Expensive One-Byte Mistake · 2011-08-03 02:19 · Score: 1

No, I'm not sure. I did say that it's been a while since I read that. :-) But given the x86->microoperation translation that chip front ends do (at least Intel's), it seems entirely plausible.

It may even have just been that anywhere in the L1 cache is only a cycle or two slower than registers nowadays, or something like that.

The main thing I remember was taking away the impression of what I said before: that x86 register pressure is not nearly as bad in real terms as it looks.

Re:The trouble is arrays, not strings. on The Most Expensive One-Byte Mistake · 2011-08-02 19:05 · Score: 1

This is not strictly the same thing, but don't underestimate the problems that are caused (particularly for automatic analysis) by the fact that you can't tell apart something that is semantically a pointer to a single character vs a pointer to an array.
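
To make that concrete, here's a minimal sketch (the function names are made up for illustration). Both functions take the exact same type, but only one of them is safe to call with a pointer to a lone char, and nothing in the signature tells a reader or an analysis tool which is which.

```c
#include <assert.h>
#include <stddef.h>

/* Fine for either kind of pointer: reads exactly one char. */
char first_char(const char *p)
{
    assert(p != NULL);
    return *p;
}

/* Only valid when p points into a NUL-terminated array.
   Passing the address of a single non-NUL char would read out of bounds,
   yet the parameter type is identical to first_char's. */
size_t terminated_length(const char *p)
{
    size_t n = 0;
    assert(p != NULL);
    while (p[n] != '\0') {
        ++n;
    }
    return n;
}
```

Calling first_char(&c) on a single char c is fine; calling terminated_length(&c) compiles just as happily and is undefined behavior unless c happens to be '\0'.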

Re:register starvation on The Most Expensive One-Byte Mistake · 2011-08-02 18:54 · Score: 1

OK, now think about how you would compile a loop; say, strcpy. With null-terminated strings, you can do this with one source register, one destination register, and one temp register to hold the value you pull out of memory.

With counted strings, you "need" one more. You either need to store the base and offset of the source string, or the current address and the ending address... then you also need the temp and destination register.
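
Roughly what I mean, as a sketch in C rather than assembly (function names are hypothetical, and a real compiler is free to allocate registers differently than the comments suggest):

```c
#include <assert.h>
#include <stddef.h>

/* Null-terminated copy: the loop keeps three values live:
   src, dst, and the byte just loaded. */
void copy_terminated(char *dst, const char *src)
{
    char c;
    assert(dst != NULL && src != NULL);
    do {
        c = *src++;     /* temp holds the value pulled out of memory */
        *dst++ = c;
    } while (c != '\0');
}

/* Counted copy: the loop needs a fourth live value, here the end
   address (a remaining count or a base+offset pair would also work). */
void copy_counted(char *dst, const char *src, size_t len)
{
    const char *end;
    assert(dst != NULL && src != NULL);
    end = src + len;
    while (src != end) {
        char c = *src++;
        *dst++ = c;
    }
}
```

The terminated version gets its loop exit test for free from the byte it already loaded; the counted version has to compare against something it carries along.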

Re:Why not both? on The Most Expensive One-Byte Mistake · 2011-08-02 18:48 · Score: 1

While in other cases (when you pass a std::string by-value and invoke the copy constructor, which tends to happen a lot), you have a hefty performance penalty.

And calling strcpy a bunch of times for the hell of it also does.

What sort of situations are you running into when you are passing a std::string by value and you don't need the copy for correctness (because you're going to modify one)?

Re:Missed the point on The Most Expensive One-Byte Mistake · 2011-08-02 18:42 · Score: 1

Getting a free strlen() is NOT an advantage, by the way. In fact, that became a liability when UTF-8 arrived. With a library strlen() function, all you had to do was update the library, but when the compiler was hardcoded to just return the byte count, that wasn't an option.

Update the library to... do what? Take into account multi-byte sequences? strlen doesn't and shouldn't do this, and the goal of the count field of counted strings should absolutely not be to count the number of characters -- it should be the buffer size. And UTF-8 doesn't change that.
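
A small sketch of the distinction, assuming the input is valid UTF-8 (utf8_code_points is a made-up helper, not a standard function). strlen keeps returning bytes, which is what you need for buffer sizing; counting code points is a separate operation layered on top.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
   Assumes s is valid, NUL-terminated UTF-8; does no validation. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For "na\xC3\xAFve" (the word with an i-diaeresis encoded as two bytes), strlen gives 6 and utf8_code_points gives 5. The 6 is the number you need when allocating or copying.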

Re:PHK wide of the mark on The Most Expensive One-Byte Mistake · 2011-08-02 18:35 · Score: 1

Hmm, looking at your post again, I think you may have been singling that out for exactly that reason. If so, never mind.

(Though if so, I'll point out that much of C is designed around "you don't need to check your preconditions" -- from general array bound accesses to union accesses to casts to all sorts of stuff. strcpy is basically exactly in line with the rest of the language.)

Re:PHK wide of the mark on The Most Expensive One-Byte Mistake · 2011-08-02 18:30 · Score: 3, Insightful

If you want to point to a real steaming pile, strcpy() was designed by a moron with a bad hang-over and no copy of Dijkstra within a 100 mile radius. It was tantamount to declaring "you don't really need to test your preconditions ... what kind of sissy would do that?"

To play Devil's advocate, strcpy cannot check its precondition. You can't tell whether a pointer you're given is valid, or how much space is left in the buffer.

(Well, I guess you could go make malloc record far more information than it otherwise has to, and make strcpy grovel through that and some other data, but even I don't think that'd have been worth it. And I'm pretty far on the side of "why the heck are we using languages that are as unsafe as C".)
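
For comparison, here's a sketch of what a checkable version has to look like (copy_checked is a hypothetical name). The only way the callee can test anything is if the caller hands over the destination size, and even then it has to trust that the size is honest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, including the terminator.
   dst_size is supplied by the caller; the function cannot verify
   that it matches the real size of the buffer. */
bool copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t n;
    assert(dst != NULL && src != NULL);
    n = strlen(src);
    if (n >= dst_size) {
        return false;   /* would overflow: refuse instead of writing */
    }
    memcpy(dst, src, n + 1);
    return true;
}
```

With only the two pointers that strcpy receives, there is nothing comparable to check against.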

Re:Got it wrong on The Most Expensive One-Byte Mistake · 2011-08-02 18:25 · Score: 1

I don't know that that's true. Operations that do need to know the length of the string could be quicker, and I'm not sure that these cases are less frequent.

So I will back up the OP in a couple of small respects here: it is still possible to track the length yourself, and you can do all the operations that need to know the length of the string in a different way using that information. (E.g. if you have s1 and s2, the length of s1 is n, and you want to concatenate them, you can just do strcpy(s1+n, s2) instead of strcat(s1, s2). Or whatever the invocations are.)

You get the O(1) strlen operation when you need it, and don't suffer the overhead of maintaining the counts when you don't. The only problem arises with modules that aren't written this way: you know the length of the string, foo() needs to know the length of the string, but you don't control the implementation of foo() and it's written without a way for you to tell it the length.
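
Spelled out as a sketch (append_known_len is a made-up helper; the caller is assumed to have enough room in s1). It does the same job as strcpy(s1+n, s2), but uses memcpy so it can hand back the new length without rescanning.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends s2 to s1, given that the caller already knows strlen(s1) == n.
   Skips the scan of s1 that strcat would do. Returns the new length.
   The caller must guarantee s1 has room for n + strlen(s2) + 1 bytes. */
size_t append_known_len(char *s1, size_t n, const char *s2)
{
    size_t m;
    assert(s1 != NULL && s2 != NULL);
    m = strlen(s2);
    memcpy(s1 + n, s2, m + 1);   /* + 1 copies the terminator too */
    return n + m;
}
```

Keep feeding the returned length back in and a chain of appends stays linear in the total size, instead of rescanning the growing prefix each time.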

The second bit is that there are a couple representations you could use for a string. First, you could have a pointer to a block containing a count and a pointer to the actual data. This adds a level of indirection to each access, and it adds more allocation and deallocation overhead. Tiny amount, but nonzero, and it's on every access, including reads. (This was even more true back before optimizers would have been able to do stuff like save the address of the real block, hoist that out of the loop, and only do it once. Though I guess you could still do that manually.)

The second representation is to have a pointer to the string data, where that block is prefixed by the count. However, you then can't create a string that's a suffix of the original without copying the whole string. (With C's representation, if p is a non-empty string, then p+1 is also a string. This also makes things like iterating through a string quite nice.)
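
Here's a rough sketch of the two layouts next to plain C strings. All the type and function names are invented for illustration, and the prefixed layout uses a C99 flexible array member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Representation 1: the string is a pointer to this header,
   which in turn points at the bytes (two loads per access). */
struct indirect_str {
    size_t len;
    char *data;
};

/* Representation 2: the count sits immediately before the bytes
   in one block (one load per access, but no cheap suffixes). */
struct prefixed_str {
    size_t len;
    char data[];
};

char indirect_at(const struct indirect_str *s, size_t i)
{
    assert(i < s->len);
    return s->data[i];      /* load s->data, then load the byte */
}

char prefixed_at(const struct prefixed_str *s, size_t i)
{
    assert(i < s->len);
    return s->data[i];      /* byte is at a fixed offset from s */
}

/* Allocates a prefixed string holding a copy of len bytes from src.
   Returns NULL on allocation failure; release with free(). */
struct prefixed_str *prefixed_new(const char *src, size_t len)
{
    struct prefixed_str *s = malloc(sizeof *s + len);
    if (s != NULL) {
        s->len = len;
        memcpy(s->data, src, len);
    }
    return s;
}

/* C's representation: the suffix of a non-empty string is just p + 1. */
const char *cstr_suffix(const char *p)
{
    assert(*p != '\0');
    return p + 1;
}
```

There is no prefixed_suffix here on purpose: the count has to live right before the first byte, so a suffix would need a fresh block and a copy.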

I've spent some time thinking about this in the past, and I've developed a reasonably strong opinion of how a C-like language would "best" handle strings, but there are substantial benefits and substantial drawbacks to every option. (I like a variant of counted strings ... but I also have some fairly unconventional and strong opinions on some programming language and OS fronts as well. :-))

Re:Got it wrong on The Most Expensive One-Byte Mistake · 2011-08-02 18:12 · Score: 1

So why suffer the overhead of keeping track?

Because you usually need to for correctness anyway, to make sure you don't overflow your buffers.

For the kind of language C aimed to be -- a replacement for assembly language -- the choice of null terminated strings was both obvious and correct.

For the kind of language C aimed to be, it sure as hell gets used in a lot of inappropriate venues. Like OS kernels.

Re:register starvation on The Most Expensive One-Byte Mistake · 2011-08-02 18:04 · Score: 1

x86's register situation, while not nearly as good as it should be (even x64 isn't all that good), is not nearly as bad as it seems. First, register renaming does a bit to help, but my understanding is that x86 chips pull a special trick: they are able to specially detect most reads and writes to the top several stack slots and redirect those accesses to a register as well. (It's been a while since I've read that, and I forget where.)

(BTW, your "4 or 5" is a little low: it's really 6 or 7 registers that are generally available. You definitely get eax through edx, esi, and edi. That's 6. If you turn on frame pointer omission, you've also got ebp.)

Re:The cost of a byte - or was that the value? on The Most Expensive One-Byte Mistake · 2011-08-02 17:55 · Score: 1

String can be any size and only restricted by available memory.

It's not like you can't get that with counted strings. If you're in the "infinite" limit case, then you're already doing something very different than just treating a block of memory as a string, and so you can either use a terminated string in that (very unusual) case or allow for a variable-sized count field.

What's the size? What byte order? What bit size? How will this affect communications between platforms?

The size of the counter I'll grant you -- IMO this may be the biggest reason that I'm glad for historical reasons that C didn't go with count fields. (I'm worried that we'd still be using 2-byte fields or something nowadays.) But I think you're overstating the problems with it... you already have to worry about all of those problems.

It's entirely (theoretically anyway) possible - given the kind of code I've seen in browsers and server code -that the web couldn't have existed without some of these assumptions. The "streaming" so core to unix depends on this... how else does one know when one hits the end of a file or a buffer?

I don't buy that one iota.

So first, "how do you know when you hit the end of a file"? That's not signaled by null in the first place, so the same way you do now. End of a buffer? Because you reached the count.

Second, if there were a situation where you'd frequently not know the size of the data a priori, nothing would stop you from changing the protocol to include a terminator in that instance. (You could use this to still provide something like find's -print0 and xargs's -0 if you didn't want lengths to show up on standard out.)

Third, think about what your assertion basically boils down to: that you can't do web programming in languages that give you counted strings. And of course that's crazy.

Personally I think there's something you don't see much in this debate: there are actually three pieces of information that matter: the string data, the length of the string, and the size of the buffer. It's always necessary to track the first, but any time you want to extend the length of the string you have to track the third. (And that's a fair bit.) In my ideal world, C's "standard" string representation (supported by the language-provided APIs) would have been like that. (Windows has it right.)
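
As a sketch of what I mean by carrying all three (bounded_str and bounded_append are hypothetical names, not any real API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The three pieces of information: the bytes, how many are in use,
   and how many the buffer can hold. Invariant: len <= cap. */
struct bounded_str {
    char *data;
    size_t len;
    size_t cap;
};

/* Appends n bytes from src if they fit in the remaining capacity.
   Returns false and leaves the string untouched otherwise. */
bool bounded_append(struct bounded_str *s, const char *src, size_t n)
{
    assert(s != NULL && src != NULL && s->len <= s->cap);
    if (n > s->cap - s->len) {
        return false;
    }
    memcpy(s->data + s->len, src, n);
    s->len += n;
    return true;
}
```

With the capacity traveling alongside the data, the overflow check lives in one place instead of being re-derived (or forgotten) at every call site.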

Re:Don't care for it, but... on The Next Firefox UI · 2011-08-01 12:54 · Score: 1

To each his own. :-) I would much much rather lose the status bar than the title bar. I use Chrome (well, did, until just a couple days ago) on Linux because I run xmonad and that fixes both of my problems. (I don't manually move around windows, and all my window names are shown in full in a bar at the top of the screen.) But on Windows dropping the title bar like that is enough to send me to another browser.

For the title bar, the number of times that I've needed to see the complete page title (and the title has been too long to fit in the tab header) is pretty small.

For me, "need" is pretty low, but "want" is fairly often. And "is too long to fit in the tab" is "most of the time".

Meanwhile, I don't mind the pop-up text that Chrome gives you, and don't use any extensions that make much use of the status bar, so I don't really miss that.

(I don't want to convince you that I'm "right" of course, just try to answer your question about why some of us don't like the Chrome-style UI.)

Re:Don't care for it, but... on The Next Firefox UI · 2011-08-01 08:50 · Score: 3, Informative

I'll be the first to admit that I was very hesitant about putting tabs on the title bar, but after letting myself get used to it for a while I see at least a couple distinct advantages. First the obvious, you gain some vertical screen space, which is always handy on modern widescreen monitors.

Tabs above/below the address bar I couldn't care less about, but I do not like tabs in the title bar. That comes at the cost of losing some vertical grabbing range for the mouse and no longer having a place to put the (full) page title.

On removing the status bar I couldn't agree with you more though...

And yet you like the extra vertical space from removing the title bar? (Not that I'm a fan of the loss of a status bar.)

Re:I don't Git it.... on The Rise of Git · 2011-07-27 05:27 · Score: 1

(That sparse checkout is what I did to split up the repository.)

This sentence is wrong and dumb. Ignore it.

Re:I don't Git it.... on The Rise of Git · 2011-07-27 05:25 · Score: 1

No, because the .git directory is in the wrong place relative to the contents. (I needed ~/.emacs/.git, basically; .git couldn't be at the ~ level.)

You could maybe use it if you only ever wanted to pull, because you could do that sparse checkout, move everything up the directory tree, and work with that locally. I'm not sure whether Git is smart enough to update the files in their new locations if you then pulled changes. However, I'm pretty sure you couldn't push back without some directory finagling.

(That sparse checkout is what I did to split up the repository.)

Re:I don't Git it.... on The Rise of Git · 2011-07-27 04:16 · Score: 1

To be honest, very few of those changes sound particularly appealing to me. Really the only three that jump out are:

The "one .svn directory" is kind of nice, but like I say in another post, there are lots of times where I actually really like the fact that a subdirectory of a working copy is a working copy.
svn switch checking ancestry is good.
I don't have enough experience with svn's mergeinfo stuff to know how that stuff works out.

But as far as I can tell, my big issues with Subversion still stand: not being able to work without repository access (of course, you really need a DVCS for that), an absolutely braindead ignore feature (no ability to set "ignore this pattern in any subdirectory"), and the "you're screwed if you update and get bad conflicts" problems. Nor does it support the interactive add feature, which I don't use very often with git but which is an absolute godsend when I do.

Like I said, I'm sure there are cases where it's not really appropriate.. but I'd really struggle to come up with one where I'd rather use Subversion.

Re:Git could use revision numbers on The Rise of Git · 2011-07-27 03:57 · Score: 1

I have a couple objections to your arguments.

First, I feel that it falls into place with one of the big pieces of what I see as the Unix philosophy (or at least very common in Unix tools) and which I almost always very much disagree with: refuse to be helpful in the common case if there are cases either where you know you can't be helpful or cases where you'll make mistakes. (In this case, "common case" = "no branch from this point".)

Second, what if I'm only interested in a single branch? Then there can't be more than one child, because the second child is on another branch. If I'm on the commit that the branch head points to (I'm not sure of the technical Git term), then there are no children, otherwise there is exactly one. If I took a branch from the current head, doesn't matter. I'll see the merge commit when I get there, or if it hasn't yet been merged, it doesn't matter one iota.

Third, take Subversion's revision numbers in particular. Since that number is global, if you cycle through them you will eventually hit every child. Now, this argument is not terribly compelling because you won't know when you're there without some inspection, but you will hit it. (On a related note, how do I get a list of every commit ever made to a Git repository, on any branch? I have no clue. With Subversion, it's easy -- you just ask for the log higher up the tree.)

How much do these things matter in actual use? Very little, especially with the graphical tools that are available (for both Git and Subversion). But it's not zero, IMO.

Re:I don't Git it.... on The Rise of Git · 2011-07-27 02:29 · Score: 1

*bzzzt* [Though I did figure out a good workflow when typing up this post!]

I use a bare repository (so I can push to it from home, where I "can't" really pull from), and then a clone of that. You forgot to set that up.

'cause see, here's what I would think would make sense:

    mkdir repo
    cd repo
    git init --bare
    cd ..
    git clone repo working
    cd working
    touch foo.txt
    git add foo.txt
    git commit -m 'Added foo.txt'
    git push

but it doesn't work, at least under 1.7.4.1. (Current is 1.7.6 so I'm not that far out-of-date, and upgrading is far from trivial in my environment.)

    $ git push
    No refs in common and none specified; doing nothing.
    Perhaps you should specify a branch such as 'master'.
    fatal: The remote end hung up unexpectedly
    error: failed to push some refs to '(...)/repo'

However, git push origin master seems to work... I could have sworn I tried that before and it didn't. So maybe my complaint is sort-of-solved.

Still, I firmly believe that my workflow makes sense, and the fact that it doesn't work I see as a blemish on Git. It is that sort of thing that turns people off of software, especially if they're less patient, and especially when there are other good alternatives like Hg. (No clue how I would want to structure things.)

Re:I don't Git it.... on The Rise of Git · 2011-07-27 02:19 · Score: 1

Regarding the "forget to push", that's not quite what I mean. I mean, I like to checkin things to a branch regularly, whether it's broken or not. ... That's not something you want to push to your primary repository.

Wait, so what do you do under Subversion? I'm confused.

Re:I don't Git it.... on The Rise of Git · 2011-07-27 02:15 · Score: 1

I HATED this shit with SVN, I wanted to check out the entire repository but it was just too big, so I could only check out parts of it.

Let me give you a concrete example of where I was screwed by this. I am a version control fiend... I put most stuff that I do that is in some textual format into one, and that includes config files.

The way I set up config files on Linux is I have the .config directory that well-behaved programs use, and those files are just sitting there normally. I also have the actual files of not well-behaved programs (those that make you dump .blah files in ~) under that directory, with symlinks at the location the program expects.

So, I figured I'd put my .config directory into git. Seemed (and still seems) reasonable to me. I like it way better than making a git repo for just .xsession, a git repo for .zshrc and .zshenv, etc. Not only does it seem silly to make a repository for a single file, but that would also allow me to share stuff between, say, .zshrc and .xsession if I ever get around to doing so and have commits be atomic and such.

Until I wanted to check out just my emacs config directory so I could put the contents into .emacs on a machine where I didn't want the rest of my configuration, and Git said "you want to do what? not on my watch."

So now I have a git repo with only .xsession. (Okay, it also has an xmodmap config in it. Still, two files, and no sharing if I decide it's worthwhile to merge .xsession and .zshrc.)

Re:Git could use revision numbers on The Rise of Git · 2011-07-26 18:25 · Score: 2

That's precisely why most DVCS don't use version numbers, but you'll also notice that the poster who started this thread proposed having a master repository which sets the numbers.

You'll also notice I didn't say "DVCS should have version numbers" in my post, I said "here are the drawbacks with the fact that DVCSs don't (usually) have version numbers."

Also you could look at Bzr... another poster in this thread has elaborated on the way it does numbering in a distributed setting.

Re:I don't Git it.... on The Rise of Git · 2011-07-26 18:22 · Score: 2

I'm talking about the GUI tools that are currently available. They suck, and doing tasks like cherry picking files is a pain in the butt. Of course, the fact that there's a term called "Cherry Pick Commit" that has nothing to do with "Cherry picking" files for commit.. might be part of the reason... You are right, though.. not having to checkin all files in one command is nice.

So I can't speak to GUI tools on anything but Windows, but there's a TortoiseGit that functions nearly identically to TortoiseSVN. It even (at least mostly) hides the index from you.

My other major beef is that, while it's nice to be able to do version control disconnected, I dislike having my check ins local.. version control is also a "save my ass", and if my laptop takes a trip down a flight of stairs, anything that's not pushed is lost as well.

That sort of gets back to the "it's easy to forget to push" problem. If you're not subject to this problem, then I disagree that there's much of a difference: if I lose work because I deliberately didn't push, that's because I don't have repository access, and then I'd have "lost" that work under Subversion anyway because I wouldn't have done it in the first place.

As for remembering to push, there is a problem there. Tortoise is nice because on the "yes, you've committed" dialog there's a nice "push" button staring you in the face, so it's pretty easy to remember there, especially if you get in the habit of pushing after every commit.

For the command line, I haven't found a perfect solution... I think I want to write a shell alias that will run git as normal but, if I said "git commit", will print out "don't forget to push!" when it's done. I haven't gotten to that yet.

And one of the two biggest repository tangles I've had to unravel had at its root the fact that I forgot to push from one copy of a repository, developed in another, and then tried to sync everything up. That took some time to even figure out what happened, and rather longer to figure out the best way to fix it.

That said, I've also had a time when I've left dirty copies of files sitting around in a Subversion working copy for months without noticing, and that caused a problem too.

TLDR I do think that this is a drawback of Git, but for me it's so drastically outweighed by being able to work disconnected that it almost doesn't register.

Re:Eclipse has adopted Git [for] for Eclipse proje on The Rise of Git · 2011-07-26 18:08 · Score: 1

I've been using TortoiseGit for some time, and it's pretty awesome.

I only remember one problem. I'm not sure exactly what happened, but it had something to do with the combination of (1) trying to hide the index from you (which I'm actually totally behind in its case) and (2) something getting into the index by other means. I think what I was getting was an empty commit dialog even though there was uncommitted stuff in the index. But that was easily solved via the command line, which may have been how I got into that state to begin with. That problem was a while ago so I don't recall really what happened. :-)

Re:Git could use revision numbers on The Rise of Git · 2011-07-26 17:56 · Score: 1

I'm not sure how branches play into this, but as someone who used Subversion for a few years and then switched to Git for the last few years, I forgot to mention version numbers when I said what I think Svn does better in another post. (Mostly that post talks about what Git does better, so this is an exception. :-))

The nice thing about version numbers is they are predictable. If you're on revision 100, the previous revision was 99. The next revision is 101. With hashes... there's nothing. If the current revision is 483b3ced, that tells you nothing about what its parent revision or child revision is.

Now, Git works around this mostly, because you can say 483b3ced^ to go to the previous revision (and SVN has something similar in its PREV keyword). But it's not a full solution. What's the next revision? Git doesn't have a way of getting you that information.

This is not very commonly useful (at least in my experience), and most of the time that it would be useful, it's because you're doing what 'git bisect' does natively. But every once in a while I hit an edge case where it would be useful to specify "the next revision". And I don't know of a way Git gives you this, even just scanning over the rev-parse man page.

User: EvanED

Comments · 6,434