I just checked it now on my end and it seems to be fine. Maybe it was just a transient failure?
$ host eztv.it
eztv.it has address 162.159.244.249
eztv.it has address 162.159.243.249
eztv.it has IPv6 address 2400:cb00:2048:1::a29f:f3f9
eztv.it has IPv6 address 2400:cb00:2048:1::a29f:f4f9
eztv.it mail is handled by 10 ezmail.es.
$ grep nameserver /etc/resolv.conf
nameserver 8.8.4.4
nameserver 8.8.8.8
Don't try this in a University of Texas parking lot. They'll fine you for it. When it happened to me, it was something like $35. "Improper Method of Parking," or some such bunkum. Oh, and Texas requires front plates, so you've already lost that aspect anyway.
I wonder if that was what was up with a truck I saw a few months back, with a huge ol' camera on the side. It was just a boring black pickup truck, and just one camera on the driver's side.
I've seen the Google Car, and it was much smaller, painted rather obviously, and had cameras facing multiple directions.
It refers to the C Run Time, aka. the C standard library. Back in the day, only C programmers were able to operate radar. Nowadays, they can monitor radar with jQuery and node.js.
Did the OTA process itself cause the instability, or would your device have been just as unstable had you updated it over a cable? My comments regarding OTA updates are meant to apply to the OTA aspect only, not whether the update itself is good. That is, for a given update X, do you deliver that update via a programming cable plugged into the ECU at the dealership, or do you deliver that exact same update OTA? That was the point in debate.
Or is your (unstated) argument that lowering the barrier for making updates (i.e. OTA is easier and cheaper than calling everyone into the shop) would tempt auto manufacturers to take shortcuts in their QA process in the name of getting updates out there more quickly?
Actually, proper OTA updates have a number of safeguards built into them to ensure the process has clean "before" and "after" states for each step of the update process, with no crash-inducing intermediate state. I can think of at least one vendor that has a product in this space. (Note: The link is not meant as an endorsement; it's merely an example.)
The only real thing I imagine you need to worry about is if the car has had damage or after-market "upgrades" that might interfere with the validity of the update, leading to safety issues with the combination. A trip to the dealer would at least give the dealer a chance to notice such things. I find it hard to imagine, though, that in practice a dealer visit would uncover many negative interactions.
I usually don't state that explicitly, at least at first. I want to let the idea sink in first. I will state that if someone doesn't get it at first, though.
The way I like to summarize it when talking to non-technical types is this: The odds of any one ticket winning the lottery jackpot are astronomically small. Regardless, people win the jackpot quite regularly.
Low probability per trial × many trials = reasonable probability of occurrence overall.
Rounding small probabilities down doesn't fully explain all the ways folks get tripped up thinking about probabilities. For example, the Birthday Paradox doesn't fit that model directly, because it's counter-intuitive what constitutes a "trial". As the number of people involved grows linearly, the number of potential pairings grows quadratically, and most folks don't really take that into account.
Extending that to the lottery example: It's far, far more likely that two people bought the same numbers than it is that anyone matched the jackpot numbers. (And that's before taking into account the fact that folks that pick their own numbers rarely pick very random numbers.) But nobody's interested in that coincidence until the folks with the same number also match the jackpot number.
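To put rough numbers on that, here is a minimal Python sketch. It assumes every ticket is an independent, uniformly random pick, which the comment above already notes is generous to real players. The ticket count and the number of possible combinations in the usage line are made-up illustrative values, not figures for any real lottery.

```python
import math


def birthday_collision(people, days=365):
    """Exact probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for k in range(people):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def p_any_jackpot(tickets, combos):
    """Probability that at least one of `tickets` random picks matches the draw.

    Low probability per trial, many trials: 1 - (1 - 1/combos) ** tickets,
    computed with log1p/expm1 to avoid rounding the tiny per-ticket odds away.
    """
    return -math.expm1(tickets * math.log1p(-1.0 / combos))


def p_shared_pick(tickets, combos):
    """Birthday-style approximation that some pair of tickets is identical.

    The number of pairs grows quadratically with the number of tickets.
    """
    pairs = tickets * (tickets - 1) / 2
    return -math.expm1(-pairs / combos)


if __name__ == "__main__":
    # Hypothetical lottery: 100 million combinations, 100 thousand tickets sold.
    tickets, combos = 100_000, 100_000_000
    print("23 people share a birthday:", birthday_collision(23))
    print("someone hits the jackpot:  ", p_any_jackpot(tickets, combos))
    print("two tickets are identical: ", p_shared_pick(tickets, combos))
```

With those assumed numbers, a jackpot winner is roughly a one-in-a-thousand event, while a pair of identical tickets is close to certain.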
The notifications seem to be going out in waves, slowly. I'm not sure why. Across three folks I know (including myself) with Kickstarter accounts, the emails themselves all seem to have gone out within minutes of each other, but one of them arrived just minutes ago.
I'm guessing with the volume of emails, it got throttled along the way. You can see this in the Received: headers:
Received: from o2.e2.kickstarter.com (o2.e2.kickstarter.com. [74.63.202.49])
by xx.example.com with SMTP id xxxxxxxxxx
for <username@example.com>;
Sat, 15 Feb 2014 21:49:50 -0800 (PST) ...
Received: by filter-219.sjc1.sendgrid.net with SMTP id xxxxxxxxxx
Sat, 15 Feb 2014 21:18:46 +0000 (UTC)
Received: from MTEzNDg (unknown [10.42.83.122])
by localhost.localdomain (SG) with HTTP id xxxxxxxxxx
for <no-reply@kickstarter.com>; Sat, 15 Feb 2014 21:18:46 +0000 (GMT)
Notice that the earlier time stamps (corresponding to when the emails were generated) are around 21:18 GMT, but the arrival timestamps are around 21:49 PST, about 8 and a half hours later. And that's about how far apart our emails arrived. I imagine more are in the queue.
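If you'd rather not do the time zone arithmetic by hand, here is a small Python sketch using the standard library's `email.utils.parsedate_to_datetime`. The two date strings are copied from the headers quoted above, with the trailing "(PST)" and "(UTC)" comments dropped for simplicity.

```python
from email.utils import parsedate_to_datetime

# Timestamp from the sendgrid hop (when the mail was generated).
sent = parsedate_to_datetime("Sat, 15 Feb 2014 21:18:46 +0000")

# Timestamp from the final hop (when the mail arrived at the recipient's server).
received = parsedate_to_datetime("Sat, 15 Feb 2014 21:49:50 -0800")

# Both values are timezone-aware, so subtraction accounts for the UTC offsets.
delay = received - sent

if __name__ == "__main__":
    print("time spent in transit:", delay)
```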
(And yay crapflooders for making it impossible to format things usefully in Slashdot comments.)
As far as passwords go, I'm not worried about anyone actually hacking my Kickstarter password. It's a password unique to Kickstarter, and it was generated at random.org as a 13 character mixed-case alphanumeric password. Good luck reverse-hashing that. Even if you do, it won't get you much.
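For a sense of scale, here is a back-of-the-envelope Python sketch of the search space for a password like that. It assumes the attacker has to brute-force the whole space, and the guess rate in the usage line is a hypothetical figure; real rates depend heavily on how the passwords were hashed.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def keyspace_bits(length, alphabet_size):
    """Bits of entropy for a uniformly random password of the given length."""
    return length * math.log2(alphabet_size)


def years_to_exhaust(length, alphabet_size, guesses_per_second):
    """Years needed to try every candidate at a fixed (assumed) guess rate."""
    return alphabet_size ** length / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # 13 characters drawn from a-z, A-Z, 0-9 gives a 62-symbol alphabet.
    print("bits of entropy:", keyspace_bits(13, 62))
    # Hypothetical attacker testing one trillion candidates per second.
    print("years to exhaust:", years_to_exhaust(13, 62, 1e12))
```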
Not just mileage, but also number of cars, and age at time of breakdown. If you have a bunch of 20 year old cars of Brand X dying along the side of the road, the fact they're 20 years old is more an endorsement than an indictment.
If you're not corporate drones, whose idea was the Slashdot PT Cruiser?
Marketing. Personally, we (Rob and Jeff) think the Slashdot Cruiser was a really stupid idea, and if they'd asked us about it, we'd have told them so. Usually the marketing department consults with us about promotional ideas, but they're not required to, and in this case they didn't. Given that the reaction to it has been largely negative, we expect they've learned their lesson.
It's probably something even more prosaic, such as memcpy(), memmove() or similar. Simple, correct, broadly distributed, and needed by nearly every program ever written, either directly or indirectly, regardless of the language it's written in.
What piece of code, in a non-assembler format, has been run the most often, ever, on this planet? By 'most often,' I mean the highest number of executions, regardless of CPU type. For the code in question, let's set a lower limit of 3 consecutive lines.
Somehow I don't think your entry is in the spirit of the question.
As far as the original question is concerned: If you don't tie to any particular program, but just a subroutine used everywhere, heavily, I think memcpy() is a contender. (Pick a memcpy() implementation to put here to reach the 3 line minimum.) You'd be surprised how many programs' run times are dominated by memory copies.
Apparently, the TRON family of operating systems is one of the most popular operating system families in the world, embedded in literally billions of devices. Sure, all those devices are probably quite slow, but they're also probably running around the clock, mostly looping in some idle loop. So I wouldn't be surprised to find out that it has the most-executed sequential code.
If you consider RTL (register transfer language) to be code, then I'm sure one of the state machines at the heart of the fastest, most prevalent CPU family is well in the lead. It could be the state machine that selects instructions to dispatch, the state machine that retires results, the state machine that fetches instructions, the branch predictor state machine... If a given state machine gets reused over many products in a family, that acts as a multiplier. If it's part of a core that gets replicated many, many times (such as a GPU core), that also acts as a multiplier. Of course, GPUs will clock down and power down cores aggressively when not needed.
This is where co-op programs can help close the gap. You get a long, shallow on-ramp to the job in parallel with finishing up your degree work. It leaves job training with the actual employers, where it belongs,
I pretty much agree with all of the above, having worked in the biz awhile myself.
Since this is a graphics algorithm (apparently), the OP might do better to state the computational complexity per output in terms of basic operations such as multiplies and adds, and perhaps how much storage it needs.
Consider this example: Suppose someone came to me and asked, "How much does an 8x8 IDCT cost?" After asking them whether it needs bit exactness or not (some standards require it, others don't), I could give them some numbers and some implementation bounds: "The Chen IDCT needs around 11 multiplies and 20 adds per 8-pt IDCT. Multiply that by 16 to get the full cost for an 8x8 (176 multiplies, 320 adds). To meet video precision requirements for an 8x8, the multiplies should be greater than 16 bit precision, and you should carry greater than 16 bits of precision between horizontal and vertical passes."
How many gates is that? Well, depends on the throughput you require, and the details of the implementation. Given the number of multiplies and adds required, you can work toward a number. Suppose you needed to have enough IDCT bandwidth to update a 1080p 4:2:2 image at 60Hz. So, that's 1920 * 1080 * 2 * 60 = approx 250M pixels/second that you need to produce. In terms of 8x8 blocks, that's a little under 4M blocks/second, with 176 multiplies and 320 adds. So, that's approx 700M multiplies a second and 1.3B adds.
Still, that's far from enough to get to a gate count. If you put down 1 multiplier and 2 adders and ran it at 1GHz, you'd have more than enough compute throughput. You still need to add some control logic around it (especially if you only put 1 multiplier and 2 adders, because the IDCT's compute pattern is non-trivial), and some memory to store inputs, outputs and intermediate results. A more likely implementation probably has a lot more multipliers and adders in hardware, but also runs at a much slower clock rate.
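The arithmetic above is easy to redo for other formats. Here is a short Python sketch of it, taking the per-1D-IDCT operation counts quoted above as given and using the 1 multiplier / 2 adders at 1GHz design point as the throughput ceiling. The function and field names are made up for illustration.

```python
MULS_PER_1D_IDCT = 11   # figure quoted in the comment above
ADDS_PER_1D_IDCT = 20   # figure quoted in the comment above
PASSES_PER_BLOCK = 16   # 8 row passes + 8 column passes for an 8x8 block
SAMPLES_PER_BLOCK = 64


def idct_budget(width, height, samples_per_pixel, fps):
    """Return per-second sample, block, multiply and add counts."""
    samples = width * height * samples_per_pixel * fps
    blocks = samples // SAMPLES_PER_BLOCK
    return {
        "samples_per_s": samples,
        "blocks_per_s": blocks,
        "muls_per_s": blocks * PASSES_PER_BLOCK * MULS_PER_1D_IDCT,
        "adds_per_s": blocks * PASSES_PER_BLOCK * ADDS_PER_1D_IDCT,
    }


def fits(budget, multipliers, adders, clock_hz):
    """True if the raw operation rate fits, ignoring control and memory stalls."""
    return (budget["muls_per_s"] <= multipliers * clock_hz
            and budget["adds_per_s"] <= adders * clock_hz)


if __name__ == "__main__":
    # 1080p, 4:2:2 (2 samples per pixel on average), 60 Hz.
    budget = idct_budget(1920, 1080, 2, 60)
    for name, value in budget.items():
        print(name, value)
    print("1 mul + 2 add @ 1GHz is enough:", fits(budget, 1, 2, 1_000_000_000))
```

This only checks raw compute throughput. As noted above, it says nothing yet about control logic, storage, or gate count.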
So how many gates is that? You need much more information to answer that question, despite the analysis above. You now need to pick an implementation strategy, and more than one makes sense. But, you have a much better idea of the computational cost, and can pick among multiple implementations. For example, if energy efficiency is your goal, you might implement the horizontal and vertical IDCTs with explicit multipliers and adders tuned to the exact precision necessary and connected exactly as the dataflow requires, and run the whole block at a low clock rate using slower transistors with less leakage. If flexibility is your goal, you might put in a small CPU with enough grunt to fit the computational load, with the idea that you can run other algorithms there if you need to. etc...
Oh, I'm familiar with the noun form of affect; however, it's usually applied to someone or something capable of having and displaying an emotion, as opposed to the emotional impact of an inanimate object. Either way, it's sufficiently obscure as to effect the appropriate response.
Won't the DDR take "50 to 150 cycles" to service each request? Or is there some sort of pipelining going on, where the DDR can take a request every 10 cycles but have a whole bunch of queued requests in flight?
Actually, that's pretty much exactly how it works. If you have a bunch of independent requests to DDR—and by independent, I mean that the processor(s) do not stall waiting for the information from one request in order to make the next—then you can get multiple requests in flight and they can pipeline. Streaming works this way, for example. The STREAM benchmark is a textbook example of a benchmark dominated by throughput, where all the accesses are independent. For example, a[i] = b[i] + c[i] does not depend on a[i - K] = b[i - K] + c[i - K] or a[i + K] = b[i + K] + c[i + K] for any value of K in STREAM's "Add" loop. All four loops of the benchmark have that character. So as long as the processor can get enough work in-flight, it can get multiple cache misses outstanding to DDR. And if one processor and its caches have limited ability to 'execute ahead' like this, multiple processors (or multiple independent threads on the same processor) acting independently can fill in those gaps.
Linked list traversal results in a series of requests that are all dependent on each other. If all the requests miss the caches and must go out to DDR, then the CPU's performance is bounded by the round trip latency to DDR, not the DDR's throughput. Take a look at the linked list benchmarks in Ulrich Drepper's paper, "What Every Programmer Should Know About Memory." (Specifically, go down to section 3.3.2 on page 20.) Pay particular attention to Figure 3.15, Sequential vs. Random Read (for a single thread), and also compare to Figure 3.21 which shows multi-threaded random accesses for 1, 2, and 4 threads.
The paper might be a little old (it uses a Pentium 4 for its benchmarks, after all), but the principles remain true. I should know... part of my day job is as a memory system architect. :-)
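The difference between the two access patterns can be captured in a toy cycle-count model. This is only a sketch in Python: it assumes a fixed round-trip latency and a fixed interval at which the DDR can accept new requests, and the 100-cycle and 10-cycle figures in the usage lines are illustrative values in the range mentioned in this thread, not measurements.

```python
def pipelined_cycles(requests, latency, issue_interval):
    """Independent requests (STREAM-like): a new one issues every interval.

    The first result arrives after one full latency; each later result
    follows one issue interval behind the previous one.
    """
    if requests == 0:
        return 0
    return latency + (requests - 1) * issue_interval


def dependent_cycles(requests, latency):
    """Dependent requests (pointer chasing): each waits for the previous one."""
    return requests * latency


if __name__ == "__main__":
    n, latency, interval = 1000, 100, 10
    print("independent loads:", pipelined_cycles(n, latency, interval), "cycles")
    print("linked list walk: ", dependent_cycles(n, latency), "cycles")
```

In this model the independent stream is limited by the issue interval (throughput), while the list walk is limited by the round trip (latency).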
Well, even on a shared memory, certain data structures are latency bound, not throughput bound.
For example, consider a linked list. If none of the 'next' pointers are in cache, then you spend a full round-trip to DDR to get the next 'next' pointer. Depending on the machine, that could be anywhere from 50 to 150 cycles of latency, but not a huge hit on throughput.
Generalizing only slightly: a single processor chasing pointers will have a hard time maxing out the DDR throughput, although it will definitely be memory bottlenecked due to latency. Multiple processors all doing the same thing on the same memory will not, as a result, compete for bandwidth. Instead, their requests will execute in turn in the DDR, and you will be able to get some decent scaling up until the point where you have enough parallel requestors to start actually taxing the bandwidth.
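That scaling argument can be sketched with the same kind of toy model, again in Python and again under simplifying assumptions: every thread keeps exactly one dependent request outstanding, latency stays constant under load, and the DDR accepts at most one request per issue interval. The specific numbers are illustrative.

```python
import math


def requests_per_cycle(threads, latency, issue_interval):
    """Aggregate request rate for `threads` independent pointer chasers.

    Each thread completes one request per round trip, so the rate grows
    linearly with thread count until the DDR's issue rate caps it.
    """
    latency_bound = threads / latency
    bandwidth_bound = 1.0 / issue_interval
    return min(latency_bound, bandwidth_bound)


def saturation_threads(latency, issue_interval):
    """Thread count at which the model switches from latency to bandwidth bound."""
    return math.ceil(latency / issue_interval)


if __name__ == "__main__":
    latency, interval = 100, 10
    for threads in (1, 2, 4, 8, 16):
        rate = requests_per_cycle(threads, latency, interval)
        print(threads, "threads:", rate, "requests/cycle")
    print("bandwidth saturates at", saturation_threads(latency, interval), "threads")
```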
If you bring disk accesses into the picture, you have some additional opportunities for scaling, if only some of the threads go to disk, while others hit in DDR. But, I grant that the crux of my argument assumes that accesses to DDR from a single thread bottleneck on latency, not throughput.
I thought searching a large collection of documents was disk-bound, and traversing an index was an inherently serial process. Or what parallel data structure for searching did I miss?
Johnny Carson? Every one of his jokes was original? He mined Vaudeville humor and brought it to TV. He didn't even start the Tonight Show.
Elvis Presley? All of his hits were written by others. Let's face it: He made his money and fame bringing black music to white people.
James Brown? Definitely an original, whose life unfortunately went off the rails at some point.
Ok, that's enough rant. Every one of those folks earned their place on a stamp. I just wanted to point out your double standard. It's easy to dismiss one person or another with cherry-picked criteria.
If you walked up to a random 20 or 30 something on the street today and asked them if they knew who Carson, Bergman, Presley, Brown or Jobs was, I imagine Steve would beat out most of them.
I'm no Steve Jobs fanboy. (I've never owned an iPhone or iPod, and I'm posting this from a Linux box.) But even I can recognize the reality of the situation.
No, he doesn't...
I know you're just trying to be snarky.
JonKatz was a self correcting phenomenon, at least.
No, it was indeed the Slashdot Pre-Teen Cruiser. Naked petrified Natalie Portman approved. Comes with a lifetime of hot grits down your pants.
Oh, and I love this item from the old Slashdot FAQ:
You need &lt; to get it (the semicolon also): <
You could of used irregardless in you're sig to embiggen it's affect.
Two words: Map Reduce
Thank goodness Google doesn't linearly search the entire Internet every time I make a search. It'd get exponentially slower every year...