TheRaven64 · Slashdot Mirror

Re:The important details: Slower and over 540$ on Intel Core I7-5775C Desktop Broadwell With Iris Pro 6200 Graphics Tested · 2015-07-27 20:29 · Score: 1

The peak power consumption is important for one other reason: heat. The machine that I was talking about is in a small NAS case (4 drive bays, slimline optical drive, power distribution board, mini-ITX motherboard, no other spare space). It also only has a (fanless) 120W PSU, so it's quite easy to go over the available power if the CPU can spike up to a high peak. I'll keep the newer Intel chips in mind when I upgrade, but it looks as if most of the mini-ITX motherboards are still limited to 16GB of RAM and being able to upgrade to 32GB would be the main thing that would prompt me to replace the motherboard. Oh, and Haswell still doesn't have working FreeBSD drivers, so that wouldn't be an option yet.

Re:No kidding. on Google Studies How Bad Interstitials Are On Mobile · 2015-07-27 20:23 · Score: 1

Search for 'semantic web'. XHTML 2.0 was the core of a family of standards that completely separated data, ontology maps, and presentation.

Re:The important details: Slower and over 540$ on Intel Core I7-5775C Desktop Broadwell With Iris Pro 6200 Graphics Tested · 2015-07-27 02:49 · Score: 1

Ultra mobile? I don't think I've ever seen it in that configuration. It's in the same space as Intel's Atom, but unlike the Atom, AMD doesn't try to artificially segment the market too much so you can actually buy one with enough SATA slots for a NAS.

Re:Wait... on LinkedIn (Temporarily) Backs Down After Uproar At Contact Export Removal · 2015-07-27 00:51 · Score: 2

LinkedIn was ever relevant? I've had a lot of spam from them (even though I never joined or gave them permission to contact me), but I'd never consider using them to find people to hire (or to find work, back when I was looking).

Re:FreeBSD on HardenedBSD Completes Strong ASLR Implementation · 2015-07-27 00:48 · Score: 1

BROP doesn't work against a proper ASLR implementation

Define 'proper'. Re-randomisation after every fork()? Good luck with that. PLTs at random offsets? Sure, if you're willing to pay the overhead of not being able to share any position-independent code between processes.

Re:The important details: Slower and over 540$ on Intel Core I7-5775C Desktop Broadwell With Iris Pro 6200 Graphics Tested · 2015-07-27 00:45 · Score: 1

Depends on the AMD chip. I have a box that serves as a NAS and HTPC with an AMD Fusion E-350, which is one of their lower-power chips. Maximum power consumption is 18W for the CPU and GPU. The GPU works fine for decoding HD video (on FreeBSD; presumably it's as good on Linux). It's now around 4-5 years old and the only reason that I'm considering replacing it is that the motherboard can only handle 8GB of RAM, which isn't enough for ZFS deduplication with a 12TB pool.
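
A back-of-envelope sketch of why 8GB falls short for dedup on a pool that size. The numbers are assumptions, not measurements: roughly 320 bytes of RAM per dedup-table entry (a commonly quoted rule of thumb), one entry per unique block, every block unique, and a 128KiB average block size. The function name is made up for illustration.

```python
def dedup_table_bytes(pool_bytes, avg_block_bytes, bytes_per_entry=320):
    """Estimate the RAM needed to keep a ZFS dedup table fully in memory.

    Worst case: every block in the pool is unique, so each one needs
    its own table entry. The 320-byte entry size is a rule of thumb.
    """
    unique_blocks = pool_bytes // avg_block_bytes
    return unique_blocks * bytes_per_entry


KIB = 2**10
GIB = 2**30
TIB = 2**40

estimate = dedup_table_bytes(12 * TIB, 128 * KIB)
print(f"{estimate / GIB:.1f} GiB")  # prints "30.0 GiB"
```

Smaller average blocks make it worse: halving the block size doubles the number of entries.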

Re:Scripts that interact with passwords fields aws on A Plea For Websites To Stop Blocking Password Managers · 2015-07-27 00:39 · Score: 4, Insightful

True, although most password managers can generate random passwords (of varying strengths, as a recent Oakland paper showed). Using this functionality is generally easier than thinking up a password.

Re:Scripts that interact with passwords fields aws on A Plea For Websites To Stop Blocking Password Managers · 2015-07-26 22:56 · Score: 3, Interesting

JavaScript can also intercept the contents of the clipboard. If you're blocking password managers, then people are going to do one of two things. Either they'll pick a (weak) easy-to-remember password, or they'll use a password manager and paste the password in. If they opt for the latter, then any malicious ad on the page can grab the password while it's in the clipboard...

Re:No kidding. on Google Studies How Bad Interstitials Are On Mobile · 2015-07-26 20:33 · Score: 1

During your rant, I couldn't help but think, 'But they DO have a standardized app for accessing all the websites', and it's called the browser!

I think that you're slightly missing the grandparent's point. About 10-15 years ago, there were two groups pushing new directions for the web. One group, led mostly by the W3C (though backed by Apple and a few other big companies), wanted to completely separate content and presentation. You'd have a service that would provide structured XML and then a web page or a native app that would process it and present it to the user. This would make it easy to write programs that aggregated data from multiple sources (e.g. find bus, train and flight times and prices so that you can find out the cheapest or most convenient route from A to B, including getting to and from different airports).

The other faction, led by Google, wanted to completely destroy this separation and make web pages into rich web apps that would ensure that you could only view the content in exactly the form that the authors intended. The main goal of this was to make it hard to distinguish content from ads and therefore make it hard to automatically remove ads.

Unfortunately, the second group mostly won. The grandparent seems to want people to go back to the other approach and present machine-readable data feeds so that we can then have rich client-side apps that are agnostic to the source, but present the data as the user wants. I'd like that too.

Re:Browsing with mosquitoes on Google Studies How Bad Interstitials Are On Mobile · 2015-07-26 20:25 · Score: 1

The only problem is that you may forget to turn off blocking of whatever analytics spyware they're running, so they won't get feedback that you only stayed on the page for a few seconds before leaving the site.

Re:All that effort, so little protection on HardenedBSD Completes Strong ASLR Implementation · 2015-07-26 05:27 · Score: 1

Read this paper if you want to know how easy it is.

Re:FreeBSD on HardenedBSD Completes Strong ASLR Implementation · 2015-07-26 05:23 · Score: 1

PC-BSD occasionally picks some patches to apply on top of a stock FreeBSD, but they try to keep it fairly small. I suspect that they're unlikely to pick up these for several reasons. First, there are still some random segfaults in applications caused by these patches that are not yet diagnosed. Second, the HardenedBSD team doesn't have a great track record for security, for example merging some insecure random number generator patches that were under review for FreeBSD and rejected over security issues and shipping them in production. Third, since the Blind ROP work from Stanford, ASLR is largely discredited as a security feature - it's a nice checkbox feature, but it doesn't really buy you much against a determined attacker. Fourth, the last iteration of the patches still had some very odd decisions about the interfaces for turning ASLR on and off (they also had a number of lock-order reversals, which are hopefully fixed in the latest version).

Re:Who wrote the summary? on Uber Faces $410 Million Canadian Class Action Suit · 2015-07-23 19:55 · Score: 1

So go ahead and kick the cripple, it's easy and fun.

Likening Slashdot editors to cripples is quite offensive to cripples.

Re:Commission on Woman Recruited By Google Four Times and Rejected Now Joins Age Discrimination Suit · 2015-07-23 04:24 · Score: 1

Google routinely contacts everyone who has been through their hiring process before. I applied when I was a PhD student and was rejected, but started getting calls from them after 6 months and got them every six months after that. When I was a bit bored, I let them interview me again (free trip to Paris to visit friends, not California for me, and since I stayed with friends instead of in a hotel they paid for a nice meal out to thank my friends rather than a nice hotel room). I turned them down that time, but they still call me every few months. Saying yes on those calls is basically the same as reapplying - it just sticks you into step 1 of the hiring process; they then still want you to send them an up-to-date CV and other things.

Re:The 19 year old is a lunatic on 19-Year-Old's Supercomputer Chip Startup Gets DARPA Contract, Funding · 2015-07-23 04:20 · Score: 1

At a single core, we have a 128KB multibanked scratchpad memory, which you can think of as just like an L1 cache but smaller and lower latency. We have one cycle latency for a load/store from your registers to or from the scratchpad

Note that a single-cycle latency for L1 is not that uncommon in in-order pipelines - the Cortex A7, for example, has single-cycle access to L1.

That scratchpad is physically addressed, and does not have a bunch of extra (and in our opinion, wasted) logic to handle address translations,

The usual trick for this is to arrange your cache lines such that your L1 is virtually indexed and physically tagged, which means that you only need the TLB lookup (which can come from a micro-TLB) on the response. If you look at the cache design on the Cortex A72, it does a few more tricks that let you get roughly the same power as a direct-mapped L1 (which has very similar power to a scratchpad) from an associative L1.
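
A minimal sketch of the constraint behind the virtually indexed, physically tagged trick: the set index can be taken from untranslated address bits only if one way of the cache is no larger than a page. The cache and page sizes below are illustrative, not those of any particular Cortex core, and the function name is hypothetical.

```python
def vipt_index_untranslated(cache_bytes, ways, page_bytes):
    """True if the set index fits entirely inside the page offset.

    Line offset plus set index together address one way of the cache,
    so the index needs no translation when a way is at most one page.
    """
    way_bytes = cache_bytes // ways
    return way_bytes <= page_bytes


KIB = 1024

# 32 KiB, 8-way, 4 KiB pages: 4 KiB per way, index comes from the page offset.
print(vipt_index_untranslated(32 * KIB, 8, 4 * KIB))  # True

# 32 KiB, 4-way, 4 KiB pages: 8 KiB per way, one index bit would need translating.
print(vipt_index_untranslated(32 * KIB, 4, 4 * KIB))  # False
```

When the check passes, the set lookup can start in parallel with the TLB, and the translated address is only needed for the tag compare at the end.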

If the address requested by a core is not in its own scratchpad's range, it goes to the router and hops on the NoC until it gets there... with a one cycle latency per hop

To get that latency, it sounds like you're using the NoC topology that some MIT folks presented at ISCA last year. I seem to remember that it was pretty easy to come up with cases that would overload their network (propagating wavefronts of messages) and end up breaking the latency guarantees. It also sounds like you're requiring physical layout awareness from your jobs, bringing NUMA scheduling problems from the OS (where they're hard) into the compiler (where they're harder).

Building a compiler for this sounds like a fun set of research problems (if you're looking for consultants, my rates are very reasonable! Though I have a different research architecture that presents interesting compiler problems to occupy most of my time).

Oh, one more quick question: Have you looked at Loki? The lowRISC project is likely to include an implementation of those ideas and it sounds as if they have a lot in common with your design (though also a number of significant differences).

Re:The 19 year old is a lunatic on 19-Year-Old's Supercomputer Chip Startup Gets DARPA Contract, Funding · 2015-07-23 00:09 · Score: 2

Prefetching in the general case is non-computable, but a lot of accesses are predictable. If the stack is in the scratchpad, then you're really only looking at heap accesses and globals for prefetching. Globals are easy to statically hint and heap variables are accessed by pointers that are reachable. It's fairly easy for each function that you might call to emit a prefetch version that doesn't do any calculation and just loads the data, then insert a call to that earlier. You don't have to get it right all of the time, you just have to get it right often enough that it's a benefit.

For prefetching vs eviction, it's a question of window size. Even with no prefetching, most programs exhibit a lot of locality of reference and so caches work pretty well without prefetching - it doesn't matter that you take a miss on the first access, because you hit on the next few dozen (and in a multithreaded chip, you just let another thread run while you wait), but if you're evicting data too early then it's a problem. A combination of LRU / LFU works well, though all of the good algorithms in this space are patented. Although issuing prefetch hints is fairly easy, the reason that most compilers don't is that there's a good chance of accidentally pushing something else out of the cache. That said, if they're targeting HPC workloads, then just running them in a trace and then using that for hinting would probably be enough for a lot of things.

I heard a nice anecdote from some friends at Apple a while ago. They found that one of their core frameworks was getting a significant slowdown on their newer chip. The eventual cause was quite surprising. In the old version, they had a branch being mispredicted, and a load speculatively executed. The correct branch target was identified quite early, so they only had a few cancelled instructions in the pipeline. About a hundred cycles later, they hit the same instruction and this time ran it correctly. With the new CPU, the initial branch was correctly predicted. This time, when they hit the load for real, it hadn't been speculatively executed and so they had to wait for a cache miss.

Also, if you're trying to create a parallel system with manual caches... good luck. Cache coherency is a pain to get right, but it's then fundamental to most modern parallel software. Implementing the shootdowns in software is going to give you a programming model that's horrible.

And finally there's the problem that doing it in software makes it serial. The main reason that we use hardware page-table walkers in modern CPUs is not that they're much better than a software TLB fill, it's that it's much easier to make them run completely asynchronously with the main pipeline. The same applies to caches.

Re:Only $100k? on 19-Year-Old's Supercomputer Chip Startup Gets DARPA Contract, Funding · 2015-07-22 21:31 · Score: 1

A couple of hundred thousand dollars will get you a prototype, not prototypes - experienced chip designers sometimes get something that works first time (and are deservedly incredibly smug about it). More commonly, you go through at least 2-3 iterations.

Re:The 19 year old is a lunatic on 19-Year-Old's Supercomputer Chip Startup Gets DARPA Contract, Funding · 2015-07-22 21:30 · Score: 1

Whether he can actually produce a compiler that will insert the necessary memory fetch instructions at compile time in an efficient manner remains to be seen

That's not the hard bit of the problem. Compiler-aided prefetching is fairly well understood. The problem is the eviction. Having a good policy for when data won't be referenced in the future is hard. A simple round-robin policy on cache lines works okay, but part of the reason that modern caches are complex is that they try to have more clever eviction strategies. Even then, most of the die usage by caches is the SRAM cells - the controller logic is tiny in comparison.
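
A toy sketch of why the eviction policy matters, comparing round-robin replacement with LRU on a tiny fully associative cache. Everything here is an assumption for illustration: the two-line capacity, the function names, and the contrived trace (one hot line reused between streaming accesses). Real caches are set-associative and real traces are messier.

```python
from collections import OrderedDict


def hits_round_robin(trace, capacity):
    """Count hits when a pointer cycles over the slots to pick the victim."""
    slots = [None] * capacity
    next_victim = 0
    hits = 0
    for addr in trace:
        if addr in slots:
            hits += 1
        else:
            slots[next_victim] = addr
            next_victim = (next_victim + 1) % capacity
    return hits


def hits_lru(trace, capacity):
    """Count hits when the least recently used line is evicted."""
    lines = OrderedDict()  # oldest use first, newest use last
    hits = 0
    for addr in trace:
        if addr in lines:
            hits += 1
            lines.move_to_end(addr)
        else:
            if len(lines) >= capacity:
                lines.popitem(last=False)
            lines[addr] = True
    return hits


# Contrived trace: line 0 is hot, lines 1..8 are each touched once.
trace = []
for i in range(1, 9):
    trace += [0, i]

print(hits_round_robin(trace, 2), hits_lru(trace, 2))
```

On this trace round-robin keeps throwing away the hot line, while LRU holds on to it. On a trace with no reuse at all, the two policies would do equally badly.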

Re:The 19 year old is a lunatic on 19-Year-Old's Supercomputer Chip Startup Gets DARPA Contract, Funding · 2015-07-22 21:28 · Score: 1

"Virtual Memory translation and paging are two of the worst decisions in computing history"

He's not completely wrong there. Paging is nice for operating systems isolating processes and for enabling swapping, but it's horrible to implement in hardware and it's not very useful for userland software. Conflating translation with protection means that the OS has to be on the fast path for any userland changes and means that the protection granule and translation granule have to be the same size. The TLB needs to be an associative structure that can return results in a single cycle, which makes it hard to scale. Larger pages help (though then you make the protection granule even larger), but the amount of physical memory that the TLB can cover has dropped with each successive generation since paging was first introduced into microprocessors.
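
A rough sketch of the TLB coverage arithmetic: reach is just entries times page size, and what matters is how that compares with installed memory. The entry counts and memory sizes below are made-up round numbers for illustration, not figures for any real CPU.

```python
def tlb_reach_bytes(entries, page_bytes):
    """Memory that can be mapped without taking a TLB miss."""
    return entries * page_bytes


def coverage_fraction(entries, page_bytes, physical_bytes):
    """Share of physical memory covered by a full TLB."""
    return tlb_reach_bytes(entries, page_bytes) / physical_bytes


KIB, MIB, GIB = 2**10, 2**20, 2**30

# Hypothetical older machine: 64 entries, 4 KiB pages, 16 MiB of RAM.
old = coverage_fraction(64, 4 * KIB, 16 * MIB)

# Hypothetical newer machine: 1536 entries, 4 KiB pages, 16 GiB of RAM.
new = coverage_fraction(1536, 4 * KIB, 16 * GIB)

# Same newer machine using 2 MiB pages instead.
new_large = coverage_fraction(1536, 2 * MIB, 16 * GIB)

print(f"{old:.4%} {new:.4%} {new_large:.4%}")
```

With these numbers the TLB grew 24-fold while memory grew 1024-fold, so the covered fraction shrinks unless the page size (and with it the protection granule) grows too.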

"Introduction of hardware managed caching is what I consider 'The beginning of the end'"

I don't completely agree with this, but given the amount of effort that people writing high-performance code (and compilers) have to spend understanding the hardware caching policy and working around it, I'm not completely convinced that it's a win in the HPC arena - you end up spending almost as much time fighting the cache as you would working with a hardware scratchpad. I'm still a fan of single-level stores as a programmer abstraction though.

Re:Not sure whats more impressive... on 19-Year-Old's Supercomputer Chip Startup Gets DARPA Contract, Funding · 2015-07-22 21:07 · Score: 1

I'm hoping that there's a million missing there. Are you just planning on selling IP cores? When I talked to a former Intel Chief Architect a few years ago (hmm, about 10 years ago now), he was looking at creating a startup and figured that $60m was about the absolute minimum to bring something to market. From talking to colleagues on the lowRISC project and at ARM, $1-2m is just enough to produce a prototype on a modern process, but won't get you close to mass production. Do you plan on raising more money or partnering with someone else for production?

Re:Not sure whats more impressive... on 19-Year-Old's Supercomputer Chip Startup Gets DARPA Contract, Funding · 2015-07-22 20:58 · Score: 2

When it comes to being better than a GPU for applications, you have to remember GPUs have abysmal memory bandwidth (due to being limited by PCIe's 16GB/s to the CPU)

That's a somewhat odd claim. One of the reasons that computations on GPUs are fast is that they have high memory bandwidth. Being hampered by using the same DRAM as the CPU is one of the reasons that integrated GPUs perform worse. If you're writing GPU code that's doing anything other than initial setup over PCIe, then you're doing it badly wrong.

That said, GPU memory controllers tend to be highly specialised. The nVidia ones have around 20 different streaming modes for different access patterns (I think the new version has a programmable prefetcher - Intel is also adding one), but if your memory access patterns are data dependent then GPUs can suck.
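
To put rough numbers on the PCIe versus on-card gap discussed above, here is a small sketch. The 16GB/s PCIe figure is the one quoted by the parent; the 300GB/s on-card bandwidth and the 4GB working set are assumptions picked for illustration, not the specification of any particular GPU.

```python
def transfer_seconds(num_bytes, bytes_per_second):
    """Time to stream a buffer once at a given sustained bandwidth."""
    return num_bytes / bytes_per_second


GB = 10**9

working_set = 4 * GB                                    # assumed data size
over_pcie = transfer_seconds(working_set, 16 * GB)     # figure from the parent post
on_card = transfer_seconds(working_set, 300 * GB)      # assumed GPU memory bandwidth

print(f"PCIe: {over_pcie:.3f}s, on-card: {on_card:.3f}s")
```

A kernel that makes many passes over its data pays the on-card cost each time, but only pays the PCIe cost once if the data is copied across during setup.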

after you run out of data in the relatively small memory on the GPU

Not really. If you're doing big workloads on a GPU, your overflow isn't main memory over PCIe, it's the next GPU along over a much faster interconnect. And even with PCIe, most of the latency comes from the protocol and not the physical interconnect - you can get a lot more speed out of the PCIe hardware if you don't need all of the features of the PCIe bus.

The DARPA grant is specifically for continued research and work on our development tools, which are intended to automate the unique features of our memory system. We have some papers in the works and will be talking publicly about our *very* cool software in the next couple of months.

Where have you sent them? I'll keep an eye out.

Your mention of the Mill and running existing code well, I had a pretty good laugh

You certainly wouldn't be alone there.

stack machines are notorious for having HORRIBLE support for languages like C

That's not really true (not sure what the relevance to The Mill is though - it's not a stack machine). Algol support for stack machines became pretty good (C wasn't really popular until stack machines had largely died out, but the back end of a C compiler is not that different from the back end of an Algol compiler). The reason that stack machines died is that it's basically impossible for the hardware to extract ILP from a stack ISA. That's less of an issue if your throughput comes from thread-level parallelism. There are some experimental architectures floating around that get very good i-cache usage and solid performance from a stack-based ISA and a massive number of hardware threads.
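
A simplified sketch of the ILP point, using (a + b) * (c + d). It assumes a naive model in which every stack instruction reads and updates the top of the stack and so depends on its predecessor, while the register form only carries the data dependencies that are written out explicitly. The function name and the model are illustrative assumptions.

```python
def critical_path(deps):
    """Longest dependency chain, in instructions.

    deps[i] lists the indices of earlier instructions that
    instruction i must wait for.
    """
    depth = []
    for needed in deps:
        depth.append(1 + max((depth[j] for j in needed), default=0))
    return max(depth)


# Register form (7 instructions):
#   0: r1 = load a   1: r2 = load b   2: r3 = load c   3: r4 = load d
#   4: r5 = r1 + r2  5: r6 = r3 + r4  6: r7 = r5 * r6
register_deps = [[], [], [], [], [0, 1], [2, 3], [4, 5]]

# Stack form (7 instructions): push a, push b, add, push c, push d, add, mul.
# Under the naive model each one depends on the instruction before it.
stack_deps = [[]] + [[i - 1] for i in range(1, 7)]

print(critical_path(register_deps), critical_path(stack_deps))
```

Under that model the register version has a chain of three while the stack version is fully serial. With enough hardware threads the serial chains matter less, because other threads can fill the gaps.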

Re:AASSHOLES on The Lone Gunmen Are Not Dead · 2015-07-22 20:54 · Score: 1

I was more mad at the writers for spoiling the ending of the BSG remake.

Re:Spoilers on The Lone Gunmen Are Not Dead · 2015-07-22 20:51 · Score: 1

Alien also managed the suspense well at the time by having the least-well-known actor be the survivor. Of course, if you've seen trailers for any of the later ones, then you expect her to be fine, so this didn't last much beyond the original release.

Re:Beautifully put on On Being Pro-GPL · 2015-07-22 06:09 · Score: 1

I think we are disagreeing on facts here

Yes, you're talking about projects that I worked on and actions of people that I collaborate with and claiming that things happened very differently to how I remember them.

Yes a lot of their changes weren't merged in the end but that was because the GCC developers didn't like the direction they were going. Early on they got merged. When really matters.

Most of Apple's changes were in Objective-C (not merged), blocks (not merged) and early stuff on PowerPC autovectorisation (also mostly not merged as GCC contributors at IBM blocked the AltiVec stuff from being merged while it was a Freescale-only feature). About the only stuff that was routinely merged was the Mach-O support stuff, and that was largely useless as the corresponding binutils changes often weren't, so it couldn't be used without Apple's build of the rest of the toolchain.

As for Sony I was talking PS3 they used Apple's early work on GCC.

Sorry, I meant PS3 - the SPU stuff (where the PS3 actually got decent performance) was all new. The PPU stuff may have used a little of the Apple stuff, but very little. Apple did not do the PowerPC bring-up, that was done mostly by IBM folks (who were shipping GCC on AIX on PowerPC, and later Linux on PowerPC). Apple only did the vectorisation work because Freescale wasn't contributing to GCC.

How does that disprove anything? The question is cooperation. Webkit clearly has lots of people working with it. That's what GPL does

Because the GPL was completely irrelevant to this. Ignoring the fact that it's LGPL, for a moment (and that every iOS device is violating the LGPL, because stuff links WebKit and doesn't allow the recipient of the code to re-link the code against their own build of WebKit), the sequence of events was:

KHTML was open.
Apple created a proprietary fork of KHTML.
To comply with the letter of the license, Apple did code dumps of the KHTML-derived code on every binary release.
No one used the Apple code - even the KHTML developers couldn't work out how to merge the changes, because they were given a big blob containing about six months of work from multiple developers with no revision history.
Other companies (amusingly, Nokia was one of the leaders here) approached Apple and offered to contribute to WebKit if it were developed in the open.
Apple creates a public svn repository for WebKit.
WebKit becomes a successful open source project.

The LGPL was in no way responsible for WebKit becoming a successful open source project, that happened solely because external contributors with deep pockets approached Apple and offered to devote engineering manpower if Apple would collaborate with them. If the license had been BSDL, then the sequence would likely have been:

KHTML was open.
Apple created a proprietary fork of KHTML.
Other companies approached Apple and offered to contribute to WebKit if it were developed in the open.
Apple creates a public svn repository for WebKit.
WebKit becomes a successful open source project.

The steps that were enforced by the license were completely irrelevant to WebKit's success.

Re:Beautifully put on On Being Pro-GPL · 2015-07-22 02:22 · Score: 1

The GPL forced Apple to release all their changes to GCC even though they may have seen these as a competitive advantages and didn't release similar code bases when the code was BSD. This GCC release allowed Sony to leverage them on their game design system

I have several 'huh' reactions to this.

First: Apple's changes to GCC were released, but mostly lived in a separate branch in GCC's svn. A lot of them were never merged into the mainline. The GCC code has subsequently changed enough that applying them is no easier than completely rewriting them (source: a colleague at CodeSourcery who is working on doing exactly this).

Second: Apple is one of the largest contributors to LLVM and Clang (BSD licensed) and has been since the project was quite small - they hired the lead developer of LLVM to use LLVM in their graphics stack and started the Clang project to replace GCC with something that was useable for refactoring tools, syntax highlighting, and so on.

Third: I guess you're talking about the PS2, but optimisations for the Cell PPU are not really very useful when encouraging people to run code on the SPUs (the PPU was pretty underpowered) and Sony did not use Apple's GCC branch. For the PS4, their toolchain is entirely LLVM based (and their OS is FreeBSD based).

Another example involving Apple is the release of Webkit because of the KHTML code BTW.

Not a great example. WebKit was originally released as a bunch of code dumps that were basically impossible to merge with upstream KHTML. It was the promise of outside developer interest (after many years of developing it with no community involvement), not the license, that caused them to release it as a proper open source project (i.e. with a public revision control system). Oh, and the Apple-original bits of WebKit (i.e. those not derived from KHTML) are BSD licensed...

User: TheRaven64

Comments · 32,964