These are not really consumer products. Basically what you get out of an Optane drive is more durability (hence 10DWPD instead of 0.3DWPD @ 5 year warranty), and low latencies at low queue depths (10uS @ QD1 instead of 30uS+ @ QD1 for a NAND drive, random read).
But that's it. Everything else about Optane is non-competitive with NAND, at least so far. The price is ridiculous, and the throughput at higher queue depths isn't really all that impressive.
No consumer is going to notice the lower latencies at low queue depths for the types of activities Intel advertises the product for (such as gaming), because all of those activities involve bulk reading and writing which NAND does very well, and most involve a certain degree of sequential reading or writing which modern NAND drives (such as the Samsungs) optimize very well. At higher queue depths the Intel advantage goes away entirely, so it wouldn't move the needle even for concurrent random server workloads.
Consumers for the most part never hit the actual durability limits of a NAND drive. For one, even with the lower durability the NAND drive is typically going to be double or triple the capacity of the Optane drive at the same price point, and for two, consumer use cases do not usually do 10 full drive writes per day over the life of the device or anything even close to that.
Basically, like the idiotic Optane 'disk cache' Intel tried to hawk last year, this drive is a pretty bad fit as a consumer device. In this offering Intel at least put the proper durability that Optane is *supposed* to have in the specs. Around 8900TB... nothing to sneeze at when most NAND drives have durabilities in the 200-400TB range. There is something to be said for that, even without real-life integrity/retention data available yet. But... it's still just not a consumer-oriented device.
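The durability numbers above follow from the drive-writes-per-day rating and the warranty period. A minimal sketch of that arithmetic, assuming a 480GB Optane part and a 500GB NAND part (both capacities are my assumption for illustration, they are not stated above):

```python
def rated_endurance_tb(dwpd, capacity_tb, warranty_years):
    """Rated endurance in TB: full drive writes per day over the warranty period."""
    return dwpd * capacity_tb * 365 * warranty_years

# Capacities are assumed for illustration only.
optane_tb = rated_endurance_tb(dwpd=10, capacity_tb=0.48, warranty_years=5)
nand_tb = rated_endurance_tb(dwpd=0.3, capacity_tb=0.5, warranty_years=5)

print(f"Optane: ~{optane_tb:.0f} TB written, NAND: ~{nand_tb:.0f} TB written")
```

With those assumed capacities the results land in the same ballpark as the figures quoted above, which is about all this kind of back-of-envelope calculation can tell you.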
If I understand this correctly, the kernel is being relinked and rewritten to the boot partition. That's instant fail in my book.... at least for us, the boot partition is sacrosanct. We do *NOT* write to it except when specifically upgrading a system. We do not do ad-hoc or automated writes to it because years of experience have shown that most corrupted boots (aka machine -> non-working) are due to unexpected events occurring while a filesystem is being written to.
The rename trick is not a solution. There's the 'ideal' atomic rename, and then there is the reality: storage devices can fail in many different ways while writing a particular sector, including ways that are unrelated to that sector.
So, honestly, I think OpenBSD is making a huge mistake here. I can see randomization at load-time, but relinking and rewriting the kernel binary on every boot? No. Bad bad bad idea.
ASLR or equivalent is close to useless anyway. Malware has found ways around it, and it makes debugging and bug reproducibility difficult (which arguably is more important... that bugs get found and fixed, not simply detected). It also tends to fragment memory which can cause serious problems for long-running systems. And the vast majority of systems will simply restart the service anyway. They might log the seg-fault from the malware, but maybe 0.001% of system owners actually look at those logs.
Not 100% sure but I think this particular speedup was due to an issue with non-temporal writes to memory. Such instructions are used in heavily optimized game code but not generally used in critical paths elsewhere. They are also known to be highly temperamental instructions even across Intel cpus. The Ryzen box was synchronizing the memory writes to all cores which imploded some of the heavily optimized algorithms.
So far my tests with a 1700X show Ryzen to be an excellent performance cpu. It holds up well against nearly all of Intel's offerings. It does still run a bit hotter than Intel in my tests but the power consumption is significantly better than past AMD cpus. It's a lot closer to Intel now.
More importantly, Intel's FAB advantage is dissipating fairly quickly as other fabs catch up. The combination of a modern cpu design and competitive third party fabs puts AMD in a good position to compete from this point forwards.
As AMD has shown just in the past few days, Ryzen can definitely be competitive and even more so as game devs begin to make Ryzen-optimized builds available.
I've been running an openvpn link from my home to our colo for years. I also have it set up on all my devices so I can use it while traveling. Some of our DFly devs also use it when they are traveling. Here's my cumulative wisdom on the matter:
Generally speaking it works quite well. I use a medium-numbered port but I also have a server running on port 443 because the many weird networks one runs through when traveling often block most ports, but usually leave the https port open.
* Use UDP for the transport when running openvpn over a broadband link. This provides the most consistent experience.
* Use TCP for the transport for connections from mobile devices. This provides the most consistent experience. There are several reasons for this, not the least of which being that the telco infrastructure seems to devalue UDP by a lot versus other traffic. TCP is also a lot easier to run on the server-side if you potentially have many devices connecting in, because you can run one server instance.
* Configure a smaller mss, I use 1300, so the encapsulation doesn't get fragmented by the transport. This is very important.
* Configure a relatively frequent keepalive in openvpn over a WAN link (I use 1sec/10sec), but a less frequent one over mobile (I use 20sec/120sec). This is particularly important on mobile because cell tower switches can cause long disruptions. You don't want to drop the VPN link in such circumstances if you can help it. DO NOT DISABLE THE KEEPALIVE. Always have an openvpn keepalive setup, particularly over TCP, because the TCP connection backoff can prevent your sessions from recovering or cause them to take a long time to recover if one or the other direction is not actively sending data (such as with most web connections, downloads, streaming, etc).
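The settings in the list above can be sketched as a small helper that emits the matching OpenVPN config lines. `proto`, `mssfix`, and `keepalive` are standard OpenVPN directives, but mapping the "smaller mss" advice onto `mssfix` is my assumption, and the helper name is hypothetical:

```python
def openvpn_transport_lines(mobile):
    """Return OpenVPN config lines for the transport advice in the list above.

    Assumption: 'smaller mss' is expressed with the mssfix directive.
    """
    proto = "tcp" if mobile else "udp"
    # keepalive takes a ping interval and a restart timeout, both in seconds.
    ping, restart = (20, 120) if mobile else (1, 10)
    return [
        f"proto {proto}",
        "mssfix 1300",
        f"keepalive {ping} {restart}",
    ]

print("\n".join(openvpn_transport_lines(mobile=False)))
```

Treat this as a starting point only. A real setup also needs the port, certificates, and routing directives, and the values should be checked against the OpenVPN version in use.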
I personally like 'OpenVPN Connect' on iOS (which I use to connect to our project colo). And of course I run openvpn on all the DragonFly boxes including my laptop.
--
Reliability of the VPN depends entirely on the path between your location and the VPN server. The packet must travel this path in addition to the path from the VPN server to the nominal destination, and even in the best of circumstances it will double the chances of something going wrong.
I've had a number of outages at home where my cable link is still operational but the cable company's path to the VPN server is having problems. Also, recovery times are longer because not only does the dead network have to revive, but the openvpn setup has to reconnect and renegotiate.
--
Commercial services are going to be hit or miss. VPN'ing your broadband link might be problematic and you have no real visibility into what the commercial service is doing with your data. That said, they are probably going to be a lot better than trusting your data to the telco and wifi hot-spots you connect from when you are mobile.
Netflix and other video streaming providers will often block-out commercial VPN IPs from the service. Generally speaking, using a commercial service for high-bandwidth connections is really hit-or-miss. You are using their bandwidth as well as your own.
When using a VPN, you are bypassing any special deals your broadband provider has made with the likes of YouTube, Netflix, etc. Remember that if the cell bandwidth is supposed to be free, because it won't be over the VPN.
--
In terms of security, it's a mixed bag. The VPN will secure your traffic from your immediate ISP/Telco (aka Comcast, AT&T), and that's actually very important. However, you are not anonymous and once your traffic reaches the egress point it's up for grabs by any network it flows through and, in particular, the target web page or whatever might be doing its own data collection.
But the telco data collection is MUCH more valuable to third parties than target data collection, and the VPN link at least protects you from that.
The VPN will not do a whole lot for your internal network security. If someone bre
What did it in should be obvious... one security exploit after another, non-stop, for over 8 years. HTML5 might have been the final nail in the coffin but Flash really did itself in.
When Flash was originally conceived by Macromedia very little thought went into security, because at the time security wasn't a big issue (the Internet was still fairly small, compared to today, and hackers had not yet really ramped up on a large scale). The entire codebase was inherently insecure and trusting of the flash handed to it.
In all that time, ever since that first flash product went out the door, right on up to today, nobody did more than basic hand-waving around the security problems. I'm sure they will claim that they tried... but no... they really didn't.
In the end, people finally got tired of the endless stream of security exploits.
That means the average latency is 85uS (averaged over all reads), the lowest latency measured was 66uS and the highest was 140uS. Another important metric is the standard deviation... that is, how 'tight' access times are around that average latency of 85uS. In this case, a standard deviation of 7.5uS is very good.
Comparing this to the Optane, what Intel has stated is that the average latency over all reads for Optane NVMe will be around 10uS. They also stated that the standard deviation would be much tighter. So that is the comparison.
But here's the real problem... you ask whether Optane will beat a PCIe SSD as a HDD cache in actual real-world desktop circumstances. I will add 'at the same price point'. The answer to that is going to be 'no'. The reason is that you can buy 4x to 8x the amount of NAND NVMe-based storage as you can Optane NVMe storage for the same price.
So instead of having a 32GB Optane cache, you could have a 128GB-256GB NAND SSD cache for the same price. That *completely* trumps Optane, no matter how low Optane's latency is, for this use case.
Insulator breakdowns on circuit boards happen less often these days but they are still prevalent in Electrolytic caps and anything with windings (transformers, inductors, DC motors, etc), though it can take 20-50 years to happen and depends on conditions. And the failure mode depends too.
Generally speaking, any component with an insulator which is getting beat up is subject to the issue.
Circuit boards got a lot better as vendors switched to solid state caps. Electrolytics tend to dry out and little arc-throughs punch holes in the insulator over time (running them at less than half their rated voltage goes a long ways to lengthening their lives, which is why you usually see voltage ratings much higher than the voltages that are actually run through them).
The insulating coatings in wires used for windings have gotten better. Typically shorts develop over time and change the value of the inductance (or voltage ratio for a transformer) and other parameters, until it gets to the point where the part is so out of spec it stops doing its function properly. DC motors will get weaker, etc etc.
Just so happens I have an Intel 750 in the pile. Here's the issue that the Linux NVMe code had to work around:
nvme3: mem 0xc7310000-0xc7313fff irq 40 at device 0.0 on pci4
nvme3: mapped 32 MSIX IRQs
nvme3: NVME Version 1.0 maxqe=4096 caps=0000002028010fff
nvme3: Model INTEL_SSDPEDMW400G4 BaseSerial CVCQ535100LC400AGN nscount=1
nvme3: Request 64/32 queues, Returns 31/31 queues, rw-sep map (31, 31)
nvme3: Interrupt Coalesce: 100uS / 4 qentries
nvme3: Disk nvme3 ns=1 blksize=512 lbacnt=781422768 cap=372GB serno=CVCQ535100LC400AGN-1
If I run a randread test on uncompressed data using block sizes 512... 131072, you can see the glitch that occurs at 65536 bytes. I will use a deep queue (128 threads, around QD4 per HW queue but considered to be QD128 globally), so this is the absolute limit of the device's performance. Look at what happens when the block size transitions from 32768 to 65536. That's the firmware screwup that the Linux folks worked around. No other NVME vendor has this issue:
Certainly faster writing. Read speed is about the same for the EVO (on real blocks of uncompressible data, not the imaginary compressible or zeroed blocks that they use to report their 'maximum').
XPoint over NVMe has only two metrics that people need to know about to understand how it fits into the ethos: (1) More durability, up to 33,000 rewrites apparently (many people have had to calculate it, Intel refuses to say outright what it is because it is so much lower than what they originally said it would be). (2) Lower latency.
So, for example, NVMe devices using Intel's XPoint have an advertised latency of around 10uS. That is, you submit a READ request, and 10uS later you have the data in hand. The 960 EVO, which I have one around here somewhere... ah, there it is... the 960 EVO has a read latency of around 87uS.
This is called the QD1 latency. It does not translate to the full bandwidth of the device as you can queue multiple commands to the device and pipeline the responses. In fact, a normal filesystem sequential read always queues read-ahead I/O so even an open/read*/close sequence generally operates at around QD4 (4 read commands in progress at once) and not QD1.
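The relationship between per-command latency, queue depth, and throughput is just Little's law. A rough sketch, assuming the per-command latency stays constant as the queue gets deeper (real devices only approximate that, so treat the deeper-queue number as an upper bound); the function names are mine:

```python
def iops_from_latency(latency_us, queue_depth=1):
    """Little's law: commands in flight divided by time per command."""
    return queue_depth * 1_000_000 / latency_us

def bandwidth_mb_per_sec(iops, block_size_bytes):
    """Convert a transaction rate into MBytes/sec for a given block size."""
    return iops * block_size_bytes / 1_000_000

# 87uS is the QD1 read latency quoted above for the 960 EVO.
qd1 = iops_from_latency(87)
qd4 = iops_from_latency(87, queue_depth=4)
print(f"QD1 ~{qd1:.0f} IOPS, QD4 at most ~{qd4:.0f} IOPS")
print(f"QD4 with 4KB blocks: at most ~{bandwidth_mb_per_sec(qd4, 4096):.0f} MB/s")
```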
Here's the 960 EVO and some randread tests on it at QD1 and QD4.
nvme1: mem 0xc7500000-0xc7503fff irq 32 at device 0.0 on pci2
nvme1: mapped 8 MSIX IRQs
nvme1: NVME Version 1.2 maxqe=16384 caps=00f000203c033fff
nvme1: Model Samsung_SSD_960_EVO_250GB BaseSerial S3ESNX0J219064Y nscount=1
nvme1: Request 64/32 queues, Returns 8/8 queues, rw-sep map (8, 8)
nvme1: Interrupt Coalesce: 100uS / 4 qentries
nvme1: Disk nvme1 ns=1 blksize=512 lbacnt=488397168 cap=232GB serno=S3ESNX0J219064Y-1
(/dev/nvme1s1b is a partition filled with uncompressible data)
So, as you can see, at QD1 the 960 EVO is doing around 11.4K transactions/sec and at QD4 it is doing around 44K transactions/sec. If I use a larger block size you can see the bandwidth lift off:
And if I use a deeper queue I can max-out the bandwidth. On this particular device, random blocks of uncompressible data at 32KB limits out at around 1 GByte/sec. I'll also show 64KB and 128KB:
And who the hell do you think I am mister Anonymous Coward?
So, as I thought, you don't understand either that commit or the commit later on that simplified it (159b67d7).
It's not a stripe-size limitation per se, it's just a limitation on the maximum physical transfer size per I/O request, which for 99.9% of the NVMe devices out in the wild will be >= 131072 bytes and completely irrelevant for all filesystem I/O and even most softRAID I/O.
More to the point, that particular commit does not apply to the 600P at all. It applies to several older Intel datacenter SSDs as well as the 750 series and it exists because Intel really screwed up the firmware on those devices and put crazy stupid low limitations on physical transfer size. Then they carefully designed tests that didn't hit those limitations to sell the devices.
The 750, for example, loses a huge amount of performance with a block size >= 65536 bytes. Intel maybe didn't advertise the mistake, but that is a limitation that doesn't exist in the 600P nor does it exist on ANY OTHER NVME SSD IN EXISTENCE. Only a complete idiot creates a NVMe device which can't handle block transfers of 65536 or 131072 bytes without losing massive performance. Intel = 65536 bytes.
This was a well known bug in these particular models.
In any case, even for these models, this particular quirk has no effect on block I/O tests for block sizes below 65536 bytes. And, as I mentioned already, NO OTHER NVME VENDOR has such absurdly low limits or such massively disconnected performance metrics when you exceed them. And even Intel fixed the issue on the 600P.
This just points to the idiocy inside Intel. And it shows your stupidity as well, believing that a little quirk like this somehow affects the entire NVMe space (or even the entire Intel NVMe space), which it doesn't. These sorts of quirks exist for all manner of hardware, not just NVMe, to work around poor, buggy implementations.
And, of course, any Linux or BSD operating system will use all available memory for cache data from storage anyway. I guess Windows needs a little more help to do that.
This certainly shows up in, for example, Chrome startup times. It takes around 4 seconds from a hard drive, uncached, 1 second from a SSD, 1 second from a NVMe drive, and presumably 1 second from any other form of storage because chrome itself needs a bit of cpu time to initialize itself, not to mention the time it takes to load a tab (minimum 0.5 seconds).
So honestly once one transitions from the HDD to a SATA SSD, where the difference is noticeable, any further transitions (SATA SSD -> NAND NVME SSD -> XPOINT NVME SSD -> XPOINT DDRs) are not likely to be noticeable, even without a ram cache.
I think Intel's ENTIRE marketing effort revolves around Windows' slow startup times. Or more to the point, Windows tends to seek the storage device a lot while starting up which is *very* noticeable if you have a hard drive, but mostly irrelevant if you have any sort of SSD.
Since one can accomplish the same thing simply by purchasing a small SSD, I just don't see them being able to make a case for it being 'easier' as a disk caching substitute versus someone coming to the realization that their time and data are valuable enough to actually spend a bit more money on buying some native SSD storage in the first place.
The advent of the cloud is also making local mass storage less and less relevant. Here I'm not talking about those of us who insist on having our own local archives (mine is getting close to 4TB now, with another 4TB in two backup locations so... that's 12TB of storage for me). I'm talking about 'normal' people who are using cloud storage more and more often. They won't need Intel's ridiculous 'solution' either (not even mentioning the fact that a normal NAND NVME SSD to cache a HDD is a better fix for the problem they are marketing against than their Optane junk).
Motherboard vendors are just now, finally, starting to put M.2 connectors on the motherboard. Blame Intel for the slow rate of adoption. Intel came out with three different formats, all basically incompatible with each other, and created mass confusion.
But now, finally, mobo vendors are settling on a single PCIe-only M.2 format. Thank god. They are finally starting to put one or more M.2 slots and finally starting to put on U.2 connectors for larger NVMe SSDs. Having fewer SATA ports on the mobo is no longer a marketing issue. I've seen many more mobos recently with just 2-4 SATA ports.
It would depend on the relative latency and other characteristics. XPoint is definitely not it, because XPoint can't handle unlimited writing. But in some future let's say we do have a non-volatile storage mechanism that has effectively unlimited durability, like ram, but which is significantly more dense, like XPoint.
In that situation I can see systems supporting a chunk of that sort of storage as if it were memory.
Latency matters greatly here for several reasons. First, I don't think XPoint is quite fast enough, at least not yet. The problem with any sort of high-latency storage being treated like memory at the HARDWARE level is that the latency creates massive stalls on the cpu. Even DRAM today causes huge many-clock stalls on a cpu. These stalls are transparent to the operating system, so the operating system cannot just switch to another thread or do other work during the stall. The stall effectively reduces the performance of the system. This is the #1 problem with treating any sort of storage technology as if it were memory.
The #2 problem is that memory is far easier to corrupt than storage (which requires a block transaction to write). I would never want to map my filesystem's entire block device directly into memory, for example. It's just too dangerous.
The solution that exists today is, of course, swap space. You simply configure your swap on an SSD. The latencies are obviously much higher than they would be for a HW XPoint style solution, around 50-100uS to take a page-fault requiring I/O from a NVMe SSD, for example.
The difference though is that the operating system knows that it is taking the page-fault and can switch to another runnable thread in the mean time, so the CPU is not stalled for 50-100uS. It's doing other work. Given enough pending work, the practical overhead of a page-fault in terms of lost CPU time is only around 2-4uS.
In a XPoint-like all-hardware solution, the CPU will stall on the miss. If the XPoint 'pagein' time is 1-2uS, then the all-hardware solution winds up only being twice as good as the swap space solution in terms of CPU cycles. Of course, the all-hardware solution will be far better in terms of latency (1-2uS versus 50-100uS).
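To put the two cases side by side, here is a small sketch of the CPU time lost to misses under each scheme. The miss rate is a made-up number for illustration, and the per-miss costs are the midpoints of the ranges given above:

```python
def cpu_fraction_lost(misses_per_sec, lost_us_per_miss):
    """Fraction of one CPU's time lost to misses, capped at 100%."""
    return min(1.0, misses_per_sec * lost_us_per_miss / 1_000_000)

MISSES_PER_SEC = 10_000  # hypothetical load, not a measured figure

# Swap: the OS switches threads, so only the fault overhead (2-4uS) is lost.
swap_lost = cpu_fraction_lost(MISSES_PER_SEC, 3.0)
# All-hardware: the CPU stalls for the whole access (1-2uS).
hw_lost = cpu_fraction_lost(MISSES_PER_SEC, 1.5)

print(f"swap: {swap_lost:.1%} lost, hardware: {hw_lost:.1%} lost")
```

The swap case only comes out this well when there is enough other runnable work to fill the 50-100uS wait, which is the "given enough pending work" condition above.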
But to really work in this format the non-volatile memory needs to have a nearly unlimited write capability. XPoint does not. XPoint only has around 33,000 write cycles of durability per cell (and that's being generous). It needs to be half a million at a minimum and at least 10 million to *really* be useful.
Maybe you should point me at the commitid you are referring to, then I can address your comment more directly. I can tell you straight out, even without seeing it, that you are probably misinterpreting it.
Intel devices have quirks, but I think you are mixing apples and oranges here. All modern filesystems have used larger alignments for ages. The only real issue was that the original *DOS* partition table offset the base of the slice the main filesystem was put on by a weird multiple of 512 bytes which was not even 4K aligned.
This has not been an issue for years. It was fixed long ago on DOS systems and does not exist at all on EFI systems. Regardless of the operating system.
At the same time, all SSDs past the second generation became sophisticated enough that they really stopped caring about alignment for most practical use cases.
Where Intel does mess up depends on the device. In the 600P's case, the firmware is poorly designed in many respects. In other cases, such as with the 750, performance implodes with large block sizes (64KB or higher). This just makes the device less worthy, because frankly NO OTHER SSD VENDOR has these sorts of idiotic problems.
All of that said, insofar as operating systems go, these storage-level devices have no real visibility into, understanding of, or optimizations for one particular filesystem versus another. So for all practical situations, there is NO raw performance difference between Windows, MacOS, Linux, or any of the BSDs for these storage level devices. They are completely OS-agnostic and have always been completely OS-agnostic.
Right. They are trying to market it as something cool and new, which would be great except for the fact that it isn't cool OR new. A person can already use ANY storage device to accelerate any OTHER storage device. There are dozens of 'drive accelerators' on the market and have been for years. So if a person really wanted to, they could trivially use a small NAND flash based NVMe SSD to do the same thing, and get better results because they'll have a lot more flash. A person could even use a normal SATA SSD for the same purpose.
What Intel is not telling people is that NOBODY WILL NOTICE the lower latency of their XPoint product. At (I am assuming for this product) 10uS the Intel XPoint NVMe is roughly 1/6 the latency of a Samsung NVMe device. Nobody is going to notice the difference between 10uS and 60uS. Even most *server* workloads wouldn't care. But I guarantee that people WILL notice the fact that the Intel device is caching much less data than they could be caching for the same money with a NAND-based NVMe SSD or even just a SATA SSD.
Smoke. Total and complete nonsense. Why would I want to buy their over-priced Optane junk versus a Samsung 951* or 960* NVMe drive? Far more storage for around $115-$130, 1.4 GBytes/sec consistent read performance, decent write performance, and decent durability.
P.S. the Intel 600P NVMe drive is also horrid, don't buy it.
Intel's claims are rather exaggerated. Their claims have already been torn apart on numerous tech forums. At best we're talking only a ~3-5x reduction in QD1 latency and they intentionally omit vital information in the specs to force everyone to guess what the actual durability of the XPoint devices is. They say '12PB' of durability for the 375GB part but refuse to tell us how much overprovisioning they do. They say '30 drive writes per day' without telling us what the warranty will be.
In fact, over the last 6 months Intel has walked back their claims by orders of magnitude, to the point now where they don't even claim to be bandwidth competitive. They focus on low queue depths and play fast and loose with the stats they supply.
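The guessing game looks roughly like this. A sketch that backs out the implied full-device rewrites and the implied warranty period from the two numbers Intel does give, assuming decimal units and zero overprovisioning (any overprovisioning would lower the real per-cell figure):

```python
def implied_rewrites(endurance_pb, capacity_gb):
    """Full-device rewrites implied by rated endurance, ignoring overprovisioning."""
    return endurance_pb * 1_000_000 / capacity_gb

def implied_warranty_years(endurance_pb, capacity_gb, dwpd):
    """Years it takes to use up the rated endurance at the stated DWPD."""
    return implied_rewrites(endurance_pb, capacity_gb) / (dwpd * 365)

rewrites = implied_rewrites(endurance_pb=12, capacity_gb=375)
years = implied_warranty_years(endurance_pb=12, capacity_gb=375, dwpd=30)
print(f"~{rewrites:.0f} rewrites, ~{years:.1f} years at 30 DWPD")
```

This is only the arithmetic anyone outside Intel can do from the published numbers. It is not a statement of the actual cell durability.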
For example, their QOS guarantee is only 60uS 4KB (99.999%) random access latency and in the same breath they talk about being orders of magnitude faster than NAND NVMe devices. They fail to mention that, for example, the Samsung NVMe devices also typically run around ~60-70uS QD1 latencies. Then Intel mumbles about 10uS latencies but bandies about large factors of improvement over NAND NVMe devices, far larger than the 6:1 one gets simply assuming 10uS vs 60uS.
Then they go on to say that they will have a NVDIMM form for the device later this year, with much faster access times (since in the NVMe form factor access times are constricted by the PCIe bus and block I/O protocol). But with potentially only 33,000 rewrite cycles per cell to failure that's seriously problematic. (And that's the best guess, since Intel won't actually tell us what the cell durability is).
--
The price point is way too high for what XPoint in the NVMe format appears to actually be capable of doing. The metrics look impossible for a NVDIMM form later this year. Literally we are supposed to actually buy the thing to get actual performance metrics for it? I don't think so.
It's insane. This is probably the biggest marketing failure Intel has ever had. Don't they realize that nobody is being fooled by their crap specs?
Pulseaudio is notoriously Linux-specific. We've had nothing but trouble trying to use it on BSD and switched to ALSA (which is a lot more reliable on BSDs) a year or two ago for that reason.
I guess that's the end of Firefox's portability. Most of our users use Chromium anyway because Firefox has been so unstable and crash-prone. Long live Chromium?
Your problem was that you were using Kingston, Patriot, etc... all third-rate SSD vendors who use whatever flash chips happen to be cheapest. Crucial (aka Micron), Samsung, and a few others are first-line vendors.
SSDs can certainly fail, but it's kinda like PSUs... some vendors are first-line, most are not.
Yes, an ad-blocker definitely reduces memory usage, by a lot. However, it's a bad idea to use any add-on for 'important' sites. I compartmentalize my browser into different user ids so the actual chrome instance I use to access sensitive accounts is completely independent of the instance I use for general browsing. The ad-blocker is disabled for the one I use to access sensitive accounts (in fact, ALL add-ons are disabled for that one), and enabled for the one I use for general browsing.
-Matt
So actually even though the memory footprint is larger, using separate processes also makes chrome more swap-friendly, which means the kernel can page-in/page-out the tabs more efficiently. The result seems, at least for me, to be a smoother ride when I have a lot of tabs open.
Of course, swap space should always be configured on a SSD.
I always enable the site isolation option. It's nice to see Google finally making it the default.
-Matt
Well, NVMe is a multi-queue spec. The best drivers and chipsets for it will assign a command and response queue to each cpu in the system. This allows for both lockless queuing operation as well as polling with no cross-cpu contamination. In this regard, NVMe is far, far superior to AHCI (aka SATA, which has only one queue for multiple targets) and SAS chipsets (which typically are not multi-queue).
At 10uS, though, interrupt overhead (with MSI-X vectoring per-cpu) still yields superior cpu-v-data performance. Interrupt overhead is only around ~1uS or so. Still, it's getting close. At lower latencies polling will definitely be a win. But even at 10uS, interrupt driven operation still leaves the cpu with extra clocks to do other work that it wouldn't have with polling.
Another problem is that NVMe chipsets generally don't have anywhere near the 1023+ queues supported by the spec. They usually come in at no more than 31 queues, which is not enough to assign one to each cpu thread on heftier systems. The chipset spec can support a lot more... in fact, many more MSI-X interrupts can be supported as well, but we just don't see it out in the field.
Most chipsets only offer 8 queues, which is near worthless on modern multi-core cpus.
nvme0: Model SAMSUNG_MZVPV128HDGM-00000 BaseSerial S1XVNYAGA03031 nscount=1
nvme0: Request 64/32 queues, Returns 8/8 queues, rw-sep map (8, 8)
nvme1: Model Samsung_SSD_960_EVO_250GB BaseSerial S3ESNX0J219064Y nscount=1
nvme1: Request 64/32 queues, Returns 8/8 queues, rw-sep map (8, 8)
nvme2: Model INTEL_SSDPEKKW256G7 BaseSerial BTPY64430Q5B256D nscount=1
nvme2: Request 64/32 queues, Returns 8/8 queues, rw-sep map (8, 8)
nvme3: Model TOSHIBA-RD400 BaseSerial Z6TS10AUTPEV nscount=1
nvme3: Request 64/32 queues, Returns 7/7 queues, rw-sep map (7, 7)
nvme4: Model WDC_WDS256G1X0C-00ENX0 BaseSerial 170369420988 nscount=1
nvme4: Request 64/32 queues, Returns 16/16 queues, rw-sep map (16, 16)
nvme5: Model BPX BaseSerial 8B7107720F0823024374 nscount=1
nvme5: Request 64/32 queues, Returns 7/7 queues, rw-sep map (7, 7)
nvme6: Model PLEXTOR_PX-256M8PeG BaseSerial P02652102851 nscount=1
nvme6: Request 64/32 queues, Returns 16/16 queues, rw-sep map (16, 16)
Eventually we'll start to see chipsets that implement closer to the queue limit in the spec, at which point we can theoretically assign a queue pair to every active user thread using the storage. But for now I would be happy if chipsets just gave us enough queues to implement two per cpu thread (for priority separation).
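To put numbers on the shortfall, here is a minimal sketch of sharing a small set of queues across cpu threads. The round-robin mapping is only an illustration (it is not claimed to be what any particular driver does), and the 32-thread machine is a hypothetical; the queue counts 8 and 31 are taken from the device output quoted in this thread.

```python
def map_cpus_to_queues(ncpus, nqueues):
    """Return {cpu: queue}, sharing queues round-robin when there are too few."""
    if nqueues < 1:
        raise ValueError("need at least one queue")
    return {cpu: cpu % nqueues for cpu in range(ncpus)}


def max_sharing(ncpus, nqueues):
    # Number of cpu threads contending for the busiest queue (ceiling division).
    return -(-ncpus // nqueues)


def queues_needed(ncpus, per_cpu=2):
    # Two per cpu thread would allow priority separation.
    return ncpus * per_cpu


ncpus = 32  # hypothetical machine
for nqueues in (8, 31):
    print(f"{nqueues} queues: up to {max_sharing(ncpus, nqueues)} cpu threads per queue")
print(f"wanted for two per thread: {queues_needed(ncpus)}")
```

Any time more than one cpu thread shares a queue, the lockless per-cpu operation described earlier is lost for that queue.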
Also, Intel NVMe SSDs are *NOTORIOUSLY* bad in multi-queue configurations. Performance is far poorer than other vendors placed in the same configuration. I think this is rather ironic, actually. Intel markets low latency, but their chipsets can't handle it in the real-life configurations that NVMe was designed for.
-Matt
These are not really consumer products. Basically what you get out of an Optane drive is more durability (hence 10DWPD instead of 0.3DWPD @ 5 year warranty), and low latencies at low queue depths (10uS @ QD1 instead of 30uS+ @ QD1 for a NAND drive, random read).
But that's it. Everything else about Optane is non-competitive with NAND, at least so far. The price is ridiculous, the throughput at higher queue depths isn't really all that impressive.
No consumer is going to notice the lower latencies at low queue depths for the types of activities Intel advertises the product for (such as gaming), because all of those activities involve bulk reading and writing which NAND does very well, and most involve a certain degree of sequential reading or writing which modern NAND drives (such as the Samsungs) optimize very well. At higher queue depths the Intel advantage goes away entirely, so it wouldn't move the needle even for concurrent random server workloads.
Consumers for the most part never hit the actual durability limits of a NAND drive. For one, even with the lower durability the NAND drive is typically going to be double or triple the capacity of the Optane drive at the same price point, and for two, consumer use cases do not usually do 10 full drive writes per day over the life of the device or anything even close to that.
Basically, like the idiotic Optane 'disk cache' Intel tried to hawk last year, this drive is a pretty bad fit as a consumer device. In this offering Intel at least put the proper durability that Optane is *supposed* to have in the specs. Around 8900TB... nothing to sneeze at when most NAND drives have durabilities in the 200-400TB range. There is something to be said for that, even without real-life integrity/retention data available yet. But... it's still just not a consumer-oriented device.
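The durability figures hang together arithmetically. A rough sketch, assuming the usual relation (total bytes written = capacity x drive writes per day x days of warranty); the 50GB/day consumer write rate is purely a hypothetical for illustration, and the function names are made up.

```python
DAYS_PER_YEAR = 365


def tbw(capacity_tb, dwpd, warranty_years):
    # Total terabytes written over the warranty at the rated drive-writes-per-day.
    return capacity_tb * dwpd * DAYS_PER_YEAR * warranty_years


def implied_capacity_tb(rated_tbw, dwpd, warranty_years):
    # Invert the relation: which capacity makes the rated endurance come out?
    return rated_tbw / (dwpd * DAYS_PER_YEAR * warranty_years)


def years_to_exhaust(rated_tbw, written_gb_per_day):
    # How long a given daily write volume takes to use up the rated endurance.
    return rated_tbw * 1000 / written_gb_per_day / DAYS_PER_YEAR


# ~8900TB at 10DWPD over 5 years implies a drive a bit under 0.5TB.
print(round(implied_capacity_tb(8900, 10, 5), 3))
# A 200TB NAND rating at a hypothetical 50GB/day of writes.
print(round(years_to_exhaust(200, 50), 1))
```

Even the low end of the NAND range outlasts the warranty under that assumed consumer write rate, which is why the extra durability rarely matters outside of heavy write workloads.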
-Matt
If I understand this correctly, the kernel is being relinked and rewritten to the boot partition. That's instant fail in my book.... at least for us, the boot partition is sacrosanct. We do *NOT* write to it except when specifically upgrading a system. We do not do ad-hoc or automated writes to it because years of experience have shown that most corrupted boots (aka machine -> non-working) are due to unexpected events occurring while a filesystem is being written to.
The rename trick is not a solution. There's the 'ideal' atomic, and then there is the reality: storage devices can fail in many different ways even while writing a particular sector, in ways that are unrelated to that sector.
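For anyone who hasn't seen it, the rename trick looks roughly like the sketch below (a generic illustration on a POSIX-style filesystem, not OpenBSD's actual relinking code; the helper name is made up). It only provides atomicity at the filesystem level, which is the 'ideal' case, and does nothing about a device misbehaving underneath it.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Write data to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the new contents to the device first
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

A careful implementation would also fsync the containing directory after the rename. None of that helps if the device corrupts unrelated sectors while the write is in progress.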
So, honestly, I think OpenBSD is making a huge mistake here. I can see randomization at load-time, but relinking and rewriting the kernel binary on every boot? No. Bad bad bad idea.
ASLR or equivalent is close to useless anyway. Malware has found ways around it, and it makes debugging and bug reproducibility difficult (which arguably is more important... that bugs get found and fixed, not simply detected). It also tends to fragment memory which can cause serious problems for long-running systems. And the vast majority of systems will simply restart the service anyway. They might log the seg-fault from the malware, but maybe 0.001% of system owners actually look at those logs.
-Matt
Not 100% sure but I think this particular speedup was due to an issue with non-temporal writes to memory. Such instructions are used in heavily optimized game code but not generally used in critical paths elsewhere. They are also known to be highly temperamental instructions even across Intel cpus. The Ryzen box was synchronizing the memory writes to all cores which imploded some of the heavily optimized algorithms.
So far my tests with a 1700X show Ryzen to be an excellent performance cpu; it holds up well against nearly all of Intel's offerings. It does still run a bit hotter than Intel in my tests but the power consumption is significantly better than past AMD cpus. It's a lot closer to Intel now.
More importantly, Intel's FAB advantage is dissipating fairly quickly as other fabs catch up. The combination of a modern cpu design and competitive third party fabs puts AMD in a good position to compete from this point forwards.
As AMD has shown just in the past few days, Ryzen can definitely be competitive and even more so as game devs begin to make Ryzen-optimized builds available.
-Matt
I've been running an openvpn link from my home to our colo for years. I also have it set up on all my devices so I can use it while traveling. Some of our DFly devs also use it when they are traveling. Here's my cumulative wisdom on the matter:
Generally speaking it works quite well. I use a medium-numbered port but I also have a server running on port 443 because the many weird networks one runs through when traveling often block most ports, but usually leave the https port open.
* Use UDP for the transport when running openvpn over a broadband link. This provides the most consistent experience.
* Use TCP for the transport for connections from mobile devices. This provides the most consistent experience. There are several reasons for this, not the least of which being that the telco infrastructure seems to devalue UDP by a lot versus other traffic. TCP is also a lot easier to run on the server-side if you potentially have many devices connecting in, because you can run one server instance.
* Configure a smaller mss, I use 1300, so the encapsulation doesn't get fragmented by the transport. This is very important.
* Configure a relatively frequent keepalive in openvpn over a WAN link (I use 1sec/10sec), but a less frequent one over mobile (I use 20sec/120sec). This is particularly important on mobile because cell tower switches can cause long disruptions. You don't want to drop the VPN link in such circumstances if you can help it. DO NOT DISABLE THE KEEPALIVE. Always have an openvpn keepalive setup, particularly over TCP, because the TCP connection backoff can prevent your sessions from recovering or cause them to take a long time to recover if one or the other direction is not actively sending data (such as with most web connections, downloads, streaming, etc).
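The settings in the list above can be sketched as a small helper that emits config lines for each kind of link. This is one plausible mapping, not the actual configs in use: the proto, mssfix, and keepalive directives are standard OpenVPN options, but expressing the 1300 mss as 'mssfix 1300' and the keepalive pairs as 'keepalive <interval> <timeout>' is an assumption, and the profile names are made up.

```python
# Hypothetical profiles built from the numbers in the list above.
PROFILES = {
    "broadband": {"proto": "udp", "keepalive": (1, 10)},
    "mobile": {"proto": "tcp", "keepalive": (20, 120)},
}


def openvpn_options(link):
    """Return OpenVPN config lines for a 'broadband' or 'mobile' link."""
    profile = PROFILES[link]
    interval, timeout = profile["keepalive"]
    return [
        f"proto {profile['proto']}",
        "mssfix 1300",  # assumed way of applying the smaller mss
        f"keepalive {interval} {timeout}",  # never omitted, per the advice above
    ]


for link in PROFILES:
    print(f"# {link}")
    print("\n".join(openvpn_options(link)))
```

Check the directives against the OpenVPN manual for the version in use before copying any of this into a real config.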
I personally like 'OpenVPN Connect' on iOS (which I use to connect to our project colo). And of course I run openvpn on all the DragonFly boxes including my laptop.
--
Reliability of the VPN depends entirely on the path between your location and the VPN server. The packet must travel this path in addition to the path from the VPN server to the nominal destination, and even in the best of circumstances it will double the chances of something going wrong.
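The 'double the chances' remark falls out of simple probability if the two path segments are assumed to fail independently. A tiny sketch; the 1% per-segment failure rate is a made-up number purely for illustration.

```python
def failure_probability(segment_failure_probs):
    """Chance that at least one segment of the path is down."""
    all_ok = 1.0
    for p in segment_failure_probs:
        all_ok *= 1.0 - p
    return 1.0 - all_ok


p = 0.01  # hypothetical chance that any one segment is having problems
direct = failure_probability([p])
via_vpn = failure_probability([p, p])  # you -> VPN server, VPN server -> destination
print(f"direct={direct:.4f} via_vpn={via_vpn:.4f}")
```

For small per-segment probabilities the combined figure is very nearly twice the direct one.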
I've had a number of outages at home where my cable link is still operational but the cable company's path to the VPN server is having problems. Also, recovery times are longer because not only does the dead network have to revive, but the openvpn setup has to reconnect and renegotiate.
--
Commercial services are going to be hit or miss. VPN'ing your broadband link might be problematic and you have no real visibility into what the commercial service is doing with your data. That said, they are probably going to be a lot better than trusting your data to the telco and wifi hot-spots you connect from when you are mobile.
Netflix and other video streaming providers will often block-out commercial VPN IPs from the service. Generally speaking, using a commercial service for high-bandwidth connections is really hit-or-miss. You are using their bandwidth as well as your own.
When using a VPN, you are bypassing any special deals your broadband provider has made with the likes of YouTube, Netflix, etc. Remember that if the cell bandwidth is supposed to be free, because it won't be over the VPN.
--
In terms of security, it's a mixed bag. The VPN will secure your traffic from your immediate ISP/Telco (aka Comcast, AT&T), and that's actually very important. However, you are not anonymous and once your traffic reaches the egress point it's up for grabs by any network it flows through and, in particular, the target web page or whatever might be doing its own data collection.
But the telco data collection is MUCH more valuable to third parties than target data collection, and the VPN link at least protects you from that.
The VPN will not do a whole lot for your internal network security. If someone bre
What did it in should be obvious... one security exploit after another, non-stop, for over 8 years. HTML5 might have been the final nail in the coffin but Flash really did itself in.
When Flash was originally conceived by Macromedia very little thought went into security, because at the time security wasn't a big issue (the Internet was still fairly small, compared to today, and hackers had not yet really ramped up on a large scale). The entire codebase was inherently insecure and trusting of the flash handed to it.
In all that time, ever since that first flash product went out the door, right on up to today, nobody did more than basic hand-waving around the security problems. I'm sure they will claim that they tried... but no... they really didn't.
In the end, people finally got tired of the endless stream of security exploits.
-Matt
Dissecting the test output:
11737/s avg= 85.20uS bw=48.07 MB/s lo=66.22uS, hi=139.77uS stddev=7.50uS
That means the average latency is 85uS (averaged over all reads), the lowest latency measured was 66uS and the highest was 140uS. Another important metric is the standard deviation... that is, how 'tight' access times are around that average latency of 85uS. In this case, a standard deviation of 7.5uS is very good.
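The fields on that line are also consistent with each other, which is a handy sanity check. A small sketch that parses the line and cross-checks it; the 4096-byte block size comes from the randread command used for this run, the regular expression is written against the output format as pasted here, and the function name is made up.

```python
import re

LINE = "11737/s avg= 85.20uS bw=48.07 MB/s lo=66.22uS, hi=139.77uS stddev=7.50uS"

PATTERN = re.compile(
    r"(?P<iops>\d+)/s\s+avg=\s*(?P<avg>[\d.]+)uS\s+bw=\s*(?P<bw>[\d.]+)\s*MB/s\s+"
    r"lo=\s*(?P<lo>[\d.]+)uS,\s+hi=\s*(?P<hi>[\d.]+)uS\s+stddev=\s*(?P<stddev>-?[\d.]+)uS"
)


def parse_randread(line):
    """Turn one randread result line into a dict of floats."""
    m = PATTERN.match(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    return {key: float(value) for key, value in m.groupdict().items()}


r = parse_randread(LINE)
# At QD1 each read completes before the next is issued, so reads/sec ~= 1 / average latency.
print(round(1e6 / r["avg"]), "reads/sec expected from the average latency")
# Bandwidth is reads/sec times the 4096-byte block size (in units of 10^6 bytes).
print(round(r["iops"] * 4096 / 1e6, 2), "MB/s expected from the read rate")
```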
Comparing this to the Optane: what Intel has stated is that the average latency over all reads for Optane NVMe will be around 10uS. They also stated that the standard deviation would be much tighter. So that is the comparison.
But here's the real problem... you ask whether Optane will beat a PCIe SSD as a HDD cache in actual real-world desktop circumstances. I will add 'at the same price point'. The answer to that is going to be 'no'. The reason is that you can buy 4x to 8x the amount of NAND NVMe-based storage as you can Optane NVMe storage for the same price.
So instead of having a 32GB Optane cache, you could have a 128GB-256GB NAND SSD cache for the same price. That *completely* trumps Optane, no matter how low Optane's latency is, for this use case.
-Matt
Insulator breakdowns on circuit boards happen less often these days but they are still prevalent in Electrolytic caps and anything with windings (transformers, inductors, DC motors, etc), though it can take 20-50 years to happen and depends on conditions. And the failure mode depends too.
Generally speaking, any component with an insulator which is getting beat up is subject to the issue.
Circuit boards got a lot better as vendors switched to solid state caps. Electrolytics tend to dry out and little arc-throughs punch holes in the insulator over time (running them at less than half their rated voltage goes a long ways to lengthening their lives, which is why you usually see voltage ratings much higher than the voltages that are actually run through them).
The insulating coatings in wires used for windings have gotten better. Typically shorts develop over time and change the value of the inductance (or voltage ratio for a transformer), and other parameters until it gets to the point where it is so out of spec it stops doing its function properly. DC motors will get weaker, etc etc.
-Matt
Just so happens I have an Intel 750 in the pile, here's the issue that the Linux NVMe code had to work around:
nvme3: mem 0xc7310000-0xc7313fff irq 40 at device 0.0 on pci4
nvme3: mapped 32 MSIX IRQs
nvme3: NVME Version 1.0 maxqe=4096 caps=0000002028010fff
nvme3: Model INTEL_SSDPEDMW400G4 BaseSerial CVCQ535100LC400AGN nscount=1
nvme3: Request 64/32 queues, Returns 31/31 queues, rw-sep map (31, 31)
nvme3: Interrupt Coalesce: 100uS / 4 qentries
nvme3: Disk nvme3 ns=1 blksize=512 lbacnt=781422768 cap=372GB serno=CVCQ535100LC400AGN-1
If I run a randread test on uncompressed data using block sizes 512... 131072, you can see the glitch that occurs at 65536 bytes. I will use a deep queue (128 threads, around QD4 per HW queue but considered to be QD128 globally), so this is the absolute limit of the device's performance. Look at what happens when the block size transitions from 32768 to 65536. That's the firmware screwup that the Linux folks worked around. No other NVME vendor has this issue:
xeon126# randread /dev/nvme3s1b 512 100 128
device /dev/nvme3s1b bufsize 512 limit 16.000GB nprocs 128
487912/s avg=262.34uS bw=249.81MB/s lo=60.69uS, hi=2693.38uS stddev=101.09uS
488698/s avg=261.92uS bw=250.14MB/s lo=44.12uS, hi=2693.58uS stddev=101.79uS
489023/s avg=261.75uS bw=250.37MB/s lo=54.44uS, hi=2629.42uS stddev=98.31uS
^C
xeon126# randread /dev/nvme3s1b 1024 100 128
device /dev/nvme3s1b bufsize 1024 limit 16.000GB nprocs 128
485963/s avg=263.39uS bw=497.62MB/s lo=45.28uS, hi=2593.95uS stddev=91.90uS
486353/s avg=263.18uS bw=497.98MB/s lo=60.05uS, hi=1268.07uS stddev=89.05uS
486312/s avg=263.21uS bw=497.97MB/s lo=62.99uS, hi=1131.01uS stddev=89.04uS
^C
xeon126# randread /dev/nvme3s1b 2048 100 128
device /dev/nvme3s1b bufsize 2048 limit 16.000GB nprocs 128
459915/s avg=278.31uS bw=941.89MB/s lo=61.83uS, hi=1244.34uS stddev=95.07uS
459681/s avg=278.45uS bw=941.33MB/s lo=68.47uS, hi=2890.23uS stddev=99.12uS
458907/s avg=278.92uS bw=939.81MB/s lo=67.12uS, hi=2838.20uS stddev=110.08uS
^C
xeon126# randread /dev/nvme3s1b 4096 100 128
device /dev/nvme3s1b bufsize 4096 limit 16.000GB nprocs 128
442539/s avg=289.24uS bw=1812.62MB/s lo=75.33uS, hi=2985.67uS stddev=154.67uS
444166/s avg=288.18uS bw=1819.13MB/s lo=76.80uS, hi=2618.38uS stddev=145.94uS
443966/s avg=288.31uS bw=1818.44MB/s lo=73.81uS, hi=2854.27uS stddev=146.88uS
^C
xeon126# randread /dev/nvme3s1b 8192 100 128
device /dev/nvme3s1b bufsize 8192 limit 16.000GB nprocs 128
248658/s avg=514.76uS bw=2036.99MB/s lo=81.98uS, hi=3809.30uS stddev=321.11uS
249693/s avg=512.63uS bw=2045.32MB/s lo=84.38uS, hi=3278.75uS stddev=317.38uS
247367/s avg=517.45uS bw=2026.38MB/s lo=86.12uS, hi=3032.98uS stddev=323.87uS
^C
xeon126# randread /dev/nvme3s1b 16384 100 128
device /dev/nvme3s1b bufsize 16384 limit 16.000GB nprocs 128
124276/s avg=1029.97uS bw=2036.11MB/s lo=115.63uS, hi=3886.27uS stddev=558.13uS
124526/s avg=1027.90uS bw=2040.07MB/s lo=118.72uS, hi=3894.09uS stddev=574.04uS
125651/s avg=1018.69uS bw=2058.63MB/s lo=109.03uS, hi=3843.91uS stddev=550.71uS
^C
xeon126# randread /dev/nvme3s1b 32768 100 128
device /dev/nvme3s1b bufsize 32768 limit 16.000GB nprocs 128
62540/s avg=2046.68uS bw=2049.30MB/s lo=137.03uS, hi=6263.58uS stddev=1148.11uS
63146/s avg=2027.05uS bw=2068.84MB/s lo=157.29uS, hi=5875.07uS stddev=1134.63uS
62563/s avg=2045.95uS bw=2050.01MB/s lo=147.76uS, hi=6244.51uS stddev=1285.00uS
^C
xeon126# randread /dev/nvme3s1b 65536 100 128
device /dev/nvme3s1b bufsize 65536 limit 16.000GB nprocs 128
4431/s avg=28887.12uS bw=290.39MB/s lo=195.41uS, hi=59137.97uS stddev=-34838.70uS
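To make the cliff obvious, here is a short sketch that recomputes bandwidth from the first reads/sec sample of each run above and flags any block size where bandwidth falls to less than half of the previous size's. The 50% threshold and the function names are arbitrary choices for illustration.

```python
# (block size in bytes, first reported reads/sec sample from each run above)
SAMPLES = [
    (512, 487912), (1024, 485963), (2048, 459915), (4096, 442539),
    (8192, 248658), (16384, 124276), (32768, 62540), (65536, 4431),
]


def bandwidth_mb_s(block_size, reads_per_sec):
    # Same units as the tool's bw= field (10^6 bytes per second).
    return block_size * reads_per_sec / 1e6


def find_glitches(samples, drop_ratio=0.5):
    """Return block sizes whose bandwidth fell below drop_ratio x the previous size's."""
    glitches = []
    for (prev_bs, prev_rate), (bs, rate) in zip(samples, samples[1:]):
        if bandwidth_mb_s(bs, rate) < drop_ratio * bandwidth_mb_s(prev_bs, prev_rate):
            glitches.append(bs)
    return glitches


for bs, rate in SAMPLES:
    print(f"{bs:6d} bytes: {bandwidth_mb_s(bs, rate):8.2f} MB/s")
print("glitch at:", find_glitches(SAMPLES))
```

Bandwidth climbs and then plateaus around 2 GBytes/sec from 8192 through 32768 bytes, which is what a healthy device does; the collapse at 65536 is the only point flagged.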
Certainly faster writing. Read speed is about the same for the EVO (on real blocks of uncompressible data, not the imaginary compressible or zeroed blocks that they use to report their 'maximum').
XPoint over NVMe has only two metrics that people need to know about to understand how it fits into the ethos: (1) More durability, up to 33,000 rewrites apparently (many people have had to calculate it, Intel refuses to say outright what it is because it is so much lower than what they originally said it would be). (2) Lower latency.
So, for example, NVMe devices using Intel's XPoint have an advertised latency of around 10uS. That is, you submit a READ request, and 10uS later you have the data in hand. The 960 EVO, which I have one around here somewhere... ah, there it is... the 960 EVO has a read latency of around 87uS.
This is called the QD1 latency. It does not translate to the full bandwidth of the device as you can queue multiple commands to the device and pipeline the responses. In fact, a normal filesystem sequential read always queues read-ahead I/O so even an open/read*/close sequence generally operates at around QD4 (4 read commands in progress at once) and not QD1.
Here's the 960 EVO and some randread tests on it at QD1 and QD4.
nvme1: mem 0xc7500000-0xc7503fff irq 32 at device 0.0 on pci2
nvme1: mapped 8 MSIX IRQs
nvme1: NVME Version 1.2 maxqe=16384 caps=00f000203c033fff
nvme1: Model Samsung_SSD_960_EVO_250GB BaseSerial S3ESNX0J219064Y nscount=1
nvme1: Request 64/32 queues, Returns 8/8 queues, rw-sep map (8, 8)
nvme1: Interrupt Coalesce: 100uS / 4 qentries
nvme1: Disk nvme1 ns=1 blksize=512 lbacnt=488397168 cap=232GB serno=S3ESNX0J219064Y-1
(/dev/nvme1s1b is a partition filled with uncompressible data)
xeon126# randread /dev/nvme1s1b 4096 100 1
device /dev/nvme1s1b bufsize 4096 limit 16.000GB nprocs 1
11737/s avg= 85.20uS bw=48.07 MB/s lo=66.22uS, hi=139.77uS stddev=7.50uS
11458/s avg= 87.28uS bw=46.92 MB/s lo=68.50uS, hi=154.20uS stddev=7.01uS
11469/s avg= 87.19uS bw=46.98 MB/s lo=69.97uS, hi=151.97uS stddev=6.95uS
11477/s avg= 87.13uS bw=47.01 MB/s lo=69.31uS, hi=158.03uS stddev=7.03uS
And here is QD4 (really QD1 x 4 threads on 4 HW queues):
xeon126# randread /dev/nvme1s1b 4096 100 4
device /dev/nvme1s1b bufsize 4096 limit 16.000GB nprocs 4
44084/s avg= 90.74uS bw=180.57MB/s lo=65.17uS, hi=237.92uS stddev=16.94uS
44205/s avg= 90.49uS bw=181.05MB/s lo=65.38uS, hi=222.21uS stddev=16.56uS
44202/s avg= 90.49uS bw=181.04MB/s lo=65.19uS, hi=221.48uS stddev=16.72uS
44131/s avg= 90.64uS bw=180.75MB/s lo=64.44uS, hi=245.91uS stddev=16.81uS
44210/s avg= 90.48uS bw=181.08MB/s lo=63.73uS, hi=232.05uS stddev=16.74uS
So, as you can see, at QD1 the 960 EVO is doing around 11.4K transactions/sec and at QD4 it is doing around 44K transactions/sec. If I use a larger block size you can see the bandwidth lift off:
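The relationship between those numbers is just queue depth divided by latency (Little's law). A quick sketch using the first sample from each of the two runs above; the function name is made up.

```python
def transactions_per_sec(queue_depth, avg_latency_us):
    # With queue_depth commands always in flight, each completing in
    # avg_latency_us on average, throughput is depth / latency.
    return queue_depth * 1e6 / avg_latency_us


qd1 = transactions_per_sec(1, 85.20)  # measured: 11737/s
qd4 = transactions_per_sec(4, 90.74)  # measured: 44084/s
print(round(qd1), round(qd4))
```

Because the average latency barely moves between QD1 and QD4 (85-87uS versus about 90uS), throughput scales almost linearly with queue depth at this block size.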
xeon126# randread /dev/nvme1s1b 32768 100 4
device /dev/nvme1s1b bufsize 32768 limit 16.000GB nprocs 4
19997/s avg=200.03uS bw=655.26MB/s lo=125.02uS, hi=503.26uS stddev=55.24uS
20090/s avg=199.10uS bw=658.23MB/s lo=124.62uS, hi=522.04uS stddev=54.83uS
20034/s avg=199.66uS bw=656.47MB/s lo=123.63uS, hi=495.74uS stddev=55.59uS
20008/s avg=199.92uS bw=655.62MB/s lo=123.50uS, hi=500.24uS stddev=55.92uS
20034/s avg=199.66uS bw=656.47MB/s lo=125.17uS, hi=488.30uS stddev=55.02uS
20000/s avg=200.00uS bw=655.35MB/s lo=123.19uS, hi=504.18uS stddev=55.98uS
And if I use a deeper queue I can max-out the bandwidth. On this particular device, random blocks of uncompressible data at 32KB limits out at around 1 GByte/sec. I'll also show 64KB and 128KB:
xeon126# randread /dev/nvme1s1b 32768 100 64 /dev/nvme1s1
device
And who the hell do you think I am mister Anonymous Coward?
So, as I thought, you don't understand either that commit or the commit later on that simplified it (159b67d7).
It's not a stripe-size limitation per se, it's just a limitation on the maximum physical transfer size per I/O request, which for 99.9% of the NVMe devices out in the wild will be >= 131072 bytes and completely irrelevant for all filesystem I/O and even most softRAID I/O.
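In other words, a limit like this only matters when a single request is bigger than it. A generic sketch of how a block layer might split an oversized request (an illustration only, not the code from the commit being discussed; the function name is made up), using the 131072-byte figure from above:

```python
def split_request(offset, length, max_transfer):
    """Break one I/O into (offset, length) pieces no larger than max_transfer."""
    if max_transfer <= 0:
        raise ValueError("max_transfer must be positive")
    pieces = []
    while length > 0:
        n = min(length, max_transfer)
        pieces.append((offset, n))
        offset += n
        length -= n
    return pieces


MAX_TRANSFER = 131072
print(split_request(0, 16384, MAX_TRANSFER))   # typical filesystem-sized I/O: untouched
print(split_request(0, 262144, MAX_TRANSFER))  # oversized request: two pieces
```

Requests at or below the limit pass through as a single piece, which is why the limit is invisible to ordinary filesystem I/O.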
More to the point, that particular commit does not apply to the 600P at all. It applies to several older Intel datacenter SSDs as well as the 750 series and it exists because Intel really screwed up the firmware on those devices and put crazy stupid low limitations on physical transfer size. Then they carefully designed tests that didn't hit those limitations to sell the devices.
The 750, for example, loses a huge amount of performance with a block size >= 65536 bytes. Intel maybe didn't advertise the mistake, but that is a limitation that doesn't exist in the 600P nor does it exist on ANY OTHER NVME SSD IN EXISTENCE. Only a complete idiot creates a NVMe device which can't handle block transfers of 65536 or 131072 bytes without losing massive performance. Intel = 65536 bytes.
This was a well known bug in these particular models.
In any case, even for these models, this particular quirk has no effect on block I/O tests for block sizes below 65536 bytes. And, as I mentioned already, NO OTHER NVME VENDOR has such absurdly low limits or such massively disconnected performance metrics when you exceed them. And even Intel fixed the issue on the 600P.
This just points to the idiocy inside Intel. And it shows your stupidity as well, believing that a little quirk like this somehow affects the entire NVMe space (or even the entire Intel NVMe space), which it doesn't. These sorts of quirks exist for all manner of hardware, not just NVMe, to work around poor, buggy implementations.
-Matt
And, of course, any Linux or BSD operating system will use all available memory for cache data from storage anyway. I guess Windows needs a little more help to do that.
This certainly shows up in, for example, Chrome startup times. It takes around 4 seconds from a hard drive, uncached, 1 second from a SSD, 1 second from a NVMe drive, and presumably 1 second from any other form of storage because chrome itself needs a bit of cpu time to initialize itself, not to mention the time it takes to load a tab (minimum 0.5 seconds).
So honestly once one transitions from the HDD to a SATA SSD, where the difference is noticeable, any further transitions (SATA SSD -> NAND NVME SSD -> XPOINT NVME SSD -> XPOINT DDRs) are not likely to be noticeable, even without a ram cache.
I think Intel's ENTIRE marketing effort revolves around Windows' slow startup times. Or more to the point, Windows tends to seek the storage device a lot while starting up which is *very* noticeable if you have a hard drive, but mostly irrelevant if you have any sort of SSD.
Since one can accomplish the same thing simply by purchasing a small SSD, I just don't see them being able to make a case for it being 'easier' as a disk caching substitute versus someone coming to the realization that their time and data are valuable enough to actually spend a bit more money on buying some native SSD storage in the first place.
The advent of the cloud is also making local mass storage less and less relevant. Here I'm not talking about those of us who insist on having our own local archives (mine is getting close to 4TB now, with another 4TB in two backup locations so... that's 12TB of storage for me). I'm talking about 'normal' people who are using cloud storage more and more often. They won't need Intel's ridiculous 'solution' either (not even mentioning the fact that a normal NAND NVME SSD to cache a HDD is a better fix for the solution they are marketing than their Optane junk).
-Matt
Motherboard vendors are just now, finally, starting to put M.2 connectors on the motherboard. Blame Intel for the slow rate of adoption. Intel came out with three different formats, all basically incompatible with each other, and created mass confusion.
But now, finally, mobo vendors are settling on a single PCIe-only M.2 format. Thank god. They are finally starting to put one or more M.2 slots and finally starting to put on U.2 connectors for larger NVMe SSDs. Having fewer SATA ports on the mobo is no longer a marketing issue. I've seen many more mobos recently with just 2-4 SATA ports.
-Matt
It would depend on the relative latency and other characteristics. XPoint is definitely not it, because XPoint can't handle unlimited writing. But in some future let's say we do have a non-volatile storage mechanism that has effectively unlimited durability, like ram, but which is significantly more dense, like XPoint.
In that situation I can see systems supporting a chunk of that sort of storage as if it were memory.
Latency matters greatly here for several reasons. First, I don't think XPoint is quite fast enough, at least not yet. The problem with any sort of high-latency storage being treated like memory at the HARDWARE level is that the latency creates massive stalls on the cpu. DRAM today causes huge many-clock stalls on a cpu. These stalls are transparent to the operating system, so the operating system cannot just switch to another thread or do other work during the stall. The stall effectively reduces the performance of the system. This is the #1 problem with treating any sort of storage technology as if it were memory.
The #2 problem is that memory is far easier to corrupt than storage (which requires a block transaction to write). I would never want to map my filesystem's entire block device directly into memory, for example. It's just too dangerous.
The solution that exists today is, of course, swap space. You simply configure your swap on an SSD. The latencies are obviously much higher than they would be for a HW XPoint style solution, around 50-100uS to take a page-fault requiring I/O from a NVMe SSD, for example.
The difference though is that the operating system knows that it is taking the page-fault and can switch to another runnable thread in the meantime, so the CPU is not stalled for 50-100uS. It's doing other work. Given enough pending work, the practical overhead of a page-fault in terms of lost CPU time is only around 2-4uS.
In an XPoint-like all-hardware solution, the CPU will stall on the miss. If the XPoint 'pagein' time is 1-2uS, then the all-hardware solution winds up only being twice as good as the swap space solution in terms of CPU cycles. Of course, the all-hardware solution will be far better in terms of latency (1-2uS versus 50-100uS).
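Putting the figures from the last few paragraphs side by side (a rough sketch that just takes the midpoint of each quoted range; nothing here is measured, and the names are made up):

```python
# Ranges quoted above, in microseconds per page miss.
SWAP_ON_NVME = {"cpu_lost_us": (2.0, 4.0), "latency_us": (50.0, 100.0)}
ALL_HARDWARE = {"cpu_lost_us": (1.0, 2.0), "latency_us": (1.0, 2.0)}


def midpoint(value_range):
    low, high = value_range
    return (low + high) / 2


def advantage(metric):
    """How many times better the all-hardware approach is for this metric."""
    return midpoint(SWAP_ON_NVME[metric]) / midpoint(ALL_HARDWARE[metric])


print("cpu time lost per miss:", advantage("cpu_lost_us"), "x better in hardware")
print("latency per miss:", advantage("latency_us"), "x better in hardware")
```

So the hardware approach buys a large improvement in latency but only about a factor of two in cpu cycles, because the swap path lets the cpu do other work while it waits.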
But to really work in this format the non-volatile memory needs to have a nearly unlimited write capability. XPoint does not. XPoint only has around 33,000 write cycles of durability per cell (and that's being generous). It needs to be half a million at a minimum and at least 10 million to *really* be useful.
-Matt
Maybe you should point me at the commitid you are referring to, then I can address your comment more directly. I can tell you straight out, even without seeing it, that you are probably misinterpreting it.
-Matt
Intel devices have quirks, but I think you are mixing apples and oranges here. All modern filesystems have used larger alignments for ages. The only real issue was that the original *DOS* partition table offset the base of the slice the main filesystem was put on by a weird multiple of 512 bytes which was not even 4K aligned.
This has not been an issue for years. It was fixed long ago on DOS systems and does not exist at all on EFI systems. Regardless of the operating system.
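For reference, the alignment check itself is trivial. The starting sectors below are not from this thread: 63 is the commonly cited legacy DOS starting sector and 2048 (1MiB) is the usual modern default, both used here as assumptions for illustration.

```python
SECTOR_BYTES = 512


def is_aligned(start_sector, alignment_bytes=4096, sector_bytes=SECTOR_BYTES):
    """True if a partition starting at start_sector begins on an alignment boundary."""
    return (start_sector * sector_bytes) % alignment_bytes == 0


for start in (63, 2048):
    offset = start * SECTOR_BYTES
    print(f"start sector {start}: byte offset {offset}, 4K aligned: {is_aligned(start)}")
```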
At the same time, all SSDs past the second generation became sophisticated enough that they really stopped caring about alignment for most practical use cases.
Where Intel does mess up depends on the device. In the 600P's case, the firmware is poorly designed in many respects. In other cases, such as with the 750, performance implodes with large block sizes (64KB or higher). This just makes the device less worthy, because frankly NO OTHER SSD VENDOR has these sorts of idiotic problems.
All of that said, insofar as operating systems go, these storage-level devices have no real visibility into, understanding of, or optimizations for one particular filesystem versus another. So for all practical situations, there is NO raw performance difference between Windows, MacOS, Linux, or any of the BSD's for these storage level devices. They are completely OS-agnostic and have always been completely OS-agnostic.
-Matt
Right. They are trying to market it as something cool and new, which would be great except for the fact that it isn't cool OR new. A person can already use ANY storage device to accelerate any OTHER storage device. There are dozens of 'drive accelerators' on the market and have been for years. So if a person really wanted to, they could trivially use a small NAND flash based NVMe SSD to do the same thing, and get better results because they'll have a lot more flash. A person could even use a normal SATA SSD for the same purpose.
What Intel is not telling people is that NOBODY WILL NOTICE the lower latency of their XPoint product. At (I am assuming for this product) 10uS the Intel XPoint NVMe is roughly 1/6 the latency of a Samsung NVMe device. Nobody is going to notice the difference between 10uS and 60uS. Even most *server* workloads wouldn't care. But I guarantee that people WILL notice the fact that the Intel device is caching much less data than they could be caching for the same money with a NAND-based NVMe SSD or even just a SATA SSD.
In other words, Intel's product is worthless.
-Matt
I think you are a little confused by Intel marketing speak. Actually, you are a lot confused.
-Matt
Smoke. Total and complete nonsense. Why would I want to buy their over-priced Optane junk versus a Samsung 951* or 960* NVMe drive? Far more storage for around $115-$130, 1.4 GBytes/sec consistent read performance, decent write performance, and decent durability.
P.S. the Intel 600P NVMe drive is also horrid, don't buy it.
http://apollo.backplane.com/DF...
-Matt
Intel's claims are rather exaggerated. Their claims have already been torn apart on numerous tech forums. At best we're talking only a ~3-5x reduction in QD1 latency and they intentionally omit vital information in the specs to force everyone to guess what the actual durability of the XPoint devices is. They say '12PB' of durability for the 375GB part but refuse to tell us how much overprovisioning they do. They say '30 drive writes per day' without telling us what the warranty will be.
In fact, over the last 6 months Intel has walked back their claims by orders of magnitude, to the point now where they don't even claim to be bandwidth competitive. They focus on low queue depths and play fast and loose with the stats they supply.
For example, their QOS guarantee is only 60uS 4KB (99.999%) random access latency and in the same breath they talk about being orders of magnitude faster than NAND NVMe devices. They fail to mention that, for example, the Samsung NVMe devices also typically run around ~60-70uS QD1 latencies. Then Intel mumbles about 10uS latencies but bandies about large factors of improvement over NAND NVMe devices, far larger than the 6:1 one gets simply assuming 10uS vs 60uS.
Then they go on to say that they will have a NVDIMM form for the device later this year, with much faster access times (since in the NVMe form factor access times are constricted by the PCIe bus and block I/O protocol). But with potentially only 33,000 rewrite cycles per cell to failure that's seriously problematic. (And that's the best guess, since Intel won't actually tell us what the cell durability is).
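This is the sort of calculation people have been forced to do themselves. A sketch using only the published '12PB', 375GB, and '30 drive writes per day' figures; it ignores overprovisioning and write amplification, both of which are unknown, so treat the results as rough guesses. The function names are made up.

```python
def full_drive_writes(endurance_bytes, capacity_bytes):
    # How many times the advertised capacity can be rewritten in total.
    return endurance_bytes / capacity_bytes


def implied_years(endurance_bytes, capacity_bytes, drive_writes_per_day):
    # How long the endurance lasts if the drive really is written at the rated pace.
    return full_drive_writes(endurance_bytes, capacity_bytes) / drive_writes_per_day / 365


ENDURANCE = 12e15  # '12PB'
CAPACITY = 375e9   # 375GB part

print(round(full_drive_writes(ENDURANCE, CAPACITY)), "full rewrites")
print(round(implied_years(ENDURANCE, CAPACITY, 30), 2), "years at 30 drive writes per day")
```

The rewrite count lands in the same ballpark as the ~33,000 cycle best guess, and the implied lifetime at the rated write pace comes out at only around three years.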
--
The price point is way too high for what XPoint in the NVMe format appears to actually be capable of doing. The metrics look impossible for a NVDIMM form later this year. Literally we are supposed to actually buy the thing to get actual performance metrics for it? I don't think so.
It's insane. This is probably the biggest marketing failure Intel has ever had. Don't they realize that nobody is being fooled by their crap specs?
-Matt
haha. notoriously that is. Damn laptop keyboard.
-Matt
Pulseaudio is nortiously linux-specific. We've had nothing but trouble trying to use it on BSD and switched to ALSA (which is a lot more reliable on BSDs) a year or two ago for that reason.
I guess that's the end of Firefox's portability. Most of our users use Chromium anyway because Firefox has been so unstable and crash-prone. Long live Chromium?
-Matt
Your problem was that you were using Kingston, Patriot, etc... all third-rate SSD vendors who use whatever flash chips happen to be cheapest. Crucial (aka Micron), Samsung, and a few others are first-line vendors.
SSDs can certainly fail, but it's kinda like PSUs... some vendors are first-line, most are not.
-Matt