He said basically "Having the information 'available' isn't really helpful because we have no way to get it."
This happens all the time; it's a normal part of the real world. You can't build a computer powerful enough to simulate the Universe within the Universe. The information may be lost to observers within the Universe, but it is never lost to the Universe itself. Whether "we" can retrieve the information has no bearing on the subject. A simple example: tell me exactly what happened 13.8 billion years ago. I want to know the exact position of every bit of information in the Universe, down to the Planck scale. You may not be able to answer that, but the Universe can.
You simply proposed a theoretical way to get it that can't be done either at this time, possibly ever.
I didn't offer "rewinding time" as a solution. The point is that if you can't rewind time and get the information back out, you've broken causality.
We already knew information wasn't destroyed; we just didn't have a mechanism explaining how it was preserved. If information could be destroyed, then causality wouldn't work. What do you think would happen in a universe where cause and effect didn't occur? No science, that's what.
My point is that if the "information is there" but we cannot retrieve it, it's the *same* as information being destroyed.
I can do just this with only 256 bits of entropy. It's called AES encryption. A future observer cannot unscramble the entropy back into its data form, even with all of the energy in the Universe, without knowing what those 256 bits were. This is not an issue. The real issue is the past version of the information not being recoverable when you run time backwards.
One of the big things about our Universe is causality; it is the single most important concept. One of its central points is that a given set of parameters produces a given outcome, and a given outcome has a specific set of parameters behind it. A traditional black hole broke that. It was impossible to work out the original parameters because all outcomes looked the same: the mass of the black hole increased and the information was lost.
Privilege has a weak link with intelligence. Supportive parental attention is your main metric for determining the intellectual success of a child. If society allowed poor families to not have to work all of the time, the parents could spend more time with their children and the gap would be closed. Of course this wouldn't help in the stereotypical welfare case where the parent(s) don't care and wouldn't spend time with their children given the chance.
Being able to reassemble it is not the point; the point is that you can rewind time and get the information back out. With the normal idea of a black hole, even if you could rewind time, you couldn't get the information back out.
What would you need to put 10Gb from multiple clients back to the net? Are you honestly expecting some 1Pb connection at Comcast somewhere?
Even if customers had infinitely fast Internet connections, there would still be a maximum usage. You could give all of your customers 10Gb/s of bandwidth and still never have congestion; all you need is a bit of statistics to find the peak bandwidth usage. As long as your peak bandwidth usage is less than 80% of your pipe, you're good.
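As a rough sketch of the "bit of statistics" involved: assume, purely for illustration, that each customer independently saturates their link with some small probability at peak time. The number of simultaneously busy customers is then binomial, and the pipe can be sized against a high percentile of that distribution. The customer count, busy probability, and percentile below are made-up values, not measurements from any ISP.

```python
from math import comb


def peak_active(customers: int, p_active: float, percentile: float) -> int:
    """Smallest k such that P(at most k customers active) >= percentile."""
    cumulative = 0.0
    for k in range(customers + 1):
        # Binomial probability that exactly k customers are active at once.
        cumulative += comb(customers, k) * p_active**k * (1 - p_active) ** (customers - k)
        if cumulative >= percentile:
            return k
    return customers


def required_pipe_gbps(customers, link_gbps, p_active, percentile=0.999, headroom=0.80):
    """Pipe size that keeps the chosen peak at or below `headroom` of capacity."""
    peak_gbps = peak_active(customers, p_active, percentile) * link_gbps
    return peak_gbps / headroom


if __name__ == "__main__":
    # Hypothetical: 500 customers on 10Gb/s links, each busy 1% of the time.
    print(required_pipe_gbps(500, 10, 0.01), "Gb/s needed vs", 500 * 10, "Gb/s sold")
```

Real traffic is burstier and more correlated than an independent-customer model, so an actual ISP would fit this to measured peaks rather than to an assumed probability.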
A few hundred yards from your home, your bandwidth is aggregated with that of all your neighbors and carried over a single fiber from there on, just like the data from and to your cable segment is carried over fiber to your neighborhood hub.
Google Fiber gives each customer their own lambda of bandwidth. Each "single fiber from there on" has 32 1.25Gb/1.25Gb lambdas, giving each customer their own 1.25Gb/1.25Gb of bandwidth for a grand total of 40Gb/s shared among 32 customers, each with 1Gb provisioned. Wait, that's 8Gb/s of extra bandwidth. Oh yeah, Google Fiber is undersubscribed.
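The arithmetic in that paragraph can be checked in a few lines. The figures are the ones quoted above; the function and variable names are only for illustration.

```python
def fiber_budget(lambdas: int, lambda_gbps: float, provisioned_gbps: float):
    """Return (capacity, provisioned, spare) in Gb/s for one shared fiber."""
    capacity = lambdas * lambda_gbps          # total carried by the fiber
    provisioned = lambdas * provisioned_gbps  # total sold to customers
    return capacity, provisioned, capacity - provisioned


if __name__ == "__main__":
    capacity, provisioned, spare = fiber_budget(32, 1.25, 1.0)
    # A ratio below 1.0 means the fiber is undersubscribed.
    print(capacity, provisioned, spare, provisioned / capacity)
```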
You are correct, at some point all the bandwidth is shared, the same way a company paying $1mil/month for 200Gb/s of bandwidth from your favorite backbone also shares bandwidth with the rest of the Internet.
I like to use the way my ISP defined "dedicated" to me: I should NEVER see congestion on their network or to their transit provider. If I do, I call them up and they'll fix it. I have only had to call twice. Once was because my ISP was under a DDoS and my pings were 20ms higher than normal. The other time was because demand for their new fiber Internet was much greater than expected, they maxed out their old core router much sooner than planned, and they needed to upgrade it. Their new router can handle terabits of bandwidth, plenty for our small city of 30k people.
Level 3, their transit provider, also has a "no congestion" policy. You should never see congestion within Level 3's network, and only rarely to their peers, though there are some exceptions because of peering disputes.
If I should never see congestion within my ISP or their transit provider, that's pretty much all I can expect. I don't need a point-to-point, fully connected graph of fiber to every person in the world to have "dedicated".
My ISP just went the route of making all accounts uncapped business accounts. $20 for 20/20, $35 for 70/70, $45 for 100/100, $100 for 250/250, $200 for 500/500, and $300 for 1Gb/1Gb. If you want an SLA with that, $3k, but if you feel you don't need an SLA, $300. Bandwidth is cheap, SLAs are not.
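To put numbers on "bandwidth is cheap, SLAs are not", here is a small sketch that works out the price per Mb/s for each tier listed above and the markup for the SLA. It assumes the prices are monthly and that the $3k SLA price applies to the 1Gb/1Gb tier, which is how the paragraph reads.

```python
# Tier speed in Mb/s (symmetric) -> price in dollars, as quoted above.
TIERS = {20: 20, 70: 35, 100: 45, 250: 100, 500: 200, 1000: 300}
SLA_PRICE_1G = 3000  # assumed to be the 1Gb/1Gb tier with an SLA


def dollars_per_mbps(tiers):
    """Price per Mb/s for each tier."""
    return {speed: price / speed for speed, price in tiers.items()}


def sla_multiplier(tiers, sla_price, speed=1000):
    """How many times more the SLA version costs than the plain tier."""
    return sla_price / tiers[speed]


if __name__ == "__main__":
    for speed, unit_price in dollars_per_mbps(TIERS).items():
        print(f"{speed} Mb/s: ${unit_price:.2f} per Mb/s")
    print("SLA multiplier:", sla_multiplier(TIERS, SLA_PRICE_1G))
```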
You also have to put up with random 30sec-1min downtime between 12am-2am a few times a month. If you don't need an SLA, you can save a lot of money and get the same quality service while the service is working.
100/100 dedicated fiber for $45/month, no cap, and I get my full speed 24/7 to nearly every datacenter in the world. I can reach all of Midwest USA in 7-20ms, East coast in 30ms, all of the South in 45ms, and West coast in 60ms. Less than 1ms of jitter to all of the USA and under 5ms of jitter to the entire world. I queued up several terabytes of downloads and let them run over peak hours: my average download rate was 99.5Mb/s ± 0.25Mb/s, and ping to my ISP stayed at a flat 1ms the entire time. 0 packets lost over the period of a week is the norm. Over the period of a month, I do get upwards of 10 packets lost, typically in a burst during the middle of the night on Sunday.
DOCSIS 3.1 requires node splits because of its much reduced distance. It also requires new amps, filters, and cables. On top of all of that, it requires redistributing frequency allocations because the block sizes have changed. It's about as simple as upgrading from 100Mb Copper Ethernet to 10Gb Copper Ethernet. They're both Ethernet. Drop in replacement, right?
I think you mean "volume" instead of "filesystem". All snapshots are relative to the pool. You can create a snapshot of several volumes at once in perfect sync because the snapshot is actually at the pool level, but only attached to the relative volume that you're looking at. When you have mounted volumes inside of volumes, not only will the data of the current volume be part of the snapshot, but all of the data in the mounted volumes.
BTRFS stores snapshots in the volume, ZFS stores them in the pool. If you snapshot a BTRFS sub-volume, you will only get that sub-volume and nothing else. You can make a script to loop through the mounted sub-volumes within a sub-volume, but there is no guarantee they will all be in sync. Even worse, if a sub-volume later gets unmounted and moved somewhere else, or the sub-volume is deleted in BTRFS, all data for that sub-volume is gone, including its snapshots. In ZFS, if you have a parent volume with mounted child volumes and you snapshot at the parent level, everything is part of that snapshot, including the contents of the child volumes.
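The sync problem described above can be shown with a toy model. This is not how either file system is implemented; `Pool`, `snapshot_pool`, and `looped_snapshot` are hypothetical names that only illustrate the difference between capturing every volume in one step and capturing them one at a time while a writer is active. (In real ZFS, the multi-volume case corresponds to a recursive snapshot, `zfs snapshot -r`.)

```python
import copy


class Pool:
    """Toy stand-in for a storage pool holding several named volumes."""

    def __init__(self, volumes):
        self.volumes = {name: dict(files) for name, files in volumes.items()}

    def write(self, volume, path, data):
        self.volumes[volume][path] = data

    def snapshot_pool(self):
        # One step: every volume is captured at the same instant.
        return copy.deepcopy(self.volumes)

    def snapshot_volume(self, volume):
        # Captures only the named volume, nothing else.
        return dict(self.volumes[volume])


def looped_snapshot(pool, between=None):
    """Snapshot volumes one at a time; `between` simulates a concurrent writer."""
    result = {}
    for name in list(pool.volumes):
        result[name] = pool.snapshot_volume(name)
        if between:
            between(pool)  # writes can land between two per-volume snapshots
    return result


if __name__ == "__main__":
    def writer(p):
        p.write("parent", "order.txt", "v2")
        p.write("child", "ledger.txt", "v2")

    pool = Pool({"parent": {"order.txt": "v1"}, "child": {"ledger.txt": "v1"}})
    print("pool-level:", pool.snapshot_pool())
    print("looped:    ", looped_snapshot(pool, between=writer))
```

In the looped case the parent is captured at v1 and the child at v2, a state that never existed on disk at any single moment.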
Even easier than that. Modern edge network devices (modems, ONTs, etc.) for residential broadband can be limited to the IPs assigned to them by the DHCP server. They already have DHCP server reflection going on; all the modem does is monitor the DHCP traffic and update an internal list.
The only annoyance I am aware of is that if they need to restart their internal network, your DHCP lease may be invalidated and suddenly you no longer have Internet access until you clear your lease and negotiate a new one. It has happened to me a few times. The ISP could get around this by remotely cycling the Ethernet port off and then on, since most computers will renegotiate DHCP after a physical link loss.
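A minimal sketch of that idea, assuming the edge device keeps a per-port table of leased addresses and drops any packet whose source IP is not in it. The class and method names are hypothetical, and the addresses come from the reserved documentation ranges; no real modem firmware is being described.

```python
import ipaddress


class SourceGuard:
    """Toy model of an edge device that only forwards leased source IPs."""

    def __init__(self):
        self.leases = {}  # port name -> set of leased addresses

    def observe_dhcp_ack(self, port, ip):
        # Called when the device sees the DHCP server hand out an address.
        self.leases.setdefault(port, set()).add(ipaddress.ip_address(ip))

    def reset(self):
        # Models the ISP restarting its network: the lease table is lost.
        self.leases.clear()

    def allow(self, port, src_ip):
        # Forward only packets whose source address was leased on this port.
        return ipaddress.ip_address(src_ip) in self.leases.get(port, set())


if __name__ == "__main__":
    guard = SourceGuard()
    guard.observe_dhcp_ack("port1", "203.0.113.7")
    print(guard.allow("port1", "203.0.113.7"))   # leased address
    print(guard.allow("port1", "198.51.100.9"))  # spoofed source
    guard.reset()
    print(guard.allow("port1", "203.0.113.7"))   # blocked until DHCP is renegotiated
```

The last call shows the annoyance mentioned above: after a reset, a customer with a still-valid lease on their side is blocked until the device sees a fresh DHCP exchange.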
Their definition of "crash" is different from yours. You assume "crash" means the Universe self-destructed. They just assume the writes were interrupted, like a power failure or a kernel lockup, not your hard drives dying.
"MIT's New File System Won't Lose Data During Crashes" can be read as "MIT's New File System Won't be at fault for lost data once committed during any interruption of writes"
ZFS does the same thing, minus the proofs. If you do a sync write and ZFS says it completed, then that data is not going to be lost due to any fault of ZFS. But what if someone threw all of your hard drives into lava? Again, not the fault of ZFS. Same idea.
Rule of thumb: if your FS needs fsck, it can probably lose data given the right kind of interruption.
A true versioning file system would crumble under an IO workload with many small updates. The problem with btrfs is you can't snapshot two subvolumes in sync with each other. Snapshots are per subvolume. In ZFS, snapshots are at the pool level, allowing volumes to be in perfect sync.
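For reference, this is roughly what a "sync write" looks like from the application side: the call does not return until the operating system reports the data as flushed. The sketch uses only standard `os` calls; `durable_write` is a hypothetical helper name, and whether the data really survives an interruption still depends on the file system and hardware honoring the flush.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data and return only after the OS reports it flushed to storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # blocks until the kernel says the data reached the device
    finally:
        os.close(fd)
```

On POSIX systems, making a newly created file's directory entry durable may additionally require an `fsync` on the containing directory; that step is left out here to keep the sketch short.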
Isn't that like saying "only real construction workers lay bricks 100% of the time"? Writing code is a small part of programming. The most important part of programming is the how and why of each piece of the puzzle. I can get more experience programming without a computer than someone writing code.
What was the last program you wrote for yourself, why did you write it, etc.
Never wrote a program for myself. If open source is any indication of those types of people, they're not very good at design and architecture. The best programming language is pseudocode. You can crank out a concept 100x faster and tell if it'll work given certain assumptions.
I don't get paid for hours, I get paid for results. Of course results take time and there is a certain correlation with a maximum amount of time, but that amount of time is typically less than 40 hours. Programming isn't like factory work where 2x the hours means 2x the work; 2x the hours may mean 1/2 the work once you include all of the technical debt you've incurred from being burnt out.
If I'm doing hard mental work, I may go home after 7 hours at work, which may have included 3 hours of breaks. Once you've reached your limit, you have negative value. If you're working 100 hours as a "developer", you're doing your position a disservice and creating sub-par code. Some people may be able to do those hours, but 99.99% can't; the rest may think they can, the same way people think multi-tasking makes them "faster".
Yeah, file systems are not created often, so if you're going to create one, you had best make sure it's at least better on paper than what is out there. We don't need speed, we need scaling. Even ZFS has issues with large amounts of memory, and dedup is horrible on large pools. HAMMER2 sounds awesome, but it has a lot of crazy features that make it more complex, which increases the chance of bugs, of features not coming to fruition, or of it not getting ported to all open source OSes. I don't need master-master distribution.
MSSQL may be crap, but it's still better than most of the alternatives, paid or open source.
Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate.
I already said that. The snapshot is "attached" to the dataset, but the snapshot is scoped to the pool.
You need a way to objectively measure a student's progression. How do you do this without any form of grading?
Level 3 is a transit provider. Source IPs from other networks leaving their network is the norm.
You need to read in-between the lines.