There's something to be said for 'bad' use of DVCS in a private company. But here are the good usage patterns, IMHO:
1) checkin after every logically complete operation (for What The Fun Just Happened moments)
2) checkin every night (so if I'm sick tomorrow people can get to my work)
3) my code doesn't work, so collaborate with someone down the hall or geographically remote
4) I want to experiment with an alternate code path (but don't want to deal with the politics - remember coders have egos)
4a) I want to experiment with an alternate code path, but don't want any risk to the trunk
4b) This code is too specialized, we need a much simplified version for this use-case (but need to maintain the original code path)
5) Let's say we suck at graphics, so we outsource to a 3rd party company. How the hell does this happen with central version control? F-no do we give them direct access. And if they email us a zip file of the final product, how do we keep in sync from there on out? Their changes or ours will get overwritten or go into non-versioned hell. With DVCS, we can provide read-only access (possibly via emailed repository clones). DVCS allows trivial re-integration. Security is maintained, reconciliation is trivial. History and auditing are somewhat maintained (you can obviously fake it). And most importantly, we can switch to a NEW 3rd party contractor at any time, or possibly even use both AT THE SAME TIME.
6) rebase (not DVCS specific). If I've got 10 branches (in svn or any place else), I know for sure that after a while the history will have gotten too complex. In git at least, we can say: ok, these feature branches should all be thrown away - let's 'rebase' to produce a pristine trunk and quite literally throw away all the branches by flattening them.
7) central 'owner'/'maintainer' of a given project. Make sure someone knows everything that's going on with the project by having them integrate or 'bless' an integration. With central repositories, this requires that they do the merging. With DVCS, you do the merging as a 'candidate' and they either accept it as their own or not (e.g. fast-forward merging of your repository with theirs).
7a) as with Linux, for larger projects / development teams, you can have lieutenants that perform step 7 for sub-sections of the larger project. Each lieutenant will 'trust' the others' official repositories and auto-fast-forward-merge. The singular project manager can then choose for political reasons (because we are political in nature) to disagree with lieutenants' decisions - as they are the primary responsible party (at least in closed-source commercial solutions). This works because lieutenants can continue with their private fork until they can form a mutiny - so egos are maintained.
Well, what I haven't heard in this thread yet is that public utilities fall into a special category called natural monopoly. Phone [land]lines, bridges, power, railways, and several other things deal with naturally scarce resources or access points / paths, and thus it doesn't make sense to have 1,000 companies competing the way they do to give you the next MP3 player - may the best brand win, or rather the lowest-production-cost supplier win. Natural monopolies do not have 'free market solutions'. They REQUIRE social intervention (who gets to own the land for the train track?). The generally idealized natural monopoly is a heavily regulated system (they can only charge cost + 10%) and is subsidized to overcome its overheads or fixed capital investments. Meaning everybody pays taxes to pay for the infrastructure, then they pay cost + 10% for the marginal cost of production (of water, electricity, road maintenance, etc). Sometimes you can get away with government-backed bonds that handle the capital expenditures. It's extremely anti-market. You have the producer now WANTING to be wasteful, because their '10%' of a bigger cost gives them better total profits. Also they see little or no value in capital investments that would only increase efficiency - which, again, works against their bottom line. The only time they might want to perform capital investment is to increase capacity.
Thus in ideal situations, you the people subsidize redundant providers, so that at least they can compete for a larger share of the pie, but even then you have trivial mafia-style oligopolies.
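To make the cost + 10% incentive concrete, here is a tiny sketch. The two cost figures are made up for illustration; only the 10% margin comes from the post.

```python
def regulated_profit(cost, margin=0.10):
    """Profit allowed under a cost-plus rule: a fixed fraction of reported cost."""
    return cost * margin

# Hypothetical operators delivering the same service.
lean_cost = 1_000_000      # efficient operator
wasteful_cost = 1_500_000  # same output, delivered wastefully

lean_profit = regulated_profit(lean_cost)
wasteful_profit = regulated_profit(wasteful_cost)

# The wasteful operator is allowed the larger profit.
print(lean_profit, wasteful_profit)
```

Under this rule the only way to raise profit is to raise cost, which is the perverse incentive described above.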
Not sure what you're saying. Why do you suggest relational models support more situations? You cannot model recursive situations effectively. You cannot model hierarchical data-structures - at least not ones with cycles. The join syntax itself is very verbose, and when there are significant numbers of indexes, the number of permutations of possible join strategies grows exponentially (if you had 200 tables joined in a single query, with each table utilizing 4 indexes, you'd have a nearly impossible-to-optimize query). Yes, this is an odd query, but only because RDBMS does not support this style of data-traversal - many systems would crap out at 1 to 4KB of SQL syntax. Not to mention the locking structure overhead would practically serialize access to the DB (yes, I know, why the hell would a non-transactional read cause locks? Because joins just simply suck in most RDBMS implementations).
Compare that to an OODBMS like Objectivity, where joins are replaced with 64-bit foreign pointers to virtual addresses in possibly alternate storage spaces. And more importantly, the query syntax replaces a join with a single dot, which is very familiar from Object Oriented systems, including ORM layers. So "SELECT person.father.mother.daughters[0].siblings[0].employer.company.website.contact.phone FROM person WHERE id=?" is a legitimate query. It is FAR more expressive than if each was a separate table join. And each traversal requires a potentially uncached pair of page-lookups - one for the virtual mapping table, and one for the actual disk block. Compare that to traditional index-based foreign keys, which require log_base256(n) cold disk hits per join. Both OODBMS and RDBMS support B-Tree and Hash-Map symbolic indexing (e.g. login-name, range searches, etc). But for the simple graph traversal, OODBMS is just hard to beat.
And no-sql solutions are really all about documents in general: complex data-structures which may or may not be hierarchical, yet have schema validation support (see the Voldemort JSON data-store, or CouchDB). Depending on the schema, new inserts can extend the schema on the fly, or via DML statements you can enforce that all NEW requests have a new schema, while leaving old records with a previously well-defined schema. The document would have to retain its schema id. Certainly an RDBMS could do this as well, though most are optimized to support highly structured rectangular data, with only the use of nullity supporting 'optional' schema additions.
I do very much like the set-nature of RDBMS, and for large complex cross-table index-based queries (e.g. I need an index from tables A, B, C that are not their primary/foreign keys), RDBMS supports some pretty damn complex capabilities - I'm specifically thinking of postgres, where you can do hash-joins of 5 separate queries, each with their own index, or covering indexes where you don't even need to access the main row to get the result (which is essentially what most nosql solutions do), or synthetic-function-output indexes (indexes on the output of functions instead of the data itself), or conditional indexes, where you fine-tune which rows get indexed based on what you know you'll search on (create index job_state on jobs(state) where state not in ('COMPLETED','ARCHIVED')). Most nosql solutions are nowhere near ready to support these complex search optimizations - though things like CouchDB do allow you to have lazy indexes with user-defined functions - but I think they require indexable data on every row.
But I don't see most of these advantages as being specific to RDBMS - just to their maturity. HBase and Cassandra are still in their infancy - not even official releases yet, from what I remember.
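The "join becomes a dot" idea can be sketched with plain in-memory objects. This is not Objectivity's API. `Node` and `follow` are hypothetical names, and Python object references stand in for the 64-bit foreign pointers.

```python
from functools import reduce

class Node:
    """A record whose fields are direct references to other records."""
    def __init__(self, **fields):
        self.__dict__.update(fields)

def follow(start, path):
    """Walk a dotted path such as 'father.employer.website', one hop per dot."""
    return reduce(getattr, path.split("."), start)

# Three linked records; each hop is a pointer dereference, not an index search.
company = Node(website="example.invalid")
father = Node(employer=company)
person = Node(father=father)

site = follow(person, "father.employer.website")
print(site)
```

Each dot costs one dereference no matter how many records exist, which is the contrast with a per-join index descent.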
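A runnable illustration of the conditional index, assuming SQLite (via Python's stdlib `sqlite3`) as a stand-in for postgres, since SQLite accepts the same partial-index statement. The `jobs` table layout and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT NOT NULL)")

# The conditional (partial) index from the post: finished jobs never enter it.
conn.execute(
    "CREATE INDEX job_state ON jobs(state) "
    "WHERE state NOT IN ('COMPLETED', 'ARCHIVED')"
)

conn.executemany(
    "INSERT INTO jobs (state) VALUES (?)",
    [("QUEUED",), ("RUNNING",), ("COMPLETED",), ("ARCHIVED",), ("COMPLETED",)],
)

# A query over the same subset of rows the index covers.
active = [row[0] for row in conn.execute(
    "SELECT state FROM jobs "
    "WHERE state NOT IN ('COMPLETED', 'ARCHIVED') ORDER BY id"
)]
index_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"
)]
print(active, index_names)
```

The index stays small because the bulk of historical rows is excluded. Whether the planner actually uses it for a given query depends on the engine.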
Hence the introduction of convention-based configuration, e.g. zero configuration except where it deviates from convention. In other words, field A matches a class also named A. Controller B matches a view named B. HTTP form-field C matches input field C. In one sense it's magic/side-effect based coding. In another, it's intuitive programming.
I've personally taken thousand-file projects (quarter million lines of code) and only required about 1,000 lines of XML (almost all dealing with database, bootstrapping, and environment settings - for which you'd have needed more Java code to do the same thing), and 5x as much C++ code. There are definitely polymorphic cases where this breaks down, but I've found that OO is highly over-rated - especially when dealing with database-persistable objects - I've gone back and replaced OO-DB styles with switch-statement dispatches to OO data-structures.
Huh? Digging ditches means you work in... a ditch... Many of us consider dealing with the arbitrary, constantly changing short-comings / inconsistencies / poorly-thought-out OS decisions of the primary .NET platform to be like working in a ditch. And no, you can't do much useful with Mono - all the power of .NET is in its libraries, which are tightly tied to the OS of dis-choice (for those few of us).
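A minimal sketch of "zero configuration except where it deviates from convention". The controller names, the `.html` suffix, and the override table are all hypothetical, not taken from any particular framework.

```python
# Explicit configuration exists only for names that break the convention.
OVERRIDES = {"LegacyReport": "reports/old_layout.html"}  # hypothetical exception

def view_for(controller_name, overrides=OVERRIDES):
    """Return the view for a controller: explicit override first, else convention."""
    if controller_name in overrides:
        return overrides[controller_name]
    # Convention: controller B maps to a view named b.html.
    return f"{controller_name.lower()}.html"

print(view_for("Invoice"))       # resolved by convention
print(view_for("LegacyReport"))  # resolved by the one configured exception
```

The configuration grows with the number of exceptions rather than with the number of classes, which is where the XML savings come from.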
Sorry, but we're either in disagreement, or you're not understanding.
When a Unix partition gets to the last xth percent remaining space it locks the partition down to all but root.
When a partition gets to 15% free, all sorts of monitor / alarm bells SHOULD be ringing (if you have a properly configured system).
If you get to the 50% mark, then you need to start planning ahead for an upgrade.
By over-allocating, you can do this at a group level instead of on a per-partition level.
Thus you can keep every partition at just 15% full without wasting the other 85% of disk space, thanks to over-allocation.
Running out of disk space is running out of disk space - whether it's at the ext layer or LVM layer or NAS layer.. You should be monitoring and planning ahead no matter what layer it's in.
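A sketch of monitoring at the group level instead of per partition. The 15% and 50% thresholds are the ones from the post. The function name, volume names and sizes are hypothetical.

```python
def pool_status(physical_gb, used_gb_by_volume, alarm_free=0.15, plan_free=0.50):
    """Classify a shared pool by its free fraction, ignoring per-volume virtual sizes."""
    used = sum(used_gb_by_volume.values())
    free_fraction = (physical_gb - used) / physical_gb
    if free_fraction <= alarm_free:
        return "ALARM"          # bells should be ringing
    if free_fraction <= plan_free:
        return "PLAN_UPGRADE"   # start planning ahead
    return "OK"

# Volumes may each be over-allocated; only real usage against the pool matters.
print(pool_status(1000, {"home": 100, "var": 200}))
print(pool_status(1000, {"home": 300, "var": 300}))
print(pool_status(1000, {"home": 500, "var": 400}))
```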
The fact that editing a pre-existing block CAN cause a failure (because of sys-admin negligence) is NOT a fault of the application or technology. Especially since there is no difference between it failing due to pre-existing disk-allocation vs. appending to a file (/var/log/messages). It's the same as saying 'well, my application polled the disk-free in second one, then assumed it was safe to allocate an extra 4Meg.. But then when it came time to do so, there was no free space. waaah'.
Likewise, with mysql-INNODB, I can utilize 25 4TB eSATA externally managed machines (assuming 4 2TB disks RAID-1'd together), each mapped to a 4TB block-device, which INNODB treats efficiently.. (Or if using ext4, I could use LVM to map those eSATA devices together for a general purpose disk).
I could even have LVM stripe those remote volumes to get better IOPs.
At $700 a base machine (gray boxes) (including disks), that's $17,500.
If I didn't care about random-write-speed, then I could go with RAID5. Put 3 disks in each machine and reduce costs to $15,000.
Or I could go with RAID5 on a hot-swappable 16-disk $1,000 RAID controller and reduce it down to 4 machines. Bringing the price down to $11,600.
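The three price points can be reproduced with a small calculation. The $100 per disk matches the "$100 2TB disks" mentioned elsewhere in this post. The $300 bare-machine price is inferred ($700 minus four disks), not stated.

```python
BASE_MACHINE = 300   # inferred: $700 machine minus four $100 disks
DISK = 100           # $100 per 2TB disk
CONTROLLER = 1000    # 16-disk hot-swappable RAID controller

def cluster_cost(machines, disks_per_machine, controller=0):
    """Total hardware cost for a set of identical storage boxes."""
    return machines * (BASE_MACHINE + disks_per_machine * DISK + controller)

mirrored = cluster_cost(25, 4)                # 4 disks per box, mirrored
raid5_small = cluster_cost(25, 3)             # 3 disks per box, RAID5
raid5_big = cluster_cost(4, 16, CONTROLLER)   # 16 disks per box, RAID5

print(mirrored, raid5_small, raid5_big)
```

With those inferred unit prices the totals come out to the same $17,500, $15,000 and $11,600 quoted above.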
We're assuming either with HDFS or mysql or any other app, that you build redundancy on TOP of the applications. Which is the ONLY smart thing to do with enterprise grade applications.
Failure of a disk is ASSUMED.. Meaning your $100,000 netapp WILL fail one day.. You are an IDIOT if you don't believe this.. Sure it may take 10 years. But what happens then? Replace it every 5 years? How about every 3? That's not a capital cost, that's a variable cost. Sure your data may be worth it. But as a business, can you attain the same degree of reliability cheaper? HELL YEAH. And the 3 year replacement cycle doesn't handle a power surge which blows the hardware (say a rogue UPS). Sure put redundant power supplies on isolated UPSes - ok, I'm unplugging an ethernet cable and accidentally cause an electrical surge which blows the network controller.. OOPS, data is safe, no access! 2 days down-time!!
The point is RAID was invented to solve a class of problems in a cost effective way. It doesn't solve every problem, and I completely agree that solutions LIKE netapp are GREAT when you want to start medium and leave room to grow large. When you want to consolidate and dynamically repartition disk space AND spindles (e.g. vmware solutions). When you want lower maintenance costs (avoiding having to rebuild lots of regularly failing gray-boxes, constantly swapping out one of hundreds of $100 2TB disks).. BUT I submit to you that at scale, this is cheap / slave-labor. You hire high-school students to replicate base OS hard drives (with a replicating station), you buy 25% overstock of base hardware so you have fast (30 minute) build-new-machine / deploy-into-cluster environments. You only need accuracy, you don't need intelligence. Yes, a $30k / year salary is more than the extra $30k you spend on the netapp.. But you'd have to buy dozens of netapps to really scale an enterprise solution.. And that same $30k can usually handle it.
So there is absolutely a scale region where a netapp makes sense.. But it is NOT the high end - which is what they'd like you to believe. And I submit that there are application-level redundancy solutions which are more reliable (though at the cost of semi-static configurations - which is where a netapp-type system does add value).
Uhh, what does LVM do then? Oh yeah, you OVER-ALLOCATE.. My bad. And yes, with LVM-snapshots, you very well can crash the system if free space is maxed out. I don't recall, but I believe it deletes the snapshot, but since that's a mounted file-system, it's just as bad.
There's also commercial NAS hardware which works like this. They have little green, yellow and red lights next to each physical disk.. Supposedly you should swap out a yellow or red disk with a larger one to avoid first automatically reducing RAID redundancy (e.g. 2 disk redundancy reduces down to 1 disk redundancy), and then ultimately producing seek errors when no remaining physical blocks can map to a requested virtual block. I forget the name of the vendor in question, but it was far cheaper than a netapp - but really meant to sit next to your workstation (obviously).
Imagine a specialized net-appliance (screw netapp). It has 32 Gig of RAM and a 512Gig high-speed random-access SSD (where read speed is more important).
Split the 512Gig into two 256Gig portions.
The first portion contains 4 bytes of the MD5 sum of each 512B block (represents up to 32TB of block storage).
Every 2048B block being back-ground scanned for deduping does an SSD lookup against the 256G SSD hash-map, which is open-chained and points back to existing 2048B blocks on disk. This lets us efficiently cross-link (reverse copy-on-write). I'd prefer 512B block boundaries, but most file systems use 2048B blocks (or larger) and HDs are starting to move to this to increase ECC efficiency. Plus it just reduces the overhead.
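A quick check of the sizing claim, assuming binary units (GiB/TiB) and one 4-byte hash prefix per block with no other overhead in the index half.

```python
GIB = 2**30
TIB = 2**40

index_bytes = 256 * GIB            # first half of the 512Gig SSD
entries = index_bytes // 4         # one 4-byte MD5 prefix per block

covered_512 = entries * 512 // TIB     # storage addressable with 512B blocks
covered_2048 = entries * 2048 // TIB   # same index, 2048B blocks

print(entries, covered_512, covered_2048)
```

So 256Gig of 4-byte entries does cover 32TB at 512B granularity, and four times that if the index is kept per 2048B block.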
So that's just a minor optimization of whatever people have already been doing in software.. bla bla. Boring stuff, right.
For those blocks that DIDN'T match...
We do a modified version of zlib compression (which only stores 32KB worth of back-data). We extend this to store 4 Gig worth of code points (assuming a 4 Byte identity prefix match and 4 Byte SSD disk block pointer). Each reference is a 256 Byte block, which thus supports up to an 8 bit length field.
So now as you scan through the 2048 byte blocks being stored on disk, you do a hash-lookup of every consecutive 7 bytes. You hash the first 4 bytes and lookup in RAM. If matched, you lookup in SSD the remaining byte-string and see how many bytes match. If more than 7, you store a disk-pos + length vector. Saving you at least 1 byte (1 byte magic, 4 byte pointer, 1 byte length), and possibly the entire 256 bytes. If you can compress to one of 50% or 75%, you store at 1024 bytes or 512 bytes.. As soon as you reach either of these two boundaries, you stop compressing. Though this does assume you're not using 2048B boundary HDs. You then store into one of 2 special areas on disk that are 1/2 and 1/4 block compressed.
So this solves highly compressible but single byte-offset situations.. e.g. I copy sections of source-code (at least 512 bytes) and paste them either into the same file or some other file.
So long as you don't pull out the HD, the ref-map in SSD matches the previous runs on disk, so you don't have to do random disk-seeks to reconstruct the blocks. So now reading highly compressed blocks not only reduces the number of bytes read from physical disk, it increases the ratio of SSD to HD reads. :)
I'm only joking of course. But not really.. You hiring netapp??
This thread seems to be getting too defocused from reality.
Here's the rub.
Checksumming == good. All else being equal, we should have more of it. But checksumming is expensive (adds latency to your write).
So once you have it, might as well use it.
Background thread can compare checksums of blocks as starting points to identifying identical blocks (since checksum collisions are more than possible, they're only a matter of time - I see colliding MD5 sums all the time in BackupPC - you can tell because they append a semi-colon + sequence ID to the file-name to disambiguate).
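A sketch of "checksum as a starting point": the digest only nominates candidates, and a byte-for-byte compare decides. The 4-byte MD5 prefix and the in-memory dict are simplifications. `dedupe` is a hypothetical helper, not any product's implementation.

```python
import hashlib
from collections import defaultdict

def dedupe(blocks):
    """Map each block index to the index of the first byte-identical block."""
    by_digest = defaultdict(list)   # digest prefix -> indices of distinct blocks
    canonical = {}
    for i, block in enumerate(blocks):
        digest = hashlib.md5(block).digest()[:4]   # short prefix: collisions expected
        for j in by_digest[digest]:
            if blocks[j] == block:   # a checksum match is only a hint; verify the bytes
                canonical[i] = j
                break
        else:
            # No identical block seen yet (new digest, or a genuine collision).
            by_digest[digest].append(i)
            canonical[i] = i
    return canonical

blocks = [b"a" * 8, b"b" * 8, b"a" * 8]
print(dedupe(blocks))
```

Colliding blocks simply share a bucket and stay distinct, so a collision costs an extra compare rather than corrupting data.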
As some thread posters have listed - file-names prevent entire files from being block-shared.. Rubbish. File-names in Unix file systems have never been coupled with file-metadata. Files are identified by inode numbers, not file-names.. file-names are meta-data stored in directory files (which is why hard links are possible). Now unless you have noatime in your mount options, replicating inode descriptors will be nearly impossible, but that should only be a small fraction of your disk blocks anyway.
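The name/inode split is easy to see with a hard link. This sketch assumes a Unix-like file system in the temp directory. The file names are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    first = os.path.join(d, "report.txt")
    second = os.path.join(d, "alias.txt")
    with open(first, "w") as f:
        f.write("same bytes")

    os.link(first, second)   # a second directory entry pointing at the same inode

    same_inode = os.stat(first).st_ino == os.stat(second).st_ino
    link_count = os.stat(first).st_nlink

print(same_inode, link_count)
```

Two names, one inode, one set of data blocks: the file name never touches the blocks a deduper would care about.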
Historically, the main way you'd leverage shared blocks is through snapshot images - which all use copy-on-write. LVM and netapp and I'm sure dozens of other vendors supply this because it's trivial to do.
All this is really likely doing is extending the existing SNAPSHOT copy-on-write logic to merge blocks from different file-systems (which snapshots technically are) AND from within the same file-system. And most likely done through block-level checksum comparisons. Though since ext and many other file-systems don't naturally support check-sums at the block level, I doubt this is leveraging file-system level operations.
BigTable scales pretty well (go read its white-papers) - though perhaps not as efficiently as map-reduce for something as simple as text to keyword statistics (otherwise why wouldn't they have used it all along).
I'll caveat this whole post with - this is all based on my reading of the BigTable white-paper a year ago, but having played with Cassandra, Hadoop, etc occasionally since then. Feel free to call me out on any obvious errors. I've also looked at a lot of DB internals (Sybase, Mysql MyISAM/INNODB and postgresql).
What I think you're thinking is that in a traditional RDBMS (which they hint at), you have a single logical machine that holds your data.. That's not entirely true, because even with mysql, you can shard the F*K out of it. Consider putting a mysql server on every possible combination of the first two letters of a google-search. Then take high density combinations (like those beginning with s) and split it out 3, 4 or 5 ways.
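A sketch of the two-letter sharding scheme. The shard naming, the `HOT_PREFIXES` table, and the byte-sum bucketing are all made up for illustration. A real deployment would map shard names to server addresses.

```python
HOT_PREFIXES = {"s": 4}   # hypothetical: split dense first letters 4 ways

def shard_for(query):
    """Route a search string to a shard named after its first two letters."""
    key = query.lower()
    prefix = key[:2].ljust(2, "_")           # pad one-letter queries
    ways = HOT_PREFIXES.get(prefix[0], 1)
    if ways == 1:
        return prefix
    # Spread a dense prefix over several servers using the whole key.
    bucket = sum(key.encode("utf-8")) % ways
    return f"{prefix}-{bucket}"

print(shard_for("goat"), shard_for("slashdot"))
```

Routing is a pure function of the key, so any front-end can find the right server without a lookup service.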
There are drastic differences to how data is stored, but that's not strictly important - because there are column-oriented table stores in mysql and other RDBMS systems. But the key problem of sharding is what Mysql-NDB-Cluster (which is a primitive key-value store) and other distributed-DB technologies focus on, which is how they best traditional DBs at scalability.
BUT, the fundamental problem that page-searches are dealing with is that I want a keyword to map to a page-view-list (along with meta-data such as first-paragraph / icon / etc) that is POPULATED from statistical analysis of ALL page-centric data. Meaning you have two [shardable] primary keys. One is a keyword and one is a web-page-URL. But the web-page table has essentially foreign keys into potentially THOUSANDS of keyword records and vice versa. Thus a single web-page update would require thousands of locks.
In map-reduce, we avoid the problem. We start off with page-text, mapped to keywords with some initial meta-data about the parent-page. In the reduce phase, we consolidate (via a merge-sort) into just the keywords, grouping the web pages into ever more complete lists of pages (ranked by their original meta-data - which includes co-keywords). In the end, you have a maximally compact index file, which you can replicate to the world using traditional BigTable (or even big-iron if you really wanted).
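A single-process sketch of the map and reduce phases described above, building a keyword-to-pages index. Ranking meta-data is left out, and the page URLs and text are invented.

```python
from collections import defaultdict

def map_page(url, text):
    """Map phase: emit (keyword, url) pairs for one page."""
    return [(word, url) for word in set(text.lower().split())]

def reduce_pairs(pairs):
    """Reduce phase: sort by keyword, then group URLs under each keyword."""
    index = defaultdict(list)
    for word, url in sorted(pairs):
        index[word].append(url)
    return dict(index)

pages = {"a.example": "goats for sale", "b.example": "sale of goats"}
pairs = [p for url, text in pages.items() for p in map_page(url, text)]
index = reduce_pairs(pairs)
print(index)
```

No page ever locks a keyword record: pages only emit pairs, and the grouping happens once over the sorted stream.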
The problem of course, was that you can't complete the reduce phase until all web pages are fully downloaded and scanned.. ALL web pages. Of course, you do an hourly job which takes only high-valued web-pages and merges with the previous master list. So you have essentially static pre-processed data which is over-written by a subset of fresh data.. But you still have slowest-web-page syndrome. Ok, so solve this problem by ignoring web-load requests that don't complete in time - they'll be used in the next update round.. Well, you still have the issue of massive web-pages that take a long time to process. Ok, so we'll have a cut-off for them too.. Mapping nodes which take too long don't get included this round (you're merging against your last valid value - so if there isn't a newer version, the old one will naturally keep). But the merge-sort itself is still MASSIVELY slow. You can't get 2-second turn-around on high-importance web-sites. You're still building a COMPLETE index every time.
So now, with a 'specialized' GFS2 and specialized BigTable, either or both with new fangled 'triggers', we have the tools (presumably) to do real-time updates. A Page load updates its DB table meta-data. It sees it went up in ranking, so it triggers a call to modify the associated keyword's table (a thousand of them). Those keywords have some sort of batch-delay (of say 2 seconds) so that it minimizes the number of pushes to production read-servers.. So now we have an event queue processor on the keyword table. This is a batch processor, BUT, we don't necessarily have to drain the queue before pushing to production. We only accept as many requests as we can fit into a 2 second time-slice. Presumably
"Will that still be your position if they win in November? After all, you're on a threat about hosting a counter rally Colbert-esque. And promoting the fact that it's all an act even by the right, despite the real ramifications, and call that a joke."
The joke I think the parent is referring to is about the Republicans leading them to think they are supporting the causes of the tea party. That the exact same people that spent us into oblivion are going to have their interests at heart. When all most people in power care about is appeasing their biggest donors - sure they have a say about which donors they want to pick - it's called picking a party. Now there are some legitimate tea party people.. People that I wouldn't trust to run a 7-eleven. Those I would truly and honestly believe feel the need to minimize taxation. Because.. well, it only takes about 50 IQ to figure out that the government forces you to pay taxes, and that in doing so it costs YOU hard-earned money. Easy fix.. Run for congress and don't do it anymore.. problem solved.. YEAH, if only thousands of years of governance could have figured that out!!! I applaud the tea-party movement for its innovation.
Oh, by the way, the dumb f*ks don't even know (or acknowledge) what the tea party was.. It was NOT a fight for anti-taxation.. It was a fight against being a second class citizen - where taxation occurred without any local benefit, nor with any autonomy. Sure, once established, US congress had minimal taxation (predominantly on foreign trade because people didn't understand the ramifications of doing so at the time). But it was also a time when for the next 160 years we would not have a standing navy/army to fund. Nor an interstate road to maintain (for the same standing army).
Now now now. You see a lot of depictions of ill-informed tea party goers, but this is hardly representative. Many honestly want to privatize S.S. (essentially undoing the social safety net because they, their friends, and those they care about already have their golden parachutes). Many believe military spending is the only legitimate use of taxation. The ones you CAN throw popo at are the ones that say the best way to cut our deficit and taxation is by reducing foreign aid and by reducing pork-spending (which collectively is like 1.5%). Which is the majority of Americans polled - not just lipton something for nothings.
Then I guess you can include the white supremacists (disguised as American values people, or old people), the anti-immigration people, the America-firsters, and my personal favorite, the God-wants-us-to-winners. Never mind the sage advice to hope to be on God's side.
Just reread the definition of P=NP (been a while).. Guess FFT isn't a good example. There's no P verify and NP answer aspect of the FT.
But then again, traveling salesman problem (minimum path) isn't P to verify as far as I can reason. Though public key encryption probably is. Encrypt/decrypt in P time (matches original input == works?) vs. crack in NP time.
err.. rainbow tables?? Even encryption that costs O(n ^ inf) is pretty much constant-time to decrypt for all 10 byte input files, even without the decryption key.
And I'm not sure what you're saying with n^1E8. Consider what it would mean to have such an exponent. 100 million nested loops?? Where practically speaking are you going to have that kind of exponent in a polynomial algorithm? (I only bring it up because you mentioned practical).
The practical problem class is factorial or exponential x ^ n, which occur in combinatorial problem sets (meaning with every new element, you have to consider every existing element's permutations or combinations). Most interesting problems live here.
That being said, I've never formally studied P/NP, and personally find it a boring subject (especially given how much face time the subject gets).
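For what it's worth, the version of traveling salesman that does have a polynomial-time check is the decision form ("is there a tour no longer than some budget?"), not the minimum-path form questioned above. A sketch of that check follows. `verify_tour` and the distance matrix are made up for illustration.

```python
def verify_tour(distances, tour, budget):
    """Check that `tour` visits every city exactly once and costs at most `budget`."""
    n = len(distances)
    if sorted(tour) != list(range(n)):
        return False   # not a permutation of the cities
    total = sum(distances[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return total <= budget

# Hypothetical symmetric distances between 4 cities.
d = [
    [0, 1, 5, 1],
    [1, 0, 1, 5],
    [5, 1, 0, 1],
    [1, 5, 1, 0],
]
print(verify_tour(d, [0, 1, 2, 3], 4), verify_tour(d, [0, 2, 1, 3], 4))
```

Checking a claimed tour is a sort plus a linear sum. Confirming that no shorter tour exists is the part with no known shortcut.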
Re:What would the impacts of this be for cryptogra on Claimed Proof That P != NP · Score: 3, Interesting
Don't think this is what it means. Look at FFT (a logarithmic optimization of a quadratic problem). P = NP as I understand it means that ALL NP problems have a corresponding P solution. You just have to think hard enough to find it. Proving that there are classes of NP that have no P just suggests certain cryptographic algorithms MIGHT be NP. But it doesn't prove it (unless it was one of the particularly proven NP classes in this or some other paper). And even if this paper includes RSA / ECC, etc., that doesn't rule out someone even more clever 30 years from now finding a flaw or special case where this isn't true, and thus finding a P cracking tool.
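To show what the FFT remark refers to, here is the direct quadratic transform next to a textbook radix-2 FFT, which gets the same answer in O(n log n). Both are plain-Python sketches for power-of-two lengths, not tuned implementations.

```python
import cmath

def dft(x):
    """Direct O(n^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

signal = [1, 2, 3, 4, 0, 0, 0, 0]
close = all(abs(a - b) < 1e-9 for a, b in zip(dft(signal), fft(signal)))
print(close)
```

Both versions are polynomial, so this is a speed-up inside P and says nothing about P vs NP.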
But VM style forking requires non-trivial memory. Likewise space-time would need to reserve energy for the fork.. So you would need to contribute as much energy into a forking time-event as the extent of the fork causes deviations. But the causality issue is that you can't know a-priori how much change will be in effect, thus how much energy needs to be committed.. So the whole concept violates all sorts of principles of Science and Logic.
The only remaining two forms of 'time travel' are 1) An event that does not change the future at all (and is thus non-paradoxical) - traveling backwards in time is no different than traveling left down I-95. 2) Time travel is purposefully incompatible with distance locality.. Meaning I can travel back in time, but only n light-years away, thus my interference would not have a resonating paradoxical effect. This one seems the most compatible with relativity in my interpretation. It would seem that time travel requires very fast speeds, which would be inter-stellar in scale. This also is compatible with the statistical capturing of historical information. Light would be traveling to distant stars, and you could travel 'faster than light' to those stars such that you could see them before the light gets there.. The fact that you were going backwards in time to do so is almost immeasurable. Short distances would allow you to quickly see something that just happened.. While longer distances would have less and less resolution into past events.
Neither of the above suggests HOW you would do these things - just dealing with the logical consistency.
I can only hope you were being sarcastic. Variable names are the least parser-intensive operation. Though for non lexically scoped variables (at least in perl, which I thought PHP was at least loosely based on), the variable names are hash-lookups, thus long variable names have a minute incremental cost - especially in tight loops.
That aside, this isn't a rational comparison, given that php is a scripted language and java is a compiled language. So your 50 character java variable name is a 4 byte integer symbol reference at load time and execution time.
That being said, Java .class files (even in version 6) are pretty startup intensive the first time. And .jsp files are doubly so, because they are compiled into Java source, then compiled into .class, then finally loaded. It'll only win over a PHP compile if it's
But high-performance pages are likely raw servlets and thus pre-loaded prior to startup.. Meaning before accepting port 8080. Thus in a clustered environment with rolling updates, you never see the startup slowness. The only remaining startup slowness would be pre-jitted code (running raw interpreted-mode for the first 100 executions or so). But by run 1,000 you're likely running bare-metal assembly - depending on the nature of the servlet that is. Granted, this doesn't compensate for overly abstracted code (many of the MVC frameworks) or inefficient cluster/database code.
You're half right.. It's a pyramid scheme.. As a greater and greater fraction of GDP goes into the market, prices rise (just like the housing market). Government incentives be damned, it's the personal preference of most people, since CDs at 1% to 6% are never going to let you retire.. The ONLY way to retire is with 7% to 15% returns. Likewise for pension-funds, municipals, endowment funds, etc. This only is [presumably] achievable by taking risk *cough* *cough* (better to actually invest that money in yourself and allow yourself to earn more money, but that's crazy talk).
So just like the housing market, eventually the market will peak. Most likely it'll just level out, but the lack of growth will kill the perceived future value (since there is physically no more money that can go into the aggregate market).. Then it becomes a zero-sum-game.. Sloshing funds from one stock to another.
What you're describing is useless arbitrage. It isn't win-win-win.. It's win-oops-oops.
Useful arbitrage happens with 3 or more markets. Any two individuals can reach an optimal price through direct negotiation.. Any argument that HFT increases the performance is merely describing inefficiencies in the exchange market setup (e.g. having multiple exchanges that aren't centralized).
Consider if I want stock S at price P from user U. But it happens that if I route around 3 intermediate firms, I can get a lower price. Arbitragers remove the profit margin on alternate routes, so it's never worth my while researching these paths.
The equivalent would be a British goat-buyer finding a shortage of goats in the UK, but seeing a glut of goats in the US. He might simply order the goats on his UK credit card, and get charged a single exchange rate. BUT, if he does this often enough, he's better off pre-purchasing a lot of US $ - watching the market fluctuations to find optimal weekly/monthly rates. Holding large sums of USD is expensive, since this is the one and only use for it. THEN, let's say he's really diligent. He determines that due to trade-deficits and trade-wars / import-duties, it's cheaper to buy the goats in Yen!! Or worse, to first ship the goats to Japan, then re-ship them to the UK.
These are structural inefficiencies that arbitrage can solve in a win-win situation.
A currency arbitrageur buys and sells currency to the point that it's never worth it to buy in someone else's currency.
The better long term solution is to unify the exchange process.. As Europeans did with the Euro. You get rid of the middle-men entirely.
Likewise here, the only inefficiency is the alternate market-makers not having a central clearing-house.. So.. fix it. Make a central clearing house (overseen by the government).
Except that there is no difference between what you've suggested and a market maker that simply sells a $29 stock at $29 to the guy willing to buy at $30, without a middle-man leeching $1 out of the system. There is no conceivable reason why the already electronic system can't make such matching decisions automatically instead of requiring arbitrage.
I'm no financial trader, but HFT has to do with the speed of a transaction, NOT the financial analyst watching pending transactions and making auto-purchase/sell/short decisions. These two concepts are independent. I (as the speculative market-maker) can make the same transaction once / second and accomplish your liquidity goals - and most likely be MORE efficient because I can batch 1 second worth of pending requests.
The ONLY thing HFT does is let YOU be the speculator faster than your competitors.
Thus banning HFT would have ZERO effect on liquidity.
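To make the once-per-second batching idea concrete, here is a rough sketch of a batch cross: collect one interval's worth of pending orders, then match the highest bids against the lowest asks. The `Order` type, the `match_batch` name, and splitting the difference at the midpoint are all my own assumptions for illustration; no real exchange is claimed to work this way.

```python
from dataclasses import dataclass


@dataclass
class Order:
    trader: str
    price: float  # limit price
    qty: int


def match_batch(bids, asks):
    """Cross one batch of pending orders: best bids against best asks."""
    bids = sorted(bids, key=lambda o: o.price, reverse=True)
    asks = sorted(asks, key=lambda o: o.price)
    bid_qty = [o.qty for o in bids]
    ask_qty = [o.qty for o in asks]
    trades = []
    i = j = 0
    # Keep matching while the best remaining bid still meets the best ask.
    while i < len(bids) and j < len(asks) and bids[i].price >= asks[j].price:
        qty = min(bid_qty[i], ask_qty[j])
        # Assumption: both sides split the spread at the midpoint.
        price = (bids[i].price + asks[j].price) / 2
        trades.append((bids[i].trader, asks[j].trader, price, qty))
        bid_qty[i] -= qty
        ask_qty[j] -= qty
        if bid_qty[i] == 0:
            i += 1
        if ask_qty[j] == 0:
            j += 1
    return trades
```

With a $30 bid and a $29 ask in the same batch, the two parties trade directly and nobody in the middle pockets the dollar.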
There's something to be said for 'bad' use of DVCS in a private company. But here are the good usage patterns IMHO
1) checkin after every logically complete operation (for What The Fun Just Happened moments)
2) checkin every night (so if I'm sick tomorrow people can get to my work)
3) my-code-doesn't work, collaborate with someone down the hall or geographically remotely
4) I want to experiment with an alternate code path (but don't want to deal with the politics - remember coders have egos)
4a) I want to experiment with an alternate code path, but don't want any risk to the trunk
4b) This code is too specialized, we need a much simplified version for this use-case (but need to maintain the original code path)
5) Let's say we suck at graphics, so we outsource to a 3rd party company. How the hell does this happen with central version control? F-no do we give them direct access. And if they email us a zip file of the final product, how do we keep in sync from there-on-out? Their or our changes will get over-written or go into non-versioned-hell. With DVCS, we can provide read-only access (possibly via emailed repository-clones). DVCS allows trivial re-integration. Security is maintained, reconciliation is trivial. History and auditing are somewhat maintained (you can obviously fake it). And most importantly we can switch to a NEW 3rd party contractor at any time, possibly even AT THE SAME TIME.
6) rebase (not DVCS specific). If I've got 10 branches (in svn or any place else), do I know for sure that after a while, the history has gotten too complex? In git at least, we can say, ok, these feature-branches should all be thrown away - let's 'rebase' to produce a pristine trunk and quite literally throw away all the branches by flattening them.
7) central 'owner'/'maintainer' of a given project. Make sure someone knows everything that's going on with the project by having them integrate or 'bless' an integration. With central repositories, this requires that they do the merging. With DVCS, you do the merging as a 'candidate' and they either accept it as their own or not (e.g. fast-forward merging of your repository with theirs).
7a) as with linux, for larger project development-teams, you can have lieutenants that perform step 7 for sub-sections of the larger project. Each lieutenant will 'trust' the others' official repositories and auto-fast-forward-merge. The singular project-manager can then choose for political reasons (because we are political in nature) to disagree with the lieutenants' decisions - as they are the primary responsible party (at least in closed-source commercial solutions). This works because lieutenants can continue with their private fork until they can form a mutiny - so egos are maintained.
Well, what I haven't heard in this thread yet is that public utilities fall into a special category called natural monopoly. Phone-[land]lines, bridges, power, train-ways, and several other things deal with naturally scarce resources or access points / paths, and thus it doesn't make sense to have 1,000 companies competing the way they do to give you the next MP3 player (may the best brand win, or rather the low-production-cost supplier win). Natural monopolies do not have 'free market solutions'. They REQUIRE social intervention (who gets to own the land for the train-track?). The generally idealized natural monopoly is a heavily regulated system (they can only charge cost + 10%) that is subsidized to overcome its over-heads or fixed capital investments. Meaning everybody pays taxes to pay for the infrastructure, then they pay cost + 10% for the marginal cost of production (of water, electricity, road maintenance, etc). Sometimes you can get away with government backed bonds that handle the capital expenditures. It's extremely anti-market. You have the producer now WANTING to be wasteful, because their '10%' of a bigger cost gives them better total profits. Also they see little or no value in capital investments - those would generally only be useful to increase efficiency - which again, is opposing their bottom-line. The only time they might want to perform capital investment is to increase capacity.
Thus in ideal situations, you the people subsidize redundant providers, for at least they can compete for a larger share of the pie, but even then you have trivial mafia style oligopolies.
Not sure what you're saying. Why do you suggest relational models support more situations? You cannot model recursive situations effectively. You cannot model hierarchical data-structures - at least not ones with cycles. The join syntax itself is very verbose and when there are significant numbers of indexes, the number of permutations of possible join strategies grows exponentially (if you had 200 tables joined in a single query, with each table utilizing 4 indexes, you'd have a nearly impossible to optimize query). Yes this is an odd query, but only because RDBMS does not support this style of data-traversal - many systems would crap out at 1 to 4KB of SQL syntax. Not to mention the locking structure overhead would practically serialize access to the DB (yes, I know, why the hell would a non-transactional read cause locks.. because joins just simply suck in most RDBMS implementations).
Compare that to an OODBMS like Objectivity, where joins are replaced with 64bit foreign pointers to virtual addresses in possibly alternate storage spaces. And more importantly, the query syntax replaces a join with a single dot, which is very familiar to Object Oriented systems, including ORM layers.. So "SELECT person.father.mother.daughters[0].siblings[0].employer.company.website.contact.phone FROM person WHERE id=?" is a legitimate query. It is FAR more expressive than if each was a separate table join. And each traversal requires a potentially uncached pair of page-lookups - one for the virtual mapping table, and one for the actual disk block. Compare that to traditional index-based foreign keys which require log_base256(n) cold disk hits per join. Both OODBMS and RDBMS support B-Tree and Hash-Map symbolic indexing (e.g. login-name, range searches, etc). But for the simple graph traversal, OODBMS is just hard to beat.
And no-sql solutions are really all about documents in general: complex data-structures which may or may not be hierarchical, yet have schema validation support (see voldemort JSON data-store, or couchDB). Depending on the schema, new inserts can extend the schema on the fly, or via DML statements, you can enforce that all NEW requests have a new schema, while leaving old records with a previously well defined schema. The document would have to retain its schema id. Certainly an RDBMS could do this as well, though most are optimized to support highly structured rectangular data, with only the use of nullity supporting 'optional' schema additions.
I do very much like the set-nature of RDBMS, and for large complex cross-table index based queries (e.g. I need an index from tables A, B, C that are not their primary/foreign keys), RDBMS supports some pretty damn complex capabilities - I'm specifically thinking of postgres, where you can do hash-joins of 5 separate queries, each with their own index, or covering indexes where you don't even need to access the main row to get the result (which is essentially what most nosql solutions do), or synthetic-function-output indexes (indexes on the output of functions instead of the data itself) or conditional indexes, where you fine-tune which rows you know you'll search on (create index job_state on jobs(state) where state not in ('COMPLETED','ARCHIVED')). Most nosql solutions are nowhere near ready to support these complex search optimizations - though things like couchDB do allow you to have lazy indexes with user-defined functions - but I think they require indexable data on every row.
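For anyone who wants to poke at the conditional and function-output indexes mentioned above without standing up postgres, here's a small sketch using Python's built-in sqlite3, which happens to support both. SQLite is a stand-in here, not the postgres setup described; the `jobs` columns other than `state`, the sample rows, and the `job_owner_lower` index are made up for illustration.

```python
import sqlite3


def build_demo():
    """Create an in-memory jobs table with a conditional and a function index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, owner TEXT, state TEXT)"
    )
    # Conditional (partial) index: only rows still worth searching are indexed.
    conn.execute(
        "CREATE INDEX job_state ON jobs(state) "
        "WHERE state NOT IN ('COMPLETED','ARCHIVED')"
    )
    # Index on the output of a function rather than the raw column.
    conn.execute("CREATE INDEX job_owner_lower ON jobs(lower(owner))")
    conn.executemany(
        "INSERT INTO jobs (owner, state) VALUES (?, ?)",
        [
            ("alice", "COMPLETED"),
            ("Bob", "RUNNING"),
            ("carol", "ARCHIVED"),
            ("Dave", "QUEUED"),
        ],
    )
    return conn


def active_jobs(conn):
    """Return (id, state) for jobs that fall inside the conditional index."""
    return conn.execute(
        "SELECT id, state FROM jobs "
        "WHERE state NOT IN ('COMPLETED','ARCHIVED') ORDER BY id"
    ).fetchall()
```

The point of the conditional index is that the bulk of the table (finished jobs) never takes up index space, so the index stays small no matter how much history piles up.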
But I don't see most of these advantages as being specific to RDBMS - just in their maturity. HBase, cassandra are still in their infancy - not even official releases yet from what I remember.
Hence the introduction of convention-based-configuration.. e.g. zero configuration except where it deviates from convention... In other words, field A matches a class also named A. Controller B matches view named B. HTTP form-field C matches input field C. In one sense it's magic/side-effect based coding. In another, it's intuitive programming.
I've personally taken thousand-file projects (quarter million lines of code) and only required about 1,000 lines of XML (almost all dealing with database, bootstrapping, and environment settings - for which you'd have needed more java code to have done the same thing).. And 5x as much C++ code. There are definitely polymorphic cases where this breaks down, but I've found that OO is highly over-rated - especially when dealing with database persistable objects - I've gone back and replaced OO-DB styles with switch-statement dispatches to OO data-structures.
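A toy version of the convention-first lookup, just to show the shape of it: the name is derived by rule, and configuration only exists for the exceptions. The `resolve_view` function, the `Controller` suffix, and the `views/...html` layout are hypothetical, not taken from any particular framework.

```python
def resolve_view(controller_name, overrides=None):
    """Map a controller to its view by naming convention, unless overridden."""
    overrides = overrides or {}
    # Explicit configuration wins, but only has to exist for the odd cases.
    if controller_name in overrides:
        return overrides[controller_name]
    base = controller_name
    if base.endswith("Controller"):
        base = base[: -len("Controller")]
    # Convention: controller B renders the view named B.
    return "views/" + base.lower() + ".html"
```

So `resolve_view("OrdersController")` needs zero configuration, and only something like a legacy controller with a weird template path ever shows up in the override table.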
Huh? digging ditches means you work in ... a ditch... Many of us consider dealing with the arbitrary, constantly changing short-comings / inconsistencies / poorly-thought-out OS decisions of the primary .NET platform to be like working in a ditch. And no, you can't do much useful with Mono - all the power of .NET is in its libraries, which are tightly tied to the OS of dis-choice (for that few of us).
Sorry, but we're either in disagreement, or you're not understanding.
When a Unix partition gets to the last xth percent remaining space it locks the partition down to all but root.
When a partition gets to 15% free, all sorts of monitor / alarm bells SHOULD be ringing (if you have a properly configured system).
If you get to the 50% mark, then you need to start planning ahead for an upgrade.
By over-allocating, you can do this at a group level instead of on a per-partition level.
Thus keep all partitions 15% full (without wasting 85% of disk-space - due to over-allocation).
Running out of disk space is running out of disk space - whether it's at the ext layer or LVM layer or NAS layer.. You should be monitoring and planning ahead no matter what layer it's in.
The fact that editing a pre-existing block CAN cause a failure (because of sys-admin negligence) is NOT a fault of the application or technology. Especially since there is no difference between it failing due to pre-existing disk-allocation vs. appending to a file (/var/log/messages). It's the same as saying 'well, my application polled the disk-free in second one, then assumed it was safe to allocate an extra 4Meg.. But then when it came time to do so, there was no free space. waaah'.
HDFS disk size is meaningless herein.
Likewise, with mysql-INNODB, I can utilize 25 4TB eSATA externally managed machines (assuming 4 2TB disks as mirrored pairs, i.e. RAID-10), each mapped to a 4TB block-device, which INNODB treats efficiently.. (Or if using ext4, I could use LVM to map those eSATA devices together for a general purpose disk).
I could even have LVM stripe those remote volumes to get better IOPs.
At $700 a base machine (gray boxes) (including disks), that's $17,500.
If I didn't care about random-write-speed, then I could go with RAID5. Put 3 disks in each machine and reduce costs to $15,000.
Or I could go with RAID5 on a hot-swappable 16-disk $1,000 RAID controller and reduce it down to 4 machines. Bringing the price down to $11,600.
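The three totals above can be reproduced with a little arithmetic. The $300 bare-chassis price is my inference, not a quoted figure: it is the number that makes all three totals work out if disks are $100 each and the controller is $1,000. Function names are made up for the sketch.

```python
CHASSIS = 300      # assumed bare gray-box price (inferred, not quoted)
DISK = 100         # assumed price of one 2TB disk
CONTROLLER = 1000  # hot-swappable 16-disk RAID controller
DISK_TB = 2


def cluster_cost(machines, disks_per_machine, controller=0):
    """Total hardware cost for a cluster of identical machines."""
    return machines * (CHASSIS + controller + disks_per_machine * DISK)


def raid5_usable_tb(disks):
    """RAID5 gives up one disk's worth of capacity to parity."""
    return (disks - 1) * DISK_TB


def mirrored_usable_tb(disks):
    """Mirrored pairs give up half the raw capacity."""
    return (disks // 2) * DISK_TB
```

Under those assumptions: 25 machines with 4 mirrored disks come to $17,500 for 100TB, 25 machines with 3 disks in RAID5 come to $15,000 for the same 100TB, and 4 machines with 16 disks on the controller come to $11,600 for 120TB.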
We're assuming either with HDFS or mysql or any other app, that you build redundancy on TOP of the applications. Which is the ONLY smart thing to do with enterprise grade applications.
Failure of a disk is ASSUMED.. Meaning your $100,000 netapp WILL fail one day.. You are an IDIOT if you don't believe this.. Sure it may take 10 years. But what happens then? Replace it every 5 years? How about every 3? That's not a capital cost, that's a variable cost. Sure your data may be worth it. But as a business, can you attain the same degree of reliability cheaper? HELL YEAH. And the 3 year replacement cycle doesn't handle a power surge which blows the hardware (say a rogue UPS). Sure put redundant power supplies on isolated UPSes - ok, I'm unplugging an ethernet cable and accidentally cause an electrical surge which blows the network controller.. OPS, data is safe, no access! 2 days down-time!!
The point is RAID was invented to solve a class of problems in a cost effective way. It doesn't solve every problem, and I completely agree that solutions LIKE netapp are GREAT when you want to start medium and leave room to grow large. When you want to consolidate and dynamically repartition disk space AND spindles (e.g. vmware solutions). When you want lower maintenance costs (avoiding having to rebuild lots of regularly failing gray-boxes, constantly swapping out one of hundreds of $100 2TB disks).. BUT I submit to you that on scale, this is cheap / slave-labor. You hire high-school students to replicate base OS hard drives (with a replicating station), you buy 25% overstock of base hardware so you have fast (30 minute) build new machine / deploy into cluster, environments. You only need accuracy, you don't need intelligence. Yes, a $30k / year salary is more than the extra $30k you spend on the netapp.. But you'd have to buy dozens of netapps to really scale an enterprise solution.. And that same $30k can usually handle it.
So there is absolutely a scale region where a netapp makes sense.. But it is NOT the high end - which is what they'd like you to believe. And I submit that there are application-level redundancy solutions which are more reliable (though at the cost of semi-static configurations - which a netapp type system does provide value-add).
Uhh, what does LVM do then? Oh yeah, you OVER-ALLOCATE.. My bad. And yes, with LVM-snapshots, you very well can crash the system if free space is maxed out. I don't recall, but I believe it deletes the snapshot, but since that's a mounted file-system, it's just as bad.
There's also commercial NAS hardware which works like this. They have little green, yellow and red lights next to each physical disk.. Supposedly you should swap out a yellow or red disk with a larger one to avoid first automatically reducing RAID redundancy (e.g. 2 disk redundancy reduces down to 1 disk redundancy), and then ultimately producing seek errors when no remaining physical blocks can map to a requested virtual block. I forget the name of the vendor in question, but it was far cheaper than a netapp - but really meant to sit next to your workstation (obviously).
It's not a new concept at all.
No, NFS should be doing this, that way you aren't tied to specific file-system or disk systems limitations. ;)
Haha. I call small-minded skizzies on you, sir!
:)
Imagine a specialized net-appliance (screw netapp). It has 32 Gig of RAM and a 512Gig high-speed random-access SSD (where read speed is more important).
Split the 512Gig into two 256Gig portions.
The first portion contains 4 bytes of the MD5 sum of each 512B block (represents up to 32TB of block storage).
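The 32TB figure falls straight out of the sizes given. A quick check, with a made-up helper name and binary units assumed throughout:

```python
GIB = 2**30
TIB = 2**40


def addressable_bytes(table_bytes, entry_bytes, block_bytes):
    """How much block storage a table of fixed-size hash entries can cover."""
    entries = table_bytes // entry_bytes
    return entries * block_bytes


# 256 GiB of 4-byte entries, one entry per 512-byte block.
covered = addressable_bytes(256 * GIB, 4, 512)
```

256 GiB divided into 4-byte entries is 64Gi entries, and at 512 bytes each that covers 32 TiB.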
Every 2048B block being back-ground scanned for deduping does an SSD lookup against the 256G SSD hash-map which is open-chained and points back to existing 2048B blocks on disk. This lets us efficiently cross-link (reverse copy-on-write). I'd prefer 512B block boundaries, but most file systems use 2048B blocks (or larger) and HD's are starting to move to this to increase ECC efficiency. Plus it just reduces the overhead.
So that's just a minor optimization of whatever people have already been doing in software.. bla bla. Boring stuff, right.
For those blocks that DIDN'T match...
We do a modified version of zlib compression (which only stores 32KB worth of back-data). We extend this to store 4 Gig worth of code points (assuming a 4Byte identity prefix match and 4 Byte SSD disk block pointer). Each reference is a 256Byte block, which thus supports up to an 8-bit length pointer.
So now as you scan through the 2048 byte blocks being stored on disk, you do a hash-lookup of every consecutive 7 bytes. You hash the first 4 bytes and look them up in RAM. If matched, you look up the remaining byte-string in SSD and see how many bytes match. If 7 or more, you store a disk-pos + length vector. Saving you at least 1 byte (1 byte magic, 4 byte pointer, 1 byte length), and possibly the entire 256 bytes. If you can compress to one of 50% or 25%, you store at 1024 bytes or 512 bytes.. As soon as you reach either of these two boundaries, you stop compressing. Though this does assume you're not using 2048B boundary HDs. You then store into one of 2 special areas on disk that are 1/2 and 1/4 block compressed.
So this solves highly compressible but single byte-offset situations.. e.g. I copy sections of source-code (at least 512 bytes) and paste them either into the same file or some other file.
So long as you don't pull out the HD, the ref-map in SSD matches the previous runs on disk, so you don't have to do random disk-seeks to reconstruct the blocks. So now reading highly compressed blocks not only reduces the number of bytes read from physical disk, it increases the ratio of SSD to HD reads.
I'm only joking of course. But not really.. You hiring netapp??
This thread seems to be getting too defocused from reality.
Here's the rub.
Checksumming == good. All else being equal, we should have more of it.
But checksumming is expensive (adds latency to your write).
So once you have it, might as well use it.
Background thread can compare checksums of blocks as starting points to identifying identical blocks (since checksum collisions are more than possible, they're only a matter of time - I see colliding MD5 sums all the time in BackupPC - you can tell because they append a semi-colon + sequence ID to the file-name to disambiguate).
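Roughly what that background pass looks like in miniature: the checksum only nominates candidates, and a full byte comparison decides whether two blocks are really the same. This is an in-memory sketch with names of my own choosing (`dedupe_blocks`, `store`, `layout`); a real implementation would be working against on-disk block maps rather than Python lists.

```python
import hashlib


def dedupe_blocks(data, block_size=2048):
    """Split data into blocks and share identical ones.

    Returns (store, layout): the unique blocks, and for each original
    block position the index of the stored block that backs it.
    """
    store = []   # unique block contents
    index = {}   # truncated checksum -> candidate positions in store
    layout = []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        # Deliberately short key, so collisions are expected and handled.
        key = hashlib.md5(block).digest()[:4]
        for pos in index.get(key, []):
            # A checksum match is only a hint: verify the actual bytes.
            if store[pos] == block:
                layout.append(pos)
                break
        else:
            store.append(block)
            index.setdefault(key, []).append(len(store) - 1)
            layout.append(len(store) - 1)
    return store, layout
```

Rebuilding the original is just `b"".join(store[i] for i in layout)`, which is the property you'd want to hold no matter how many checksum collisions occur.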
As some thread posters have listed - file-names prevent entire files from being block-shared.. Rubbish. File-names in Unix file systems have never been coupled with file-metadata. Files are identified by inode numbers, not file-names.. file-names are meta-data stored in directory files (which is why hard links are possible). Now unless you have noatime in your mount options, replicating inode descriptors will be nearly impossible, but that should only be a small fraction of your disk blocks anyway.
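If anyone doubts the names-vs-inodes point, it's easy to see for yourself. A small sketch (file names are arbitrary, and it assumes a file system that supports hard links, which any normal Unix one does):

```python
import os
import tempfile


def hard_link_demo():
    """Give one file two names and report whether they share an inode."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "original.txt")
        second = os.path.join(d, "alias.txt")
        with open(first, "w") as f:
            f.write("same blocks\n")
        # A second directory entry pointing at the same inode.
        os.link(first, second)
        a, b = os.stat(first), os.stat(second)
        return a.st_ino == b.st_ino, a.st_nlink
```

Two names, one inode, link count of 2: the name lives in the directory, not in the file.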
Historically, the main way you'd leverage shared blocks is through snapshot images - which all use copy-on-write. LVM and netapp and I'm sure dozens of other vendors supply this because it's trivial to do.
All this is really likely doing is extending the existing SNAPSHOT copy-on-write logic to merge blocks from different file-systems (which snapshots technically are) AND from within the same file-system. And most likely done through block-level checksum comparisons. Though since ext and many other file-systems don't naturally support check-sums at the block level, I doubt this is leveraging file-system level operations.
BigTable scales pretty well (go read its white-papers) - though perhaps not as efficiently as map-reduce for something as simple as text to keyword statistics (otherwise why wouldn't they have used it all along).
I'll caveat this whole post with - this is all based on my reading of the BigTable white-paper a year ago, but having played with Cassandra, Hadoop, etc occasionally since then. Feel free to call me out on any obvious errors. I've also looked at a lot of DB internals (Sybase, Mysql MyISAM/INNODB and postgresql).
What I think you're thinking is that in a traditional RDBMS (which they hint at), you have a single logical machine that holds your data.. That's not entirely true, because even with mysql, you can shard the F*K out of it. Consider putting a mysql server on every possible combination of the first two letters of a google-search. Then take high density combinations (like those beginning with s) and split it out 3, 4 or 5 ways.
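A sketch of that two-letter sharding scheme, with the hot prefixes split further. Everything here is hypothetical: the `shard_for` name, the shard-label format, and the choice of CRC32 to spread a hot prefix across its sub-shards.

```python
import zlib


def shard_for(query, hot_prefixes=None):
    """Pick a shard label from the first two letters of a search string.

    hot_prefixes maps a first letter (e.g. "s") to the number of ways
    its shards should be split.
    """
    hot_prefixes = hot_prefixes or {}
    key = query.strip().lower()
    prefix = key[:2]
    ways = hot_prefixes.get(prefix[:1], 1)
    if ways == 1:
        return prefix
    # Stable hash, so the same query always lands on the same sub-shard.
    sub = zlib.crc32(key.encode("utf-8")) % ways
    return prefix + "-" + str(sub)
```

Each label would then map to one mysql server, e.g. "go" for "goats" and one of "st-0" through "st-3" for anything starting with "st" when the letter s is split 4 ways.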
There are drastic differences to how data is stored, but that's not strictly important - because there are column-oriented table stores in mysql and other RDBMS systems. But the key problem of sharding is what's focused on by Mysql-NDB-Cluster (which is a primitive key-value store) and other distributed-DB technologies that best traditional DBs at scalability.
BUT, the fundamental problem that page-searches are dealing with is that I want a keyword to map to a page-view-list (along with meta-data such as first-paragraph / icon / etc) that is POPULATED from statistical analysis of ALL page-centric data. Meaning you have two [shardable] primary keys. One is a keyword and one is a web-page-URL. But the web-page table has essentially foreign keys into potentially THOUSANDS of keyword records and vice versa. Thus a single web-page update would require thousands of locks.
In map-reduce, we avoid the problem. We start off with page-text, mapped to keywords with some initial meta-data about the parent-page. In the reduce phase, we consolidate (via a merge-sort) into just the keywords, grouping the web pages into ever more complete lists of pages (ranked by their original meta-data - which includes co-keywords). In the end, you have a maximally compact index file, which you can replicate to the world using traditional BigTable (or even big-iron if you really wanted).
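In miniature, the map and reduce phases described above look something like this. It's a single-process toy: the function names, the (rank, url) posting format, and using one number as the page "meta-data" are my simplifications, and the real thing does the grouping with a distributed merge-sort rather than a dict.

```python
from collections import defaultdict


def map_page(url, text, rank):
    """Map phase: emit (keyword, posting) pairs for one page."""
    for word in set(text.lower().split()):
        yield word, (rank, url)


def reduce_keyword(postings):
    """Reduce phase: collapse one keyword's postings into a ranked URL list."""
    ordered = sorted(postings, key=lambda p: (-p[0], p[1]))
    return [url for _rank, url in ordered]


def build_index(pages):
    """pages maps url -> (text, rank). Returns keyword -> ranked URL list."""
    grouped = defaultdict(list)
    for url, (text, rank) in pages.items():
        for word, posting in map_page(url, text, rank):
            grouped[word].append(posting)
    return {word: reduce_keyword(p) for word, p in grouped.items()}
```

Note that nothing here ever locks a keyword record: every page is mapped independently, and the keyword lists only come into existence at the end, which is exactly why the whole index has to be rebuilt to pick up a change.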
The problem, of course, was that you can't complete the reduce phase until all web pages are fully downloaded and scanned.. ALL web pages. Of course, you do an hourly job which takes only high-valued web-pages and merges with the previous master list. So you have essentially static pre-processed data which is over-written by a subset of fresh data.. But you still have slowest-web-page syndrome. Ok, so solve this problem by ignoring web-load requests that don't complete in time - they'll be used in the next update round.. Well, you still have the issue of massive web-pages that take a long time to process. Ok, so we'll have a cut-off for them too.. Mapping nodes which take too long don't get included this round (you're merging against your last valid value - so if there isn't a newer version, the old one will naturally keep). But the merge-sort itself is still MASSIVELY slow. You can't get 2-second turn-around on high-importance web-sites. You're still building a COMPLETE index every time.
So now, with a 'specialized' GFS2 and specialized BigTable, either or both with new fangled 'triggers', we have the tools (presumably) to do real-time updates. A Page load updates its DB table meta-data. It sees it went up in ranking, so it triggers a call to modify the associated keywords' tables (a thousand of them). Those keywords have some sort of batch-delay (of say 2 seconds) so that it minimizes the number of pushes to production read-servers.. So now we have an event queue processor on the keyword table. This is a batch processor, BUT, we don't necessarily have to drain the queue before pushing to production. We only accept as many requests as we can fit into a 2 second time-slice. Presumably
"Will that still be your position if they win in November? After all, you're on a thread about hosting a counter rally Colbert-esque. And promoting the fact that it's all an act even by the right, despite the real ramifications, and call that a joke."
The joke I think the parent is referring to is about the Republicans leading them to think they are supporting the causes of the tea party. That the exact same people that spent us into oblivion are going to have their interests at heart. When all most people in power care about is appeasing their biggest donors - sure they have a say about which donors they want to pick - it's called picking a party. Now there are some legitimate tea party people.. People that I wouldn't trust to run a 7-eleven. Those I would truly and honestly believe feel the need to minimize taxation. Because.. well, it only takes about 50 IQ to figure out that the government forces you to pay taxes, and that in doing so it costs YOU hard-earned money. Easy fix.. Run for congress and don't do it anymore.. problem solved.. YEAH, if only thousands of years of governance could have figured that out!!! I applaud the tea-party movement for its innovation.
Oh, by the way, the dumb f*ks don't even know (or acknowledge) what the tea party was.. It was NOT a fight for anti-taxation.. It was a fight against being a second class citizen - where taxation occurred without any local benefit, nor with any autonomy. Sure, once established, US congress had minimal taxation (predominantly on foreign trade because people didn't understand the ramifications of doing so at the time). But it was also a time when for the next 160 years we would not have a standing navy/army to fund. Nor an interstate road to maintain (for the same standing army).
now now now. You see a lot of depictions of ill-informed tea party goers, but this is hardly representative. Many honestly want to privatize S.S. (essentially undoing the social safety net because they, their friends, and those they care about already have their golden parachutes). Many believe military spending is the only legitimate use of taxation. The ones you CAN throw popo at are the ones that say the best way to cut our deficit and taxation is by reducing foreign aid and by reducing pork-spending (which collectively is like 1.5%). Which is the majority of Americans polled - not just lipton something for nothings.
Then I guess you can include the white-supremacists (disguised as American values people, or old people), the anti-immigration people, the America-firsters, and my personal favorite, the God-wants-us-to-winners. Never mind the sage advice to hope to be on God's side.
Just reread the definition of P=NP (been a while).. Guess FFT isn't a good example. There's no P verify and NP answer aspect of the FT.
But then again, traveling salesman problem (minimum path) isn't P to verify as far as I can reason. Though public key encryption probably is. Encrypt/decrypt in P time (matches original input == works?). V.s. crack in NP time.
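For what it's worth, the usual framing (as far as I understand it) uses the yes/no version of traveling salesman: "is there a tour no longer than some budget?" Checking a proposed answer to that is cheap, even though checking that a tour is the true minimum isn't obviously so. A sketch, with the distance-matrix format and names chosen just for the example:

```python
def verify_tour(dist, tour, budget):
    """Check a claimed tour: visits every city once, total length <= budget.

    dist is an n x n matrix of distances, tour is a list of city indices.
    Runs in time polynomial in n.
    """
    n = len(dist)
    # Must be a permutation of all cities.
    if sorted(tour) != list(range(n)):
        return False
    # Sum each leg, including the return leg to the starting city.
    total = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return total <= budget
```

Finding a tour that passes this check is the hard part; the check itself is one pass over the cities.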
Naah.. Just pass the responsibility over to the hardware guys.. That's what cell phone dudes have been doing for a decade. ;)
err.. rainbow tables?? Encryption with O(n ^ inf) of all 10 byte input files are pretty much constant to decrypt, even without the decryption key.
And I'm not sure what you're saying with n^1E8 . Consider what it would mean to have such a coefficient. 100 million nested loops?? Where practically speaking are you going to have that kind of coefficient in a polynomial algorithm? (I only bring it up because you mentioned practical).
The practical problem class is factorial or exponential n ^ x, which occur in combinatorial problem sets (meaning with every new element, you have to consider every existing element's permutations or combinations). Most interesting problems live here.
That being said, I've never formally studied P/NP, and personally find it a boring subject (especially given how much face time the subject gets)
Don't think this is what it means. Look at FFT (logarithmic optimization to a quadratic problem). P = NP as I understand it means that ALL NP problems have a corresponding P solution. You just have to think hard enough to find it. Proving that there are classes of NP that have no P just suggests certain cryptographic algorithms MIGHT be NP. But it doesn't prove it (unless it was one of the particularly proven NP classes in this or some other paper). And even if this paper includes RSA / ECC, etc., that doesn't mean someone even more clever 30 years from now won't find a flaw or special case where this isn't true and thus find a P cracking tool.
But VM style forking requires non-trivial memory. Likewise space-time would need to reserve energy for the fork.. So you would need to contribute as much energy into a forking time-event as the extent of the fork causes deviations. But the causality issue is that you can't know a-priori how much change will be in effect, thus how much energy needs to be committed.. So the whole concept violates all sorts of principles of Science and Logic.
The only remaining two forms of 'time travel' are
1) An event that does not change the future at all (and thus non-paradoxical) - traveling backwards in time is no different than traveling left down i-95.
2) time travel is purposefully incompatible with distance locality.. Meaning I can travel back in time, but only n light-years away, thus my interference would not have a resonating paradoxical effect. This one seems the most compatible with relativity in my interpretation. It would seem that time travel requires very fast speeds, which would be inter-stellar in scale. This also is compatible with the statistical capturing of historical information. Light would be traveling to distant stars, and you could travel 'faster than light' to those stars such that you could see them before the light gets there.. The fact that you were going backwards in time to do so is almost immeasurable. Short distances would allow you to quickly see something that just happened.. While longer distances would have less and less resolution into past events.
Neither of the above suggests HOW you would do these things - just dealing with the logical consistency.
I can only hope you were being sarcastic. Variable names are the least parser intensive operation. Though for non lexically scoped variables (at least in perl, which I thought PHP was at least loosely based on), the variable names are hash-lookups, thus long variable names have a minute incremental cost - especially in tight loops.
.class files (even in version 6) are pretty startup intensive the first time. And .jsp files are doubly so, because they are compiled into java source, then compiled into .class, then finally loaded. It'll only win over a PHP compile if it's
That aside, this isn't a rational comparison, given that php is a scripted language and java is a compiled language. So your 50 character java variable name is a 4 byte integer symbol reference at load time and execution time.
That being said, java
But high-performance pages are likely raw servlets and thus pre-loaded prior to startup.. Meaning before accepting port 8080. Thus in a clustered environment with rolling updates, you never see the startup slowness. The only remaining startup slowness would be pre-jitted code (running raw interpreted-mode for the first 100 executions or so). But by run 1,000 you're likely running bare-metal assembly - depending on the nature of the servlet that is. Granted, this doesn't compensate for overly abstracted code (many of the MVC frameworks) or inefficient cluster/database code.
Wow, that's misleading.
March '05, Google's PE was 87 and growing.
June '01 MSFT PE was 61
June '02 BP was 25
Just before the housing crash and the most recent correction, all these companies were at nearly 2x their current PE.
You're half right.. It's a pyramid scheme.. As a greater and greater fraction of GDP goes into the market, prices rise (just like the housing market). Government incentives be damned, most people believe that CDs at 1% to 6% are never going to let you retire.. The ONLY way to retire is with 7% to 15% returns. Likewise for pension funds, municipals, endowment funds, etc. This is only [presumably] achievable by taking risk *cough* *cough* (better to actually invest that money in yourself and allow yourself to earn more money, but that's crazy talk).
So just like the housing market, eventually the market will peak. Most likely it'll just level out, but the lack of growth will kill the perceived future value (since there is physically no more money that can go into the aggregate market).. Then it becomes a zero-sum game.. Sloshing funds from one stock to another.
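To put rough numbers on the gap between CD-style returns and the 7% to 15% range mentioned above, here is a minimal sketch. The savings amount and target are made-up illustration values, and the model (fixed yearly deposit, constant return, no inflation or taxes) is an assumption, not a retirement plan.

```python
def years_to_target(annual_saving, rate, target):
    """Count the years of fixed annual deposits needed to reach `target`
    when the balance compounds once per year at `rate`."""
    balance = 0.0
    years = 0
    while balance < target:
        # Grow last year's balance, then add this year's deposit.
        balance = balance * (1 + rate) + annual_saving
        years += 1
    return years


if __name__ == "__main__":
    # Hypothetical saver: 10,000 a year, aiming for 1,000,000.
    for rate in (0.01, 0.06, 0.07, 0.15):
        print(f"{rate:.0%} return: {years_to_target(10_000, rate, 1_000_000)} years")
```

The point is only that the number of years drops sharply as the assumed return rises, which is why people chase the riskier end.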
What you're describing is useless arbitrage. It isn't win-win-win.. It's win-ops-ops.
Useful arbitrage happens with 3 or more markets. Any two individuals can reach an optimal price through direct negotiation.. Any argument that HFT increases the performance is merely describing inefficiencies in the exchange market setup (e.g. having multiple exchanges that aren't centralized).
Consider if I want stock S at price P from user U. But it happens that if I route around 3 intermediate firms, I can get a lower price. Arbitragers remove the profit margin on alternate routes, so it's never worth my while researching these paths.
The equivalent would be a British goat-buyer finding a shortage of goats in the UK, but seeing a glut of goats in the US. He might simply order the goats on his UK credit card, and get charged a single exchange rate. BUT, if he does this often enough, he's better off pre-purchasing a lot of US $ - watching the market fluctuations to find optimal weekly/monthly rates. Holding large sums of USD is expensive, since this is the one and only use for it. THEN, let's say he's really diligent. He determines that due to trade deficits and trade wars / import duties, it's cheaper to buy the goats in Yen!! Or worse, to first ship the goats to Japan, then re-ship them to the UK.
These are structural inefficiencies that arbitrage can solve in a win-win situation.
A currency arbitrageur buys and sells currency to the point that it's never worth it to buy in someone else's currency.
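The goat-buyer's detour through Yen is a three-currency loop, and the check for whether such a loop pays is just a product of exchange rates. A minimal sketch follows. The rates are invented for illustration (not real quotes), fees are ignored, and `round_trip` is a hypothetical helper name.

```python
def round_trip(rates, path):
    """Multiply the exchange rates along a currency cycle.

    `rates[(src, dst)]` is units of dst received per unit of src.
    A result above 1.0 means the loop returns more than it started with.
    """
    amount = 1.0
    for src, dst in zip(path, path[1:]):
        amount *= rates[(src, dst)]
    return amount


# Hypothetical quotes, chosen so the loop is slightly profitable.
rates = {
    ("GBP", "USD"): 1.60,
    ("USD", "JPY"): 100.0,
    ("JPY", "GBP"): 0.0064,
}

if __name__ == "__main__":
    factor = round_trip(rates, ["GBP", "USD", "JPY", "GBP"])
    print(f"1 GBP sent around the loop comes back as {factor:.4f} GBP")
```

Arbitrageurs trading that loop push the three rates toward a product of exactly 1.0, at which point nobody else has a reason to bother with the detour.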
The better long term solution is to unify the exchange process.. As Europeans did with the Euro. You get rid of the middle-men entirely.
Likewise here, the only inefficiency is the alternate market-makers not having a central clearing-house.. So.. fix it. Make a central clearing house (overseen by the government).
Except that there is no difference between what you've suggested and a market maker that simply sells a $29 stock at $29 to the guy willing to buy at $30, without a middle-man leeching $1 out of the system. There is no conceivable reason why the already electronic system can't make such matching decisions automatically instead of requiring arbitrage.
I'm no financial trader, but HFT has to do with the speed of a transaction, NOT the financial analyst watching pending transactions and making auto-purchase/sell/short decisions. These two concepts are independent. I (as the speculative market-maker) can make the same transaction once per second and accomplish your liquidity goals - and most likely be MORE efficient because I can batch 1 second's worth of pending requests.
The ONLY thing HFT does is let YOU be the speculator faster than your competitors.
Thus banning HFT would have ZERO effect on liquidity.
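The once-per-second batching idea above can be sketched as a tiny matching function. This is an illustration under stated assumptions, not how any real exchange works: every order is for one share, the trade prints at the seller's asking price (as in the $29 / $30 example), and `match_batch` is a hypothetical name.

```python
def match_batch(bids, asks):
    """Match one batch of pending single-share orders.

    bids, asks: lists of (trader, limit_price).
    Returns a list of (buyer, seller, price) trades, priced at the ask.
    """
    bids = sorted(bids, key=lambda order: -order[1])  # highest bid first
    asks = sorted(asks, key=lambda order: order[1])   # lowest ask first
    trades = []
    for (buyer, bid), (seller, ask) in zip(bids, asks):
        if bid < ask:
            break  # remaining pairs cannot cross either
        trades.append((buyer, seller, ask))
    return trades


if __name__ == "__main__":
    # One second's worth of hypothetical pending orders.
    pending_bids = [("alice", 30.0), ("bob", 28.0)]
    pending_asks = [("sam", 29.0), ("tina", 29.5)]
    print(match_batch(pending_bids, pending_asks))
```

Nothing in the matching depends on who reached the exchange a microsecond sooner, which is the sense in which speed and liquidity are separate questions.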