Domain: apache.org
Stories and comments across the archive that link to apache.org.
Comments · 2,937
-
Re:What Everyone is Missing
Hadoop is significantly more disruptive than Yet Another Storage box. It also scales larger and was written in less time than the 10 years it took Bonwick's team to build ZFS.
-
Use JOpenDocument or ApachePOI
It is a huge convenience if you are able to process data from documents like spreadsheets in code you write.
I agree. The problem is there really is no working "ODF Toolkit". It's vaporware. Sun and IBM have been promising an odf toolkit since 2006, but to date nothing of any use has been produced. The current "ODF Toolkit" has virtually no documentation or example code, and is generally useless for importing data from an openoffice.org spreadsheet into a java program. If readers here don't believe me, they can go ahead and try it for themselves. The best thing available for odf handling in java is JOpenDocument. Hopefully the "new and improved" odf toolkit project is now working with the JOpenDocument developers.
I don't know if they are, because I gave up waiting on Sun and IBM and decided to use the Apache POI libraries to read and write excel spreadsheets that can be created/opened by either MSOffice or OpenOffice.org.
-
Re:Windows Azure Offers Developers Iron-Clad Lock-
Please enlighten us on exactly how it has been reimplemented as open source?
All of your storage is still done using Google's Big Table and the GQL query language. If you can find me the source for Big Table, please show it to me.
I didn't say "released", I said "reimplemented".
And Hadoop has done exactly that.
I have no idea how compatible they are -- I see lots of talk of reimplementing GQL, and no actual mention of an implementation -- but the speed with which Appdrop was released certainly gives the lie to AppEngine's "lock-in". If you really need something Google has provided, there's a very good chance you can reimplement it quickly using what's already out there.
-
Re:What is this anyway?
Of course since once you've chosen a MQ and adapted all your applications to use it you're basically tied forever and ever to your MQ vendor who hold you by the balls and can continuously rape you over and over with astronomical maintenance fee since you now have a single coordinated point of failure that can and will eventually take everything down at some point.
While it's still in incubation, Apache's Qpid might be worth a look. If it sucks now, throw a developer at it to help, and you'll have a sustainable MQ for, well, your application's whole life cycle.
Apache already has ActiveMQ which AFAIK is not AMQP but something still. Perhaps you could abstract the use of either in your architecture, and jump over to AMQP when you feel like it.
-
F/OSS BPMs
There's also Apache ODE.
-
Derby
Sun doesn't, but if you live in the Java world have you looked at Derby recently? We started out using it as an authentication database embedded in an app, and are now making more and more use of it. It supports transactions and hundreds of simultaneous connections, has very flexible configuration, and supports up to about 50Gbytes of storage. The last alone makes it more useful in many applications than the free versions of MS SQL Server. There are many applications currently running on MySQL which (in my opinion) would benefit from migrating to a tightly coupled all-Java solution. The Derby footprint is tiny, database backup and failover is now supported, and you can work with anything from the command line tool to the usual studio type applications. It has taken me 4 years to become a convert, after 8 years of MySQL, but now in the latest release I love it.
-
For young kids, use a whitelist
For younger kids, you can just use some sort of whitelist. I use spamassassin but there are many ways to implement it. It's obviously not foolproof, but it works pretty damn well. You can even set it up so that your KIDS can edit the whitelist.
-
Re:Well, since I develop trading systems on FOSS
Full disclosure - I am a founder of a startup that develops an open source automated trading platform targeted at institutional investors.
As was mentioned in above postings, there are a series of open source tools available to bootstrap your trading system development:
- QuickFIX and QuickFIX/J (I'm also a developer of the QFJ project) - open source C++ and Java implementations of the FIX protocol, the underlying standard protocol for connectivity between financial institutions. Think of it as the HTTP of finance.
- QuantLib - an open source risk analytics package
- Esper - an open source complex event processing engine
- EclipseTrader - Eclipse-based open source trading GUI that's targeted more at retail investors
- ActiveMQ and AMQP and Qpid for messaging (AMQP standard was initially contributed by JPMorgan)
And then of course there's my company Marketcetera - we build on top of a lot of the tools mentioned above and others (ActiveMQ, MySQL, Ruby on Rails, QFJ, etc) to provide the basic underlying platform that institutional traders (think quantitative hedge funds) can use to build their proprietary algorithms and start trading. After implementing a few trading systems in a row ourselves for various trading firms we realized that there was an obvious need for an open source trading platform so that people wouldn't have to reinvent the wheel and write systems from scratch every single time.
To answer the OP's question about which commercial firms use FOSS: a lot of proprietary trading software is implemented on top of OSS. JPMorgan famously built their trading GUI [PDF] on top of Eclipse, and Progress Apama is built on top of Eclipse RCP as well.
Not surprisingly, most trading applications are very Windows-heavy (although quite a few companies have Linux clusters, and some exchanges run on Linux as well). Most of the apps that your broker will provide for you to trade with are Windows-only (such as Bloomberg, Goldman Redi, MicroHedge, etc), and a lot of the APIs available from vendors are .NET or COM components and nothing else. We implement our systems mostly in Java (including the Eclipse RCP), though we have connectors for some of the Windows-specific components.
We know that flexibility is at the heart of any powerful trading application, and we think the open-source model maximizes the ability of our users to control the application. Some think the open-source model is antithetical to the secretive finance industry, but we see it as the perfect fit.
-
Re:HP
on top of that if they would redo ssl so that you can support host headers that would allow a lot of consolidation of webservices/sites by farm hosters..
That would be RFC 2817, which Apache has supported since version 2.2. Unfortunately, this is unsupported in most browsers.
-
I don't get it
You've been using apache and mysql for years, yet you've been looking for a "real" reason to use linux for years? You had a couple good ones right there, man...
As for the phrase "virtual webserver on your desktop": That means nothing. You want a real (ie, "non-virtual") webserver, because you want to serve files. You don't want a webserver anywhere near your desktop, as apache's not a GUI application. So I suppose the answer is "No, you have to have a real webserver, but you can probably access it via a desktop icon or whatever".
Take a look here: http://httpd.apache.org/docs/2.2/vhosts/
That will show you how to set up a thing called "virtual hosts", which is pretty close to what I think you want. In a nutshell, you can have multiple document roots with various versions of a web site all served by the same apache daemon. You should be able to set it up so that, using .htaccess files, you can test all sorts of server directives, play with mod_rewrite, etc without affecting the other virtual hosts. (Note the "virtual" there: That's apache basically simulating you having multiple servers, when in fact you only have one. That may be close to what you meant.)
You can also set up multiple instances of apache on your machine. Just have them all listen on a different port. The effect is the same, mostly, as having one server with many vhosts.
-B
-
Re:I beg to disagree
This is excellent advice.
I'd supplement it to say (mostly in agreement):
After your core java stuff, get some experience with Spring and Hibernate.
EVERYTHING nowadays uses Spring and Hibernate, in my experience (including our stuff).
Get familiar with the libraries and helpers available on Apache Commons and some of the ex-projects on Apache Jakarta.
In Commons, particularly pay attention to stuff in the easy to miss Lang area, which has huge amounts of handy utilities and helpers.
If you're doing web apps on Java, you'll have to pick a web framework (ick). I can't help much there, as I hate them all equally.
JSP is the devil.
-
Generic Non-Crap List
Considering that Java has been (probably) the most used language for a while, you get a lot of crap. So, here's my "crap filter" list of what you should learn to really hop into the JVM ecosystem.
Books:
1. Effective Java, 2nd edition, by Josh Bloch
This covers most of the twists and turns of the basics that an experienced programmer would need. I wouldn't worry about getting a simpler book.
2. Java Concurrency in Practice
Understanding the JVM model of concurrency is important, and this is the only guide that had a pretty in-depth look into the subject. The Sun documentation absolutely sucks at covering concurrency.
APIs
1. Guice http://code.google.com/p/google-guice/
Dependency injection is the most recent thing that makes Java a very powerful language for building large applications. And Guice is by far the best implementation of DI. (Yeah, you could learn Spring, but I just don't care for it.)
2. Hibernate http://hibernate.org/
I hate Hibernate. But it basically set the standard for EJB3. If you know Hibernate, it's not a very hard road to learn all the other "enterprise" crap.
On the other hand, any substantial server-based solution probably uses an ORM solution like Hibernate.
3. Apache's Commons http://commons.apache.org/ and Jakarta http://jakarta.apache.org/
There is a ton of projects under the Jakarta umbrella these days. The first one to try out is the commons-lang libraries, which provide very easy to use toString, equals, and hashCode implementations that are 'good enough' 99% of the time. Why do you need those? Read Effective Java. :)
Interesting stuff:
1. Hadoop http://hadoop.apache.org/
Hadoop is an open-source implementation of Google's MapReduce idea.
2. Scala http://scala-lang.org/
Scala is my favorite "non-Java" JVM language by far. For me, the scala interpreter is how I learn APIs. In fact, most of my new code is in Scala, not Java.
3. Groovy, JRuby
Just some more used non-Java JVM languages. I've used JRuby a bit, but have moved on to Scala. It's still a significant project, however.
4. Web application frameworks: Wicket http://wicket.apache.org/ + Databinder http://databinder.net/
Wicket is the simplest page-based Web framework I've ever used. I just find it easier to navigate than Rails. If you really want an ORM-based solution, go for the Databinder extensions. Databinder will get you coding in a couple of minutes.
5. Restlet http://restlet.org/
We have several different clusters, and a bunch of machines that need to transfer data around. I learned how to set up a restlet server that was integrated with Guice in a couple of hours, and now, have a very easy means to script together many different servers.
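Item 3 in the list above recommends commons-lang for toString, equals, and hashCode. As a rough illustration of what those helpers save you from getting wrong, here is a hand-written version using only `java.util.Objects` from the JDK. This is a stand-in, not the commons-lang API, and the `Person` class with its fields is hypothetical.

```java
import java.util.Objects;

// Hypothetical value class; the fields are chosen only for illustration.
final class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // equals must be reflexive, symmetric, and reject other types and null.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Person)) {
            return false;
        }
        Person that = (Person) other;
        return age == that.age && Objects.equals(name, that.name);
    }

    // hashCode must use the same fields as equals, so equal objects hash alike.
    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }

    @Override
    public String toString() {
        return "Person[name=" + name + ", age=" + age + "]";
    }
}
```

The point Effective Java makes is that equals and hashCode have to agree with each other, otherwise objects misbehave as keys in a HashMap or members of a HashSet. A helper library mostly exists to keep the two in sync with less typing.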
-
Re:Bollocks.
you win the hairsplitter of the year prize.
Hairsplitter of the year? For using the word "access" in the exact and pretty much only meaning it has on the internet? To be able to access a resource is to be able to download it. "Access control" does not prevent you from rendering content you've downloaded (accessed), it prevents you from downloading (accessing) it in the first place. See, for example Authentication, Authorization and Access Control. Notice the word "access"?
How the fuck is using a word exactly as it is intended, and exactly as it is widely understood by pretty much everyone who has anything to do with internet technologies, hairsplitting?
My Linux PC can access all of the web[0]. Anything that is on the web, including flash files, I can access.
What I do with that access is my own business. I might want to md5 the content in order to write its fingerprint to a database. I might want to try to reverse-engineer the content in order to write my own flash player. I might want to attach it to an email to send to someone else.
Like my PC, the iPhone can access the content and could do any of those things I listed above. (Or it could if md5ers and hex editors were available for it). That it can't render proprietary content which is released by proprietary ISVs who have a terrible record of supporting more than a single platform well should be a surprise to no one.
I'm not an Apple fan. I own none of their products. But I do think that them getting slapped down for Adobe not supporting their platform is a bit fucking harsh.
[0] Well, all of the web I am authorised to access.
-
Re:Um, first question: WTF is MapReduce?
Map-Reduce is definitely a technique related to grid computing, but they are not one and the same.
The most popular (to my knowledge) open source Java library implementing MR is Hadoop.
Here's the algorithm in a nutshell (anyone who knows more than me, please correct, and I'll be forever grateful). I have a bunch of documents and I want to generate a list of word counts. So I begin with the first document and map each word in the document to the value 1. I return each mapping as I do it, and it is merge-sorted by key into a map. Let's say I start with a document of a single sentence: John likes Sue, but Sue doesn't like John. At the end of the map phase, I have compiled the following map, sorted by key:
- but - 1
- doesn't - 1
- like - 1
- likes - 1
- John - 1
- John - 1
- Sue - 1
- Sue - 1
Now begins the reduce phase. Since the map is sorted by key, all the reduce phase does is iterate through the keys and add up the associated values until a new key is encountered. The result is:
- but - 1
- doesn't - 1
- like - 1
- likes - 1
- John - 2
- Sue - 2
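The map, sort, and reduce steps walked through above can be sketched in a few lines of plain Java. This is a single-machine toy, not Hadoop's API: the class and method names are made up for illustration, and the tokenizing regex is an assumption about what counts as a word. Note that Java's natural string ordering puts capitalized words first, so the sorted order differs slightly from the listing above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process sketch of the word-count example.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in the document.
    static List<Map.Entry<String, Integer>> map(String document) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        // Assumption: words are runs of letters and apostrophes.
        for (String word : document.split("[^\\p{L}']+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleImmutableEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Sort the pairs by key, then reduce: walk the sorted pairs and add up
    // values until a new key is encountered.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(pairs);
        sorted.sort(Map.Entry.comparingByKey());

        Map<String, Integer> counts = new LinkedHashMap<>();
        String current = null;
        int sum = 0;
        for (Map.Entry<String, Integer> pair : sorted) {
            if (!pair.getKey().equals(current)) {
                if (current != null) {
                    counts.put(current, sum);
                }
                current = pair.getKey();
                sum = 0;
            }
            sum += pair.getValue();
        }
        if (current != null) {
            counts.put(current, sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String doc = "John likes Sue, but Sue doesn't like John.";
        System.out.println(reduce(map(doc)));
    }
}
```

Because `map` looks at one word at a time and `reduce` looks at one key at a time, either phase could be handed out in chunks to different machines, which is the property the rest of the explanation relies on.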
Simple. Stupid. What's the point? The point is that the way this algorithm divides up the work happens to be extremely convenient for parallel processing. So, the map phase of a single document can be split up and farmed out to different nodes in the grid for processing, which can be processed separately from the reduce phase. The merge-sort can even be done at a different processing node as mappings are returned. Redundancy can be achieved if the same document chunk is farmed out to several nodes for simultaneous processing, and the first one that returns the result is used, the others simply ignored or canceled (maybe they're queued up at redundant nodes that were busy, so canceling means simply removing from the queue with very few cycles wasted). Similarly, because the resulting map is sorted by key, an extremely large map can easily be split and sent to several processing nodes in parallel. The original task of counting words across a set of documents can be decomposed to a ridiculous extent for parallelization.
Of course, this doesn't make much sense to actually do this unless you have a very large number of documents. Or, let's say you have a lot of computing resources, but each resource on its own is very limited in terms of processing power. Or both.
This is very close to the problem a company like Google has to solve when indexing the web. The number of documents is huge (every web page), and they don't have any super computers—just a whole ton of cheap, old CPUs in racks.
At the end of the day, Map-Reduce is only useful for tasks that can be decomposed, though. If you have a problem with separate phases, where the input of each phase is determined by the output of the previous phase, then they must be executed serially and Map-Reduce can't help you. If you consider the word-counting example I posted above, it's easy to see that the result required depends upon state that is inherent in the initial conditions (the documents)—it doesn't matter how you divide up a document or if you jumble up the words, the count associated with each word doesn't change, so the result you're after doesn't depend on the context surrounding those words. On the other hand, if you're interested in counting the number of sentences in those documents, you might have a much more difficult problem. (You might think you could just chunk the documents up at the sentence level, but whether or not something is a sentence depends upon surrounding context—a machine can easily mistake an abbreviation like Mr. for the end of a sentence, especially if that Mr. is followed by a capital letter which could indicate the beginning of a new sentence...which it almost always is. Actually...if you're smart you can probably come up with a very compelling argument that this
-
Re:Perhaps a good addition to data warehousing
The correct project name is Hadoop. It was factored out of Nutch 2.5 years ago. And Yahoo has been putting a lot of effort to make it scale up. We run 15,000 nodes with Hadoop in clusters of up to 2,000 nodes each and soon that will be 3,000 nodes. I used 900 nodes to win Jim Gray's terabyte sort benchmark by sorting 1 TB of data (100 billion 100 byte records) in 3.5 minutes. It is also used to generate Yahoo's Web Map, which has 1 trillion edges in it.
-
Re:Not surprising....
Sure there are a few specialized proprietary distributed databases written from the ground up - Google, Amazon, EBay, Yahoo, and the like
And a few open ones. Take a look at CouchDB and Hadoop.
but no, dealing with very large databases does not scale well by throwing commodity hardware at it.
Assuming you're talking about, say, SQL databases, there's always sharding. You can even find proxies which will do it for you, without having to touch the app.
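For anyone unfamiliar with sharding, the core idea is just routing each key to one of several databases. A minimal sketch, assuming simple hash-modulo routing; the `ShardRouter` class and the shard names are hypothetical, and a real sharding proxy does far more (query parsing, rebalancing, failover).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical router: picks a shard for a key by hashing the key.
class ShardRouter {
    private final List<String> shards;

    ShardRouter(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shards = new ArrayList<>(shards);
    }

    // floorMod keeps the index non-negative even when hashCode() is negative.
    String shardFor(String key) {
        int index = Math.floorMod(key.hashCode(), shards.size());
        return shards.get(index);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("db-0");
        names.add("db-1");
        names.add("db-2");
        ShardRouter router = new ShardRouter(names);
        System.out.println(router.shardFor("user:42"));
    }
}
```

The catch with plain modulo routing is that adding or removing a shard remaps most keys, which is why larger setups tend to use consistent hashing or a lookup table instead.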
-
Re:Wiki was obviously wrong...
Yes, yes it is. MapReduce sends redundant requests. If a machine dies, the query still succeeds. Take a look at Hadoop; it's the same idea as Google's MapReduce and has been recommended by several Googlers.
-
Re:Bjarne
He continues..
..What would the world be like without Google?... Only C++ can allow you to create applications as powerful as MapReduce which allows them to create fast searches.
I totally agree. If Java ( or Python etc. for that matter ) were fast enough why did Google choose C++ to build their insanely fast search engine? MapReduce rocks.. No Java solution can even come close.
I rest my case.
Apparently you've not heard of Hadoop which is written in Java and, unlike Google's MapReduce, actually available to people outside of Google today.
-
Re:Licenses for technology
-
It might be helpful to point some of it out
Up until recently I'd had a similar opinion. Then I started work on a new project and began noticing all these interesting technologies.
Some exciting technology is being developed using Java. Check the trove.
-
How about Ant?
I've had several backup/maintenance schemes set up at home over the years, often spanning Windows and Mac machines. Being a Java guy, I found Apache Ant to be a really good tool in place of shell or batch scripts. It's cross-platform, and it's quite extensible. And of course Java has pretty comprehensive cryptography API. And it wouldn't be much to wrap it all in a decent GUI, if that strikes your fancy or you want to roll it out to someone else to use, who's not comfortable with the lower level stuff.
-
Re:Here are some things to test
Second, send yourself messages from multiple outside sources with the GTUBE string. This string is meant to trigger SpamAssassin so that it guarantees the message is marked as spam. Other filter systems respond to it as well. So you'll be able to tell if the message came through or not. http://spamassassin.apache.org/gtube/
That would cause problems if Google renamed YouTube to GTUBE.
-
Here are some things to test
Although the most likely scenario is botnet shutdowns, here are some steps you can try if you still suspect some new filtering is in place:
- First check your message headers to see if there's anything new in there. If your ISP, webhost, or other intermediary is filtering, you'll probably see something in there indicating the messages as clean/safe and what filter marked them as such.
- Second, send yourself messages from multiple outside sources with the GTUBE string. This string is meant to trigger SpamAssassin so that it guarantees the message is marked as spam. Other filter systems respond to it as well. So you'll be able to tell if the message came through or not. http://spamassassin.apache.org/gtube/
- Third, if you're really ambitious, try forging an IP address to send yourself some messages from IPs on the major known blacklists. This should confirm whether some filter is doing blacklist filtering, as some mail delivery systems (e.g. Postfix) can do blacklist filtering without the need for an additional tool like SpamAssassin.
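For the second step, here is a small Python sketch that builds a GTUBE test message. The addresses and subject line are placeholders; the GTUBE string itself is the 68-byte one published on the SpamAssassin page linked above, and it needs to appear in the body unbroken.

```python
from email.message import EmailMessage

# The 68-byte GTUBE test string published by the SpamAssassin project.
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"


def build_gtube_message(sender, recipient):
    """Build a plain-text message whose body carries the GTUBE string on its own line."""
    msg = EmailMessage()
    msg["From"] = sender          # placeholder address supplied by the caller
    msg["To"] = recipient         # placeholder address supplied by the caller
    msg["Subject"] = "GTUBE filter test"
    msg.set_content("This is a filter test.\n\n" + GTUBE + "\n")
    return msg
```

To actually send it you would hand the message to smtplib, e.g. smtplib.SMTP(host).send_message(msg), from each outside account in turn, with host being whatever outbound server that account uses. If the message arrives untagged, nothing SpamAssassin-like is sitting in the path.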
-
Re:I thought only Windows did this:
Because of the number of legacy servers on the web (e.g. those that serve all files as text/plain)
If you missed it, that was a thinly-veiled jab at Apache. Check out Bug #13986. You know you aren't doing well when an author of the HTTP 1.1 specification shows up on your bug tracker to post a "WTF?" comment :).
-
one-time vs. revenue stream
Retarded management is the problem in your scenario, not donations from Microsoft.
I don't share your faith in the infallibility of ASF administrators.
Furthermore, I think you should check out the sponsorship page at the ASF's website. Becoming a sponsor is a commitment to ongoing support, not simply a one-time payment. One-time donations to the ASF are handled through a separate mechanism, without the public fanfare associated with the sponsorship program.
It seems fairly clear that Microsoft's "sponsorship" is, in fact, supposed to be a revenue stream for the ASF.
So, what part of the management's actions here are retarded? Correctly categorizing a promise of ongoing financial support? Hiring new workers, purchasing new equipment, and purchasing bandwidth contracts that are appropriate for the new budget?
It seems to me that the poor decision was to accept sponsorship from an organization whose interests are so obviously not aligned with the ASF and Free software in general.
-
Re:XHTML and CSS
And check out Apache FOP for something up the same alley, but FOSS
-
Re:XSL-FO?
Is there a free implementation for rendering XSL-FO? Using "optimal" formatting (e.g. Knuth-Plass)?
Yes (Apache fop), and... maybe? I can't find a definitive answer, but there is this:
http://wiki.apache.org/xmlgraphics-fop/KnuthsModel
How long is "Hello World"? B/c IIRC XSL-FO is very verbose (not just because of XML, but the language design).
You have to write a "master" for each page type, but it's not that bad:
http://www.renderx.com/tutorial.html#Hello_World
Non-trivial documents do get big fast, though.
How much boiler plate do I have to put up to write a document conforming to ACM article standards? Bibliography management? Etc?
Two Imperial Assloads. I'm guessing. But I really don't know for certain.
I was having a lot of trouble coaxing plain TeX to do what I wanted, and Unicode was the straw that broke the camel's back in that case. Ease of installation of the document processing system was something to be considered, and Apache FOP is a trivial install.
What I have now is an XML processor written in Python (it used to be XSLT, but I'd had enough of that after a while) that munges my XML code into XSL-FO, and then fop produces PS and PDFs. All the contents and index are generated by the Python processor. (fop doesn't support the XSL-FO 1.1 indexing stuff--at least it didn't the last time I looked--so options are limited and nasty for eliminating duplicate page numbers in the index.)
However, for my needs, it works just fine. (I want to quickly produce A4/US Letter 1-/2-sided from a single source document.) But my typesetting needs are simplistic compared to those of math- and layout-heavy users.
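To give a feel for the "master for each page type" boilerplate, here is a minimal Python sketch that emits a Hello World XSL-FO document. It is not the processor described above, just an illustration; the master name "A4" and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name for an XSL-FO element."""
    return "{%s}%s" % (FO_NS, tag)


def hello_world_fo(text="Hello World"):
    """Build a one-page XSL-FO document containing a single block of text."""
    root = ET.Element(fo("root"))

    # One page master per page type; "A4" is an arbitrary name chosen here.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4",
        "page-width": "210mm",
        "page-height": "297mm",
    })
    ET.SubElement(page, fo("region-body"))

    # The page sequence refers back to the master by name.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    block = ET.SubElement(flow, fo("block"))
    block.text = text

    return ET.tostring(root, encoding="unicode")
```

Write the result to a file and something like "fop -fo hello.fo -pdf hello.pdf" should turn it into a PDF, assuming a stock FOP install.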
-
XSL-FO?
Let the hate commence. Anyway:
XSL-FO is another markup language, but there's a good bit going for it, not the least of which is an application that renders it directly to PDF: http://xmlgraphics.apache.org/fop/
The main good thing about FO is the ability to take advantage of related XML technologies to help you generate the documents (and the various tools that you can use to generate them). You can embed SVG diagrams and MathML if you're comfortable with the namespaces; FOP can definitely render SVG via Apache's Batik project (which is also very good) and I'm pretty sure will also render inline MathML via an optional plugin. A lot of people mentioned OpenOffice, and the cool thing there is that since the documents it generates are XML documents (I'm pretty sure its equation editor emits MathML), you can use XSLTs to transform the documents that it generates into XSL-FO documents for rendering.
The obvious missing feature is the WYSIWYG app, but you'll find a bunch of links at the W3C's XSL-FO site.
Anyway, like I said, let the XML hate commence.
C
-
Re:Mac OS X ...Server?
You're holding some pretty old grudges.
- Seriously, HFS? The one deprecated in 1998 by HFS Plus and that Mac OS X can't boot from?
- Most, if not all, of the rsync woes (of which, I admit, there were many) were fixed years ago
Also, judging by the nature of your problems, I feel like you were severely underqualified to be running any sort of computer network. I mean come on:
- How could you both know what a dhcp server is and enable Internet Sharing without being able to debug that you had just hosed everyone on your network? There's even a warning when you enable it, telling what will happen if you've hooked it up to the wrong network.
- You host web sites on the de facto, open-standard server but are unable to accommodate file system case sensitivity (do you not have control of your shift key, the files on the system, OR the code running the site?)
- Your claims are so vague that you sound like someone defending himself from incompetence.
I think you knew just enough to be dangerous and couldn't take it when you were criticized for your actions.
(And you'll just get insults if you mention it here on /. ;-)
Yes, if you're pompous and self-righteous but ultimately wrong.
-
"Membership" Does Not Apply
You can't "buy" a membership in the Apache Software Foundation, and corporations cannot become members. As has been blogged elsewhere, El Reg has its terminology wrong on this one.
Microsoft has agreed to a platinum level sponsorship of the Apache Software Foundation. If you browse to the page, you'll see that the benefits of sponsoring, even at that level, consist of a logo and a press release.
You can't buy a membership in the ASF. The only way to influence the ASF is to show up and talk code. Anyone can join the mailing lists and start contributing patches, and everyone who contributes a substantial amount of code signs a license agreement to clear the IP. If folks contribute code of consistent quality, they become committers. As they show that their interest in the project surpasses their day-to-day circumstances (like affiliation), they are invited to the Project Management Committee. Show that you have the interests of the foundation at heart, and you'll likely be invited to become a member and get to vote in board elections. That's how it works. Membership can be earned, but not bought.
-- Sander Temme - Member, Apache Software Foundation
-
Dynamic pages pollute count
There are so many dynamic pages on the net now that one web site, like slashdot as an earlier poster commented, can contain literally millions of pages. People use programs like modrewrite, isapirewrite and linkfreeze to manipulate spiders into crawling pages that are near identical. For more than one customer I've made meta, title and content randomization, serialization and/or URL rewriting schemes to make damn sure spiders index every possible dynamic page, and it works. I have a single dynamic page that must have been indexed hundreds, maybe thousands of times with slightly different content, and they are all in the index.
Google tries to detect a dynamic page by looking for ampersands and equal signs, as well as looking at the content of the page, but it is really quite easy to fool.
e.g.: http://somesite.com/itemlist.php?listmode=1&category=beds&orderby=7
when 'rewritten' shows up as
http://somesite.com/items/1/beds/7.html
So 1 billion web pages could be, and I know a few thousand pages like this, just a few hundred thousand dynamic pages. Not that the pages don't have relevant information, some of the stuff can be redundant though. For instance, when the spider crawls across "Records per page = 10" > "Records per page = 20" > "Records per page = 30" etc.. or when lazy programmers don't use cookies and databases to store information but try and concatenate the URL with the user's selections. Thank god for that GET limit. People need to use POST!
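The rewrite in the example above boils down to one pattern match. A rough Python sketch of the mapping, purely for illustration: the real thing would live in the web server's rewrite rules, and the pattern and function name here are my own.

```python
import re
from urllib.parse import urlencode

# Matches the "static-looking" form /items/<listmode>/<category>/<orderby>.html
PATTERN = re.compile(r"^/items/(\d+)/([^/]+)/(\d+)\.html$")


def rewrite(path):
    """Map a rewritten path back to the dynamic URL behind it, or None if it doesn't match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    listmode, category, orderby = match.groups()
    query = urlencode([
        ("listmode", listmode),
        ("category", category),
        ("orderby", orderby),
    ])
    return "/itemlist.php?" + query
```

The spider only ever sees the .html form, with no ampersands or equal signs to give the game away.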
If someone knows how to stop this message board from creating links out of false URLs please, let me know.
-
Re:This is a move against Linux...
Could someone mod that guy insightful? I think someone finally found out what's cooking here.
The ASF is much, much more than Apache httpd. Consider Apache POI, for example.
-
Re:The Register is not credible
So my response is: wait for an announcement elsewhere.
You mean like the announcement currently at the top of Apache's homepage?
-
Re:Not really that "predictive".
First thing is to upgrade the version of SA you're using, then configure it better (install good rule-sets), train bayes, in that order.
I have accounts on servers who have different policies/versions, and have experienced no (important!) false positives on one and had to whitelist on the other.
Convincing the people sending SPAM-ish looking mail to do otherwise could also help, rather than just accepting it:
-
Re:I am with Bjarne on this one.
Actually...
The largest "real, in-use" Hadoop cluster that Yahoo! has is around 2000 nodes, counting a dedicated name node. As far as we're aware, we've got the largest Hadoop cluster. [If there is a bigger one, we'd love to talk to you and compare notes. :) ]
That said, we do have Hadoop running on tens of thousands of machines. Just not as one big cluster.
It is also worth pointing out that most of our clusters are multi-user, multi-application. The number of nodes is really more indicative of the size of the Hadoop distributed file system than the number of nodes given to a particular application in our (grid team's) use case.
There is a lot more about Hadoop, and Yahoo!'s particular Hadoop usage for internal utility-type computing, at http://wiki.apache.org/hadoop/HadoopPresentations .
-
Re:Why are you expecting this?
Check out Wicket, been using it for more than a year and absolutely love it. I have managed to do some pretty slick javascript based UI, without touching a single line of javascript.
All the templates are done in HTML and code is in Java. Can't get any simpler than that.
-
Java/Apache heavy?
Is it just me, or is this survey extremely Java heavy?
Not only that, but there are a good number of Apache projects in particular... Apache Tomcat, Apache Geronimo, Apache Derby, Apache Struts...