Domain: apache.org
Stories and comments across the archive that link to apache.org.
Comments · 2,937
-
Re:Open Standards != Open Source
Most web server software has always been gratis.
The first major web server was the public domain httpd by NCSA.
It was later supplanted by apache, which is a play on words of the patches layered on top of httpd.
http://httpd.apache.org/ABOUT_APACHE.html
According to Wikipedia:
Since April 1996 Apache has been the most popular HTTP server software in use.
Before April 1996, that title belonged to NCSA httpd.
Mosaic and lynx were free and available before Netscape. Even IE and IIS were gratis after a fashion: if you used Windows (which most did), they were bundled with the OS. If you didn't use Windows, you couldn't use IE or IIS anyway, and there were free browsers for other systems.
If you look at a time line of web browsers, you will see there was never a time when there weren't multiple, competing gratis browsers. http://en.wikipedia.org/wiki/Timeline_of_web_browsers
Yes, there were a few years when Mozilla (the old pay version) and IE (which is only sort of gratis) dominated as the best browsers, but there were always other options.
The innovation on the server side is just as if not more important to the growth of the ecosystem, and for all the history that matters, web servers have been gratis.
The web was NOT built on commercial software. There was a very limited time when commercial, non-gratis software dominated desktop graphical viewing software, but that was a limited subset of the ecosystem, came after the creation of the system, and only lasted for a very short period of time.
H.264 patents expire in 2028. There has never been a royalty-bearing standard that has survived on the web for that amount of time, and to allow one now will limit innovation on the web for years to come.
-
Re:Not a "first NASA's Open Source release"
Despite what http://oodt.apache.org/ says, this is not NASA's first "Open Source" project, as Worldwind is older (2007). However, OODT may well be NASA's first Free Software project because Worldwind was released under a non-free license, whereas this is under the Apache 2 license. Yes, you read correctly. NASA has its own NOSA license (NASA Open Source Agreement - yuck), which is not a free software license by the standards of the FSF, but is nonetheless approved by OSI (which makes it officially "Open Source"). Look it up: http://en.wikipedia.org/wiki/NASA_Open_Source_Agreement
Paxcoder,
You are 100% correct, the site should read this is the first NASA Project @ Apache. Good catch, I will be sure to get the project team to make the correction.
-Cameron
-
Not a "first NASA's Open Source release"
Despite what http://oodt.apache.org/ says, this is not NASA's first "Open Source" project, as Worldwind is older (2007). However, OODT may well be NASA's first Free Software project because Worldwind was released under a non-free license, whereas this is under the Apache 2 license. Yes, you read correctly. NASA has its own NOSA license (NASA Open Source Agreement - yuck), which is not a free software license by the standards of the FSF, but is nonetheless approved by OSI (which makes it officially "Open Source"). Look it up: http://en.wikipedia.org/wiki/NASA_Open_Source_Agreement
-
Re:Once it was said:
Ok, I understand you don't like Windows. However, when you say things like "in some cases, has lots of issues with parts of those products); namely Apache, MySQL, PHP, Perl, etc (which semi-fully function at a big speed decrease over a LAMP or WAMP stack or Apple implementation."
Do you know what WAMP means?
Yes, I should have said AMPOS2. WAMP used to be Warp Server and AMP, but then was co-opted for Windows Server and AMP. Sadly, though OS/2 had it (AMP) long before Windows, it's a small enough niche market that no one noticed or cared when they decided to call the Windows AMP stack WAMP.
As for the issues running AMP on Windows, just check the release notes for the workaround, outstanding issues and so on for each component of AMP. Here's a page with some of the notes on just the Apache issues: http://www.apache.org/dist/httpd/binaries/win32/
-
Ben Collins-Sussman blog post
[Note: The summary's second link seems to be getting slashdotted, so I'm copying its contents to a comment here. The words are not my own.]
This entry was posted by Ben Collins-Sussman on Monday, 3 January, 2011
Author’s Note: These opinions are my own. I'm one of the original folks that started the Subversion project, but no longer work on it. These thoughts do not reflect the official position of either the Subversion project or the Apache Software Foundation, which are located here on the ASF blog.
Subversion has reached the realm of Mature software — it’s yesterday’s technology, not cool or hip to work on anymore. It moves slowly. It is developed almost entirely by engineers working for corporations that need it or sell support for it. Alpha-geeks consider software like this “dead”, but the fact is that something like half of all corporate programmers use Subversion as their SCM (depending on which surveys you read.) This is a huge userbase; it may not be sexy, but it’s entrenched and here for the long haul.
Subversion isn’t unique in this position. It sits alongside other mature software such as Apache HTTPD or the GCC toolchain, which are famous projects that are similarly developed by corporate interests. There’s a tricky line to walk: none of these corporations “own” these projects. They understand that they’re acting as part of a consortium. Each interest sends representatives to the open source project, contributes code, and allows their engineers to participate in the full consensus-based evolution of the software. IBM, Apple, Google, and numerous other companies have figured out how to do this correctly:
- 1. Let your engineers know what’s important to work on.
- 2. Let them participate individually in the community process as usual.
- 3. Profit. 98% of the time the corporations eventually get the features they want.
Today, however, we have a great counterexample of how not to participate in an open source project. Subversion was initially funded and developed by CollabNet; today at least two other companies — Elego and WANdisco — are employing numerous engineers to improve Subversion, and are just as vested in selling support and derivative products. CollabNet and Elego continue to function normally in the community, but WANdisco recently seems to have lost its marbles. Last week, they put out a press release and a CEO blogpost making some crazy statements.
It’s clear that the WANdisco CEO — David Richards — is frustrated at the slow pace at which Subversion is improving. But the two posts are simply making outrageous claims, either directly or via insinuation. David seems to believe that a cabal is preventing Subversion from advancing, and that “debate” is the evil instrument being used to block progress. He believes users are crying for the product to be improved, that the Subversion developers are ignoring them, and his company is now going to ride in on a white horse to save the project. By commanding engineers to Just Fix things, he’ll “protect the future”of Subversion, “overhauling” Subversion into a “radical new” product.
Is this guy for real? It sounds like someone read my friend Karl's book and created a farce of “everything you’re not supposed to do” when participating in corporate open source.
Even weirder, he’s accusing developers of trying game statistics by creating lots of
-
Re:Trivial if you want to go the extra mile
How much spam actually is originating through gmail?
Sorry, I can't give you data. Suffice it to say it's a problem.
How does one prevent a spammer from spoofing these headers?
The headers aren't spoofed. When you use Hotmail or Yahoo, your IP is added to a tracking header by the webmail server so that IP reputation systems can pass along the blame as if it were a Received: header (there's more to it than that, but this should give you the principle). Since GMail doesn't do that, there's nothing to be done; the tracking can't go beyond Google's servers.
If a spammer spoofs headers so as to pretend to pass blame on, the trust doesn't extend far enough; the relay used by the spammer to add those fake headers isn't trusted and so the buck stops there. When dealing with real webmail providers, the trust can be extended to the established webmail relays and then followed into the IP tracking header.
We have meandered a bit off topic here ... my point is that this is possible for the nearly identical problem of webmail, so somebody merely needs to figure out how to do it for the IPv6->IPv4 routing process. The simplest solution is the one I outlined above; require a mail relay that speaks both protocols so it can properly record the conversion with a Received header. Modern IP reputation systems (and the clients that poll them) are fully IPv6-ready and will process this perfectly.
-
Re:In addition by how much?
SSL uses strong cryptographic encryption, which necessitates a lot of number crunching. When you request a webpage via HTTPS, everything (even the images) is encrypted before it is transferred. So increased HTTPS traffic leads to load increases. Why does my webserver have a higher load, now that it serves SSL encrypted traffic?
All servers will display an increased load how ever
In my experience, servers that are heavy on dynamic content tend to be impacted less by HTTPS because the time spent encrypting (SSL-overhead) is insignificant compared to content generation time. HTTP vs HTTPS performance
yet with web 2.0 type stuff with lots of ajax
Many, very short sessions means that handshaking time will overwhelm any other performance factors. Longer sessions will mean the handshaking cost will be incurred at the start of the session, but subsequent requests will have relatively low overhead. HTTP vs HTTPS performance
all of the connections are going to kill you. The short answer is 10-20% but YMMV. No idea what happens when you start adding in Google adsense and other third party crap.
-
hbase is an option to NoSQL and Cassandra.
I recently read that someone moved their large operation from Cassandra to Hbase, a hadoop file system. http://hbase.apache.org/
HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.
HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase includes:
Convenient base classes for backing Hadoop MapReduce jobs with HBase tables
Query predicate push down via server side scan and get filters
Optimizations for real time queries
A high performance Thrift gateway
A REST-ful Web service gateway that supports XML, Protobuf, and binary data encoding options
Cascading, hive, and pig source and sink modules
Extensible jruby-based (JIRB) shell
Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX
HBase 0.20 has greatly improved on its predecessors:
No HBase single point of failure
Rolling restart for configuration changes and minor upgrades
Random access performance on par with open source relational databases such as MySQL
-
Re:Dictionnary attack doesn't show any weakness
He exploited the "is fast to calculate" weakness.
Clearly, we need hash functions which take long amounts of time to compute.
You're being facetious, but this is basically what the apr1 algorithm used in the Apache webserver does. It's a modified variant of MD5, where the hashing step is repeated 1000 times in order to slow down the creation of dictionary hashes:
/*
* And now, just to make sure things don't run too fast..
* On a 60 Mhz Pentium this takes 34 msec, so you would
* need 30 seconds to build a 1000 entry dictionary...
*/
for (i = 0; i < 1000; i++) {
    apr_md5_init(&ctx1);
    ....
from apr_md5.c, line 608
I don't know whose bright idea that was... the comment about the speed of this routine on a 60 MHz CPU speaks for itself. But regardless of how effective such "improvements" are, we're now stuck with this algorithm if we want to support the password hashes used in conjunction with .htaccess files, for example.
CJ
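To make the "repeat the hashing step" idea concrete, here is a minimal sketch of iterated hashing in Python. It is not the apr1 algorithm: the function name, the use of SHA-256 instead of MD5, and the way salt and password are mixed in each round are all assumptions chosen for illustration. The only point it shows is that each extra round multiplies the cost of building a dictionary of hashes.

```python
import hashlib


def stretched_hash(password: bytes, salt: bytes, rounds: int = 1000) -> str:
    """Illustrative key stretching: re-hash the digest `rounds` times.

    Hypothetical helper, NOT apr1. It only demonstrates that the work per
    password guess grows linearly with the number of rounds.
    """
    digest = hashlib.sha256(salt + password).digest()
    for _ in range(rounds):
        # Feed the previous digest back in together with salt and password.
        digest = hashlib.sha256(digest + salt + password).digest()
    return digest.hex()


if __name__ == "__main__":
    print(stretched_hash(b"hunter2", b"somesalt"))
```

For real password storage, a vetted construction such as `hashlib.pbkdf2_hmac` from the Python standard library is a better starting point than a hand-rolled loop like this one.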
-
Re:Why does "no JCP" == "no Java"?
2) Oracle will fail to keep control of Java, and everyone will end up using Java as maintained by an open source group (probably Apache). Ultimately something like the JCP will still be needed to keep enterprise involved in Java.
Dear lord, I hope it's not Apache. Any organization that allows new versions of its libraries to target Java 1.4 when 1.5 has been out for over half a decade does NOT deserve to be put in charge of the entire language. Particularly when said library deals with Collections.
As a developer that works primarily in Java, I'm a bit worried. If my company sees Java as being a risk we might end up moving over to .NET, and I just detest the documentation and library design of that platform.
There are some things .NET does better than Java. This includes: Generics, Properties, GUI, Web Services (although JWS is a marked improvement over Axis 1/2, and brings this more in line with .NET's WCF). This does not include: Database access (although the .NET Entity Framework may address much of this; from what I've seen it's a lot like JPA), Concurrent collections. Granted, that list is just things I can think of off the top of my head.
-
Re:PolicyNodeImpl.java is from the Android TEST tr
The headers haven't only been removed - which is a GPL violation by itself - there's a *new* header:
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
This is a blatant copyright violation, because you can't re-licence GPL code as Apache.
-
Re:Blame Sun, not Oracle
Guess you didn't read the full open letter but only the FAQ.
so go on and read
http://www.apache.org/jcp/sunopenletter.html
first.
The FAQ is right because you cannot claim 'Java compatibility' without passing the full TCK and paying license fees. That's one of the reasons why Google only made a 'Dalvik' VM and not a 'Java' VM.
-
Re:Change this to an inflammatory title
I don't think that's right; this isn't about GPL vs. the Apache license. The issue isn't the licensing of OpenJDK itself, but about the licensing of the Java Technology Compatibility Kit (the JCK), which is used to test if an implementation is compatible with a given version of the Java spec. The JCK isn't available under an open source license at all. If the JCK were under the GPL, or even if it were under a license that didn't permit you to modify it, but only permitted anyone to run it, then Apache could use it to test their Java implementation, which is what they want to do.
-
Re:Sometimes you need real hardware
FYI: SSL-based name virtual hosting (using TLS) has existed for quite a while now, and is called SNI (Server Name Indication). You don't really need an IP per host anymore. For Apache, see here: http://wiki.apache.org/httpd/NameBasedSSLVHostsWithSNI
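A small sketch of why SNI removes the need for one IP per host: the client puts the requested hostname into its very first handshake message, so the server can pick the matching certificate before any HTTP is spoken. The Python below uses the standard `ssl` module with in-memory buffers, so no network is involved; the function name `client_hello_bytes` and the hostname `example.org` are illustrative assumptions.

```python
import ssl


def client_hello_bytes(hostname: str) -> bytes:
    """Return the raw first handshake message a TLS client would send.

    Hypothetical helper for illustration: the handshake is started against
    in-memory buffers, so it stops as soon as the client needs a reply.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming = ssl.MemoryBIO()   # data "from the server" (stays empty here)
    outgoing = ssl.MemoryBIO()   # data the client wants to send
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its ClientHello and now waits.
        pass
    return outgoing.read()


if __name__ == "__main__":
    hello = client_hello_bytes("example.org")
    # The server name travels in the ClientHello as plain ASCII.
    print(b"example.org" in hello)
```

On the server side, a name-based virtual host setup reads that name and selects the certificate accordingly, which is what the linked Apache wiki page configures.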
-
We need distributed index storage.
We have an open source search engine : http://nutch.apache.org./ But we need a distributed index storage system that is uncensorable and/or trackable. Do we have that?
-
Re:Evil which ever way you look at it
First of all, it is unlikely that Google actually programmed that part. It was probably part of Harmony.
Combined with the fact that Google used a Java decompiler to obtain the code, and the version they had decompiled was not even OpenJDK, this stinks in a major way. It's not just your patent lawsuit, and not even copyright infringement - it's blatant plagiarism.
-
Re:Here we go again (SCO)
The other Anonymous is absolutely correct. Get a grip. The continuous stream of completely bogus performance claims coming from the Java camp only serves to completely undermine Java.
Also claiming that C coders are just shit doesn't really help the cause.
I never made such a claim. You're trolling.
.. high performance computing applications written in Java have recently won benchmark competitions. In 2008 and 2009, Apache Hadoop, an open-source high performance computing project written in Java was able to sort a terabyte and petabyte of integers the fastest.
To wit:
I'll bet that this is an I/O bound benchmark - by design. The hdfs (Hadoop's distributed filesystem) is an abstraction layered atop native (OS-embedded) filesystems. These native filesystems are implemented as optimized, native code. These native filesystems are always (originally) written/tuned in native C (and optimized by C compilers).
There may be significant performance overhead/latencies in the TCP/IP stack (processing) too. TCP/IP drivers are implemented as optimized, native code (even if off-loaded to a fancy TCP/IP Off-load engine - TOE). All of this is originally written/tuned in C as well.
In addition, Hadoop's designers choose to employ native compression libraries - for 'performance reasons' (and due to non-availability of Java alternatives). See http://hadoop.apache.org/core/docs/current/native_libraries.html
So yes, this benchmark manages to use a dollop of Java code to spread the heck out of some relatively simple 'record-sort' benchmark logic. A lot of the run-time observed is system time - not user time. The JVM hardly matters. In this benchmark, it just runs a dollop of logic that spreads around the sort work - to lots of commodity class servers (running Linux). On each server, that logic drives lots of native/C code - in parallel. By ganging thousands of such servers (versus mere hundreds of such servers in the previous record-setting attempt), this new benchmark effort manages to set a world speed record. Sure. So what?
This same result could have been achieved with Python ... or Perl ... or Ruby ... or whatever. This is really just a demonstration of the power of distributed, parallel 'sorting' - irrespective of the implementation languages involved. It is hard to imagine that the modicum of top-level logic is anywhere close to being the rate limiting step. Therefore, this benchmark says little about the performance of Java - or any other scripting language that might have been employed to this same end.
This isn't about Java (performance). This is about the performance advantages of the distributed, map-reduce algorithm. It hardly matters whether this distributed, map-reduced sorting benchmark was implemented in JAVA, C/C++, Python, VB, Ruby, Perl or whatever! Let's not miss the whole point.
Cheers, Steffen.
So basically this has nothing to do with Java. The fact it was written in Java is barely worth mentioning. What was novel was the implementation, which could have been in any number of languages and in fact, the language choice would likely have zero or very, very little impact on the overall benchmark.
And that's the trouble with everyone constantly pushing the idea that Java is uber fast. Pragmatically it's not. Period. It is, however, faster than many language alternatives. It can occasionally come close to C and/or C++ in some obscure corner cases. But without a doubt, if you have some Java code which runs 5x faster than C code, the C code is absolute crap and the differences likely stem more from implementation details or improper benchmarking.
Just because Java is not as fast as C/C++ does not mean it's useless. If that were true, languages such as Lua, Python, Perl, Ruby, and so on, would simply not exist. For a large number of use cases, fast enough is certainly fast enough. Just the same, constantly making completely bogus claims about Java's pragmatic performance only serves to derail the language as a whole.
-
Re:Here we go again (SCO)
Well, this one example does look pretty copied. Sun wrote sun.security.provider.certpath.PolicyNodeImpl - notice it's fully documented, and authored by Seth Proctor and Sean Mullan. It's not part of the standard Java library, it's part of the JVM's private implementation, and was released later on as part of OpenJDK.
Google have exactly this same code, minus the comments in their copy of Apache Harmony, but it's not in the official Apache Harmony, at least since 2005 (Don't believe me? Run svn log -v http://svn.apache.org/repos/asf/harmony | grep PolicyNodeImpl).
However, this code isn't central to Android. It's part of the test suite, it doesn't run on any phones.
On the one hand - this looks like a cut and dried infringement. On the other hand, it's a pretty trivial part of the project. Is that the best Oracle can find? If it is, then it's on a par with SCO holding up malloc.h as the "smoking gun".
-
Ant instead of shell scripts
The gist of the article seems to be that for many tasks people should combine the powerful Unix standard commands like find, grep, xargs, sed, etc. instead of writing dedicated programs in lower level languages such as Ruby, Python, Java etc. This idea is not new, and many of the people around here heard it 15 or more years ago. Being a developer, I always liked the prospect of having to write less code.
However, the Unix command line and shell script approach never really worked for me, especially if other people in the team wrote them. The main reasons for that are:
- missing error handling (no checking for "$?", broken pipes, ...)
- lack of consideration for special cases such as file names with blanks in them
- difficult post-mortem analysis if the data causing an error got lost in a pipe instead of being available in an intermediate or temporary file
- possible configuration nightmare to get non-ASCII characters working (depending on the actual platform you're on; it can be easy, too)
- terse syntax with a tendency to "write only" code (which makes sense for a direct input command line but less so for code that should be maintained for years to come)
All of this could be overcome by measures such as checking $?, redirecting stderr, using temporary files, configuring encodings properly, documentation comments and so on. However, this rarely ever happens in practice.
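As a rough illustration of those measures (checking the exit status, capturing stderr, and keeping each stage's output in an intermediate file for post-mortem analysis), here is a minimal Python sketch. The helper name `run_step` and the file names are assumptions made up for this example; the commands are passed as argument lists, which also sidesteps the blanks-in-file-names problem.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_step(args, out_path):
    """Run one stage, keep its output in a file, and fail loudly on error.

    Hypothetical helper: `args` is an argument list (no shell quoting
    involved) and `out_path` is where the stage's stdout is preserved.
    """
    with open(out_path, "wb") as out:
        result = subprocess.run(args, stdout=out, stderr=subprocess.PIPE)
    if result.returncode != 0:
        message = result.stderr.decode(errors="replace").strip()
        raise RuntimeError(f"{args[0]} exited with {result.returncode}: {message}")
    return Path(out_path)


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    # Stand-in for a real command; the intermediate file survives a later failure.
    listing = run_step([sys.executable, "-c", "print('my file.txt')"],
                       workdir / "step1.txt")
    print(listing.read_text())
```

Because every stage writes to its own file and raises on a non-zero exit status, a failure stops the script at the stage that caused it and leaves the earlier output behind for inspection.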
For the past couple of years I have been using ant for many tasks formerly delegated to shell scripts. Its main advantages are:
- provides many standard tasks to copy/move/delete files, search and replace in files, filter files, download files, send mails etc.
- provides many ways to limit commands only to certain files depending on name, date, contents etc.
- most tasks fail on encountering any error and consequently terminate the whole script (though this can be disabled for a certain task if need be)
- generic <exec> task to execute shell commands in case ant does not provide a standard task; you have to be careful with this one though and set failonerror="true" or it will continue even if it fails
- pretty legible due to using english words instead of abbreviations for most things
- many simple typos are already detected when ant parses your script and not only when a task gets executed.
- platform independent syntax for file paths so your script can work on Unix and Windows.
- takes care of all escaping and non-ASCII issues with files names.
Of course it's not perfect. For example, it uses XML and consequently contains some syntactic noise, it lacks advanced string operations, there are no pipes and sometimes seemingly trivial things result in a lot of messing around with properties. Nevertheless I rarely see a need to write shell scripts anymore except for simple launchers. YMMV but despite ant initially being a build tool for Java developers, we use it for many sysadmin-like tasks with great success and a small amount of development time.
-
Re:Linux has the same drag as Mac in business
Couchdb is a good start. http://couchdb.apache.org/docs/overview.html
> sudo apt-get install couchdb
> firefox -url http://127.0.0.1:5984/_utils/
Then see how close that is to an MS Access database (but I haven't used Access in a long, long time).
-
Re:If I may add
Strangely, considering we're a Microsoft shop where MIS people choose Microsoft because it stops them having to make decisions or think (something they're not too good at anyway), we have a Drupal site for our user community.
It's quite good. Basically it's another CMS, so it does pretty much everything Sharepoint does, but it is less oriented towards being a web-based network fileshare, and has no folders. It's used by a couple of high-profile websites, the Economist and the White House for example.
Apparently you can connect a Drupal site to a Sharepoint "back end" using its CMIS module, and there are several file management modules in addition to the basic functionality. Collaboration doesn't even need to be discussed as it's what the thing was designed to do :)
The site itself has fairly good documentation, and if you want a really good example - check out Apache or Subversion, both very comprehensively documented.
-
Re:Yeah, right, remember OS2?
Excuse me, when I go reread the history book on this.
Didn't IBM and Microsoft write a chapter together on one OS already? OS2?
What relation does OS2 have with Java?
Excuse me, but reading the history book myself, a bunch of relevant things emerge:
- Jikes - a JIT arising on the stage a bit (about 2 years) before Sun's HotSpot
- Eclipse - no need to say more
- all the goodies on the Apache Software Foundation's site
-
Re:Foo
What guarantee does OSS make that will save taxpayers millions of dollars?
Just a wild guess, but I'd say that it's because you don't need to pay to use it.
That's only one part of the cost of software. Granted, with a lot of mainstream commercial software that initial cost is not insignificant, however then the maintenance of it comes in to play. With a few notable exceptions, OSS systems tend to be far less implemented, leading to difficulties in finding staff to maintain the systems, and maintenance often can take longer.
At one volunteer organisation where I used to maintain some of the IT systems (on a volunteer basis, I should add) I recommended we ditch the Sendmail/Dovecot/DSpam email system and replace it with Exchange - simply because it was impossible to train any of the permanent staff on how to properly create and delete email accounts, and I got sick of getting calls every few days because someone had done something stupid and the person whose duty it was to do this couldn't do any troubleshooting. The organisation already had a Windows domain and had Office throughout the organisation, which reduced the cost a bit further.
-
It's all in the past already
I'm not dreaming of a bare hands movement taking over the world
Well, you don't need to dream, it has already happened.
Ever heard of this "internet" thingie? A "bare hands movement" is what keeps it moving
-
Re:A checklist
Tomcat has a pretty tight security model (which is usually disabled :) ). It shouldn't be hard to emulate something like that for Android, should it?
-
Re:Best article
Better still:
- Sze's announcement (greater longevity version [possibly])
- Previous 1 quadrillionth (lone-)digit record
- New 5 trillion consecutive digits record (additional details
-
Re:an so are an infinite other digits in that numb
"had they used the Bailey-Borwein-Plouffe formula"...
You don't think calling the implementation "DistBbp" suggests they did?
-
Re:This isn't necessarily a bad thing
You might be interested in Apache Shindig.
-
I Use Their SSL Certificate
Snake Oil Security? I use their SSL certificate to lock down all my Apache boxes.
http://httpd.apache.org/docs/2.1/ssl/ssl_howto.html#certauthenticate/
-
The obvious answer used to be ZOE
Back in the day, ZOE was exactly what you're looking for. It's an open source, cross-platform, turn-key solution (Simple Server is built-in) that is designed to archive, index and search your email (using the Apache Lucene search engine). Jon Udell has a good article on O'Reilly that includes some screen shots.
ZOE meets all of your requirements, though data import is a bit of a problem. There are several different strategies for data import, so one of them may meet your requirements.
Unfortunately, ZOE is abandonware so it's not for the faint of heart. The original author was on the bleeding edge and tended to make 'interesting' technology choices like Tapestry for the framework, and using his own, home-grown build system and a Creative Commons license that isn't usually used for software. He eventually abandoned Java development for Lua and let the registration for the home page lapse. As a result, it's difficult to recommend this for all but the most determined, high functioning users.
-
Private Zimbra installation
While it's totally overkill for the job, I highly recommend you run a Zimbra Open Source instance for yourself. Although you don't need much of what it provides (Calendaring, contact sync, Jabber IM, etc), it will let you store your messages in a stable, searchable and accessible form. Zimbra can directly import from PST or via IMAP (with your mail client or imapsync) and once it has your messages it full text indexes them with Lucene and so you can search them via the web or IMAP clients. You can easily get your messages out via one of the supported export formats or just use your IMAP mail client to dump the messages into mbox/maildir/pst/whatever. While you could certainly roll your own, why not let someone else take care of all the hard work for you?
-
Re:And...
Yes, there are times when a "no-sql" solution is better than SQL, and the vector is pretty much that point where you realize that storing files in databases makes sense like hauling bales of hay in sports cars does.
It's more than that: it's also for every case where the lookup logic is NOT handled by the database. Consider when queries are fielded by a separate service, such as a dedicated search engine (e.g. Solr/Lucene), leaving the database relegated to just primary key lookup for full records/documents. At that point the benefits and tradeoffs offered by the various NoSQL solutions suddenly become a LOT more interesting, because that's what these tools specialize in.
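The split described above can be sketched in a few lines of Python. Everything here is a toy stand-in: `TinySearchIndex` is a hypothetical in-memory inverted index playing the role of Solr/Lucene, and a plain dict plays the role of the key-value store. The point is only that the query logic lives in the index, while the store answers nothing but primary-key lookups.

```python
from collections import defaultdict


class TinySearchIndex:
    """Toy inverted index (stand-in for a real search engine)."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


# The "database": only ever asked for full records by primary key.
store = {
    1: {"title": "HBase intro", "body": "column oriented store on hadoop"},
    2: {"title": "Lucene tips", "body": "full text search engine library"},
    3: {"title": "Solr setup", "body": "search server built on lucene"},
}

index = TinySearchIndex()
for doc_id, record in store.items():
    index.add(doc_id, record["title"] + " " + record["body"])

if __name__ == "__main__":
    ids = index.search("search lucene")
    print([store[i]["title"] for i in sorted(ids)])
```

Since the store never has to evaluate a query, it can be chosen purely for how well it serves and scales key lookups.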
-
Re:stop making things up
citation? I can't find any history on IBM's JVM. (IBM is notorious for keeping this kind of information locked away on the 9-net...)
http://www.devsource.com/c/a/Architecture/IBM-Extends-Java-License-with-Sun/
(There's this little thing called Google, you know.)
I dimly remember IBM announcing that they would develop their own JVM not based on Sun's JVM, but I certainly can't find any citation either.
http://harmony.apache.org/contributors.html
They have been fighting with Sun/Oracle for years now over licensing and certifying.
-
You ignore licensed implementations
because the only implementations that could meet those criteria were the ones passing their compatibility tests and (in practice) only those based on their licensed source code. That makes Java a highly proprietary platform.
There are alternate implementations - you are complaining because they have to pass a test suite? Come on, how else can you be sure the VM works!! It's not like any aspect of how the Java VM is supposed to work is not well documented.
Two licensed (meaning safe from lawsuit) implementations:
Harmony (From Apache)
-
Bayes
Maybe he would be better off using some type of Bayesian classifier similar to the one SpamAssassin uses.
http://linux.die.net/man/1/sa-learn
It should work as well at classifying 'nuisance' emails as it does for classifying plain Spam as long as one trains it accordingly. Then, check the 'nuisance' emails at a lowest priority. He could also have his email go through several Bayesian filters, one trained to identify 'nuisance' emails and one trained to identify plain Spam. All email types could be handled differently.
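A rough sketch of the idea, assuming a single multi-class naive Bayes filter with add-one smoothing rather than SpamAssassin's actual implementation. The class name, labels and training phrases are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> training messages seen
        self.vocab = set()

    def train(self, label, text):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data: three classes instead of the usual spam/ham pair.
mail_filter = NaiveBayes()
mail_filter.train("spam", "cheap pills buy now")
mail_filter.train("spam", "win money now")
mail_filter.train("nuisance", "weekly newsletter digest")
mail_filter.train("nuisance", "forum notification new reply")
mail_filter.train("ham", "meeting agenda for monday")
mail_filter.train("ham", "project status report")
```

With real mail you would train on hundreds of messages per class and then route each label to its own folder or priority.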
In my experience, removing your email address from a web site is not very effective once too many people already know it. Anyway, it seems like this guy might need some technical advice ;-)
-
Link to Source
So does this mean that Android is not truly open source, i.e. available to anyone without right holder approval?
You can browse the source right here. All of that code should be Apache 2.0 license. I think the issue at stake is that they took a module of code that connects to Google's Market place for Android and they're not supposed to be doing that unless they are a member of the Open Handset Alliance. It's not like Google's launching a lawsuit against them but I'd imagine Google doesn't really appreciate that. Hosting that sort of thing can't be cheap (look at how much Apple claims it loses distributing apps) and maybe that's why your membership is needed -- to support that and keep it going.
I never realized that one had to be a member of a fruity club to develop Android hardware. I thought that was the point, anyone could innovate without corporate approval. Is it just a gimmick to sell phones with the promise of multi-vendor support and 'open apps', like MS?
You can get the source yourself and do whatever the hell you want with it. Carriers and phone vendors are demonstrating that they can even lock down Android so "open" doesn't mean f-ckall to the end consumer. You want to get down and dirty and hose up your own version of Android? Go ahead and pull it from that git repository linked above and do something fancy with the sqlite phonebook tree or whatever you want.
It's open source as can be but how do you "open source" a centralized app store with tons of traffic? I guess you're free to make your own app store and as far as I know, more are emerging. With sideloading you could make it as simple as a file download as long as the user's Android supports sideloading.
-
Re:Program limitations
You're doing it wrong: http://lucene.apache.org/solr/
You don't want a database, you want an index. Both Excel and MySQL (and others) are the wrong tools.
-
Re:A question that comes to mind...
because SSL sucks for virtual hosts
This has been fixed with TLS. See SSL with Virtual Hosts Using SNI. It doesn't work with IE6, but then, what does?
-
Old news.
Look at the timestamp of this presentation :) It's a bit of old news.
It was discussed here: http://www.theserverside.com/news/thread.tss?thread_id=48449
And it mostly shows that NIO is deficient. I encountered similar problems in my tests. Solved them by using http://mina.apache.org/ .
-
Re:Sphinx or Lucene
Solr in front of Lucene is a perfectly reasonable way to index highly structured information and allows structured queries.
-
Re:Is there a way to block Pakistan?
All things are possible, it's just a question of how much work you want to put into it.
is there a way to detect visitors who are (probably) in Pakistan (like, is there a specific block of IPs that is assigned to Pakistan)
http://www.find-ip-address.org/ip-country/
redirect them to a page explaining that you'd rather not risk getting a death sentence from Pakistan, so you are not willing to serve content in that jurisdiction?
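For what it's worth, the check itself is only a few lines once you have a list of address blocks for the country. A minimal sketch, assuming you have exported such a list from an IP-to-country source like the one linked above. The CIDR blocks below are RFC 5737 documentation ranges used as placeholders, NOT real country allocations, and the redirect path is made up.

```python
import ipaddress

# Placeholder blocks (RFC 5737 documentation ranges), not real allocations.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.0/24")
]


def is_blocked(addr):
    """True if the visitor's address falls inside any listed block."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED_NETWORKS)


def choose_response(addr):
    """Return (status, redirect target) for a visitor; the path is hypothetical."""
    if is_blocked(addr):
        return (302, "/not-available-in-your-region.html")
    return (200, None)
```

Keep in mind this only catches visitors whose address is in your list; proxies and stale allocation data will let some through.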
-
So Many Choices
There are some web sites dedicated to just source code: http://www.codeproject.com/ is a great place to find useful small applications with an explanation. http://sourceforge.net/ has excellent code. http://apache.org/ has very good projects. These sites don't require you to retype anything. While the programs in codeproject are small, some of the projects in source forge and apache are huge -- but many have very good small tutorials to get you up and running. For little hardware projects look at http://www.instructables.com/. Even the commercial products now have incredible online resources that in many ways surpass what we got in Byte; if you're not familiar with http://msdn.microsft.com/ check it out. Another approach is to install Linux: Ubuntu, Fedora, or any distribution that comes with a package manager lets you browse applications by the thousands. I set one up in my house and my daughter had Tux Racer installed before I got home from work the next day. Computer magazines didn't go away, they were eclipsed. Oh, did I mention http://eclipse.org/? It's a full IDE, open source and a development environment as well.
-
Re:New Apple API?
The summary is incorrect.
The relevant part of the summary is simply quoting Adobe's Flash release notes. Those might be incorrect or simply lying.
I've never really understood why it seems so extremely difficult for Flash to render some simple 2D vector graphics and run a script or two in a VM without maxing CPU usage on just about anything. Is it a disguised version of Batik or something?
-
...and more
Also OpenSocial and SocialSite
-
Subversion
Title says it all: Subversion
-
Re:How does virtualization help
That article starts with "These scenarios are those involving multiple web sites running on a single server, via name-based or IP-based virtual hosts"
It sounds like this is talking about how to configure a single instance of Apache to serve up different websites based on the incoming IP address or the web site domain name. It doesn't sound like it applies to running multiple virtual machines, each of which has its own copy of Apache, each of which is trying to listen to port 80.
Although if I'm wrong, I'm sure someone will correct me.
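As I read it, name-based virtual hosting boils down to one listener picking a site from the Host header, roughly like the sketch below. The hostnames and paths are invented, and this is only an illustration of the dispatch idea, not how Apache implements it.

```python
# Hypothetical mapping of site names to document roots.
VHOSTS = {
    "www.example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}
DEFAULT_ROOT = "/var/www/default"


def document_root(host_header):
    """Pick a document root from the Host header, ignoring any port suffix."""
    host = host_header.split(":", 1)[0].strip().lower()
    return VHOSTS.get(host, DEFAULT_ROOT)
```

One process owns port 80 and everything else is a table lookup, which is a different problem from several virtual machines each wanting their own port 80.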
-
Re:How does virtualization help
-
Re:Damage contained through one-time passwords.
What would be the motivation(s) for hacking Apache, anyway?
Well, some answers in the mail sent by Apache asking to reset the pwd:
We are assuming that the attackers have a copy of the JIRA database, which includes a hash (SHA-512 unsalted) of the password you set when signing up as 'XX' to JIRA.
[...]
This is a problem because many people reuse passwords across online services. If you reuse passwords across systems, we urge you to change your passwords on ALL SYSTEMS that might be using the compromised JIRA password. Prime examples might be gmail or hotmail accounts, online banking sites, or sites known to be related to your email's domain, XXx.XX.
And reading the report from Apache (very interesting), you see that the hackers had fun messing around with things, but the aim was really password retrieval. They got spotted once they started to shut down services.
-
Re:Naturally, the passwords were not in clear
Here is the actual e-mail they sent out, which unfortunately, I received:
Dear ____________,
You are receiving this email because you have a login, '________', on the Apache JIRA installation, https://issues.apache.org/jira/
On April 6 the issues.apache.org server was hacked. The attackers were able to install a trojan JIRA login screen and later get full root access:
https://blogs.apache.org/infra/entry/apache_org_04_09_2010
We are assuming that the attackers have a copy of the JIRA database, which includes a hash (SHA-512 unsalted) of the password you set when signing up as '________' to JIRA. If the password you set was not of great quality (eg. based on a dictionary word), it should be assumed that the attackers can guess your password from the password hash via brute force.
The upshot is that someone malicious may know both your email address and a password of yours.
This is a problem because many people reuse passwords across online services. If you reuse passwords across systems, we urge you to change your passwords on ALL SYSTEMS that might be using the compromised JIRA password. Prime examples might be gmail or hotmail accounts, online banking sites, or sites known to be related to your email's domain, gmail.com.
Naturally we would also like you to reset your JIRA password. That can be done at:
https://issues.apache.org/jira/secure/ForgotPassword!default.jspa?username=_________
We (the Apache JIRA administrators) sincerely apologize for this security breach. If you have any questions, please let us know by email. We are also available on the #asfinfra IRC channel on irc.freenode.net.
Regards,
The Apache Infrastructure Team
So, yeah. They were storing the password hashes unsalted, which means that they are susceptible to a simple dictionary crack.
Needless to say, I'm quite disgusted with the Apache foundation right now.
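To show why the missing salt matters, here is a minimal sketch using only Python's standard library. The user names, passwords and word list are made up, and the salted variant (PBKDF2 with a random per-user salt) is just one common alternative, not a description of what JIRA does or did.

```python
import hashlib
import hmac
import os


def unsalted_hash(password):
    """Plain SHA-512 of the password, as described in the e-mail."""
    return hashlib.sha512(password.encode("utf-8")).hexdigest()


def dictionary_attack(leaked_hashes, wordlist):
    """Hash each candidate word once, then match it against every account."""
    lookup = {unsalted_hash(word): word for word in wordlist}
    return {user: lookup[h] for user, h in leaked_hashes.items() if h in lookup}


def salted_hash(password, salt=None, iterations=100_000):
    """PBKDF2-HMAC-SHA512 with a random per-user salt (illustrative parameters)."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify(password, salt, digest, iterations=100_000):
    """Recompute with the stored salt and compare in constant time."""
    candidate = salted_hash(password, salt, iterations)[1]
    return hmac.compare_digest(candidate, digest)


# Made-up leaked table: one dictionary-word password, one stronger one.
leaked = {
    "alice": unsalted_hash("sunshine"),
    "bob": unsalted_hash("x9#Lq!v2Tz"),
}
cracked = dictionary_attack(leaked, ["password", "sunshine", "letmein"])
```

Without a salt, one pass over a word list cracks every account that used a listed word; with per-user salts and a slow hash, the attacker has to repeat the expensive work for each account separately.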