Domain: jroller.com
Stories and comments across the archive that link to jroller.com.
Stories · 8
-
What Does It Mean To Be an Open Source Author?
lolococo writes to tell us that Laurent Cohen, founder of the open source project JPPF (Java Parallel Processing Framework), has decided to share what life is like for an open source contributor in general and a little bit about what that means. "There came a time of coding, releasing, coding, releasing. The project started gathering some momentum, as a small community of users started to use it, but why was it not working in this case, or why did it not have this feature, or how could I do this, etc...? You get the drift. Oh my, now I had to start interacting with other folks! What was I to do? That started a (thankfully short) period of intense existential self-questioning. What was the purpose of this project? Why did I actually open-source it? I resolved this by deciding unilaterally that it would be a free contribution, for whomever would be interested enough to look into it. I also decided that it was my personal responsibility to support these brave folks into using the project, and to make it, as much as possible, a happy experience for them." -
MagLev, Ruby VM on Gemstone OODB, Wows RailsConf
murphee sends along a report from InfoQ: "Gemstone demoed [MagLev,] their Ruby VM built on their GemStone S64 VM, to an ecstatic audience. Gemstone's Smalltalk VM allows OODBs of up to 17 PetaBytes, with none of the old ActiveRecord nonsense: the data is persisted transparently. The Gemstone OODB also takes care of any distribution, allowing the Ruby VM and data to scale across many servers (Cheerio, memcached!). There's also an earlier quite technical interview with Gemstone's Bob Walker and Avi Bryant about MagLev." -
Lucene in Action
Simon P. Chappell writes "I don't know about you, but I hardly bother with browser bookmarks any more. I used to have so many bookmarks, back in the early days of Netscape's 4 series, that I would have to regularly trim and edit my bookmark file to prevent my browser from crashing on startup -- that's a lot of bookmarks, folks! Now, I go to my favourite web search engine, enter a couple of appropriate search terms and voila, there's my page! Search engines are so ubiquitous that we rarely give much thought to the technology that powers them. Lucene in Action by Otis Gospodnetic and Erik Hatcher, both committers on the Lucene project, goes behind the HTML and takes you on a guided tour of Lucene, one of a generation of powerful Free and Open-Source search engines now available." Read on for the rest of Chappell's review.
Lucene in Action
author: Gospodnetic and Hatcher
pages: 421 (7 pages of index)
publisher: Manning
rating: 9
reviewer: Simon P. Chappell
ISBN: 1932394281
summary: Solid introduction to Lucene
Who's it for?
Lucene is a library and framework, rather than a complete application. It truly is an engine, around which you are expected to build and extend your own application. Like Lucene, the book is targeted at those who are looking for a tool to build their own search facility application rather than just "download and go." The book does include a number of case studies of Lucene usage (including at least one download and go search engine) but those are included to show how to use and adapt Lucene to fit differing environments rather than as ends in themselves.
The Structure
The book is sensibly divided into two parts. The first part looks at "Core Lucene" functionality, while the second part addresses "Applied Lucene".
Part one has six chapters, covering the central components and inner workings of Lucene. It's here that the book starts with a tutorial introduction, familiarising the reader with the concepts of Lucene as a search engine around which you wrap your own code. The other five chapters move steadily through good search engine fare, with indexing getting the whole of chapter two to itself. The discussion of how to retrieve text from the documents being indexed is mentioned here but postponed until chapter seven, where it is dealt with exhaustively. Chapter three covers searching, and especially how Lucene ranks documents.
Chapter four examines analysis. In its chapter introduction, the book explains that "Analysis, in Lucene, is the process of converting field text into its most fundamental indexed representation, terms." This process is performed by an analyser, which tokenises text according to its own built-in rules. Each analyser has a different emphasis: some want only dictionary words, others might explicitly include acronyms, and sometimes you'll want an analyser that will block stop words (those words in languages that are part of the structure, but that add nothing to the information being conveyed by the text; classic examples of stop words in English include "a", "and" and "the").
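To make the idea of analysis concrete, here is a minimal sketch in plain Python. It is not Lucene's analyser API and is not taken from the book; the function name `analyze` and the three-word stop list are assumptions chosen for illustration. It lowercases the text, splits it into word tokens, and drops the stop words mentioned above.

```python
import re

# Hypothetical stop-word list for illustration only; this is not
# the stop list that any real Lucene analyser ships with.
STOP_WORDS = {"a", "and", "the"}


def analyze(text, stop_words=STOP_WORDS):
    """Turn raw field text into a list of index terms.

    Steps: lowercase the text, extract runs of letters and digits
    as tokens, then discard any token found in the stop-word set.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in stop_words]


if __name__ == "__main__":
    print(analyze("The quick fox and a lazy dog"))
    # prints ['quick', 'fox', 'lazy', 'dog']
```

A real analyser would typically do more (stemming, handling acronyms, language-specific rules), which is where the differing emphases described above come in.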
Chapter five looks at advanced search techniques; everything from sorting search results, searching on multiple fields to filtering searches. Many free or open source software tools are extensible, and Lucene is no exception. Chapter six addresses creating and using custom components within Lucene, everything from custom sort methods to custom filters.
Part two, the final four chapters, covers Applied Lucene. It is dedicated to practical uses of Lucene and answers the question "So, what can I do with a search engine?" Chapter seven covers ways and means to parse common, non-plain text document formats. The primary formats covered are RTF, XML, PDF, HTML and Microsoft Word. The ability to parse and index these file formats will cover the search engine needs of the majority of Lucene users. Chapter eight looks at a number of Lucene tools and extensions that are available, many of them being free and open source software. Chapter nine covers ports of Lucene. While for many users, Lucene being a Java library is not a problem, some users want its functionality in environments that do not have Java. The chapter looks at ports written in C++, C#, Perl and Python. Lastly, chapter ten takes a thorough look at seven Lucene case studies. Perhaps the "star" case study is the one about Nutch, a download and go search engine written by Doug Cutting, the original author of Lucene.
There are three appendices. The first offers installation advice for Lucene; a useful addition that those newer to working with Java libraries will surely appreciate. The second appendix has a very well explained description of the Lucene index format. This is the kind of information that can be hard to find, so it is welcome in a book of this sort. The last appendix contains a number of categorised resource references. The number and breadth of the resources provided could provide quite an incredible education in information retrieval theory if the reader was inclined to read them all.
What's to Like?
There are several things to like about this book. Let's start with the fact that the authors are part of the core development team of Lucene. This gives them both credibility and an excellent understanding of the internal workings of Lucene. Co-author Erik Hatcher is a fantastic writer, having previously been a co-author of the only Ant book worth bothering with, Manning's Java Development with Ant. (Full disclosure: I do know Erik personally.)
The structure of the book is well thought out and each chapter does seem to move your understanding forward when combined with what you learned from the preceding ones. The division into core and applied Lucene is also helpful. While you'd hope that this was the case, it often isn't; hence I note it as a positive.
I especially appreciate that this book does not fill up page after page with API documentation. The authors appear to have grasped that if you have Internet access to download the software, you might just be able to access the documentation online; rather, they concentrate on the way to use the software. What a concept!
As a part of Manning's "in Action" series, the book has excellent layout and has obviously been thoroughly edited by both technical evaluators and copyeditors. This might seem to be a small thing to some, but a well-edited book stands out clearly from the crowd.
What's to consider?
If you are looking for a book on using and configuring a download and go style of search engine, this book would be less suitable. While the case study on Nutch is of good length, it would be too short to be useful as a configuration guide.
Conclusion
I enjoyed reading this book. If you have any text searching needs, this book will be more than sufficient equipment to guide you to successful completion. Even if you are just looking to download a pre-written search engine, this book will provide a good background to the nature of information retrieval in general and text indexing and searching specifically.
You can purchase Lucene in Action from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page. -
Six Laws of the New Software
LordFoom writes "Still suffering from post-dotcom stress disorder, I keep my eye out for gentle balm to soothe my ravaged psyche. The manifestos at ChangeThis are not it. The most popular manifestos range from irritating to enlightening, with none of them particularly comforting. In particular the recent Six Laws of the New Software have done my dreams of writing lucrative code no good - although it has changed my idea of what money-making code is." -
JBoss Caught in Anonymous Posting Scheme
Reader scubabear writes "For years rumors have run rampant about employees of JBoss Inc. being actively encouraged to post anonymously, drumming up business by flooding the net with fake posts and simultaneously attacking competitors, all from behind a safe veil of anonymity. With the advent of a new feature for tracking users by IP on TheServerSide.com, the floodgates have been opened and those rumors have apparently been confirmed. The Java blog space has now erupted with posts from a variety of bloggers (here, here, and here for a start) exposing a variety of anonymous/pseudonymous accounts used by JBoss employees to put forth their Professional Open Source message and simultaneously slam anyone who gets in their way in online technical communities such as TheServerSide, JavaLobby, and various personal blogs. The evidence shows how a corporation can manipulate popular opinion via anonymous personalities, that open source companies can be just as ruthless as closed source when it comes to marketing their wares, and that you should never forget that your cookies and IP address can and will be tracked online. No official response has been heard yet from the JBoss crew. Disclosure: I'm one of those bloggers erupting on this issue (see my story here)."