I agree, Flamenco's faceted metadata is a great way to look at structured data. But Solr has facets, and they're really easy to enable. So that kind of functionality is not really the hard part.
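To show how little it takes, here's a minimal sketch of a faceted Solr request. The `facet`, `facet.field` and `facet.mincount` parameters are standard Solr request parameters; the host, the core name `records`, the field `assembly_district` and the helper `facet_query_url` are all made up for illustration.

```python
from urllib.parse import urlencode

def facet_query_url(base_url, query, facet_fields, min_count=1):
    """Build a Solr select URL that asks for facet counts alongside results."""
    params = [
        ("q", query),
        ("facet", "true"),                    # switch faceting on
        ("facet.mincount", str(min_count)),   # hide facet values with no hits
    ]
    # facet.field may be repeated, once per field to facet on
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)

# Hypothetical host, core and field names
url = facet_query_url(
    "http://localhost:8983/solr/records/select",
    "smith",
    ["assembly_district"],
)
print(url)
```

The fields you facet on still have to be indexed in the schema, which is where knowing what your users want comes back in.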
The really tricky part is finding out what your users need (which is of course not what they say they need).
Use your search logs: the most important part is seeing what they search for, and especially which searches get zero results. Talk to them about why: they may have different vocabulary or need something new, or it may be OK because they're checking for the absence of something. You can also look at the top queries and the top kinds of queries: maybe they really want to segment by assembly district, which is a great use for faceted metadata.
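Here's a rough sketch of that log analysis, assuming the log has already been parsed into (query, result count) pairs. The sample entries and the `summarize_log` helper are invented for illustration, and a real log format will certainly differ.

```python
from collections import Counter

def summarize_log(entries, top_n=3):
    """Return (top queries, top zero-result queries) as lists of (query, count)."""
    all_queries = Counter()
    zero_hits = Counter()
    for query, result_count in entries:
        # Fold case and extra whitespace so near-identical queries count together
        normalized = " ".join(query.lower().split())
        all_queries[normalized] += 1
        if result_count == 0:
            zero_hits[normalized] += 1
    return all_queries.most_common(top_n), zero_hits.most_common(top_n)

# Made-up log entries: (query text, number of results returned)
sample_log = [
    ("Assembly District 12", 40),
    ("assembly district 12", 40),
    ("maiden name smith", 0),
    ("Maiden  name Smith", 0),
    ("bob jones", 3),
]
top, zero = summarize_log(sample_log)
print("Top queries:", top)
print("Zero-result queries:", zero)
```

The zero-result list is the one to bring to your users when you ask them why.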
Then you can use the standard UI tools like sketches and wireframes to expand the search form for their needs. They may want to search for nicknames or maiden names, who knows? I don't, but they do.
It's an iterative process involving content coverage, back-end search functionality, and client-side interfaces, but when you do it right, it just hums :-)
I agree that the Google Box is closed, but as I understand it, there is no ongoing fee, except for support. You buy it, you own it.
I agree entirely. The search spammers abused the system and now it's gone -- a tragedy of the commons.
However, metadata on web sites is a powerful and useful tool, so smaller search engines can take advantage of it even though the web-wide ones can't trust anything that's not visible.
There's a new approach to fielded searching called "faceted metadata" search that's really designed for rich metadata systems and well-populated databases, such as online catalogs, recipes, auctions and technical documentation. It shows the applicable metadata fields in context, a dynamic taxonomy. So if you search for "pepper" in a recipe database, you can then navigate based on other ingredients, cuisine, holidays and so on.
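To make the recipe example concrete, here's a small sketch of the counting behind a faceted display. The three recipes, the field names and the `faceted_search` helper are all invented, and a real system would do this inside the search engine rather than in a loop.

```python
from collections import Counter

# Made-up recipe records
RECIPES = [
    {"title": "Pepper steak", "ingredients": ["pepper", "beef", "onion"],
     "cuisine": "Chinese", "holiday": None},
    {"title": "Stuffed peppers", "ingredients": ["pepper", "rice", "onion"],
     "cuisine": "Mexican", "holiday": "Cinco de Mayo"},
    {"title": "Pumpkin pie", "ingredients": ["pumpkin", "cinnamon"],
     "cuisine": "American", "holiday": "Thanksgiving"},
]

def faceted_search(records, term, facet_fields):
    """Return matching records plus, per facet field, counts of the values found in them."""
    term = term.lower()
    matches = [
        r for r in records
        if term in r.get("title", "").lower() or term in r.get("ingredients", [])
    ]
    facets = {field: Counter() for field in facet_fields}
    for record in matches:
        for field in facet_fields:
            value = record.get(field)
            values = value if isinstance(value, list) else [value]
            for v in values:
                # Skip empty values and the search term itself ("other ingredients")
                if v is not None and v != term:
                    facets[field][v] += 1
    return matches, facets

matches, facets = faceted_search(RECIPES, "pepper", ["ingredients", "cuisine", "holiday"])
for field, counts in facets.items():
    print(field, dict(counts))
```

Only values that occur in the matching recipes show up, which is what makes the taxonomy dynamic: Thanksgiving never appears for "pepper" here because no matching recipe carries it.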
I think this is great for all those databases where there's tons of information but no easy way to navigate it. I've written up a Faceted Metadata backgrounder with some examples. It makes sense, it's usable, and I think it's the Next Big Thing in search.
Avi
Nope, Google shows a "snippet" of text surrounding the matched words in almost all cases.
The only time they don't is when there are no matched words on the page. That can happen when they match on anchor text pointing to the page, or when the page is empty because it's Flash or a full-text graphic or something equally icky. In that case, Google will default to the description tag.
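The logic described above can be sketched roughly like this. It's only my illustration of the visible behavior, not Google's actual algorithm; `make_snippet`, the `width` parameter and the sample text are all assumptions.

```python
import re

def make_snippet(page_text, description, query, width=40):
    """Return text around the first query word found on the page,
    or fall back to the description tag when nothing matches."""
    for word in query.split():
        match = re.search(re.escape(word), page_text, re.IGNORECASE)
        if match:
            start = max(0, match.start() - width)
            end = min(len(page_text), match.end() + width)
            return page_text[start:end].strip()
    # No matched words on the page (e.g. an anchor-text match or a Flash page)
    return description

page = "Our kitchen guide covers roasting a red pepper over an open flame."
print(make_snippet(page, "Kitchen tips", "pepper"))
print(make_snippet("", "A Flash-only site about peppers", "pepper"))
```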
Avi
Right, the public web-wide Google search engine ignores meta tags of all kinds, I think because there's too much spam in them. The Google Search Appliance indexes them and searches all the meta tags automatically.
While I occasionally miss complex operators in Google, the text snippets shown with search results, with the matched words in bold, generally give me the answer right away.
I love the snippets and hit highlighting, and am encouraging all the site search developers I know to add these features. Great for end-user satisfaction!
Danny Sullivan has the best list of realtime and delayed voyeur sites I've found so far at his What people search for page.
I'm not a huge fan of PDF in general, and it's tricky to index because the files tend to be big, so you can get co-occurrences of words that don't have much to do with one another.
That said, I've found a lot of PDF search engines. My current favorite is PDFWebSearch, which is smart about PDF pages and should handle local file system indexing as well as web site crawling. I have a whole page about PDF search engines, including links to all the ones I know about, at the SearchTools PDF searching report.
If you want to get in touch with the Glimpse people, I have a page about that too.
Avi
I'm trying to figure out if anyone uses ID3 tags for searching MP3s. So far, it looks as though Napster, Gnutella, Freenet, Mojo Nation, etc. do not look inside the files to check the contents of these tags. Am I missing something?
I'm curious if there's any incentive for people to actually fill out the tags, especially the more complex ID3v2 frames.
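For anyone who wants to experiment, the older ID3v1 tag is simple enough to read directly: it's a fixed 128-byte block at the end of the file, starting with "TAG". Here's a minimal sketch that handles only ID3v1, not the ID3v2 frames mentioned above; the `read_id3v1` helper and the sample bytes are made up for illustration.

```python
import struct

def read_id3v1(data):
    """Parse an ID3v1 tag from the last 128 bytes of MP3 data, or return None."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    # Layout: "TAG", title(30), artist(30), album(30), year(4), comment(30), genre(1)
    _, title, artist, album, year, comment, genre = struct.unpack(
        "<3s30s30s30s4s30sB", data[-128:]
    )

    def clean(raw):
        # Fields are padded with NULs or spaces
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": clean(title),
        "artist": clean(artist),
        "album": clean(album),
        "year": clean(year),
        "comment": clean(comment),
        "genre": genre,  # numeric genre code
    }

# Fabricated bytes standing in for an MP3 file with a tag at the end
fake_mp3 = (
    b"\xff\xfb" + b"\x00" * 64
    + b"TAG"
    + b"Song Title".ljust(30, b"\x00")
    + b"Some Artist".ljust(30, b"\x00")
    + b"Some Album".ljust(30, b"\x00")
    + b"2000"
    + b"".ljust(30, b"\x00")
    + bytes([17])
)
tag = read_id3v1(fake_mp3)
print(tag)
```

A search tool could index these fields alongside the file name, assuming people had bothered to fill them in.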