P2P Bibliographies with Bibster
Noksagt writes "P2P isn't just for government documents anymore! Bibster assists researchers in managing, searching, and sharing bibliographic data in a peer-to-peer network. This project shows great promise to researchers who currently search for citations through centralized servers (Google, Scirus, CiteSeer, ISI, and many others). By making it decentralized, researchers can share bibliographic data with no subscription costs and avoid typing this data in by hand. It can import and export citations using BibTeX. The project is GPLed and free clients for Windows and Linux are available. There's also a SourceForge page for Bibster, so you can check out from the CVS if the Bibster site is slow."
this is news for nerds guys...
the CVS server will slow down before the website.
I am going to download it, and create a bunch of papers written by myself. Soon, I will be published in Science, Nature, and many other of the top periodicals of chemistry, physics and biology. Perhaps I will co-author a paper with Stephen Hawking.
oh wait....
Seriously, having a collaborative system for journalism with moderation and web of trust like elements could be wonderful - anyone got any bright ideas on how to do it?
Future conversation between two illustrious academics:
"Could you send over that citation for that lagomorph genome paper?"
"Sure thing. I'll send some Steely Dan too, it helps me when I read papers about the lagomorph genome."
"31337, thx."
People who cite will also read the paper before doing so. This system will be useful when one has a paper in hand, but does not have the bibtex entry. No one uses just a citation without the content of the paper.
:)
So you have to prepare the content, and you might as well submit it to those journals and conferences.
P2P isn't just for pirating music anymore?
I'm seeing a URL...no, a number. Yes, it starts with a 5. I believe it's past 500. It's becoming clearer...I see the number 503.
Did you just ask a question? If you did, it appears the answer is "No"
That looks promising. Will there be an easy way to see a citation index - for example, listing all the publications that cite a given article? (Citeseer does this, and this can be important to academic types.)
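For what it's worth, a "cited by" listing is just an inverted index over the reference lists. Here is a rough sketch in Python, assuming each peer can supply a mapping from a paper's key to the keys it cites. The function name `build_cited_by` and the sample keys are made up for illustration and have nothing to do with Bibster's actual code.

```python
from collections import defaultdict


def build_cited_by(references):
    """Invert a {paper: [papers it cites]} mapping into {paper: {papers citing it}}."""
    cited_by = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return dict(cited_by)


# Hypothetical reference lists, keyed by BibTeX-style keys.
refs = {
    "smith2003": ["jones1999", "lee2001"],
    "kim2004": ["jones1999"],
    "lee2001": ["jones1999"],
}

index = build_cited_by(refs)
print(sorted(index["jones1999"]))  # → ['kim2004', 'lee2001', 'smith2003']
```

The index is only as complete as the reference lists the peers happen to share, so the counts would be a lower bound at best.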
Is it just me or is a scientific database that every idiot can add to a bad idea?
What would be really nice is to have the full texts of articles available P2P. That's the advantage of using centralized databases from subscribing locations (like universities): you can sometimes access full text for newer articles with just one click. Swapping full texts would be tremendously useful (and would keep us lazy scientists from having to actually get up and go to the library). Yeah yeah, I'm sure there are copyright issues... but doesn't fair use apply somehow? I'm a psychology research assistant at a major university, and at weekly lab meetings we often send around articles by email for everyone to read and then discuss, and I've never even really thought about copyright of them until now. Isn't open sharing of knowledge at the heart of the scientific endeavor? Oh, and also: it would be awesome if user comments could be added to each citation. Like: "this was an influential paper that opened new directions for research on human memory," etc. Of course, you can also get a ROUGH idea of that kind of thing by how many times a paper's been cited by other papers, as someone else already said.
Skip the whole bibliography bit, how 'bout one that shares essays and book reports? Take the whole Cliff Notes one step more and just pre-build book reports, position papers, classification papers, and other material to cover Freshman Lit, Philosophy/Critical Thinking, and Intro to American Literature. Heck, why stop there? You've got psych papers, econ, anthro...basically everything that'll keep a pure-bred math/physics type focused on their major.
:)
Then you can run bayesian filtering to "learn" your writing style and apply "corrections" as needed to make it your own. Think of it as a more liberal copyleft.
This post is intended as a joke. I sure as hell don't advocate plagiarism or anything of the sort. Read your books, write your own papers -- you'll be a more well-rounded person in the end.
The next big question is whether or not it's standards based. While it would be surprising if it used Z39.50, it would be a shame if it didn't use SRW and/or CQL.
Especially as NISO is recommending them in their current 'Metasearch Initiative' -- an industry/academic/government cross sector committee with the major players and interested parties for allowing cross searching of bibliographic databases with other sorts of things.
(ObDisc, member of both SRW Editorial Board and Taskgroup 3 of NMSI)
--Azaroth
If we were to look at another project, say, CDDB, which stores meta-data for CDs (Title, Artist, Track Listing), something not at all unlike storing meta-data for books (bibliographies), you'll note that CDDB's entries are frequently inaccurate, misspelled and just plain wrong.
When it comes down to it, I don't really trust Random Joe to provide accurate, trustworthy info. It's not like it's Wikipedia or anything, which has constant peer review and a clear history.
...married to a non-geek (getting her PhD in Psych). When I told her about this system, she said:
"My system's better anyway. I have a file, with the exact bibliography printed on the folder, for every article I've read or written. If I need one, it's right there. If I need to use the citation, I can just copy it from my Excel spreadsheet. Now why would this thing be better?"
Some people are born geeks, I guess.
Why does every new P2P app have to call itself This-ster or That-ster? Are the developers really so lacking in creativity that they can't come up with a new name?
I was going to say exactly the same thing.
However, rather than YAP2PN, I would rather see it all integrated into some existing thing. I only want to have one client.
citeseer has full text available for most of its articles, and it's a free service, so maybe copyright isn't such a big deal for some reason. Maybe it's because most papers in computer science are available from the author's website.
-jim
I wonder why there is no Mac OS X version. There are many scientists on OS X. It can't be a very hard port since they have a linux version, can it?
Next, they'll perfect image search:
A possible inquiry could be: I want to see defiance in the face of insurmountable odds.
As a result Imagester returns images depicting defiance in the face of insurmountable odds.
Seriously, are they offering anything better than standard keyword and author search? What I'd really like to see is a bibliography database that ranks search results using a PageRank-like algorithm (as I recall, the idea for PageRank derived from research on citation graphs, so this would bring things full circle).
I'd also like to see Google start parsing publications and indexing them by author, year, and citations. The bibliography databases that I'm familiar with require manual input of new entries; it would be cool if this could be done automatically instead. Of course, there will need to be some interface to correct erroneous entries, and this opens up a large can of worms.
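To make the PageRank-style ranking idea a bit more concrete, here is a minimal power-iteration sketch over a citation graph. It assumes the graph is available as a dict from each paper to the papers it cites. The damping factor of 0.85 and the fixed iteration count are conventional defaults chosen here, not anything Bibster is known to use.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Rank papers in a {paper: [papers it cites]} graph by power iteration."""
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Every paper gets the same small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            out = citations.get(p, [])
            if out:
                # A paper passes its rank on to the papers it cites.
                share = damping * rank[p] / len(out)
                for q in out:
                    new_rank[q] += share
            else:
                # A paper with no outgoing citations spreads its rank evenly.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical graph: papers "a" and "b" both cite "c".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": []})
print(max(ranks, key=ranks.get))
```

Search hits could then be sorted by this score, or by the score combined with a keyword match. On a P2P network each peer only sees part of the graph, so the ranking would be approximate.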
I've been working on a similar idea for news, and as far as I can tell fair use completely applies to this specific idea of yours - education and the arts, unbiased, not for profit.
There are already some sites out there doing something similar like the Media Awareness Project [mapinc.org] which collects and archives research on drug policy. From what I can tell, they only get sued when they get too big, present content with a bias, or try to profit.
I find it hard to believe my little project is the only one out there. We're working on web/p2p jointly, but there are bound to be others, and they'll all probably be open source. So once a good one comes out, we'll see lots of applications of this within research and academic communities.
I don't know if this is a direction I like seeing P2P networks go, in the sense that full articles would be available for download. With some tweaking of the idea, I think there could be an advantage.
Many universities are paying tons of money to privatized databases to store either full text articles (for some) or simply the abstracts so students can search and read articles to their heart's delight. They are, in my experience, unreliable as well. The systems crash, you get database errors or lose the connection.
With enough metadata I would hope someone could come up with a CDDB-type system for universities which would at least have the abstract info (summary, author, journal name, date, etc.) to look up in the system. Decentralize it and share it among all universities. Even if it just stays within 'academia' it would be great. Hopefully speed, reliability and accuracy would improve.
Sounds almost like Usenet for the comments over p2p, and SubEthaEdit for the group editing, with the added ability to include hidden comments, of course.
OK, these p2p apps are awesome, but I see a problem: each one needs to maintain its own p2p system (protocol), either by forking another project or by writing one from scratch, or it needs to piggyback on another network...
When will someone sit down, using an open source model of course, and write the 'granddad' p2p protocol? It doesn't have to require everything, it just has to be able to support everything... Encryption, hidden routing (not being able to tell who is requesting data vs. who is just passing data along), multiple source download, huge scaling, efficient and distributed search, etc.
This public network could become the de facto standard that open source apps work off of. As long as the protocol is the focus (a nice GUI as well, but separate the frontend from the backend), you could use it to link to files on your website, or you could have multiple apps (a music/Napster-like app, a scientific research paper app, a bibliographies app, a Usenet discussion thread app), each of them using a common protocol and routing between them, with each app filtering out the noise it doesn't want.
It could be the killer app, it could have every major p2p app migrate to it. Project Gutenberg, Bibster, linuxiso.org, all using a common protocol and network.... *drools*
eTBlast is a bibliographic search engine to which you submit an entire abstract. With a little natural language processing, it returns articles that have similar abstracts. Though the tool operates on the Medline database, there is no reason the algorithm couldn't be used with Bibster.
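As a rough illustration of abstract-to-abstract search, here is a plain bag-of-words cosine similarity in Python. This is a stand-in technique picked for brevity, not eTBlast's actual algorithm, and the function names and sample abstracts are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def rank_by_abstract(query, abstracts):
    """Return the keys of {key: abstract}, most similar to the query first."""
    scored = [(cosine_similarity(query, text), key) for key, text in abstracts.items()]
    return [key for _, key in sorted(scored, reverse=True)]


# Hypothetical shared entries with abstracts.
library = {
    "memory1998": "Human memory and recall in episodic tasks",
    "genome2004": "Sequencing the lagomorph genome",
}
print(rank_by_abstract("episodic memory recall in humans", library))
```

A real system would at least drop stop words and weight rare terms more heavily (TF-IDF, for example), but the overall shape of the search would be the same.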
There needs to be some way to double check the citations, or rate the sources of the cites, or those who like to pad their papers and make up scientific-sounding stuff for websites will have a good time with this. Too good, and it will be full of bogus references to Timmy's article on Cold Fusion.
Doom 3 has been massively pirated this weekend, at record highs. Apparently, it's shaping up to be one of the most pirated games ever. Estimates are that id Software has lost up to 2 million dollars. Activision isn't saying anything at this point. Gamespot and the BBC both have articles on the news. The PC Gamer editor has some words for the pirates in the BBC article. This setback is set to cost Activision and id Software millions. John Carmack is reportedly very unhappy.
For example, a faculty member may be sponsored by several different projects, each of which wants that faculty member to update their web page with each new publication.
Odds are, most faculty will update their own personal page and possibly one project page. This leaves the other projects needing to harangue the faculty member into updating their pages.
For example, a postdoc comes and visits, writes a bunch of papers and then moves on. It would really be nice if the postdoc could take their publications with them to their next position.
For example, you are on an airplane and need access to your usual bibliography.
For example, all your publications are on one machine, and that machine is unavailable.
Bibster seems like a good start in addressing these issues.
Locally, Professor Edward A. Lee had a similar idea, with the added wrinkle of having centralized project specific servers check the repositories of individual researchers and update the project specific list of publications with the bibliography info and the paper itself.
Maybe trusted sources could sign bibliographies. You could add certain contributors to your web of trust.
A decentralized, indexed and well documented database that everyone can access...
What's the difference between a database and a hard drive again?
Literalism isn't a form of humor, it's you being irritating.
. . . will this actually be useful?
>This system will be useful when one has a paper
>in hand, but does not have the bibtex entry.
Perhaps I'm spoiled by working in a field with very good online databases and journals that require only brief bibliographic entries, but it's hard to imagine where this would actually be useful. 95% of the papers one has in hand were located via an online database and came with bibtex entries. On the rare occasion one finds a paper copy of an article and no bibtex entry, it's usually faster to generate one by hand than to find it in a database.
If there are people who find it useful, I'm happy for them. But, I don't see it myself.
It also seems like it could worsen the propagation of errors in citations. An interesting, if tangential, discussion of the topic is in a paper by Simkin and Roychowdhury. (Note that I'm not endorsing the authors' claim that propagating errors in citations indicate that papers have not been read. A more plausible argument is that authors tend to assemble their citations *after* having completed the paper and crib citation text in order to save time formatting their own. Then again, I suppose that suggests that there are a lot of people who actually will use a service like this one.)
Citeseer does not have access to all articles, and therefore the results are not always useful. Actually my bibtex file contains many articles that they don't have, but it is somewhat bothersome to share the information with citeseer. I guess that citeseer could make a page that parsed bibtex files to add to the archive, but I haven't seen such a page (I haven't looked lately).
:)
The advantage of the p2p client is that it suddenly becomes easy for me to share my information with the world. I can just hope that citeseer grabs the opportunity to download a lot of popular entries to add to their data base.
Of course a lot of problems would be solved if "web of science" made a better user interface that allowed me to export my search results to bibtex, or to easily search among articles written in a given year. Similarly it would help a lot if "scitation" was able to support more journals.
Finally I wish to point out that this client is a step towards a better and fairer world. In poor countries, many institutes cannot afford web of science, so this p2p client will provide a nice and cheap way for them to get some useful info. Hooray for bibster
Many big journals have very standardized URLs, so it should be possible for the client to use wget to check the validity of the entries. This would allow me to check my own bibtex file, but it would also allow my client to kindly warn other clients with wrong entries or to create a blacklist of potential liars on the web.
How about all entries from a particular IP number?
Oftentimes several researchers will be working on the same problem. The p2p client might allow me to spy on competitors to guess what they are working on. Is this good? Does the client allow me to keep some entries secret while sharing others with the world?
In what character set is the bibliographic information stored? If one has a library whose titles are in a zillion different scripts, then it is really essential that one keeps one's bibliography entries in UTF-8. Does this system use such an international encoding, or does it expect us to submit to romanizing all our foreign-script titles just to avoid the issue?
oh, you mean Freenet?
OK, so if citeseer has text for most articles and abstracts + citations for all, then explain why we need a P2P service to do less?
Well, don't answer that. This isn't really about me. I hope.
I've installed the thing. It seems to see peers. So I thought I'd search for a very, very common author. I entered Dana Scott. Nothing. I entered Tanenbaum. Nothing. I entered local boy Vaandrager. Nothing. I entered Barendregt. Nothing. I entered "concurrent". Nothing.
I entered my name. I got everything I've ever published. But then I had imported my own Bibtex files, so I'm not surprised (I've never cited any matches for the above). I entered "coalgebra". I got matches from phiwum again.
Is the user base small? Skewed? Am I just incapable of using the damn thing?
(Note: I don't *think* it's firewall problems, but I could be wrong. I don't see anything in the logs though. But I'm a liberal arts moron, so don't expect much from me.)
Worse than this: I'm a philosopher now. I'm not really doing computer science. I'm starting to guess that this tool won't be too useful for me. At least not for a while. Not until Metaphysics gets its ACM topic category.
Phiwum's law: anyone that names an obvious law after himself and then puts it in his own sig is just pathetic.
>What would be really nice is to have the full texts of articles available P2P.
That's quite easy to do: if I have the article in ps or pdf, then the name of the file is the name of the bibtex-key. And every article is in the 'articles' directory next to the beloved .bib file.
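The key-equals-filename convention is easy to script. Below is a rough Python sketch that pulls the entry keys out of a .bib file's text and builds the paths where the matching ps/pdf would be expected. The regular expression is a deliberately simplified assumption about BibTeX syntax (it would not cope with every legal file), and `bibtex_keys`, `candidate_paths` and the sample entries are hypothetical names used only for this illustration.

```python
import re
from pathlib import Path

# Simplified pattern: "@type{key," at the start of an entry. Real BibTeX
# allows more variation than this, so treat it as an approximation.
ENTRY_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def bibtex_keys(bib_text):
    """Return the entry keys found in the text of a .bib file, in order."""
    return ENTRY_KEY.findall(bib_text)


def candidate_paths(bib_text, articles_dir="articles", extensions=(".pdf", ".ps")):
    """Map each key to the file paths where its full text would be expected."""
    base = Path(articles_dir)
    return {
        key: [base / (key + ext) for ext in extensions]
        for key in bibtex_keys(bib_text)
    }


sample = """
@article{scott1970,
  author = {Dana Scott},
  title  = {Outline of a Mathematical Theory of Computation},
}
@book{barendregt1984,
  author = {H. P. Barendregt},
  title  = {The Lambda Calculus},
}
"""

for key, paths in candidate_paths(sample).items():
    print(key, [str(p) for p in paths])
```

A client could check which of those paths actually exist (with `Path.exists()`) and offer only those full texts to peers.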
>it would be awesome if user comments could be added to each citation.
I use the annote field. However, how can you be sure that the review is accurate?
Am I the only one that tried this thing out and thought, "Damn. Look at that real estate."
I'd like it more if I was uploading in the background and my queries had a lighter, smaller interface, say a shell interface. Better yet (much better) an xemacs interface that works well with reftex.
I know that the latter is much too much to ask for a young product, but I hope that the authors give developers (not me) some APIs to get some lighter weight clients out there.
Assuming I ever get a non-local match for my queries, that is.
>What would be really nice is to have the full texts of articles available P2P.
S2S is such a network, with academic users as the target group. It is currently in a test phase and is sponsored by the German government. It also includes an expert client, where you can sign yourself up as an expert for a specific area and get to answer questions. According to the current statistics, the network provides over 1 million documents.
Homepage is here, but in German: http://s2s.neofonie.de/
My cats ate my karma. They also wrote this comment.
Citeseer only sees about 70% (iirc) of the papers in Computer Science, and basically none outside... and its attempts at BibTeX are usually rubbish... and it's up and down like a tart's... (well, until the mirrors are properly sorted and stable).