Integrating Wikipedia With a Local Intranet Wiki
An anonymous reader writes "I work for a large company taking a preliminary look at developing an honest-to-goodness wiki. We have tried to launch a company-wide wiki before, but with little success. The technical domains of each part of the company are different, thus each article needs a good deal of background to be useful. Of course, due to the proprietary nature of our work we cannot share our articles outside of the intranet. What we would like to do is leverage existing wikis by augmenting our internal wiki with an external wiki. When a user accesses Wikipedia from inside our intranet, they receive the Wikipedia content, plus the local domain-specific information. For example, links to company-specific wiki pages would be available in Wikipedia pages. Has anyone else tried to do something like this? I know it sounds like a logistical nightmare; are there any thoughts on how to make this successful?"
URLs. Look into it.
create a firefox addon that downloads a master list of wikipedia urls to add a link to the intranet site to. you can use regular expressions to parse the wikipedia source so that your link is consistently placed. the master list can be updated at will, and could probably be filled the first time with a simple database request. or something.
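For what it's worth, the rewriting step could look roughly like this. It is a minimal Python sketch rather than an actual Firefox addon: the master list contents, the intranet hostname, and the assumption that the article heading is an `<h1>` with `id="firstHeading"` are all illustrative guesses, not something to rely on.

```python
import html
import re

# Hypothetical master list: Wikipedia article title -> intranet wiki URL.
MASTER_LIST = {
    "Capacitor": "http://wiki.example.internal/Capacitor",
}

# Assumption: the article heading is an <h1> element with id="firstHeading".
HEADING = re.compile(r'(<h1[^>]*id="firstHeading"[^>]*>.*?</h1>)', re.DOTALL)


def add_intranet_link(title, page_html):
    """Insert a link to the intranet page right after the article heading."""
    target = MASTER_LIST.get(title)
    if target is None:
        return page_html  # not on the master list: leave the page untouched
    link = '<p class="intranet-link"><a href="%s">Internal notes on %s</a></p>' % (
        html.escape(target, quote=True),
        html.escape(title),
    )
    # A function replacement avoids backslash handling in the link text.
    return HEADING.sub(lambda m: m.group(1) + link, page_html, count=1)
```

Anchoring on one well-known element is what keeps the link "consistently placed"; if the page layout changes, only the pattern needs updating.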
Build a web application to merge wikipedia content with internal content (iframes maybe).
Then set up a DNS alias to redirect the wikipedia traffic to this web app.
http://en.wikipedia.org/wiki/Wikipedia_database
Download their database, put it into your system, and you're set.
Perhaps the easiest thing to do would be start with a complete dump of Wikipedia and add your own stuff to it. Their database dump page is here.
It is 2.8TB, however. They allude to a "Wikipedia API" for working on a "random subset" of Wikipedia; maybe that would be helpful too.
DATABASE WOW WOW
Have you considered taking a recent wikipedia snapshot and using that as a foundation to seed your internal wiki?
Your internal users can then add their own revisions to this as required to customise it where necessary.
Of course, you'll lose the ability to pick up new changes/revisions to the original WP pages, but it might be a simpler way to go.
I assume you want up to date content and to have it clearly separated from what is yours. Why not enclose the content within an IFRAME? Seriously, it's stupid and simple but might be all you need. Alternatively you could use some form of an intelligent proxy/page modifier, either as a mediawiki plugin or whatever floats your boat (i.e. every time a page is loaded also try to get the wikipedia stuff).
"What we would like to do is leverage existing wikis by augmenting our internal wiki with an external wiki"
What does that even mean? If you want to design something, you'll have to use more precise language. And for god's sake, stop using the word leverage without thinking about it. You used it backwards - if you are augmenting your internal wiki with external wikis, you are leveraging your internal wiki with the external wikis. You leverage a boulder with a lever, but you don't leverage a lever with a boulder.
It seems to me I've seen a browser extension somewhere that lets users add their own comments to any arbitrary web page, and those comments can be made public so anyone else running the same browser extension will see them when they load the same page. I bet you could use something like that, with all your users having a browser plugin that pulls URL-based content from an internal server.
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
Sounds like a weird setup, so you'll probably need to do most of it yourself. Perhaps the easiest way is
1) set up a normal local wiki, with care to name pages the same as the relevant wikipedia page [I'm guessing you know how to do this]
2) use DNS redirects or similar tricks to get all wikipedia requests to go to a proxy
3a) do html injection on the page and stick your stuff at the bottom [MITM attack using ettercap or something like that]. This is probably a pretty bad solution, but is going to be the easiest to research as it's textbook hacking.
3b) host dynamic pages that mash up the 2 wikis (python, php, something like that). This is probably the closest to the right way to do it, no html injection just a DNS redirect, but will require serverside processing for every request.
3c) use injection, but only inject a bit of javascript/an iframe that tacks on your wiki stuff at the end (when available). This doesn't require much to be done serverside, just inject the same html on all pages.
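Option 3b boils down to a merge step on the server. A minimal Python sketch of that step, assuming both pages have already been fetched as HTML strings (the `internal-wiki` id and the heading text are made up for illustration):

```python
def mash_up(wikipedia_html, local_html):
    """Return the Wikipedia page with a clearly marked internal section appended."""
    if not local_html:
        return wikipedia_html  # nothing local for this title
    block = (
        '<hr><div id="internal-wiki">'
        "<h2>Internal (company confidential)</h2>" + local_html + "</div>"
    )
    # Insert just before the last closing body tag, if there is one.
    head, sep, tail = wikipedia_html.rpartition("</body>")
    if not sep:
        return wikipedia_html + block
    return head + block + sep + tail
```

Keeping the internal part in its own labelled block at the bottom also makes it harder to mistake it for Wikipedia content.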
Whatever you do you will probably spend more time reading hacking tutorials than wikihowtos
IranAir Flight 655 never forget!
Agreed. Appending to wikipedia is the ass backwards way to do it. Everyone suggesting greasemonkey and other addons are just enabling your backassery.
What you do is create an internal wiki, and wherever relevant you link to the wikipedia article. Or an external doc. Or nothing at all and expect your employees to look it up on their own.
You probably want interwiki.
Blearf. Blearf, I say.
watch 1.20hs here and see for yourself. This monster will change email, chat, wikis and forums. I'd be worried if I was a slashdot overlord. In fact, an idea for an extension to google wave would be to implement slashdot's moderation system into it.
Maybe I drank too much of the kool-aid, but I think wikis and forums will all have to rapidly adapt, or adopt the coming plague from Mountain View.
There are four sorts of people in the world: fools, lunatics, idiots and morons. - Umberto Eco, Foucaut's pendulum.
Maybe I'm missing something, but why not just have an external links section on your internal wiki, or a "Required Reading" section? Seems like the solution you're proposing is a little bit heavyweight for the described problem.
Am I the only one who cannot see any legitimate use for this hack?
Why lure your users into thinking the content is on wikipedia if it is on your network?
Can't your users use wikipedia _and_ your wiki?
Sincerely I think that the goal for this hack is luring users to think they're reading/editing wikipedia for someone's profit.
You need to make sure that there is a clear demarcation between your content and the wikipedia content and this will limit your integration. The last thing you want is for one of your users to upload confidential information onto wikipedia in the mistaken belief they are putting it on the in house wiki.
Open page in intranet for...say, capacitor.
Script grabs wikipedia article, strips out header, sidebar, etc and fills in remaining links/images with proper URLs to wikipedia (so they work)
Stores in a database for diff'ing and updating later, dumps remaining content from Wikipedia at the bottom with a good ol' <hr> and you're off!
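The "fill in links/images with proper URLs" part is the bit that tends to bite, so here is a minimal Python sketch of it. It assumes the article body has already been cut out of the page and that attributes are double-quoted; the function name is made up, and a real HTML parser would be more robust than a regex.

```python
import re
from urllib.parse import urljoin

# Matches double-quoted href/src attributes only (an assumption about the markup).
ATTR = re.compile(r'(href|src)="([^"]*)"')


def absolutize(fragment, article_url):
    """Rewrite relative href/src values so they point back at Wikipedia."""
    def fix(match):
        return '%s="%s"' % (match.group(1), urljoin(article_url, match.group(2)))
    return ATTR.sub(fix, fragment)
```

Passing the full article URL as the base means in-page anchors such as `#History` resolve against the article itself, and protocol-relative image URLs pick up the scheme.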
Why? Can't you just link to wikipedia pages where appropriate? OK, my company has an internal server we link through to sanitize referrer info so our internal wiki titles don't get all over teh interwebs. But if the wiki users can't figure out "hey, this article is too specific - maybe wikipedia has more general information that would help me," you've got bigger problems than your wiki management.
Nowadays, any content management system worth anything has a built-in wiki and most allow direct linking and searching between the local wiki and wikipedia.
For example Documentum and Sharepoint both have federated search providers for Wikipedia.
Plus, because the OP works for a "large company" they probably already have DCTM or MOSS installed somewhere.
Why reinvent the wheel when you've already bought a better one? (job security?)
A very small part of My PhD looked at this (but with "collaborative textbooks" rather than wikis) -- see Chapter 4. Adding a very simple metadata-based navigation layer over the top of the wiki is pretty easy, clean (doesn't confuse users), and seems to do the trick. The wiki itself shows in an embedded frame. Of course, I had to go further and let students do difficult number theory proofs backed by machine reasoning systems within the book, but you won't have to solve that problem!
I'm (gradually) putting this fairly simple but useful part of the software into an online resource at www.theintelligentbook.com, though it's in my spare time and the system is down at the moment. I'll put my contact details back up there shortly in case the question-asker wants to discuss it technically.
What happens when a user doesn't understand that this isn't a local copy, and edits a wikipedia page with private information?
This is a bad idea, period.
With a browser extension (probably relatively easy with Firefox or Opera), you can modify the HTML DOM and include an iframe with company-specific information. This should probably be unobtrusive on the Wikipedia page, but it should be clearly marked as internal to your company. Users aren't always the brightest, and there's always the possibility of them editing the Wikipedia page itself to correct local content which should never be published on the Internet. It might also be possible to force the Wikipedia page into a frame, and have company content clearly identifiable in another frame.
With a proxy, you would add some JavaScript near the end of the HTML page, which does pretty much the same thing. You will be limited by the security settings of the browser, though.
Also interesting are the extensions allowing you to comment on any public webpage, and share those comments with other people. Most of these use a public server, but you could probably modify an existing firefox extension to talk to a local server (which you then need to script). I think there's even an open protocol for this.
Of course, if you're going the browser extension path with Firefox, why modify the HTML at all? Modify the user interface, so that the company's wiki becomes part of the browser? Somebody has a site they want to bookmark for the wiki? Have a button for it. They want to create a new topic, based around this page's content? Have a button for it? They want to see all related internal pages, have a sidebar which updates with info from the local server. Standardising on Firefox in an organisation isn't a bad idea at all, especially if you can bring such benefits to the company. :-)
One Tab for your Internal Wiki. Another one for wikipedia.
You can also highlight a particular word in your internal wiki, do a right click and search wikipedia (if your search is set so). The search term automatically open the wikipedia content in a new tab. How amazing. Isn't it?
Is it only me wondering how this article ever made it to /.?
Senthil
Frames! Enough said.
I think this is a very interesting story. Aside from the technical question raised, I am wondering why the first corporate Wiki wasn't successful. If it failed the first time because the culture isn't right or there wasn't any management support, a second wiki tool - no matter how seamlessly integrated - won't succeed either. Even if you have a company with many different technical domains it's even more reasonable to be able to share information. And an article shouldn't try to be totally comprehensible. You could write a parent page describing the concept, and subpages that are specialised for the different domains. I'd love to discuss this further.
This is something the Google Wave protocol and platform completely anticipates.
It's based on a tree structure and source code management. People who edit from the synergized wiki could add to either the private or public versions, and patches to public versions or additional documents could be changed and maintained internally.
A very hot and dry sauna.
links to wikipedia: [[wp>subject]]
internal links: [[subject]]
you can give the links different colours with CSS, e.g. wp link = blue, internal links = green
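To make the two link forms concrete, here is a minimal Python sketch of a renderer that turns them into HTML with a different CSS class each. The class names, the `/wiki/` internal base path, and the function name are assumptions; a wiki engine with interwiki support would do this for you.

```python
import html
import re
from urllib.parse import quote

# [[wp>subject]] links to Wikipedia, [[subject]] links to the internal wiki.
LINK = re.compile(r"\[\[(?:(wp)>)?([^\]|]+)\]\]")


def render_links(text, internal_base="/wiki/"):
    """Replace both link forms with <a> tags carrying a 'wp' or 'internal' class."""
    def repl(match):
        prefix, subject = match.group(1), match.group(2).strip()
        slug = quote(subject.replace(" ", "_"))
        label = html.escape(subject)
        if prefix == "wp":
            return '<a class="wp" href="https://en.wikipedia.org/wiki/%s">%s</a>' % (slug, label)
        return '<a class="internal" href="%s%s">%s</a>' % (internal_base, slug, label)
    return LINK.sub(repl, text)
```

The colours then come from two CSS rules, for example `a.wp { color: blue }` and `a.internal { color: green }`.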
1. On your personal wiki server, have a copy of each page of the wikipedia you want to apply modifications to, and add whatever you want on those.
2. Have a modified http proxy on the intranet that detects queries to the wikipedia about items that you have on the server and re-route them.
For example, let's say you want custom information on http://en.wikipedia.org/wiki/Socks. You copy it to http://yourintranetserver/wiki/Socks, and make your changes.
Then, if someone from inside your network tries to get http://en.wikipedia.org/wiki/Socks, they get yours instead.
At the same time, the proxy needs to be intelligent enough to redirect back to the wikipedia page if your server doesn't have a page. Ideally the http redirect rules should be put automatically in place when a new page is added to yourintranetserver.
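The routing decision the proxy has to make is small. A minimal Python sketch, assuming the proxy can ask for the set of titles that exist on the intranet server (how that set is kept current is left open, and `route` is a made-up name):

```python
from urllib.parse import urlsplit

WIKIPEDIA_HOST = "en.wikipedia.org"
INTRANET_BASE = "http://yourintranetserver/wiki/"  # host name taken from the example above


def route(url, local_titles):
    """Return the URL the proxy should actually fetch for a requested URL."""
    parts = urlsplit(url)
    if parts.hostname == WIKIPEDIA_HOST and parts.path.startswith("/wiki/"):
        title = parts.path[len("/wiki/"):]
        if title in local_titles:
            return INTRANET_BASE + title  # we have our own copy
    return url  # fall through to the real Wikipedia page
```

For instance, `route("http://en.wikipedia.org/wiki/Socks", {"Socks"})` gives the intranet URL, while any title not in the set passes through unchanged.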
Go and look at Freebase: http://www.freebase.com/
They provide an API to obtain articles and structured data from them. They handle all of the wikipedia import.
Additionally, you can do much more with the structured data there
For instance - Olympic Cyclists and the Way They Died.
http://www.freebase.com/view/user/doconnor/default_domain/views/olympic_cyclists_and_they_way_they_died Try doing that with Wikipedia.
1) Install Wikipedia software locally and use this for any locally created articles
2) The web server running this simply proxies out to en.wikipedia.org for that request if not available in the local version. The easiest way to do this is with Apache + rewrite rules
This means that users can get to articles locally and on wikipedia from the same command
You then need to consider the following
1) The search request needs to go to the local version of wikipedia then the external one and concatenate the results together - a small proxy script should be more than capable of doing this
2) You may want to create a reference table which maps external wiki articles to related internal ones. Again a small script could insert these into the external wikipedia articles during rendering
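The search merge in point 1 could be as simple as the following Python sketch. It assumes each search backend has already returned a list of page titles; letting a local hit shadow a Wikipedia hit with the same title is my own choice here, not something the setup requires.

```python
def merge_results(local, external):
    """Concatenate local and Wikipedia search hits, local first, without duplicates."""
    seen = set()
    merged = []
    for source, titles in (("local", local), ("wikipedia", external)):
        for title in titles:
            if title not in seen:
                seen.add(title)
                merged.append((source, title))
    return merged
```

Tagging each hit with its source lets the results page show clearly which wiki a link leads to.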
Why not run MediaWiki on your intranet and use InterWiki links to Wikipedia in your own articles?
they were pretty good at page-hijacking, IIRC :-)
seriously though, perhaps i mis-read the question? are you looking for automated tools to do the hyper-links?
Ignore the nay sayers. Of course there is a lot of value in aggregating content and creating a compound page that blends your internal content with other sources.
From a usability and authority-of-source perspective, however, I think it would be best to list each source in a separate section on the page, starting with your internal content at the top. You can get to the other content either by embedding links into your internal content, or by collecting the links in a separate section.
Wikipedia itself uses the embedded technique. When composing or editing an article, the author can embed markup for external references. On display, this markup is turned into a footnote link at the point of embedding, and a footnote at the bottom of the page. I don't see why you couldn't do something similar. In this case, however, you would be embedding references to Wikipedia articles.
In your internal wiki templates, have a custom markup for embedding Wikipedia queries related to the article. On display, either turn these markup queries into embedded links to footnotes, resolve the queries and deposit the results at the bottom of the page, or toss them into iframes and let the user sort it out.
The other technique is to have a custom form in your internal wiki template where you collect the cross-references. On display, turn these queries into links or resolve them into content.
In any event, why limit yourself to Wikipedia? Include cross-references to patent search engines and other domain-specific sources.
A big word of caution, of course, is owed to the legal angle. Make sure you follow the law whenever reusing anyone else's content, even if it's just a link. Have your legal department sign off on your reuse policy. Don't distract them with technical aspects of what you want to do. They're lawyers; they only care about the law. Ask them a specific legal question, such as, "what is our legal exposure if we republish (links to or actual content from) Wikipedia on our internal wiki?".
"We receive as friendly that which agrees with, we resist with dislike that which opposes us" - Faraday
Accidentally I saw this site. I haven't tested and I don't know the results. I think it's in the early stages of development.
Until the skies turn blue...
Until the air of freedom strikes us...
My first thought was to use a Greasemonkey (or Greasemonkey analogue) to add whatever you wanted to pages that show up on Wikipedia. The way it could be integrated is you have the internal wiki with its markup and everything named as the same page title as what's on Wikipedia. When a page on Wikipedia is loaded, the script appends the internal wiki onto the Wikipedia page.
Others' concerns about Wikipedia being out of date or contradictory are valid though. You would probably do better to either.
Extra Points earned if, when someone clicks to edit an entry that has internal information, the process is seamless and feels like editing the Wikipedia page itself.
So basically you want to benefit from the community effort that has built Wikipedia but not give back? Do I have that correct?
Think about what Wikipedia would (not) be if everyone had your attitude -- to keep their contributions private.
Scan each internal page and see what wikipedia pages it links to. Store that info in a database. Make a firefox plugin that works as follows:
When a user is at a wikipedia page query the database and see which internal pages link to it. Add those links to the "See Also:" part of the wikipedia page.
No you wouldn't get inline links but it sounds much easier.
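The scanning half of this is easy to sketch. Below is a minimal Python version that builds the reverse index in memory rather than in a database; the regex is a rough guess at what a Wikipedia article URL looks like in page source, and the function names are made up.

```python
import re
from collections import defaultdict

# Rough pattern for English Wikipedia article links found in page text.
WP_LINK = re.compile(r'https?://en\.wikipedia\.org/wiki/([^\s"\'#?<>\]|]+)')


def build_backlinks(internal_pages):
    """Map each linked Wikipedia title to the set of internal pages linking to it.

    internal_pages: dict of internal page title -> page text.
    """
    backlinks = defaultdict(set)
    for page, text in internal_pages.items():
        for title in WP_LINK.findall(text):
            backlinks[title].add(page)
    return backlinks


def see_also(backlinks, wikipedia_title):
    """Internal pages to list under 'See Also' for one Wikipedia article."""
    return sorted(backlinks.get(wikipedia_title, ()))
```

The plugin side then only needs to ask the server for `see_also` of the current article title and append the result to the page.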
Use mediawiki as your wiki and add the interwiki plugin. See http://meta.wikimedia.org/wiki/Help:Interwiki_linking
I don't get it. Are people in your company using Wikipedia so much in their daily work that this would really be useful? Just set up your internal wiki. It is your focal point. Why try and integrate the two beyond just making a link to Wikipedia? Using Mediawiki, you can even use Interwiki links to easily link outside of your internal wiki.
Why not try the other way round:
Create your wiki, add pages, add links from your wiki pages (which you have full control over) to relevant wikipedia pages?
Much simpler, and should still produce the desired effect.
You are trying to force a technical solution on a social problem. It's probably not going to work. Your best bet for success is to try and install a WYSIWYG editor for mediawiki. There are several out there. Wiki, underneath, is just a programming language. It requires training people - no matter how much it is designed to be "easy." Make it easier.
Consider Sharepoint. As much as /. is Anti-Microsoft, if your users are used to Exchange and Windows then Sharepoint is worth paying for.
I've worked for Larry Sanger's Citizendium.
I wrote a very simple extension for my own mediawiki site that pulled in external pages as an iframe within a wiki page. I'd imagine you can do the same, Build your own wiki, with the wikipedia pages included below your own content.
The experimental Tearline Wiki system we've developed at Galois might suit your needs. Inside the firewall, you use MediaWiki with the Tearline system, and get a combined view of your internal wiki(s), possibly different wikis on different sub-nets, and you can integrate it with Wikipedia or other internet-based wikis to get the global context of the article.
As others have said, integrating your content with other people's content can be a legal issue.
Contact me if you want more information on Tearline :)
peace,
isaac
Just keep them separate.
I work for a huge corporation and we have our own thing called etipedia.
Also, don't forget, wikipedia is X rated.
n/t
Every organization needs their own, up to date version of .
But seriously, process the SQL dump when you retrieve a monthly (quarterly?) update. Generate a set of strings that are relevant to your organization, and strip articles that don't match.
Someone can always visit the upstream site, or you can use the interwiki facilities, as mentioned elsewhere.
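The filtering pass might look like this in Python. It is only a sketch of the "strip articles that don't match" step and assumes the dump has already been parsed into (title, text) pairs, which is the harder part and is not shown.

```python
def keep_relevant(articles, keywords):
    """Yield only the (title, text) pairs that mention at least one keyword.

    Matching is a plain case-insensitive substring test.
    """
    lowered = [k.lower() for k in keywords]
    for title, text in articles:
        haystack = (title + "\n" + text).lower()
        if any(k in haystack for k in lowered):
            yield title, text
```

Because it is a generator, it can run over a large dump one article at a time without holding everything in memory.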
Uh...we do this with some success and we just add external links to wikipedia articles in our local intranet media wiki articles. Each division has their own wiki article and subjects are organized therein. No need to reinvent the wheel just link to it.
Use interwiki links. I use them to link our intranet, mediawiki, our external developer wiki, and our external support wiki.
You will probably be unable to use them since using them requires the ability to get off your lazy ass and read the MediaWiki documentation or google for it, which results in plenty of information.
Also the fact that you're going to have to be able to insert a row in a database is probably going to be over your head.
READ THE DOCUMENTATION YOU LAZY FUCK.
I heard your company likes Wikis so we copied the entirety of Wikipedia onto your web server, so you can wiki while you wiki.
Use squid proxy to inject the extra content, that way it's centralized.
this is a really dumb idea. forget about it. tell them its not possible and move on to burning the company's money on something of more value.
If Wikipedia is indeed a good base for a lot of your company knowledge, you can do something dead simple: build a single PHP (or whatever language you prefer) page with an IFrame in it. Inside the IFrame you let users browse Wikipedia or any other web resource. Outside, in the parent document, there is a script that looks at the current IFrame URL and checks a local database for additional information. This could be additional text or even a stream of internal comments on this URL. The beauty of this idea is that you don't need a local copy of WP, and you don't need any HTML scraping. And it'll work with other online resources besides Wikipedia as well. You make the URL the reference point for the internal database lookup and you're done. It also has the benefit that your users will be able to easily distinguish between public and proprietary content on the page, because those two will be clearly separated. And, you can set this up within a few hours.
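The database lookup keyed on the URL could be sketched like this in Python, with an in-memory SQLite table standing in for the real database. The table layout, the normalisation rule (host plus path, ignoring scheme, query and fragment), and the function names are all assumptions. Bear in mind that browsers normally stop a parent page from reading the address of a cross-origin iframe, so the sketch covers only the lookup, not how the current URL is obtained.

```python
import sqlite3
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to host + path so http/https and #fragments share one key."""
    parts = urlsplit(url)
    return (parts.hostname or "").lower() + parts.path


def open_db():
    """Create an in-memory notes table (a stand-in for the real database)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE notes (url_key TEXT, body TEXT)")
    return db


def add_note(db, url, body):
    db.execute("INSERT INTO notes VALUES (?, ?)", (normalize(url), body))


def notes_for(db, url):
    """Return all internal notes attached to the given URL, oldest first."""
    rows = db.execute(
        "SELECT body FROM notes WHERE url_key = ? ORDER BY rowid", (normalize(url),)
    )
    return [row[0] for row in rows]
```

Since the key is just the URL, the same table works for any site shown in the frame, not only Wikipedia.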
We started using Atlassian Confluence a year ago, and I am so pleased with this product. We are currently ~100 users contributing and using it on a standalone server internally with SSO for Windows domain. You should seriously consider this wiki, because it beats everything I have seen.
You may want to check the Semantic MediaWiki (semantic-mediawiki.org) or SMW+ (wiki.ontoprise.de).
Both are built on top of MediaWiki (which powers Wikipedia) so you can tap the very rich pools of extensions (numbering in the hundreds).
SMW+ is actually built on top of SMW, and it focuses on increasing usability and it preinstalls pre-configured extensions out of the box to make it easier to deploy.
With SMW/SMW+, you can put in semantic annotations for an article describing just about anything you want to assert about the article. One assertion you can make is a Wikipedia link. It even has the smarts to know that the assertion/property is a URL and it will put in the necessary bits to make it clickable.
And that's just the tip of the iceberg, you can do some other amaaaazzziiing stuff with the semantic smarts of SMW/SMW+. To get a better sense of what SMW is, you may want to check out - http://simia.net/download/SemTech2009.ppt
Full disclosure - I help out with the SMW project :)
unless i am completely misunderstanding you, this seems like a pretty easy hack on any wiki engine. just query the page's title at other wikis and append the content to the bottom. for example: you have a page called Server Farm -- detailing your company's server farm. whenever that page is loaded in a browser, the dynamic content generator in the website downloads the page with the same name from wikipedia, strips out their formatting, and sticks it at the bottom of your page. your users can only edit your local content. this should probably take only a couple hours to fully implement and test.
if you wanted, you could also parse the "see also" section, and fetch those links and add them. or if you want to be a little more clever, you can allow people to embed keywords for relevant wikipedia articles (or other URLs) in your internal wiki's articles and then fetch those, too. in the previous example, say your server farm consisted entirely of apple xserves. you could fetch that wikipedia article and maybe the spec sheet from apple, etc.
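One way to do the "query the page's title" step without scraping the full page is to ask the MediaWiki API for the rendered article text. A minimal Python sketch that only builds the request URL (fetching, caching and error handling are left out, and the function name is made up):

```python
from urllib.parse import urlencode


def wikipedia_parse_url(title, api="https://en.wikipedia.org/w/api.php"):
    """Build an action=parse request URL for the article with the given title."""
    params = {"action": "parse", "page": title, "prop": "text", "format": "json"}
    return api + "?" + urlencode(params)
```

The same function works for any other MediaWiki site by passing a different `api` endpoint, which fits the "other wikis" part of the idea.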
yawn.