Linux Office Suites
Cowculator writes: "Sun Microsystems will release the beta version of StarOffice 6.0 in October, with the development version already available. This ZDNet article has some more details, including a link to the development version..." Other submitters sent in notes about Gobe Productive and Hancom Office 2.0, not to mention KOffice and the GNOME office applications. As far as I know all of these are lacking the single most important thing, a robust and complete set of import filters for Word, WordPerfect, Excel, PowerPoint, etc.
Just as important, the lack of EXPORT filters! If you're going to send a document to other people, they need to read it too.
I think StarOffice got off to a wonderful start. I'm very concerned about their progress. The next major version will really be a turning point in the industry one way or the other. If it's solid, and it rocks, with great compatibility, then there is a great alternative to Office. If it's buggy, or doesn't work well with Office formats (especially Excel, where it's the weakest), then MS will win. And I'm going off to live on a deserted South Pacific island.
Sigh... If I had to bet, it's depressing where I'd probably put my money... Sun's dropped the ball a few times lately.
Tip to the folks working on it: cool object-oriented design is neat, but it's usability, stability, and compatibility that will make StarOffice a success. Don't try to do things beyond MS Office, just match it on all fronts! Anything else is an esoteric waste of time.
-me-
Love many, trust a few, do harm to none.
The lack of import filters is regrettable, but hardly surprising - as soon as they do work properly, Microsoft will make bloody sure to change the format again.
The other factor is that even if the Word/Excel/PowerPoint import is working, people act all surprised if their embedded Visio drawings, AutoCAD DXFs etc. don't work. It's pretty silly to expect them to work too, unless you've got some magical Linux version of AutoCAD (come home to Unix, AutoCAD!) or Visio installed. KDE's KParts framework is as capable as OLE on Windows (although I wish they hadn't dropped CORBA), but it can't embed applications that don't exist.
Export filters are pretty irrelevant for the majority of Word or Excel documents - MS Word will silently load files saved as .html or .rtf anyway - even if you rename the extension to .doc. So the poor lusers don't even know it's not MS Word format...
Excel loads CSV fine, even CSV with embedded formulae in standard enough infix notation. Once again, this covers a large number of cases, although it's not as transparent as just renaming a .rtf to .doc, and "pretty graph" style applications need something more powerful.
PowerPoint is more problematic - although I've noticed that the flashier and more advanced the PowerPoint presentation, the less likely it is that it's saying anything useful. :-)
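To make the CSV point above concrete, here is a minimal Python sketch of writing a sheet as CSV with the formulas left in as ordinary `=` infix strings. The helper name `sheet_to_csv` and the sample rows are made up for illustration, and how any particular spreadsheet version treats the formulas on import is worth checking rather than assuming.

```python
import csv
import io

def sheet_to_csv(rows):
    """Serialize rows of cell values to CSV text, leaving formula strings untouched."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\r\n")
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical sheet: formulas are plain strings starting with "=".
rows = [
    ["Item", "Qty", "Price", "Total"],
    ["Widget", 3, 2.5, "=B2*C2"],
    ["Gadget", 1, 10, "=B3*C3"],
    ["Sum", "", "", "=SUM(D2,D3)"],  # contains a comma, so the writer quotes it
]
text = sheet_to_csv(rows)
print(text)
```

The only subtlety is that a formula containing a comma has to be quoted, which the standard `csv` writer does on its own.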
It's obviously pretty essential for users to be able to transfer between the office suites in question and MS Office, if the others are to gain any kind of mainstream acceptance. However, most MS Office users don't actually use something like 90% of the functionality. It's the other 10% that's important.
Further, the only really important Microsoft Office applications are Word, Excel and Access. There isn't the same volume of existing data that must be readily accessible for the other applications.
Now, suppose you could get a solid intermediate format covering those basics (something XML-based, perhaps) adopted as some sort of standard by the free software/open source guys, and have all these office suites using it. It then just needs someone to write a single filter for, say, MS Word docs, to convert to and from the intermediate format, and then all the other Office suites can do it.
I can't believe no-one's thought of or attempted this before, but I don't know of any actual examples. Does anyone else? It must be technically possible; at least, if it's not, you haven't got a hope of converting to the format used by any individual free/open source office suite either.
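A rough sketch of the intermediate-format idea above, in Python: every format gets one reader and one writer that go through a shared in-memory document, so N formats need N readers plus N writers rather than a converter for every pair. Everything here (the `Document` shape, the toy "text" format where `# ` marks a heading, the registry names) is an assumption for illustration, not any suite's real format.

```python
from dataclasses import dataclass, field
from html import escape

@dataclass
class Paragraph:
    text: str
    style: str = "body"  # "heading" or "body" in this toy model

@dataclass
class Document:
    paragraphs: list = field(default_factory=list)

READERS = {}  # format name -> function(str) -> Document
WRITERS = {}  # format name -> function(Document) -> str

def reader(fmt):
    def register(fn):
        READERS[fmt] = fn
        return fn
    return register

def writer(fmt):
    def register(fn):
        WRITERS[fmt] = fn
        return fn
    return register

@reader("text")
def read_text(data):
    # Toy convention (hypothetical): a line starting with "# " is a heading.
    doc = Document()
    for line in data.splitlines():
        if not line.strip():
            continue
        if line.startswith("# "):
            doc.paragraphs.append(Paragraph(line[2:], "heading"))
        else:
            doc.paragraphs.append(Paragraph(line))
    return doc

@writer("text")
def write_text(doc):
    return "\n".join(("# " + p.text) if p.style == "heading" else p.text
                     for p in doc.paragraphs)

@writer("html")
def write_html(doc):
    parts = []
    for p in doc.paragraphs:
        tag = "h1" if p.style == "heading" else "p"
        parts.append("<%s>%s</%s>" % (tag, escape(p.text), tag))
    return "\n".join(parts)

def convert(data, src, dst):
    """Convert by way of the shared Document, never format-to-format directly."""
    return WRITERS[dst](READERS[src](data))

print(convert("# Title\nHello & bye", "text", "html"))
```

Adding a Word reader in this scheme would mean writing one function that returns a `Document`; every registered writer would then work with it unchanged.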
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Word processors dating back to the DOS days can read Rich Text Format. If you're sending it to a Windows newbie who panics when it doesn't say ".doc", tell him to open it anyway -- Word will understand it.
They've never seen any other mail clients, and don't understand why people outside the company can't read their HTML mail with embedded OLE objects and attached vCard files. I play games with them... they send me Rich Text email, I change it to plain text and send it back. Their client is set to send Rich Text by default, so it gets changed back. Then if I reply again, I change it back to plain text. They must wonder what the hell is going on.
Many people could get by just fine with an "alternative" office suite, if they didn't have to exchange files with the computer illiterate.
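For anyone who hasn't looked inside an RTF file, the Rich Text Format mentioned above is plain text with backslash control words. A minimal Python sketch of producing one follows; the function names are made up, the font and 12pt size are arbitrary choices, and it only covers plain paragraphs, not tables or styles.

```python
def rtf_escape(text):
    """Escape text for the body of an RTF document."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\\{}":
            out.append("\\" + ch)          # the three characters RTF treats specially
        elif code < 128:
            out.append(ch)
        elif code < 0x10000:
            # \uN takes a signed 16-bit decimal; "?" is the fallback character
            # shown by readers that don't understand \u.
            if code > 32767:
                code -= 65536
            out.append("\\u%d?" % code)
        else:
            out.append("?")                # outside the BMP: not handled in this sketch
    return "".join(out)

def make_rtf(paragraphs, font="Times New Roman"):
    """Wrap paragraphs in a minimal RTF header; \\fs24 means 12pt (half-points)."""
    body = "".join(rtf_escape(p) + "\\par\n" for p in paragraphs)
    return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 %s;}}\n\\f0\\fs24 %s}" % (font, body)

print(make_rtf(["Resume", "Caf\u00e9 {owner}"]))
```

Saving that output under a .rtf (or, per the thread, a renamed .doc) name is all it takes to have something a word processor can open.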
My biggest concern (having implemented Star Schedule server for 30 people so far in a 50-employee company) is that no regard at all has been given to the groupware functionality in OpenOffice. I have very few gripes with Star Schedule, but will need to explain why the newest versions of StarOffice cannot be used with the Schedule Server.
If someone were to start a project to make a newer, better groupware tool for OpenOffice (or some other open-source cross-platform tool), I would find a way to contribute (as I think quite a few others would).
Unfortunately it seems as if ogsproject has died.
Maybe if someone took action and said "All groupware discussions will take place on groupware@openoffice.org" or similar, then at least it wouldn't appear on discuss.
Does Sun not care that there are customers of their software who will be left stranded with data in an obsolete server and egg on their faces? I hope that's not the case.
Don't even try to match it on all fronts, IMHO. As much as MS would have it otherwise, most Office users are only using a very small subset of the functionality available.
If you can support bulletproof import/export of simple Word documents, with basic things like the formatting, cross-references, tables and so on working reliably, you've got 99% of the portability problems solved. The big issue is the number of documents that already exist in Word format, which people will continue to need to read/edit in whatever new format they're stored. Most of those documents don't use super-advanced VBA scripts, half a million text boxes and WordArt.
Now, if you can go one better, and fix the terminally annoying bugs in Word -- cross-references not updating properly and woefully broken bullets and numbering spring to mind -- then you've also got a technically superior product that solves real problems that MS Word doesn't. Add in the silly omissions -- genuine three-part headers and footers, as used by many, many business documents, for example -- and you're clearly winning.
Of course, similar arguments apply to other Office applications, particularly Excel and Access. I'm simply highlighting Word because the issues are likely to be more widely understood.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
I must say... I recently switched to using StarOffice even in Windows, just for consistency.
Everyone says 'it's not the same as Office'. No, it's not. And it doesn't have every last feature, but it has its own unique features, and is a deadly office suite nonetheless.
The only real hurdles I've come across so far, that prevent me from converting the entire office, are a) embedded VB (important in some sheets... very important) and b) I can't figure out how to open Password-protected Excel sheets.
my K6 @ 266mhz with 64mb of RAM.
So you're the cheap non-upgrading bastard that's to blame for the slowdown in the tech industry...
:)
Gobe actually has great import/export filters, and it gets better: they developed an API that anyone can write to. If they port the API and the filters over to Linux (which they are apparently doing), then any application can choose to just write to that API and will immediately be able to read or write any of the M$ formats that Gobe supports.
BTW, this functionality is based on how BeOS does translation for other formats, too, mainly graphics. Linux could really take a lesson from this, because it was one of the coolest and best features of BeOS. Hopefully Gobe will port the full API over, not just the filters themselves.
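To show what "an API that anyone can write to" might look like from the application side, here is a small Python sketch of a translator roster: each translator can say whether it recognizes some bytes and can turn them into a common form (plain text here). This is only loosely inspired by the description above; the class and method names are hypothetical and are not Gobe's or BeOS's actual API.

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collects the text content of an HTML document, dropping the tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

class HtmlTranslator:
    name = "html"

    def identify(self, data):
        head = data.lstrip().lower()
        return head.startswith((b"<html", b"<!doctype"))

    def translate(self, data):
        collector = _TextCollector()
        collector.feed(data.decode("utf-8"))
        collector.close()
        return "".join(collector.chunks).strip()

class PlainTextTranslator:
    name = "text"

    def identify(self, data):
        try:
            data.decode("utf-8")
            return True
        except UnicodeDecodeError:
            return False

    def translate(self, data):
        return data.decode("utf-8")

class Roster:
    """Applications talk to the roster, never to individual translators."""
    def __init__(self):
        self._translators = []

    def register(self, translator):
        self._translators.append(translator)

    def identify(self, data):
        for translator in self._translators:
            if translator.identify(data):
                return translator
        return None

    def load(self, data):
        translator = self.identify(data)
        if translator is None:
            raise ValueError("no translator recognizes this data")
        return translator.translate(data)

roster = Roster()
roster.register(HtmlTranslator())
roster.register(PlainTextTranslator())  # registered last, acts as the fallback
print(roster.load(b"<html><body><p>Hello</p></body></html>"))
```

The appeal is that a new filter becomes available to every application the moment it is registered, without any of those applications changing.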
The problem with open-source bloatware seems to be even more severe than with closed-source bloatware. Look at how slow Mozilla is compared to IE. (Okay, YMMV, let's not start a flamewar -- that's just what I found recently when I compared performances on my system.) And complexity is the enemy of open-source projects -- it raises the barrier to entry for people who want to contribute.
I also don't know what you do about lusers who send one-page text e-mails as Word attachments. Even if a certain version of Star Office can read Word 98, it'll be broken when Word 2004 comes out. Are the same lusers really going to be clueful enough to realize they need to convert back to Word 98 before they send it?
Probably a better solution is to convince everyone who currently e-mails Word attachments to start e-mailing PDF attachments. It could still be used inappropriately, but at least everyone could read it with open-source software.
Find free books.
According to this article, the integrated desktop and probably the start button will be gone in version 6.0.
"OpenOffice, and its predecessor StarOffice, are integrated office packages and include a word processor, web browser, and spreadsheet tools. In fact, StarOffice 5.2 contained just about everything a desktop user could need, including an integrated desktop. But with the adoption of desktop environments such as GNOME and KDE, future releases of StarOffice and OpenOffice will no longer carry the integrated desktop." (Source: LWN.net)
I for one would like to put aside the KDE & GNOME bias that pushes many to adopt this word processor or that.
Our fundamental problem to be solved is a lack of UNIVERSAL and fully functional MS-Office import *and* export filters. At this point, I would say it's the biggest problem Linux users must struggle with (emphasis on "users" here... the administrators must still struggle with Linux's crappy font management, etc).
RTF, HTML, and the "other" semi-formatted languages don't support popular features very well, such as tables and frames. Would YOU export your resume from a Linux app as HTML or RTF, and leave it to Office to render correctly? HR people are the most "clingy to Office types", and if your resume looks shitty - it's YOUR fault not theirs (world is not fair).
If your RTF resume looks bad in Office, *obviously* you are not a good candidate. You show little attention to detail if you allow your resume to overlap characters and corrupt text. I've seen Office mangle some RTF docs that look PERFECT elsewhere -- it's an anti-competitive feature of MS Office. RTF documents from Office re-import perfectly.
SO... to get to my point, we need good filters. The KDE Office and AbiWord folks should get together on the OpenOffice mailing list, and work to make sure the OpenOffice filters are exactly what they need. There's NO EXCUSE for not standardizing our I/O filters now.
As a great example of co-operation between KDE and GNOME applications, look at gPhoto. This started as a Gnome digital camera app, but the code became something better... a standard Linux API for cameras. Now there's a ton of KDE and Gnome apps, all of which run on top of gPhoto.
The fact that KDE and GNOME use fundamentally incompatible desktop libraries does not excuse these folks for not working together on EXTENSIONS to the desktop. We need more success stories like gPhoto, in areas like printing, font management, pretty wizards for Samba, etc.
I think about the lack of such examples in Linux, and the thought depresses me...
_Scott
When is Slashdot moderation going to favor less frequent "signal" posts, over "dozen posts a day" noise accounts?
Filters (importing/exporting) are only a temporary solution, if only because MS will break its format to make them useless again.
The bigger reason, though, is that we need to create an open file format. I know there is some work going on with the OpenOffice project to do so, but it needs to have the support of all office suites and applications.
I greatly applaud, and welcome, the Gobe productivity suite, but the last thing we need is more proprietary file formats.
If we could get everyone... WordPerfect, StarOffice, Hancom, Gobe, KOffice, GNOME, etc. to band together in an "Open File Foundation" and create a standard on which each could build file formats that could be shared across platforms and application suites, the magnitude of such a collaboration would be huge!
It would be (or is) my dream that one day an office could theoretically have each of its employees using their office suite of choice and be able to seamlessly share documents among co-workers and others outside the office!
This could be the very straw that broke the camel's back, so to speak. If MS did not want to comply with the open standards, it would face incompatibility with the rest of the world. In this day and age of the internet, p2p infrastructure, and the like, it's a common compatibility, not only across platforms but across applications, that is going to be needed. People will eventually see this and if MS doesn't want to play... so be it, alternatives abound!
This goes not only for file formats but also for other formats such as audio, video, and other streaming media. Ogg is a nice place to start and I pray it takes hold.
The days of closed formats and single-platform narrow-mindedness are coming to an end!
So for the time being... and unfortunately as a requirement to even get in the door to create such standards... yes, we do need decent filters! But only as a temporary solution!
--- Brad (http://www.LinuxReview.net)
Someone pleeeaasse set up a site dedicated to writing really _good_ MS Word 97+ serialization routines in ANSI C. I would, but I'm already sidetracked on a tangent of a subproject and the stack is just too high right now. This is not hard, folks. I know it sounds like a boring project but it's not!
Are you familiar with the principle of Recursive Composition (a.k.a. the Composite Pattern)? This is without a doubt my favorite programming construct. The key here is that you define an object that can be a child as well as potentially contain children itself. If you can uniformly parameterize the properties common to a set of these objects, you can use the principle of Recursive Composition to build a tree of these objects and then serialize it back using preorder depth-first tree traversal.
For example, a binary networking protocol might have a header, some parameters, and a data payload area. The header has an arbitrary block of security information, which in turn might have a DES encrypted key and an integer describing the length of the payload. So to encode this message using Recursive Composition, define a packet_t type that has the three subcomponents, one of which contains the arbitrary security block, which in turn has an encrypted DES block as a child component. See the tree? Now, if you can parameterize the positional properties of these objects (where in the buffer each one goes), you can delegate the responsibility of encoding certain areas of the network message to functions like enc_security_block(struct security_block *sb, char *dst, size_t off, size_t len), which would then call enc_des_key(struct des_key *dk, char *dst, size_t off, si ....
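The packet example above can be sketched in a few lines. This is Python rather than the ANSI C the post has in mind, purely to keep it short, and the byte layout (an 8-byte key followed by a 4-byte big-endian payload length) is invented for the illustration; the class names are hypothetical too.

```python
import struct

class Raw:
    """Leaf node: a run of bytes that encodes as itself."""
    def __init__(self, data):
        self.data = bytes(data)

    def encode(self):
        return self.data

class Field:
    """Leaf node: a single value packed with a struct format string."""
    def __init__(self, fmt, value):
        self.fmt = fmt
        self.value = value

    def encode(self):
        return struct.pack(self.fmt, self.value)

class Block:
    """Composite node: it is a child itself and may contain children."""
    def __init__(self, *children):
        self.children = list(children)

    def encode(self):
        # Depth-first, left to right: every child encodes only its own region.
        return b"".join(child.encode() for child in self.children)

def build_packet(des_key, params, payload):
    # security block -> (encrypted key, payload length), as in the description
    security = Block(Raw(des_key), Field(">I", len(payload)))
    header = Block(security)
    return Block(header, Raw(params), Raw(payload))

packet = build_packet(b"K" * 8, b"\x01\x02", b"hello")
print(packet.encode())
```

No single function knows the whole layout: `Block.encode` just asks its children, which is the delegation the post describes.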
The classic example of Recursive Composition is that of GUI components. You have an abstract object called, say, Component. Components can contain other components. Subtypes would be ButtonComponent, TextComponent, TableComponent, etc. These components might contain subcomponents as well (e.g. ButtonComponent might have a TextComponent for its label). See the tree again? Now, when it comes time to draw these components, you don't have one big block of spaghetti code that considers all of the different component types, but rather delegate that responsibility to a method of the component itself. This greatly reduces the complexity of the problem (actually making it feasible where it was not before). Again, we just have to parameterize *where* these components are to draw themselves, such as FrameComponent_draw(Window *win, int x, int y ...etc.
So what does this have to do with writing serialization and deserialization routines for Word documents? Microsoft Word's format (and just about every other sophisticated document format out there) is flattened by serializing an internal tree of nodes (like the GUI components and, more so, the network packet encoding described above). The tree of nodes is no different from the trees used above to describe Recursive Composition. So by recursively delegating the responsibility of encoding/decoding a region of an MS Word document, you can parse it into a tree and then do a preorder depth-first tree traversal to serialize it into any format, including .doc.
The hardest problem here by far is determining what the primitive types of the document are (e.g. like the security_block and the payload length integer in the network packet). If you don't know what the leaves of the tree look like, you cannot start to write a lexer. Find out everything you can about the format of each of Word's elements. There are several projects that claim to have decoded the format to a certain degree. These would be a great start. However, I have spoken to these guys and the problem is they are only interested in supporting their own product (AbiWord and the KOffice guys talked about a collaborative effort but got hung up on choosing libraries and language and other trite crap). A group independent of these organizations should be established so that the library is not tied to one product.
Once you have a good idea of the bits and bytes behind the layout of nodes in the format, you can write an (at first crude) lexer, or lexical analyser. This is simply a piece of C that will break the format into tokens. It's simple in the respect that it doesn't have to worry about the logical layout of elements at all. It's only concerned with nibbling off the primitive elements (tokens) themselves. The interface might be as simple as init(char *filename), gettoken(struct lexer *lex).
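A lexer with roughly that init/gettoken shape might look like the sketch below. It is Python rather than C, and since the real Word record layout isn't given here it uses an invented toy layout instead: one tag byte, a two-byte big-endian length, then that many payload bytes. The names are hypothetical.

```python
import struct
from collections import namedtuple

Token = namedtuple("Token", "tag payload")

class Lexer:
    """Nibbles tokens off a byte string without caring how they nest."""
    HEADER = ">BH"  # toy layout: 1-byte tag, 2-byte big-endian length

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def get_token(self):
        """Return the next Token, or None at end of input."""
        if self.pos == len(self.data):
            return None
        if self.pos + 3 > len(self.data):
            raise ValueError("truncated token header at offset %d" % self.pos)
        tag, length = struct.unpack_from(self.HEADER, self.data, self.pos)
        start = self.pos + 3
        end = start + length
        if end > len(self.data):
            raise ValueError("truncated payload at offset %d" % self.pos)
        self.pos = end
        return Token(tag, self.data[start:end])

def tokenize(data):
    """Convenience wrapper: pull tokens until the lexer runs dry."""
    lexer = Lexer(data)
    tokens = []
    while True:
        token = lexer.get_token()
        if token is None:
            return tokens
        tokens.append(token)

print(tokenize(b"\x01\x00\x02Hi\x02\x00\x00"))
```

Note that the lexer never interprets what a tag means; deciding that tag 1 is, say, a text run inside a paragraph is the parser's job.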
Now you have to write a parser. This is what bison/yacc is for. This is non-trivial, but there's a great book called _lex & yacc_ by John R. Levine that can describe how to write a yacc grammar in 200 lines that in conventional C would take several thousand lines, take twice as long, and still not work. Ahh, yacc grammars to me are like doughnuts to Homer Simpson.
Once you have a working lexer and parser (probably 1000 lines of code), you can start to build a tree. You need a tree structure. The W3C has written a specification for representing documents as a tree of nodes in memory called the Document Object Model (DOM). Mozilla uses the DOM. It's XML and HTML centric but it's really totally arbitrary. A DOM tree could easily be constructed by adding createNode, appendChild, etc. calls to the yacc parser. It just so happens that I have written a DOM implementation in ANSI C. It's called DOMC and it would be perfect for this task.
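Here is a small sketch of what those tree-building calls amount to, using Python's standard `xml.dom.minidom` as a stand-in for a DOM implementation such as DOMC. The `("start", name)` / `("text", s)` / `("end", name)` event tuples are an assumed shape for what a parser's actions would hand over; a real yacc grammar would make the same `createElement`/`appendChild` calls from its rules.

```python
from xml.dom.minidom import getDOMImplementation

def build_dom(events, root_name="document"):
    """Build a DOM tree from a flat stream of parser events."""
    doc = getDOMImplementation().createDocument(None, root_name, None)
    stack = [doc.documentElement]          # innermost open element is last
    for kind, value in events:
        if kind == "start":
            node = doc.createElement(value)
            stack[-1].appendChild(node)
            stack.append(node)
        elif kind == "text":
            stack[-1].appendChild(doc.createTextNode(value))
        elif kind == "end":
            if len(stack) == 1 or stack[-1].tagName != value:
                raise ValueError("unbalanced end event: %r" % value)
            stack.pop()
        else:
            raise ValueError("unknown event kind: %r" % kind)
    if len(stack) != 1:
        raise ValueError("unclosed element: %r" % stack[-1].tagName)
    return doc

events = [
    ("start", "para"),
    ("text", "Hello "),
    ("start", "bold"),
    ("text", "world"),
    ("end", "bold"),
    ("end", "para"),
]
doc = build_dom(events)
print(doc.documentElement.toxml())
```

The stack mirrors the nesting the parser has seen so far, which is all the bookkeeping the tree construction needs.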
If you do this much you are sitting pretty. You can just traverse the tree and spit out whatever the analogous elements are for, say, PS, HTML, SGML, XML etc.
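As a sketch of that last step: once the document is a tree, each output format is one short recursive walk. The `Node` class, the node kinds and the tag mapping below are all made up for the example; a real converter would walk whatever tree the parser built.

```python
from html import escape

class Node:
    """Toy document node: a kind, optional text, and child nodes."""
    def __init__(self, kind, text="", children=()):
        self.kind = kind
        self.text = text
        self.children = list(children)

# Hypothetical mapping from document node kinds to HTML tags.
HTML_TAGS = {"document": "body", "heading": "h1", "para": "p", "bold": "b"}

def to_html(node):
    """Emit the node, then recurse into its children in order."""
    if node.kind == "text":
        return escape(node.text)
    inner = "".join(to_html(child) for child in node.children)
    tag = HTML_TAGS[node.kind]
    return "<%s>%s</%s>" % (tag, inner, tag)

def to_plain(node):
    """Same traversal, different 'analogous element' for each node kind."""
    if node.kind == "text":
        return node.text
    inner = "".join(to_plain(child) for child in node.children)
    if node.kind in ("heading", "para"):
        return inner + "\n"
    return inner

tree = Node("document", children=[
    Node("heading", children=[Node("text", "Report")]),
    Node("para", children=[
        Node("text", "Profit & "),
        Node("bold", children=[Node("text", "loss")]),
    ]),
])
print(to_html(tree))
print(to_plain(tree))
```

Supporting another output format means adding one more function of the same shape, with no change to the lexer, parser or tree.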
What I mean is, if format was so important, Microsoft Word would never have caught on, because its WordPerfect->Word filters were terrible. Even its Word 5 -> Word 95 -> Word 97 -> Word on Mac filters were terrible. Every time I would look at a document in a new version, things would move around. Same goes for Lotus/Quattro->Excel. They even changed fundamental syntax for the spreadsheet! (In Quattro, functions begin with an @ sign, whereas in Excel, an =; a number of the function names are different as well, I believe.)
My point is that compatibility isn't everything. Platform can be even more important. One of the major reasons MS Office is a 'standard' is that Microsoft moved the industry to Windows with 3.1, and the industry leaders (WP, Lotus, etc.) on the DOS-based platform understood only too late that slow adaptation to Windows meant their death.
So, StarOffice might stand a chance, even if they are not 100% compatible, because other considerations can be more powerful. For instance, with Microsoft pushing increasingly restrictive licensing, and the imminent maturing of many Linux desktop and business apps, this may give enough of a toehold for real market penetration. By the same logic, even if the conversion filters are flawless, they might not capture the attention of the business world, many of whom won't likely even consider StarOffice as an alternative.
Two things prevent most people and businesses from moving from Microsoft products to other products:
1. Application Lock-In, and
2. FUD
1. Application Lock-In
Everybody and their brother, nearly every business, and all of their strategic partners, not to mention schools and government - all of them use to some extent Microsoft tools, day in, and day out. People have had YEARS to learn the nuances and problems, how to get around them, and what the applications can do. All of them know that they can email a .DOC or .XLS file to a business partner's secretary, and s/he will be able to load it effortlessly, and it will look the same.
A Linux Office suite? How are these people to be certain that it will work - plus how are they to cope with the differences that are sure to be in place between the Linux Office Suite and the MS Office Suite? How do they know they will be able to send this exported XLS file to their friend, and it will open in MS Excel properly?
2. FUD
Which leads us to the second issue, that of FUD - if they don't know, they will be full of fear, uncertainty, and doubt as to whether to use the office suite for Linux, because these files they are trading down the hall or across the city may represent a potential deal - if the presentation software doesn't go as smoothly as Microsoft's, it may mean loss of money - maybe a job! If the XLS or DOC file is mangled (either by the Linux Office Suite, or by the MS Office Suite reading the Linux version), time and money will be lost trying to figure out what happened, or at least getting it loaded and converted using "standard" MS Office.
These are the two problems a Linux Office Suite has to overcome (actually, two problems any MS Office competitor has to overcome). Because MS has such a huge lock-in, and the FUD is raging - companies won't switch - because their partners aren't switching (and their partner's partners aren't, etc).
It is a tough situation, and will be hard to overcome. Education to override the FUD will help, but even if you had perfect compatibility, all MS would have to do is introduce a "new" format that Office would default to, and you will end up holding up and vindicating the FUD. People will then be doubly uncertain to try the Linux stuff, even though it would be MS who broke the compatibility! I don't know what the answer to this is, but if Linux is ever to really gain on the desktop, those two issues will have to be addressed...
Reason is the Path to God - Anon
As far as I know all of these are lacking the single most important thing, a robust and complete set of import filters for Word, WordPerfect, Excel, PowerPoint, etc.
There's a real good reason we haven't seen this yet. It's the chicken-and-egg problem. Before you can have fully capable import filters, you must first implement the feature set of the app you're importing from. For example, Microsoft Word has a bunch of features that do not yet appear in most other "word processors". If your word processor doesn't implement these features, a filter that does is quite useless for your application (except in regards to ignoring things your application doesn't understand).
Unfortunately, before those features are implemented in your own application, you're going to need some more acceptance (to bring more developers onto the project). Unless you can say you do what the mainstream needs/wants, you're still an obscure project. *sigh*
Off this topic, one other thing that kind of bothers me is the massive amount of reinventing the wheel. Now while having many options is good, there are just far too many open source projects that are each trying to create their own robust, fully-featured office suite. Why is the community wasting so much time?
Some of these really should merge and share code more. Or at least, there should be one organization that is dedicated to creating a unified set of the features found in all open source office suite projects. That way, they could create a big set of libraries that do these things... so when the next guy has this reckless desire to make his own office suite... well, you get the idea.
Why bother.