Online Document Search Reveals Secrets
An anonymous reader writes "New Scientist is reporting that many documents published online may unintentionally reveal sensitive corporate or personal information, according to a US computer researcher. Simon Byers, at AT&T's research laboratory in the US, was able to unearth hidden information from many thousands of Microsoft Word documents posted online using a few freely available software tools and some basic programming techniques." Update: 08/16 19:06 GMT by H : The story is originally from Crypto-gram, not New Scientist.
Funny how the latest Crypto-Gram covers exactly the same subject; I just received it an hour ago.
http://www.schneier.com/crypto-gram.html
Just go into the document properties section. This is why I publish everything to Adobe Acrobat before posting online.
Good security is based upon reality and common sense. Common sense is a function of having common knowledge.
From the article:
I just created a Word document, blah.doc and put some text into it. I made sure I had a couple of undo points. I closed it and opened it back up, I couldn't undo SHIT. So where the hell am I being granted this mysterious "convenience?"
I know that the guy stressed the fact that Microsoft isn't alone in this distinction, but this is just another example of why Microsoft SUCKS.
I put the doc in a samba share and viewed it with vi. I found the path to the doc, the original name, my userid on my laptop, and the company name. All were hidden from the simple searches like this:
s.l.a.s.h.d.o.t...o.r.g
WTF?!?
Oh, WAIT a minute! This is also from the article:
WHEW! I feel so much better. Please disregard the first six paragraphs. Thanks.
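About the s.l.a.s.h.d.o.t...o.r.g pattern mentioned above: that looks like 16-bit text, where each ASCII character is followed by a zero byte, so a plain byte-for-byte search for the word won't match it. A minimal Python sketch for pulling such runs out of a binary file follows. It assumes little-endian UTF-16 and ASCII-only content; the function name and the 4-character minimum are my own choices, not anything from the article.

```python
import re

# A printable ASCII character followed by a NUL byte, repeated at least 4 times.
# Assumption: the hidden text is UTF-16LE and limited to ASCII characters.
WIDE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")


def wide_strings(data: bytes) -> list[str]:
    """Return runs of 16-bit ASCII text found anywhere in `data`."""
    return [m.group().decode("utf-16-le") for m in WIDE_RUN.finditer(data)]


if __name__ == "__main__":
    import sys

    # Usage: python wide_strings.py blah.doc
    with open(sys.argv[1], "rb") as fh:
        for text in wide_strings(fh.read()):
            print(text)
```

This is a heuristic: it can pick up runs that happen to look like text, and it will miss anything stored in another encoding.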
This will become a common way for 'big' corps to spy on 'small' corps (and individual users?), to find new ways to both screw them over, and appear 'omniscient'. They'll never (or rarely) get called on it. Meanwhile, anyone who tries to reveal information discovered in this way which is incriminating towards said big corps will get sued for being "hackers" and/or "terrorists".
Honey, I shrunk the Cygwin
It doesn't matter how good your corporate security is if you don't train your users (including managers) in basic security practices.
Lots of people put sensitive documents in public webspace, primarily because they don't know any better. Eventually the cost-benefit analysis will be done, and corporations will pay to have their users trained. Until then, this type of thing will continue to happen.
Idiotic use of P2P software has led to stuff like this more than once. People often overlook the document tab on Kazaa.
Well, it is amongst people who object to being mailed Word documents, anyway. They're just a really bad format for publishing information in.
See Richard Stallman's 'no-word-attachments' article, for example...
How many word processing programs place hidden metadata within their formats?
For example, do OpenOffice/StarOffice and other open source programs have the same security problem?
Don't Tread on OpenSource
Simon Byers, at AT&T's research laboratory in the US, was able to unearth hidden information from many thousands of Microsoft Word documents posted online using a few freely available software tools and some basic programming techniques.
Are you going to share that info or what?
Throw it up on freenet man!
http://www.schneier.com/crypto-gram.html
To do this yourself, just type:
<a href="http://foo/">bar</a>
You can't judge a book by the way it wears its hair.
An accomplished searcher can learn much about the world we live in, as slashdot reported some time ago.
An interesting reminder, to be sure, given yesterday's blackout.
Makes a guy wonder just how much is still available regarding key electrical and telephone infrastructure. Emergency power capabilities of broadcasters (radio, television, mobile phone). Gas lines, in the parts of the country that have them. Water systems. There's likely a bunch of data out there, ready to be mined.
Everyone should just be forced to use LaTeX and then there won't be any hidden information...
How long until someone blames Microsoft, I wonder...
Sure, but the point they're making is that it's not intuitively obvious to most people that there could be text in a Word document other than what appears.
So a relatively security-conscious person who just doesn't know anything about Word file formats could easily publish something online on purpose without knowing that there is (invisible) sensitive information in it, even if they'd never put that information in a public place on purpose.
[TMB]
Someone needs to script an open source program that searches for this information on the web. Could you imagine knowing all the financial information about Microsoft? We could use this information against companies we don't like, and the open source revolution will grow to massive proportions.
A sysadmin once sent me a form-letter type thing with my new password in it. The username/password was a spreadsheet object and I was able to open it to see everyone's passwords. He changed them all when I pointed this out. BTW, why do people send email messages that just say "see attached file" when the attached file is a memo with some trivial content that could have been the text of the email??
Anyway, I have to admit that I was also burned by word. I was in the habit of opening the last memo I wrote from the recent documents list and using it as the starting point for newer ones. At some point, I put a bunch of policy statements on a CD and was later told that everyone was reading the hidden text. Doh!
This was back in the days of office 97 I believe. I'm not sure if Office 2k or XP still have this feature/bug.
Remind me not to save my important documents to C:\My Documents\Porn\Annual Budget Report.doc anymore.
See Google. It can read word/pdf etc. Sure there is a mountain of information there if you look
Rus
I have received two such Word documents from two separate job recruiters. The actual companies looking for the employee were hidden in the document, as well as contact information for the person at the company. Screw the middle man.
is probably what's doing this. GOD it sucks. They've managed to make it more confusing and less usable in XP than ever before. You ever tried to tell a user what to click on in a toolbar? WTF happened to putting the goddamn command in the toolbar???
All Troll + "offtopic" mods are meta moderated as "Unfair", because you abused the system.
No really, what IS the big deal? So supposedly, he did an online search, and did some text-extraction from Word docs, which Google helpfully does for you anyway, and came up with some "secrets" which were published online anyway, thus contradicting the term itself. Google also indexes PDF, DOC, PPT and many other formats anyway.
Moreover, if the information was indexed, it was either put online intentionally (either because it wasn't secret data, or out of malicious intent), or unintentionally. The latter case was probably because of poor sysadmining/webmastering, which isn't a big secret anyway.
Sorry for the sorry rant, but it's yet another friday evening with nothing to do.
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
Remember kids: strings is your friend. If you happen to get a job offer in the form of a Word document and the HR drone who sent it to you wasn't careful, you can often see the version that got sent to other candidates and, more importantly, how much money they were offered. It can do wonders for your bargaining position.
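For anyone without strings handy, here is a rough Python approximation of what it does: scan the raw bytes and print every run of printable ASCII characters. This is only a sketch, not a reimplementation of the real tool; the function name is made up for illustration, and the 4-character minimum mirrors the usual default of strings.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of at least `min_len` printable ASCII bytes in `data`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__":
    import sys

    # Usage: python ascii_strings.py offer.doc
    with open(sys.argv[1], "rb") as fh:
        for text in ascii_strings(fh.read()):
            print(text)
```

Anything that was deleted from the visible document but is still sitting in the file as plain bytes will show up in the output alongside the text you were meant to see.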
Oh, so I'm not supposed to save all my important passwords in plaintext in a clearly marked "passwords.txt" file in my webserver for easy access?
Oh damn.
--- I w00t, therefore I'm l33t.
It seems that goatse is down.
Can anyone check to see if it's still working?
thx!
Anyone wants to guess what's the second most dangerous animal for human beings?
Other human beings?
It doesn't pertain to just documents. I've seen code samples posted to sites like experts-exchange where DB connection strings still had UID and PW data in them. Seems people don't re-read before they post very often.
Saying Android is a family of phones is akin to saying Linux is a family of PCs.
It looks like you're trying to post a document on the web.
Would you like to...
1. Divulge corporate secrets?
2. List your passwords?
3. Remove KB823980 and open port 135?
It looks like you're trying to close Clippy.
Would you like to...
1. Shit in your hat?
2. Put fist through bling bling flat panel?
3. Go home for teh weekend?
It's probably something like a jellyfish. But I'm going to guess that it's those vile bugs that live inside the beards of Linux Communist Hippies.
google indexed PDF documents, it even turns them into HTML
of course you could always try http://searchpdf.adobe.com/
Now there's a way to search through more than a million summaries of Adobe(R) Portable Document Format (PDF) files on the Web. Your search results will allow you to see the summaries before deciding to view the original Adobe PDF.
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
how did he get paid so much to his research? .doc
i'm a doofus and came up with this search
"index of" secret
google it baby
.cig
View some of the past word docs you've received in a hex editor...
Near the bottom there is often information from other documents of the sender that they were recently working on. I don't know why it saves this. Maybe something to do with the undo buffer?
At work I used to look at internal memos that would be sent out on a weekly basis and find out all sorts of other stuff that was going on.
You mean word documents/[INSERT alternative MS product here] may contain crap you didn't think they did?! NO SH*T...
This has been a problem and reported for years now...
It's called Save As...
Oh? You say end users can't be trusted to understand technology and or can't be trusted to dispose of or not reveal sensitive information? Another...DUH!
NOT MY PERSONAL INFO! NOOOOOOOOO!
This isn't just nothing new, it's old news. Wasn't this how they caught the guy who wrote the Melissa virus? When that little popup window from MS Office came up asking for their personal info, did they just think Office was trying to get to know them better, in order to be their friend?
It's just silly pressmongering. Those dumbasses have to come up with a terrifying computer factoid every day, or the ignorant compu-phobes they prey on might come to their senses.
Just my opinion.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
The fact that MS "productivity" products store user information in the files they produce hit the news very shortly after MS Office '95 came out.
It seems no one much cared back then because MS has obviously left this serious security flaw in their software.
Imagine that?
It's only going to get worse; Google has really expanded the number of file types it indexes and caches.
One of my clients was recently caught out when Google indexed private metadata she didn't know was still there, so I can well understand the gravity of this situation.
Is that all it takes to hack into Microsoft's file servers anymore?
If your theory is different from practice, then your theory is wrong.
it's called http://www.google.com and you search by "top secret documents filetype:doc".
I mean, c'mon. Anyone with half a brain can open a M$ Word document in a plain text editor and, without any work whatsoever, find out what the SMB name of that computer is, on what drive the OS is installed, which printer is used and what its name is, and so on and so forth.
If this is newsworthy, so is telling "If you press F3 in Explorer, or if you start DirectPlay, your computer will try to connect to Microsoft servers to do stuff behind your back".
- mpg
- mov
- mp3
- secret - doesn't have to be file extensions...
- "My Documents" - yeah, that's secure...
- etc
Anyway, as you can see, it's pretty effective. Sometimes admins wise up, and all you have is the Google cache. But sometimes they don't, and you get to look. Thanks Google!
A programmer is a machine for converting coffee into code.
how many incidents will it take before people realize that ALL Microsoft products are insecure?
What will it take? What happens when a script kiddie hacks a hospital and shuts down the life support systems in ICU? Or just juggles the meds for the patients so that everyone in the hospital gets the wrong meds?
Or perhaps they glitch the Air Traffic Control system and airplanes rain down from the sky and tens or hundreds of thousands of people die??
Before the last war in Iraq started they showed the "state of the art" US command center just across the border in a big tent.
Tens of dozens or more, soldiers and dozens upon dozens of PC's. You could clearly see on the displays that they were *ALL* running Windows.
I thought, "Oh shit, the security of this country is being placed in the trust of the worst product ever..."
Those PC's I saw were NOT Tempest, for one, and then add the Windows factor in plus the state of war and you're asking for serious trouble.
Windows will at some point cause a massive catastrophe and cause great loss of life and property. You can bet on it.
This country is far too dependent upon computers to operate. When the computer goes down, well, sit on your hands for awhile...
I remember the days before computers, everyone got things done just fine. Now no one knows how to function without them..
The next edition of Office 2003 will include tools that will allow users to remove personal information from a document. It will also include new "information rights management" that will let an author specify who can read or forward a document."
So Microsoft are going to add tools to remove what shouldn't be there in the first place? Can't they just fix their software to not include it in the first place? What is next? INTERNET EXPLORER 2008 will have a new feature that allows the user to stop viruses being automatically executed. Order now at $399 to protect your computer from our shoddy software!
---- There are 10 types of people in the world. Those that understand binary and those that don't
Gates and co will take care of all your sensitive info, very soon. With the help of the DMCA, Sen. Fritz and MS servers, we all will be so secure that no one other than MS and the right Government agencies will be able to unlock your locked online .docs. So smarten up, bow to Redmond and pay up, suckers! It's upgrade or lose mania time again. Can your business not afford the wonderful new security that's coming? Good luck getting your secretaries to use anything other than MS orafice!
OH THE SHAME I fell off the wagon and use sigs again!
At the courthouse, all people who use Google and who got caught were also standing in a queue to await indictment proceedings before a federal grand jury...
Microsoft blames the slowup on commodity protocols and recommends MS-Jailer 2.0 (just released) to speed up the whole process "with only one degree of separation".
Scott McNealy's face turned red, and he proclaimed, "Look. Those people are standing out there in the heat, which is about all you should expect with the power efficiency levels of throwback 32-bit CISC technology throwing the book at them and with Microsoft's jailing software countercompetitively tied to the kernel, which is in C++, not C, not Java, not standard, not open like..."
How many people actually protect their website
/stat/ or /stats/ or a variation
statistics?
Adding a simple
with a combination of "web" or the name of any of
the common statistic generation programs gets you
access to the statistics of a *lot* of websites.
Then from the stats you could find any "hidden"
data which is not linked on the site including
internal company documents, girlfriend's nude
photos or mp3s.
Alternately you could just google for the
statistic reports of sites and get there
more easily.
This is another case of ill informed or lazy
users not following what should be a simple
security policy which could cause serious
repercussions.
For those who want to know how to protect
yourself, read this link.
But instead of explaining it all technical and telling people how they can strip private information, you should use Microsoft's own techniques of FUD against them by telling people that Microsoft Word files contain all their private information and that information is gathered into a database by a ring of 1337 h4x0rz around the world, who then use the information to steal your credit card numbers.
People are so stupid that they will actually believe that.
Kiss my ass. How is this flamebait?
Tony Blair got busted in the WMD case because the names of the people who revised the WMD documents were still in the Word file. Now, it seems, Downing Street only puts PDF files on the web, and has removed all the MS Word documents that were already there...
Tools reveal secret life of documents - Documents like in Word save too much Info - Blair Episode
By Mark Ward
July 03, 2003
The UK Government was just the latest in a long line of organisations that has learned to its cost just how much information can be gleaned from innocent looking files. Earlier this year it issued a document called the 'dodgy dossier' about Iraq's concealment of weapons of mass destruction that was written using Microsoft Word. Every Word document remembers who made the last few revisions to it. The log reveals the names of four of the people who prepared the Iraq document for publication and the government Communications Information Centre that some of them work for. It was this log that Number 10 press chief Alastair Campbell had to explain to the House of Commons Foreign Affairs Select Committee in late June as part of its investigation into the Iraq dossier's history. Some of this information can be seen simply by right-clicking to view the properties of the downloaded document in a file listing. Utility programs can get even more information from Word revision logs.
The life stories of the documents we create are becoming increasingly important as the scrutiny of industries and governments gathers pace. Every time you write or edit these files you leave a trail of information revealing what you did and when you did it. With the right tools it is possible to extract this data and work out the trail of authors and workers who created a document. That is why we should all use opensource and open data formats - so that we can humanly read what all we are "putting" into the document. The Word version of this document has now been removed from government websites but copies of it are still available elsewhere on the net.
Unabridged and unedited article at
http://news.bbc.co.uk/2/hi/technology/3037760.stm
To see a world in a grain of sand, and then to step back and see the beach where the sand lies
Have to post a link to this famous example, the dodgy dossier. There was a writeup here. If you're thinking of making the case for war, don't release Word documents to the press - unless they're very very docile.
Cat Schwartz, of TechTV fame, discovered that cropped JPEG images may also contain uncropped thumbnail images (warning: PG-13 content). There's some debate whether the images in question came from Photoshop or from a thumbnail image stored by the digital camera, but it was a humbling oversight in either case.
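If you want to check your own photos for this, one crude approach is to look for a second JPEG stream inside the file, since an embedded thumbnail is itself a small JPEG. The Python sketch below is a byte-scanning heuristic and nothing more: it assumes the thumbnail begins with the usual JPEG start-of-image bytes and ends at the next end-of-image marker, it does not parse the EXIF structure, and the function name is made up for illustration.

```python
SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the first byte of the next marker
EOI = b"\xff\xd9"      # JPEG end-of-image marker


def find_embedded_jpegs(data: bytes) -> list[bytes]:
    """Return candidate JPEG streams that begin after the first byte of `data`."""
    found = []
    pos = data.find(SOI, 1)  # skip the outer image's own start marker at offset 0
    while pos != -1:
        end = data.find(EOI, pos)
        if end == -1:
            break
        found.append(data[pos:end + 2])
        pos = data.find(SOI, end + 2)
    return found


if __name__ == "__main__":
    import sys

    # Usage: python find_thumbs.py photo.jpg  (writes thumb_0.jpg, thumb_1.jpg, ...)
    with open(sys.argv[1], "rb") as fh:
        for i, blob in enumerate(find_embedded_jpegs(fh.read())):
            with open(f"thumb_{i}.jpg", "wb") as out:
                out.write(blob)
```

If the extracted thumbnail shows more than the published image does, the crop did not touch the embedded copy.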
Aside from the paranoia overtones, I still disagree. The tools for doing this are on the web. Right now. So in other words, a weapon has been released that is free and easy to use. If anything, this will help small, poor companies with no resources for industrial espionage get a little information out of people who don't know any better, including their large-company rivals. All they have to do is hire one of the celibate wonders that read slashdot, and they're in business.
-Looking for a job as a materials chemist or multivariat
By using tools that break the "encryption" on, for example, the Washington Post .pdf file mentioned in the article, isn't the researcher violating the DMCA? Isn't his whole project bragging about doing this, a la 2600?
I hope he remembers a few packs of cigarettes in order to buy himself a few nights of sleep in the Big House.
Solution: stop installing word on your corporate network. Stop allowing users to install software. No more problem.
Objection, your Honor, this is pure speculation. The prosecution has not established sufficient grounds that this witness is qualified to predict what dominant and powerful techniques will be used in the future.
SUSTAINED!
Thank you, Saddam.
Or, if you're not Saddam himself, you are at least his personal cock sucker.
Actually, the truth is that the mosquito kills more humans each year than any other animal. Go look it up.
This isn't really new -- check out this story I wrote for CNet/ZDNet over a year ago.
This type of thing happens all the time, not just with digital media but with other media too. People go through others' garbage and recreate shredded documents; camcorders catch people in the act; then there's carbon paper and copying machines. You always need to be careful when dealing with your own and others' information.
Gnu For President 2004
... then suffer foot wounds.
At the risk of being moderated Troll and Redundant,
Why are these people posting Word Documents online?
The World Wide Web is not the Microsoft Wide Web.
Post in plain ASCII text, or HTML if you feel the need to pretty it up.
People keep using tools that are far more powerful and complex than they need, then they screw up, and blame the tools. Pick a simple tool to do a simple job, and you don't need to worry about your ignorance of the tools you are using causing you problems.
I hate it when I make a joke and I get modded "+5 insightful". Mod the stupid comments "funny", not "insightful", pleas
Is security the responsibility of the software or the users? Should we point the finger at that horribly insecure software that shouldn't allow this sort of thing to happen or the ignorant users who put the sensitive data in the document? Both?
No, it's an insect. We discount the people from the list.
This has happened to the UK government several times. The latter link shows whose sticky fingers were on the infamous "dodgy dossier".
Gareth
Mosquitos just transfer viruses and existing diseases known to humans. A mosquito bite by itself is harmless for humans, unless there's also a virus injected into the blood. You're right in terms of numbers, but let's say we leave the mosquitos out of this list.
As Saddam's personal advisor, I'd advise you to crawl back into Dubya's stinking ass.
Ok, time is up.
The most dangerous animals (counted by the number of humans, killed globally within the recent years):
1: Snakes.
2: Wasps and bees.
3: Alligators/crocodiles.
See the post above explaining why mosquitos is not exactly correct.
let's say we leave the mosquitos out of this list
Okay. It's just that they are so small and easy to pick on.
So who is going to be the first to claim that running a Word document through strings violates the DMCA?
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
In some cases all you need to dig up the sensitive information is to "cat word.doc". Assuming you have some *n*x utilities lying around. *n*x should be outlawed.
Well, interesting that you should hit BBC where it hurts, and in some cases maybe they deserve it. But I think this "story" that you talk of is not one of those. Look at my take on the story behind the story behind the story... The Blair-Bush team is more devious than you might want to believe...
ALL Quotations below are from the MSNBC article ... my comments are in the [TT] [/TT] format ....
At the end of the day, officials say, the Lakhani case remains a story about potential threats and not the real-life terrorists they had once hoped to nail. For all the hoopla over the case, the official confirmed, it was essentially a government-arranged sting that never involved any contact with actual terrorists. If there was no contact with a "real" terrorist, how were they supposed to trap these "real" terrorists ??? But, wait, it gets even better .. the terrorists in the case were essentially all actors -undercover informants playing associates of terrorists for the purposes of making a case against Lakhani, an international arms dealer who, according to Christie, has a history of alleged criminal activity. So, now a person who tends towards a certain kind of behavior is now enticed ... Does this bring to mind the word "entrapment?" They noted that Lakhani, in taped conversations with undercover operatives, expressed his hostility for the United States, his sympathy for bin Laden and his willingness to work with terrorists to supply them with the weapons they needed to shoot down airliners. Is this hostility really surprising when it is known that the majority of the world population thought Mr. Bush's actions have been consistently unjust and immoral? We didn't want this to get out before we could determine whether this guy would cooperate or not. So, now you tempt him even more. Promise the fix to the junkie. Even if the tendency to addiction is genetic.
And what better way to tempt the susceptible than by playing both sides - being the buyer, and being the supplier. It sort of reminds me of the "Truman Show." The FBI and Department of Homeland Security agents then arranged to involve the Russian FSB, that country's security service. When Lakhani flew to Moscow last month to actually inspect the missile he would be buying, he met with two FSB agents posing as missile suppliers. Looks like a scene from a C grade movie. The snookered Lakhani then arranged for the missile to be shipped from St. Petersburg and made a commitment to the two FSB undercover operatives to buy 50 more such weapons as well as a multiton quantity of C-4 plastic explosives. Snookered Lakhani ... yes, snookered .. you know what snookered means ? It means fooled or duped.
When you wake up, ask yourself ... could you be living in times scarier than these ....
Microsoft is to blame for:
Bubonic Plague.
Baldness.
The Hair Club For Men president, getting run over.
Sonny and Cher breaking up.
Sonny smacking into that tree.
The Solid Gold Dancers sucking.
Michael Jackson turning white.
Global Warming.
The previous Ice Age.
WMD's in Iraq.
Dogs and Cats living together.
That birthmark on Gorbachev's forehead.
Plastic fake vomit.
Fake switch ads.
execrable.
fauxking ?pr? ?firm? 'information'/blame filtering(tm) is worse than useless. in fact, it's a crime against humanity. those who practice deception for a living, will take their place among the walking dead.
the lights are coming up now (no pun intended).
the stars were quite visible for folks who almost never see them, last night. don't forget to glance skyward tonight/often.
you can pretend all you want. our advise is to be as far away from the walking dead contingent as possible, when the big flash occurs. you wouldn't want to get any of that evile on you.
as to the free unlimited energy plan, as the lights come up, more&more folks will stop being misled into sucking up more&more of the infant killing barrolls of crudeness, & learn that it's more than ok to use newclear power generated by natural (hydro, solar, etc...) methods. of course more information about not wasting anything/behaving less frivolously is bound to show up, here&there.
cyphering how many babies it costs for a barroll of crudeness, we've decided to cut back, a lot, on wasteful things like giving monIE to felons, to help them destroy the planet/population.
no matter. the #1 task is planet/population rescue. the lights are coming up. we're in crisis mode. you can help.
the unlimited power (such as has never been seen before) is freely available to all, with the possible exception of the aforementioned walking dead.
consult with/trust in yOUR creator. more breathing. vote with yOUR wallet. seek others of non-aggressive intentions/behaviours. that's the spirit, moving you.
pay no heed/monIE to the greed/fear based walking dead.
each harmed innocent carries with it a bad toll. it will be repaid by you/us. the Godless felons will not be available to make reparations.
pay attention (to the weather, for example). that's definitely affordable, plus you might develop skills which could prevent you from being misled any further by phonIE ?pr? ?firm? generated misinformation.
good work so far. there's still much to be done. see you there. tell 'em robbIE.
Since when is .la Los Angeles? Wtf??
just kidding robbIE.
He says hidden information can "incredibly useful" in improving the functionality of the software. "But if some of that data is sensitive, there have to be ways of ensuring that it isn't distributed where it shouldn't be," he says.
Apparently they need to use some of the software he used to get a conjugation of the infinitive "to be" back into their text.
www.sitetronics.com/wordpress
In Soviet Russia, you make document with Word.
In Capitalist America, Word documents you!
--AC
Rights online? What? Where did THEY come from?
Pls No Negative Modding!
The Word version of this document has now been removed from government websites but copies of it are still available elsewhere on the net.
Here's a copy of the document. Should save anyone else the trouble of googling for it </karmawhore>.
Certainly, the best solution would be not to use proprietary formats.
But for those who don't want to change, is there a "Word sanitizer" tool available? Something that will convert one Word doc to another, minus the hidden text?
This is old news. Anyone who would post a DOC file on the web is a dunce anyway. My 2 cents. :)
Finding personal information in the document metadata is one thing, but finding the documents is another.
I still find user accounts on which if you do a manual "up to parent directory" and the user has no index.htm{,l} file, you often get a fully navigable listing of their entire html directory.
Sometimes you find personal files that were never directly linked to, nor intended to be.
To-do List: Receive telemarketing call during a tornado warning. Check.
Does anybody know of a program that can clean up deleted info in Word docs? I'm thinking of something like Ad-Aware that scans for certain files, shows you possible security issues (supposedly deleted text, metadata in document properties, etc.), and asks you what action it should take (wipe out/edit text, delete file, etc.).
It has been known for a long time that metadata are hidden within Microsoft Word documents. Microsoft even has Knowledge Base article 237361 explaining how to reduce the amount of metadata appearing in MS Word 2000 documents. Here's an excerpt:
This step-by-step article explains various methods that you can use to minimize the amount of metadata in your Word documents.
I'll bet there are more, but they won't disclose them.
It's a pity that more people don't just save as RTF. It's just as good for most uses, and it's a less obscure format.
You're not.
There are two ways of saving a Word document:
Fast Save dumps the binary from memory into the file. Full Save compacts the binary image, and reorders it. This takes time.
Word's text stream is stored using a piece table. One of the benefits of a piece table is that if you keep the meta information about the text, you can get nearly infinite undo. The way it does this is by having an original data stream, and an appended data stream. Whenever you add data to the file, it gets added as a chunk to the end of the appended data stream. Whenever you delete, the meta table is updated to remove the text from the stream, but otherwise the text itself is left unaffected.
As a result, text is never removed from the document. A Fast Save (which is the default) under Word dumps the Piece Table as-is (there is probably some compaction over time to remove the no-longer-used data, but it probably only occurs above a given threshold of used to unused text). A full save deconstructs the piece table's meta information, and turns it back into one contiguous stream of data.
It's all just a function of the way the text is stored while it's being edited. Different editors have different mechanisms; some store data based on lines, and some store it using a gap buffer. But ultimately, the problem exists because Word uses a piece table, and it dumps the entire table to a file by default.
It's actually a sensible way of handling the text data. However, whoever designed the Fast Save algorithm probably didn't consider the ramifications of the text still being stored in the document. The best workaround? Wipe the unused sections of the piece table. But then you might as well return to using a Full Save, as you'll be ditching the performance benefits anyway.
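To make the piece table idea above concrete, here is a minimal sketch in Python. It is a toy model under stated assumptions, not Word's actual file format or code: the class name, the `fast_save` and `full_save` methods, and the buffer layout are all hypothetical, chosen only to mirror the description above (an original stream, an append-only stream, and a table of pieces pointing into them).

```python
class PieceTable:
    """Toy piece table: two text buffers plus a list of (buffer, start, length) pieces."""

    def __init__(self, original):
        self.original = original   # text as loaded; never modified
        self.added = ""            # append-only buffer for newly typed text
        self.pieces = [("original", 0, len(original))] if original else []

    def _buffer(self, name):
        return self.original if name == "original" else self.added

    def text(self):
        # The visible document is whatever the piece list currently points at.
        return "".join(self._buffer(b)[s:s + n] for b, s, n in self.pieces)

    def _split(self, pos):
        """Ensure a piece boundary at pos; return the index of the piece starting there."""
        offset = 0
        for i, (b, s, n) in enumerate(self.pieces):
            if pos == offset:
                return i
            if offset < pos < offset + n:
                left = pos - offset
                self.pieces[i:i + 1] = [(b, s, left), (b, s + left, n - left)]
                return i + 1
            offset += n
        return len(self.pieces)

    def insert(self, pos, s):
        start = len(self.added)
        self.added += s                      # new text is only ever appended
        self.pieces.insert(self._split(pos), ("added", start, len(s)))

    def delete(self, pos, length):
        i = self._split(pos)
        j = self._split(pos + length)
        del self.pieces[i:j]                 # only the table changes; buffers keep the text

    def fast_save(self):
        # Hypothetical "fast save": dump both buffers as they are.
        return self.original + self.added

    def full_save(self):
        # Hypothetical "full save": rebuild one contiguous stream of visible text.
        return self.text()
```

In this sketch, deleting a sentence and then calling `fast_save()` still returns the deleted sentence, while `full_save()` returns only what `text()` shows. That is the whole problem in miniature.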
Simon
Coming soon - pyrogyra
McBride's still working on his angles here with his team of lawyers, but you can be assured mr. bill bigpockets will pay for this.
Something to do with proprietary, copyrighted Xenix code comments being published in Word documents on a Microsoft.com server...
Back in the days of Word 5.1a (the last good version), I recall hidden data only getting saved if you used Word's "Fast Save" feature. Since Fast Save wasn't measurably faster, I turned it off. Is this no longer the case? (A quick look through the preferences panel in my copy of Word reveals a Fast Save option; it's turned off.)
Schwab
Editor, A1-AAA AmeriCaptions
The absolute best way to avoid this happening:
Copy your final text from your working draft into a brand new document. Yep, good ol' copy and paste. You will only copy the selected text. All the auto-save data and edit history will not be copied into the new document. If your document has charts, graphs, placed images, etc., you will need to do a select all to be sure you got everything.
If you always do this for final drafts you won't ever have a problem again. If in doubt about whether your current copy is clean, just do it again, then delete the old copy.
Try it out. If you need to confirm... go ahead save as plain text, or whatever. It works as advertised.
A fool throws a stone into a well and a thousand sages can not remove it.
If you're interested in these articles, read them on the Crypto-Gram newsletter instead of waiting for /. ers to read it and post them here.
Your favorite
Don't worry, happy, happy not-stalinist-totalitarian-at-all-honest DRM will make sure this can't happen in future!
Unfortunately, a great many companies require documents to be in MS Word format. I have heard horror stories about people being required to submit MS Word resumes for jobs working with open/free software. What /is/ the attraction of that particular file format, anyway?
I know. Vendor lock-in. Still hadda ask.
Batou: Hey, Major... You ever hear of "human rights"? Major: I understand the concept, but I've never seen it in action
The ability to read deleted/old info is not new to me. Recently my girlfriend was updating her Curriculum Vitae. It had been edited in several versions of Word, on a 3.5" floppy. When we opened it in OpenOffice, we got the last version she had edited. Then she reopened it in Kwrite, and we saw an entirely different document: the same CV, but from a whole year prior to the last version. Maybe other people have noticed you can do this?
i find those all the time, i enjoy searching for files and such people do not intend for the mass public to see. you just need to know how and where to search. heck, have fun looking at Gov't spending.
SimonTek
If you make a PDF with Ghostscript or pdfTeX, is there still metadata in it?
What /are/ you ranting about?
Batou: Hey, Major... You ever hear of "human rights"? Major: I understand the concept, but I've never seen it in action
Once, when negotiating an investment deal, we got a Word document with the investment bank's comments on our proposed contract.
:-)
They tracked changes. All we needed to do was display them... and we got juicy stuff like "if they accept either our fix for clause X or for clause Y we can still s---w them royally in scenario Z".
Made for a very effective negotiation. For us.
Oh, wait, the article was about the problems this raises for the document's _author_.
Never mind
I've started getting nasty about Word attachments. I've set up my mail transport agent to automatically send spurious responses to messages with .doc attachments, with a polite policy message to the effect that their attachment has been rejected as being prepared with potentially insecure and malicious code, and that if they care to send it in a sensible format I might deign to read it. If the sender isn't in my whitelist, I just consign it to the spam bin. Saves a lot of time, and some people have got the message. And if they don't get the message, I don't much care if I don't hear from them.
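For anyone wanting to try something similar, here is a rough sketch of that triage logic in Python using only the standard `email` package. It is not the poster's actual setup: the function names, the policy text, and the three outcome labels are assumptions made up for illustration, and hooking it into a real mail transport agent is left out entirely.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical wording; adjust to taste.
POLICY_TEXT = (
    "Your .doc attachment was rejected. "
    "Please resend it in a plain, open format."
)

def has_word_attachment(raw_message):
    """Return True if any MIME part looks like a Word document."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    for part in msg.walk():
        filename = (part.get_filename() or "").lower()
        if filename.endswith(".doc") or part.get_content_type() == "application/msword":
            return True
    return False

def triage(raw_message, whitelist):
    """Decide what to do with a message: deliver it, reply with the policy, or bin it."""
    if not has_word_attachment(raw_message):
        return "deliver"
    msg = message_from_bytes(raw_message, policy=policy.default)
    sender = parseaddr(str(msg["From"] or ""))[1].lower()
    # Known senders get the polite policy reply; everyone else goes to the spam bin.
    return "reply-with-policy" if sender in whitelist else "spam-bin"

def policy_reply(to_addr):
    """Build the polite rejection message for a whitelisted sender."""
    reply = EmailMessage()
    reply["To"] = to_addr
    reply["Subject"] = "Attachment rejected"
    reply.set_content(POLICY_TEXT)
    return reply
```

Checking both the filename extension and the MIME type is a deliberate belt-and-braces choice, since either one can be missing or wrong in real mail.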
It's articles like this that really kind of tick me off. I'm CONSTANTLY telling my customers that regardless of what they heard, Windows is not the best way to go. I'm informing them about the ups and downs of linux, and Mac. I also happily inform them that I can build their networks with a Linux main and windows in vmware which minimizes the amount of damage that can happen due to viruses. I've gone so far as to bring clients to my home to see my personal network so they can make an educated decision about if linux/vmware is the right choice for them. And I show them the mac machines I have.
But when people read articles like this...it makes my job all that much harder. While it's true that MANY of the people in my field in this area do just that, I've worked for YEARS to prove I'm trustworthy. Things like this cause my clients to have doubts about my dedication to THEIR needs, security, and functionality, and quite frankly it #$%#$% me off.
Yes, but it's precisely because it isn't intuitive that training is required. In theory, tips along the lines of "don't-put-MSWord-documents-on-the-web" would be covered in the security thingie.
Furry cows moo and decompress.
I use a free hex editor called frhed. The product says its home page is http://www.tu-darmstadt.de/~rkibria
It's great for checking AND CHANGING the actual contents of ANY file.
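If you'd rather script the checking part than eyeball a hex dump, a few lines of Python can pull readable text out of any binary file, in the spirit of the Unix `strings` tool. This is only a sketch: the function name and the minimum-length default are my own choices, and it only looks for printable ASCII and for ASCII stored as UTF-16LE (the kind of text that shows up with a gap between every letter in a plain viewer).

```python
import re

def extract_strings(data, min_len=4):
    """Return printable ASCII runs, then UTF-16LE runs, found in a bytes object."""
    # Runs of at least min_len printable ASCII bytes.
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Runs of printable ASCII characters each followed by a zero byte (UTF-16LE).
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    found = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_pat.finditer(data)]
    return found
```

To use it on a file, read the file in binary mode, for example `extract_strings(open("blah.doc", "rb").read())`, and look through the result for paths, user names, or text you thought was gone.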
I actually experienced something like this myself. My gf was working on her graduation report and while googling, she found a powerpoint file on the Dutch Microsoft site. It was about their partner strategy, i.e. who would manage their partners and in what way. I mailed it to the director of my division, who found it rather interesting. Some days later, it was gone from the MS site.
8 of 13 people found this answer helpful. Did you?
Snakes just transfer poisonous chemicals known to the humans. Snake byte by itself is harmless for humans, unless there's also a poisonous chemical injected into the human blood. You're right in terms of numbers, but let's say we leave the snakes out of this list.
Be wary of any facts that confirm your opinion.
www.internalmemos.com :D
It's a pity that more people don't just save as RTF. It's just as good for most uses, and it's a less obscure format.
If you've ever tried opening an RTF document from MS Word in any other program, you'll realise why this is a bad idea.
You know what HTML from Word looks like, right?...
you can get nearly infinite undo
A paragraph's worth should suffice.
If Google really cared they would fix Android Chrome to reflow text, instead of discriminating
The first thing that came to mind when I saw the article's "when-is-delete-not-delete?" dept. was the press release from the Windows XP London launch
interesting, we have a story about word being nasty, bloated, and most importantly, a security risk. I suggest getting rid of the security risk since there are plenty of other choices.
Somehow this makes me a troll? What, do we mod down ANYTHING that even hints at being a negative statement about Microsoft products nowadays? How Slashdot has changed...
WHAT?!?? (Score:5, Funny)
by zedmelon (583487) on 2003.08.15 15:36 (#6708189)
(Last Journal: 2003.08.02 12:09)
...and I saw that some assmonkey had moderated me as flamebait. I suppose I can see a staunch MS fan could see it as flamebait, but I didn't think they were allowed to post on /.
;)
Mom says my
...and they still won't give a fraction of a shit about security. Convenience is God to 98% of all the users out there. Their attitude is, "the (programmers/techs who installed this) shouldn't have made it possible, anyway. I have no responsibility for anything I do. Plus I hate to learn and think at all."
And of course their managers are just as irresponsible and hold no one accountable except for tech people, for anything that happens via computer.
This news is so old it has fossilised.
Shouldn't someone ask why nothing was done when this was first publicised?