Online Document Search Reveals Secrets
An anonymous reader writes "New Scientist is reporting that many documents published online may unintentionally reveal sensitive corporate or personal information, according to a US computer researcher. Simon Byers, at AT&T's research laboratory in the US, was able to unearth hidden information from many thousands of Microsoft Word documents posted online using a few freely available software tools and some basic programming techniques." Update: 08/16 19:06 GMT by H : The story is originally from Crypto-gram, not New Scientist.
Just go into the document properties section. This is why I publish everything to Adobe Acrobat before posting online.
Good security is based upon reality and common sense. Common sense is a function of having common knowledge.
From the article:
I just created a Word document, blah.doc and put some text into it. I made sure I had a couple of undo points. I closed it and opened it back up, I couldn't undo SHIT. So where the hell am I being granted this mysterious "convenience?"
I know that the guy stressed the fact that Microsoft isn't alone in this distinction, but this is just another example of why Microsoft SUCKS.
I put the doc in a samba share and viewed it with vi. I found the path to the doc, the original name, my userid on my laptop, and the company name. All were hidden from the simple searches like this:
s.l.a.s.h.d.o.t...o.r.g
WTF?!?
Oh, WAIT a minute! This is also from the article:
WHEW! I feel so much better. Please disregard the first six paragraphs. Thanks.
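The dotted s.l.a.s.h.d.o.t pattern above is what two-byte text usually looks like in a byte-oriented viewer, which would explain why a plain search for the word comes up empty. A rough Python sketch of searching for a term in both single-byte and 16-bit little-endian form follows. It assumes the document stores that text as UTF-16LE, which is a guess based on the dotted output, and `find_term` and the sample bytes are made up for illustration:

```python
def find_term(data: bytes, term: str) -> dict:
    """Return the byte offsets where `term` occurs, per encoding tried."""
    hits = {}
    for label, codec in (("ascii", "ascii"), ("utf-16le", "utf-16-le")):
        needle = term.encode(codec)
        offsets = []
        start = data.find(needle)
        while start != -1:
            offsets.append(start)
            start = data.find(needle, start + 1)
        hits[label] = offsets
    return hits


# Hypothetical bytes standing in for a saved document (not a real .doc file).
sample = b"\x00\x01header" + "slashdot.org".encode("utf-16-le") + b"\xff\xfe"
print(find_term(sample, "slashdot.org"))
```

On the sample, the single-byte search finds nothing and the 16-bit search finds the term, which matches the experience described above.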
Mom says my
This will become a common way for 'big' corps to spy on 'small' corps (and individual users?), to find new ways to both screw them over, and appear 'omniscient'. They'll never (or rarely) get called on it. Meanwhile, anyone who tries to reveal information discovered in this way which is incriminating towards said big corps will get sued for being "hackers" and/or "terrorists".
Honey, I shrunk the Cygwin
It doesn't matter how good your corporate security is if you don't train your users (including managers) in basic security practices.
Lots of people put sensitive documents in public webspace, primarily because they don't know any better. Eventually the cost-benefit analysis will be done, and corporations will pay to have their users trained. Until then, this type of thing will continue to happen.
--
Use Vobbo for Video Blogs
Well, it is amongst people who object to being mailed Word documents, anyway. They're just a really bad format for publishing information in.
See Richard Stallman's 'no-word-attachments' article, for example...
How many word processing programs place hidden metadata within their formats?
For example, do OpenOffice/StarOffice and other open source programs have the same security problem?
Don't Tread on OpenSource
Simon Byers, at AT&T's research laboratory in the US, was able to unearth hidden information from many thousands of Microsoft Word documents posted online using a few freely available software tools and some basic programming techniques.
Are you going to share that info or what?
Throw it up on freenet man!
Indeed. Search for system.dat, user.dat or pwl on Kazaa, there are always some files found.
Although I cannot guess how many of those are honeypots.
Be wary of any facts that confirm your opinion.
An accomplished searcher can learn much about the world we live in, as slashdot reported some time ago.
An interesting reminder, to be sure, given yesterday's blackout.
Makes a guy wonder just how much is still available regarding key electrical and telephone infrastructure. Emergency power capabilities of broadcasters (radio, television, mobile phone). Gas lines, in the parts of the country that have them. Water systems. There's likely a bunch of data out there, ready to be mined.
Everyone should just be forced to use LaTeX and then there won't be any hidden information. . .
No one can tell, man. That post is encrypted in itself.
How long until someone blames Microsoft, I wonder...
Sure, but the point they're making is that it's not intuitively obvious to most people that there could be text in a Word document other than what appears.
So a relatively security-conscious person who just doesn't know anything about Word file formats could easily publish something online on purpose without knowing that there is (invisible) sensitive information in it, even if they'd never put that information in a public place on purpose.
[TMB]
A sysadmin once sent me a form letter type thing with my new password in it. The username/password was a spreadsheet object and I was able to open it to see everyone's passwords. He changed them all when I pointed this out. BTW, why do people send email messages that just say "see attached file" and the attached file is a memo with some trivial content that could have been the text of the email??
Anyway, I have to admit that I was also burned by word. I was in the habit of opening the last memo I wrote from the recent documents list and using it as the starting point for newer ones. At some point, I put a bunch of policy statements on a CD and was later told that everyone was reading the hidden text. Doh!
This was back in the days of Office 97 I believe. I'm not sure if Office 2k or XP still have this feature/bug.
Remind me not to save my important documents to C:\My Documents\Porn\Annual Budget Report.doc anymore.
I have received two such Word documents from two separate job recruiters. The actual companies looking for the employee were hidden in the document, as well as contact information for the person at the company. Screw the middleman.
Remember kids: strings is your friend. If you happen to get a job offer in the form of a Word document and the HR drone who sent it to you wasn't careful, you can often see the version that got sent to other candidates and, more importantly, how much money they were offered. It can do wonders for your bargaining position.
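For anyone without the Unix strings utility handy, here is a minimal Python sketch of the same idea: pull out runs of printable characters, both plain single-byte runs and 16-bit little-endian ones. It is not a reimplementation of strings, `printable_runs` is a made-up name, and the sample bytes are invented rather than taken from a real document:

```python
import re


def printable_runs(data: bytes, min_len: int = 4) -> list:
    """Return printable ASCII runs, then printable UTF-16LE runs."""
    # Runs of at least min_len printable single-byte characters.
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # The same characters stored as two bytes each (low byte, then 0x00).
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    runs = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    runs += [m.group().decode("utf-16-le") for m in utf16_pat.finditer(data)]
    return runs


# Hypothetical leftover text in two encodings, surrounded by binary junk.
sample = (
    b"\x00\x02Offer: 50000\x00\x01"
    + "Offer: 65000".encode("utf-16-le")
    + b"\x03"
)
for run in printable_runs(sample):
    print(run)
```

Anything a scan like this turns up in a real document still has to be interpreted by hand; the sketch only shows why leftover text is easy to spot.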
It doesn't pertain to just documents. I've seen code samples posted to sites like experts-exchange where DB connection strings still had UID and PW data in them. Seems people don't re-read before they post very often.
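One way to make the re-read less error-prone is to run the snippet through a small filter before posting. A rough Python sketch, assuming semicolon-separated key=value connection strings; the key names in the pattern, the `redact` helper, and the sample line are assumptions for illustration and will not cover every format:

```python
import re

# Credential-looking keys; the value runs up to the next ';' or quote.
_SECRET_KEYS = re.compile(
    r"(?i)\b(uid|user id|pwd|password|pw)\s*=\s*([^;\"']*)"
)


def redact(text: str) -> str:
    """Mask the values of credential-looking keys, keeping the key names."""
    return _SECRET_KEYS.sub(lambda m: f"{m.group(1)}=***", text)


# Hypothetical code sample of the kind that ends up on a help site.
snippet = 'conn.Open "Provider=SQLOLEDB;Server=db1;UID=sa;PWD=hunter2;"'
print(redact(snippet))
```

A filter like this is a backstop and not a substitute for reading the post before submitting it.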
Saying Android is a family of phones is akin to saying Linux is a family of PCs.
It looks like you're trying to post a document on the web.
Would you like to...
1. Divulge corporate secrets?
2. List your passwords?
3. Remove KB823980 and open port 135?
It looks like you're trying to close Clippy.
Would you like to...
1. Shit in your hat?
2. Put fist through bling bling flat panel?
3. Go home for teh weekend?
Google indexes PDF documents; it even turns them into HTML
of course you could always try http://searchpdf.adobe.com/
Now there's a way to search through more than a million summaries of Adobe(R) Portable Document Format (PDF) files on the Web. Your search results will allow you to see the summaries before deciding to view the original Adobe PDF.
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
View some of the past word docs you've received in a hex editor...
Near the bottom there is often information from other documents of the sender that they were recently working on. I don't know why it saves this. Maybe something to do with the undo buffer?
At work I used to look at internal memos that would be sent out on a weekly basis and find out all sorts of other stuff that was going on.
NOT MY PERSONAL INFO! NOOOOOOOOO!
This isn't just nothing new, it's old news. Wasn't this how they caught the guy who wrote the Melissa virus? When that little popup window from MS Office came up asking for their personal info, did they just think Office was trying to get to know them better, in order to be their friend?
It's just silly pressmongering. Those dumbasses have to come up with a terrifying computer factoid every day, or the ignorant compu-phobes they prey on might come to their senses.
Just my opinion.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
It's only going to get worse; Google's really expanded on the number of file types it indexes and caches.
One of my clients was recently caught out when Google indexed private metadata she didn't know was still there, so I can well understand the gravity of this situation.
455fe10422ca29c4933f95052b792ab2
it's called http://www.google.com and you search by "top secret documents filetype:doc".
- mpg
- mov
- mp3
- secret - doesn't have to be file extensions...
- "My Documents" - yeah, that's secure...
- etc
Anyway, as you can see, it's pretty effective. Sometimes admins wise up, and all you have is the Google cache. But sometimes they don't, and you get to look. Thanks Google!
A programmer is a machine for converting coffee into code.
Well, not sure about what the OP thought was funny, but I sure do think this is, from the article:
"It is feasible that an individual may include their social security number on copies of a resume sent to prospective employers, but delete it from the version put online to guard against identify theft," Byers writes.
Who in their right mind puts their SSN in any version of a resume??!
everything in moderation
how many incidents will it take before people realize that ALL Microsoft products are insecure?
What will it take? What happens when a script kiddie hacks a hospital and shuts down the life support systems in ICU? Or just juggles the meds for the patients so that everyone in the hospital gets the wrong meds?
Or perhaps they glitch the Air Traffic Control system and airplanes rain down from the sky and tens or hundreds of thousands of people die??
Before the last war in Iraq started they showed the "state of the art" US command center just across the border in a big tent.
There were dozens or more soldiers and dozens upon dozens of PCs. You could clearly see on the displays that they were *ALL* running Windows.
I thought, "Oh shit, the security of this country is being placed in the trust of the worst product ever..."
Those PCs I saw were NOT Tempest, for one; add in the Windows factor plus the state of war, and you're asking for serious trouble.
Windows will at some point cause a massive catastrophe and cause great loss of life and property. You can bet on it.
This country is far too dependent upon computers to operate. When the computer goes down, well, sit on your hands for awhile...
I remember the days before computers, everyone got things done just fine. Now no one knows how to function without them..
Gates and co. will take care of all your sensitive info very soon. With the help of the DMCA, Sen. Fritz, and MS servers, we will all be so secure that no one other than MS and the right government agencies will be able to unlock your locked online .docs. So smarten up, bow to Redmond, and pay up, suckers! It's upgrade-or-lose mania time again. Can your business not afford the wonderful new security that's coming? Good luck getting your secretaries to use anything other than MS orafice!
OH THE SHAME I fell off the wagon and use sigs again!
How many people actually protect their website statistics?
Adding a simple /stat/ or /stats/ or a variation, with a combination of "web" or the name of any of the common statistic generation programs, gets you access to the statistics of a *lot* of websites.
Then from the stats you could find any "hidden" data which is not linked on the site, including internal company documents, girlfriend's nude photos or mp3s.
Alternately you could just google for the statistic reports of sites and get there more easily.
This is another case of ill informed or lazy users not following what should be a simple security policy which could cause serious repercussions.
For those who want to know how to protect yourself, read this link.
Tony Blair got busted in the WMD case because the names of the people who revised the WMD documents were still in the Word file. Now, it seems, Downing Street only puts PDF files on the web, and has removed all the MS Word documents that were already there.
Tools reveal secret life of documents - Documents like in Word save too much Info - Blair Episode
By Mark Ward
July 03, 2003
The UK Government was just the latest in a long line of organisations that has learned to its cost just how much information can be gleaned from innocent looking files. Earlier this year it issued a document called the 'dodgy dossier' about Iraq's concealment of weapons of mass destruction that was written using Microsoft Word. Every Word document remembers who made the last few revisions to it. The log reveals the names of four of the people who prepared the Iraq document for publication and the government Communications Information Centre that some of them work for. It was this log that Number 10 press chief Alastair Campbell had to explain to the House of Commons Foreign Affairs Select Committee in late June as part of its investigation into the Iraq dossier's history. Some of this information can be seen simply by right-clicking to view the properties of the downloaded document in a file listing. Utility programs can get even more information from Word revision logs.
The life stories of the documents we create are becoming increasingly important as the scrutiny of industries and governments gathers pace. Every time you write or edit these files you leave a trail of information revealing what you did and when you did it. With the right tools it is possible to extract this data and work out the trail of authors and workers who created a document. That is why we should all use opensource and open data formats - so that we can humanly read what all we are "putting" into the document. The Word version of this document has now been removed from government websites but copies of it are still available elsewhere on the net.
Unabridged and unedited article at
http://news.bbc.co.uk/2/hi/technology/3037760.stm
To see a world in a grain of sand, and then to step back and see the beach where the sand lies
Have to post a link to this famous example, the dodgy dossier. There was a writeup here. If you're thinking of making the case for war, don't release Word documents to the press - unless they're very very docile.
Incorrect. You didn't read the article.
He did the search, as you said, but he didn't use Google's conversion; instead, he looked directly inside the DOC file, where Word keeps a bunch of information for its own purposes -- stuff that was deleted, stuff that was just in the wrong memory location when the save happened -- whatever.
He found legitimate docs, with legit contents; but they also contained some stuff that the authors didn't intend to publish.
-Billy
Aside from the paranoia overtones, I still disagree. The tools for doing this are on the web. Right now. So in other words, a weapon has been released that is free and easy to use. If anything, this will help small, poor companies with no resources for industrial espionage get a little information out of people who don't know any better, including their large-company rivals. All they have to do is hire one of the celibate wonders that read slashdot, and they're in business.
-Looking for a job as a materials chemist or multivariat
By using tools that break the "encryption" on, for example, the Washington Post .pdf file mentioned in the article, isn't the researcher violating the DMCA? Isn't his whole project bragging about doing this, a la 2600?
I hope he remembers a few packs of cigarettes in order to buy himself a few nights of sleep in the Big House.
This isn't really new -- check out this story I wrote for CNet/ZDNet over a year ago.
This type of thing happens all the time, not just with digital media but with other media also. People go through others' garbage recreating shredded documents; camcorders catch people in the act; there's carbon paper and copying machines. You always need to be careful when dealing with your own and others' information.
Gnu For President 2004
... then suffer foot wounds.
At the risk of being moderated Troll and Redundant,
Why are these people posting Word Documents online?
The Word Wide Web is not the Microsoft Wide Web.
Post in plain ASCII text, or HTML if you feel the need to pretty it up.
People keep using tools that are far more powerful and complex than they need, then they screw up, and blame the tools. Pick a simple tool to do a simple job, and you don't need to worry about your ignorance of the tools you are using causing you problems.
I hate it when I make a joke and I get modded "+5 insightful". Mod the stupid comments "funny", not "insightful", pleas
This has happened to the UK government several times. The latter link shows whose sticky fingers were on the infamous "dodgy dossier".
Gareth
He says hidden information can "incredibly useful" in improving the functionality of the software. "But if some of that data is sensitive, there have to be ways of ensuring that it isn't distributed where it shouldn't be," he says.
Apparently they need to use some of the software he used to get a conjugation of the infinitive "to be" back into their text.
www.sitetronics.com/wordpress
Does anybody know of a program that can clean up deleted info in Word docs? I'm thinking of something like Ad-Aware that scans for certain files, shows you possible security issues (supposedly deleted text, metadata in document properties, etc.), and asks you what action it should take (wipe out/edit text, delete file, etc.).
It has been known for a long time that metadata are hidden within Microsoft Word documents. Microsoft even has Knowledge Base article 237361 explaining how to reduce the amount of metadata appearing in MS Word 2000 documents. Here's an excerpt:
This step-by-step article explains various methods that you can use to minimize the amount of metadata in your Word documents.
I'll bet there are more, but they won't disclose them.
It's a pity that more people don't just save as RTF. It's just as good for most uses, and it's a less obscure format.
You're not.
There are two ways of saving a word document:
Fast Save dumps the binary from memory into the file. Full Save compacts the binary image, and reorders it. This takes time.
Word's text stream is stored using a piece table. One of the benefits of a piece table is that if you keep the meta information about the text, you can get nearly infinite undo. The way it does this is by having an original data stream, and an appended data stream. Whenever you add data to the file, it gets added as a chunk to the end of the appended data stream. Whenever you delete, the meta table is updated to remove the text from the stream, but otherwise the text itself is left unaffected.
As a result, text is never removed from the document. A Fast Save (which is the default) under Word dumps the Piece Table as-is (there is probably some compaction over time to remove the no-longer-used data, but it probably only occurs above a given threshold of used to unused text). A full save deconstructs the piece table's meta information, and turns it back into one contiguous stream of data.
It's all just a function of the way the text is stored while it's being edited. Different editors have different mechanisms; some store data based on lines, and some store it using a gap buffer. But ultimately, the problem exists because Word uses a piece table, and it dumps the entire table to a file by default.
It's actually a sensible way of handling the text data. However, whoever designed the Fast Save algorithm probably didn't consider the ramifications of the text still being stored in the document. The best workaround? Wipe the unused sections of the piece table. But then you might as well return to using a Full Save, as you'll be ditching the performance benefits anyway.
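To make the description above concrete, here is a toy piece table in Python. It is only a sketch of the general data structure as described in this comment, not Word's actual file format or save logic; `PieceTable`, `fast_save`, and `full_save` are made-up names, and the "fast save" here simply dumps the raw buffers to show that deleted text is still sitting in them:

```python
class PieceTable:
    """Toy piece table: two text buffers plus a list of pieces into them."""

    def __init__(self, original: str):
        self.original = original  # text as first loaded; never modified
        self.added = ""           # append-only buffer for everything typed later
        # Each piece is (buffer name, start offset, length).
        self.pieces = [("original", 0, len(original))] if original else []

    def _buffer(self, name: str) -> str:
        return self.original if name == "original" else self.added

    def text(self) -> str:
        """The visible document: the pieces read in order."""
        return "".join(self._buffer(b)[s:s + n] for b, s, n in self.pieces)

    def _split(self, pos: int) -> int:
        """Ensure a piece boundary at pos; return the index of the piece there."""
        offset = 0
        for i, (b, s, n) in enumerate(self.pieces):
            if pos == offset:
                return i
            if offset < pos < offset + n:
                left = pos - offset
                self.pieces[i:i + 1] = [(b, s, left), (b, s + left, n - left)]
                return i + 1
            offset += n
        return len(self.pieces)

    def insert(self, pos: int, s: str) -> None:
        start = len(self.added)
        self.added += s  # new text is only ever appended
        self.pieces.insert(self._split(pos), ("added", start, len(s)))

    def delete(self, pos: int, length: int) -> None:
        i = self._split(pos)
        j = self._split(pos + length)
        del self.pieces[i:j]  # only the table changes; the buffers keep the text

    def fast_save(self) -> str:
        """Simplified stand-in for a fast save: dump both buffers as they are."""
        return self.original + self.added

    def full_save(self) -> str:
        """Rebuild one contiguous stream holding only the visible text."""
        return self.text()


doc = PieceTable("Salary offer: ")
doc.insert(14, "42000")
doc.delete(14, 5)        # the author "removes" the first figure
doc.insert(14, "65000")
print(doc.text())
print("42000" in doc.fast_save(), "42000" in doc.full_save())
```

In this toy version the deleted figure survives in the raw buffers but not in the rebuilt stream, which is the difference between the two kinds of save described above.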
Simon
Coming soon - pyrogyra
I exported it to RTF then reimported before saving it again as .doc. This erased other people's access to my thought processes, and it reduced the file size by 80% to boot.
In the end it didn't matter much, though. I usually include a plain text version of the resume right in my email as a backup along with the .doc attachment. On interviews, I've noticed that most people just print the plain text version. If I really didn't need to make the word doc, and people are too lazy to print it, why do companies insist you send it in .doc format anyway?
Once, when negotiating an investment deal, we got a Word document with the investment bank's comments on our proposed contract.
:-)
They tracked changes. All we needed to do was display them... and we got juicy stuff like "if they accept either our fix for clause X or for clause Y we can still s---w them royally in scenario Z".
Made for a very effective negotiation. For us.
Oh, wait, the article was about the problems this raises for the document's _author_.
Never mind