I have worked on several large-scale DWH projects, especially in the telecommunications field. The previous post contains good advice.
I would also suggest looking into a decent ETL toolkit. Newer-generation tools such as DataStage PX and Ab Initio in particular work well in distributed scenarios and are built specifically for this type of job. In many cases they are also significantly faster for this type of workload.
Of course, you might also want to look into real-time or near-real-time solutions, especially when you have large data volumes and trouble finishing everything overnight. Some middleware a la MQSeries might come in handy for that.
Be extra careful with data validation, as another poster has already mentioned. Only load data that has been validated; anything suspicious should be inspected manually and loaded during the next run.
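To make the validation point concrete, here is a minimal sketch in Python. The record layout (`caller`, `duration_s`) and the rule in `looks_valid` are made up for illustration; a real load job would use whatever checks fit its own feeds.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, str]


def split_batch(
    records: Iterable[Record],
    is_valid: Callable[[Record], bool],
) -> Tuple[List[Record], List[Record]]:
    """Separate a batch into records to load now and records to hold back."""
    accepted: List[Record] = []
    quarantined: List[Record] = []
    for record in records:
        if is_valid(record):
            accepted.append(record)
        else:
            # Suspicious rows are never loaded automatically; they wait
            # for manual inspection and can be fed into the next run.
            quarantined.append(record)
    return accepted, quarantined


def looks_valid(record: Record) -> bool:
    """Hypothetical rule: both fields must be non-empty and purely numeric."""
    return (
        record.get("caller", "").isdigit()
        and record.get("duration_s", "").isdigit()
    )
```

Once someone has checked and corrected the quarantined rows, they simply go in front of the next night's input and pass through the same `split_batch` call again.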
cheers
majello
Has anyone noticed the differences between Google Maps and Google Earth?
For instance, when looking at the place where my office is in Tokyo, Google Maps shows the office tower it is in, while Google Earth shows a picture of the area before the tower was built.
Funny, methinks.
cheers
Majello
Writing docs isn't that much different from approaching a new software project. Rule one for me is to get things structured as fast as possible. People here have already suggested several ways to do this:
* TOC / Word Outline view
* Mindmaps
Those are good when you want a hierarchical structure. Another way to approach the problem is concept maps, which are good for getting the basic information in your brain onto paper in an ordered way. But you still need to organize it afterwards. Concept maps are also a good way to learn quickly how things work together, so they can serve as an outline and guide for the reader.
In the end, as soon as you have the structure right, just start writing down whatever comes to mind. Once you feel you have everything down, go through your document in a linear fashion and see whether it follows through and whether you missed anything. And lastly, get someone to QA your doc. It is way too easy to write something that makes perfect sense to you while being totally incomprehensible to anybody else.
cheers
majello
Actually, according to Google he is:
cheers, majello
I guess I'll hold out and wait for Web 3.11 for Workgroups. Or Web NT 3.1. Or Web System 10 including cocoa butter.
cheers
Majello
First of all, Taco, I am glad you chose to gather feedback on this. It was the right thing(tm) to do, IMHO.
With regard to the linking issue, my take is that giving some guy a forum to plug his own website in return for a good story is perfectly fine. I just wish I could see which sort of link I am getting. Something like "original story" and "submitter's take on the story" as mandatory fields.
In terms of incentive, you might want to add a link to the submitter's homepage to the story heading, along the lines of Freshmeat. This would offer an incentive for submitting while making links to my-own-blog(tm) in the story text less attractive than they are now.
just my two cents
majello
Sorry to be annoying, but I consider your approach - while intuitive - basically flawed. If you are testing with constantly changing input data, it is very hard to determine the effect of any changes to your configuration. What you should do instead is capture a day's worth of data, or maybe more, and hack together some script that lets you replay the day against your test configuration. That way you can always make sure that any changes you made haven't messed up the configuration. You can also vary the replay speed to do some stress testing, and you might want to consider building up a set of "interesting" mails to use as test cases.
Testing with an unknown and essentially random input set has its value, but I consider it incomplete.
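A rough sketch of such a replay script in Python, just to show the shape of it. Everything here is an assumption for illustration: `CapturedMail`, the `handler` callable standing in for "run this mail through the test configuration", and the idea that each captured mail carries its time offset from the start of the capture.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class CapturedMail:
    offset_s: float  # seconds since the start of the capture
    raw: str         # the message text as it was captured


def replay(
    mails: Iterable[CapturedMail],
    handler: Callable[[str], str],
    speed: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Feed captured mails to handler in their original order.

    speed > 1 compresses the gaps between mails, which turns the same
    capture into a stress test. Returns one verdict per mail.
    """
    verdicts: List[str] = []
    previous = 0.0
    for mail in sorted(mails, key=lambda m: m.offset_s):
        gap = (mail.offset_s - previous) / speed
        if gap > 0:
            sleep(gap)
        previous = mail.offset_s
        verdicts.append(handler(mail.raw))
    return verdicts


def regressions(baseline: List[str], candidate: List[str]) -> List[int]:
    """Positions (in replay order) where the two configurations disagree."""
    return [i for i, (b, c) in enumerate(zip(baseline, candidate)) if b != c]
```

Run the same capture once against the old configuration and once against the changed one, then look at `regressions(old, new)`: because the input is identical, every difference is down to the configuration change. The hand-picked "interesting" mails are just another, smaller capture.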
cheers, Stefan
Ah, no, sorry, we just comment on the comments already posted as comments. Nobody ever cared about the article, but these days you can safely ignore the blurb at the top as well - everybody else does.
cheers
Majello
Wow. I can distinctly remember this one. It even had graphics on my Speccy. Somehow I did not mind looking at crude vector graphics being visibly flood-filled at the time. The thing with the river, the rope and the boat was actually the only part I did not manage right away. That, and finding the One Ring in the valleys of the Misty Mountains.
I miss the rubber keys. *sniff*
Seems like I've been on computers for too long, if you can count the Spectrum as a computer.
Ob on topic: a movie based on The Hobbit would need to be distinctly different from the LOTR movies we are seeing now. I think the tone and style of the book are different enough to not even warrant a comparison to the opening of FOTR, even if the place is the same. Just think of the trolls in the cave.
May your shadow never grow less, or stealing would be too easy.
Majello
Mmmmhm, orthogonal persistence.
Now that gives me a massive flashback to the times when I was doing MUMPS on VAX systems. Hear the mantra of those times: "every array is a b-tree, every b-tree is both on disk and in memory"
It was fun, though, but the syntax was easily the most obfuscated I ever saw in a production-level language.
greetz
Majello