Obtaining Multi-Tier Application Logs for Research?
arohann asks: "I'm a research assistant in a well-known university in the US. As part of the research work my group is doing, we need access to the logs from a production system of an n-tier web-application. I've been looking around for a while with no result. Most places reply with a flat 'No!'. I was wondering if there is anyone who could help/advise with this. Please read about our requirement below and do let me know if you can help?"
"We want to examine the request arrival behaviour of a real-world web-application and will also need to examine how long each request takes to be processed at each tier. We would collect this data over a few days and then use it to build a real-world model of the request behaviour of an internet application. This model would be used in our analysis and profiling of clustered, multi-tier, internet applications.
Of course, we realize it may be that some of this data cannot be shared due to client privacy concerns. However, let me assure you that we are not interested in any client details and we're not particularly concerned with what kind of an application it is as long as it's at least 3-tier, is a production system (we need a real-world model), and is used daily. We are also willing to sign a confidentiality agreement if necessary and follow any company protocol required to ensure that security and confidentiality are preserved.
Of course, if this results in any research paper publications, we would give credit to the supplier of the data.
Hoping to hear back from everyone soon ;)"
...
...And rightly so for companies whose constitution is 'maximise profit'.
> if this results in any research paper publications,
> we would give credit to the supplier of the data.
If that's all you offer in return, which company will allocate the resources to verify that
(a) this breaches no privacy laws, and (b) business advantage isn't sacrificed?
Some suggestions:
1. Offer a quid-pro-quo to companies you contact: in return for access, you will deliver (say) a multi-page detailed architectural review and specific recommendations on potential improvements, reviewed, say, by your professor.
2. Talk to people who run websites for non-profits, or open-source / creative-commons websites like wikipedia.org, sourceforge.net, even Slashdot. The attitude there may be more sympathetic to your efforts, and the admins more willing to knock up a few Perl scripts to strip logs of sensitive information.
3. Offer to be a website maintainer for a large independent open-source / community effort and obtain agreement on your access to logs.
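To give an idea of how little work the log-stripping in suggestion 2 might be, here is a rough sketch, in Python rather than Perl. It assumes the logs are plain text with literal IPv4 addresses and URL query strings in them (roughly Apache-style access logs); the names `pseudonym` and `sanitize_line` and the salt value are made up for illustration. Timestamps are left alone, since arrival times are the whole point of the exercise. A real site would need its own review of which fields count as sensitive.

```python
import hashlib
import re

# Assumption: client addresses appear as literal IPv4 strings in each line.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Assumption: query strings start at '?' and end at whitespace or a quote.
QUERY = re.compile(r"\?[^\s\"]*")


def pseudonym(ip: str, salt: str) -> str:
    """Map an IP to a stable, salted token so per-client patterns survive."""
    digest = hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()
    return "client-" + digest[:12]


def sanitize_line(line: str, salt: str) -> str:
    """Replace IPs with pseudonyms and drop query strings; keep everything else."""
    line = IPV4.sub(lambda m: pseudonym(m.group(0), salt), line)
    return QUERY.sub("?REDACTED", line)


if __name__ == "__main__":
    # Hypothetical sample line, not taken from any real system.
    sample = ('192.168.1.10 - - [10/Oct/2005:13:55:36 -0700] '
              '"GET /cart?user=alice&id=7 HTTP/1.1" 200 2326')
    print(sanitize_line(sample, salt="keep-this-secret"))
```

The salt stays with the site admin, so the researcher can still tell requests from the same client apart without being able to recover the original address by simply hashing candidate IPs.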
I've written a couple of these (including one for an extremely large software company that I'm sure you've heard of), and I'm working on one right now on the side (for my own personal gain).
If you reply to this comment with your email address, then maybe we can work something out.
I need some help with testing my current project, and you need some data. It's actually more work for me to have someone besides myself test the software, but the quality should be higher and it could help you out.
It would probably take at least a few weeks of work on your end, but you would definitely have your data at the end of it. Thanks, Brad