Obtaining Multi-Tier Application Logs for Reseach?
arohann asks: "I'm a research assistant in a well-known university in the US. As part of the research work my group is doing, we need access to the logs from a production system of an n-tier web-application. I've been looking around for a while with no result. Most places reply with a flat 'No!'. I was wondering if there is anyone who could help/advise with this. Please read about our requirement below and do let me know if you can help?"
"We want to examine the request arrival behaviour of a real-world web-application and will also need to examine how long each request takes to be processed at each tier. We would collect this data over a few days and then use it to build a real-world model of the request behaviour of an internet application. This model would be used in our analysis and profiling of clustered, multi-tier, internet applications.
Of course, we realize it may be that some of this data cannot be shared due to client privacy concerns. However, let me assure you that we are not interested in any client details and we're not particularly concerned with what kind of an application it is as long as it's at least 3-tier, is a production system (we need a real-world model), and is used daily. We are also willing to sign a confidentiality agreement if necessary and follow any company protocol required to ensure that security and confidentiality are preserved.
Of course, if this results in any research paper publications, we would give credit to the supplier of the data.
Hoping to hear back from everyone soon ;)"
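For anyone wondering what "request arrival behaviour" means in practice, here is a minimal sketch in Python. It assumes each logged request carries an ISO-8601 arrival timestamp; the timestamps and the function name are made up for illustration and do not come from any real system.

```python
from datetime import datetime
from statistics import mean


def interarrival_seconds(timestamps):
    """Return the gaps, in seconds, between consecutive request arrivals."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


# Hypothetical arrival timestamps; a real log would supply these.
arrivals = [
    "2024-01-01T12:00:00.000",
    "2024-01-01T12:00:00.250",
    "2024-01-01T12:00:01.000",
    "2024-01-01T12:00:03.000",
]

gaps = interarrival_seconds(arrivals)
mean_gap = mean(gaps)            # average time between requests
arrival_rate = 1.0 / mean_gap    # average requests per second
```

The distribution of those gaps over a few days of traffic is presumably what the arrival model would be fitted to. Note that none of it needs client identities, only timestamps.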
i doubt you will get to see logs like that without taking them by force
Snowden and Manning are heroes.
No!
Be relentless!
a candy bar?
Before you laugh, it goes amazingly far where I work....
My guess is you're gonna need to try to contact them using something other than email, probably some sort of certified letter.
I will echo that: No!
System logs are for the machine's administrators and for software developers, not researchers.
If you guys want research material, build your own systems and sink in the tens of millions of dollars to do that. If your app is decent you'll have more log data than you could possibly wish for.
"Piter, too, is dead."
I'm not familiar with this "reseach" that you speak of, but it doesn't sound like something I'd want to donate my logs to. It sounds like some sort of weird internet version of felching.. Ew. No thank you.
I'd start with companies that are already offering internships through your university. Find professors and graduate students that already have a working relationship with private sector folks and get introduced through them.
Just cold calling or sending in letters or email is about as effective as you've found it to be.
Also, you should try looking through published articles in trade journals to find out which companies are sponsoring research in your field.
The fact is that you'll certainly have to sign an NDA and likely they will have to scrub the data anyway. One way or another it's going to cost the donors $$$ that you aren't going to reimburse. Your project will have to fit in with their research goals or they'll be returning a favor from someone else.
-- John.
Your best bet would be to have your professors call in a favor from former students or their contacts in the industry.
:)
Most companies will consider this to be a security risk. They don't even want you to know the rough design of their backends let alone collect data from it.
Some companies wouldn't know how to gather what you want and wouldn't risk letting you touch their systems.
Most of these systems are probably messy, kludged together by former employees and hacked by current employees just enough to keep them running.
If you have some time, get an internship and do your research on the side.
Welcome to capitalism, we hope you enjoy your stay. While here, please note that TANSTAAFL.
Asking for data from a business requires a lot of work on your part. You must somehow convince them that all the effort they are going to spend collecting, sanitizing, and providing you with the information is going to pay off for them in a reasonable way. Since this request involves several months of data, and more employee involvement than a 5-minute survey, you'll have to build a strong relationship with a company that has this data.
Opportunities include:
- The research will help you identify areas where improvement will save $$$ in [bandwidth|speed|latency|etc]
- We can supply one or more interns to do all the internal work as well as work on a few other projects of your choosing
- You (manager, CEO, IT lackey) got your degree here and still have fuzzy feelings for the school
- Oh benevolent ones! May we sip at the firehose? Verily, this research will help this university provide graduates of the caliber which will dazzle the eyes! Yea, they will be cheap, too.
The key here, as in everything to do with business, is to network, network, network. Don't email - you cannot possibly explain your research in a way that will make them go, "Gee, I think I'd like to devote company resources to these kids at the university of whatever!" in an email. At best send an email such as, "Dear sir, blah blah blah, we are researching n-tier applications and would like to spend a few moments talking with you about your architecture. When would be a good time to call?" Give it two days, then call them unless they have flatly refused to talk to you. Don't engage in email conversations - in order to get good buy-in, you need to talk to them (if only briefly) so they can associate a voice with the email. Then email all you want.
You may have better luck calling at the outset, introducing yourself and your research, then asking who at the company would be suited to help you out with your research. Then engage that person. Don't get too low on the totem pole or you may end up with someone who is ineffective within the company at getting you what you want.
Certain companies (Google, for instance) are resource rich and may be easier to work with, especially if you can get one or two workers involved and spending their 20% time helping you. If your research isn't exciting on a general level, you're in for a rough ride.
Once you've started a conversation (with several people at different companies - you're still trying to get something they will be reluctant to give) then you can start edging into what you need to complete your research. This whole process will take 2-6 months just to set everything up. I hope you've started early.
Good luck.
-Adam
...
...And rightly so for companies whose constitution is 'maximise profit'.
> if this results in any research paper publications,
> we would give credit to the supplier of the data.
If that's all you offer in return, which company will allocate the resources to verify that (a) this breaches no privacy laws and (b) business advantage isn't sacrificed?
Some suggestions:
1. Offer a quid-pro-quo to companies you contact: in return for access, you will deliver (say) a multi-page detailed architectural review and specific recommendations on potential improvements, reviewed, say, by your professor.
2. Talk to people who run websites for non-profits, or open-source/ creative-commons websites like wikipedia.org, sourceforge.net, even slashdot. The attitude there may be more sympathetic to your efforts and the admins more willing to knock up a few Perl scripts to strip logs of sensitive information.
3. Offer to be a website maintainer for a large independent open-source / community effort and obtain agreement on your access to logs.
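As a rough idea of what "a few scripts to strip logs of sensitive information" could look like, here is a minimal sketch in Python rather than Perl. It assumes access-log lines in the common web-server style; the sample line, the key, and the function name are all hypothetical. It swaps IPv4 addresses for keyed pseudonyms and drops query strings, while leaving timestamps and status codes alone.

```python
import hashlib
import hmac
import re

# Simple patterns, assumed adequate for ordinary access-log lines.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
QUERY = re.compile(r"\?[^\s\"]*")


def scrub_line(line, secret):
    """Replace IPv4 addresses with keyed pseudonyms and drop query strings."""
    def pseudonym(match):
        # Same address + same key -> same token, so sessions stay traceable
        # without exposing the address itself.
        digest = hmac.new(secret, match.group(0).encode(), hashlib.sha256)
        return "ip-" + digest.hexdigest()[:12]

    line = IPV4.sub(pseudonym, line)
    return QUERY.sub("", line)


# Hypothetical log line, for illustration only.
raw = ('203.0.113.7 - - [01/Jan/2024:12:00:00 +0000] '
       '"GET /cart?user=alice&id=42 HTTP/1.1" 200 512')
clean = scrub_line(raw, secret=b"keep-this-key-private")
```

The admins would keep the key to themselves, so the researchers could still group requests by pseudonym but could not reverse the mapping. A real scrubber would also have to handle IPv6, cookies, referrers, and whatever else the site logs.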
I've written a couple of these (including one for an extremely large software company that I'm sure you've heard of), and I'm currently working on one right now on the side (for my own personal gain).
If you reply to this comment with your email address, then maybe we can work something out.
I need some help with testing my current project, and you need some data. It's actually more work for me to have someone besides myself test the software but the quality should be higher and it could help you out.
It would probably take at least a few weeks of work on your end, but you would definitely have your data at the end of it. Thanks, Brad
If you are from a "large university", how come you can't find any big app log files right on campus? Most "large universities" have plenty of "n-tier web-applications". Methinks your request smells bad.
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
You say you want it to make a business model etc, but you don't say what your final goal is.
./ tell us what you really want in the end, and we might find a workaround. One rule with asking questions is never assume you are right. You ask for help, so let us draw the conclusions.
Since you're already on
Only those who risk going too far can possibly find out how far they can go. T. S. Eliot
... of building yourself a net of vmwares for this purpose?
"Nae Kin! Nae Quin! Nae laird! Nae master! We willna be fooled again!"
I can't help, but it smells fishy to me. Why don't you tell us the name of the "well-known university"? Where is your email address? Why don't you answer the post that asks for your email address? Why are the questioners of an Ask Slashdot question never replying to the answers?
That will make them realize that you understand some of the constraints they are under, and that you're a nice person (:-))
In particular, transform
http://www.sin.com/porno into
193'd-seen-HTTP-address
--dave
davecb@spamcop.net
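The transformation dave describes can be sketched roughly as follows, in Python. The class name and token format are assumptions made for illustration (the token leaves off the ordinal suffix): each distinct URL is replaced by its order of first appearance plus its scheme, so the request pattern survives but the address does not.

```python
from urllib.parse import urlsplit


class UrlAnonymizer:
    """Map each distinct URL to an ordinal token, in order of first sight."""

    def __init__(self):
        self._seen = {}  # url -> 1-based index of first appearance

    def token(self, url):
        if url not in self._seen:
            self._seen[url] = len(self._seen) + 1
        scheme = urlsplit(url).scheme.upper() or "UNKNOWN"
        return f"{self._seen[url]}-seen-{scheme}-address"


anon = UrlAnonymizer()
first = anon.token("http://example.com/a")    # "1-seen-HTTP-address"
second = anon.token("https://example.com/b")  # "2-seen-HTTPS-address"
repeat = anon.token("http://example.com/a")   # same token as the first call
```

Because repeat visits to the same URL get the same token, popularity and locality in the trace are preserved, which is what a request model would need.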
Does sourceforge fit your definition?
Nerd rage is the funniest rage.
Um, if you're at a "well-known university in the US", then I'm sure your campus IT and Admin departments have some interesting setups. Try getting log data from the on-line registration system, during the register/add/drop period at the beginning of a semester - plenty of traffic and data moving around.
+0, Obvious?
Christ, what do they teach in schools these days?
I want to delete my account but Slashdot doesn't allow it.
we can give you access to one of our server's logfiles provided that we remain anonymous and do not tell you what the server does. we log each function and request with a granularity of +/- 1ms. the server also uses a reflecting proxy so all requests logged are from localhost but that shouldn't matter to your environment. the loads and relevant requests are all real world.
post your email so we can send it to you.
The way to make this happen is to offer something more than credit in return. If you are going to collect a lot of data on an application, you'll have the opportunity to give them some cool reports about where the bottlenecks are. Offer the reports in exchange for the data. Sure, they *could* make the reports themselves, but most places never do.
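A bottleneck report of that kind might start from something as simple as the sketch below, in Python. It assumes the data has been reduced to (tier, milliseconds) pairs; the tier names and timings are invented for illustration, and a real report would use percentiles over far more data.

```python
from collections import defaultdict
from statistics import mean


def tier_report(records):
    """Summarise time spent per tier and name the slowest tier on average."""
    by_tier = defaultdict(list)
    for tier, millis in records:
        by_tier[tier].append(millis)

    summary = {
        tier: {"count": len(vals), "mean_ms": mean(vals), "max_ms": max(vals)}
        for tier, vals in by_tier.items()
    }
    bottleneck = max(summary, key=lambda t: summary[t]["mean_ms"])
    return summary, bottleneck


# Hypothetical per-tier timings for two requests.
records = [
    ("web", 5), ("app", 40), ("db", 120),
    ("web", 7), ("app", 60), ("db", 80),
]
summary, bottleneck = tier_report(records)
```

Here `bottleneck` comes out as the tier with the highest mean time, which is the sort of headline figure a manager would actually read.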
If you can't get in on your own, convince some important vendor (e.g., IBM's Websphere group) that you're legit and can help them if they'll help you.
And if nobody bites on the "we'll give you reports for free" thing, then charge them $30k for your reports or $20k if they let you reuse the anonymized data in your research. You'll need a business listing and a sales guy with a nice suit to pull this off, but I promise that it will work, especially if you let the sales guy keep all the money.