As has been discussed here very recently, this is yet another case of poorly socialized nerds fantasizing that there is some kind of perfect "legal hack" that will instantaneously invalidate decades of case law. It ain't so.
It's instructive to pause and think a bit about why those defenses don't work. It largely comes down to the following: the legal system is designed to deal with unforeseen circumstances in a case-by-case manner as they arise, and adjudicate them according to some good general rules of thumb that have been passed down from previous cases that are analogous to the present one. This is very much unlike software, where the machine, confronted with a circumstance the programmer did not anticipate, will blindly and unreflectively apply the rules it's been given, in a completely literal fashion.
You'd be much better off to claim that you started the installation then left the room.
Um, no. In all of these cases, you've created a situation where you intend that an on-screen button will be pressed on your behalf. And, you've gone to extremely elaborate lengths to do so. The more elaborate and carefully reasoned the setup, the clearer it becomes that you are deliberately causing the button to be pressed.
And all of this for nothing. Suppose we granted your argument that you did not agree to the terms of the EULA, but rather that it was a random accident, causally unrelated to anything you did, that the "Accept" button got clicked. Well, in that case, you don't have permission to use that copy of the software. So if your argument was right, you'd have gone to extreme lengths to achieve exactly nothing.
Your point is very clear - but I could leave my laptop to a shop, a handy cousin or anyone really and they could install and agree to things without my consent. Not so clear now, I think.
No, that's perfectly clear too. Suppose your cousin installs Word on your laptop in that case. You did not intend to have that specific piece of software installed on your machine, nor did you take any action that you could reasonably expect to result in Word being installed on your machine. You have not agreed to anything, so you're not liable for anything. You're not legally entitled to use that copy of Word, though.
Yes, I should not lend my computer. I should, I should.
No, your computer is yours to lend as you like, and you can't in general be held responsible for what the borrower does. The exceptions start to come in when you should have reasonably suspected that the borrower was going to do something unlawful using your computer, and especially if you stood to benefit from such unlawful activities. So if you lent your computer to your cousin, thinking that he might install Word in it, in the hopes that you will end up with a copy of Word, that could well be different.
And all so entirely boring that people are happy to provide that information to you over a cup of tea.
The conversation over the cup of tea is not searchable or analyzable. More importantly, when such information is given over a cup of tea, it is given in context.
As vux984 is pointing out elsewhere in this thread, yes, each little bit of information involved in this problem is harmless on its own, especially when it remains in its original context. The problem is when many such pieces of information are aggregated together, and stripped of their context. The aggregation is a problem because it enables the inference of many other pieces of information that were not disclosed; the context is important because when information is stripped of its context, wrong inferences may be drawn from it.
If you're a higher risk you *should* get charged more, because if you're not getting charged more then *I* am getting charged more.
No, this is not in general true. Not all measurable risk factors are fair game for variable insurance rates. For example, an insurance company that explicitly used race as a pricing factor would find itself in trouble, no matter how strong of a correlation it could demonstrate between race and cost of claims.
More generally, it is unfair for insurance to be priced according to factors that you have no control over, especially if it's possible for you to move into a higher-risk group involuntarily. The best example of this is the practice of charging higher health insurance rates to sick people than to healthy ones. This in fact decreases the value of the insurance to all policyholders, because you have no control over which of these two groups you will be in tomorrow. You pay the insurer low fees while you're healthy so they will cover you if the time comes when you are sick, but then when that time comes you can no longer afford the coverage!
So, to take it back to the thread topic: data mining for insurance factors is problematic because it can lead to insurance companies pricing coverage on the basis of all sorts of risk factors that, despite being real, they shouldn't use.
The reason something like Facebook or Google is a problem is that ALL the information in the network is owned by one entity, linked together and tagged in ways that a bunch of independent websites and personal blogs never could be. Tons of data in aggregate, actively being linked together by the very users being monitored, is far more than the simple sum of its parts.
While I mostly agree with your comment, I don't think that the single owning entity is an important factor. In fact, Google is the perfect counterexample here; they are set up to index and analyze tons of data they don't own.
What I had in mind is companies like Google, working in the service of organizations like credit agencies, health insurance companies, PR companies, the government, company HR departments, or just identity thieves. Why are these services given away for free? Because the information they gather about you is valuable. Why is that information valuable? Because somebody expects to use it to get the better of you in the future.
The sister angle comes in because people will implicitly or explicitly give information about people other than themselves, that then will be used against people who did not disclose it about themselves.
Yes. The problem is the people who use technology to allow other people to destroy my privacy in new, much more brutally efficient ways, and they do it only because they really want to gather, aggregate and analyze as much information as possible about me, in order to use it against me.
When uploading pictures to Facebook, the uploader requires the copyright holder's permission. If they are pictures you took, then you could tell Facebook to take them down.
The case GP probably has in mind is pictures that other people took, that depict him.
Good luck discovering those pictures without a Facebook account!
Good luck sending copyright notices to every company that has a feature like this, over and over.
Good luck keeping your friends! (And I mean that in the old, real-life sense.)
How do you know it's not possible to tell the system to run the app with the new DLL?
I think that refusing to run applications that have been tinkered with is a reasonable security measure to protect against malware. I'd be upset if there was no way for power users to change stuff, yes, but I don't think we have the full story here yet.
I know I'm being pedantic, but it grates on me to hear the term "materialized view".
I think this kind of objection is silly; but, what's worse, in this particular case, the factual basis of the objection is wrong. I'll get to that.
There is no such thing, and "materialized view" is a contradiction in terms, since a view by definition is never fixed and changes as the data in the tables it references change. You would be better off referring to "materialized views" as snapshots. Because once you "materialize" something, it is definitely no longer a view.
Various databases support automatic refresh of the physical tables in question when changes are made to the base logical tables. In this case, you're materializing the view on disk, and changing it as the tables in question change.
Indexed views in SQL Server, for example, don't even allow the refresh to be deferred; an update to the base tables must refresh any indexed views that use that table. By your criterion, these are neither snapshots (because they change as the data in the base table change) nor views (because they are materialized).
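To make that concrete, here is a minimal sketch in Python using the standard sqlite3 module. SQLite has no built-in materialized or indexed views, so the sketch fakes one with triggers; the table names (orders, order_totals) and the helper functions are made up for illustration, and this is not a description of how SQL Server implements indexed views. The only point is that the stored table changes in step with the base table, as part of the same statement.

```python
import sqlite3

def build_db():
    """Create a base table plus a trigger-maintained 'materialized view' (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount INTEGER NOT NULL
    );
    -- Physical table holding the stored result of:
    --   SELECT customer, SUM(amount) FROM orders GROUP BY customer
    CREATE TABLE order_totals (
        customer TEXT PRIMARY KEY,
        total INTEGER NOT NULL
    );
    CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
        INSERT OR IGNORE INTO order_totals (customer, total) VALUES (NEW.customer, 0);
        UPDATE order_totals SET total = total + NEW.amount
            WHERE customer = NEW.customer;
    END;
    CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
        UPDATE order_totals SET total = total - OLD.amount
            WHERE customer = OLD.customer;
        -- Drop the stored row once the customer has no orders left.
        DELETE FROM order_totals
            WHERE customer = OLD.customer
              AND NOT EXISTS (SELECT 1 FROM orders WHERE orders.customer = OLD.customer);
    END;
    """)
    return conn

def view_result(conn):
    """Recompute the view from the base table every time (the 'logical' view)."""
    return dict(conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer"))

def materialized_result(conn):
    """Read the stored copy that the triggers keep up to date."""
    return dict(conn.execute("SELECT customer, total FROM order_totals"))
```

After any insert or delete on orders, view_result(conn) and materialized_result(conn) should agree, which is exactly the case the "snapshot" terminology doesn't cover.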
RDBMS's can never give you this kind of horizontal scalability, because they make a promise to you that you can transactionally modify any two bits of data anywhere in your database. Fulfilling this promise requires that either your whole database lives on a single machine, or that you use a distributed transaction protocol like 2PC (which totally kills performance).
This isn't as big a problem as you make it out to be. That just means that if you limit or eliminate that RDBMS ability, you can query the data much faster and distribute it over a bigger cluster.
You're still missing the point that the relational model is a logical data model. This is the single biggest misconception in all arguments against relational technology. If row-based RDBMSs oriented toward small transactional updates have intrinsic performance limitations, this doesn't defeat the relational model; it just means that it's the wrong implementation of the model for one type of application.
All the nice RDBMS features like transactions, joins, foreign keys, triggers, etc. can only (reasonably) work within a single physical database.
But other than joins, you're not even addressing the most fundamental relational feature, which is the separation of the logical and the physical data models.
A TOS which grants any entity full rights to your stuff, including to license it further, means pretty much just that: you forfeit any legal rights or recourses you might have had. If they want to use it for any purpose whatsoever, they can. You just gave them that right.
Not quite true. You agree to grant them, as copyright holder or licensee, a copyright license that does not limit their use of the photo. However, there are still other limits on how they may use the photo in question.
To go back to your example:
E.g., just because I saved your family photo on my hard drive, doesn't mean I can cut and paste your daughter's head into an ad for condoms, nor as an ad for Adult Friend Finder, nor on top of a porn-star's body and sell subscriptions to that site, nor pretty much anything else.
Yup, but the reason why you can't do any of that isn't just that you don't have copyright or a license. There are two reasons:
Even if you have copyright, if a photo contains the recognizable likeness of a person, you still need the permission of that person to make use of the image in certain ways--like promoting a brand of condoms or an adult site.
Permissions don't trump the law. If the law considers your use unlawful, it doesn't matter what permissions you have or from whom.
Facebook, IIRC, makes you represent that you have the right to grant them a license to any pictures that you upload, but when you get down to it, they have no means through their EULA of ensuring that they have the right to use the recognizable likeness of every person that appears in a photo on their site. That's quite simply because users can upload photos of other people, even people who are not FB members.
Copyright law requires a written transfer of ownership.
A copyright license is not a transfer of ownership. And there are well-established examples of copyright licenses being granted without any written papers or signatures. For example, free software.
This whole thing sounds like an urban legend to me, but in general you can't just publish a photo with somebody's recognizable likeness and label it in a way that may tend to smear that person's reputation, without the consent of the person depicted in the photo. So, even if you had copyright or a valid license on the photos of these ladies, if you published a book of those photos and labeled it "Dirty Facebook Skanks," you could get in trouble.
This page contains a useful non-lawyerly overview of the topic.
There's a reason relational databases took over the world of databases: They provide a good combination of flexibility and structure to efficiently represent data. Which is what databases are supposed to do.
And don't forget data integrity at the logical level, like protecting against the duplication of information (normalization), verifying that the data meets all kinds of semantic conditions (constraints), atomicity of complex reads and writes (transactions), etc.
It is very, very important to know that these "death of the RDBMS" articles seldom take notice of all of the many problems that RDBMSs address. They normally just completely ignore at least the data integrity problems, and focus on the query speed problem. Well, guess what, of course you can do better than a plain RDBMS at the query speed issue if you ignore all the other stuff. Unless the RDBMS implements some kind of super-fast read-only materialized view, in which case, well, then it's not clear whether any technique you have can't be duplicated by the RDBMS.
Yes, these newer simple key/value databases like BigTable and CouchDB are effectively a subset of RDBMS functionality, so of course the same thing can be implemented relationally by just not using features.
What worries me about these arguments, however, is that they're missing a point that's very similar to yours here: these high-performance key-value databases can be implemented as features in an RDBMS. Basically, if you have a technology that allows some limited type of database to be distributed across tons of nodes and to be queried really fast, well, that's a kind of limited-functionality materialized view with a special engine to access it. So put it in as a subsystem to the full RDBMS, and use your plain old full-featured relational engine as the system of record that solves the concurrent transactional update and data integrity problems, and have it also push out the deltas to the specialized store that supports the high-performance distributed querying.
Nobody is denying that there are many applications where you don't need all that the relational model provides, and that those applications can be made to perform faster by not providing certain features. What people repeatedly fail to understand is that this is not a refutation of the relational data model, because it is a logical and general data model that's capable of modeling the data in such applications, and does not dictate the implementation.
The name of the MapReduce framework comes from the functional programming operations "map" and "reduce." Map takes as its input a collection of data, and a function that transforms data elements into other elements; it outputs a collection where each element of the input collection has been replaced by the result of applying that function to it. Reduce takes a collection of elements, an initial value of the same type as the elements, and a two-place operation that is commutative and associative; it produces as its output the value that results from applying the operation to the initial value and each element of the collection in turn, accumulating the partial results.
Map and reduce are operations that can be trivially parallelized. To parallelize map, you divide the collection into subcollections (in any arbitrary manner), and map over each of them in parallel. To parallelize reduce, you divide the collection into subcollections, also arbitrarily, reduce each subcollection independently, then apply the reduction operation to the partial results. (That works because the reduction operation is commutative and associative, so neither the grouping nor the order of the partial results matters.)
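Here's a rough sketch of that divide-and-recombine idea in Python, using only the standard library. The function names (chunks, parallel_map, parallel_reduce) are my own, threads stand in for the separate machines a real framework would use, and the sketch assumes the initial value is an identity for the operation (0 for addition, 1 for multiplication), since it gets applied once per subcollection.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

def chunks(items, n):
    """Split a list into at most n contiguous sublists of roughly equal size."""
    size = max(1, -(-len(items) // n))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]

def parallel_map(fn, items, workers=4):
    """Map fn over each subcollection in its own worker, then concatenate in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(lambda chunk: [fn(x) for x in chunk], chunks(items, workers))
        return [y for chunk in mapped for y in chunk]

def parallel_reduce(op, items, identity, workers=4):
    """Reduce each subcollection independently, then reduce the partial results.

    Assumes `identity` is an identity element for `op`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda chunk: reduce(op, chunk, identity),
                                 chunks(items, workers)))
    return reduce(op, partials, identity)

squares = parallel_map(lambda x: x * x, list(range(10)))
total = parallel_reduce(operator.add, squares, 0)
```

Because the chunks here are contiguous and recombined in order, associativity alone would be enough; commutativity starts to matter once partial results can come back in any order.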
Well, guess what: this sort of technique is trivially applicable to relational database queries. A SQL query translates down to a combination of joins (the FROM clause), filters (the WHERE clause) and maps (the SELECT clause). Joins are trivially parallelizable; you give each execution unit a subset of the tuples of the driving relation. Filtering (the WHERE clause) is a kind of reduce operation. SELECT is a kind of map operation. This means that relational queries are not any less amenable to parallel execution than the stuff Google does.
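As a toy illustration of that decomposition (plain Python lists standing in for relations, nothing like a real query planner), the sketch below runs a join, a filter and a projection, and then runs the same query over slices of the driving relation. The data, the column names and both function names are invented for the example.

```python
EMPLOYEES = [
    {"name": "ann", "dept_id": 1, "salary": 60000},
    {"name": "bob", "dept_id": 2, "salary": 40000},
    {"name": "cy", "dept_id": 2, "salary": 70000},
    {"name": "dee", "dept_id": 1, "salary": 50000},
]
DEPARTMENTS = [{"id": 1, "name": "eng"}, {"id": 2, "name": "ops"}]

def run_query(employees, departments):
    """Roughly: SELECT e.name, d.name FROM employees e JOIN departments d
    ON e.dept_id = d.id WHERE e.salary > 50000"""
    # FROM / JOIN: pair each driving tuple with its matching tuples.
    joined = [(e, d) for e in employees for d in departments if e["dept_id"] == d["id"]]
    # WHERE: keep only the pairs that satisfy the predicate.
    filtered = [(e, d) for e, d in joined if e["salary"] > 50000]
    # SELECT: map each surviving pair to the output columns.
    return [(e["name"], d["name"]) for e, d in filtered]

def run_query_partitioned(employees, departments, parts=2):
    """Same query, but each slice of the driving relation is processed separately."""
    size = max(1, -(-len(employees) // parts))
    slices = [employees[i:i + size] for i in range(0, len(employees), size)]
    # Each call below is independent and could run on its own execution unit.
    results = [run_query(s, departments) for s in slices]
    return [row for part in results for row in part]
```

Both functions return the same rows for the sample data, which is the whole claim: splitting the driving relation doesn't change the answer, it only changes where the work happens.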
But the killer thing here is that MapReduce says absolutely nothing about the updates problem. This is one of the big features of RDBMSs: the ability to handle concurrent query and modification. It also says nothing about the data integrity problem, which is also one of the big RDBMS features.
So, when you get down to it, there is a good argument to be made that many applications could make use of database technologies that support much faster querying, at the expense of very little updating. But there's no convincing argument that that technology isn't best implemented in the context of an RDBMS.
There's been some research that has shown by certain ages a person's capability to learn an initial language drastically cuts off. Barney, and the rest of his never-change-facial-expressions-non-human-faced friends, deprive babies of the non-verbal cues normal human interaction produces.
Not that I know much about this topic, but I understand that the problem with what you're claiming here is that children don't learn language from TV humans with changing facial expressions, either. They learn it from actual real-life interaction with other people. So to the extent that you can blame TV for this, it's not easy to single out Barney as being particularly relevant.
I'd recommend Eve Clark's book, First Language Acquisition, if you want to check whether I'm remembering this right.
I guess if there is a man in the middle attack, malicious code could be put into the page anyway via a script tag. So there is no guarantee that the set of code that can run during your program's execution is exactly equivalent to your program's source code.
Yeah. That's a different sort of trust problem than what I was thinking about. That's the issue of code delivery--is the code that I think I'm giving the user the code that the user is actually getting?
GGP talks about running eval when you trust the source. My answer to that is to question whether you should trust the source in the first place. Good security design often involves splitting a system into parts that trust each other as little as possible, so that if one part of the system gets compromised, the damage can be contained there.
To get this sort of privilege separation working, the interface between the subparts of the system must be carefully designed to be as restrictive as possible. Using eval on a string presumed to be JSON data is basically the antithesis of that: the interface between the subparts of the system is an ad-hoc subset of a Turing-complete language that has access to lots of resources in the component that runs eval.
In that sense, just using eval against your own XMLHTTPRequests is no less safe than presenting a plain HTML page. As long as you validate the contents of your JSON objects on the server side before returning them back to the page.
The other problem here, and a deep one, is that correct validation of data is no simpler than correct parsing and semantic analysis of the data. If you're going to write a validator and you want it to be correct, you're going to need to implement the equivalent of a correct parser. Given that you'd already be writing a parser, it would then make no sense to use eval.
If you can't trust the source, you should parse it using a safe parser.
And if you always parse it with a safe parser, you remove the whole need to trust the source not to inject code into your input data.
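The thread is about JavaScript, but the same point can be sketched in Python, where json.loads plays the role of the safe parser (JavaScript's built-in counterpart is JSON.parse). The function name parse_message and the expected message shape (an object with a string "user" field) are assumptions made up for this example.

```python
import json

def parse_message(text):
    """Parse untrusted text as JSON data; never execute it.

    Raises ValueError if the text is not JSON or not the shape we expect.
    """
    # json.loads only builds data (dicts, lists, strings, numbers, booleans, None).
    # Anything that is not valid JSON raises json.JSONDecodeError, a ValueError subclass.
    data = json.loads(text)
    # Hypothetical shape check: an object with a string "user" field.
    if not isinstance(data, dict) or not isinstance(data.get("user"), str):
        raise ValueError("unexpected message shape")
    return data

good = parse_message('{"user": "alice", "karma": 5}')

try:
    # With eval this string would be executed as code; the parser just rejects it.
    parse_message("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

The nice property is that the worst a hostile payload can do here is fail to parse; there is no code path from the input string to the interpreter.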
I've spent many years programming in Lisp, even getting paid for it. The only legitimate use I know for procedures like eval is to generate and compile code at runtime for performance reasons, in software systems where the computations that will be done can't be known when the system itself is compiled. Paradigm example: a complex data analysis and reporting system where the users enter complex formulas to perform analysis on large data sets.
But guess what? The way those systems work is that the formula language provided to the user is strictly defined, with a real grammar; the user-entered formulas are parsed with a real parser that validates them against the grammar; the parser outputs an abstract syntax tree representation of the user-entered formulas; the abstract syntax tree is transformed into a Lisp expression that evaluates to a function that computes the value of the formula; and then that Lisp expression is passed to some eval-like procedure that outputs the corresponding compiled code. So the user input is four layers removed from eval, and extensively validated.
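A stripped-down sketch of that pipeline, in Python rather than Lisp, might look like the following. The grammar (numbers, whitelisted variable names, + - * / and parentheses), the tuple-based syntax tree and every function name are invented for illustration; a real reporting system would have a far richer formula language.

```python
import math
import re

TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|(.))")

def tokenize(text):
    """Layer 1: turn text into tokens, rejecting any character outside the grammar."""
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        number, name, other = match.groups()
        if number:
            value = float(number)
            if not math.isfinite(value):
                raise ValueError("number out of range")
            tokens.append(("num", value))
        elif name:
            tokens.append(("var", name))
        elif other in "+-*/()":
            tokens.append(("op", other))
        else:
            raise ValueError(f"unexpected character {other!r}")
        pos = match.end()
    return tokens

def parse(tokens, allowed_vars):
    """Layer 2: recursive-descent parser producing a syntax tree of nested tuples."""
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else ("end", None)
    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok
    def expr():
        node = term()
        while peek() in (("op", "+"), ("op", "-")):
            op = advance()[1]
            node = ("bin", op, node, term())
        return node
    def term():
        node = factor()
        while peek() in (("op", "*"), ("op", "/")):
            op = advance()[1]
            node = ("bin", op, node, factor())
        return node
    def factor():
        kind, value = advance()
        if kind == "num":
            return ("num", value)
        if kind == "var":
            if value not in allowed_vars:
                raise ValueError(f"unknown variable {value!r}")
            return ("var", value)
        if (kind, value) == ("op", "("):
            node = expr()
            if advance() != ("op", ")"):
                raise ValueError("expected ')'")
            return node
        raise ValueError("unexpected token")
    tree = expr()
    if peek()[0] != "end":
        raise ValueError("trailing input")
    return tree

def to_source(node):
    """Layer 3: generate source text from validated tree nodes only."""
    if node[0] == "num":
        return repr(node[1])
    if node[0] == "var":
        return f"row[{node[1]!r}]"
    _, op, left, right = node
    return f"({to_source(left)} {op} {to_source(right)})"

def compile_formula(text, allowed_vars):
    """Layer 4: only now hand generated (never user-written) code to eval."""
    source = "lambda row: " + to_source(parse(tokenize(text), allowed_vars))
    return eval(compile(source, "<formula>", "eval"), {"__builtins__": {}})

margin = compile_formula("(price - cost) * 2", {"price", "cost"})
```

Here margin is an ordinary Python function that can be applied to many rows, e.g. margin({"price": 10, "cost": 4}). Note that the string reaching eval is assembled entirely from the tree, so nothing the user typed is ever passed through verbatim.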
So basically, you need a really big justification to ever use eval or any other function that can potentially cause your program to execute arbitrary user input, and your system needs the equivalent of a parser and semantic analyzer in it anyway before it can be remotely safe to use eval.
The eval is only as insecure as the host it's coming from.
Translation: the eval is insecure.
If you don't trust the source of the script/objects, don't use that source.
Or, alternatively, process the data in such a way that you do not need to trust the source of the data. This means parsing and validating the data to ensure it conforms to a strictly defined format, but even if you don't get that far, there is one very simple thing you can do that makes a big difference: don't use eval.
And all http technologies are subject to man in the middle attacks, but you can always use httpS to help to mitigate that.
Yup. And you can also not use eval. If you never use eval, you basically guarantee that the set of code that can run during your program's execution is exactly equivalent to your program's source code.
Can somebody post a readable, reasoned summary of this submission?
Um, forget I said "summary." This would need to be longer than the original. Maybe "commentary" is the right word...
I think this kind of objection is silly; but, what's worse, in this particular case, the factual basis of the objection is wrong. I'll get to that.
Various databases support automatic refresh of the physical tables in question when changes are made to the base logical tables. In this case, you're materializing the view on disk, and changing it as the tables in question change.
Indexed views in SQL Server, for example, don't even allow the refresh to be deferred; an update to the base tables must refresh any indexed views that use that table. By your criterion, these are neither snapshots (because they change as the data in the base table change) nor views (because they are materialized).
This isn't as big of a problem as you make it to be. That just means that if you limit or eliminate that RDBMS ability, you can query the data much faster and distribute it over a bigger cluster.
You're still missing the point that the relational model is a logical data model. This is the one biggest misconception in all arguments against relational technology. If row-based RDBMSs oriented toward small transactional updates have intrinsic performance limitations, this doesn't defeat the relational model; it just means that it's the wrong implementation of the model for one type of application.
But other than joins, you're not even addressing the most fundamental relational feature, which is the separation of the logical and the physical data models.
Not quite true. You agree to grant them, as copyright holder or licensee, a copyright license that does not limit their use of the photo. However, there are still other limits on how they may use the photo in question.
To go back to your example:
Yup, but the reason why you can't do any of that isn't just that you don't have copyright or a license. There are two reasons:
Facebook, IIRC, makes you represent that you have the right to grant them a license to any pictures that you upload, but when you get down to it, they have no means through their EULA of ensuring that they have the right to use the recognizable likeness of every person that appears in a photo on their site. That's quite simply because users can upload photos of other people, even people who are not FB members.
A copyright license is not a transfer of ownership. And there are well-established examples of copyright licenses being granted without any written papers or signatures. For example, free software.
This whole thing sounds like an urban legend to me, but in general you can't just publish a photo with somebody's recognizable likeness and label it in a way that may tend to smear that person's reputation, without the consent of the person depicted in the photo. So, even if you had copyright or a valid license on the photos of these ladies, if you published a book of those photos and labeled it "Dirty Facebook Skanks," you could get in trouble.
This page contains a useful non-lawyerly overview of the topic.
And don't forget data integrity at the logical level, like protecting against the duplication of information (normalization), verifying that the data meets all kind of semantic conditions (constraints), atomicity of complex reads and writes (transactions), etc.
It is very, very important to know that these "death of the RDBMS" articles seldom take notice of all of the many problems that RDBMSs address. They normally just completely ignore at least the data integrity problems, and focus on the query speed problem. Well, guess what, of course you can do better than a plain RDBMS at the query speed issue if you ignore all the other stuff. Unless the RDBMS implements some kind of super-fast read-only materialized view, in which case, well, then it's not clear whether any technique you have can't be duplicated by the RDBMS.
What worries me about these arguments, however, is that they're missing a point that's very similar to yours here: these high-performance key-value databases can be implemented as features in an RDBMS. Basically, if you have a technology that allows some limited type of database to be distributed across tons of nodes and to be queried really fast, well, that's a kind of limited-functionality materialized view with a special engine to access it. So put it in as a subsystem to the full RDBMS, and use your plain old full-featured relational engine as the system of record that solves the concurrent transactional update and data integrity problems, and have it also push out the deltas to the specialized store that supports the the high-performance distributed querying.
Nobody is denying that there are many applications where you don't need all that the relational model provides, and that those applications can be made to perform faster by not providing certain features. What people repeatedly fail to understand is that this is not a refutation of the relational data model, because it is a logical and general data model that's capable of modeling the data in such applications, and does not dictate the implementation.
The name of the MapReduce framework comes from the functional programming operations "map" and "reduce." Map takes as its input a collection of data, and a function that transforms data elements into other elements; it outputs a collection where each element of the input collection has been replaced by the result of applying that function to it. Reduce takes a collection of elements, an initial value of the same type as the elements, and a two-place, commutative and associative operation; it produces as its output the value that results from applying the operation to the initial value and each element of the collection in turn, accumulating the partial results.
Map and reduce are operations that can be trivially parallelized. To parallelize map, you divide the collection into subcollections (in any arbitrary manner), and map over each of them in parallel. To parallelize reduce, you divide the collection into subcollections, also arbitrarily, reduce each subcollection independently, then apply the reduction operation to the partial results. (That works because the reduction operation is commutative and associative, so neither the grouping nor the order of the partial results matters.)
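Here's a small sketch of that splitting strategy in Python. It is not how Google's framework is implemented; the function names (chunks, parallel_map, parallel_reduce) are mine, and a thread pool stands in for a cluster of machines. One assumption worth calling out: because the initial value gets used once per subcollection, it has to be an identity for the operation (0 for addition, 1 for multiplication).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunks(items, n):
    """Split a list into at most n contiguous sublists."""
    size = max(1, -(-len(items) // n))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_map(fn, items, workers=4):
    """Map over each subcollection independently, then concatenate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(
            pool.map(lambda part: [fn(x) for x in part], chunks(items, workers))
        )
    return [y for part in parts for y in part]


def parallel_reduce(op, items, initial, workers=4):
    """Reduce each subcollection, then reduce the partial results.

    Assumes `initial` is an identity for `op`, since it is applied
    once per subcollection and once more when combining.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(
            pool.map(lambda part: reduce(op, part, initial), chunks(items, workers))
        )
    return reduce(op, partials, initial)
```

For example, parallel_reduce(operator.add, list(range(101)), 0) gives the same 5050 you'd get from a sequential reduce, however the list happens to be split.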
Well, guess what: this sort of technique is trivially applicable to relational database queries. A SQL query translates down to a combination of joins (the FROM clause), filters (the WHERE clause) and maps (the SELECT clause). Joins are trivially parallelizable; you give each execution unit a subset of the tuples of the driving relation. Filtering (the WHERE clause) is a kind of reduce operation. SELECT is a kind of map operation. This means that relational queries are not any less amenable to parallel execution than the stuff Google does.
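To illustrate, here's a toy sketch of a join/filter/select query run over partitions of the driving relation. The orders and customers data, the query itself and the function names are all invented for the example, and a real query planner does far more than this.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical relations. `orders` is the driving relation.
orders = [
    {"id": 1, "customer_id": 10, "total": 250},
    {"id": 2, "customer_id": 11, "total": 40},
    {"id": 3, "customer_id": 10, "total": 120},
    {"id": 4, "customer_id": 12, "total": 300},
]
customers = {10: {"name": "Ada"}, 11: {"name": "Bob"}, 12: {"name": "Cy"}}


def query(order_rows):
    """Roughly: SELECT c.name, o.total
                FROM orders o JOIN customers c ON o.customer_id = c.id
                WHERE o.total > 100"""
    joined = (
        (o, customers[o["customer_id"]])
        for o in order_rows
        if o["customer_id"] in customers
    )  # FROM / JOIN
    filtered = ((o, c) for o, c in joined if o["total"] > 100)  # WHERE
    return [(c["name"], o["total"]) for o, c in filtered]  # SELECT


def parallel_query(order_rows, workers=2):
    """Give each execution unit a subset of the driving relation's tuples."""
    parts = [order_rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(query, parts))
    return [row for part in results for row in part]
```

The partitioned run returns the same set of rows as the single-unit run, just possibly in a different order, which is all a relational query promises anyway unless you add ORDER BY.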
But the killer thing here is that MapReduce says absolutely nothing about the updates problem. This is one of the big features of RDBMSs: the ability to handle concurrent query and modification. It also says nothing about the data integrity problem, which is also one of the big RDBMS features.
So, when you get down to it, there is a good argument to be made that many applications could make use of database technologies that support much faster querying, at the expense of supporting only very infrequent updates. But there's no convincing argument that that technology isn't best implemented in the context of an RDBMS.
Not in the everyday sense, nor, evidently, in the legal sense.
Why do you insist that every word have one and the same sense in every single context? That's not even true within science itself.
Not that I know much about this topic, but I understand that the problem with what you're claiming here is that children don't learn language from TV humans with changing facial expressions, either. They learn it from actual real-life interaction with other people. So to the extent that you can blame TV for this, it's not easy to single out Barney as being particularly relevant.
I'd recommend Eve Clark's book, First Language Acquisition, if you want to check whether I'm remembering this right.
Yeah. That's a different sort of trust problem than what I was thinking about. That's the issue of code delivery--is the code that I think I'm giving the user the code that the user is actually getting?
GGP talks about running eval when you trust the source. My answer to that is to question whether you should trust the source in the first place. Good security design often involves splitting a system into parts that trust each other as little as possible, so that if one part of the system gets compromised, the damage can be contained there.
To get this sort of privilege separation working, the interface between the subparts of the system must be carefully designed to be as restrictive as possible. Using eval on a string presumed to be JSON data is basically the antithesis of that: the interface between the subparts of the system is an ad-hoc subset of a Turing-complete language that has access to lots of resources in the component that runs eval.
The other problem here, and a deep one, is that correct validation of data is no simpler than correct parsing and semantic analysis of the data. If you're going to write a validator and you want it to be correct, you're going to need to implement the equivalent of a correct parser. Given that you'd already be writing a parser, it would then make no sense to use eval.
You must be unfamiliar with /b/.
And if you always parse it with a safe parser, you remove the whole need to trust the source not to inject code into your input data.
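A quick illustration in Python, whose eval and json.loads are close analogues of the JavaScript situation under discussion. The two input strings and the side_effects list are made up for the demo: the hostile string is a valid expression, so eval runs it, while the JSON parser simply refuses it.

```python
import json

side_effects = []  # lets us observe whether injected code actually ran

good = '{"user": "alice", "roles": ["admin"]}'
# Not JSON, but a perfectly valid expression for eval.
hostile = 'side_effects.append("code ran") or {"user": "mallory"}'

# eval happily executes the embedded call before returning "data".
evaluated = eval(hostile)

# A real parser only accepts the data format and rejects everything else.
try:
    json.loads(hostile)
    rejected = False
except json.JSONDecodeError:
    rejected = True

parsed = json.loads(good)
```

After this runs, side_effects contains "code ran" because of the eval call alone; the parser never executed anything, whoever supplied the string.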
I've spent many years programming in Lisp, even getting paid for it. The only legitimate use I know for procedures like eval is to generate and compile code at runtime for performance reasons, in software systems where the computations that will be done can't be known when the system itself is compiled. Paradigm example: a complex data analysis and reporting system where the users enter complex formulas to perform analysis on large data sets.
But guess what? The way those systems work is that the formula language provided to the user is strictly defined, with a real grammar; the user-entered formulas are parsed with a real parser that validates them against the grammar; the parser outputs an abstract syntax tree representation of the user-entered formulas; the abstract syntax tree is transformed into a Lisp expression that evaluates to a function that computes the value of the formula; and then that Lisp expression is passed to some eval-like procedure that outputs the corresponding compiled code. So the user input is four layers removed from eval, and extensively validated.
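Here's a scaled-down sketch of that pipeline in Python rather than Lisp, using the standard ast module as the "real parser." Everything about the formula language here is an assumption for the example: arithmetic on numbers and a fixed set of variable names, nothing else. Note that what reaches the eval-like step is the validated syntax tree, not the user's raw string.

```python
import ast

# Hypothetical formula language: + - * / and unary minus only.
ALLOWED_BINOPS = (ast.Add, ast.Sub, ast.Mult, ast.Div)


def validate(node, names):
    """Reject any syntax tree node outside the tiny formula grammar."""
    if isinstance(node, ast.Expression):
        validate(node.body, names)
    elif isinstance(node, ast.BinOp) and isinstance(node.op, ALLOWED_BINOPS):
        validate(node.left, names)
        validate(node.right, names)
    elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        validate(node.operand, names)
    elif isinstance(node, ast.Constant) and type(node.value) in (int, float):
        pass
    elif isinstance(node, ast.Name) and node.id in names:
        pass
    else:
        raise ValueError(f"not allowed in a formula: {type(node).__name__}")


def compile_formula(text, variables):
    """Parse, validate, compile; return a function of the named variables."""
    tree = ast.parse(text, mode="eval")       # parser (raises SyntaxError)
    validate(tree, frozenset(variables))      # grammar/semantic check
    code = compile(tree, "<formula>", "eval") # compiled from the checked tree

    def formula(**values):
        # No builtins are exposed; only the supplied variables are visible.
        return eval(code, {"__builtins__": {}}, dict(values))

    return formula
```

So compile_formula("(revenue - cost) / revenue", ["revenue", "cost"]) gives back a callable, while something like compile_formula("__import__('os')", []) fails in validate with a ValueError before anything is compiled.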
So basically, you need a really big justification to ever use eval or any other function that can potentially cause your program to execute arbitrary user input, and your system needs the equivalent of a parser and semantic analyzer in it anyway before it can be remotely safe to use eval.
Translation: the eval is insecure.
Or, alternatively, process the data in such a way that you do not need to trust the source of the data. This means parsing and validating the data to ensure it conforms to a strictly defined format, but even if you don't get that far, there is one very simple thing you can do that makes a big difference: don't use eval.
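For what "parse and validate against a strictly defined format" can look like, here's a minimal Python sketch. The format (an object with exactly a "user" string and an "age" integer) and the parse_profile name are invented for the example; a real system would have its own schema.

```python
import json


def parse_profile(text):
    """Parse untrusted text and accept only the exact expected shape.

    Raises ValueError for anything else (json.JSONDecodeError is a
    subclass of ValueError, so malformed input is covered too).
    """
    data = json.loads(text)
    if not isinstance(data, dict) or set(data) != {"user", "age"}:
        raise ValueError("expected exactly the fields 'user' and 'age'")
    if not isinstance(data["user"], str) or not data["user"]:
        raise ValueError("'user' must be a non-empty string")
    # type(...) is int rather than isinstance, so JSON true/false is rejected.
    if type(data["age"]) is not int or not 0 <= data["age"] <= 150:
        raise ValueError("'age' must be an integer between 0 and 150")
    return data
```

Nothing in here depends on who sent the text: well-formed input comes back as a plain dict, and everything else raises ValueError.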
Yup. And you can also not use eval. If you never use eval, you basically guarantee that the set of code that can run during your program's execution is exactly equivalent to your program's source code.