AP CS Test Takers and Pass Rates Up, Half of Kids Don't Get Sparse Arrays At All
theodp writes: Each June, the College Board tweets out teasers of the fuller breakouts of its Advanced Placement (AP) test results, which aren't made available until the fall. So, here's a roundup of this year's AP Computer Science tweetstorm: 1. "Wow — massive gains in AP Computer Science participation (25% growth) AND scores this year; big increase in % of students earning 4s & 5s!" 2. "2015 AP Computer Science scores: 5: 24.4%; 4: 24.6%; 3: 15.3%; 2: 7.1%; 1: 28.6%." [3 or above is passing] 3."Count them: a whopping 66 AP Computer Science students out of 50,000 worldwide earned all 80 pts possible on this year's exam." 4. "Remember that AP exam standards are equated from year to year, so when scores go up, it's a direct indication of increased student mastery." 5. "Many AP Computer Science students did very well on Q1 (2D array processing–diverse array); >20% earned all 9/9 pts" [2015 AP CS A Free-Response Questions] 6. "The major gap in this year's AP Computer Sci classrooms seems to be array list processing; Q3 (sparse array): 47% of students got 0/9 pts."
Kids are passing the AP CS test with higher scores, but nearly 50% of them cannot understand concepts that involve a slight amount of thinking. In other words, it's a shitty test or most taking it are stupid.
Link for the Lazy
I had never heard of sparse array as a term, but we discussed the concept nearly two decades ago as a normal design process.
That is like the second easiest pointer-driven data structure...
Do you think they would've gotten it if the question had called it a database of values instead of a sparse array? There wasn't even a need to know what a sparse array is in advance, as the data structure is fully explained in the question. The actual task only requires a single loop over the linked list, comparing each element to the requested row and column number, returning the value if found and 0 if the element isn't in the list. The second half of question three is just a little bit more difficult, but the first half would have gotten them at least some points.
What this tells me is that most of the other questions were answered from rote memorization, but when a new concept is presented, they're stumped, even if it is almost trivial and everything is explained in detail. Pattern recognition.
Sparse arrays is a mathematical abstraction that completely ignores the implementation details. Formally, they are any matrix that has "many" zeros (or null) values. The practical problem is that most useful optimizations around sparse arrays require closely matching implementation details against the problem to be solved. With sparse arrays, implementation details are killers.
For instance, suppose the standard solution is adopted. The sparse array will be organized as an array of linked lists representing the rows, with each row containing another linked list that contains the individual data values. What happens if you want to do a matrix multiply? A matrix multiply requires a column by row lookup and a row by column lookup. One will be an O(1) lookup, and the other will be a O(n^2) lookup. This makes a full matrix multiply an O(n^5) operation, and memory is the least of everyone's worries.
To optimize the code, it is necessary to look closely at how the matrix will be built and used. However, as soon as that starts happening, the matrix multiply decomposes into a bunch of specialized matrix operations. At this point, the abstraction starts falling apart.
For example:
a) Assume the multiplication involves a diagonal matrix. Then the optimum solution is to store the diagonal matrix as a 1xn matrix, and specialize the matrix code. This was the favoured approach from numerical methods in C and Fortran.
b) Assume the multiplication involves a tridiagonal matrix. Then the optimum solution is to store the tridiagonal matrix as a 3xn matrix, and specialize the matrix code. Again, see numerical methods in C and Fortran, or just about any good matrix library.
c) Assume the matrix operation involves a "control-systems" style matrix. One populated row, followed by a diagonal series of rows with one or two elements. The optimum solution is to develop specialized code. For most control systems problems, this matrix never changes.
d) Control systems often have a compact matrix representation involving a series of matrix multiplies. However, if the matrix multiplies are analysed, they become a much simpler sequence of equations that can often be executed in O(n^2) time instead of the longer O(n^3) time of the matrix multiplies. As such, develop specialized code. Both MatLab and Mathematica have functions where numerical operations can be broken down into there constituent formulas and saved as "C" code.
e) Assume we really need to frequently multiply a truly sparse array. Then build two sets of linked lists, one organized by row/column and another organized by column/row. Then both the row and column lookups can be done as an O(1) operation. The matrix multiply is a O(n^3) operation.
f) Just because the inputs to a matrix operation are sparse, doesn't mean the output array is sparse. I'm thinking of Singular Value Decomposition, some matrix multiplies, matrix inverses, matrix pseudo-inverses, and covariance matrices. Also, some matrices that appear in Quantum physics. In this case, matrix operations need to be further specialized to deal with creating non-sparse matrices from sparse-matrices. Additionally, some matrices may need to be rounded to sparse, even though they may be fully populated, like some covariance matrices.
In the end, sparse matrices are simply a descriptive term for a bunch of application-specific optimizations. Sparse matrices devolve into numerical optimizations that no-one cares about unless they are looking at an application that requires the specific numerical optimization. I'm not surprised high-school CS coders don't "understand" them.