Slashdot Mirror


Ask Slashdot: What Does Your Data Mean To Google? (google.com)

shanen writes: Due to the recent kerfuffles, I decided to try again to see what Google had on me. This time I succeeded and failed, in contrast to the previous pure failures. Yes, I did find Google's takeout website and downloaded all of "my data," but no, it means nothing to me. Here are a few sub-questions I couldn't answer:

1. Much more data than I ever created, so where did the rest come from?
2. How does the data relate to the characteristic vector that Google uses to characterize me?
3. What tools do Googlers use to make sense of the data?

Lots more questions, but those are the ones that are most bugging me right now. Question 2 is probably the weightiest of them, since I've read that the vector has 700 dimensions... So do you have any answers? Or better questions? Or your own takeout experiences to share? Oh yeah, one more thing. Based on my own troubled experience with the download process, it is clear that Google doesn't really want us to download what is supposedly "our own" data. My Question 4 is now: "What is Google hiding about me from me?"

24 of 88 comments

  1. Et tu , Btute? by Camarillo+Brillo · · Score: 2

    My question is: who else is getting data about me from Google? Does Google sell it outright? I suppose that is their business model, but it would be nice to know how my metadata is distributed.

    1. Re:Et tu , Btute? by PolygamousRanchKid+ · · Score: 5, Informative

      Does Google sell it outright?

      The German postal service, Deutsche Post, was just caught selling data to political parties, which was used in election campaign targeting.

      Deutsche Post responded with the claim that they were not selling the data . . . merely "renting it out" . . .

      Mega giga lame.

      --
      Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
    2. Re:Et tu , Btute? by Dutch+Gun · · Score: 2

      Does Google sell it outright?

      As far as I understand it, Google sells access to your data in the form of targeted ads, not your data itself, because it's so incredibly valuable. And that access is more in the form of "I want to show ads to this demographic", so probably nothing that could personally identify anyone. In some ways, I suppose that's lucky for us, because they have a very big financial incentive to guard against leaking it.

      Then again, Facebook let all their data escape, so...

      --
      Irony: Agile development has too much inertia to be abandoned now.
    3. Re:Et tu , Btute? by shanen · · Score: 2

      If the google is selling it, I suspect they only sell aggregated forms. From my perspective as part of the product, I would actually like control over the degree of aggregation. I'm not too concerned if something about me is included as part of the average for all the google users within a state or even a large city, but I'd start getting concerned if they are selling parts of my data as parts of extremely small groups such as the people who live in my neighborhood or even the level of an apartment building.
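      The "degree of aggregation" control wished for above can be sketched in Python. This is a hypothetical illustration, not anything Google is known to offer: users are plain dicts with made-up `state`, `city`, and `neighborhood` fields and a numeric `metric`, and any group smaller than `min_group_size` is suppressed instead of reported.

```python
from collections import defaultdict

def aggregate(users, level, min_group_size):
    """Average `metric` per group at the chosen level (e.g. "state" or "city").

    Groups with fewer than `min_group_size` members are dropped, so no
    reported average describes just a handful of identifiable people.
    """
    groups = defaultdict(list)
    for user in users:
        groups[user[level]].append(user["metric"])
    return {
        key: sum(values) / len(values)
        for key, values in groups.items()
        if len(values) >= min_group_size
    }

# Hypothetical users; field names and values are illustrative only.
users = [
    {"state": "CA", "city": "SF", "neighborhood": "Mission", "metric": 3},
    {"state": "CA", "city": "SF", "neighborhood": "Mission", "metric": 5},
    {"state": "CA", "city": "SF", "neighborhood": "Sunset", "metric": 4},
    {"state": "CA", "city": "LA", "neighborhood": "Venice", "metric": 6},
]

print(aggregate(users, "state", 2))         # {'CA': 4.5}
print(aggregate(users, "city", 2))          # {'SF': 4.0}  (LA suppressed)
print(aggregate(users, "neighborhood", 2))  # {'Mission': 4.0}
```

      Choosing a coarser level or a larger minimum group size trades detail for privacy, which is the knob being asked for.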

      Perhaps your question could be inverted into the form: What kinds of data would cause the google to report your data to the local police?

      --
      Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
    4. Re:Et tu , Btute? by thegarbz · · Score: 2

      Does Google sell it outright?

      Does Coca-cola sell you its recipe?

      Your data is the only thing Google has from which to derive value. They sell access to *you* specifically, in a wide variety of ways, through many APIs targeting many delivery mechanisms. But the data is what gives them the market leverage they have.

      To me that makes the submission all the more stupid. It's kind of like saying:
      "Coca-Cola prints the ingredients list on the side of the bottle, but it doesn't taste like coke when I mix it together, does anyone know what recipe they use?"

    5. Re:Et tu , Btute? by Dutch+Gun · · Score: 2

      Even more seriously, if the data contains flaws and errors that reduce the value of the data when the google is trying to sell it, we can't correct those problems.

      Is that actually a problem for us, rather than Google? I mean, what they generally sell is targeted advertising. Why would you or I really care whether their data is correct? I don't really understand that aspect of your question.

      The data that credit-reporting companies have on us impacts our daily lives about 1000x more than what Google collects about me, because they draw conclusions about that data (a credit score) that have very definite real-world effects on me in the form of loan rates or even credit approval / denial.

      --
      Irony: Agile development has too much inertia to be abandoned now.
  2. Re:In exchange for by shanen · · Score: 3, Interesting

    Uh? What question are you trying to answer? And how does that question relate to any of the questions I posed? At first I thought you were trying to say something about derived data, but now I have no idea...

    However, one of the categories of data I was looking for was data about me from other sources. For example, in terms of marketing my data to the advertisers, such external data as my credit history would seem to be highly relevant. Perhaps I can find my credit report somewhere in there?

    In the original questions I left out one of the peculiarities I already discovered. A lot of "my" data that the google sent me was actually links to other places where I had posted things. In other cases the links seemed completely unrelated to me, as with a Google Play app to some game I don't believe I've ever downloaded or played.

    --
    Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
  3. Do you really think they'd tell you? by Rick+Schumann · · Score: 2

    Seriously, do you really think that with anything short of a court order or an order from Congress (or maybe a gun pointed at their heads) they're really going to show you how much actual data they have collected on you? When you signed up for their 'services' using your real name, you handed them the Keys to the Kingdom, regardless of any agreement (that you likely never read in the first place). The only way to win this game was to have not played in the first place.

    1. Re:Do you really think they'd tell you? by shanen · · Score: 2, Interesting

      Highly principled stand, and I congratulate you [HermMonster] for your energy and enthusiasm and even for your efforts, but I think you are deluding yourself. One reason is that by attempting to hide yourself you would actually be attracting attention to yourself. Quite possibly, you are even rendering yourself a marked man and the FBI is following you around trying to figure out what you are trying to hide.

      More seriously, some of the services cannot be used without leakage. Let me take an innocent example, the case of using a private browser window to evade a paywall. This is something I've started doing fairly routinely when using Google News. I've made the calculation that I'm willing to let the google use my identity to recommend stories that I'm interested in, but what am I actually hiding there? I'm willing to assume that the paywalled website is fooled into thinking that I've never been seen before (by the website), but am I actually fooling the google? I don't think so precisely because it is clearly in the google's interest to detect the link translation process.

      In concrete terms, regarding the data I just downloaded from the google, I suspect that there are browser histories in there, including information on direct and indirect links to other websites. The new derivative question is whether the google is reporting on this to the paywalled websites.

      So far this topic seems to be generating lots of new questions in my mind, and I haven't found many (any?) answers.

      --
      Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
    2. Re:Do you really think they'd tell you? by rot16 · · Score: 2

      Most of my e-mails come or go to @gmail.com, so running my own e-mail server is almost useless.

  4. Re: "What is Google hiding about me from me?" by shanen · · Score: 2

    Uh? Are you saying that they are hiding it by sending it to me? If so, then what I am seeking could be rephrased along those lines. Right now it looks like I have a gigantic pile of data that's even messier than my actual life, which is saying something.

    --
    Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
  5. That's an embedding vector by Visarga · · Score: 4, Insightful

    The 700-dimension vector (if that figure is true) is not something you can make sense of. It's an embedding vector that represents your characteristics in relation to all the other people. No individual dimension has a meaning of its own.
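    Why a single coordinate means nothing on its own can be shown with a toy Python sketch. It assumes, hypothetically, that profiles are plain lists of floats compared by cosine similarity; the three 4-dimensional vectors are invented. Applying the same rotation to every vector changes the individual coordinates but leaves every pairwise similarity unchanged, so only the relationships between people carry information.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rotate(v, i, j, theta):
    """Rotate v by angle theta in the plane of dimensions i and j.

    This is an orthogonal transform: it preserves lengths and angles.
    """
    out = list(v)
    c, s = math.cos(theta), math.sin(theta)
    out[i] = c * v[i] - s * v[j]
    out[j] = s * v[i] + c * v[j]
    return out

# Hypothetical 4-dimensional profiles (the rumored real ones have ~700).
people = {
    "alice": [0.9, 0.1, 0.3, 0.0],
    "bob":   [0.8, 0.2, 0.4, 0.1],
    "carol": [0.0, 0.9, 0.1, 0.7],
}

# Apply the same rotation to everyone.
rotated = {name: rotate(v, 0, 1, 0.7) for name, v in people.items()}

for a, b in [("alice", "bob"), ("alice", "carol")]:
    print(f"{a}-{b}: {cosine(people[a], people[b]):.4f} -> "
          f"{cosine(rotated[a], rotated[b]):.4f}")

print("alice, dimension 0:", people["alice"][0], "->", round(rotated["alice"][0], 4))
```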

    1. Re:That's an embedding vector by shanen · · Score: 4, Interesting

      I think I agree with you as far as you went, but in that case part of the information I am asking about is the context to interpret the shape of the categorization space and where I am within it. That is also in terms of the relationships to the parts of my data that contributed to my location and to the accuracy of that location. The google can reveal a lot about the space without exposing any of the individuals within it.

      Perhaps a more concrete example will help? For example, can the google look at the vectors of spouses to assess how well their marriages are liable to work? Just asking for a friend, since I'm pretty sure my wife would NOT let me look at her data. She'll barely tell me when breakfast is ready.

      --
      Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
    2. Re:That's an embedding vector by shanen · · Score: 3, Interesting

      Methinks you [Lanthanide] are projecting, but I will confess that I never did understand how my own parents stayed together. My condolences to your much better half. Or perhaps better to respond with some variation of the old grading joke: "I was one of the students who made the dean's list possible!"

      That was just minor tit for even more minor tat. The most appropriate response would probably be to ask "Don't you have anything to say on any aspect of the actual topic at hand?" If you know nothing and have nothing to say, then you can always say nothing.

      I actually did consider raising the issue of using personality characterization for marriage guidance and counseling. I would not be at all surprised to find out that some branch of the google is exploring related business opportunities. However my own interests these days are probably much more mundane. I'm just trying to figure out who's treading on my freedom.

      By the way, I don't think the google is the worst abuser of our personal information. In a sense, the google's motives are pure insofar as they are focused on the money. Almost every question about what the google is doing with our information comes back to the answer "... because they think it will increase their profits."

      --
      Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
    3. Re:That's an embedding vector by shanen · · Score: 2

      That's sophistic BS. If there is any projection there, it's that I would respect her privacy as much as I would hope she respects mine.

      As matters stand now, you sound like a child who was probably in diapers when I was wandering through my first flame wars. I knew flame warriors who actually enjoyed themselves, but I've always regarded ad hominem argumentation as a waste of time, albeit an apparently unavoidable one when hominems are involved.

      I didn't introduce the gaslighting topic, and I would even argue that I made a sincere attempt to redirect the conversation towards a more productive course. Of course I'd never waste the effort in your case, except perhaps by encouraging you to prove me wrong. Go ahead, make my day by writing something worth reading.

      --
      Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
  6. I got a ZIP file by mnemotronic · · Score: 2

    I used the provided link to "download all your data" and had it save a "takeout" ZIP file on my Google Drive. I then tried adding a few files to Drive, removing them, and then "really" removing them. In both cases a "removed" file (in the Trash but not "really" removed) did not appear in the Takeout archive. I then created a new Takeout archive and had it sent as an email to my Gmail account. In both cases it's everything from my Drive, calendar, all emails, contacts, bookmarks, photos, etc.

    In the expanded ZIP, under the root "Takeout" dir, there's an "index.html" with details on all the files. The 2nd archive I created even contained the first archive in its entirety, from the "Takeout" folder on my Drive.
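    A quick way to see what an archive holds, without opening index.html, is to count files per top-level folder. This Python sketch assumes the layout described above (a root Takeout/ folder holding index.html plus one folder per product); the folder names in the demo are hypothetical, and the demo builds a tiny archive in memory instead of reading a real download.

```python
import io
import zipfile
from collections import Counter

def summarize_takeout(zip_file):
    """Count files per product folder under Takeout/ in one archive.

    `zip_file` may be a path or a file-like object. Assumes the layout
    described above: Takeout/index.html plus one folder per product.
    """
    counts = Counter()
    has_index = False
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            if parts[0] != "Takeout" or name.endswith("/"):
                continue  # ignore other top-level entries and directory records
            if parts[1:] == ["index.html"]:
                has_index = True
            elif len(parts) > 2:
                counts[parts[1]] += 1
    return has_index, counts

# Demo: a tiny in-memory archive with hypothetical contents.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Takeout/index.html", "<html></html>")
    zf.writestr("Takeout/Mail/All mail.mbox", "...")
    zf.writestr("Takeout/Calendar/personal.ics", "...")
    zf.writestr("Takeout/Calendar/work.ics", "...")

has_index, counts = summarize_takeout(buf)
print(has_index, dict(counts))
```

    For a real download, pass the path of each takeout ZIP instead of the in-memory buffer.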

    Are you seeing something other than this?

    --
    The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.
    1. Re:I got a ZIP file by rot16 · · Score: 2

      This list is missing your tracked browsing history. For Android users there is GPS tracking history and call and SMS history.

    2. Re:I got a ZIP file by shanen · · Score: 2

      I'm still trying to consider the differences between what you received in one gigantic file versus the smaller pieces I received... I feel my earlier response was not helpful.

      Let me say that my original idea about the structure is definitely false. I speculated that the links in the index.html file would include relative references to the component files. That is NOT the case. I was even reduced to searching the google's documentation for such information.

      Now you have me speculating that the redundant files are all unique, even though the folder names appear many times. I was looking for something along these lines, but I think I have to describe it in terms of an algorithm:

      (1) Add files until the 2-GB limit is about to be breached.
      (2) Find smaller files from various directories until the 2-GB limit is exactly satisfied.
      (3) Start the next zip file and return to Step (1).
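      The hypothesized splitting procedure can be sketched in Python. This implements the guess above, not any documented Google behavior; file names and sizes (in hypothetical megabytes) are invented. Steps (1) and (2) collapse into one in-order pass that adds every file still fitting in the current part, a first-fit strategy. The backfill only gets as close to the limit as the available file sizes allow, rather than filling it exactly.

```python
def pack_into_parts(files, limit):
    """Split (name, size) pairs into parts whose sizes sum to at most `limit`.

    Sketch of the hypothesis above, not documented Google behavior:
      (1) take files in order while they fit in the current part,
      (2) keep scanning for later, smaller files that still fit,
      (3) close the part and start the next one with what is left.
    """
    for name, size in files:
        if size > limit:
            raise ValueError(f"{name} alone exceeds the part limit")
    remaining = list(files)
    parts = []
    while remaining:
        part, used, leftover = [], 0, []
        for name, size in remaining:
            if used + size <= limit:
                part.append(name)
                used += size
            else:
                leftover.append((name, size))
        parts.append(part)
        remaining = leftover
    return parts

# Hypothetical files with sizes in MB; the limit stands in for 2 GB.
LIMIT_MB = 2000
files = [("a", 1200), ("b", 900), ("c", 700), ("d", 300), ("e", 500)]
print(pack_into_parts(files, LIMIT_MB))  # [['a', 'c'], ['b', 'd', 'e']]
```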

      What happened at the end is still unclear, but I'm going to attempt to reconstruct a single takeout file on that theory, hopefully before this question has expired on Slashdot so I can share that part of the information. However, even if this approach works, I think it will only reduce to the interpretation problem for the Facebook data, which was basically similarly mysterious even though the amount of data was so much less.
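      One way to test the reconstruction idea is to extract every part into a single directory tree, so that the repeated folder names merge, and to report any path that shows up in more than one part. This Python sketch uses only the standard zipfile module; the part contents in the demo are hypothetical, and the merged tree can be re-zipped afterwards if a single file is wanted.

```python
import io
import tempfile
import zipfile
from pathlib import Path

def merge_parts(part_files, dest):
    """Extract several ZIP parts into one directory tree.

    Repeated folder names across parts simply merge. Returns any file
    path that appears in more than one part; the first copy is kept.
    """
    seen = set()
    duplicates = []
    for part in part_files:
        with zipfile.ZipFile(part) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                if info.filename in seen:
                    duplicates.append(info.filename)
                    continue
                seen.add(info.filename)
                zf.extract(info, dest)
    return duplicates

def make_part(entries):
    """Build an in-memory ZIP part for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf

# Hypothetical parts: the Mail folder is split across both.
part1 = make_part({"Takeout/Mail/a.mbox": "x", "Takeout/Drive/big.bin": "y"})
part2 = make_part({"Takeout/Mail/b.mbox": "z"})

with tempfile.TemporaryDirectory() as tmp:
    duplicates = merge_parts([part1, part2], tmp)
    merged = sorted(p.relative_to(tmp).as_posix()
                    for p in Path(tmp).rglob("*") if p.is_file())

print(merged)
print(duplicates)
```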

      --
      Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
    3. Re:I got a ZIP file by swillden · · Score: 3, Informative

      This list is missing your tracked browsing history. For Android users there is GPS tracking history and call and SMS history.

      If location history is turned on, it should be there in the download. Mine is.

      SMS messages are not uploaded to Google, unless you're using Hangouts for SMS (which you can't do anymore unless you're using Project Fi as your carrier). Many people wish SMS were backed up, so that it could be restored onto a new device. As it is, when you get a new phone your SMS history is lost unless you copy it across to the new device (which recent Android versions will automate for you).

      FWIW, Android P is enabling Android backups to be encrypted in a way that ensures that Google cannot read them. That will in turn enable more data (like SMS, I'd expect) to be backed up and restored since it won't raise privacy concerns.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  7. Re:Stalking by donaldm · · Score: 2

    I can't believe we have deteriorated so far as to let a corporation stalk us

    With Google Chrome you can turn many of their tracking features off although if you are feeling paranoid there are other web browsers you can use. It does get more difficult to control or stop information being sent to one or more interested parties if the operating system you are using is configured by default to do so and you can't blame Google Chrome for that.

    Like it or not, any site you visit with a web browser will log your information as metadata. Under normal circumstances, metadata is only used for debugging purposes unless a court order is presented to the appropriate managers (ah, the good old days); however, depending on the privacy policies of the company, that metadata can be sold to interested parties.

    It must be noted that most computers, even from the 1950s onward, logged metadata, which, as I have explained before, is extremely useful for debugging purposes. Under normal circumstances, metadata was only kept for a few days or months (depending on company policy); however, it appears metadata can be used for other purposes, and depending on which country you live in, there may be government policies in place that require retention of metadata for years.
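    The retention point can be sketched as a purge over log records: the policy is just a time window, whether it is a few days for debugging or years where a law mandates it. This Python sketch is hypothetical; the record fields, dates, and IP addresses (from the documentation ranges) are invented.

```python
from datetime import datetime, timedelta, timezone

def purge_old_metadata(records, retention, now=None):
    """Return only the log records younger than the retention window.

    `retention` is a policy input: days or months for debugging logs,
    possibly years where a government mandates longer retention.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - retention
    return [r for r in records if r["timestamp"] >= cutoff]

# Hypothetical access-log metadata (documentation-range IP addresses).
now = datetime(2018, 6, 1, tzinfo=timezone.utc)
logs = [
    {"ip": "192.0.2.1", "path": "/", "timestamp": now - timedelta(days=3)},
    {"ip": "192.0.2.7", "path": "/search", "timestamp": now - timedelta(days=45)},
    {"ip": "198.51.100.4", "path": "/login", "timestamp": now - timedelta(days=400)},
]

company_policy = purge_old_metadata(logs, timedelta(days=30), now)
mandated_policy = purge_old_metadata(logs, timedelta(days=2 * 365), now)
print(len(company_policy), len(mandated_policy))  # 1 3
```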

    BTW, I run Linux as my primary operating system and I have instant access to four web browsers: Google Chrome, Firefox, Konqueror and QupZilla. There are other browsers I could install (it takes about a minute or two), but I choose not to. No matter which browser I use, any site I visit will log my activity as metadata, even if I am using incognito settings. At least I don't have to worry that my operating system is sending data to interested parties.

    --
    There ain't no such thing as proprietary standards only proprietary formats. Standards are by definition open.
  8. Eh? by shanen · · Score: 2

    Sorry, Trax3001BBS, but I have to conclude that you are a terrible writer. Perhaps indifferent to communicating? If so, why write at all?

    I'm really trying to strain my imagination for some meaning in any of your comments. Perhaps your last comment is supposed to mean that you think I'm advocating on behalf of Facebook in some sense of its superiority to the google? If so, I would say that I basically have the same questions (and concerns) about the Facebook data, even though there was so much less of it. At least based on Facebook's claim to have three orders of magnitude less data about me...

    --
    Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
  9. Some answers from a X-Google engineer by Anonymous Coward · · Score: 2, Interesting

    2. Google doesn't have all that data unified. The takeout project is actually the most unified view of your data.
    3. Googlers in general don't have access to your data. Systems do, and use it in an automated fashion. There is break-glass access for some engineers for some types of troubleshooting, but this triggers alarms.
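    The access model described in point 3 can be illustrated with a toy check: automated systems read data routinely, while a human needs an explicit break-glass reason, and every break-glass read raises an alert. This Python sketch is hypothetical and is not Google's code; the principal names, the `ALERTS` list standing in for a real alerting system, and the data store are all invented.

```python
ALERTS = []  # stand-in for a real alerting/paging system (hypothetical)

def read_user_data(principal, user_id, store, break_glass_reason=None):
    """Toy access check modeled on the description above (not Google's code).

    Automated systems read data routinely. A human engineer gets access
    only through an explicit break-glass request, and every such request
    raises an alert for later review.
    """
    if principal["kind"] == "system":
        return store[user_id]
    if principal["kind"] == "engineer" and break_glass_reason:
        ALERTS.append(
            f"BREAK-GLASS: {principal['name']} read {user_id} ({break_glass_reason})"
        )
        return store[user_id]
    raise PermissionError(f"{principal['name']} has no routine access to user data")

# Hypothetical principals and data for the demo.
store = {"u123": {"email": "user@example.com"}}
pipeline = {"name": "ads-ranker", "kind": "system"}
oncall = {"name": "oncall-eng", "kind": "engineer"}

read_user_data(pipeline, "u123", store)          # routine, no alert
try:
    read_user_data(oncall, "u123", store)        # no reason given: denied
except PermissionError as err:
    print("denied:", err)
read_user_data(oncall, "u123", store, break_glass_reason="debugging a corrupted mailbox")
print(ALERTS)
```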

    In general, during my more than five years at Google, I realized it's a company I'll trust with my data for many years to come. The "Data Liberation Front", which ensures that data takeout is available, is huge. Also, GDPR in Europe ensures that data takeout will have to stay very easy for many years to come. Google was just years ahead of the law there.

  10. Raw data vs Derived data by mrwireless · · Score: 2

    The main thing to understand here is that there are two types of data:

    - Your raw data
    - Their 'derived data'

    This 'Derived data' (as the databroker industry calls it) is where the real value is. These algorithmically formed 'opinions' about you are the valuable distilled product they sell. In the USA this derived data doesn't belong to you. It's protected as a form of corporate free speech.

    In the EU this is a little different, as these 'opinions' are also considered personal data. The question is to what extent you get access to them. For example, the threshold for personal data is when a piece of data can be traced back to fewer than 11 people. So the trick here is to create opinions about small groups of which you are a part. For example: knowing that someone with cancer lives in one of three adjacent houses is not considered personal data.

  11. Re:Where were the browser histories? by swillden · · Score: 3, Insightful

    So far in my explorations of the data I haven't seen any browser history data, though I strongly suspect the google is collecting it

    Unless you have web history enabled (check the settings in myactivity.google.com), I'm quite certain Google is not storing your browser history. I think this is a distinct question from tracking your web browsing through Google Analytics, assuming you haven't opted out of that. In the latter case, Google gets information about the sites you visit from those sites and uses it to update your interest profile, but doesn't store the actual visit history.
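    The distinction between updating an interest profile and storing visit history can be sketched as a profile that keeps only per-category counts. This Python sketch assumes, hypothetically, a simple site-to-category lookup; the domains (under the reserved .test TLD) and categories are invented, and this is not Google's actual pipeline.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical site-to-interest mapping; real categorization is far richer.
SITE_CATEGORIES = {
    "example-recipes.test": "cooking",
    "example-cars.test": "autos",
}

class InterestProfile:
    """Keeps aggregate interest counts only, never the visited URLs."""

    def __init__(self):
        self.interests = Counter()

    def observe_visit(self, url):
        # Map the reporting site to a category, bump the count, and
        # discard the URL itself: no visit history is stored.
        category = SITE_CATEGORIES.get(urlsplit(url).hostname)
        if category is not None:
            self.interests[category] += 1

profile = InterestProfile()
for url in [
    "https://example-recipes.test/pasta",
    "https://example-recipes.test/soup",
    "https://example-cars.test/reviews/hatchback",
]:
    profile.observe_visit(url)

print(profile.interests.most_common())  # [('cooking', 2), ('autos', 1)]
```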

    Note that there is almost certainly data Google has about you which it cannot show you, because it can't be 100% certain that you are you. Data derived from logged-out interactions can be tentatively correlated with you, but since there's no way to be completely certain you're the same person, it would be a violation of the privacy of whoever actually had that logged-out interaction (which might be you) to show it to you. In the case of logged-in interactions, of course, it's reasonable to presume that anything done while logged into account A can be safely shown to account A.
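    The rule described above (show account A only what was done while logged into account A) can be sketched as a filter over events. This Python sketch is hypothetical: the `Event` fields, account IDs, and confidence value are invented to show that tentatively linked logged-out activity may exist but is withheld from the export.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    action: str
    account_id: Optional[str] = None        # set only for logged-in activity
    inferred_account: Optional[str] = None  # tentative link for logged-out activity
    confidence: float = 0.0                 # how sure the tentative link is

def exportable_events(events, account_id):
    """Events that can safely be shown to `account_id`.

    Only activity performed while logged into the account qualifies.
    Logged-out activity that was merely inferred to belong to the
    account is withheld, however high its confidence, because someone
    else may have produced it.
    """
    return [e for e in events if e.account_id == account_id]

# Hypothetical events for the demo.
events = [
    Event("search: hiking boots", account_id="acct-A"),
    Event("search: flights to Oslo", inferred_account="acct-A", confidence=0.93),
    Event("watch: cooking video", account_id="acct-A"),
]

shown = exportable_events(events, "acct-A")
withheld = [e for e in events if e.account_id is None and e.inferred_account == "acct-A"]
print([e.action for e in shown])     # ['search: hiking boots', 'watch: cooking video']
print([e.action for e in withheld])  # ['search: flights to Oslo']
```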

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.