Introduction
I love data. I like all types, but I'm biased: I have a special interest in my own personal data. In particular, I'm trying to assemble as much data as I can on my web usage habits and preferences, with the end goal of building the dataset needed to train a truly personalized, generalized AI assistant (overfit to me, as I like to say).
That said, as a data professional and hobbyist, I'm often saddened by how little access companies give us to our own data. Given this, I take steps to track and/or acquire this data myself. Some of this is fully legal; the rest I'm not so certain about. Let's just say it's a legal gray area. But hey, I like gray!
Nevertheless, this grayness is a big reason why a lot of my most interesting projects are not publicly available anywhere. So, in keeping with that trend, what follows is a fairly comprehensive list of the data I track (or acquire) on myself, but I won't get into much detail on how this is done or share any code. (One hint: a well-designed Chrome extension can be quite useful.) If you'd like to know more about any or all of these, ask me in person.
Data I Routinely Collect
- Browsing history
- Google searches / preferred results
- Spotify listening / saved songs (API)
- YouTube subscriptions / viewing history
- Netflix viewing history / ratings
- Reddit feed / what I read
- Hacker News feed / what I read
- ESPN feed / what I read
- Financial transactions / data (Mint)
- More to come?