February 2012
1 post
Google's 2004 Investor Letter Describing the... →
Posting for easy access during a talk.
January 2012
8 posts
Python Unicode Encoding Bugs
It’s a tricky business to work with applications that pass around Unicode strings as ASCII byte strings. Small bugs can lead to lost data or data that is processed differently by different programs. Here’s an example.
One program outputs the following Unicode string representation of a Tweet:
C:\\Documents and Settings\\u30e6\u30fc\u30b6
If you process this string...
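The excerpt cuts off before saying how the string is processed, so the following is only a minimal sketch of one way such a string can lose data. It assumes a hypothetical consumer that decodes the ASCII bytes as Python-style escape sequences; the names `raw` and `decoded` are illustrative and not from the original post.

```python
# The string exactly as shown above; a raw literal keeps every backslash.
raw = r"C:\\Documents and Settings\\u30e6\u30fc\u30b6"

# Assumed (hypothetical) consumer: interpret the ASCII bytes as escape sequences.
decoded = raw.encode("ascii").decode("unicode_escape")

# The doubled backslash before "u30e6" decodes to a single literal backslash,
# so "u30e6" stays as plain text while the other two escapes become characters.
print(decoded)
```

Under that assumption the first Japanese character is never decoded: the result ends in a literal `\u30e6` followed by the two characters that were decoded correctly.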
Model updating with terabytes of data in real-time
Readers of my Facebook page may remember that I’ve been writing about updating models on high-frequency trading data on an Intilop at sub-100-ns speeds. One of the tricks to doing this … is that *all models must be updated in a single pass and the data must be thrown away*. Throwing data away sounds like heresy, but it is not. On my Facebook page, I described how to take the mean...
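The Facebook description of the mean is not reproduced here, so the following is only a minimal sketch of a standard single-pass (incremental) mean, not necessarily the method the post goes on to describe. The class name `RunningMean` and the sample values are assumptions for illustration.

```python
class RunningMean:
    """Keeps a mean up to date without storing any past observations."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        # Fold one new observation into the mean; x can be discarded afterwards.
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean


# Hypothetical stream of observations, seen once each.
rm = RunningMean()
for value in [2.0, 4.0, 6.0, 8.0]:
    rm.update(value)

print(rm.count, rm.mean)
```

Only two numbers are kept as state, which is why each observation can be dropped as soon as it has been folded in.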
Does the Bakshy Study Demonstrate An End to the...
Bakshy (Facebook’s Data Team) studied who influences whom on Facebook. One of the findings is that users propagated novel information from weak ties. In interviews, Facebook’s PR people are claiming that this indicates an end to the online echo chamber. This is highly unlikely.
Across my friends and family, both strong and weak ties, one of the clear ramifications of Facebook sharing...
Hybrid Operating Systems →
It is entertaining how Google has copied Microsoft’s strategies. alexainslie:
“My goal is for web apps to become compelling enough to force OS creators to hybridize their platforms. In other words, I’d like to see rightward movement in both the app and OS spectrums.” - Boris Smus
Michigan's Big Data Success →
The health care savings and increased tax compliance are only two of the benefits from incorporating data from multiple databases into behavioral models.
Business Intelligence versus Data Science
Business Intelligence (BI) professionals are beginning to connect with me on LinkedIn, Twitter, and Facebook. As I read their commentary (such as this), I realize how the market has split to support complex, real-time prediction tasks.
When I think about working on real-time prediction tasks, I’m worried about throughput. The easy part of the job is fitting a model. The difficult part of...
Mobile Healthcare Systems Are Only As Good As the... →
Mobile health care technology that fat traps you into a Fit4Life scenario?
December 2011
3 posts
The Fat Trap Article
It’s worth bookmarking this article on current research results in weight loss for future review. Our 2011 CHI research paper on the limits of persuasive and machine learning weight loss technology sees about 30 downloads a week from my personal site. It’s no wonder that it’s popular.
Data Scientists Disliked in Corporate America?
A recent post by mathbabe about the desirable traits of data scientists has met with a lot of backlash in the community. In the comments section of her blog, on my Facebook page, and in my Twitter DMs, I’m seeing notes from data scientists that tell me that corporate America doesn’t like data when it disagrees with corporate desires.
I’ve worked in corporate America and...
Why I like Stanford's Online Machine Learning...
The University of Washington is making a big push for additional funding to support UW Computer Science and Engineering. Some people have suggested doubling the funding to increase the number of graduates per year by less than a thousand.
I’m skeptical, but mostly because I think integrating courses like Stanford’s online machine learning courses into someone’s job will result...
November 2011
4 posts
Managing Bias - Variance Tradeoff in Machine...
Update: Several industry professionals have suggested that aspects of this note are incorrect. However, the academic literature generally supports every one of these “cliff’s notes”. The most common complaint from experienced industry veterans is a riff on “more data solves every learning problem”. While true in the abstract, in practice, there are problems where even...
Improving Performance of the Samsung 4G/LTE MiFi
If you own a Samsung 4G/LTE MiFi, you may, like me, be fed up with resetting the device because your MacBook or PC loses its ability to communicate with it. I think I’ve solved that problem.
To solve it, I do two things:
I set the WiFi type to b/g/n
I disable power saving mode
Since I’ve done this, the device no longer times out while I’m writing a long email and...
Understanding Why Stock Returns Look Like a Random...
This is the first draft of text that I need to write for both a technical (academic PhD) audience and lay people. The two versions will substantially diverge, but I’m going to use this draft to get feedback from both sides. If you’re reading it, it’s probably because I’ve solicited feedback from you.
What is Nassim Taleb talking about in his books, especially Fooled by...
October 2011
6 posts
Cooking with WeatherDirect.com Sensors
If you’re sitting in a house in Ithaca, NY with no furniture or cooking supplies and you’re thinking, “I would really like to cook some Piggery Pork perfectly, but I don’t have the supplies to do it,” this post is for you.
I have a WeatherDirect.com weather station and sensors in my house. This system cost me about $50 and it measures the outside temperature of the...
CMU Study Shows Nothing, Press Comments on It →
This article caught my eye because it talked about the power of defaults, which I believe in. It didn’t wander into the weeds until it talked about research at CMU which had researchers measuring whether users could protect their browsing privacy. Except the measurement method is invalid. The researchers created a definition to stop online behavioral tracking that isn’t effective and...
How did I know that Oil prices would drop in 2008
One of the questions that smart people ask me when they learn that I won a 2008 Forbes investing contest with only a few trades is: “How did you know that oil prices would decline in July 2008?” The answer is that I did not know, but the fundamental data seemed to indicate an improbable growth in productivity, and I bet against the veracity of the data.
In plain terms, my bet...
I Like Wrightsock →
I’ve been wearing Wrightsock Double Layer Coolmesh socks for at least 10 years, and I had a great customer service experience with them, so I thought I would relay it. Recently, the company changed from Coolmesh to Coolmesh II, which has a different mix of fibers. The original Coolmesh were prone to developing holes in the external layer of the sock due to wear. While the holes didn’t...
Machine Learning Prototyping in Octave
Octave is a useful, free resource and can sometimes beat MATLAB for prototyping machine learning solutions to estimate their effectiveness. It’s good to see Andrew Ng pushing it for his open Stanford machine learning course.
The official documentation is here. Some useful tutorials on Octave include 1 and 2.
Bank of America Charges Me $36 for Switching...
On Saturday, I received mail from Bank of America notifying me that I was overdrawn on a checking account and if I failed to take immediate action they would destroy my credit rating and make it impossible for me to open a bank account at another financial institution for five years. Since I always enjoy my interactions with Bank of America, I visited my new home bank in Kirkland, WA this...
September 2011
10 posts
I only care about the Netflix / Qwikster Customer...
My mail / news reader produced a collection and summary of 40 different articles about Netflix’s imminent implosion. I don’t care about whether Netflix will Blockbuster itself sooner rather than later. But I do care about the customer experience.
Patti is the main Netflix user in our household. I rarely, if ever, use the web site or watch the movies. And we rarely go to the theater...
When are people going to learn? →
Web tracking is pervasive and technically more sophisticated than cookies. “Research” in the scientific community is only 13 years behind previously published industry knowledge.
A link for programmers that has some relation to... →
I need a resting place for this link.
Links to Climate Monitoring Systems
I’m in the market for a climate monitoring system. Rather than leave all of these tabs open in my browser, I’ll copy the URLs here.
http://avtech.com/Products/Temperature_Monitors/TemPageR_3E.htm
http://www.ambientweather.com/latx60uset.html
http://www.protectedhome.com/documents/TX60UIT%20Manual.pdf
...
Surprise! Facebook privacy →
Facebook doesn’t care about the Like buttons you don’t click. They care about receiving an HTTP GET request at their server for every web page you view. That request lets them collect the IP address (and potentially the Facebook cookie info), build a profile of each user’s browsing habits, and resell that data to industry or build products around it.
alexainslie:
“If...
New Performance Metrics for Social Security?
Social Security is a life annuity or regular-payment annuity in the United States. It is an earned benefit from years of substantial contributions. Your estimated return on your payments to Social Security is just under the initial payment value, adjusted for inflation (using inflation estimated by the CPI).
As a life annuity, Social Security performs comparably with many annuity products offered...
Some Videos for Former and New CS 5150 Students
I usually get the impression that most of what I was trying to teach CS 5150 students about product definition was ignored. Since Prof. Arms worked at NeXT, he may get a kick out of these old videos, which show Steve Jobs selling the beginning of the product vision for NeXT to internal employees.
Part I.
And Part II.
A little less Greyhound at the airport →
Blueberry Lavender Ice Cream
The trick to making blueberry ice cream is to understand whether the blueberries are in the perfect state of freshness. If they are, then they can be folded into the ice cream base after it spins, assuring a fresher taste. If they are not in the special place of freshness, then cooking the blueberries into a syrup and mixing it with the base yields a better result.
To identify whether your fresh...
Online Ticker Search Intensity Predicts Abnormal... →
A student is working on replication of this paper with me for our discussion on Gold pricing. Wall Street geeks may find it interesting to read before the next post.
August 2011
14 posts
Dissertation Correction That I'm Memorializing
Like many students (and even future Nobel Prize Winners such as Milton Friedman), I sometimes misspeak about the definition of a confidence interval. I’m leaving this as a placeholder to refer back to when making formalization corrections.
Searching for Economic Models that Work
Bill Gross (at Pimco) is having a bad year in bond returns because he bet against the Treasury. When Mr. Gross made public his bet against the Treasury, I thought he was nuts. Perhaps this was because I was only recently (in the last 5 years) trained on the IS-LM economic model that is working to predict bond price movements with higher accuracy than Pimco’s models.
This is yet another...
Sous vide scallops in 122 degree water bath
Before I forget what I did, I want to document the sous vide scallops recipe from last week:
Dump a pound of raw wild scallops into a bowl. Season with salt and pepper. Place them in a vacuum seal bag interspersed with 2 tablespoons of coarsely chopped butter. Seal. Submerge the bag in a water bath held at 122 degrees F for one hour or more. Remove scallops and brown one or more sides in a hot...
Special Report on High Frequency Trading (HFT)
I did a quick scan through the special report on HFT’s impact on the market. Here’s the money quote:
James Mackintosh, investment editor of the Financial Times, remarks that fundamental information is no longer reflected in stock pricing (see Mackintosh 2010). He suggests that pricing is now driven by market sentiment and possibly by the increase in trading on trends and patterns.
...
Facebook and Twitter lead Web browser tracking
I just looked through the network traces of my home for a few days. Twitter, Facebook, and a few other companies track the vast majority of web pages that the household visits. Google is trying to get in on what little they don’t already have of this market with +1.
Ten years ago it was a big deal because one company could track 20% of web views. Now it’s the new normal.
Gold analysis part 3, can Blogs or Twitter predict...
Journalists and scientists are extremely excited about methods for predicting the future based on the collective mood from Twitter, Blogs, or message boards. Should they be? Rebecca Greenfield in the Atlantic Wire recently reported that the Twitter hedge fund beat the market and other hedge funds. One such method was self-published on arxiv.org and is supposedly a basis for the hedge fund. Another...
Gold analysis part 2, a question for the social...
For an industry awash in data, the empirical financial modeler faces an interesting problem — the quality of the data is always suspect. For instance, say you want to rank, by country, the percentage (of weighted cost) of non-performing real estate debt. Based on the published statistics, the ranking would be China (lowest), followed by Germany, followed by the U.S. The pure mathematical...
Why can't my newsreader exclude any story that...
You would think we would have this capability in 2011.
Investigating Gold's Pricing Factors →
An open question in finance is whether gold pricing is inversely correlated with stock market performance and, therefore, serves as a hedge against asset distress. Some people in finance believe that gold is excessively bid up by wild buying by speculators while others believe it is undervalued. The linked paper discusses a theory and an empirical model for gold pricing. The question that...
What am I supposed to do with the 2011 CHI Award?
This year, CHI began sending certificates of accomplishment to authors of papers that receive awards of distinction for their writing. Now that we have received them, what exactly are we supposed to do with them?
The Continued Propagation of the Fake Microsoft...
It doesn’t surprise me that many people continue to propagate the fake IE user study that claims that IE users have lower IQs than Chrome, Safari, and Firefox users. What surprises me is how many academics did it.
Even well respected academics did not read the original study web site or the explanation about the study given to the press. Had they read them, they should have been able to...
Actually, you would run a business that way
When talk turns to politics, many of the people that I meet tell me that the Federal government spends too much money. “You would never run a business that way,” they say. I’ve held my tongue in these chats. But there is not a business person I know who wouldn’t accept huge amounts of capital at interest rates between -0.2% and 3.68%. That’s free money.
The Overreliance Myth vs. Incentive Systems
Sarita Yardi tweeted earlier today about Sinan Aral’s Tweet about a blog post from Jason Gots about the dangers of exclusive reliance on mathematical models to explain human behavior. Gots’s post is interesting to me only because it reinforces the myth that Wall Street relies on these models and this reliance was, in large part, causal for the 2008 financial crash.
The overreliance...
July 2011
11 posts
Lupa's Pork Confit is Good, but This Photographer...
I’m testing the +1 button from Google. If you can see the +1 button, you can plus one if you think this photograph on JP’s blog of Lupa’s pork confit is really, really bad.
Unable to Make Mac OSX 10.7 Lion Work, Apple...
After failing to find a way to make Mac OSX 10.7 Lion work on my 6-month-old, Late 2010 MacBook Air (MBA), Apple refunded the cost of OS X Lion to me today.
As I’ve noted in previous blog posts, Lion causes my MBA to run at extreme temperatures under minimal operating conditions, and the OS can take 10 seconds to scroll up a single item in a list.
Mac Mail in OSX Lion takes 10 seconds to render a...
My Zero Hedge RSS folder has 12876 articles in it. Snow Leopard renders a new article (when the cursor moves in the inbox pane) in less than a second on my top-of-the-line Late 2010 MBA. It renders in 10 seconds on the newly upgraded Mac Mail in OSX Lion (after conversations have been disabled).
OS X Lion Upgrade Remedy - Disable Spotlight and...
After wasting time with AppleCare support (who appear to have no idea how OSX actually works and how to troubleshoot it), I finally solved my continued OSX churning by disabling Spotlight indexing on all volumes, completing all configuration changes, shutting down all applications, reenabling Spotlight indexing, rebuilding the Spotlight indices and then letting the computer sit overnight to index...