Readers of my Facebook page may remember that I’ve been writing about model updating high frequency trading data on an Intilop at sub 100-ns speeds. One of the tricks to doing this …
Bakshy (Facebook’s Data Team) studied who influences whom on Facebook. One of the findings is that users propagated novel information from weak ties. In interviews, Facebook’s PR people are claiming that this indicates an end to the online echo chamber. This is highly unlikely.
It is entertaining how Google has copied Microsoft’s strategies.
alexainslie:
“My goal is for web apps to become compelling enough to force OS creators to hybridize their platforms. In other words, I’d like to see rightward movement in both the app and OS spectrums.” - Boris Smus
The health care savings and increased tax compliance are only two of the benefits from incorporating data from multiple databases into behavioral models.
This is a graph of the frequency of Tweets by character length (1 to 140 characters) from the 500 million Tweet sample used by Scott Golder and Vladimir Barash (sample extracted from Twitter in late 2009) in their academic papers. View this graph in comparison to a Twitter employee’s set of graphs that describe the typical length of a Tweet. Current Tweets reflect a number of changes (different text processing in the twitter-text API, wrapping of URLs, different application distributions, etc.). Note that the early peak seen in the current Twitter graphs is much less pronounced here. Also note that this graph shifts a little depending on how the Tweets are processed. More on that some other time.
Thanks to Scott Golder and Vladimir Barash for letting me use this data.
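For readers who want to build a similar graph from their own sample, here is a minimal sketch of the counting step. It is not the processing Golder and Barash used; the function name length_histogram and the choice to count Unicode code points with len() are my assumptions. Counting bytes or normalizing the text first would give slightly different counts, which is one way a graph like this can shift.

```python
from collections import Counter


def length_histogram(tweets, max_len=140):
    """Count tweets by character length, for lengths 1..max_len.

    Hypothetical helper: len() counts Unicode code points, which is only
    one of several possible definitions of "character length".
    Empty tweets and tweets longer than max_len are ignored.
    """
    counts = Counter(len(text) for text in tweets)
    # Index 0 of the result holds the count for length 1, and so on.
    return [counts.get(n, 0) for n in range(1, max_len + 1)]


if __name__ == "__main__":
    sample = ["hi", "yo", "hey", "a much longer tweet about nothing"]
    histogram = length_histogram(sample)
    for length, count in enumerate(histogram, start=1):
        if count:
            print(length, count)
```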
Business Intelligence (BI) professionals are beginning to connect with me on LinkedIn, Twitter, and Facebook. As I read their commentary (such as this), I realize how the market has split to support complex, real-time prediction tasks.
When I think about working on real-time prediction tasks, I’m worried about throughput. The easy part of the job is fitting a model. The difficult part of the job is implementing the model in a real-time prediction system. I’ve been working with a few friends on a toy example for explanatory purposes. The toy is an iPhone application that recognizes a wine label (from its picture) and then integrates with Cellartracker to allow insertion/deletion from Cellartracker’s inventory.
Mobile health care technology that fat traps you into a Fit4Life scenario?
It’s worth bookmarking this article on current research results in weight loss for future review. Our 2011 CHI research paper on the limits of persuasive and machine learning weight loss technology sees about 30 downloads a week from my personal site. It’s no wonder that it’s popular.
A recent post by mathbabe about the desirable traits of data scientists has met with a lot of backlash in the community. In the comments section of her blog, on my Facebook page, and in my Twitter DMs, I’m seeing notes from data scientists that tell me that corporate America doesn’t like data when it disagrees with corporate desires.
I’ve worked in corporate America and academia, and one thing is inevitably true: every decision is political. Many of the people that work with data seem to believe that their analysis is objective and this should shield them from politics. They need an injection of reality. First, because all data is situated and subjective. Second, because the use of that data is always situated. These are attributes of political problems, and scientists should not be surprised that when there is a fight over scarce resources there will be politics.
Let me add another item to the list of attributes that mathbabe finds useful for data scientists: a good data scientist is politically aware. They know which stakeholders have an interest and what motivates them. And, instead of tanking programs, they do everything they can to find a way to make them successful.
The University of Washington is making a big push for additional funding to support UW Computer Science and Engineering. Some people have suggested doubling the funding to increase the number of graduates per year by less than a thousand.
I’m skeptical, but mostly because I think integrating courses like Stanford’s online machine learning courses into someone’s job will result in better outcomes for students. Stanford’s course is much lighter than a typical undergraduate or graduate course on machine learning, but it may be more useful. It establishes a common set of code, cases, and vocabulary around machine learning and it is unobtrusive enough for a programmer working at a large company or a startup to do each week and still have a full time job. The “assignments” become integrating the lessons learned into the company’s business.
If we coupled this type of learning with a certification program, it would be more valuable for students and industry than taking a student out of the work force and asking them to spend a lot of time commuting to (or living at) an undergraduate or graduate program. The learning could be more integrated with their job. And avoiding the loss of income and the cost of education would be especially attractive to young people who have become over-burdened with debt from the modern American university system.
This won’t work for everyone, but for most tech workers, it should work much better than the current system.
Update: Several industry professionals have suggested that aspects of this note are incorrect. However, the academic literature generally supports every one of these “cliff’s notes”. The most common complaint from experienced industry veterans is a riff on “more data solves every learning problem”. While true in the abstract, in practice there are problems where even large amounts of data are too noisy if the features are not designed well. Designing features well is beyond the scope of this Cliff Note for students that are beginning to learn to implement machine learning in real settings.
If you own a Samsung 4G/LTE MiFi, you may, like me, be fed up with resetting the device because your MacBook or PC loses its ability to communicate with it. I think I’ve solved that problem.
To solve it, I do two things:
Since I’ve done this, the device no longer times out while I’m writing a long email and forces me to reset it manually.
My idea for butter poached meat using a Sous Vide Supreme was refined by the chefs at French Laundry when Patti and I ate dinner there last week. After some testing time, I can give you the method:
Note: I take no responsibility for your outcomes with this recipe.
This is the first draft of text that I need to write for both a technical (academic PhD) audience and lay people. The two versions will substantially diverge, but I’m going to use this draft to get feedback from both sides. If you’re reading it, it’s probably because I’ve solicited feedback from you.
What is Nassim Taleb talking about in his books, especially Fooled by Randomness?
Nassim was one of the first people to write that returns from stock and bond picking follow a random process, and to internalize that idea into a stock-picking regime. The title of Nassim’s first book is indicative of the format: it contains stories and reflections about people (academics, professionals, etc.) that are fooled into believing that they are doing something better than the returns that would be generated by a random process.
Daniel Kahneman’s new book, Thinking, Fast and Slow, also discusses this topic in detail. Kahneman’s Nobel Prize-winning work and early books, such as this, discuss ways that we fool ourselves into believing that we have a system or approach that is better than a random process would produce over time.
You’re Not Alone in Being Fooled
This concept is so difficult to grasp that graduate students, faculty at Ivy League institutions, and even Nobel Prize winners, such as Milton Friedman (insert link to the 1948 example), routinely make big, embarrassing mistakes about it. One of the reasons that you study as a graduate student is to train yourself to stop making these mistakes (it is still difficult, especially if your faculty don’t know and understand the problem).
Why do we believe that stock returns follow a process that approximates a random process?
If you examine 1500 stock pickers and compare their returns against indexed benchmarks, the returns generated by the pickers follow a normal distribution. This does not imply that all stock pickers have equal performance. It implies that some pickers will do better than others sometimes. Much better. We should expect to see cases where a stock picker beats the benchmark for 25 years in a row. Every random process has some probability that someone will do the equivalent of flipping heads 25 times in a row. And, when you get enough people doing something, you are bound to have this event occur.
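To put rough numbers on the coin-flipping argument, here is a small sketch. It assumes each picker independently beats the benchmark each year with probability 0.5 (a fair coin), which is a simplification of mine and not an empirical claim; the function name prob_at_least_one_streak is hypothetical. The chance that someone in the group has a streak rises with the number of pickers and falls quickly with the length of the streak.

```python
import math


def prob_at_least_one_streak(num_pickers, streak_years, p_beat=0.5):
    """Probability that at least one of num_pickers independent pickers
    beats the benchmark in every one of streak_years consecutive years.

    Assumes each year is an independent coin flip with success
    probability p_beat (an illustrative assumption).
    """
    p_single = p_beat ** streak_years  # one picker wins every year
    if p_single >= 1.0:
        return 1.0
    # 1 - (1 - p_single) ** num_pickers, computed stably for tiny p_single
    return -math.expm1(num_pickers * math.log1p(-p_single))


if __name__ == "__main__":
    for pickers in (1500, 1_000_000, 100_000_000):
        for years in (10, 25):
            p = prob_at_least_one_streak(pickers, years)
            print(f"{pickers:>11} pickers, {years} years: {p:.6f}")
```

Varying the two arguments shows how large the population has to be before a long streak stops being surprising.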
What is the implication for my investing (especially retail investing)?
First, it is extremely unlikely that your performance will exceed the mean over time unless you are engaged in arbitrage and you don’t realize it. One type of arbitrage (that is also illegal) is insider-trading. In this type of arbitrage, the two markets are the current market and the future market. You know the future market price (because of inside information) and you profit from capitalizing upon the known imbalance between the current market price and the future market price.
Second, high frequency traders and market makers are engaged in arbitrage, and they are always fighting against you. These systems know the future price because they are both examining data across a broader range of activities faster than you are and they are dedicated to this activity day in and day out such that they develop statistical histories that represent successful strategies. These two implications together imply that if you are not engaged in arbitrage, it is extremely difficult to beat random performance. (Remember, beating random performance does not imply that you won’t be able to beat the market average once, or twice, or even 20 times in a row! It implies that real returns will probably be distributed normally, at best. Before costs and fees. I can play slots for 5 minutes and win $20. If I don’t walk away, I have to expect that the house will eventually take the money back. Plus more.)
Third, if you are developing an arbitrage scheme, i.e. a plan for picking stocks or bonds better than a random process, I hope you have a risk management plan for handling what happens when the market assumptions that enable your arbitrage process disappear. This is what professional investors are supposed to be doing. They find an arbitrage process that works for some period of time (such as an irrational pricing of Italian bonds that allows profits), they make money from it, and then they need to be ready to move on when that opportunity disappears. When they are not ready, you get massive losses like we’ve seen with Long Term Capital Management, the housing crisis, etc.
If you’re sitting in a house in Ithaca, NY with no furniture or cooking supplies and you’re thinking, “I would really like to cook some Piggery Pork perfectly, but I don’t have the supplies to do it,” this post is for you.
I have a WeatherDirect.com weather station and sensors in my house. This system cost me about $50 and it measures the outside temperature of the house, the inside temperature, plus the temperature from a single waterproof probe. It displays these temperatures on the weather station and it uploads them to weatherdirect.com so that I can analyze them. People use these systems to track the temperature of their house, their refrigerator, their pool, their hot tub, etc. when they are out of town.
Now, if I want to cook a pork chop, I know that all I need is a Ziploc bag and a cooking vessel that can store water (hopefully without a large change in temperature). Since I took my fancy kitchen thermometers with me to Bellevue, I can use the WeatherDirect’s waterproof probe to approximate my more sophisticated gear to produce great meals that look like this.
The process is basically the same: keep the pork in water at about 135 degrees Fahrenheit for about an hour. The WeatherDirect probe will tell you the temperature of the water. Plus or minus 5 degrees doesn’t really matter.
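If you export the probe readings afterwards, a few lines of code can confirm that the water stayed in range. This is only a sketch: the (minute, temperature) input format and the function name held_in_band are my assumptions, not anything WeatherDirect provides. The 135 degree target, 5 degree tolerance, and one hour hold come from the description above.

```python
def held_in_band(readings, target_f=135.0, tolerance_f=5.0, minutes_required=60):
    """Return True if the readings contain an unbroken run of at least
    minutes_required minutes within target_f +/- tolerance_f.

    readings: list of (minute, temp_f) pairs sorted by minute
    (a hypothetical export format).
    """
    run_start = None
    for minute, temp_f in readings:
        if abs(temp_f - target_f) <= tolerance_f:
            if run_start is None:
                run_start = minute  # a new in-range run begins here
            if minute - run_start >= minutes_required:
                return True
        else:
            run_start = None  # out of range, so the run is broken
    return False


if __name__ == "__main__":
    steady = [(m, 134.0 + (m % 3)) for m in range(0, 65, 5)]
    print(held_in_band(steady))
```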
Cheers!