Saturday, March 28, 2009

Machine learning might be too easy, but so what?

John Langford argues that machine learning is too easy. He doesn't specify exactly what he means by this, but it seems to be that it's possible to publish papers and make a career in one area of machine learning without even understanding the core ideas of other areas.

Apparently, he thinks this is a problem. But why?

I could agree that it would be a problem if we were talking about science here. But we aren't. I've long since stopped pretending that I do science. (Except for the remote possibility that something I do might have an impact on a real science, such as biology or psychology.) We are just not studying the natural world.

I don't think of it as engineering either, as an engineer is meant to construct things that actually work and make economic sense. Most of what I do is pretty far from being useful or even reliable. Instead I think of myself as an inventor, practicing blue-sky invention of algorithms and toy applications without direct economic pressure. (Role model: Gyro Gearloose.)

So in a field of invention where people are inventing things following different paradigms and variations on a common theme of learning/optimization, is it a problem that most of the inventors have only a very hazy idea of what the others are doing? Not necessarily, as we are not all working towards the same goal (at least in the near term) and don't need to agree on anything.

Of course, it's great when you can combine knowledge from different research fields and come up with a nice synthesis - this is an almost surefire way to "be creative", and it's necessary that someone does it every once in a while. But for the most part, I don't feel like digesting hundreds of pages of dormitive formulas in order to understand e.g. statistical learning theory. I feel my time would be much better spent just getting on with my own inventions, and reading up on stuff that's directly relevant to them (or seemingly completely unrelated, in order to look for new applications).


Will Dwinnell said...

I didn't understand exactly why he thought that machine learning being "too easy" was bad, but I don't understand your assertion that machine learning is not economically valuable. I can only speak for my own work (right now in the credit card industry), but I assert that inferring things from data in industry is tremendously valuable.

Daniel said...

Hi Julian,

I interpreted the article as "machine learning makes solving lots of problems easy" which is easy. It's also true that the "science" can sometimes get lost.
Based on my experiences speaking with ML people (or should that be CI people?) I think that some number of researchers are hard pressed to explain why their algorithms behave the way they do or why a certain method (or weighting set) works better than another.

Is that a bad thing? I don't know. It might be if you're claiming to do science because you're not "mining all the gold" so to speak from a given research project. It's interesting that (again, in my experience) not all fields which apply CI/ML methods suffer from this problem. The Data Mining community for example seems to have a rather solid grasp on the advantages of "abc" algorithm vs. "xyz".

Daniel said...

hmm.. that first sentence above really should read:

I interpreted the article as "machine learning makes solving lots of problems easy" which is *true*.

Damn fingers!

Donny Viszneki said...

I believe that what John Langford is really criticizing is that neural networking as an academic pursuit suffers overall due to the emphasis on practical results at the expense of traditional research practices.

I think Julian is, incidentally, making assumptions about the domain which John is making observations about. This is of course nobody's fault but John's for not telling us.

In the long run, I believe John would be right to observe that this will lead to less-than-ideal performance in any industry or individual problem using ML. Of course from John's article, it's not very clear what he is observing, even though it's very clear what his supporting arguments will be ;)