I have very recently started making some progress in my self-taught Machine Learning journey. But to be honest, it wouldn't have been possible at all without the amazing community online and the great people who have helped me.
In this series of blog posts, I talk with people who have really inspired me and whom I look up to as role models. The motivation behind doing this is that you might see some patterns and, hopefully, be able to learn from the amazing people that I have had the chance to learn from.

**Sanyam Bhutani:** Hello Grandmaster, thank you for taking the time to do this.

**Vladimir I. Iglovikov:** Thank you for your questions. It is a pleasure. I will try to be as detailed as possible, even if fewer readers are able to survive until the end. :)
If I had read a similar text four years ago, at the beginning of my Data Science career, I believe my life would have been a bit easier. So it will be well worth all the writing if it helps even just one person.
**Sanyam Bhutani:** You've done your Ph.D. and hold a Master's in Physics. How did Machine Learning come into the picture? Could you tell the readers about your Machine Learning journey?
**Vladimir I. Iglovikov:** About four years ago, I was finishing my Ph.D. program at UC Davis. There were a few months left before graduation, and it was time to think about my next steps. All my graduating colleagues were either going for:
**Sanyam Bhutani:** What made you pick up Kaggle and keep competing until you eventually became a Grandmaster?
**Vladimir I. Iglovikov:** Becoming a Kaggle Master is not that hard. My firm belief is that every decent ML engineer or researcher can, and should, be able to get one gold and two silver medals in Kaggle competitions. And we have plenty of examples of this being the case. It will take some evenings and weekends, but if you have a solid background in Machine Learning from self-education, academic, or industrial experience, it should not be that challenging.
Becoming a Grandmaster is a different story. There are over two million people registered at Kaggle, but only 150 of them are Grandmasters. Kaggle is a second, unpaid full-time job. You need a really good reason to burn your free time for months or even years on this; there are other, much more important activities in life, like traveling, sports, social life, etc. Probably, my main reason for getting to the Grandmaster level was that I enjoyed working on all the problems, and I was able to do it well :) Right now I have many exciting and challenging problems at work that can be addressed with Deep Learning. This makes me less excited about competitions, although I still do a few submissions at Kaggle from time to time. At the same time, life is not limited to work, and in addition to traveling, sports, and social activities, in my free time I am working on open-source projects. Here I would like to mention an image augmentation library that a few fellow Kaggle Masters and I are developing in our free time.
Another activity that takes time I could otherwise dedicate to Kaggle competitions is writing pre-prints, papers, and blog posts. About a year ago, just for fun, a collaborator and I published a pre-print called TernausNet. There was nothing extraordinary in that work, just UNet with a VGG11 encoder that I had used in one of our competition solutions. The model is just another variation of UNet, and I would not even say it is the most successful one. Right now, only one year later, a pre-trained encoder is already standard, and people use much more exciting variations with hypercolumns, squeeze-and-excitation modules, and other nice tricks. As I said, we did it mostly for fun, but it is now my most cited work. It outperforms the things I was doing in academia for many years. When I was presenting posters for other work at CVPR 2018, people came up to me, thanked me for TernausNet and for sharing it, and told me how it had helped them. I knew that an enormous amount of knowledge rots in the Kaggle forums without ever reaching the outside world and being helpful to ML practitioners; this story with TernausNet made me feel the scale of the problem. Last year my collaborators and I invested a lot of time in publishing papers based on competitions. It helped share our knowledge, the feedback from the community was positive, we got some citations for our publications, and I am planning to increase my writing activity even more this year.
I really hope that more and more Kagglers will invest in writing, whether blog posts, papers, or something else. More information flowing between people is better. Writing a blog post or tech report will help you structure your thoughts on a problem, it will give other people a warm start on similar challenges, and, most likely, it will help you with career opportunities, which means working on more exciting problems while being better paid.
**Sanyam Bhutani:** Today, you're working as a Computer Vision Engineer at Lyft. Do you feel the Kaggle competitions are related to your work? How do you find the time to Kaggle?
**Vladimir I. Iglovikov:** When I was interviewing at Lyft, I wanted a position that involved applying Deep Learning to Computer Vision tasks. My major in university was Theoretical Physics, which, in my biased opinion, is superior to Computer Science, but the representatives of the Human Resources (HR) department had the opposite view. I did not have any Computer Vision related publications on my resume, and my work before Lyft was about traditional Machine Learning, not imagery data. The only thing related to what I wanted to do was my top results in Computer Vision competitions.
Typically, HR skips the parts of your resume about results in competitive machine learning and looks for a set of standard resume-filtering keywords. This time a miracle happened, and my Kaggle results attracted attention. It helped me get into the interviewing pipeline. With the experience of developing and deploying Machine Learning models at my previous companies, in combination with the knowledge obtained at Kaggle, the interviews were relatively straightforward. Hence, Kaggle experience and achievements helped me get a job that aligned with what I wanted to do. At the same time, it is not obvious that Kaggle competitions will be useful for work. Moreover, I would not say that all Kaggle competitions are helpful for every project you face at work. The knowledge you get at Kaggle is a powerful beast; it brings value only if you know how to tame it. I use knowledge obtained from Kaggle competitions at work often.

**Sanyam Bhutani:** You've rocked quite a few competitions. Which one has been your favorite?
**Vladimir I. Iglovikov:** There are two competitions that I would like to mention: one that is my favorite, and another that was the most entertaining.
My favorite is probably the one whose rules said: "Everyone can participate, but only the citizens and residents of a particular set of countries can get prize money." Even though I am a resident of the United States and pay taxes here, I was not eligible, purely because of the color of my passport.
The way these countries were chosen is also interesting. The organizers took some shady corruption rating of countries from five years before the competition and put a threshold at some level. Why that level was chosen is also unclear. I like to think that the cut was selected randomly, but my friends suspected a conspiracy in which Britain cut off the countries it considered second class, i.e., Russia and China. I hope that is not what happened, but the cut was just above those countries. Anyway, I knew about this rule, and I still participated. New knowledge and new pipelines in my private repo are important; prizes are also valuable, but secondary. In this competition, I also had the opportunity to learn from a mentor, a Deep Learning expert who is not very publicly known, who provided guidance during and after the competition. I did a few iterations, checked many ideas, implemented a strong object detection pipeline, and finished second. Per the rules, I did not deserve the prize, but the fact that the rule exists does not mean that I like it. Anyway, I wrote posts on Facebook and Twitter saying I was not a big fan of these rules. The story is spicy and relatively toxic: "Artificial Intelligence," "Competition Winner," "Poor Russian citizen," "cruel British Defence Lab," "accusation of Russia as a corrupt country," and all the other fancy words one can exploit in a publication. Different media outlets barraged me asking for an interview, but all of it was so new to me that I bailed. That did not stop the journalists, so instead of talking to me they showed my profile pictures from Facebook and GitHub and invited some strange experts to share their opinions on the topic. Today I would behave differently: I am now more comfortable speaking on camera, and I still believe that the information flow in science and engineering should be maximized.
If we want to travel to Mars or extend our life span soon, we need to be more open to the information flow in the scientific and engineering communities. A big tech company, Mail.Ru Group, agreed to pay me the prize money, which was $15k, but I did not feel that I deserved it.

The most hilarious part was when the journalists approached my parents. My mother and father said that although they were an essential part of my childhood, it would be unwise to underestimate the influence of my schooling, and that my school teachers were much better candidates for an interview. And it worked :) My school teachers also appeared on TV.

**Sanyam Bhutani:** What kind of challenges do you look for today? How do you decide if a competition is worth your time?
**Vladimir I. Iglovikov:** As I mentioned above, it works best when the challenge is highly correlated with the day-to-day tasks I am facing. So this is the first type of problem I look for.
Next, I look for problems that are interesting but further outside my comfort zone. For instance, GANs are a bit more foreign to me than I would prefer, so I would definitely participate in the next Kaggle competition on that topic. The last criterion is the simplest: prize money. There were three competitions at Kaggle in 2017 with $1,000,000+ prize pools. I am relatively well paid, so money is not the most essential part of my motivation, but I am open to discussing advisory board positions at early-stage startups and interesting full-time positions that would add +30% to my total cash compensation. So I will likely give the next challenge with a good prize a shot.

**Sanyam Bhutani:** How do you approach a new competition? What are your go-to techniques?
**Vladimir I. Iglovikov:** For every ML problem that I face, the first thing I try to do is map the data into predictions and the predictions into a validation score.
It may be something that I implement from scratch, which works well if it is a problem I am familiar with. But if the problem is new to me, I just copy-paste from a Kaggle forum or `git clone` a relevant repo. If I understand what is happening in that absorbed code, good. If I do not have the slightest clue, that is still more than fine: in the first stage, I need a reliable baseline, even if I do not fully understand all the details of how the pipeline works. In general, I divide every Kaggle competition into two main stages. As Mike Tyson said, "Anyone who has ever been in a fight knows this is true — you can plan all you want, but when punches start to get thrown, plans quickly fly out the window."
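The idea of first wiring data to predictions to a validation score can be sketched in a few lines. This is a toy illustration, not code from any actual competition: the constant-mean "model", the RMSE metric, and the random split are all illustrative stand-ins whose only purpose is to give an end-to-end score to iterate against.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: the stand-in validation metric."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def baseline_pipeline(y, val_fraction=0.2, seed=0):
    """Minimal data -> prediction -> validation-score pipeline.

    The 'model' is a constant mean predictor. The point is not accuracy
    but having a reliable end-to-end number to compare ideas against.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_val = int(len(y) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]

    prediction = np.full(n_val, y[train_idx].mean())  # constant baseline
    return rmse(y[val_idx], prediction)

# toy usage with a synthetic regression target
y = np.linspace(0.0, 1.0, 100)
score = baseline_pipeline(y)
print(f"baseline validation RMSE: {score:.3f}")
```

Once a number like this exists, every borrowed forum snippet or cloned repo can be swapped in as a better `prediction` step and judged on the same score.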
In Kaggle competitions, you will overestimate your skills and underestimate the talent of the people you compete with. State-of-the-art approaches that you presented in a PowerPoint tech review at work, and that were warmly welcomed by your coworkers, will suddenly perform much worse when actually implemented in Python and verified on the leaderboard. Drivers, libraries, model and data versioning (here I would like to mention a tool that I use for model versioning), the quality of the data: these and countless other small but annoying things will attack you. By default, you do not know where the most prominent issues will be. Thus the pipeline that moves me to the adult stage of the competition is the first thing I build. Once the pipeline is done, you will be able to iterate fast. I would like to remind you that the number of ideas you try is often proportional to your standing on the leaderboard, so to maximize your score, you need to maximize the pace of your iterations.

**Sanyam Bhutani:** For the readers and noobs like me who want to become better Kagglers, what would be your best advice?
**Vladimir I. Iglovikov:** For those who have not yet tried a Kaggle competition but are willing to invest the time to boost their data science skills, I would say: start as soon as possible. If you know the basics of Python and have no experience with ML, that is good enough to start. Just do it, and it is better to do it now than later. Every day you spend thinking about participating without doing it is knowledge you do not get.
**Sanyam Bhutani:** Your Carvana competition win involved some HUGE computing resources, which indeed matched the needs of the competition.
Do you feel that, in general, a noob could do well in a competition with starter hardware?

**Vladimir I. Iglovikov:** I would say that having GPUs is helpful, and it is true that our team had around 20 relatively powerful GPUs. Hardware helps with iterations; some heavy Deep Learning models require both hardware and time, and pretty often you can exchange hardware for time, which leads to faster iterations and, as a result, more knowledge per unit of time and better results.
If you can afford a machine with 2 x 1080Ti, go for it. If your manager at work may allow you to use some of the company's computational resources for competitions, talk to her/him about it. Typically, managers are pretty open to you spending your free time and company compute on your self-development, especially if it improves your productivity and brings novel ideas in-house. At the same time, I am not sure that a participant with four GPUs has a higher chance of success than a person with only two, at least for the problems I have seen so far at Kaggle. I will give you a couple of examples. One Kaggle Master, who was part of a winning team and near the top of plenty of other competitions, started without any GPUs; he used free credits from Google Cloud. Now he has four Titan V CEO Edition cards that he won as a prize, and I am pretty sure he likes them more than using the cloud. Another interview described how a Kaggle Master got his first gold medal in a Computer Vision / Deep Learning competition without having GPUs: he got a strong result with CPUs at the beginning of the competition, and many people with GPUs were happy to merge into a team with him. I would say that there are plenty of people who want to participate but do not have GPUs, and plenty of those who have GPUs but are less motivated. Show some results, and people with hardware will appear.
**Sanyam Bhutani:** You're an active member of the ods.ai community. Can you tell us more about ODS? We can see many teams from ODS at the top of the leaderboards; are noob Kagglers welcome in the community as well?
**Vladimir I. Iglovikov:** It is more than this. If you look at my profile, there is a line about me being an evangelist for ods.ai as a community. I believe the community is excellent, and it boosted my ML skills a lot. It took me a couple of years to get my first gold medal at Kaggle when I was competing on my own; after I joined ods.ai, the other top finishes came much faster :)
ods.ai is a meritocratic Russian-speaking Data Science community. At the moment it has about 30k members. Not all of them are active, but a significant number of people are there pretty regularly and have Data Science as part of their lives, whether for research, competitions, education, pet projects, business initiatives, or anything else. Everyone, except for representatives of Human Resources departments, is welcome. Some people are making their first steps in Data Science, some work more on the research side, and others, like me, make money building products with Data Science techniques. Basically, everyone who is excited about Data Science and is willing to learn it in an efficient, but probably hard, way is welcome. The most significant part of the community lives in the former Soviet Union, but there are plenty of people from the US, China, Europe, Canada, and India. As long as you are excited about Data Science, it is an excellent place to be: the right place to ask a question, and the right place to give an answer. There are a few initiatives that were born in ods.ai.

**Sanyam Bhutani:** Given the explosive growth rate of ML, how do you stay up to date with state-of-the-art techniques?
**Vladimir I. Iglovikov:** I do not, and I am not even trying. The field is too broad. When I have a problem to solve, I dig deep into the latest achievements in that particular area. For topics where I do not have hands-on experience, my knowledge may be minimal. I try to work on a few diverse problems at the same time, a couple at work, a competition, and a pet project, which gives some breadth of view. I hope that I am experienced enough to pick a new topic, say voice recognition, and become proficient quickly.
Reading the latest papers on all topics from morning till evening is a pretty bad idea; life is too short and valuable for that. I read papers daily, but only for the problems I am facing. I also analyze the winning solutions of different competitions; as we discussed above, they are an invaluable source of information if you can parse them correctly. Attending conferences like CVPR and ICCV also works really well: dense and exhausting, but a very productive way to catch up with the field in general.

**Sanyam Bhutani:** What progress are you really excited about in Computer Vision?
**Vladimir I. Iglovikov:** In terms of papers, my latest favorite trick is using siamese networks for image classification and tracking, where you do not learn class values but class embeddings.
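To illustrate the embedding idea, here is a toy numpy sketch of the classification step only, assuming a trained siamese encoder has already produced the embedding vectors (the two-dimensional "embeddings" and class names below are made up for illustration). A query is assigned to the class whose support embedding it is most similar to, rather than through a learned softmax over class labels.

```python
import numpy as np

def normalize(v):
    """Scale vector(s) to unit length so the dot product is cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def nearest_prototype(query, prototypes):
    """Classify a query embedding by cosine similarity to class prototypes.

    prototypes: dict mapping class name -> embedding of one support example
    (one example per class, in the spirit of one-shot learning).
    """
    q = normalize(query)
    sims = {name: float(q @ normalize(p)) for name, p in prototypes.items()}
    return max(sims, key=sims.get)  # class with the highest similarity

# toy embeddings; a real siamese encoder would map images to such vectors
protos = {"cat": np.array([1.0, 0.1]), "dog": np.array([0.1, 1.0])}
print(nearest_prototype(np.array([0.9, 0.2]), protos))  # -> cat
```

Adding a new class then only requires one new support embedding, with no retraining of a classification head, which is part of why the approach works with little data.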
I really like the concept, and I also like how it allows training with a small amount of data. My guess is that to move in the direction of general artificial intelligence, robots, self-driving cars as a product, and other exciting sci-fi areas, we probably need to figure out how to use existing data more efficiently. In this sense, I like siamese networks and the overall work the community is doing on one-shot learning.

**Sanyam Bhutani:** Do you feel ML as a field will live up to the hype?
**Vladimir I. Iglovikov:** Honestly, I do not know. Let's rephrase the question.
There are plenty of people who go to college in a Computer Science department for four years, then think about going to graduate school for five more, so that in 4 + 5 = 9 years they will have an interesting, well-paid job in Machine Learning. I am not sure that they will, but we will see how it goes. Everything is changing too fast these days :)

**Sanyam Bhutani:** Before we conclude, any tips for beginners who feel overwhelmed about starting to compete?
**Vladimir I. Iglovikov:** There is only one piece of advice for those who want to start competing: just do it, and try to get to the top, learning from everyone and everything. Kaggle is a great learning environment for a big subset of Machine Learning skills. The earlier you start, the earlier you will acquire them.
**Sanyam Bhutani:** Thank you so much for doing this interview.
**Vladimir I. Iglovikov:** Thanks to you for organizing it, and to everyone who has the patience to read till the end :) And special thanks for the proofreading!
Edit: Thanks for the proofreading and the suggested corrections.

If you found this interesting and would like to be a part of my Learning Path, you can find me on Twitter.
If you’re interested in reading about Deep Learning and Computer Vision news, you can check out my .
If you're interested in reading the best advice from Machine Learning Heroes: Practitioners, Researchers, and Kagglers, please click here.