I think that typically the most efficient size for a machine learning or artificial intelligence team is between 5 and 10 people per project. With 5 to 10 people, you can easily tackle most of the important aspects of an AI project. With more people, it becomes difficult to manage; there is a lot of communication overhead.
Since we actually have many projects going on, we need to scale that team size. That is also true for larger companies where ML modeling is at the core of their business. From my experience, their (core) modeling team is around 5 to 10 people. So 5 to 10 people for a single task should always be enough to solve it.

What is your current project?
Right now we are working on projects for a global ad network. This project is very difficult in terms of deployment and production needs because we need to deliver 5 million point forecasts per second, at 100 milliseconds latency.
It’s a very difficult project because for every customer and every product that they want to advertise via our clients’ network we need to create those forecasts. So multiply the number of customers by the number of products and clients, and by the number of predictions our model needs to generate, and this is just huge. Forecasts need to be delivered every second, and it is online, so yeah, it is a tough one.

This project has actually taught me a lot in terms of production in a large-scale forecasting system. We made a couple of unintuitive moves that improved the performance a lot. For example:
A good thing is that we can take advantage of model ensembling. When you are averaging predictions from 20 models, you can also look at the standard deviation of those predictions. If all models vote positively for a given product, it means you should show it to the customer. If, say, five models vote positively and 15 models vote negatively, then you have a problem.
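To make the idea concrete, here is a minimal sketch of that disagreement check. The function name and threshold values are illustrative assumptions, not the actual production settings:

```python
import numpy as np

def ensemble_decision(predictions, show_threshold=0.5, max_std=0.2):
    """predictions: per-model scores for one (customer, product) pair."""
    preds = np.asarray(predictions)
    mean_score = preds.mean()      # the usual ensemble average
    disagreement = preds.std()     # high std = the models disagree
    votes_for = int((preds > show_threshold).sum())

    if disagreement > max_std:
        # e.g. 5 models vote positively and 15 negatively: treat with caution
        return "uncertain", mean_score, votes_for
    decision = "show" if mean_score > show_threshold else "skip"
    return decision, mean_score, votes_for

# 20 models scoring one product for one customer
decision, score, votes = ensemble_decision(np.random.rand(20))
```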
I know that forecasting on large datasets where you need low latency doesn’t seem like a typical scenario for model ensembling, but that’s why I’m saying it is not intuitive. In such a production scenario, you would typically expect the model to be as simple as possible.

I think that in the production setting, model ensembling is much more important than, for example, in competitions. Typically people claim that you only use those monster ensembles to win competitions but will never use them in real life. I say that in real life it is something much more important than in competitions, for a very simple reason: you get a lot of non-stationarity.
Non-stationarity is something common in real life which you don’t experience in competitions, and in non-stationary problems having a diverse group of models helps a lot. We were actually spending a couple of hundred thousand dollars on Amazon every month, but we decided to move from one model to 20 models because the performance improvement was so big.
Yeah, I’ve just won my first solo gold medal and it’s a dream come true. Thank you so much. Yeah, so, actually, I came to data science and to machine learning and artificial intelligence from competitions.
I started competing and then I realized it is such a great area to be in, an area that can have a tremendous impact on our lives. So I decided to basically change my life entirely and start coding, doing data science and machine learning.

I started as a business consultant at the Boston Consulting Group. It is a great company and I worked there for six years or so. Then they were generous enough to send me on an MBA program, and that was one of the best things that happened to me in my entire life. During this one year, when you come back to university, you have some time to think about what you actually would like to do with your life.

I’m a huge evangelist of the MBA and having that gap year to reflect. It was during my MBA program that I discovered machine learning and wrote my first line of code. This is how my data science career started.
After the MBA I went back to BCG, BCG Gamma to be exact. BCG Gamma is a sister company of BCG focused purely on AI; all big consulting companies created those. It was a great company led by fantastic people, with an amazing guy, Sylvain Duranton, running it. They opened an office in Warsaw, they invited me, and eventually I was leading this office. I think consulting, especially there at BCG, was a great place to be.
But after some time, I just wanted to try something different and I went to Netsprint. It is one of the largest data ecosystems in Poland and also a great company with a fantastic management and leadership team. I went there to be a chief data scientist.

Some of the major use cases that we developed were around predicting the behavior of internet users based on their past history stored in cookies. That was prior to GDPR in Europe, which, by the way, would change the situation completely.
For example, we developed a website embedding system. Sites visited by similar people were close to each other in the embedding space, and based on this information our system would predict whether the websites you visited were male- or female-biased. For advertisers this is actually a very valuable piece of information.

Then GDPR happened and the situation totally changed. I moved to deepsense.ai, where I am right now as a director of customer analytics.

What drives me most at my work is actually problem solving, and it is most attractive when I can also participate in coding. That is a big challenge that I see not only in my case but for many people who grow in organizations as good programmers, developers, or scientists and then start leading teams.
I was fortunate because at deepsense.ai we have a really great team of people who require very little hand-holding, and we can limit meetings to the ones where we discuss objectives and coordinate between team members.
Once the objective is set, they use all the means necessary to realize that objective. It is one of the great things about working at deepsense.ai: the management is very, very efficient.

When it comes to the percentage split, I’d say I spend 20 to 30% of my time formulating the problem with the client and getting client feedback, another 30% working with the team, and another 40% on my own coding and my own work.
It takes conscious effort to get to 40% there as it can easily drop to 5 or 10%.
When that happens you become unhappy as a coder, and nothing good comes from being unhappy. It requires some discipline to organize your meetings, your stand-ups, and the way you talk to your clients, but it is doable.

It always amazes me how little things, even a tiny lack of coordination, can lead to shocking results. To deal with those problems I always try to have the entire team in one room.
For example, problem formulation is one of the key things that a machine learning team leader or chief data scientist should know how to do. It requires a lot of experience because it means that you should intuitively know what kind of technology to use for a given task.
Let’s say we have a project where we want to personalize an e-commerce website with ML systems. One approach could be to use traditional classification models and the other could use contextual bandits (a reinforcement learning method). The technology choice at the very beginning has a huge impact on what kind of issues you’re going to be dealing with later on. The chief data scientist should really know very well, or at least know intuitively, which approaches to try. I would say that is easily one of the most important skills.
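To illustrate the difference, here is a minimal epsilon-greedy contextual bandit sketch for the personalization case. Everything in it (arm count, feature size, learning rate) is an illustrative assumption; a classification approach would instead fit a supervised model on logged click data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, n_features = 5, 8        # e.g. 5 candidate page layouts, 8 user features
weights = np.zeros((n_arms, n_features))  # one linear reward model per arm
epsilon, lr = 0.1, 0.05

def choose_arm(context):
    if rng.random() < epsilon:                # explore a random layout
        return int(rng.integers(n_arms))
    return int(np.argmax(weights @ context))  # exploit current estimates

def update(arm, context, reward):
    # SGD step toward the observed reward (e.g. 1 = click, 0 = no click)
    error = reward - weights[arm] @ context
    weights[arm] += lr * error * context

# one simulated interaction
user = rng.random(n_features)
arm = choose_arm(user)
update(arm, user, reward=1.0)
```

Unlike the classification approach, the bandit learns from its own decisions online, which is exactly why the choice made at the start shapes the issues you face later.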
Then, of course, in any machine learning project you need to choose where you and your team should dedicate your time. For example, by adding a new data source that their competition was not using, one bank was able to price risk segments that no other bank was able to price. Adding new data, looking for sources of new data, and even talking to potential partners that could give you the data is also within the role of a well-performing data scientist, a chief data scientist in particular.
Operationally, I tend to focus a lot on feature engineering. Extracting information from data is more important than choosing algorithms or tuning hyperparameters. Of course, model selection and hyperparameter tuning are also important, but rarely have I seen a big impact from these two sources.
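A toy illustration of the point, with made-up column names and data: hand-crafted aggregate features like these often move the needle more than any model tweak.

```python
import pandas as pd

transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "amount": [120.0, 35.0, 80.0, 500.0, 20.0],
    "days_ago": [3, 10, 40, 5, 90],
})

# Condense raw transaction history into per-customer signals
features = transactions.groupby("customer_id").agg(
    total_spend=("amount", "sum"),
    avg_ticket=("amount", "mean"),
    n_purchases=("amount", "size"),
    recency_days=("days_ago", "min"),   # days since last purchase
)
# 'features' would then feed any downstream model; the information content
# of these aggregates typically matters more than the choice of algorithm.
```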
The last point, which I think the data science community underestimates a lot but the people who actually apply models in real life appreciate, is the aspect of non-stationarity. Non-stationarity is when something fundamental changes in the process of data generation. You need to make sure that your models are resilient to that.

I will give you an example. Once I had a very good discussion with a person leading consumer lending modeling for one of the largest Polish banks. He explained to me why he’s using linear regression instead of, say, LightGBM. First, let’s understand the position this guy is in. What is very desirable for a machine learning model in his situation is responsiveness to sudden changes in the economy. He told me that if something sudden or big changes in the economy, he pretty much feels how the linear regression model will behave, but he does not have the same feeling about the LightGBM model.
I did not appreciate his comment until I ran into the same situation. Today we are doing a lot of modeling in non-stationary data environments and we are using all those tricks to stabilize the outcomes. Using a 20-model ensemble is one way; using simpler models like linear regression is another.

I would say that this sense of how your solution will behave if something changes in the data generation process is really important in the socio-economic domain. I feel that it is the kind of knowledge you get after at least a couple of years of doing real-life modeling and experiencing some failures along the way.

Anyway, the example that I mentioned before, with the change in the website design, is a really good one as well. Suddenly your features mean something entirely different and the models output completely wrong predictions. This is one of the curses of machine learning: our models tend to fail silently.

Another example of non-stationarity that I always give is hyperinflation. Imagine doing credit risk modeling in a hyperinflation environment. Typically, one of the most important features in a credit risk model is personal income, which is heavily affected by hyperinflation. If you trained your model two months ago and you use it today, it will be completely wrong.
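To make the silent-failure point concrete, here is a toy sketch (all numbers made up) of a credit-risk-style model where income dominates; when hyperinflation multiplies nominal incomes, the model produces nonsense without raising any error:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
income = rng.normal(5_000, 1_000, size=1_000)            # monthly income
default_risk = 0.9 - 0.0001 * income + rng.normal(0, 0.02, 1_000)

model = LinearRegression().fit(income.reshape(-1, 1), default_risk)

# Two months later, hyperinflation multiplies nominal incomes by 10.
# Real purchasing power is unchanged, but the model only sees the number:
inflated = income * 10
predictions = model.predict(inflated.reshape(-1, 1))

# The model now predicts near-zero (even negative) risk for everyone.
# No exception, no warning -- the failure is completely silent.
print(predictions.mean())
```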
To deal with non-stationarity you can use retraining, online learning, ensembling methods, and other things. All of those make things better, but they don’t necessarily make things good: they mitigate the effects of non-stationarity but they don’t eliminate it. Say you retrain your model every day, which is a good practice, and then something really big changes. That change is only one day out of, say, 180 days, which is your typical training sample, and it will not have enough impact to make the right adjustment for the business context. What I am saying is that it never hurts to have a better frequency of retraining, but it is not everything.
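One possible refinement, sketched below under assumptions of my own (LightGBM as the model, an exponential half-life of 30 days), is to weight recent observations more heavily when retraining, so a sudden change is not drowned out by the other 179 days:

```python
import numpy as np
import lightgbm as lgb  # assumed library; anything supporting sample weights works

def recency_weights(days_ago, half_life=30):
    # weight halves every `half_life` days, so recent rows dominate training
    return 0.5 ** (np.asarray(days_ago) / half_life)

# X, y, days_ago would come from the last 180 days of data
X = np.random.rand(1000, 10)
y = np.random.rand(1000)
days_ago = np.random.randint(0, 180, size=1000)

model = lgb.LGBMRegressor()
model.fit(X, y, sample_weight=recency_weights(days_ago))
```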
The first one is competitions, where I get to learn new techniques. I do it every day: I try to find 15 to 30 minutes just to think about the problem, just to think about how I would approach it, and maybe write one line of code. It never stops at one line of code, of course :).
The second one is good industry blogs. One blog that I can particularly recommend is the one run by Jeremy Howard.
In terms of books, I have one that I highly recommend; it is really a very good book. Other than that, I’d say the best experience comes from real-life projects. That is actually one area where I think there are still not enough opportunities being exploited by the data science community.
For example, the amount of information that is out there on the internet, which you can scrape, analyze, and gain insights from, is huge. There is a big branch of companies and startups doing exactly this, but there is still a lot of room there. Recently, BlueDot issued a warning about the Wuhan virus on the 10th of December 2019, much earlier than everyone else. This is the kind of area where you connect your model to the data around you, and by automating the analysis you can derive really interesting insights.
I try to always focus on the question that I’m answering, the thing that brings value. So before coding, spend time thinking about what question you are answering and what value it brings. It is not so much the technical excellence of your code that matters, but actually knowing where the value you bring lies.
I think this is actually a big hope for people coming from outside the industry into machine learning, because this is a skill that is very natural to people who are used to solving business problems.
Machine learning and AI are very young fields and there are not many established best practices taught at university or in any book, so I designed my own process and I try to follow it strictly.
Of course, I deviate from this process all the time, but at least by following something I am making sure that all the important elements are covered. I think it is equally important in real-life projects and competitions, but let me give you a competition example. I believe that when you want to win a Kaggle competition you should at least touch all sources of potential value. I tend to focus on things that I like, and I think this is very natural for all of us. I enjoy feature engineering a lot, which is fine, but forgetting about other parts of the process has cost me dearly in the past.
I think you remember the Home Credit default competition from 2018. We lost first place because we had put almost all our focus on feature engineering. We had the best features and great models, but we completely forgot about ensembling, which is a huge source of value.
We were first on the public leaderboard, but our models were not robust enough and we dropped from 1st to 5th place on the private leaderboard. That is why now I try to follow a very strict process, to make sure that I exploit all the possible sources of value.

Also, it’s important to mention that my best ideas come when I am having a walk in the park with my wife, but don’t tell her that. Ok, she already knows that I tend to think about machine learning when we spend time together. Sometimes you just need to take a step back and do something completely different: go to the park, go have a beer, talk to your friends, and then the best ideas come.
When beginner data scientists ask me about Kaggle, I always tell them: forget about the results, focus on consistency and learning. If you spend enough time learning and improving, the results will come.
The first one is reinforcement learning. This is a technology that I believe has huge potential going forward. I don’t think I have explored it enough in the past.
The other one is actually the research I’m doing at INSEAD about unconscious biases, and how these biases impact our decisions.
It is very interesting to me because I’m in the fields of both machine learning and human learning. Analyzing what the similarities are in the process of learning between humans and machines is fascinating, and some of the conclusions that we have here are really interesting.

For example, I think the big topic now is diversity and inclusion in organizations. I think machines are much more advanced in diversity and inclusion than humans are. Why is that?
If you recall, in Kaggle competitions, and sometimes in production situations, it is not a single model that wins but rather a team of diverse models. So for any machine learning algorithm it is obvious that you should have diverse models in your modeling stack, but for humans it is not.

Also, if you think about it, we do not analyze how teams perform together but rather look at individuals. We give individual performance bonuses, we hire based on individual performance, but having the best team of four is different from having the four best individuals. It’s a completely different matter, and it is because we cannot measure team effects as such that we end up in an environment where we measure what we can: individual performance. In recruitment, we consider a particular person typically not in the context of a broader team, but rather as an individual.

For example, say you have the board of a very large company: four guys, white males between 40 and 50 with Harvard degrees. You want to add a new member to the board, and your recruitment process finds another person similar to the ones already on it. He may be better than the other candidates as an individual, but if you think about the team, there could be much better candidates with entirely different backgrounds.
Diversity means derisking your “models”, which is very obvious to everyone in the machine learning industry but very difficult to grasp for humans, just because we cannot measure it. Sadly, we don’t have a good solution for it right now. But I think it is a good question to ask, a good issue to spend some time on.
It is very easy to be successful if you are doing stuff that you like.
I strongly believe that some senior and very technical guys that love coding should just do coding and they will lead the team naturally with their coding skills.
The guys who like to talk to other people and can organize getting diverse sources of data should do that, because they will bring a lot of value to the team and the team will follow.

That said, I think that career progression for technical people is an area where companies don’t really know what the best approach is. Reconciling the need for very good programming skills with the need for managing people is just difficult. Anyhow, I always advise: do what you like, do it very, very well, and your leadership will come from this.
All the training on management and dealing with people is secondary. Your presence as a chief data scientist or a team leader should naturally come from your true passion. People always follow people who are true in what they do.
Of course, there are some particular areas of expertise that you need, which we already talked about. I need to mention that developers, including myself, tend to focus only on advancing technical coding skills. Higher technical skills are better, of course, but improving your skills in other areas when you are already technically savvy can have a much bigger impact on your career.
Do you have some final thoughts?

I think AI so far, and this is a very broad reflection on AI, has had a tremendous impact on technology (computer vision, natural language processing), but the applications where it has a strong impact on real life are few and far between.
I strongly encourage such applications and I believe education and healthcare are two sectors where the impacts could be the largest.
I encourage people to work there and think how AI and machine learning could change the situation because I like to think that machine learning and AI is the next big thing that will happen to society.
That said, I have not seen it so far. I have seen it work well in technology, but building tech for tech is not interesting; it should change our lives.