Data science has come a long way from the early days of (KDD) and conferences. 1980s-90s software engineers handling databases evolved into . The big data meets smart algorithm collided in a , making “”. That brings us to a decade later, post-pandemic 2022, asking the question, “”.
Hi, I’m Liling. By day, I am an applied scientist at Amazon, and after work I code open source and write tech articles on natural language processing and sometimes articles on gaming pop culture.
It is a joy and honour to be nominated for the Hackernoon Contributor of the Year in the Natural Language Processing (NLP) category, and if you have enjoyed the NLP or Machine Translation content that I’ve been sharing, help smash the vote button at
As a tech writer, I love to share emerging technologies in machine learning, and I have a particular soft spot for language and translation related technologies. To celebrate the nomination, I’m writing up this article in an “Ask Me Anything” questions and answers format. Learn more about my thoughts and opinions on “what kind of a scientist am I?” in the tech industry in the following sections.
Nowadays, job descriptions for “data scientists” come in different forms, and they fall broadly under these categories:
If you ask anyone about the difference between the roles and responsibilities of the different job titles, you will most probably end up with a vague line delineating each of them.
This is usually the responsibility of the “scientists”. In the industry, this is specific to the different tasks and applications the team supports and/or develops. It is similar to academic researchers building machine learning models, but in the industry the practicality of whether the final model is usable usually trumps the need to beat state-of-the-art results.
This is usually the responsibility of the “engineers”. Reliability is critical to any modern machine learning application today. It is important to make sure that the scientists’ carbon-emitting efforts to produce the best model for the customers/users deliver the expected performance in production.
A scientist’s “it works on my laptop” statement is unacceptable in the industry and engineers help to make “it works, anywhere” a dream come true.
P/S: An engineer might train a better model than a scientist does.
In terms of roles and responsibilities, they are similar, but in practice some companies draw a clear demarcation between the different scientist positions. So always ask the human resources (HR) personnel or hiring manager whether it’s possible to share the “role guidelines” specific to the position you are applying for. It is especially important to understand the expectations of your role once you have joined the company and team.
I’m personally a “practicalist” in most cases, but when it comes to “the dough”, and asking friends/seniors in the companies are your best bet to know more about the company and their compensation.
“Don’t do it for the money” is over-rated. Do it for the love of doing it. I enjoy looking at numbers and the language data, thus NLP. But remember to get paid enough for doing it =)
There is no “bad” question or “need more focus” to these practical questions. But it does inevitably sometimes attract malicious product/tech advertising.
Literature review
Know what datasets are available and what’s in them (noise, quirks, etc.)
Find which evaluation metric task X is usually evaluated on
Track the oldest relevant citation of the task, read that paper
Find the most cited paper for the task, use that as your baseline
Define your success criteria for the task industrially (it might not be the standard eval metric for the task)
Try to replicate or reimplement the baseline
Communicate your model/libraries to engineers. Can your engineer productionize it?
Did the baseline meet the success criteria? Ask the business/project stakeholder whether it’s sufficient
Build it, test it, break it, repeat!
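To make the “define your success criteria, then check whether the baseline meets it” steps a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the toy sentiment labels, the majority-class baseline, accuracy as the metric, and the 0.8 threshold are all hypothetical stand-ins for whatever your task and stakeholders actually call for.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    most_common_label, _count = Counter(train_labels).most_common(1)[0]
    return lambda _text: most_common_label


def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    if not gold:
        raise ValueError("gold labels must not be empty")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def meets_success_criteria(score, threshold):
    """True if the score reaches the (hypothetical) threshold agreed with stakeholders."""
    return score >= threshold


# Toy data, made up purely for this example.
train_labels = ["pos", "pos", "neg", "pos"]
test_texts = ["great", "awful", "fine", "bad"]
test_labels = ["pos", "neg", "pos", "neg"]

# Hypothetical success criterion; in practice this comes from the business side.
SUCCESS_THRESHOLD = 0.8

predict = majority_baseline(train_labels)
predictions = [predict(text) for text in test_texts]
score = accuracy(predictions, test_labels)

print(f"baseline accuracy: {score:.2f}")
print(f"meets success criteria: {meets_success_criteria(score, SUCCESS_THRESHOLD)}")
```

On this toy data the baseline falls short of the threshold, which is the cue to go back to the stakeholder or to iterate on the model.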
At the moment, I’m spending my free time learning about 🤗, not just how to use the different components of the library but more so understanding what features make it a success and what X-factor made it gain traction in the machine learning community.
I hope the above Qs and As give you some insight into “what kind of a scientist I am”. And if there are more burning questions you want to ask, feel free to leave a comment under the post.
Finally, I want to give huge thanks to the gzht888.community, staff and sponsors for the Noonie Awards nomination, and if you enjoy this article, help smash the vote button at