Some days, I am building PoCs (proofs of concept) in Jupyter notebooks, while on other days, I am deploying these PoCs into production. Some days are amazingly productive, and I end up devoting my entire time to the backend work of our , while on other days, I am left scratching my head over code that fails because of a single comma! But one thing is for sure: there is never a dull day at Blue Sky.
Being a data scientist at Blue Sky has taught me to embrace flexibility, keep an open mind, and solve complex problems with simple solutions. At the end of the day, I have learned that if you can achieve the desired result with just a Python script or a SQL query, do it!
As Occam's Razor states, "The simplest explanation is usually the right one".
If solving complex problems is not enough of an incentive, working at Blue Sky lets me tackle climate change, a clear and present threat to humanity's collective well-being. I believe Earth observation and geospatial analysis will play a crucial role in the coming decade.
Geospatial data not only provides visual proof of what's happening around the globe but also links all kinds of physical, social, and economic indicators that help us understand what the past, present, and future look like.
Now that you know what I do at Blue Sky, let me share how we do this.
This level of diversity also demands a lot of coordination in the workflow of a project. To achieve this, we follow a five-step workflow that runs through the entire project timeline: Scoping, Research & Development, Data Hunt, Coding & Deployment, and Generating Insight.
On any given project, a data scientist might be involved in any one of these five steps. Below is a preview of what they look like.
This is an important step in the grand scheme of things as it helps us evaluate the viability of developing a product. We are able to answer some of the critical questions/decisions regarding feasibility, data availability, ground-truthing & validation. All this requires extensive documentation, for which we again turn to Notion.
While hunting for your data sources, one important thing to remember is that raw geospatial image files are notoriously large, making them hard to store and visualise (unless you like extremely slow-loading dashboards). That's why we store them as Cloud Optimized GeoTIFFs (COGs). To learn more about COGs, you can watch this .
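To give a feel for why COGs help with slow dashboards, here is a minimal sketch in plain Python. A COG is organised internally into tiles, so a client can fetch only the tiles that overlap the area it needs instead of downloading the whole file. The raster size, tile size, and window below are hypothetical numbers chosen for illustration, and the helper names are my own, not part of any library or of our production code.

```python
from math import ceil


def tiles_for_window(width, height, tile_size, window):
    """Return (col, row) indices of the internal tiles that overlap a pixel window.

    window is (col_off, row_off, win_width, win_height) in pixel coordinates.
    """
    col_off, row_off, win_w, win_h = window
    # Clamp the requested window to the image extent.
    x0, y0 = max(col_off, 0), max(row_off, 0)
    x1, y1 = min(col_off + win_w, width), min(row_off + win_h, height)
    if x0 >= x1 or y0 >= y1:
        return []  # Window is empty or lies fully outside the image.
    first_col, last_col = x0 // tile_size, (x1 - 1) // tile_size
    first_row, last_row = y0 // tile_size, (y1 - 1) // tile_size
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def total_tiles(width, height, tile_size):
    """Number of tiles needed to cover the full image."""
    return ceil(width / tile_size) * ceil(height / tile_size)


# Hypothetical 10,000 x 10,000 pixel raster stored with 512-pixel tiles.
needed = tiles_for_window(10_000, 10_000, 512, (1_000, 2_000, 600, 600))
print(len(needed), "of", total_tiles(10_000, 10_000, 512), "tiles")
# prints: 9 of 400 tiles
```

In this made-up case, a 600 x 600 pixel view touches 9 of the 400 tiles, which is roughly why a map or dashboard backed by COGs can stay responsive even when the underlying file is very large.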
In terms of our data stack, the complete data backend is written in Python because it handles different kinds of datasets (geospatial or otherwise) robustly. Besides Python, we use Docker to test code in different environments and YAML to write configuration files for AWS.
We also maintain the highest standards of model accuracy. Before deployment, the model goes through a number of cross-validation and refinement stages, which help ensure high accuracy and flag any anomalies early on.
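For readers new to cross-validation, here is a minimal k-fold sketch using only the Python standard library. It is an illustration of the general technique, not our actual validation pipeline: the function names, the sample values, and the baseline "predict the training mean" model are all assumptions made for the example.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k roughly equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # Shuffle once, reproducibly.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Every fold except the held-out one becomes training data.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(y, k=5):
    """Return the mean absolute error per fold for a baseline mean predictor."""
    errors = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        prediction = mean(y[j] for j in train_idx)  # "Fit" on training folds only.
        errors.append(mean(abs(y[j] - prediction) for j in test_idx))
    return errors


# Hypothetical target values, made up for this example.
y = [12.0, 15.5, 11.2, 14.8, 13.1, 16.0, 12.7, 15.2, 13.9, 14.4]
fold_errors = cross_validate(y, k=5)
print("per-fold MAE:", [round(e, 2) for e in fold_errors])
print("average MAE:", round(mean(fold_errors), 2))
```

Each sample is held out exactly once, so the average error is a fairer estimate than a single train/test split. A fold whose error is far above the others is a useful early hint that something in that slice of the data deserves a closer look.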
At Blue Sky, the plan is to fight climate change one product at a time.