What we were saying:
When you look at the data infrastructure industry, a new category often emerges through a commercial product. Once the market matures, an open-source alternative gets created and ends up taking over the category. This pattern recurs because data infrastructure requires privacy, security and scale, which cloud-based solutions can’t offer as well as open-source ones. There are many examples, such as Kafka, Spark, and now DBT. We want to be the open-source solution for data integration. You might wonder why an open-source approach would also win in data integration; after all, a closed-source cloud-based approach sometimes works. This last sentence is a transition to the next slide.
What we were saying:
In June and July, we started reaching out to 250 of Fivetran’s, StitchData’s and Matillion’s customers. We ultimately managed to talk to 45 of them. We wanted to know whether an open-source approach would make sense to address data integration. What we learned is that a cloud-based closed-source solution will never be able to fully address the data integration problem. It has several inherent issues.
Every company we talked to was using Fivetran, StitchData or another solution while also building and maintaining its own connectors. They did so because either (a) the ETL solution didn’t support the connector they wanted, or (b) the solution supported it, but not in the way they needed. When you look at Fivetran, for instance, you’ll see that after 8 years, they only support 150 connectors. The hard part of ETL/ELT is not building the connectors but maintaining them. Maintenance is costly, and any cloud-based closed-source solution will be restricted by an ROI (return on investment) consideration. It isn’t profitable for them to support the long tail of connectors, so they focus only on the most popular integrations.
During those 45 interactions, we also identified a third issue. Some of the companies were about to stop using Fivetran for some connectors because it had become too pricey. The value of an ELT solution lies in replacing a paid data engineer who would otherwise build and maintain a connector in-house. The amount of work required from an engineer is almost the same whether a low volume or a high volume of data is being moved. So with volume-based pricing, at some point it just stops making sense to use an external solution.
The last inherent issue with a cloud-based approach is privacy compliance. Cloud data warehouses are winning the enterprise market, but that is because they are considered part of the data infrastructure. All other solutions must go through a rigorous privacy compliance process that takes several months.
What we were saying:
We’re building an open-source ELT platform that syncs data from SaaS apps, APIs and databases to data warehouses, data lakes and other databases. Our solution can fully integrate with your data infrastructure and stack, if you are using Kubernetes or Airflow for orchestration, or DBT for transformation. Our goal is to become the open-source standard for anything ELT by the end of 2021.
What we were saying:
By making it trivial to build and maintain connectors using Airbyte rather than doing it in-house, we will become the new standard for building connectors. This will help us support the long tail of connectors. And since connectors run as Docker containers, you can build them in the language of your choice. As connectors are open source, any team can edit a pre-built connector and tune it to their needs. If a connector breaks, anyone can jump on the code and submit a PR; once approved, the change can be propagated to all the existing users of that connector. Airbyte focuses first on a self-hosted offer; pricing will be based on the features and number of connectors used. The pricing will not be indexed on the volume of data. Being open source enables us to take a bottom-up approach with frictionless adoption by data teams, without going through privacy compliance, as we handle data security as a first-class citizen.
What we were saying:
With Airbyte, we wanted to address 2 different audiences: (a) data consumers, including data analysts and scientists, and (b) data engineers who were building and maintaining the connectors themselves, or managing the data infrastructure. What made Fivetran successful is that they enabled data consumers to leverage the data without the help of data engineers – they made them autonomous (as much as Segment made product teams autonomous). Airbyte provides a UI that makes it very easy for non-technical users to start replicating data. It takes literally for a data analyst to replicate data from Salesforce to Snowflake, including the deployment through our Docker Compose. In addition, Airbyte will offer data integration connectors through an API that data engineers can leverage to build their own workflows and applications. This is also a way for Airbyte to address SaaS businesses that want to offer their own integrations to their customers through Airbyte.
What we were saying:
We actually started working on Airbyte at the end of July. We soft launched an MVP 2 months later (at the end of September), with only 6 connectors. We wanted to have feedback as early as possible. It’s been 6-7 weeks since we soft launched, and we are now used by X companies, we have Y contributors, and we’ve ramped up the number of connectors to 43.
What we were saying:
Here are a few choices we made that distinguish Airbyte from other open-source solutions:
Airbyte’s connectors are usable out of the box through a UI and an API, with monitoring, scheduling and orchestration.
Airbyte runs connectors as Docker containers, so they can be built in the language of your choice.
Airbyte’s components are modular, and you can decide to use subsets of the features to better fit in with your data infrastructure (e.g., orchestration with Airflow or K8s or Airbyte’s…).
We intend to integrate with DBT for the transformation piece soon and let the community contribute normalization schemas for all connectors.
Unlike Singer, Airbyte uses one single open-source repo to standardize and consolidate all developments from the community, leading to higher quality connectors. We built a compatibility layer with Singer so that Singer taps can run within Airbyte.