This is a story about software architecture, about a personal itch, and about scalability. And like any good tech story, it begins with a shaky architecture.
At , we help large enterprises measure the security posture of their suppliers. But I’m not going to get into the whole third-party security management extravaganza with you. We came to talk about our architecture and process.
The Transporter is a Dynamic Workflow Engine, built to create workflows and execute them as Kubernetes Jobs.
A container-based architecture makes The Transporter both flexible enough to configure jobs separately and efficient enough to scale.
It favors parallelism when possible, according to the workflow dependencies, and provides a REST API for a fully automated pipeline.
Standing on the Shoulders of Giants
The Transporter’s API is automatically triggered whenever a new company is added to the platform. The Transporter then deploys the jobs to Kubernetes in parallel or sequentially, according to a predefined workflow.
Overview
As with the original transporter, The Transporter follows a few simple rules:
The Rules
In our case, a job is the equivalent of running a Kubernetes Job. A group of these jobs is a phase, and a phase can be sequential or parallel. A workflow is a sequence of phases.
Now we can enjoy parallelism while still following some rules.
Workflow Example
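To make the job / phase / workflow vocabulary concrete, here is a minimal sketch of how it could be modeled. The class names, the `parallel` flag, and the job names are all hypothetical and chosen for illustration; this is not The Transporter's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    name: str  # hypothetical job name


@dataclass
class Phase:
    jobs: List[Job]
    parallel: bool = False  # False means the jobs run one after another


@dataclass
class Workflow:
    phases: List[Phase] = field(default_factory=list)

    def execution_plan(self) -> List[List[str]]:
        """Return ordered steps; job names inside one step may run concurrently."""
        steps: List[List[str]] = []
        for phase in self.phases:
            names = [job.name for job in phase.jobs]
            if phase.parallel:
                steps.append(names)  # one step holding every job of the phase
            else:
                steps.extend([name] for name in names)  # one step per job
        return steps


workflow = Workflow([
    Phase([Job("discover-assets")]),
    Phase([Job("scan-ports"), Job("scan-dns")], parallel=True),
    Phase([Job("score"), Job("report")]),
])
print(workflow.execution_plan())
```

Phases always run in order; only the jobs inside a parallel phase share a step.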
The Transporter leverages a distributed task queue architecture. In this architecture, tasks get transported to queues, and workers consume the tasks from the queues and perform them. This architecture makes it possible to retry a failed task, set a timeout, set a priority, and schedule tasks for later.
We send notifications to alert on workflow start, success, and failure.
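The retry and priority behavior described above can be illustrated with a toy, in-process queue. This is only a sketch using the Python standard library: the `Task` class, `run_worker`, and the task names are hypothetical, and a real task queue adds brokers, timeouts, and scheduling on top of this idea.

```python
import queue
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(order=True)
class Task:
    priority: int  # lower number is consumed first
    name: str = field(compare=False)
    func: Callable[[], None] = field(compare=False)
    retries_left: int = field(default=2, compare=False)


def run_worker(tasks: queue.PriorityQueue) -> List[str]:
    """Consume tasks until the queue is empty, re-queueing failed ones."""
    log: List[str] = []
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return log
        try:
            task.func()
            log.append(f"{task.name}: ok")
        except Exception:
            if task.retries_left > 0:
                task.retries_left -= 1
                tasks.put(task)  # back on the queue for another attempt
                log.append(f"{task.name}: retry")
            else:
                log.append(f"{task.name}: failed")


attempts = {"count": 0}


def flaky() -> None:
    # Fails on the first call only, to simulate a transient error.
    attempts["count"] += 1
    if attempts["count"] < 2:
        raise RuntimeError("transient failure")


tasks = queue.PriorityQueue()
tasks.put(Task(priority=1, name="low-priority", func=lambda: None))
tasks.put(Task(priority=0, name="flaky", func=flaky))
log = run_worker(tasks)
print(log)
```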
Distributed Task Queue Architecture
Now we are ready to tie it all together. The Transporter provides endpoints to manipulate a workflow. Behind the scenes, the workflow gets translated to tasks: we use Celery chains and Celery groups to set dependencies, and these tasks get transported to queues based on the dependencies. On the other side, Celery workers consume tasks from the queues and deploy the corresponding jobs to Kubernetes. The result: a workflow carried out according to job dependencies.
We also added endpoints to control workers for convenience. The number of running workers sets the limit for how many jobs can run concurrently.
Our new deployment process includes security researchers building and pushing to Registry. The Transporter transports the corresponding jobs according to the workflow and a which defines the version of each job. Kubernetes is the engine actually executing the underlying Docker containers.
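The "chain of groups" idea can be sketched without Celery at all. The example below is a stand-in that uses a thread pool from the standard library: each inner list plays the role of a group (its jobs may run concurrently), and the outer list plays the role of a chain (a step starts only after the previous one has finished). `run_chain_of_groups`, `fake_deploy`, and the job names are hypothetical, and `fake_deploy` only pretends to deploy a job.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_chain_of_groups(
    steps: List[List[str]],
    deploy: Callable[[str], str],
    max_workers: int = 4,
) -> List[List[str]]:
    """Run each group concurrently, and the groups themselves in order."""
    results: List[List[str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for group in steps:
            # list() waits for every job in the group before the next step starts.
            results.append(list(pool.map(deploy, group)))
    return results


def fake_deploy(job_name: str) -> str:
    # Placeholder for the real deployment of a job.
    return f"deployed {job_name}"


steps = [["discover-assets"], ["scan-ports", "scan-dns"], ["score"]]
results = run_chain_of_groups(steps, fake_deploy)
print(results)
```

In Celery itself, the same shape is expressed with the `chain` and `group` primitives, with workers rather than threads doing the consuming.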
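Why the worker count caps concurrency is easy to see in a small experiment. The sketch below uses threads as stand-ins for workers and a short sleep as a stand-in for a running job; `peak_concurrency` is a hypothetical helper, not part of The Transporter.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(num_jobs: int, num_workers: int) -> int:
    """Run num_jobs fake jobs on num_workers workers; return the max seen at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def job(_: int) -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)  # pretend the job is doing work
        with lock:
            running -= 1

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(job, range(num_jobs)))
    return peak


print(peak_concurrency(num_jobs=8, num_workers=2))
```

However many jobs are queued, the peak never exceeds the number of workers, which is why starting or stopping workers is a simple throttle.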
Updating Jobs
How Not to Name a Job
And this is how we discovered some Kubernetes naming limitations:
Tip #1: use unique identifiers for names. But we still wanted to know the original job name and the company name, which leads me to Tip #2: labels. Labels everywhere.
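Here is a minimal sketch of both tips together. It assumes the limits involved are the usual Kubernetes ones: names that must be valid DNS labels (lowercase alphanumerics and `-`, at most 63 characters) and label values of at most 63 characters that start and end with an alphanumeric. The helper names and the label keys `original-job` and `company` are hypothetical.

```python
import re
import uuid

MAX_LEN = 63  # assumed limit for both names and label values


def make_job_name(job: str) -> str:
    """Build a unique, DNS-label-safe name from a free-form job name."""
    suffix = uuid.uuid4().hex[:8]  # unique identifier (tip #1)
    slug = re.sub(r"[^a-z0-9-]+", "-", job.lower()).strip("-")
    slug = slug[: MAX_LEN - len(suffix) - 1].rstrip("-")
    return f"{slug}-{suffix}" if slug else f"job-{suffix}"


def to_label_value(text: str) -> str:
    """Squash free-form text into a valid label value."""
    value = re.sub(r"[^A-Za-z0-9_.-]+", "-", text)[:MAX_LEN]
    return value.strip("-_.")


def make_labels(job: str, company: str) -> dict:
    # Keep the human-readable context in labels (tip #2).
    return {
        "original-job": to_label_value(job),
        "company": to_label_value(company),
    }


name = make_job_name("Port Scan (v2)")
labels = make_labels("Port Scan (v2)", "Acme Corp, Inc.")
print(name, labels)
```

The generated name stays opaque and collision-free, while the labels keep it possible to find every job that belongs to a given company or original job.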
UI: we want to add a UI to make it easy to monitor and troubleshoot active and finished workflows.
Generify: if we make The Transporter a bit more generic, maybe we could release it as open source.
If you have a use case that involves running batch jobs according to certain dependencies (e.g., data acquisition or a web crawling system) and you are interested in scaling with The Transporter, please comment or reach out to let me know.