Data Engineer - Data Foundry Engineer
Data
Machine Learning
São Paulo, SP
Remote
TRACTIAN is transforming the industrial world by empowering frontline maintenance workers to achieve more. We’ve fused cutting-edge hardware with innovative software into one powerful platform, disrupting legacy systems and delivering smarter, faster solutions for our clients.
Design and maintain robust data pipelines to ingest from a wide range of sources, including APIs, documents, websites, and raw sensor data
Integrate and optimize ETL/ELT processes developed by MLE colleagues, improving performance, reliability, and long-term maintainability
Own the full dataset lifecycle, from raw ingestion through cleaning, validation, and delivery as training-ready data
Define and enforce data quality standards and governance practices across the Data Foundry team
Build and maintain labeling pipeline infrastructure for ML applications, working closely with the annotation team
Participate in architectural decisions, code reviews, and technical mentorship within the team
Document data sources, pipeline logic, and processing decisions for reproducibility and team alignment
3+ years of experience in data engineering
Degree in Computer Science, Data Engineering, Computer Engineering, Information Systems, or equivalent technical background
Solid understanding of the ML training lifecycle and what properties make a dataset suitable for model training
Familiarity with layered data architecture patterns such as Medallion Architecture (Bronze/Silver/Gold) or Data Mesh
Proficiency in Python, with a focus on data manipulation, pipeline development, and automation
Workflow orchestration using code-based tools such as Temporal, Airflow, Prefect, Dagster, or equivalent
Distributed data processing with Spark, Databricks, or similar
REST and gRPC API integration
Strong SQL skills, both for data modeling and query optimization
Experience with streaming systems and event-driven pipelines (Kafka, Kinesis, or equivalent)
Comfortable jumping into existing codebases and optimizing work built by others, without needing to start from scratch
Technology-agnostic: you evaluate tools based on what the project needs, adopt new ones quickly, and don't get attached to a specific stack
At ease in fast-moving environments where priorities shift and the right answer isn't always obvious
Engineering-first mindset: you think in pipelines, own outcomes, and care about the quality of what you ship
Driven by curiosity and innovation, not by comfort with a known toolset
Experience making architectural decisions and contributing to the technical growth of a team, formally or informally
Go, for high-performance pipeline components
dbt for transformation layer modeling
Open table formats: Delta Lake, Apache Iceberg, or Hudi
Data quality frameworks such as Great Expectations or Soda
Cloud experience, preferably OCI (our current migration target). AWS, GCP, or Azure background is also valued
Rapid prototyping with Streamlit or similar tools. The use of LLMs and GenAI to speed up internal tooling and experimentation is actively encouraged
Experience with data annotation workflows or training dataset pipelines
• Competitive salary and stock options
• 30 days of paid annual leave
• Education and courses stipend
• Earn a trip anywhere in the world every 4 years
• R$1,035/month meal allowance
• Health plan with national coverage and no co-payments
• Dental Insurance: we help you with dental treatment for a better quality of life.
• Wellhub and Sports Incentive: R$300/month extra if you practice physical activities

Tractian Ranked #24 Fastest-Growing Company in North America
If you want to build a ship, don't organize people to collect wood, assign them tasks, and give orders. Instead, teach them to long for the vast and endless sea.
Antoine de Saint-Exupéry