Data Engineer (Spark, Python, MongoDB)

Our customer, a large-scale company, is seeking a Data Engineer (Spark, Python, MongoDB).

Responsibilities:


- Implement data pipelines in an Apache Spark and Python environment.
- Clean and prepare data for the BI team.
- Support and improve data ingestion scripts, and maintain data in the data lake.
- Monitor performance and recommend/implement any necessary infrastructure changes.
- Define data retention and access policies.
- Automate data pipeline infrastructure in a Linux environment.
- Design cost-aware systems that provide the best value.
- Manage clusters, environments, and related scripts and tools.
- Work with developers and other infrastructure engineers to deliver complementary solutions.
- Work with a skilled team of developers and architects in a collaborative group setting.
- Bring innovative solutions to current and future challenges.

Skills Required:


- 2+ years of experience as a Data Engineer.
- 2+ years of experience with production Hadoop pipelines and tools.
- 3+ years of experience in Python.
- Experience with Scala and Spark.
- 3+ years of experience working in a Linux environment.
- Strong understanding of distributed computing principles.
- Practical experience with SQL and/or NoSQL databases (MySQL, MongoDB, Elasticsearch, MemSQL).
- Ability to understand and troubleshoot Big Data issues at many layers of the stack.
- Ability to multitask, prioritize, and estimate effort.
- Experience establishing integration standards and related processes.
- Detail-oriented.
- Good documentation habits.
- Experience with workflow management tools (Jira, GitHub Workflow).
- Self-sufficient (able to set up and maintain your own development environment), an asset.
- Experience with AWS EMR and Airflow, an asset.
- A strong advocate of TDD, an asset.
- Experience with conceptual modeling tools (UML diagramming, flow diagrams), an asset.
- Experience with containers and container-based architectures, an asset.
- Experience with R, an asset.
- Experience with Machine Learning algorithms, an asset.
- Experience with Apache Spark DataFrames and Datasets, an asset.
- Experience working in an Agile environment, an asset.


Region: Montreal (greater Montreal)
Status: Permanent
Enterprise: Client
Job ID: 9568