In this role, expect to build out high-volume data pipelines and work to improve the existing infrastructure. You will improve performance, debug issues, and increase visibility into the data ecosystem. Expect end-to-end ownership of your code. If you enjoy working with large data sets and the challenges that come with them, this is the role for you.
- Design, build, and launch efficient, reliable data pipelines.
- Design and develop systems, using the most efficient tools available, to enable our consumers to understand and analyze data more quickly.
- Maintain and troubleshoot existing pipelines.
- Experiment with various frameworks in the Big Data ecosystem to identify the ideal approach for extracting insights from our data sets.
- 4+ years of experience working in the Hadoop ecosystem.
- 5+ years of experience processing large volumes of structured and unstructured data, including ingesting, processing, storing, querying, and cluster computing
- Ability to write reusable code components in Python or Java
- A self-starter with strong attention to detail
- Ability to examine data issues across large and intricate systems by working alongside multiple departments and teams.
- Familiarity with relational databases and SQL
- Must be willing to work in New York, NY
- Experience with Spark, Storm, or similar frameworks
- Proven proficiency with Scala
- Master's degree with a specialization in Computer Science, Engineering, Physics, or another quantitative field, or equivalent experience