This is an exciting opportunity for data engineering candidates to apply their skills in the big
data domain. You will gain hands-on project experience creating self-service data platforms
with a Fortune 500 company. You will use the latest big data technologies to ingest data from
various applications into the Hadoop ecosystem, using ETL tools with batch and real-time
processing, and make this data available in data marts for access by data science and data
analytics teams.
Roles & Responsibilities:
● Data Engineering and Technical Delivery
● Prepare data for analysis using Presto SQL or a domain-specific tool (for example, Omniture
for Digital), visualize the data, and execute to specifications
● Scrape the web using Python to collect basic datasets from popular websites (e.g., LinkedIn)
as required, and parse JSON objects to get the data into tabular format
● Good knowledge of databases/SQL and relevant tools such as R, Python, or Omniture (if digital)
● Experience with frameworks like PySpark to handle large data
● Show drive to increase the breadth and depth of tools and systems used: creating data
schemas, building pipelines, collecting data, and moving it into storage
● Prepare data as part of ETL or ELT processes
● Stitch the data together with scripting languages, often working with DBAs to construct
data stores or data models
● Ensure data is available and ready to use, and use frameworks and microservices to serve
up the data
● Design, build, and optimize application containerization and orchestration with Docker
and Kubernetes
● Stakeholder Engagement
● Grasp requirements on calls and deliver to specification; present to senior management
and leadership
● Present findings to team lead/managers and to external stakeholders
● Drive stakeholder engagement by leading complex analytical projects, including
bottom-up projects
● Develop executive presentations with guidance
● Expert user of Python & Presto SQL
● Working experience with the Hadoop ecosystem, Hive, and Kubernetes
● Experience using various machine learning or statistical libraries and frameworks like PySpark

