Data Pipeline Engineer
Uni Systems
At Uni Systems, we are working to turn digital visions into reality. We are continuously growing, and we are looking for a professional Data Pipeline Engineer to join our UniQue Ispra team.
What will you be bringing to the team?
- Setting up and improving pipelines to process all required documents, uniquely identifying and tracing decisions and processing steps. This work is carried out in the provided classified sandbox environment, using the supplied high-performance hardware and toolsets.
- Implementing and improving missing pipeline steps for marking duplicate files based on file attributes, path structure, and content similarity, and establishing rules for determining whether a file or folder structure is a duplicate (a minimal sketch of one such rule follows this list).
- Extracting document-format records from Functional Area Systems (FAS) databases and other existing backups. Archiving SMEs and system SMEs are available to guide target formats and interpret source system structure and data. Each FAS is processed individually; not all sprints address this item.
- Processing and monitoring the conversion of various office, image, and video file types into accepted archiving formats, including metadata extraction and preparation of semantic indexes for search.
- Automating the registration of all processed documents and their semantic indexes with the sandbox natural language search tool.
- Automating the final transfer of all non-duplicate and extracted archive documents, including content and metadata, to the Institution archiving system.
- Reporting the status, progress, and statistics of raw files being converted into archive formats, along with associated metadata and search indexes.
- Delivering full reporting of results, pipeline step traceability, and documented (stakeholder-approved) exceptions.
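For orientation only, the sketch below illustrates one possible duplicate-marking rule of the kind described above, layering a content hash, file attributes, and a simple token-overlap similarity. The function names, thresholds, and the Jaccard metric are illustrative assumptions, not the project's actual rules or toolset.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file content in chunks so large files do not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def jaccard_similarity(a: set, b: set) -> float:
    """Crude content-similarity proxy over token sets (placeholder metric)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_duplicate(candidate: Path, reference: Path, sim_threshold: float = 0.9) -> bool:
    """Layered duplicate rule (illustrative only):
    1. identical size and content hash -> exact duplicate;
    2. same file name and near-identical token content -> near duplicate."""
    if (candidate.stat().st_size == reference.stat().st_size
            and sha256_of(candidate) == sha256_of(reference)):
        return True
    if candidate.name == reference.name:
        cand_tokens = set(candidate.read_text(errors="ignore").split())
        ref_tokens = set(reference.read_text(errors="ignore").split())
        return jaccard_similarity(cand_tokens, ref_tokens) >= sim_threshold
    return False
```

In practice, the actual rules for path-structure comparison and the similarity measure would be agreed with stakeholders and implemented within the sandbox toolset.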
Requirements
What do you need to succeed in this position?
- Master's degree in Computer Science, Engineering, or a relevant field (an advanced degree in Data Science is preferred)
- At least 3 years of practical experience in the field of data science and/or data analytics
- Experience using data processing, visualization, and analytics software packages and development environments, preferably including KNIME, VS Code, GitLab, Power BI, Jupyter Lab, and Docker-based APIs
- Experience with Big Data processing, creating and utilizing containerized building blocks, and running containers (APIs) on Kubernetes clusters
- Proficient in programming/scripting languages such as Python, R, and SQL, and working with data formats like CSV, XML, and JSON
- Experience in content extraction from files, databases, and systems; use of embedding models (including LLM-based), entity extraction, keyword extraction, and content similarity measurement (a minimal embedding-based similarity sketch follows this list)
- Strong drafting, communication, and presentation skills in English, suitable for both technical and non-technical audiences
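As a hedged illustration of embedding-based content similarity, the sketch below compares two short documents with cosine similarity. The sentence-transformers package and the model name are assumptions made for the example; the actual models and toolsets are those provided in the sandbox environment.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed available

# Example model; the project would use the models provided in the sandbox.
model = SentenceTransformer("all-MiniLM-L6-v2")


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


doc_a = "Decision record on archiving policy for image files."
doc_b = "Archiving policy decision concerning image file formats."

emb_a, emb_b = model.encode([doc_a, doc_b])
print(f"content similarity: {cosine_similarity(emb_a, emb_b):.2f}")
```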
At Uni Systems, we provide equal employment opportunities and prohibit any form of discrimination on the grounds of gender, religion, race, color, nationality, disability, social class, political beliefs, age, marital status, sexual orientation, or any other characteristic. Take a look at our Diversity, Equality & Inclusion Policy for more information.
JOB SUMMARY
Position: Data Pipeline Engineer
Company: Uni Systems
Location: The Hague
Employment type: Full-time