Computing Cluster Operations Lead
MILA - Institut québécois d'intelligence artificielle
Montreal
About Mila
Founded by Professor Yoshua Bengio of the University of Montreal, Mila brings together researchers specializing in artificial intelligence (AI), particularly in machine learning. Globally recognized for its significant contributions to deep learning and reinforcement learning, Mila has distinguished itself in areas such as language modeling, machine translation, object recognition, and generative models. Since 2017, Mila has been a collaboration between the University of Montreal and McGill University, in close partnership with Polytechnique Montreal and HEC Montreal.
Mila’s mission is to be a global hub for scientific advancements, inspiring innovation and the growth of artificial intelligence for the benefit of all.
The Role
Mila is seeking a highly experienced and visionary Head of Infrastructure to lead and evolve our critical computing infrastructure. This individual will be responsible for the strategic planning, design, implementation, and operation of Mila's high-performance computing (HPC/AI) clusters, data centers, and network infrastructure. The successful candidate will play a pivotal role in ensuring that our researchers and students have access to state-of-the-art computing resources to push the boundaries of AI.
Responsibilities
- Strategic Leadership: Develop and execute a comprehensive infrastructure strategy aligned with Mila's research goals, including future needs for growth and emerging technologies.
- HPC Cluster Management: Oversee the architecture, deployment, maintenance, and optimization of HPC clusters, ensuring high availability, performance, and scalability.
- Vendor Management & Procurement: Lead the RFP process for the procurement of new HPC clusters and other infrastructure components, ensuring cost-effectiveness and alignment with technical requirements.
- Team Leadership: Lead, mentor, and grow a team of skilled infrastructure engineers and administrators.
- Operations & Reliability: Establish and enforce best practices for infrastructure operations, monitoring, troubleshooting, and incident response to maintain a highly reliable environment.
- Budget Management: Manage infrastructure budgets.
- Security & Compliance: Ensure the security and compliance of all infrastructure components, implementing robust security measures and data protection protocols.
- Collaboration: Work closely with researchers, faculty, and other departments to understand their computing needs and provide tailored solutions.
- Innovation: Stay abreast of the latest advancements in computing infrastructure and AI hardware, proposing and implementing innovative solutions to enhance Mila's capabilities.
Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 10+ years of experience in IT infrastructure, with at least 5 years in a leadership role managing complex computing environments.
- Deep expertise in HPC cluster architecture, design, and operations, including experience with schedulers (e.g., Slurm), high-speed interconnects (e.g., InfiniBand), and parallel file systems (e.g., Lustre, BeeGFS).
- Proven experience managing data centers, network infrastructure, and storage solutions.
- Strong understanding of virtualization and container technologies (e.g., Proxmox, Docker, Podman).
- Experience with infrastructure as code (e.g., Ansible, Terraform) and automation tools.
- Excellent leadership, communication, and interpersonal skills, with the ability to articulate complex technical concepts to both technical and non-technical audiences.
- Demonstrated ability to manage projects, prioritize tasks, and work effectively in a fast-paced research environment.
- A passion for contributing to cutting-edge AI research and a commitment to Mila's mission.
Desirable skills
- Experience with GPU-accelerated computing and deep learning frameworks.
- Knowledge of research computing environments and the specific challenges faced by AI researchers.
- Familiarity with open-source technologies and community contributions.
Why join Mila?
- The opportunity to contribute to a unique mission with a major impact;
- A comprehensive group insurance program (health, dental, disability, life, travel and extended benefits);
- An employee and family assistance program;
- Access to a telemedicine service;
- A vacation policy offering a base of 20 days' vacation upon hiring;
- A retirement savings plan with a minimum employer contribution of 4%;
- A generous flexible package allowing you to tailor your benefits to what contributes to your well-being. You can select and combine options to suit your needs, including lifestyle credits, enhanced insurance, extra vacation days and increased pension contributions;
- Flexible working hours, a summer schedule and the possibility of telecommuting;
- A work environment in the heart of Little Italy, in the trendy Mile-Ex district, close to public transportation;
- A team of passionate experts in their field;
- A collaborative and inclusive work environment.
We want to know you
At Mila, diversity is important to us. We value a work environment that is fair, open and respectful of differences. We encourage anyone who wants to work in a stimulating, constantly evolving ecosystem, and to help define and sustain a healthy and inclusive culture, to apply.
Please note that only selected candidates will be contacted.
https://mila.quebec/fr/protection-de-la-vie-privee