Azure Specialized Compute drives the hardware roadmap, software, and services that enable our users to run technical computing workloads in Azure: from batch workloads to AI and machine learning, traditional HPC simulations, and remote visualization. We are responsible for providing the most scalable MPI platform and the most powerful GPU clusters to our end customers as they seek answers to some of the most difficult questions in science and industry.
* Excellent problem-solving skills and analytical ability.
* Solid understanding of AI system architecture and requirements (processor technology, networks, memory components, etc.).
* Solid working knowledge of Linux and the ability to compile and modify AI codes that use C++, MPI, CUDA, Python, and OpenMP.
* Ability to use CPU and GPU profiling tools to identify performance bottlenecks.
* Proficiency in at least one of the following AI frameworks: PyTorch, TensorFlow, or MXNet.
* Working knowledge of containers and container orchestration, including setting up and configuring Docker and Kubernetes.
* Working knowledge of Slurm is desired.
* Experience with scientific/engineering software for AI systems.
* Experience/education in fields where AI is used, including Deep Learning, Computer Vision, Physics, Engineering, Data Analytics, etc.
* An understanding of the issues affecting AI application performance.
* Willingness to take feedback and be a team player.
* Ability to clearly communicate issues and results to stakeholders.
* Master's degree or Ph.D. in Computer Science or a related field.
The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
* Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
This is an exciting time for HPC and AI, as both are undergoing a massive shift: AI technologies are being merged with existing HPC approaches, and both are moving to the cloud. At this critical juncture, we are looking for an AI benchmarking technical lead to join our benchmarking initiative. This team member will be responsible for industry-standard and customer-specific AI application benchmarks on our latest hardware offerings, showcasing the best of Azure. A typical project includes gathering performance data and characteristics for key AI applications, then analyzing and optimizing those applications to run best on Azure HPC infrastructure built on the latest GPUs, CPUs, and other accelerators. The benchmarking team works closely with product management and engineering, and is engaged in key customer performance evaluations.
A successful candidate will have experience with AI training and validation workloads and an interest in large-scale applications (think supercomputer scale). The candidate should be a self-starter who is willing and able to mentor junior members of the team and provide training to the field team. Finally, the candidate must be coachable and a team player.