Research Software Engineer - Clinical NLP Specialty (Data Science and AI Institute)
About the position

Responsibilities
• The successful candidates will participate in ground-breaking research projects that need advanced software solutions requiring software engineering expertise not commonly found in scientific collaborations.
• The projects will require development of state-of-the-art clinical NLP solutions using the latest deep learning libraries, trained on state-of-the-art hardware in secure healthcare computing environments.
• Projects will involve analysis of massive data sets, either in the cloud or on premises.
• Projects will require development of novel NLP software pipelines for processing unstructured clinical notes.
• Some projects may require deep engagement, possibly leading to co-authorship on scientific publications, while others may involve a more casual consulting engagement.
• Projects may require software solutions developed from scratch, or refactoring of existing solutions to make them conform to industry standards (quality, efficiency, reusability, robustness, portability, documentation, etc.).
• It is a high-level goal of DSAI to translate the efforts for the individual projects into frameworks and template patterns for sustainable scientific infrastructure benefiting future projects.

Requirements
• Strong NLP, LLM, machine learning and deep learning skills.
• Practical experience building NLP models and pipelines in a secure, HIPAA-compliant healthcare environment.
• Expert-level knowledge of multiple modern NLP and LLM libraries and models.
• Hands-on experience adapting and fine-tuning large language models for domain-specific clinical applications, with attention to data efficiency, interpretability, and reproducibility.
• Demonstrated expertise in prompt engineering, evaluation, and benchmarking of large language models, including applying responsible AI principles in clinical or sensitive-data contexts.
• Expert-level knowledge of the Python programming language.
• Familiarity with, or willingness to learn, C++ or other languages as needed.
• Familiarity with software containerization technologies such as Docker and Singularity.
• Familiarity with the Databricks platform.
• Fluency in the Linux operating system and related tools.
• Familiarity with modern software engineering best practices, such as Git source control, peer code review, test-driven development, build automation and continuous integration / continuous delivery.
• Familiarity with cloud development and deployment.
• Demonstrated leadership and self-direction.
• Willingness to teach others, both informally and in short-course format.
• Willingness to continually learn new tools and techniques as needed.
• Excellent verbal and written communication.
• Master's degree in a quantitative discipline such as computer science, engineering, physics or bioinformatics, with a strong scientific computing and/or mathematics background.
• Three years' experience working in software development on large clinical NLP projects in industry or academia.
• Additional education may substitute for required experience, and additional related experience may substitute for required education beyond a high school diploma/graduation equivalent, to the extent permitted by the JHU equivalency formula.

Nice-to-haves
• PhD in a quantitative discipline.
• Five (5) years' experience as above in clinical NLP.
• Experience in CUDA GPU programming.
• Experience authoring open-source Python packages on PyPI.
• Experience in open-source project governance.
• Experience in open-source community adoption initiatives.