Handles and preprocesses large and diverse datasets through data cleaning, transformation, and feature engineering.
Key Behaviors
- Demonstrates an understanding of when and why data transformations are necessary.
- Organizes, stores, and maintains data in a way that supports efficient access, protects security, and accommodates different data types and characteristics.
- Aggregates data from multiple sources, creates crosswalks between them, and uses standard organization and structuring techniques to manage data.
- Performs quality checks on datasets to detect and address any issues before storage, analysis, or dissemination.
- Prepares data and ensures its suitability for the intended use (e.g., storage, analysis, dissemination, or modeling) by cleaning, wrangling, and formatting the data.
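As a rough illustration of several of these behaviors together, the minimal Python sketch below runs a quality check, applies a simple transformation, and joins two sources through a crosswalk. Everything in it is hypothetical and chosen only for illustration: the sample records, the field names (site_code, visits, lab_site, tests), the CROSSWALK mapping, and the helper functions quality_check and combine. It is one possible approach under those assumptions, not a prescribed method.

```python
# Hypothetical records from two sources that identify sites differently.
clinic_rows = [
    {"site_code": "A01", "visits": "120"},
    {"site_code": "A02", "visits": ""},      # missing value
    {"site_code": "A01", "visits": "120"},   # exact duplicate of the first row
]
lab_rows = [
    {"lab_site": "north", "tests": 45},
    {"lab_site": "south", "tests": 30},
]

# Hypothetical crosswalk: maps the lab's site names to the clinic's site codes.
CROSSWALK = {"north": "A01", "south": "A02"}


def quality_check(rows, required):
    """Split rows into clean rows and a list of (row index, problem) issues."""
    seen, clean, issues = set(), [], []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            issues.append((i, "duplicate row"))
            continue
        seen.add(key)
        missing = [field for field in required if row.get(field) in ("", None)]
        if missing:
            issues.append((i, "missing " + ", ".join(missing)))
            continue
        clean.append(row)
    return clean, issues


def combine(clinic, lab, crosswalk):
    """Join lab rows onto clinic rows by translating lab sites via the crosswalk."""
    tests_by_code = {crosswalk[r["lab_site"]]: r["tests"] for r in lab}
    return [
        {
            "site_code": r["site_code"],
            "visits": int(r["visits"]),  # transformation: text to integer
            "tests": tests_by_code.get(r["site_code"]),
        }
        for r in clinic
    ]


# Check quality first, then combine only the rows that passed.
clean_clinic, issues = quality_check(clinic_rows, required=["site_code", "visits"])
combined = combine(clean_clinic, lab_rows, CROSSWALK)
```

In this sketch, rows that fail the check are reported in issues rather than silently dropped, so they can be reviewed and addressed before the data is stored, analyzed, or disseminated.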
Developmental opportunities for this competency are available from the NIH Training Center.