Enabling AI with Data Cards
- By: JAIC Public Affairs
Embracing new data-driven concepts and leveraging commercial-sector innovations will improve military operations and increase lethality. – The DOD Data Strategy
Data is one of the DoD’s most strategic assets, and it fuels AI. As such, data readiness and traceability are vital to the rapid development of effective and reliable AI capabilities. To put data to use effectively, JAIC data scientists must work with DoD commanders and managers to track how data evolves throughout the product development process, which ensures traceability. The goal is to enable AI development and provide DoD customers with the necessary AI-enabling infrastructure, tools, and expertise for an American strategic advantage.
"AI is not a single technology, it is a collection of many different technologies and techniques across multiple data types and domains,” explained Nand Mulchandani, the JAIC’s Chief Technology Officer. One of those technologies, machine learning (ML), requires a robust data pipeline at its foundation to be truly impactful. Building a data pipeline enables ML models to operate and make decisions. However, the data used in those models can naturally change over time. Due to these naturally-occurring changes, models often need to be re-trained with new data. Data cards provide future developers the structured framework and detailed overview of an existing model’s uses and limitations. They enhance, hasten, and secure the development process by:
- Providing dataset summaries, characteristics, intended use cases, labeling methods and procedures, and validation methods and descriptions
- Enabling faster reproducibility of past data models by reducing barriers to entry through listing previous features
- Delivering AI assurance and transparency throughout the data card model reproduction process by clearly displaying changes and identifying anomalies and potential ethical issues in usage, data representation, and model structure
- Documenting the model’s performance, biases, and ethical issues
- Supplying a catalog of datasets for DoD partners to browse and use
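The items listed above can be pictured as fields in a structured record. The following is a minimal sketch in Python, not the JAIC’s actual data card format: the class name `DataCard`, every field name, the helper `search_catalog`, and the example values are hypothetical placeholders chosen only to mirror the list.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DataCard:
    """Hypothetical data card record; all field names are illustrative assumptions."""
    name: str
    summary: str
    characteristics: dict
    intended_use_cases: list
    labeling_method: str
    validation_method: str
    known_biases: list = field(default_factory=list)
    ethical_issues: list = field(default_factory=list)
    contains_private_data: bool = False

    def missing_fields(self):
        """Return the names of required text fields that were left empty."""
        required = ["name", "summary", "labeling_method", "validation_method"]
        return [f for f in required if not getattr(self, f).strip()]

    def to_json(self):
        """Serialize the card to a readable format for storage and exchange."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def search_catalog(catalog, keyword):
    """Return the cards whose summary or intended use cases mention the keyword."""
    kw = keyword.lower()
    return [
        card for card in catalog
        if kw in card.summary.lower()
        or any(kw in use.lower() for use in card.intended_use_cases)
    ]


# Placeholder values only; they do not describe any real dataset.
example_card = DataCard(
    name="example-imagery-set",
    summary="Placeholder summary of a labeled imagery dataset",
    characteristics={"records": 1000, "format": "PNG"},
    intended_use_cases=["object detection prototyping"],
    labeling_method="manual annotation",
    validation_method="held-out review sample",
)

if __name__ == "__main__":
    print(example_card.to_json())
    print(search_catalog([example_card], "detection"))
```

In a sketch like this, a check such as `missing_fields` is one way to catch an incomplete card before it enters a shared catalog, and a flag such as `contains_private_data` could signal that a model trained on the dataset needs privacy safeguards.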
is a readable format used for transmitting, structuring, storing, and exchanging data
A MULTILAYER PERCEPTRON (MLP) is a deep, artificial neural network that detects data features
is a popular, versatile supervised learning algorithm that allows rapid identification of significant information from vast datasets
Data cards benefit the JAIC and its DoD partners by underscoring responsible AI, in which everyone works toward creating systems that are nonpartisan, transparent, and accessible. This awareness also cultivates confidence in, and insight into, AI systems while meeting system requirements. If the data is private to any group, then any model trained on that data needs privacy safeguards. Any security threat identified in the data card can also help the JAIC’s DoD partners keep their systems protected.
Product managers can evaluate the quality of the data and whether it is sufficient to represent the objectives being modeled. They can also establish a data collection timeline that is consistent with model development and project requirements. The internally customized data cards provide a library of real data, allowing our DoD partners to use genuine, not synthetic, data in their initial development. Over time, data cards will help the JAIC’s customers develop their AI capabilities more effectively and rapidly by reducing barriers to entry and using repeatable processes.
The use of data cards is key to accelerating the adoption of safe and ethical AI into DoD operations. Ultimately, these AI-enabled solutions will provide data-driven and data-informed decision-making at speed, creating strategic and operational advantages for the Warfighter.