This tool converts NSRL-CAID JSON files to UCO (Unified Cyber Ontology) format. It maps NSRL-CAID media objects to `uco-observable:File` objects, with facets and relationships that record data provenance.
NSRL-CAID JSON files are available from https://www.nist.gov/itl/ssd/software-quality-group/national-software-reference-library-nsrl/nsrl-download/non-rds-hash
# Convert a single file
python nsrl_to_uco.py data/NSRL-CAID-WMV.json
# Convert all files in a directory
python nsrl_to_uco.py data/
# Write output to a custom folder
python nsrl_to_uco.py data/ -o custom_output/
When using the --combine flag, the script will:
- Create individual UCO files for each input
- Create an additional combined graph file
Using default output folder:
python nsrl_to_uco.py data/ --combine
Using custom output folder:
python nsrl_to_uco.py data/ -o custom_output/ --combine
# Enable debug logging
python nsrl_to_uco.py data/ --debug
# Write logs to file
python nsrl_to_uco.py data/ --log-file conversion.log
# Enable output validation
python nsrl_to_uco.py data/ --validate
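As a rough illustration of how the flags shown above fit together, the following is a minimal `argparse` sketch. It is not taken from `nsrl_to_uco.py`: the function name `build_parser`, the `dest` names, and the default output folder are assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the examples above."""
    parser = argparse.ArgumentParser(prog="nsrl_to_uco.py")
    parser.add_argument("input", help="NSRL-CAID JSON file, or a directory of them")
    # The default folder name "output" is an assumption for this sketch.
    parser.add_argument("-o", dest="output", default="output", help="output folder")
    parser.add_argument("--combine", action="store_true", help="also write a combined graph")
    parser.add_argument("--debug", action="store_true", help="enable debug logging")
    parser.add_argument("--log-file", default=None, help="write logs to this file")
    parser.add_argument("--validate", action="store_true", help="validate the output")
    return parser


# Parse the same arguments as the "custom output folder" combine example.
args = build_parser().parse_args(["data/", "-o", "custom_output/", "--combine"])
print(args.input, args.output, args.combine)
```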
The converter maps NSRL-CAID objects to UCO objects following these rules:
- NSRL Media objects become `uco-observable:File` objects
  - MediaID becomes the object's `@id` with prefix "kb:media-"
  - Category maps to `uco-observable:categories`
  - Includes FileFacet with:
    - File name and path
    - Size in bytes from MediaSize
    - MD5 hash from MediaFile (as `xsd:hexBinary`)
    - SHA1 hash from parent Media object (as `xsd:hexBinary`)
    - Hash methods using `uco-vocabulary:HashNameVocab`
- Each MediaFile becomes a `uco-observable:File` object
  - Includes FileFacet with filename and filepath
  - Includes HashFacet with MD5 hash (as `xsd:hexBinary`)
- All timestamps in UTC format
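The Media rules above could be sketched roughly as follows. This is an illustration, not the converter's code: the input keys `MediaFiles`, `FileName`, `FilePath`, `MD5` and `SHA1`, the helper names, and the exact placement of the hash nodes inside the facet are assumptions; only `MediaID`, `Category`, `MediaSize` and the `kb:media-` prefix come from the list above.

```python
from typing import Any


def hash_node(method: str, value: str) -> dict[str, Any]:
    """Build a hash node typed as xsd:hexBinary with a HashNameVocab method."""
    return {
        "@type": "uco-types:Hash",
        "uco-types:hashMethod": {"@type": "uco-vocabulary:HashNameVocab", "@value": method},
        "uco-types:hashValue": {"@type": "xsd:hexBinary", "@value": value},
    }


def media_to_file(media: dict[str, Any]) -> dict[str, Any]:
    """Map one NSRL-CAID Media record to a uco-observable:File node (sketch)."""
    media_file = media["MediaFiles"][0]  # assumed: at least one MediaFile per Media
    return {
        "@id": f"kb:media-{media['MediaID']}",
        "@type": "uco-observable:File",
        "uco-observable:categories": media["Category"],
        "uco-core:hasFacet": [
            {
                "@type": "uco-observable:FileFacet",
                "uco-observable:fileName": media_file["FileName"],
                "uco-observable:filePath": media_file["FilePath"],
                "uco-observable:sizeInBytes": media["MediaSize"],
                # Assumed layout: hashes listed alongside the other facet fields.
                "uco-observable:hash": [
                    hash_node("MD5", media_file["MD5"]),
                    hash_node("SHA1", media["SHA1"]),
                ],
            }
        ],
    }


# Made-up record for illustration only.
sample = {
    "MediaID": 42,
    "Category": 0,
    "MediaSize": 0,
    "SHA1": "DA39A3EE5E6B4B0D3255BFEF95601890AFD80709",
    "MediaFiles": [
        {"MD5": "D41D8CD98F00B204E9800998ECF8427E", "FileName": "clip.wmv", "FilePath": "samples/"}
    ],
}
uco_file = media_to_file(sample)
print(uco_file["@id"])
```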
Each input file produces a corresponding UCO JSON-LD file containing:
- Full UCO context definitions with all required namespaces
- Bundle with tool, organization, and source objects
- File objects with appropriate facets
- Provenance relationships with timestamps
- Compliant with UCO 1.3.0 specification
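To make the "provenance relationships with timestamps" item more concrete, here is a minimal sketch of one relationship node with a UTC timestamp. The identifiers `kb:tool-nsrl-to-uco` and `kb:relationship-...`, the relationship kind `Created_By`, and the helper names are hypothetical; the converter's actual node layout may differ.

```python
import uuid
from datetime import datetime, timezone


def utc_timestamp() -> dict:
    """Return the current time as an xsd:dateTime literal in UTC."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"@type": "xsd:dateTime", "@value": stamp}


def provenance_relationship(file_id: str, tool_id: str) -> dict:
    """Link a File object to the tool that produced it (illustrative layout)."""
    return {
        "@id": f"kb:relationship-{uuid.uuid4()}",  # hypothetical id scheme
        "@type": "uco-core:Relationship",
        "uco-core:source": {"@id": file_id},
        "uco-core:target": {"@id": tool_id},
        "uco-core:kindOfRelationship": "Created_By",  # assumed relationship kind
        "uco-core:isDirectional": True,
        "uco-core:startTime": utc_timestamp(),
    }


rel = provenance_relationship("kb:media-42", "kb:tool-nsrl-to-uco")
```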
When using --combine, an additional uco-combined.json is created containing:
- Single UCO context with all namespaces
- All bundles from individual files
- Preserved relationships and provenance
- Validated against CASE/UCO standards
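One simple way to build such a combined file is to merge the contexts and concatenate the graphs. The sketch below assumes each individual output is a JSON-LD document with a dictionary-valued `@context` and a top-level `@graph` list; the function name `combine_graphs` is hypothetical and the real script may merge differently.

```python
def combine_graphs(documents: list) -> dict:
    """Merge several JSON-LD documents into one context and one graph."""
    combined = {"@context": {}, "@graph": []}
    for doc in documents:
        # Later documents overwrite identical prefixes; they are expected to agree.
        combined["@context"].update(doc.get("@context", {}))
        combined["@graph"].extend(doc.get("@graph", []))
    return combined


# Two tiny made-up documents standing in for per-file outputs.
doc_a = {"@context": {"kb": "http://example.org/kb/"}, "@graph": [{"@id": "kb:bundle-a"}]}
doc_b = {"@context": {"kb": "http://example.org/kb/"}, "@graph": [{"@id": "kb:bundle-b"}]}
merged = combine_graphs([doc_a, doc_b])
```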
- Python 3.9+
- CASE Utilities Python package (`case_utils`)
- Standard library
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
.\venv\Scripts\Activate.ps1 # Windows
# Install dependencies
pip install case_utils
- Reports processing errors for individual files
- Continues processing remaining files if one fails
- Creates output directory if it doesn't exist
- Detailed logging with optional file output
- Progress tracking for batch processing
- Validation error reporting
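The batch behaviour listed above (create the output directory, log progress, keep going after a failure) can be sketched as below. This is an illustration under assumptions: `convert_all`, the `convert` callback, the logger name, and the `-uco.json` output suffix are hypothetical names chosen for the example.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("nsrl_to_uco_sketch")


def convert_all(input_dir, output_dir, convert) -> list:
    """Convert every *.json file in input_dir; return the paths that failed."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the output directory if missing
    inputs = sorted(Path(input_dir).glob("*.json"))
    failed = []
    for index, path in enumerate(inputs, start=1):
        logger.info("[%d/%d] %s", index, len(inputs), path.name)  # progress tracking
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
            result = convert(data)
            target = out / f"{path.stem}-uco.json"  # assumed naming scheme
            target.write_text(json.dumps(result, indent=2), encoding="utf-8")
        except Exception as exc:  # broad on purpose: one bad file must not stop the batch
            logger.error("Failed to convert %s: %s", path.name, exc)
            failed.append(path)
    return failed
```

Returning the list of failed paths lets the caller report a summary once the whole batch has finished.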
The tool uses CASE Utilities to validate output against UCO 1.3.0:
- Validates JSON-LD structure
- Checks UCO ontology compliance
- Verifies relationship integrity
- Ensures proper timestamp formats
- Reports validation errors
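To give a sense of what a relationship integrity check involves, the sketch below looks for relationships whose source or target does not exist in the graph. It is a standalone illustration and not how CASE Utilities performs validation; it assumes a flat list of nodes in which each relationship carries single `uco-core:source` and `uco-core:target` references.

```python
def dangling_references(graph: list) -> list:
    """Return (relationship id, property, missing id) for unresolved references."""
    known = {node["@id"] for node in graph if "@id" in node}
    missing = []
    for node in graph:
        if node.get("@type") != "uco-core:Relationship":
            continue
        for key in ("uco-core:source", "uco-core:target"):
            ref = node.get(key, {}).get("@id")
            if ref not in known:
                missing.append((node.get("@id"), key, ref))
    return missing


# Made-up graph: the relationship points at a tool node that is not present.
graph = [
    {"@id": "kb:media-42", "@type": "uco-observable:File"},
    {
        "@id": "kb:relationship-1",
        "@type": "uco-core:Relationship",
        "uco-core:source": {"@id": "kb:media-42"},
        "uco-core:target": {"@id": "kb:tool-missing"},
    },
]
problems = dangling_references(graph)
```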
For details on how NSRL-CAID fields map to UCO, see the Data Model Design Document.