Shared Training Infrastructure
This page documents the reusable infrastructure that resolves transmission-error (TE) experiment identity, prepares immutable artifact folders, initializes shared Lightning components, and maintains family/program registries.
Shared training utilities for TE run identity, artifacts, and registries.
- class scripts.training.shared_training_infrastructure.ExperimentIdentity(model_family, model_type, run_name)
Logical experiment identity resolved from a training configuration.
- Parameters:
model_family (str)
model_type (str)
run_name (str)
- class scripts.training.shared_training_infrastructure.ModelParameterSummary(backbone_name, trainable_parameter_count, frozen_parameter_count, total_parameter_count)
Trainable, frozen, and total parameter counts for one backbone.
- Parameters:
backbone_name (str)
trainable_parameter_count (int)
frozen_parameter_count (int)
total_parameter_count (int)
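A minimal sketch of how the three counts relate, assuming each parameter exposes numel() and requires_grad as torch.nn.Parameter does. The dataclass mirrors the documented fields; summarize_backbone_parameters is a hypothetical helper, not part of this module.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Protocol


class ParameterLike(Protocol):
    """Structural type matched by torch.nn.Parameter: numel() plus requires_grad."""

    requires_grad: bool

    def numel(self) -> int: ...


@dataclass(frozen=True)
class ModelParameterSummary:
    backbone_name: str
    trainable_parameter_count: int
    frozen_parameter_count: int
    total_parameter_count: int


def summarize_backbone_parameters(
    backbone_name: str, parameters: Iterable[ParameterLike]
) -> ModelParameterSummary:
    """Hypothetical helper: split element counts by whether they receive gradients."""
    trainable_count = 0
    frozen_count = 0
    for parameter in parameters:
        # Parameters with requires_grad=False are not updated by the optimizer.
        if parameter.requires_grad:
            trainable_count += parameter.numel()
        else:
            frozen_count += parameter.numel()
    return ModelParameterSummary(
        backbone_name=backbone_name,
        trainable_parameter_count=trainable_count,
        frozen_parameter_count=frozen_count,
        total_parameter_count=trainable_count + frozen_count,
    )
```

With PyTorch, backbone.parameters() can be passed directly, so the total always equals trainable plus frozen by construction.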
- class scripts.training.shared_training_infrastructure.RunArtifactIdentity(artifact_kind, model_family, run_name, run_instance_id, output_directory)
Physical artifact identity used for immutable output folders.
- Parameters:
artifact_kind (str)
model_family (str)
run_name (str)
run_instance_id (str)
output_directory (Path)
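The split between a logical run name and a physical run instance can be sketched as follows. The folder layout <root>/<artifact_kind>/<model_family>/<run_name>/<run_instance_id> and the timestamp-plus-random-suffix ID format are assumptions for illustration; new_run_instance_id and build_run_artifact_identity are hypothetical helpers.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class RunArtifactIdentity:
    artifact_kind: str
    model_family: str
    run_name: str
    run_instance_id: str
    output_directory: Path


def new_run_instance_id(now: Optional[datetime] = None) -> str:
    """Hypothetical ID format: UTC timestamp plus 8 random hex characters."""
    timestamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    # The random suffix keeps two launches within the same second apart.
    return f"{timestamp}-{uuid.uuid4().hex[:8]}"


def build_run_artifact_identity(
    output_root: Path,
    artifact_kind: str,
    model_family: str,
    run_name: str,
    run_instance_id: str,
) -> RunArtifactIdentity:
    # Hypothetical layout: <root>/<artifact_kind>/<model_family>/<run_name>/<run_instance_id>
    output_directory = output_root / artifact_kind / model_family / run_name / run_instance_id
    return RunArtifactIdentity(artifact_kind, model_family, run_name, run_instance_id, output_directory)
```

Because every launch gets a fresh run_instance_id, repeated runs with the same run_name land in sibling folders instead of overwriting each other, and frozen=True prevents the identity from being mutated after creation.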
- scripts.training.shared_training_infrastructure.load_training_config(config_path=DEFAULT_CONFIG_PATH)
Load and validate a YAML training configuration.
- Parameters:
config_path (str | Path) – Absolute or project-relative configuration path.
- Returns:
Parsed training configuration dictionary.
- Return type:
dict[str, Any]
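A sketch of the path-resolution and validation steps around YAML parsing. It assumes a project root is known and that a handful of top-level sections are required; the section names and both helpers are hypothetical. The parsing step itself would typically be yaml.safe_load(path.read_text()) with PyYAML and is omitted so the sketch runs on the standard library alone.

```python
from __future__ import annotations

from collections.abc import Mapping
from pathlib import Path
from typing import Any

# Hypothetical required sections; the real schema is not documented on this page.
REQUIRED_SECTIONS = ("experiment", "dataset", "model", "training")


def resolve_config_path(config_path: str | Path, project_root: Path) -> Path:
    candidate = Path(config_path)
    # Absolute paths are kept; relative paths are anchored at the project root
    # instead of the current working directory.
    return candidate if candidate.is_absolute() else project_root / candidate


def validate_training_config(raw_config: Any) -> dict[str, Any]:
    # yaml.safe_load returns None for an empty file, so check the type first.
    if not isinstance(raw_config, Mapping):
        raise TypeError(f"Expected a mapping at the top level, got {type(raw_config).__name__}")
    missing_sections = [name for name in REQUIRED_SECTIONS if name not in raw_config]
    if missing_sections:
        raise KeyError(f"Training config is missing sections: {missing_sections}")
    return dict(raw_config)
```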
- scripts.training.shared_training_infrastructure.resolve_experiment_identity(training_config)
Resolve the logical experiment identity from the training config.
- Parameters:
training_config (dict[str, Any]) – Parsed training configuration dictionary.
- Returns:
Normalized model family, model type, and run name.
- Return type:
ExperimentIdentity
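A sketch of resolving and normalizing the logical identity. The config keys (model.type, experiment.model_family, experiment.run_name), the fallback rules, and the normalization scheme are all assumptions; resolve_identity_sketch is a hypothetical stand-in, not the documented function.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ExperimentIdentity:
    model_family: str
    model_type: str
    run_name: str


def normalize_token(value: Any) -> str:
    # Hypothetical normalization: trimmed, lower-case, underscores for spaces and hyphens.
    return str(value).strip().lower().replace(" ", "_").replace("-", "_")


def resolve_identity_sketch(training_config: dict[str, Any]) -> ExperimentIdentity:
    # Hypothetical keys: model.type is required, experiment.* entries are optional.
    experiment_section = training_config.get("experiment", {})
    model_type = normalize_token(training_config["model"]["type"])
    # Fall back to the model type when no explicit family or run name is declared.
    model_family = normalize_token(experiment_section.get("model_family", model_type))
    run_name = normalize_token(experiment_section.get("run_name", model_type))
    return ExperimentIdentity(model_family, model_type, run_name)
```

Normalizing once at resolution time keeps folder names and registry keys consistent regardless of how the YAML spells them.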
- scripts.training.shared_training_infrastructure.prepare_output_artifact_training_config(training_config, artifact_kind=RUN_OUTPUT_ARTIFACT_KIND, run_name_suffix=None, run_instance_id=None)
Attach output-artifact metadata to a cloned training configuration.
- Parameters:
training_config (dict[str, Any]) – Source training configuration.
artifact_kind (str) – Artifact kind, such as a training run or a validation check.
run_name_suffix (str | None) – Optional suffix appended to the logical run name.
run_instance_id (str | None) – Optional explicit immutable run instance identifier.
- Returns:
Cloned training configuration enriched with artifact metadata under the metadata section.
- Return type:
dict[str, Any]
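A sketch of the clone-then-annotate pattern described above. The "training_run" default, the experiment.run_name key, the suffix joining rule, and the metadata field names are assumptions; prepare_artifact_config_sketch is hypothetical.

```python
from __future__ import annotations

import copy
import uuid
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical default; the real RUN_OUTPUT_ARTIFACT_KIND value is not shown on this page.
DEFAULT_ARTIFACT_KIND = "training_run"


def prepare_artifact_config_sketch(
    training_config: dict[str, Any],
    artifact_kind: str = DEFAULT_ARTIFACT_KIND,
    run_name_suffix: Optional[str] = None,
    run_instance_id: Optional[str] = None,
) -> dict[str, Any]:
    # Deep copy so nested sections of the caller's config are never mutated.
    prepared_config = copy.deepcopy(training_config)
    base_run_name = prepared_config["experiment"]["run_name"]  # hypothetical key
    run_name = f"{base_run_name}_{run_name_suffix}" if run_name_suffix else base_run_name
    if run_instance_id is None:
        # Generate a fresh immutable instance ID when the caller does not pin one.
        timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        run_instance_id = f"{timestamp}-{uuid.uuid4().hex[:8]}"
    metadata_section = prepared_config.setdefault("metadata", {})
    metadata_section.update(
        {"artifact_kind": artifact_kind, "run_name": run_name, "run_instance_id": run_instance_id}
    )
    return prepared_config
```

Passing an explicit run_instance_id makes the result reproducible, which is useful in tests or when resuming into a known folder.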
- scripts.training.shared_training_infrastructure.resolve_run_artifact_identity(training_config)
Resolve the full physical artifact identity of a prepared training configuration.
- Parameters:
training_config (dict[str, Any])
- Return type:
RunArtifactIdentity
- scripts.training.shared_training_infrastructure.create_datamodule_from_training_config(training_config)
Instantiate the TE LightningDataModule from the training config.
- Parameters:
training_config (dict[str, Any])
- Return type:
TransmissionErrorDataModule
- scripts.training.shared_training_infrastructure.create_regression_backbone_from_training_config(training_config, input_feature_dim)
Instantiate the regression backbone declared in the config.
- Parameters:
training_config (dict[str, Any]) – Parsed training configuration dictionary.
input_feature_dim (int) – Input dimension resolved from the prepared dataset.
- Returns:
Configured regression backbone.
- Return type:
nn.Module
- scripts.training.shared_training_infrastructure.create_regression_module_from_training_config(training_config, regression_backbone, input_feature_dim, target_feature_dim, normalization_statistics)
Wrap a configured backbone in the shared Lightning regression module.
- Parameters:
training_config (dict[str, Any])
regression_backbone (Module)
input_feature_dim (int)
target_feature_dim (int)
normalization_statistics (NormalizationStatistics)
- Return type:
TransmissionErrorRegressionModule
- scripts.training.shared_training_infrastructure.initialize_training_components(training_config)
Build the datamodule, regression backbone, Lightning regression module, and normalization statistics.
- Parameters:
training_config (dict[str, Any]) – Parsed training configuration dictionary.
- Returns:
Fully initialized training components ready for fit or validation work.
- Return type:
tuple[TransmissionErrorDataModule, nn.Module, TransmissionErrorRegressionModule, NormalizationStatistics]
- scripts.training.shared_training_infrastructure.fetch_first_batch(datamodule, split_name='train')
Fetch the first batch from the requested dataloader split.
- Parameters:
datamodule (TransmissionErrorDataModule)
split_name (str)
- Return type:
dict[str, Any]
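A sketch of pulling one batch by split name through the standard LightningDataModule hooks (train_dataloader, val_dataloader, test_dataloader). The accepted split names and the helper fetch_first_batch_sketch are assumptions; only 'train' is confirmed by the documented default. Outside a Trainer, a datamodule typically needs setup() called before these hooks return usable dataloaders.

```python
from __future__ import annotations

from typing import Any

# Hypothetical mapping from split names to LightningDataModule hook names.
SPLIT_TO_HOOK = {
    "train": "train_dataloader",
    "val": "val_dataloader",
    "validation": "val_dataloader",
    "test": "test_dataloader",
}


def fetch_first_batch_sketch(datamodule: Any, split_name: str = "train") -> Any:
    try:
        hook_name = SPLIT_TO_HOOK[split_name]
    except KeyError:
        raise ValueError(
            f"Unsupported split {split_name!r}; expected one of {sorted(SPLIT_TO_HOOK)}"
        ) from None
    dataloader = getattr(datamodule, hook_name)()
    # Take exactly one batch from a fresh iterator over the dataloader.
    return next(iter(dataloader))
```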
- scripts.training.shared_training_infrastructure.validate_batch_dictionary(batch_dictionary, input_feature_dim, target_feature_dim)
Validate the structural contract of a collated point batch.
- Parameters:
batch_dictionary (dict[str, Any]) – Batch emitted by the datamodule collate function.
input_feature_dim (int) – Expected final input feature dimension.
target_feature_dim (int) – Expected final target feature dimension.
- Returns:
Small structural summary of the validated batch.
- Return type:
dict[str, Any]
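A sketch of the kind of structural contract such a check might enforce: required keys present, at least [batch, features] rank, matching final dimensions, and equal batch sizes. The key names input_tensor and target_tensor and the summary fields are hypothetical; the real collate function may use different ones.

```python
from __future__ import annotations

from typing import Any

# Hypothetical key names; the real collate function may use different ones.
INPUT_KEY = "input_tensor"
TARGET_KEY = "target_tensor"


def validate_batch_sketch(
    batch_dictionary: dict[str, Any], input_feature_dim: int, target_feature_dim: int
) -> dict[str, Any]:
    for key in (INPUT_KEY, TARGET_KEY):
        if key not in batch_dictionary:
            raise KeyError(f"Batch is missing required key {key!r}")
    # .shape is available on both torch tensors and NumPy arrays.
    input_shape = tuple(batch_dictionary[INPUT_KEY].shape)
    target_shape = tuple(batch_dictionary[TARGET_KEY].shape)
    if len(input_shape) < 2 or len(target_shape) < 2:
        raise ValueError(f"Expected [batch, features] tensors, got {input_shape} and {target_shape}")
    if input_shape[-1] != input_feature_dim:
        raise ValueError(f"Input feature dim {input_shape[-1]} != expected {input_feature_dim}")
    if target_shape[-1] != target_feature_dim:
        raise ValueError(f"Target feature dim {target_shape[-1]} != expected {target_feature_dim}")
    if input_shape[0] != target_shape[0]:
        raise ValueError(f"Batch size mismatch: {input_shape[0]} inputs vs {target_shape[0]} targets")
    return {"batch_size": input_shape[0], "input_shape": input_shape, "target_shape": target_shape}
```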
- scripts.training.shared_training_infrastructure.build_common_metrics_snapshot(training_config, config_path, output_directory, datamodule, parameter_summary, runtime_config, best_model_path, validation_metric_list, test_metric_list)
Build the canonical metrics snapshot stored with a training artifact.
- Parameters:
training_config (dict[str, Any])
config_path (str | Path)
output_directory (Path)
datamodule (TransmissionErrorDataModule)
parameter_summary (ModelParameterSummary)
runtime_config (dict[str, object])
best_model_path (str)
validation_metric_list (list[dict[str, object]])
test_metric_list (list[dict[str, object]])
- Return type:
dict[str, object]
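The validation_metric_list and test_metric_list arguments match the shape Lightning's Trainer.validate() and Trainer.test() return: a list with one metrics dictionary per dataloader. A sketch of folding those lists into a snapshot, assuming a single evaluation dataloader; the snapshot section names and both helpers are hypothetical, not the canonical schema.

```python
from __future__ import annotations

from typing import Any


def first_metric_dictionary(metric_list: list[dict[str, Any]]) -> dict[str, float]:
    # With a single dataloader, the first element holds every logged metric.
    if not metric_list:
        return {}
    # float() turns tensor or NumPy scalars into plain floats that serialize cleanly to YAML.
    return {name: float(value) for name, value in metric_list[0].items()}


def build_metrics_snapshot_sketch(
    run_name: str,
    best_model_path: str,
    parameter_counts: dict[str, int],
    validation_metric_list: list[dict[str, Any]],
    test_metric_list: list[dict[str, Any]],
) -> dict[str, object]:
    # Hypothetical section names; the canonical snapshot schema is not shown here.
    return {
        "run_name": run_name,
        "best_model_path": best_model_path,
        "parameter_counts": dict(parameter_counts),
        "validation_metrics": first_metric_dictionary(validation_metric_list),
        "test_metrics": first_metric_dictionary(test_metric_list),
    }
```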
- scripts.training.shared_training_infrastructure.save_yaml_snapshot(snapshot_dictionary, output_path)
Persist one YAML snapshot to disk, creating parent folders as needed.
- Parameters:
snapshot_dictionary (dict[str, Any])
output_path (Path)
- Return type:
None
- scripts.training.shared_training_infrastructure.save_training_config_snapshot(training_config, output_directory)
Persist the effective training configuration inside an artifact folder.
- Parameters:
training_config (dict[str, Any])
output_directory (Path)
- Return type:
None
- scripts.training.shared_training_infrastructure.save_common_metrics_snapshot(metrics_snapshot_dictionary, output_directory)
Persist the common metrics snapshot inside an artifact folder.
- Parameters:
metrics_snapshot_dictionary (dict[str, Any])
output_directory (Path)
- Return type:
None
- scripts.training.shared_training_infrastructure.update_family_registry(metrics_snapshot_dictionary)
Update the family leaderboard and latest-family-best snapshots.
- Parameters:
metrics_snapshot_dictionary (dict[str, Any]) – Common metrics snapshot for one completed training artifact.
- Returns:
Selected best entry for the model family after update.
- Return type:
dict[str, Any]
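A sketch of a family leaderboard update that re-ranks entries and selects the best one. It assumes entries are keyed by run_instance_id, ranked by a single metric where lower is better (for example a validation error), and that re-registering the same instance replaces its old entry; update_leaderboard_sketch and the metric name are hypothetical.

```python
from __future__ import annotations

from typing import Any


def update_leaderboard_sketch(
    leaderboard: list[dict[str, Any]],
    new_entry: dict[str, Any],
    metric_key: str = "val_mae",
    lower_is_better: bool = True,
) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    # Drop any earlier entry for the same run instance so re-registration is idempotent.
    entries = [entry for entry in leaderboard if entry["run_instance_id"] != new_entry["run_instance_id"]]
    entries.append(new_entry)
    # Rank by the selection metric; the first entry becomes the family best.
    entries.sort(key=lambda entry: entry[metric_key], reverse=not lower_is_better)
    return entries, entries[0]
```

Returning the full ranked list alongside the best entry lets the caller write both the leaderboard and the latest-family-best snapshot from one update.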