Polished Transmission Error Dataset Generators
Overview
The repository maintains two complete copies of the polished transmission-error dataset generator:
data/generate_polished_dataset.py
scripts/datasets/generate_polished_transmission_error_dataset.py
The first copy is standalone and does not import repository modules. The second copy is integrated into the repository path layout. Their processing logic is identical.
Intentional Path Difference
The standalone script resolves paths relative to its own location:
input = ./original_dataset
output = ./generated_polished_dataset
This means that the following portable bundle works on another computer:
data/
    generate_polished_dataset.py
    original_dataset/
    polished_dataset/
Run it with:
python data/generate_polished_dataset.py
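Resolving a path from the script's own location, rather than from the current working directory, might look like the sketch below. It is an illustration only and not the script's actual code: the helper name `resolve_beside_script` is hypothetical, and only the output folder name is taken from the text above.

```python
import os
from pathlib import Path


def resolve_beside_script(script_file: str, relative: str) -> Path:
    """Resolve `relative` against the folder that contains `script_file`.

    Hypothetical helper: the working directory is only consulted if
    `script_file` itself is relative, so calling it with an absolute
    script path gives the same result from any launch folder.
    """
    script_dir = Path(os.path.abspath(script_file)).parent
    # normpath collapses "." and ".." segments without touching the disk.
    return Path(os.path.normpath(script_dir / relative))


# Inside the standalone script this would be called with __file__, e.g.:
# output_dir = resolve_beside_script(__file__, "./generated_polished_dataset")
```

Because the base folder comes from the script file, the bundle can be copied anywhere and launched from any directory.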
The repository copy resolves paths from the repository root:
input = data/original_dataset
output = output/generated_polished_dataset
Run it with:
conda run --no-capture-output -n pinns_env python scripts/datasets/generate_polished_transmission_error_dataset.py
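One common way to locate the repository root from a script stored under scripts/datasets/ is to walk up a fixed number of parent folders. The sketch below assumes that layout; the function name `repository_root` and the `levels_up` parameter are hypothetical and may not match the actual implementation.

```python
from pathlib import Path


def repository_root(script_file: str, levels_up: int = 2) -> Path:
    """Return the folder `levels_up` levels above the script's folder.

    Assumption: the script sits at scripts/datasets/<name>.py, so the
    repository root is two levels above the folder that contains it.
    """
    return Path(script_file).resolve().parents[levels_up]


# Inside the repository script this would be called with __file__, e.g.:
# root = repository_root(__file__)
# input_dir = root / "data" / "original_dataset"
# output_dir = root / "output" / "generated_polished_dataset"
```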
Configuration Constants
Both scripts retain the same editable constants:
USE_FORWARD_DIRECTION = True
USE_BACKWARD_DIRECTION = True
OVERWRITE_EXISTING_FILES = False
Set one direction flag to False when only one direction is required.
Existing destination files are protected by default. Set
OVERWRITE_EXISTING_FILES = True only when replacing generated files is
intentional.
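The sketch below shows one way these flags could drive the run. The constant names come from the scripts; the helper names, the direction labels, and the decision to raise when both directions are disabled are assumptions made for illustration.

```python
from pathlib import Path
from typing import List

USE_FORWARD_DIRECTION = True
USE_BACKWARD_DIRECTION = True
OVERWRITE_EXISTING_FILES = False


def selected_directions(use_forward: bool, use_backward: bool) -> List[str]:
    """Return the enabled direction labels (labels are illustrative)."""
    directions = []
    if use_forward:
        directions.append("forward")
    if use_backward:
        directions.append("backward")
    if not directions:
        # Assumption: disabling both directions is a configuration error.
        raise ValueError("enable at least one direction")
    return directions


def may_write(destination: Path, overwrite: bool) -> bool:
    """Allow writing unless the file exists and overwriting is disabled."""
    return overwrite or not destination.exists()
```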
Progress and verbose terminal reporting are controlled by:
SHOW_PROGRESS_BAR = True
VERBOSE_LOGGING = True
The scripts print their resolved paths and direction configuration, report the
accepted source inventory, display a tqdm progress bar with the current
filename, and print a final export summary. Skip and error messages use
tqdm.write() so they do not corrupt the live bar.
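The reporting pattern described above can be sketched as follows. This is not the scripts' code: `run_with_progress` and `process_one` are hypothetical names, the lazy tqdm import is a choice of this sketch, and treating `ValueError` as the skip signal is an assumption.

```python
from pathlib import Path
from typing import Callable, Iterable

SHOW_PROGRESS_BAR = True  # mirrors the documented constant


def run_with_progress(
    files: Iterable[Path],
    process_one: Callable[[Path], None],
    show_progress_bar: bool = SHOW_PROGRESS_BAR,
) -> int:
    """Process every file and return the number of skipped files."""
    files = list(files)
    if show_progress_bar:
        from tqdm import tqdm  # third-party; imported lazily in this sketch

        bar = tqdm(files, unit="file")
        report = tqdm.write  # prints above the live bar without breaking it
    else:
        bar = files
        report = print
    skipped = 0
    for path in bar:
        if show_progress_bar:
            bar.set_postfix_str(path.name)  # show the current filename
        try:
            process_one(path)
        except ValueError as exc:  # assumed skip signal for this sketch
            skipped += 1
            report(f"SKIP {path.name}: {exc}")
    return skipped
```

A caller could turn a nonzero return value into a failing exit status so that skips are not reported as success.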
The standalone script requires:
python -m pip install "tqdm>=4.67,<5.0"
The repository environment installs the same dependency through
requirements.txt.
Confirmed Corrections
The current scripts include these approved corrections:
removed workstation-specific absolute paths;
made the standalone defaults relative to the script location;
made repository defaults relative to the repository root;
made raw-row parsing atomic, preventing partially numeric rows from producing columns with different lengths;
added checks for missing or invalid input and output paths;
rejected identical input and output paths;
rejected empty supported-source inventories;
preflighted all planned destinations before writing, preventing silent replacement and collision-driven partial output;
made processing skips fail the final run instead of reporting success;
replaced the broad generated-folder prefix rule with explicit known folder names.
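Two of these corrections, atomic row parsing and destination preflighting, can be illustrated with the sketch below. The function names, signatures, and exception types are assumptions chosen for the example and are not taken from the scripts.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Sequence


def parse_row_atomic(fields: Sequence[str]) -> Optional[List[float]]:
    """Convert every field of a row, or return None if any field fails.

    Nothing reaches the column lists until the whole row has parsed,
    so all columns keep the same length.
    """
    try:
        return [float(field) for field in fields]
    except ValueError:
        return None


def preflight_destinations(destinations: Iterable[Path], overwrite: bool) -> None:
    """Raise before any write if outputs collide or a file would be replaced."""
    seen = set()
    for destination in destinations:
        if destination in seen:
            raise ValueError(f"two outputs map to {destination}")
        seen.add(destination)
        if destination.exists() and not overwrite:
            raise FileExistsError(f"{destination} already exists")
```

Running the preflight over the full list of planned destinations before the first write means a collision or an existing file stops the run before any output is produced.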
Preserved Historical Behavior
The following behavior remains unchanged:
gear ratio 81;
sample time 0.25 ms;
forward and backward DataValid selection;
output-side zeroing and cluster correction;
first-sample theta_dot convention;
duplicate-condition selection;
ignored historical filename list;
output hierarchy, filenames, header, numeric precision, and equations.
The ignored source list still matches filenames globally rather than by full relative path. This is intentional because those names identify the known historical exceptions in the original dataset.
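Global filename matching can be sketched as below. The set contents are placeholders for illustration only, and `is_ignored` is a hypothetical name; the real ignored list lives in the scripts.

```python
from pathlib import Path

# Placeholder entry for illustration; not a real filename from the dataset.
IGNORED_FILENAMES = {"example_ignored_run.csv"}


def is_ignored(source: Path) -> bool:
    """Match on the bare filename, regardless of the folder it sits in."""
    return source.name in IGNORED_FILENAMES
```

With this rule, a listed filename is ignored wherever it appears in the source tree.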
Verification
Both scripts compile in pinns_env. A controlled test using:
data/original_dataset/Test_35deg/1800rpm/1800.0rpm0.0Nm35.0deg.csv
generated both directions from each implementation. The output files of the standalone and repository copies were byte-identical to each other and to the corresponding tracked polished CSV files.
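A byte-level comparison of this kind can be reproduced with the standard library. The sketch below is one possible check, not necessarily the one used for the verification above; `byte_identical` is a hypothetical helper name.

```python
import filecmp
from pathlib import Path


def byte_identical(first: Path, second: Path) -> bool:
    """Compare file contents byte by byte rather than by size and timestamp."""
    return filecmp.cmp(first, second, shallow=False)
```

Passing `shallow=False` forces a content comparison, so two files with matching size and modification time but different bytes are still reported as different.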