Transmission Error Dataset Family Reference
Purpose
This document is the canonical reference for the three transmission-error
dataset surfaces stored under data/:
data/original_dataset/
data/simplified_dataset/
data/polished_dataset/
The three roots share the same experimental origin but serve different purposes. They must not be treated as interchangeable file formats.
Dataset Lineage
original_dataset
├── simplified_dataset
└── polished_dataset
    └── generated by generate_polished_dataset.py
original_dataset contains the raw test-rig recordings. Both derived datasets
come from those recordings:
simplified_dataset is the established legacy curve dataset retained for compatibility and historical artifact reproduction;
polished_dataset is a direct row-level export produced by data/generate_polished_dataset.py.
A complete path-adapted repository copy is maintained at
scripts/datasets/generate_polished_transmission_error_dataset.py. The two
implementations have identical processing logic and differ only in their
default path block.
The repository defaults to polished_dataset in
config/datasets/transmission_error_dataset.yaml. The shared loader supports
both schemas through the polished_dataset and simplified_dataset
selectors.
Verified Inventory
The audit performed on June 20, 2026 found:
| Dataset | CSV files | Approximate size | Direction representation |
|---|---|---|---|
| original_dataset | 975 | 11.498 GiB | Both validity channels in each raw file |
| simplified_dataset | 969 | 2.605 GiB | Forward and backward curves in one CSV |
| polished_dataset | 1,938 | 6.737 GiB | One CSV per direction |
The 969 canonical operating conditions are distributed evenly:
323 conditions at nominal 25 degC;
323 conditions at nominal 30 degC;
323 conditions at nominal 35 degC.
polished_dataset therefore contains:
969 files under backward/;
969 files under forward/;
323 files for each direction and nominal temperature combination.
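The counts in this section are mutually consistent, which a few lines of arithmetic can confirm. The variable names below are illustrative only and do not come from the repository.

```python
# Inventory arithmetic using the figures quoted in this section.
conditions_per_temperature = 323
nominal_temperatures_degc = (25, 30, 35)
directions = ("backward", "forward")

# Three nominal temperatures with 323 conditions each.
canonical_conditions = conditions_per_temperature * len(nominal_temperatures_degc)

# polished_dataset stores one CSV per condition and direction.
polished_files = canonical_conditions * len(directions)

print(canonical_conditions)  # 969
print(polished_files)        # 1938
```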
Original Dataset
Role
Use data/original_dataset/ when preprocessing, validity-window extraction,
zeroing, signal interpretation, or provenance must be reconstructed.
Raw Structure
The raw tree is grouped by nominal temperature and motor speed:
data/original_dataset/
  Test_25deg/
    1000rpm/
      1000.0rpm100.0Nm25.0deg.csv
  Test_30deg/
  Test_35deg/
The CSV files are semicolon-delimited and have no header row. The support material in the same directory includes:
Info_DataStructure.pptx, which describes the rig and signal layout;
TE.m, which demonstrates directional filtering and TE computation;
NumDiff.m, which contains the historical MATLAB numerical derivative;
fft_fun.m, which contains the historical harmonic helper.
Raw Columns Used By The Polished Export
Column numbers below are one-based, matching MATLAB and the presentation:
| Raw column | Generator name | Meaning |
|---|---|---|
| 2 | | Cumulative, common-zeroed Renishaw input-side encoder position in degrees |
| 3 | | Cumulative, common-zeroed Renishaw output-side encoder position in degrees |
| 4 | | Measured load/output-side Manner torque in Nm |
| 5 | | Forward valid-window flag |
| 6 | | Backward valid-window flag |
| 8 | | Measured tested-reducer oil temperature in degrees Celsius |
| 11 | | Raw absolute Renishaw output-side encoder position in degrees |
The presentation identifies the Renishaw devices as absolute encoders, but columns 2 and 3 are cumulative multi-turn signals after common software zeroing. They are not the unchanged single-turn absolute readings.
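Because the raw files are headerless and the column numbers are one-based, a reader has to subtract one when indexing in Python. The following is a minimal sketch using only the standard library. The dictionary keys are hypothetical labels chosen for this example, not the generator's identifiers, the sample rows are synthetic, and the sketch assumes decimal points rather than decimal commas.

```python
import csv
import io

# One-based raw column numbers from the table above, keyed by
# illustrative labels (hypothetical, not the generator's names).
RAW_COLUMNS_ONE_BASED = {
    "input_position_deg": 2,
    "output_position_deg": 3,
    "load_torque_nm": 4,
    "forward_valid": 5,
    "backward_valid": 6,
    "oil_temperature_degc": 8,
    "output_absolute_deg": 11,
}


def read_raw_rows(text_stream):
    """Yield one dict per raw row, converting one-based columns to Python indices."""
    reader = csv.reader(text_stream, delimiter=";")  # semicolon-delimited, no header
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        yield {
            name: float(fields[number - 1])
            for name, number in RAW_COLUMNS_ONE_BASED.items()
        }


# Synthetic two-row sample; every value is made up for illustration.
sample = io.StringIO(
    "0;10.0;0.10;-50.0;1;0;0;25.1;0;0;123.4\n"
    "0;10.5;0.11;-50.2;0;1;0;25.1;0;0;123.5\n"
)
rows = list(read_raw_rows(sample))
print(rows[0]["load_torque_nm"], rows[1]["backward_valid"])
```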
Validity Windows
The test procedure runs each operating condition in both motion directions. The PLC activates the corresponding validity channel while the load-side absolute encoder traverses the selected revolution:
raw column 5 selects forward rows;
raw column 6 selects backward rows.
The polished generator accepts every nonzero flag value. It does not merge directions and does not retain transient rows outside the selected windows.
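The selection rule can be sketched as a simple filter. This is an illustration of the behavior described above, not the generator's code; the function name and the synthetic rows are assumptions.

```python
def select_valid_rows(rows, direction):
    """Keep rows whose validity flag for `direction` is nonzero, in original order."""
    # Zero-based indices for one-based raw columns 5 (forward) and 6 (backward).
    flag_index = {"forward": 4, "backward": 5}[direction]
    return [row for row in rows if float(row[flag_index]) != 0.0]


# Synthetic rows: only the two flag positions matter here.
rows = [
    [0, 0, 0, 0, 0, 0],  # transient row, neither window active
    [0, 0, 0, 0, 1, 0],  # forward window
    [0, 0, 0, 0, 2, 0],  # forward window, any nonzero flag is accepted
    [0, 0, 0, 0, 0, 1],  # backward window
]
forward_rows = select_valid_rows(rows, "forward")
backward_rows = select_valid_rows(rows, "backward")
print(len(forward_rows), len(backward_rows))  # 2 1
```

Each direction is filtered separately, so the two selections are never merged and the transient row appears in neither.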
Simplified Dataset
Role
data/simplified_dataset/ remains the compatibility source for legacy
five-feature training and evaluation workflows.
Structure And Schema
Each operating condition has one comma-delimited CSV:
data/simplified_dataset/
  Test_25degree/
    1000rpm/
      1000.0rpm100.0Nm25.0deg.csv
Each file contains both directions:
Poisition_Output_Reducer_Fw,Transmission_Error_Fw,Position_Output_Reducer_Bw,Transmission_Error_Bw
The misspelling Poisition_Output_Reducer_Fw is present in the source files
and is intentionally supported by
scripts/datasets/transmission_error_dataset.py.
The current loader turns the 969 files into 1,938 directional curve samples. It parses nominal speed, torque, and temperature from the path, sorts each direction by reducer-output position, and exposes direction as an explicit model feature.
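The split into two position-sorted directional curves can be sketched as follows. This is not the code in scripts/datasets/transmission_error_dataset.py; the function name is hypothetical, the sample data is synthetic, and the sketch assumes every cell is populated.

```python
import csv
import io

# Column pairs as they appear in the source files, including the
# misspelled forward position header.
DIRECTION_COLUMNS = {
    "forward": ("Poisition_Output_Reducer_Fw", "Transmission_Error_Fw"),
    "backward": ("Position_Output_Reducer_Bw", "Transmission_Error_Bw"),
}


def split_directional_curves(text_stream):
    """Return {direction: [(position, te), ...]} sorted by reducer-output position."""
    rows = list(csv.DictReader(text_stream))
    curves = {}
    for direction, (position_col, te_col) in DIRECTION_COLUMNS.items():
        points = [(float(row[position_col]), float(row[te_col])) for row in rows]
        curves[direction] = sorted(points, key=lambda point: point[0])
    return curves


# Synthetic two-row file; the numbers are made up for illustration.
sample = io.StringIO(
    "Poisition_Output_Reducer_Fw,Transmission_Error_Fw,"
    "Position_Output_Reducer_Bw,Transmission_Error_Bw\n"
    "2.0,0.02,359.0,-0.01\n"
    "1.0,0.01,358.0,-0.02\n"
)
curves = split_directional_curves(sample)
print(curves["forward"][0], curves["backward"][0])
```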
Polished Dataset
Role
data/polished_dataset/ preserves valid time-ordered rows and adds measured
speed, load torque, and oil temperature to every exported sample. It is useful
for temporal modeling, preprocessing audits, signal-level analysis, and future
loaders that need the measured operating state instead of only nominal
filename metadata.
Direction-Separated Structure
data/polished_dataset/
  backward/
    25degree/
      1000rpm/
        1000.0rpm100.0Nm25.0deg.csv
  forward/
    25degree/
      1000rpm/
        1000.0rpm100.0Nm25.0deg.csv
Direction is encoded by the top-level folder and is not repeated as a CSV column.
The filename format is:
<nominal_speed>.0rpm<nominal_torque>.0Nm<nominal_temperature>.0deg.csv
Folder and filename values describe nominal test setpoints. The CSV columns contain measured or derived sample-level values and therefore need not equal the nominal values exactly.
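A consumer that needs the nominal setpoints and the direction can recover both from the path alone. The sketch below is one possible parser written against the layout and filename format shown above; the function name, the returned keys, and the integer-only regular expression are assumptions of this example.

```python
import re
from pathlib import PurePosixPath

# Matches <nominal_speed>.0rpm<nominal_torque>.0Nm<nominal_temperature>.0deg.csv
NAME_PATTERN = re.compile(
    r"^(?P<speed>\d+)\.0rpm(?P<torque>\d+)\.0Nm(?P<temperature>\d+)\.0deg\.csv$"
)


def parse_polished_path(path):
    """Return direction and nominal setpoints encoded in a polished-dataset path."""
    parts = PurePosixPath(path).parts
    direction = next((p for p in parts if p in ("forward", "backward")), None)
    if direction is None:
        raise ValueError(f"no forward/ or backward/ folder in {path!r}")
    match = NAME_PATTERN.match(parts[-1])
    if match is None:
        raise ValueError(f"unexpected filename in {path!r}")
    return {
        "direction": direction,
        "nominal_speed_rpm": int(match["speed"]),
        "nominal_torque_nm": int(match["torque"]),
        "nominal_temperature_degc": int(match["temperature"]),
    }


info = parse_polished_path(
    "data/polished_dataset/forward/25degree/1000rpm/1000.0rpm100.0Nm25.0deg.csv"
)
print(info)
```

The returned values are nominal setpoints only; measured speed, torque, and temperature still have to be read from the CSV columns.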
Verified CSV Schema
Every polished file has exactly:
theta,theta_dot,tau_load,T,theta_TE
| Column | Unit | Classification | Verified meaning |
|---|---|---|---|
| theta | deg | Derived from a measured position | Input-side cumulative Renishaw angle divided by the gear ratio |
| theta_dot | rpm | Derived | Motor/input-side speed calculated from consecutive input-side position samples at the fixed sample time |
| tau_load | Nm | Measured | Signed load/output-side Manner torque from raw column 4 |
| T | degC | Measured | Tested-reducer oil temperature from raw column 8 |
| theta_TE | deg | Derived from measured positions | Transmission error after output-side zeroing correction |
This corrects two potentially misleading shorthand descriptions:
theta originates from the motor/input-side absolute encoder system, but the exported value is common-zeroed, cumulative, ratio-scaled, and wrapped;
theta_TE is not measured by a dedicated TE sensor. It is calculated from the two measured encoder positions.
Generation Equations
Constants:
gear_ratio = 81
sample_time = 0.00025 s
For rows selected by the relevant direction flag:
theta_rad = radians(input_encoder_cumulative_deg) / gear_ratio
theta = degrees(theta_rad modulo 2*pi)
The first theta_dot value uses the difference between the first two selected
samples. Every subsequent value uses the current-minus-previous difference:
dtheta_rad_s[i] = (theta_rad[i] - theta_rad[i - 1]) / sample_time
theta_dot[i] = degrees(dtheta_rad_s[i]) / 6 * gear_ratio
Because theta_rad was divided by the gear ratio and the speed expression
multiplies it back, theta_dot is a motor/input-side speed in rpm.
The output-side zeroing offset uses the first three raw samples:
raw_offset = radians(mean(q_abs_deg[0:3]) - mean(q_enc_deg[0:3]))
q_offset = atan2(sin(raw_offset), cos(raw_offset))
The generator then applies its retained cluster correction:
if q_offset < -0.002 rad: q_offset += 0.0044 rad
if q_offset > 0.002 rad: q_offset -= 0.00415 rad
Finally:
q_not_zeroed_rad = radians(output_encoder_cumulative_deg) + q_offset
theta_TE = degrees(q_not_zeroed_rad - theta_rad)
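The equations above can be collected into one function. This is a minimal standard-library sketch, not a copy of generate_polished_dataset.py: the function and parameter names are hypothetical, the inputs are assumed to be already filtered by the direction flag, and the two cluster-correction conditions are applied in the order they are written above.

```python
import math
from statistics import fmean

GEAR_RATIO = 81          # from the Constants block
SAMPLE_TIME_S = 0.00025  # seconds


def polished_columns(input_deg, output_deg, q_abs_head_deg, q_enc_head_deg):
    """Compute theta, theta_dot, theta_TE for one direction's selected rows.

    input_deg / output_deg: cumulative encoder positions of the selected rows.
    q_abs_head_deg / q_enc_head_deg: first three raw samples used for the offset.
    """
    if len(input_deg) < 2 or len(input_deg) != len(output_deg):
        raise ValueError("need at least two samples of equal length")

    theta_rad = [math.radians(v) / GEAR_RATIO for v in input_deg]
    theta = [math.degrees(t % (2.0 * math.pi)) for t in theta_rad]

    # Current-minus-previous differences; the first value reuses the
    # difference between the first two selected samples.
    steps = [b - a for a, b in zip(theta_rad, theta_rad[1:])]
    steps.insert(0, steps[0])
    theta_dot = [math.degrees(s / SAMPLE_TIME_S) / 6.0 * GEAR_RATIO for s in steps]

    raw_offset = math.radians(fmean(q_abs_head_deg[:3]) - fmean(q_enc_head_deg[:3]))
    q_offset = math.atan2(math.sin(raw_offset), math.cos(raw_offset))
    # Retained cluster correction, mirrored statement by statement.
    if q_offset < -0.002:
        q_offset += 0.0044
    if q_offset > 0.002:
        q_offset -= 0.00415

    theta_te = [
        math.degrees(math.radians(q) + q_offset - t)
        for q, t in zip(output_deg, theta_rad)
    ]
    return theta, theta_dot, theta_te


# Synthetic check: 1.5 deg of input travel per sample corresponds to 1000 rpm,
# and an ideal output (input / 81) with zero offset gives zero transmission error.
input_deg = [1.5 * i for i in range(4)]
output_deg = [v / GEAR_RATIO for v in input_deg]
theta, theta_dot, theta_te = polished_columns(
    input_deg, output_deg, [10.0, 10.0, 10.0], [10.0, 10.0, 10.0]
)
print(round(theta_dot[0], 6), round(theta_te[-1], 9))
```

The division by 6 converts degrees per second to revolutions per minute (60 s/min divided by 360 deg/rev).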
Direction And Sign Conventions
forward files have positive mean theta_dot;
backward files have negative mean theta_dot;
tau_load is signed measurement data;
filename torque is a nominal nonnegative setpoint magnitude.
For this dataset, forward torque samples commonly carry the opposite sign from
backward samples. Consumers must not replace measured tau_load with the
unsigned filename value.
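The sign convention for theta_dot lends itself to a quick consistency check when loading a file. The helper below is a hypothetical sanity check, not part of the repository; it takes the direction from the folder name and only confirms that the mean speed agrees with it.

```python
from statistics import fmean


def direction_matches_speed_sign(direction, theta_dot_values):
    """Return True when the mean theta_dot sign agrees with the folder direction."""
    mean_speed = fmean(theta_dot_values)
    if direction == "forward":
        return mean_speed > 0
    if direction == "backward":
        return mean_speed < 0
    raise ValueError(f"unknown direction: {direction!r}")


# Synthetic speeds in rpm; individual samples may fluctuate around the mean.
print(direction_matches_speed_sign("forward", [998.0, 1003.5, 1001.2]))
print(direction_matches_speed_sign("backward", [-1001.0, -997.4, -1002.8]))
```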
Full-Population Audit Results
All 1,938 polished CSV files were parsed during the June 20, 2026 audit:
expected headers: 1,938 of 1,938;
numeric data rows: 75,585,373;
files with malformed rows: 0;
files with non-finite values: 0;
empty files: 0;
minimum rows in one file: 10,799;
maximum rows in one file: 194,401.
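A per-file check of the kind summarized above could look like the following sketch. It is not the audit script used for these results; the function name and report keys are assumptions, and the sample text is synthetic.

```python
import csv
import io
import math

EXPECTED_HEADER = ["theta", "theta_dot", "tau_load", "T", "theta_TE"]


def audit_polished_csv(text_stream):
    """Count valid, malformed, and non-finite rows in one polished CSV."""
    reader = csv.reader(text_stream)
    header = next(reader, None)
    report = {
        "header_ok": header == EXPECTED_HEADER,
        "rows": 0,
        "malformed_rows": 0,
        "non_finite_rows": 0,
    }
    for fields in reader:
        if len(fields) != len(EXPECTED_HEADER):
            report["malformed_rows"] += 1
            continue
        try:
            values = [float(field) for field in fields]
        except ValueError:
            report["malformed_rows"] += 1
            continue
        if not all(math.isfinite(value) for value in values):
            report["non_finite_rows"] += 1
            continue
        report["rows"] += 1
    return report


# Synthetic file with one good row, one NaN row, and one short row.
sample = io.StringIO(
    "theta,theta_dot,tau_load,T,theta_TE\n"
    "0.5,1000.2,-99.8,25.1,0.003\n"
    "0.6,nan,-99.7,25.1,0.003\n"
    "0.7,1000.1\n"
)
report = audit_polished_csv(sample)
print(report)
```

A file passes when header_ok is true, rows is greater than zero, and both error counters are zero.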
Observed full-population ranges:
| Column | Minimum | Maximum |
|---|---|---|
| | | |
| | | |
| | | |
| | | |
| | | |
The instantaneous theta_dot extrema show that the numerical derivative can
contain excursions beyond the nominal speed. Use the filename for the nominal
condition and the column for the measured/derived sample-level speed.
Raw-To-Polished Verification
The raw inventory contains 975 CSV files. The generator explicitly ignores six known duplicate or connection files, leaving 969 source conditions:
200.0rpm0.0Nm25.0deg1.csv
200.0rpm100.0Nm25.0deg1.csv
800.0rpm200.0Nm25.0deg.csv
1100.0rpm100.0Nm30.0deg_collegamento.csv
1600.0rpm100.0Nm30.0degCollegamiento.csv
1600.0rpm100.0Nm30.0degcollegamento2.csv
The retained corrected 800 rpm source uses the _1.csv suffix. Export
filenames are normalized and omit that suffix.
A deterministic formula check sampled 27 raw files across all three temperatures and the minimum, median, and maximum speed folders. It compared both directions, 54 polished outputs, and 4,082,398 rows. The maximum absolute difference was exactly zero for all five exported columns.
Choosing The Correct Dataset
Use original_dataset when:
reconstructing preprocessing or zeroing;
validating DataValid behavior;
inspecting signals not present in the derived datasets;
auditing experimental provenance.
Use simplified_dataset when:
reproducing legacy repository training or TE curve evaluation;
working with one TE curve per direction and operating condition;
relying on current configuration and loader compatibility.
Use polished_dataset when:
running new repository training through the default selector;
preserving the time order of valid samples;
using measured torque and temperature at sample level;
developing temporal or sequence-aware loaders;
auditing the direct encoder-to-TE transformation.
Reproducing The Polished Export
The standalone script resolves its defaults relative to its own location:
data/generate_polished_dataset.py
Then run:
python data/generate_polished_dataset.py
The repository-integrated copy uses data/original_dataset/ as input and
output/generated_polished_dataset/ as output:
conda run --no-capture-output -n pinns_env python scripts/datasets/generate_polished_transmission_error_dataset.py
Both versions protect existing files unless
OVERWRITE_EXISTING_FILES = True and show a tqdm progress bar by default.
Usage Constraints
Do not point the current simplified-dataset loader at polished_dataset; its expected four-column schema is different.
Do not infer direction from torque sign; use the forward/ or backward/ path.
Do not interpret filename metadata as measured sample-level values.
Do not call theta the unchanged absolute motor position.
Do not call theta_TE a directly sensed channel.
Preserve the validity-window and zeroing logic when creating future derivations.