Dr. Sarah Chen

sarah.chen@molequle.ai

Dataset Weight Reduction

Calculate absorption energies and reduce dataset weight by removing low-importance columns using ML predictions and SHAP analysis.

OC2020 Absorption Analysis

Started: 1/15/2024, 2:30:00 PM

Stage: SHAP • Status: running (45%)

Molecular Descriptors Reduction

Started: 1/15/2024, 11:00:00 AM

Stage: REDUCTION • Status: completed

Completed: 1/15/2024, 3:30:00 PM

✅ Weight Reduction Complete

Reduced Dataset Summary

Original Size: 1,247 columns × 15,847 rows
Final Size: 156 columns × 2,341 rows
Column Reduction: 87.5% (removed low-importance)
Row Reduction: 85.2% (filtered by SHAP features)
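
The reduction percentages in the summary follow directly from the original and final sizes. A minimal sketch of that arithmetic, where the helper name percent_reduction is hypothetical and not part of the product:

```python
def percent_reduction(original: int, final: int) -> float:
    """Return the percentage by which `original` shrank to reach `final`."""
    if original <= 0:
        raise ValueError("original must be positive")
    return round(100.0 * (original - final) / original, 1)


# Sizes taken from the summary above.
column_reduction = percent_reduction(1247, 156)    # columns: 1,247 -> 156
row_reduction = percent_reduction(15847, 2341)     # rows: 15,847 -> 2,341

print(f"Column Reduction: {column_reduction}%")  # Column Reduction: 87.5%
print(f"Row Reduction: {row_reduction}%")        # Row Reduction: 85.2%
```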

Ready for Quantum ML

The dataset has been optimized by removing low-importance columns and filtering rows based on SHAP feature criteria. This compact dataset is now ready for quantum machine learning processing.

✓ Absorption energies calculated for 15,847 compounds

✓ Feature importance analyzed with SHAP (1,247 → 156 features)

✓ Low-importance columns removed (87.5% reduction)

✓ Dataset rows filtered by SHAP criteria (15,847 → 2,341 compounds)

✓ Compact dataset optimized for quantum processing
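
One common way to remove low-importance columns with SHAP is to rank each feature by its mean absolute SHAP value and keep the top k. The sketch below illustrates that idea only. It assumes the SHAP values are already computed as a rows × features matrix, and the function names, column names, and toy values are all hypothetical. The page does not state which selection rule (top-k, threshold, or something else) the pipeline actually uses.

```python
from typing import List, Sequence


def mean_abs_importance(shap_values: Sequence[Sequence[float]]) -> List[float]:
    """Average |SHAP value| per feature across all rows."""
    n_rows = len(shap_values)
    n_cols = len(shap_values[0])
    return [
        sum(abs(row[j]) for row in shap_values) / n_rows
        for j in range(n_cols)
    ]


def top_k_columns(
    names: Sequence[str], shap_values: Sequence[Sequence[float]], k: int
) -> List[str]:
    """Keep the k features with the largest mean |SHAP|, in original order."""
    importance = mean_abs_importance(shap_values)
    ranked = sorted(zip(names, importance), key=lambda p: p[1], reverse=True)
    keep = {name for name, _ in ranked[:k]}
    return [name for name in names if name in keep]


# Toy example: 3 compounds x 4 descriptors (illustrative values only).
names = ["mol_weight", "logp", "tpsa", "n_rings"]
shap_values = [
    [0.5, -0.01, 0.2, 0.00],
    [-0.4, 0.02, 0.1, 0.00],
    [0.6, 0.00, -0.3, 0.01],
]

kept = top_k_columns(names, shap_values, k=2)
print(kept)  # ['mol_weight', 'tpsa']
```

In a real run, k would be 156 and the matrix would have 1,247 columns. The row-filtering criteria are not specified on this page, so they are not sketched here.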

ML Absorption Energy Prediction

Train machine learning models to predict molecular absorption energies from structural features. These predictions feed the downstream analysis and the SHAP feature-importance calculation.

Model Configuration

Algorithm: Random Forest
Features: 1,247 molecular descriptors
Samples: 15,847 compounds
Target: Absorption Energy (eV)
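
A configuration like the one above could be set up with scikit-learn's RandomForestRegressor. The sketch below is an assumption about tooling, not the product's actual implementation. It uses small synthetic data (200 samples × 20 features) in place of the real 15,847 × 1,247 descriptor matrix, and the hyperparameters shown are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the descriptor matrix and absorption energies (eV).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

# Hold out 25% of the compounds for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Placeholder hyperparameters; the page does not list the real ones.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)        # predicted energies for held-out rows
importances = model.feature_importances_   # impurity-based importances, one per feature

print(predictions.shape, importances.shape)
```

The impurity-based importances printed here are a built-in Random Forest diagnostic. They are not the same thing as SHAP values, which would be computed in a separate step.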