scientific-skills/exploratory-data-analysis/references/spectroscopy_analytical_formats.md
This reference covers file formats used in various spectroscopic techniques and analytical chemistry instrumentation.
Description: Raw time-domain NMR data from Bruker, Agilent, and JEOL instruments
Typical Data: Complex time-domain signal
Use Cases: NMR spectroscopy, structure elucidation
Python Libraries:
- nmrglue: `nmrglue.bruker.read('experiment_dir')` (Bruker directory containing the `fid` file) or `nmrglue.varian.read_fid('fid')`
- nmrstarlib: NMR data handling
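A first check on raw time-domain data is whether the FID actually decays. The sketch below is a minimal illustration, not a definitive workflow: `load_bruker_fid` assumes nmrglue is installed and that its `bruker.read` function is given the experiment directory, while `summarize_fid` and the synthetic signal are hypothetical helpers written for this example.

```python
import cmath
import math


def load_bruker_fid(experiment_dir):
    """Read a Bruker experiment directory with nmrglue (imported lazily)."""
    import nmrglue as ng  # third-party; only needed for real data

    params, data = ng.bruker.read(experiment_dir)
    return params, data


def summarize_fid(fid):
    """Return basic sanity metrics for a 1D complex FID."""
    n = len(fid)
    magnitudes = [abs(point) for point in fid]
    quarter = max(n // 4, 1)
    head = sum(magnitudes[:quarter]) / quarter
    tail = sum(magnitudes[-quarter:]) / quarter
    return {
        "n_points": n,
        "max_magnitude": max(magnitudes),
        # A well-acquired FID decays, so this ratio should be well below 1.
        "tail_to_head": tail / head,
    }


# Synthetic FID (illustrative values): one resonance with exponential decay.
synthetic = [
    cmath.exp(2j * math.pi * 0.05 * k) * math.exp(-k / 200.0)
    for k in range(1024)
]
summary = summarize_fid(synthetic)
print(summary)
```

A `tail_to_head` ratio close to 1 would suggest truncation or a noise-only acquisition, which is worth checking before any Fourier transform.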
EDA Approach:

Description: Fourier-transformed NMR spectrum
Typical Data: Processed frequency-domain data
Use Cases: NMR analysis, peak integration
Python Libraries:
- nmrglue: Frequency domain reading

Description: Bruker processed spectrum (real part)
Typical Data: 1D or 2D processed NMR spectra
Use Cases: NMR data analysis with Bruker software
Python Libraries:
- nmrglue: Bruker format support
EDA Approach:

Description: JCAMP-DX format for NMR
Typical Data: Standardized NMR spectrum
Use Cases: Data exchange between software
Python Libraries:
- jcamp: JCAMP reader
- nmrglue: Can import JCAMP
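JCAMP-DX files are plain text, with metadata stored in `##LABEL=value` records ahead of the data table. The following is a stdlib-only sketch for inspecting that header before handing the file to a dedicated reader. `read_jcamp_header` and the example text are hypothetical, and the parser deliberately ignores multi-line values and the data rows.

```python
import io


def read_jcamp_header(handle):
    """Collect ##LABEL=value records from a JCAMP-DX text stream."""
    header = {}
    for raw in handle:
        line = raw.strip()
        if not line.startswith("##") or "=" not in line:
            continue  # data rows and comment lines are skipped in this sketch
        label, value = line[2:].split("=", 1)
        header[label.strip().upper()] = value.strip()
    return header


EXAMPLE_JCAMP = """\
##TITLE=example spectrum
##JCAMP-DX=5.01
##DATA TYPE=NMR SPECTRUM
##XUNITS=HZ
##YUNITS=ARBITRARY UNITS
##NPOINTS=4
##XYDATA=(X++(Y..Y))
100 1 2 3 4
##END=
"""

header = read_jcamp_header(io.StringIO(EXAMPLE_JCAMP))
print(header["DATA TYPE"], header["XUNITS"], header["NPOINTS"])
```

Checking `DATA TYPE`, the axis units, and `NPOINTS` first makes it easier to spot files that were exported with unexpected settings.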
EDA Approach:

Description: Mestrelab Research Mnova format
Typical Data: NMR data with processing info
Use Cases: Mnova software workflows
Python Libraries:
- nmrglue: Limited Mnova support

Description: Standard XML-based MS format
Typical Data: MS spectra, chromatograms, metadata
Use Cases: Proteomics, metabolomics, lipidomics
Python Libraries:
- pymzml: `pymzml.run.Reader('file.mzML')`
- pyteomics.mzml: `pyteomics.mzml.read('file.mzML')`
- MSFileReader: Various wrappers
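A quick overview of an mzML run usually starts with counting spectra per MS level and summing intensities. The sketch below assumes pyteomics is installed for real files and that each spectrum is a dict with `ms level` and `intensity array` keys. `iter_mzml`, `summarize_spectra`, and the stand-in records are hypothetical names and values used only for illustration.

```python
from collections import Counter


def iter_mzml(path):
    """Yield spectrum dicts from an mzML file via pyteomics (lazy import)."""
    from pyteomics import mzml  # third-party; only needed for real files

    with mzml.read(path) as reader:
        for spectrum in reader:
            yield spectrum


def summarize_spectra(spectra):
    """Count spectra per MS level and sum all intensities."""
    levels = Counter()
    total_intensity = 0.0
    for spectrum in spectra:
        levels[spectrum["ms level"]] += 1
        total_intensity += float(sum(spectrum["intensity array"]))
    return {"ms_levels": dict(levels), "total_intensity": total_intensity}


# Stand-in records that mimic the assumed keys (values are made up).
fake_run = [
    {"ms level": 1, "m/z array": [100.0, 200.0], "intensity array": [10.0, 30.0]},
    {"ms level": 2, "m/z array": [50.0], "intensity array": [5.0]},
    {"ms level": 2, "m/z array": [75.0, 80.0], "intensity array": [2.0, 3.0]},
]
report = summarize_spectra(fake_run)
print(report)
```

With a real file, `summarize_spectra(iter_mzml('file.mzML'))` would stream the run without holding every spectrum in memory.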
EDA Approach:

Description: Legacy XML MS format
Typical Data: Mass spectra and chromatograms
Use Cases: Proteomics workflows (older)
Python Libraries:
- pyteomics.mzxml
- pymzml: Can read mzXML
EDA Approach:

Description: Legacy PSI MS format
Typical Data: Mass spectrometry data
Use Cases: Legacy data archives
Python Libraries:
- pyteomics: Limited support

Description: Proprietary instrument data
Typical Data: Raw mass spectra and metadata
Use Cases: Direct instrument output
Python Libraries:
- pymsfilereader: Thermo RAW files
- ThermoRawFileParser: CLI wrapper

Description: Agilent MS data folder
Typical Data: LC-MS, GC-MS with methods
Use Cases: Agilent MassHunter workflows
Python Libraries:

Description: AB SCIEX/SCIEX instrument format
Typical Data: Mass spectrometry data
Use Cases: SCIEX instrument workflows
Python Libraries:

Description: Peak list format for MS/MS
Typical Data: Precursor and fragment masses
Use Cases: Peptide identification, database searches
Python Libraries:
- pyteomics.mgf: `pyteomics.mgf.read('file.mgf')`
- pyopenms: MGF support
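To show what a peak-list reader has to handle, here is a stdlib-only sketch of the MGF layout: `BEGIN IONS` / `END IONS` blocks, `KEY=value` parameters, and whitespace-separated peak lines. `parse_mgf` is a hypothetical helper for illustration and is not a replacement for the libraries listed above. It skips only `#` comments and ignores any parameters outside the ion blocks.

```python
import io


def parse_mgf(handle):
    """Yield one dict per BEGIN IONS / END IONS block in an MGF text stream."""
    spectrum = None
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "mz": [], "intensity": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is not None:
            if line[0].isalpha() and "=" in line:
                key, value = line.split("=", 1)
                spectrum["params"][key.strip().lower()] = value.strip()
            else:
                fields = line.split()
                spectrum["mz"].append(float(fields[0]))
                # Intensity may be absent on a peak line; default to 0.0 here.
                spectrum["intensity"].append(
                    float(fields[1]) if len(fields) > 1 else 0.0
                )


EXAMPLE_MGF = """\
BEGIN IONS
TITLE=example scan
PEPMASS=445.12 1000.0
CHARGE=2+
100.1 20.0
200.2 80.0
END IONS
"""

spectra = list(parse_mgf(io.StringIO(EXAMPLE_MGF)))
first = spectra[0]
base_peak_mz = first["mz"][first["intensity"].index(max(first["intensity"]))]
print(len(spectra), first["params"]["pepmass"], base_peak_mz)
```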
EDA Approach:

Description: Binary peak list format
Typical Data: Serialized MS/MS spectra
Use Cases: Software-specific storage
Python Libraries:
- pickle: Standard deserialization
- pyteomics: PKL support
EDA Approach:

Description: Simple text format for MS data
Typical Data: MS1 and MS2 scans
Use Cases: Database searching, proteomics
Python Libraries:
- pyteomics.ms1 and ms2

Description: TPP peptide identification format
Typical Data: Peptide-spectrum matches
Use Cases: Proteomics search results
Python Libraries:
- pyteomics.pepxml
EDA Approach:

Description: TPP protein inference format
Typical Data: Protein identifications
Use Cases: Proteomics protein-level results
Python Libraries:
- pyteomics.protxml
EDA Approach:

Description: NIST spectral library format
Typical Data: Reference mass spectra
Use Cases: Spectral library searching
Python Libraries:
- matchms: Spectral library handling

Description: Thermo Galactic spectroscopy format
Typical Data: IR, Raman, UV-Vis spectra
Use Cases: Various spectroscopy instruments
Python Libraries:
- spc: `spc.File('file.spc')`
- specio: Multi-format reader
EDA Approach:

Description: Thermo Fisher FTIR format
Typical Data: FTIR spectra
Use Cases: OMNIC software data
Python Libraries:

Description: Bruker OPUS FTIR format (numbered files)
Typical Data: FTIR spectra and metadata
Use Cases: Bruker FTIR instruments
Python Libraries:
- brukeropusreader: OPUS format parser
- specio: OPUS support
EDA Approach:

Description: Simple XY data format
Typical Data: Generic spectroscopic data
Use Cases: Renishaw Raman, generic exports
Python Libraries:
- pandas: CSV-like reading

Description: Renishaw WiRE data format
Typical Data: Raman spectra and maps
Use Cases: Renishaw Raman microscopy
Python Libraries:
- renishawWiRE: WDF reader

Description: Generic text export from instruments
Typical Data: Wavelength/wavenumber and intensity
Use Cases: Universal data exchange
Python Libraries:
- pandas: Text file reading
- numpy: Simple array loading
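Generic text exports vary in delimiter, header lines, and axis direction, so a tolerant reader is a useful first step. The sketch below uses only the standard library. `read_xy` and `describe_axis` are hypothetical helpers, and in practice `pandas.read_csv` or `numpy.loadtxt` would be the usual route once the layout is known. It assumes a decimal point, so exports that use a decimal comma would need different handling.

```python
import io


def read_xy(handle, comment_prefixes=("#", ";", "%")):
    """Read two numeric columns, skipping comments and non-numeric rows."""
    xs, ys = [], []
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith(comment_prefixes):
            continue
        fields = line.replace(",", " ").split()
        try:
            x, y = float(fields[0]), float(fields[1])
        except (ValueError, IndexError):
            continue  # header text or malformed row
        xs.append(x)
        ys.append(y)
    return xs, ys


def describe_axis(xs):
    """Report point count, range, and whether the axis is strictly monotonic."""
    steps = [b - a for a, b in zip(xs, xs[1:])]
    ascending = all(step > 0 for step in steps)
    descending = all(step < 0 for step in steps)
    return {
        "n_points": len(xs),
        "x_min": min(xs),
        "x_max": max(xs),
        "monotonic": ascending or descending,
    }


EXAMPLE_EXPORT = """\
# wavenumber (cm-1), intensity
Wavenumber,Intensity
4000,0.10
3998,0.12
3996,0.11
"""

xs, ys = read_xy(io.StringIO(EXAMPLE_EXPORT))
axis_info = describe_axis(xs)
print(axis_info)
```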
EDA Approach:

Description: ASD FieldSpec spectroradiometer
Typical Data: Hyperspectral UV-Vis-NIR data
Use Cases: Remote sensing, reflectance spectroscopy
Python Libraries:
- spectral.io.asd: ASD format support

Description: Perkin Elmer UV/Vis format
Typical Data: UV-Vis spectrophotometer data
Use Cases: PE Lambda instruments
Python Libraries:

Description: CSV export from UV-Vis instruments
Typical Data: Wavelength and absorbance/transmittance
Use Cases: Universal format for UV-Vis data
Python Libraries:
- pandas: Native CSV support
EDA Approach:

Description: Crystal structure and diffraction data
Typical Data: Unit cell, atomic positions, structure factors
Use Cases: Crystallography, materials science
Python Libraries:
- gemmi: `gemmi.cif.read_file('file.cif')`
- PyCifRW: CIF reading/writing
- pymatgen: Materials structure analysis
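Once the unit cell has been read from a CIF, the cell volume is a simple consistency check. The sketch below assumes gemmi is installed for real files and that the file holds a single data block with the standard `_cell_*` tags. `read_cell_from_cif` and `cell_volume` are hypothetical helpers, and the volume formula is the general triclinic expression.

```python
import math

CELL_TAGS = (
    "_cell_length_a", "_cell_length_b", "_cell_length_c",
    "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
)


def read_cell_from_cif(path):
    """Return (a, b, c, alpha, beta, gamma) from a single-block CIF."""
    import gemmi  # third-party; only needed for real files

    block = gemmi.cif.read_file(path).sole_block()
    # as_number drops standard uncertainties such as "5.431(2)".
    return tuple(gemmi.cif.as_number(block.find_value(tag)) for tag in CELL_TAGS)


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume from lengths and angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(angle)) for angle in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)


cubic_volume = cell_volume(5.0, 5.0, 5.0, 90.0, 90.0, 90.0)
hexagonal_volume = cell_volume(3.0, 3.0, 5.0, 90.0, 90.0, 120.0)
print(round(cubic_volume, 3), round(hexagonal_volume, 3))
```

Comparing the computed value with any `_cell_volume` entry in the file is a cheap way to catch unit or transcription errors.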
EDA Approach:

Description: Miller indices and intensities
Typical Data: Integrated diffraction intensities
Use Cases: Crystallographic refinement
Python Libraries:

Description: Binary crystallographic data
Typical Data: Reflections, phases, structure factors
Use Cases: Macromolecular crystallography
Python Libraries:
- gemmi: MTZ support
- cctbx: Comprehensive crystallography
EDA Approach:

Description: 2-theta vs intensity data
Typical Data: Powder X-ray diffraction patterns
Use Cases: Phase identification, Rietveld refinement
Python Libraries:
- pandas: Simple XY reading
- pymatgen: XRD pattern analysis
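For phase identification it helps to convert the strongest reflections from 2-theta to d-spacing with Bragg's law, d = λ / (2 sin θ). The sketch below is illustrative only: it assumes a Cu Kα1 source (1.5406 Å), which the source text does not specify, and uses a naive local-maximum search with no smoothing or background removal. All function names are hypothetical.

```python
import math

CU_K_ALPHA1_ANGSTROM = 1.5406  # assumed X-ray wavelength


def two_theta_to_d(two_theta_deg, wavelength=CU_K_ALPHA1_ANGSTROM):
    """Convert a 2-theta angle in degrees to d-spacing via Bragg's law."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))


def local_maxima(two_theta, intensity):
    """Return (2-theta, intensity) pairs that exceed both neighbours."""
    return [
        (two_theta[i], intensity[i])
        for i in range(1, len(intensity) - 1)
        if intensity[i] > intensity[i - 1] and intensity[i] >= intensity[i + 1]
    ]


def strongest_peaks(two_theta, intensity, count=3,
                    wavelength=CU_K_ALPHA1_ANGSTROM):
    """Return up to `count` peaks as (2-theta, d-spacing, intensity) tuples."""
    ranked = sorted(local_maxima(two_theta, intensity),
                    key=lambda peak: peak[1], reverse=True)
    return [(tt, two_theta_to_d(tt, wavelength), height)
            for tt, height in ranked[:count]]


# Made-up pattern with two peaks.
angles = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0]
counts = [1.0, 5.0, 2.0, 1.0, 9.0, 3.0, 1.0]
peaks = strongest_peaks(angles, counts)
print(peaks)
```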
EDA Approach:

Description: Vendor-specific XRD raw data
Typical Data: XRD patterns with metadata
Use Cases: Bruker, PANalytical, Rigaku instruments
Python Libraries:

Description: General Structure Analysis System
Typical Data: Powder diffraction for Rietveld
Use Cases: Rietveld refinement
Python Libraries:

Description: VG Scienta spectrometer format
Typical Data: XPS, UPS, ARPES spectra
Use Cases: Photoelectron spectroscopy
Python Libraries:
- specio: Multi-format support
EDA Approach:

Description: Princeton Instruments/Roper Scientific
Typical Data: CCD spectra, Raman, PL
Use Cases: Spectroscopy with CCD detectors
Python Libraries:
- spe2py: SPE file reader
- spe_loader: Alternative parser
EDA Approach:

Description: Photon Technology International
Typical Data: Fluorescence, phosphorescence spectra
Use Cases: Fluorescence spectroscopy
Python Libraries:

Description: Generic binary or text spectroscopy data
Typical Data: Various spectroscopic measurements
Use Cases: Many instruments use .dat extension
Python Libraries:
- numpy, pandas for known formats
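Because a `.dat` file may be either binary or text, a cheap sniff of the first bytes can decide which reader to try. The heuristic below is a sketch under stated assumptions: `looks_like_text` and `sniff_file` are hypothetical helpers, the 95% threshold is arbitrary, and UTF-16 text would be misclassified as binary because it contains null bytes.

```python
import struct

# Printable ASCII plus tab, line feed, and carriage return.
TEXT_BYTES = frozenset(range(32, 127)) | {9, 10, 13}


def looks_like_text(sample, threshold=0.95):
    """Guess whether a byte sample comes from a text file."""
    if not sample or b"\x00" in sample:
        return False
    printable = sum(byte in TEXT_BYTES for byte in sample)
    return printable / len(sample) >= threshold


def sniff_file(path, n_bytes=4096):
    """Apply the heuristic to the first n_bytes of a file."""
    with open(path, "rb") as handle:
        return looks_like_text(handle.read(n_bytes))


text_guess = looks_like_text(b"400.0\t0.123\r\n401.0\t0.456\r\n")
binary_guess = looks_like_text(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))
print(text_guess, binary_guess)
```

A text result suggests trying a delimited-text reader first. A binary result means the vendor layout has to be known before the values can be decoded.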
EDA Approach:

Description: Generic chromatography format
Typical Data: Retention time vs signal
Use Cases: HPLC, GC, LC-MS
Python Libraries:
- pandas for text exports
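With retention time and signal loaded from a text export, a basic peak summary needs only the apex and a trapezoidal area. The sketch below is a minimal illustration with hypothetical helper names. It assumes the integration window is chosen by hand and applies no baseline correction, which real chromatograms usually need.

```python
def trapezoid_area(times, signal):
    """Integrate signal over time with the trapezoidal rule."""
    return sum(
        (t1 - t0) * (s0 + s1) / 2.0
        for t0, t1, s0, s1 in zip(times, times[1:], signal, signal[1:])
    )


def peak_summary(times, signal, start, end):
    """Apex time, apex height, and area for start <= t <= end."""
    window = [(t, s) for t, s in zip(times, signal) if start <= t <= end]
    if not window:
        raise ValueError("no data points inside the requested window")
    w_times = [t for t, _ in window]
    w_signal = [s for _, s in window]
    apex_time, apex_height = max(window, key=lambda point: point[1])
    return {
        "apex_time": apex_time,
        "apex_height": apex_height,
        "area": trapezoid_area(w_times, w_signal),
    }


# Made-up trace with a single peak.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.0, 1.0, 3.0, 1.0, 0.0]
peak = peak_summary(times, signal, start=1.0, end=3.0)
print(peak)
```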
EDA Approach:

Description: Agilent ChemStation format
Typical Data: Chromatograms and method parameters
Use Cases: Agilent HPLC and GC systems
Python Libraries:
- agilent-chemstation: Community tools

Description: Waters Empower format
Typical Data: UPLC/HPLC chromatograms
Use Cases: Waters instrument data
Python Libraries:

Description: Shimadzu chromatography format
Typical Data: GC/HPLC data
Use Cases: Shimadzu instruments
Python Libraries:

Description: Thermal analysis data (TA Instruments)
Typical Data: Temperature vs heat flow or mass
Use Cases: Differential scanning calorimetry, thermogravimetry
Python Libraries:
- pandas for exported data
EDA Approach:

Description: Elemental analysis data
Typical Data: Element concentrations or counts
Use Cases: Inductively coupled plasma MS/OES
Python Libraries:

Description: Electrochemical experiment data
Typical Data: Potential vs current or charge
Use Cases: Cyclic voltammetry, chronoamperometry
Python Libraries:
- galvani: Biologic EC-Lab files
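For cyclic voltammetry, a first look at potential and current data is often the anodic and cathodic peak positions and their separation. The sketch below assumes galvani is installed for real `.mpr` files and that its `BioLogic.MPRfile` object exposes the records as `data`. Column names inside that array vary by experiment, so they are left to the caller. `load_mpr`, `cv_peaks`, and the sample values are hypothetical.

```python
def load_mpr(path):
    """Return the record array of a Biologic .mpr file (lazy import)."""
    from galvani import BioLogic  # third-party; only needed for real files

    return BioLogic.MPRfile(path).data


def cv_peaks(potential, current):
    """Locate the extreme anodic and cathodic currents in one CV trace."""
    anodic_index = max(range(len(current)), key=current.__getitem__)
    cathodic_index = min(range(len(current)), key=current.__getitem__)
    return {
        "anodic": (potential[anodic_index], current[anodic_index]),
        "cathodic": (potential[cathodic_index], current[cathodic_index]),
        "peak_separation": potential[anodic_index] - potential[cathodic_index],
    }


# Made-up single cycle: forward sweep, then reverse sweep.
potential = [-0.2, 0.0, 0.2, 0.4, 0.2, 0.0, -0.2]
current = [0.0, 1.0, 3.0, 1.5, -0.5, -2.5, -1.0]
cv = cv_peaks(potential, current)
print(cv)
```

This takes global extremes, so it suits a single cycle with one redox couple. Multi-cycle files would need to be split per cycle first.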
EDA Approach: