
Spectroscopy and Analytical Chemistry File Formats Reference

This reference covers file formats used in various spectroscopic techniques and analytical chemistry instrumentation.

NMR Spectroscopy

.fid - NMR Free Induction Decay

Description: Raw time-domain NMR data from Bruker, Agilent (Varian), or JEOL spectrometers
Typical Data: Complex time-domain signal
Use Cases: NMR spectroscopy, structure elucidation
Python Libraries:
- nmrglue: nmrglue.bruker.read('expdir') for a Bruker experiment directory, or nmrglue.varian.read_fid('fid') for an Agilent/Varian fid file
- nmrstarlib: NMR data handling (NMR-STAR files)
EDA Approach:
- Time-domain signal decay
- Sampling rate and acquisition time
- Number of data points
- Signal-to-noise ratio estimation
- Baseline drift assessment
- Digital filter effects
- Acquisition parameter validation
- Apodization function selection
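The first few EDA checks for a FID can be sketched with the standard library alone. The example below is a minimal illustration on a synthetic signal, not a replacement for nmrglue. It assumes the FID is a list of complex points sampled at a dwell time of 1 / spectral width, and that the last 10% of the signal has decayed to noise. The function name `fid_summary` and all numeric values are hypothetical.

```python
import cmath
import math
import random
import statistics


def fid_summary(fid, spectral_width_hz):
    """Basic time-domain metrics for a complex FID (list of complex points)."""
    n_points = len(fid)
    # Assumption: one complex point per dwell period of 1 / spectral width.
    dwell_time = 1.0 / spectral_width_hz
    acquisition_time = n_points * dwell_time
    # Assumption: the last 10% of the FID contains only noise.
    tail = fid[int(0.9 * n_points):]
    noise_sd = statistics.pstdev([p.real for p in tail])
    peak_magnitude = max(abs(p) for p in fid)
    return {
        "n_points": n_points,
        "dwell_time_s": dwell_time,
        "acquisition_time_s": acquisition_time,
        "snr_estimate": peak_magnitude / noise_sd,
    }


# Synthetic FID: one resonance at 400 Hz decaying with T2* = 50 ms, plus noise.
random.seed(0)
sw = 5000.0
t2_star = 0.05
fid = [
    cmath.exp(2j * math.pi * 400.0 * (k / sw)) * math.exp(-(k / sw) / t2_star)
    + complex(random.gauss(0, 0.01), random.gauss(0, 0.01))
    for k in range(4096)
]
summary = fid_summary(fid, sw)
print(summary)
```

With real data, the complex array returned by an nmrglue reader could be passed in place of the synthetic list, with the spectral width taken from the acquisition parameters.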

.ft / .ft1 / .ft2 - NMR Frequency Domain

Description: Fourier-transformed NMR spectrum (extensions follow the NMRPipe convention)
Typical Data: Processed frequency-domain data
Use Cases: NMR analysis, peak integration
Python Libraries:
- nmrglue: nmrglue.pipe.read('file.ft2') for NMRPipe files
- Custom processing pipelines
EDA Approach:
- Peak picking and integration
- Chemical shift range
- Baseline correction quality
- Phase correction assessment
- Reference peak identification
- Spectral resolution
- Artifact detection
- Multiplicity analysis

.1r / .2rr - Bruker NMR Processed Data

Description: Bruker processed spectrum (real part)
Typical Data: 1D or 2D processed NMR spectra
Use Cases: NMR data analysis with Bruker software
Python Libraries:
- nmrglue: nmrglue.bruker.read_pdata('expdir/pdata/1') for processed Bruker data
EDA Approach:
- Processing parameters review
- Window function effects
- Zero-filling assessment
- Linear prediction validation
- Spectral artifacts

.dx - NMR JCAMP-DX

Description: JCAMP-DX format for NMR
Typical Data: Standardized NMR spectrum
Use Cases: Data exchange between software
Python Libraries:
- jcamp: JCAMP reader
- nmrglue: Can import JCAMP
EDA Approach:
- Format compliance
- Metadata completeness
- Peak table validation
- Integration values
- Compound identification info
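A metadata completeness check for JCAMP-DX can be sketched by reading the labeled data records, which start with `##LABEL=`. The example below is a minimal sketch that only reads header labels, not the data table. The list of required labels is an illustrative assumption, not the full requirement set of the standard, and the function names are hypothetical.

```python
import re

# A labeled data record looks like "##LABEL=value".
LDR_PATTERN = re.compile(r"^##([^=]+)=(.*)$")

# Illustrative selection of labels to check; adjust for the data type at hand.
REQUIRED = ("TITLE", "JCAMPDX", "DATATYPE", "XUNITS", "YUNITS", "NPOINTS")


def read_jcamp_headers(text):
    """Return a dict of normalized label -> value for all labeled records."""
    headers = {}
    for line in text.splitlines():
        match = LDR_PATTERN.match(line.strip())
        if not match:
            continue  # data line or continuation, ignored in this sketch
        # Labels are compared ignoring case, spaces, hyphens, slashes, underscores.
        label = re.sub(r"[\s\-/_]", "", match.group(1)).upper()
        # "$$" starts an inline comment.
        value = match.group(2).split("$$")[0].strip()
        headers[label] = value
    return headers


def missing_headers(headers, required=REQUIRED):
    """List the required labels that are absent from the file."""
    return [label for label in required if label not in headers]


sample = "\n".join([
    "##TITLE=ethanol 1H",
    "##JCAMP-DX=5.01",
    "##DATA TYPE=NMR SPECTRUM",
    "##XUNITS=HZ",
    "##NPOINTS=4 $$ illustrative value",
    "##XYDATA=(X++(Y..Y))",
    "0 1 2 3 4",
    "##END=",
])
headers = read_jcamp_headers(sample)
print(missing_headers(headers))
```

For actual spectra, a dedicated reader such as the jcamp package or nmrglue is the better choice; this sketch is only meant for a quick header audit.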

.mnova - Mnova Format

Description: Mestrelab Research Mnova format
Typical Data: NMR data with processing info
Use Cases: Mnova software workflows
Python Libraries:
- nmrglue: No native .mnova reader; export to JCAMP-DX or another open format from Mnova first
- Conversion tools to standard formats
EDA Approach:
- Multi-spectrum handling
- Processing pipeline review
- Quantification data
- Structure assignment

Mass Spectrometry

.mzML - Mass Spectrometry Markup Language

Description: Standard XML-based MS format
Typical Data: MS spectra, chromatograms, metadata
Use Cases: Proteomics, metabolomics, lipidomics
Python Libraries:
- pymzml: pymzml.run.Reader('file.mzML')
- pyteomics.mzml: pyteomics.mzml.read('file.mzML')
- pyopenms: MzMLFile reader
EDA Approach:
- Scan count and MS level distribution
- Retention time range and TIC
- m/z range and resolution
- Precursor ion selection
- Fragmentation patterns
- Instrument configuration
- Quality control metrics
- Data completeness
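The scan count, MS level distribution, retention time range, and TIC checks can be sketched as a single pass over the spectra. The example below assumes each spectrum is a dict with the keys that pyteomics.mzml.read yields ('ms level', 'intensity array', 'total ion current', and 'scanList' with a 'scan start time'). The function name `summarize_spectra` is hypothetical, and the demo input is hand-made rather than read from a file.

```python
from collections import Counter


def summarize_spectra(spectra):
    """Collect scan counts per MS level, retention time range, and maximum TIC."""
    level_counts = Counter()
    rt_values = []
    tic_values = []
    for spectrum in spectra:
        level_counts[spectrum["ms level"]] += 1
        # Prefer the stored TIC; fall back to summing the intensity array.
        tic = spectrum.get("total ion current")
        if tic is None:
            tic = float(sum(spectrum["intensity array"]))
        tic_values.append(tic)
        scans = spectrum.get("scanList", {}).get("scan", [])
        if scans and "scan start time" in scans[0]:
            rt_values.append(float(scans[0]["scan start time"]))
    return {
        "scan_count": sum(level_counts.values()),
        "ms_levels": dict(level_counts),
        "rt_range": (min(rt_values), max(rt_values)) if rt_values else None,
        "max_tic": max(tic_values) if tic_values else None,
    }


# Hand-made spectra in the assumed dict layout (not real data).
demo = [
    {"ms level": 1, "intensity array": [10.0, 20.0],
     "scanList": {"scan": [{"scan start time": 0.5}]}},
    {"ms level": 2, "total ion current": 5.0, "intensity array": [5.0],
     "scanList": {"scan": [{"scan start time": 0.6}]}},
    {"ms level": 2, "intensity array": [1.0, 2.0, 3.0]},
]
report = summarize_spectra(demo)
print(report)
```

With pyteomics installed, the same function could be fed the iterator returned by pyteomics.mzml.read('file.mzML') so the file is streamed rather than loaded whole.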

.mzXML - Mass Spectrometry XML

Description: Legacy XML MS format
Typical Data: Mass spectra and chromatograms
Use Cases: Proteomics workflows (older)
Python Libraries:
- pyteomics.mzxml
- pyopenms: MzXMLFile reader
EDA Approach:
- Similar to mzML
- Version compatibility
- Conversion quality assessment

.mzData - mzData Format

Description: Legacy PSI MS format
Typical Data: Mass spectrometry data
Use Cases: Legacy data archives
Python Libraries:
- pyteomics: Limited support
- Conversion to mzML recommended
EDA Approach:
- Format conversion validation
- Data completeness
- Metadata extraction

.raw - Vendor Raw Files (Thermo, Waters)

Description: Proprietary instrument data
Typical Data: Raw mass spectra and metadata
Use Cases: Direct instrument output
Python Libraries:
- pymsfilereader: Thermo RAW files
- ThermoRawFileParser: CLI wrapper
- Vendor-specific APIs
EDA Approach:
- Method parameter extraction
- Instrument performance metrics
- Calibration status
- Scan function analysis
- MS/MS quality metrics
- Dynamic exclusion evaluation

.d - Agilent Data Directory

Description: Agilent MS data folder
Typical Data: LC-MS, GC-MS with methods
Use Cases: Agilent MassHunter workflows
Python Libraries:
- Community parsers
- Chemstation integration
EDA Approach:
- Directory structure validation
- Method parameters
- Calibration curves
- Sequence metadata
- Signal quality metrics

.wiff - AB SCIEX Data

Description: AB SCIEX/SCIEX instrument format
Typical Data: Mass spectrometry data
Use Cases: SCIEX instrument workflows
Python Libraries:
- Vendor SDKs (limited Python support)
- Conversion tools
EDA Approach:
- Experiment type identification
- Scan properties
- Quantitation data
- Multi-experiment structure

.mgf - Mascot Generic Format

Description: Peak list format for MS/MS
Typical Data: Precursor and fragment masses
Use Cases: Peptide identification, database searches
Python Libraries:
- pyteomics.mgf: pyteomics.mgf.read('file.mgf')
- pyopenms: MGF support
EDA Approach:
- Spectrum count
- Charge state distribution
- Precursor m/z and intensity
- Fragment peak count
- Mass accuracy
- Title and metadata parsing
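Because MGF is plain text, the spectrum count, charge state distribution, and fragment peak count can be sketched without any dependencies. The parser below is a minimal illustration that handles only the common `BEGIN IONS` / `KEY=value` / peak line / `END IONS` layout. The function name `parse_mgf` is hypothetical, and pyteomics.mgf remains the more robust option for real files.

```python
from collections import Counter


def parse_mgf(text):
    """Parse MGF text into a list of {'params': dict, 'peaks': [(mz, intensity)]}."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key.strip().upper()] = value.strip()
            else:
                # Peak line: m/z, intensity, and optionally a charge column.
                fields = line.split()
                if len(fields) >= 2:
                    current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra


sample = "\n".join([
    "BEGIN IONS", "TITLE=scan 1", "PEPMASS=445.12 1200.0", "CHARGE=2+",
    "100.1 10", "200.2 20.5", "END IONS",
    "BEGIN IONS", "TITLE=scan 2", "PEPMASS=600.3", "CHARGE=3+",
    "150.0 5", "END IONS",
])
spectra = parse_mgf(sample)
charges = Counter(s["params"].get("CHARGE", "unknown") for s in spectra)
peak_counts = [len(s["peaks"]) for s in spectra]
# PEPMASS may carry an optional intensity after the precursor m/z.
precursor_mz = [float(s["params"]["PEPMASS"].split()[0]) for s in spectra]
print(len(spectra), dict(charges), peak_counts, precursor_mz)
```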

.pkl - Peak List (Binary)

Description: Binary peak list format
Typical Data: Serialized MS/MS spectra
Use Cases: Software-specific storage
Python Libraries:
- pickle: Standard deserialization (only unpickle files from a trusted source, since loading a pickle can execute arbitrary code)
- pyteomics: PKL support
EDA Approach:
- Data structure inspection
- Conversion to standard formats
- Metadata preservation

.ms1 / .ms2 - MS1/MS2 Formats

Description: Simple text format for MS data
Typical Data: MS1 and MS2 scans
Use Cases: Database searching, proteomics
Python Libraries:
- pyteomics.ms1 and pyteomics.ms2
- Simple text parsing
EDA Approach:
- Scan count by level
- Retention time series
- Charge state analysis
- m/z range coverage

.pepXML - Peptide XML

Description: TPP peptide identification format
Typical Data: Peptide-spectrum matches
Use Cases: Proteomics search results
Python Libraries:
- pyteomics.pepxml
EDA Approach:
- Search result statistics
- Score distribution
- Modification analysis
- FDR assessment
- Enzyme specificity

.protXML - Protein XML

Description: TPP protein inference format
Typical Data: Protein identifications
Use Cases: Proteomics protein-level results
Python Libraries:
- pyteomics.protxml
EDA Approach:
- Protein group analysis
- Coverage statistics
- Confidence scoring
- Parsimony analysis

.msp - NIST MS Search Format

Description: NIST spectral library format
Typical Data: Reference mass spectra
Use Cases: Spectral library searching
Python Libraries:
- matchms: Spectral library handling
- Custom parsers
EDA Approach:
- Library size and coverage
- Metadata completeness
- Peak count statistics
- Compound annotation quality

Infrared and Raman Spectroscopy

.spc - Galactic SPC

Description: Thermo Galactic spectroscopy format
Typical Data: IR, Raman, UV-Vis spectra
Use Cases: Various spectroscopy instruments
Python Libraries:
- spc: spc.File('file.spc')
- specio: Multi-format reader
EDA Approach:
- Wavenumber/wavelength range
- Data point density
- Multi-spectrum handling
- Baseline characteristics
- Peak identification
- Absorbance/transmittance mode
- Instrument information

.spa - Thermo Nicolet

Description: Thermo Fisher FTIR format
Typical Data: FTIR spectra
Use Cases: OMNIC software data
Python Libraries:
- Custom binary parsers
- Conversion to JCAMP or SPC
EDA Approach:
- Interferogram vs spectrum
- Background spectrum validation
- Atmospheric compensation
- Resolution and scan number
- Sample information

.0 - Bruker OPUS

Description: Bruker OPUS FTIR format (numbered files)
Typical Data: FTIR spectra and metadata
Use Cases: Bruker FTIR instruments
Python Libraries:
- brukeropusreader: OPUS format parser
- specio: OPUS support
EDA Approach:
- Multiple block types (AB, ScSm, etc.)
- Sample and reference spectra
- Instrument parameters
- Optical path configuration
- Beam splitter and detector info

.dpt - Data Point Table

Description: Simple XY data format
Typical Data: Generic spectroscopic data
Use Cases: Bruker OPUS data point table exports, generic XY exports
Python Libraries:
- pandas: CSV-like reading
- Text parsing
EDA Approach:
- X-axis type (wavelength, wavenumber, Raman shift)
- Y-axis units (intensity, absorbance, etc.)
- Data point spacing
- Header information
- Multi-column data handling

.wdf - Renishaw Raman

Description: Renishaw WiRE data format
Typical Data: Raman spectra and maps
Use Cases: Renishaw Raman microscopy
Python Libraries:
- renishawWiRE: WDF reader
- Custom parsers for WDF format
EDA Approach:
- Spectral vs mapping data
- Laser wavelength
- Accumulation and exposure time
- Spatial coordinates (mapping)
- Z-scan data
- Baseline and cosmic ray correction

.txt (Spectroscopy)

Description: Generic text export from instruments
Typical Data: Wavelength/wavenumber and intensity
Use Cases: Universal data exchange
Python Libraries:
- pandas: Text file reading
- numpy: Simple array loading
EDA Approach:
- Delimiter and format detection
- Header parsing
- Units identification
- Multiple spectrum handling
- Metadata extraction from comments
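Delimiter handling, header parsing, and comment extraction for a generic text export can be sketched as follows. This is a minimal illustration that assumes comment lines start with `#`, `%`, or `!`, that columns are separated by commas, tabs, semicolons, or spaces, and that decimals use a point rather than a comma. The function name `read_xy_text` is hypothetical.

```python
import re


def read_xy_text(text, comment_chars="#%!"):
    """Split a text export into header lines and numeric rows."""
    header, rows = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line[0] in comment_chars:
            # Comment lines often carry instrument metadata.
            header.append(line.lstrip(comment_chars).strip())
            continue
        fields = [f for f in re.split(r"[,\t; ]+", line) if f]
        try:
            rows.append([float(f) for f in fields])
        except ValueError:
            # Non-numeric line, for example column labels with units.
            header.append(line)
    return header, rows


sample = "\n".join([
    "# Instrument: demo",
    "Wavenumber,Intensity",
    "400.0,0.12",
    "402.0,0.15",
    "404.0\t0.11",
])
header, rows = read_xy_text(sample)
x_step = rows[1][0] - rows[0][0]
print(header, rows, x_step)
```

Once the layout is known, pandas.read_csv or numpy.loadtxt with explicit delimiter and comment settings is usually the more convenient route.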

UV-Visible Spectroscopy

.asd / .asc - ASD Binary/ASCII

Description: ASD FieldSpec spectroradiometer
Typical Data: Hyperspectral UV-Vis-NIR data
Use Cases: Remote sensing, reflectance spectroscopy
Python Libraries:
- spectral.io.asd: ASD format support
- Custom parsers
EDA Approach:
- Wavelength range (UV to NIR)
- Reference spectrum validation
- Dark current correction
- Integration time
- GPS metadata (if present)
- Reflectance vs radiance

.sp - Perkin Elmer

Description: Perkin Elmer UV/Vis format
Typical Data: UV-Vis spectrophotometer data
Use Cases: PE Lambda instruments
Python Libraries:
- Custom parsers
- Conversion to standard formats
EDA Approach:
- Scan parameters
- Baseline correction
- Multi-wavelength scans
- Time-based measurements
- Sample/reference handling

.csv (Spectroscopy)

Description: CSV export from UV-Vis instruments
Typical Data: Wavelength and absorbance/transmittance
Use Cases: Universal format for UV-Vis data
Python Libraries:
- pandas: Native CSV support
EDA Approach:
- Lambda max identification
- Beer's law compliance
- Baseline offset
- Path length correction
- Concentration calculations
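Lambda max identification, baseline offset, and a Beer-Lambert concentration estimate (A = ε·l·c) can be sketched in a few lines. The spectrum, the baseline offset of 0.02, and the molar absorptivity of 15000 L/(mol·cm) below are made-up values for illustration, and the function names are hypothetical.

```python
def lambda_max(wavelengths_nm, absorbances):
    """Return (wavelength, absorbance) at the absorbance maximum."""
    index = max(range(len(absorbances)), key=absorbances.__getitem__)
    return wavelengths_nm[index], absorbances[index]


def concentration_from_absorbance(absorbance, molar_absorptivity, path_length_cm=1.0):
    """Beer-Lambert law rearranged: c = A / (epsilon * l), in mol/L."""
    return absorbance / (molar_absorptivity * path_length_cm)


# Made-up spectrum for illustration.
wavelengths = [240, 250, 260, 270, 280]
absorbances = [0.10, 0.35, 0.62, 0.41, 0.12]

peak_nm, peak_abs = lambda_max(wavelengths, absorbances)
baseline_offset = 0.02        # assumed constant offset, e.g. read far from the band
epsilon = 15000.0             # hypothetical molar absorptivity in L/(mol*cm)
conc = concentration_from_absorbance(peak_abs - baseline_offset, epsilon)
print(peak_nm, peak_abs, conc)
```

The estimate is only meaningful in the linear range of the instrument; absorbances well above about 1 are a common sign that Beer's law no longer holds for the measurement.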

X-ray and Diffraction

.cif - Crystallographic Information File

Description: Crystal structure and diffraction data
Typical Data: Unit cell, atomic positions, structure factors
Use Cases: Crystallography, materials science
Python Libraries:
- gemmi: gemmi.cif.read_file('file.cif')
- PyCifRW: CIF reading/writing
- pymatgen: Materials structure analysis
EDA Approach:
- Crystal system and space group
- Unit cell parameters
- Atomic positions and occupancy
- Thermal parameters
- R-factors and refinement quality
- Completeness and redundancy
- Structure validation

.hkl - Reflection Data

Description: Miller indices and intensities
Typical Data: Integrated diffraction intensities
Use Cases: Crystallographic refinement
Python Libraries:
- Custom parsers (format dependent)
- Crystallography packages (CCP4, etc.)
EDA Approach:
- Resolution range
- Completeness by shell
- I/sigma distribution
- Systematic absences
- Twinning detection
- Wilson plot

.mtz - MTZ Format (CCP4)

Description: Binary crystallographic data
Typical Data: Reflections, phases, structure factors
Use Cases: Macromolecular crystallography
Python Libraries:
- gemmi: MTZ support
- cctbx: Comprehensive crystallography
EDA Approach:
- Column types and data
- Resolution limits
- R-factors (Rwork, Rfree)
- Phase probability distribution
- Map coefficients
- Batch information

.xy / .xye - Powder Diffraction

Description: 2-theta vs intensity data
Typical Data: Powder X-ray diffraction patterns
Use Cases: Phase identification, Rietveld refinement
Python Libraries:
- pandas: Simple XY reading
- pymatgen: XRD pattern analysis
EDA Approach:
- 2-theta range
- Peak positions and intensities
- Background modeling
- Peak width analysis (strain/size)
- Phase identification via matching
- Preferred orientation effects
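Peak positions in 2-theta are usually converted to d-spacings with Bragg's law, d = λ / (2 sin θ), before phase matching. The sketch below reads whitespace-separated .xy or .xye text and converts the strongest point. It assumes Cu Kα1 radiation (1.5406 Å), which is not stated in the file format itself and must be confirmed from the instrument metadata. The function names and sample values are hypothetical.

```python
import math

CU_K_ALPHA1_ANGSTROM = 1.5406  # assumption: Cu K-alpha1 source


def d_spacing(two_theta_deg, wavelength=CU_K_ALPHA1_ANGSTROM):
    """Bragg's law for first-order reflection: d = lambda / (2 * sin(theta))."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))


def read_xy(text):
    """Parse 2-theta, intensity (and optional error) columns; skip comment lines."""
    points = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!'":
            continue
        fields = line.split()
        points.append((float(fields[0]), float(fields[1])))
    return points


sample = "\n".join([
    "# made-up pattern for illustration",
    "28.40 120.0 11.0",
    "28.44 950.0 30.8",
    "28.48 140.0 11.8",
])
pattern = read_xy(sample)
strongest_two_theta, strongest_intensity = max(pattern, key=lambda p: p[1])
strongest_d = d_spacing(strongest_two_theta)
print(strongest_two_theta, round(strongest_d, 4))
```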

.raw (XRD)

Description: Vendor-specific XRD raw data
Typical Data: XRD patterns with metadata
Use Cases: Bruker, PANalytical, Rigaku instruments
Python Libraries:
- Vendor-specific parsers
- Conversion tools
EDA Approach:
- Scan parameters (step size, time)
- Sample alignment
- Incident beam setup
- Detector configuration
- Background scan validation

.gsa / .gsas - GSAS Format

Description: General Structure Analysis System
Typical Data: Powder diffraction for Rietveld
Use Cases: Rietveld refinement
Python Libraries:
- GSAS-II Python interface
- Custom parsers
EDA Approach:
- Histogram data
- Instrument parameters
- Phase information
- Refinement constraints
- Profile function parameters

Electron Spectroscopy

.vms - VG Scienta

Description: VAMAS surface analysis data transfer format (ISO 14976), as exported by VG Scienta and other photoelectron spectrometers
Typical Data: XPS, UPS, ARPES spectra
Use Cases: Photoelectron spectroscopy
Python Libraries:
- Custom parsers for VMS
- specio: Multi-format support
EDA Approach:
- Binding energy calibration
- Pass energy and resolution
- Photoelectron line identification
- Satellite peak analysis
- Background subtraction quality
- Fermi edge position

.spe - WinSpec/SPE Format

Description: Princeton Instruments/Roper Scientific
Typical Data: CCD spectra, Raman, PL
Use Cases: Spectroscopy with CCD detectors
Python Libraries:
- spe2py: SPE file reader
- spe_loader: Alternative parser
EDA Approach:
- CCD frame analysis
- Wavelength calibration
- Dark frame subtraction
- Cosmic ray identification
- Readout noise
- Accumulation statistics

.pxt - Princeton PTI

Description: Photon Technology International
Typical Data: Fluorescence, phosphorescence spectra
Use Cases: Fluorescence spectroscopy
Python Libraries:
- Custom parsers
- Text-based format variants
EDA Approach:
- Excitation and emission spectra
- Quantum yield calculations
- Time-resolved measurements
- Temperature-dependent data
- Correction factors applied

.dat (Spectroscopy Generic)

Description: Generic binary or text spectroscopy data
Typical Data: Various spectroscopic measurements
Use Cases: Many instruments use .dat extension
Python Libraries:
- Format-specific identification needed
- numpy, pandas for known formats
EDA Approach:
- Format detection (binary vs text)
- Header identification
- Data structure inference
- Units and axis labels
- Instrument signature detection
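Binary versus text detection for an unknown .dat file can be sketched as a heuristic on the first few kilobytes. The function names and the 95% printable-byte threshold below are assumptions chosen for illustration. The heuristic treats any NUL byte as a sign of binary data, so UTF-16 text would be misclassified as binary.

```python
def looks_like_text(sample: bytes, threshold: float = 0.95) -> bool:
    """Heuristic: text if there are no NUL bytes and most bytes are printable ASCII."""
    if not sample:
        return True
    if b"\x00" in sample:
        return False
    # Printable ASCII plus tab, line feed, carriage return.
    printable = sum(1 for b in sample if 32 <= b <= 126 or b in (9, 10, 13))
    return printable / len(sample) >= threshold


def sniff_file(path, n_bytes=4096):
    """Classify a file as 'text' or 'binary' from its first n_bytes."""
    with open(path, "rb") as handle:
        return "text" if looks_like_text(handle.read(n_bytes)) else "binary"


text_sample = b"400.0\t0.12\n402.0\t0.15\n"
binary_sample = b"\x00\x00\x80\x3f\x00\x00\x00\x40"  # two little-endian float32 values
print(looks_like_text(text_sample), looks_like_text(binary_sample))
```

A "text" result only says the file is safe to open in a text reader; the header still has to be inspected to identify the instrument and column layout.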

Chromatography

.chrom - Chromatogram Data

Description: Generic chromatography format
Typical Data: Retention time vs signal
Use Cases: HPLC, GC, LC-MS
Python Libraries:
- Vendor-specific parsers
- pandas for text exports
EDA Approach:
- Retention time range
- Peak detection and integration
- Baseline drift
- Resolution between peaks
- Signal-to-noise ratio
- Tailing factor
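Resolution, tailing factor, and signal-to-noise ratio are simple ratios once the peak measurements are available. The sketch below uses the common pharmacopoeial definitions: Rs = 2(t2 − t1)/(w1 + w2) with baseline peak widths, T = W0.05/(2f) measured at 5% of peak height, and S/N = 2H/h with h the peak-to-peak noise. The function names and input values are hypothetical, and peak detection itself is not covered.

```python
def resolution(t_r1, t_r2, w_base1, w_base2):
    """Resolution between two peaks from retention times and baseline widths."""
    return 2.0 * (t_r2 - t_r1) / (w_base1 + w_base2)


def tailing_factor(width_at_5pct, front_half_width_at_5pct):
    """Tailing factor: full width at 5% height over twice the leading half-width."""
    return width_at_5pct / (2.0 * front_half_width_at_5pct)


def signal_to_noise(peak_height, noise_peak_to_peak):
    """S/N = 2H / h, with h the peak-to-peak baseline noise."""
    return 2.0 * peak_height / noise_peak_to_peak


# Made-up measurements in minutes and detector units.
rs = resolution(t_r1=5.0, t_r2=6.0, w_base1=0.4, w_base2=0.6)
tf = tailing_factor(width_at_5pct=0.30, front_half_width_at_5pct=0.12)
sn = signal_to_noise(peak_height=50.0, noise_peak_to_peak=2.0)
print(rs, tf, sn)
```

A symmetric peak gives a tailing factor of 1; values above 1 indicate tailing, and values below 1 indicate fronting.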

.ch - ChemStation

Description: Agilent ChemStation format
Typical Data: Chromatograms and method parameters
Use Cases: Agilent HPLC and GC systems
Python Libraries:
- agilent-chemstation: Community tools
- Binary format parsers
EDA Approach:
- Method validation
- Integration parameters
- Calibration curve
- Sample sequence information
- Instrument status

.arw - Empower (Waters)

Description: Waters Empower format
Typical Data: UPLC/HPLC chromatograms
Use Cases: Waters instrument data
Python Libraries:
- Vendor tools (limited Python access)
- Database extraction tools
EDA Approach:
- Audit trail information
- Processing methods
- Compound identification
- Quantitation results
- System suitability tests

.lcd - Shimadzu LabSolutions

Description: Shimadzu chromatography format
Typical Data: GC/HPLC data
Use Cases: Shimadzu instruments
Python Libraries:
- Vendor-specific parsers
EDA Approach:
- Method parameters
- Peak purity analysis
- Spectral data (if PDA)
- Quantitative results

Other Analytical Techniques

.dta - DSC/TGA Data

Description: Thermal analysis data (TA Instruments)
Typical Data: Temperature vs heat flow or mass
Use Cases: Differential scanning calorimetry, thermogravimetry
Python Libraries:
- Custom parsers for TA formats
- pandas for exported data
EDA Approach:
- Transition temperature identification
- Enthalpy calculations
- Mass loss steps
- Heating rate effects
- Baseline determination
- Purity assessment

.run - ICP-MS/ICP-OES

Description: Elemental analysis data
Typical Data: Element concentrations or counts
Use Cases: Inductively coupled plasma MS/OES
Python Libraries:
- Vendor-specific tools
- Custom parsers
EDA Approach:
- Element detection and quantitation
- Internal standard performance
- Spike recovery
- Dilution factor corrections
- Isotope ratios
- LOD/LOQ calculations
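The LOD/LOQ, spike recovery, and dilution checks reduce to short formulas. The sketch below assumes the widely used 3σ and 10σ conventions (LOD = 3·s_blank/slope, LOQ = 10·s_blank/slope); a laboratory's own method may prescribe different factors. The function names and all numbers are hypothetical.

```python
import statistics


def lod_loq(blank_signals, calibration_slope):
    """Detection and quantitation limits from replicate blanks (3-sigma / 10-sigma)."""
    sd_blank = statistics.stdev(blank_signals)  # sample standard deviation
    return 3.0 * sd_blank / calibration_slope, 10.0 * sd_blank / calibration_slope


def spike_recovery_percent(spiked_result, unspiked_result, spike_added):
    """Recovery of a known spike, in percent."""
    return 100.0 * (spiked_result - unspiked_result) / spike_added


def apply_dilution(measured_concentration, dilution_factor):
    """Back-calculate the concentration in the undiluted sample."""
    return measured_concentration * dilution_factor


# Made-up values: blank counts, slope in counts per ug/L, concentrations in ug/L.
blanks = [10.0, 12.0, 11.0, 9.0, 13.0]
lod, loq = lod_loq(blanks, calibration_slope=100.0)
recovery = spike_recovery_percent(spiked_result=14.6, unspiked_result=5.0, spike_added=10.0)
original = apply_dilution(measured_concentration=2.5, dilution_factor=20)
print(lod, loq, recovery, original)
```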

.exp - Electrochemistry Data

Description: Electrochemical experiment data
Typical Data: Potential vs current or charge
Use Cases: Cyclic voltammetry, chronoamperometry
Python Libraries:
- Custom parsers per instrument (CHI, Gamry, etc.)
- galvani: Biologic EC-Lab files
EDA Approach:
- Redox peak identification
- Peak potential and current
- Scan rate effects
- Electron transfer kinetics
- Background subtraction
- Capacitance calculations
