references/microscopy_imaging_formats.md

Microscopy and Imaging File Formats Reference

This reference covers file formats used in microscopy, medical imaging, remote sensing, and scientific image analysis.

Microscopy-Specific Formats

.tif / .tiff - Tagged Image File Format

Description: Flexible image format supporting multiple pages and metadata

Typical Data: Microscopy images, z-stacks, time series, multi-channel

Use Cases: Fluorescence microscopy, confocal imaging, biological imaging

Python Libraries:
- tifffile: tifffile.imread('file.tif') - Microscopy TIFF support
- PIL/Pillow: Image.open('file.tif') - Basic TIFF
- scikit-image: io.imread('file.tif')
- AICSImageIO: Multi-format microscopy reader

EDA Approach:
- Image dimensions and bit depth
- Multi-page/z-stack analysis
- Metadata extraction (OME-TIFF)
- Channel analysis and intensity distributions
- Temporal dynamics (time-lapse)
- Pixel size and spatial calibration
- Histogram analysis per channel
- Dynamic range utilization
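As a quick first check before loading pixel data, the page count of a classic TIFF can be read by walking its IFD chain. The following is a minimal standard-library sketch; the function name `summarize_tiff` is hypothetical, it takes the file contents as bytes, and it does not walk BigTIFF files or SubIFDs.

```python
import struct

def summarize_tiff(data: bytes) -> dict:
    """Return byte order, variant, and page (IFD) count for classic TIFF bytes."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file: bad byte-order mark")
    (magic,) = struct.unpack(order + "H", data[2:4])
    if magic == 43:
        # BigTIFF uses 64-bit offsets; this sketch does not walk its IFDs.
        return {"byte_order": order, "variant": "BigTIFF", "pages": None}
    if magic != 42:
        raise ValueError(f"not a TIFF file: magic {magic}")
    (offset,) = struct.unpack(order + "I", data[4:8])
    pages, seen = 0, set()
    while offset and offset not in seen:  # 'seen' guards against offset loops
        seen.add(offset)
        (n_entries,) = struct.unpack(order + "H", data[offset:offset + 2])
        next_pos = offset + 2 + 12 * n_entries  # each IFD entry is 12 bytes
        (offset,) = struct.unpack(order + "I", data[next_pos:next_pos + 4])
        pages += 1
    return {"byte_order": order, "variant": "TIFF", "pages": pages}
```

For real analysis, tifffile remains the better tool; this sketch only shows what "multi-page" means at the file level.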

.nd2 - Nikon NIS-Elements

Description: Proprietary Nikon microscope format

Typical Data: Multi-dimensional microscopy (XYZCT)

Use Cases: Nikon microscope data, confocal, widefield

Python Libraries:
- nd2reader: ND2Reader('file.nd2')
- pims: pims.ND2_Reader('file.nd2')
- AICSImageIO: Universal reader

EDA Approach:
- Experiment metadata extraction
- Channel configurations
- Time-lapse frame analysis
- Z-stack depth and spacing
- XY stage positions
- Laser settings and power
- Pixel binning information
- Acquisition timestamps

.lif - Leica Image Format

Description: Proprietary Leica microscope format

Typical Data: Multi-experiment, multi-dimensional images

Use Cases: Leica confocal and widefield data

Python Libraries:
- readlif: readlif.reader.LifFile('file.lif')
- AICSImageIO: LIF support
- python-bioformats: Via Bio-Formats

EDA Approach:
- Multiple experiment detection
- Image series enumeration
- Metadata per experiment
- Channel and timepoint structure
- Physical dimensions extraction
- Objective and detector information
- Scan settings analysis

.czi - Carl Zeiss Image

Description: Zeiss microscope format

Typical Data: Multi-dimensional microscopy with rich metadata

Use Cases: Zeiss confocal, lightsheet, widefield

Python Libraries:
- czifile: czifile.CziFile('file.czi')
- AICSImageIO: CZI support
- pylibCZIrw: Official Zeiss library

EDA Approach:
- Scene and position analysis
- Mosaic tile structure
- Channel wavelength information
- Acquisition mode detection
- Scaling and calibration
- Instrument configuration
- ROI definitions

.oib / .oif - Olympus Image Format

Description: Olympus microscope formats

Typical Data: Confocal and multiphoton imaging

Use Cases: Olympus FluoView data

Python Libraries:
- AICSImageIO: OIB/OIF support
- python-bioformats: Via Bio-Formats

EDA Approach:
- Directory structure validation (OIF)
- Metadata file parsing
- Channel configuration
- Scan parameters
- Objective and filter information
- PMT settings

.vsi - Olympus VSI

Description: Olympus slide scanner format

Typical Data: Whole slide imaging, large mosaics

Use Cases: Virtual microscopy, pathology

Python Libraries:
- python-bioformats: Via Bio-Formats (OpenSlide does not read VSI)
- AICSImageIO: VSI support through its Bio-Formats reader

EDA Approach:
- Pyramid level analysis
- Tile structure and overlap
- Macro and label images
- Magnification levels
- Whole slide statistics
- Region detection

.ims - Imaris Format

Description: Bitplane Imaris HDF5-based format

Typical Data: Large 3D/4D microscopy datasets

Use Cases: 3D rendering, time-lapse analysis

Python Libraries:
- h5py: Direct HDF5 access
- imaris_ims_file_reader: Specialized reader

EDA Approach:
- Resolution level analysis
- Time point structure
- Channel organization
- Dataset hierarchy
- Thumbnail generation
- Memory-mapped access strategies
- Chunking optimization

.lsm - Zeiss LSM

Description: Legacy Zeiss confocal format

Typical Data: Confocal laser scanning microscopy

Use Cases: Older Zeiss confocal data

Python Libraries:
- tifffile: LSM support (TIFF-based)
- python-bioformats: LSM reading

EDA Approach:
- Similar to TIFF with LSM-specific metadata
- Scan speed and resolution
- Laser lines and power
- Detector gain and offset
- LUT information

.stk - MetaMorph Stack

Description: MetaMorph image stack format

Typical Data: Time-lapse or z-stack sequences

Use Cases: MetaMorph software output

Python Libraries:
- tifffile: STK is TIFF-based
- python-bioformats: STK support

EDA Approach:
- Stack dimensionality
- Plane metadata
- Timing information
- Stage positions
- UIC tags parsing

.dv - DeltaVision

Description: Applied Precision DeltaVision format

Typical Data: Deconvolution microscopy

Use Cases: DeltaVision microscope data

Python Libraries:
- mrc: Can read DV (MRC-related)
- AICSImageIO: DV support

EDA Approach:
- Wave information (channels)
- Extended header analysis
- Lens and magnification
- Deconvolution status
- Time stamps per section

.mrc - Medical Research Council

Description: Electron microscopy format

Typical Data: EM images, cryo-EM, tomography

Use Cases: Structural biology, electron microscopy

Python Libraries:
- mrcfile: mrcfile.open('file.mrc')
- EMAN2: EM-specific tools

EDA Approach:
- Volume dimensions
- Voxel size and units
- Origin and map statistics
- Symmetry information
- Extended header analysis
- Density statistics
- Header consistency validation
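Volume dimensions, voxel size, and a basic header consistency check can be read straight from the 1024-byte MRC header. This is a minimal sketch assuming the MRC2014 layout and little-endian data; `read_mrc_header` is a hypothetical helper, and mrcfile should be preferred for real work because it also validates and handles byte order.

```python
import struct

def read_mrc_header(header: bytes) -> dict:
    """Decode core fields from the first 1024 bytes of an MRC2014 file."""
    if len(header) < 1024:
        raise ValueError("MRC header must be at least 1024 bytes")
    # Assumption: little-endian file (the common case); big-endian is not handled.
    nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)
    mx, my, mz = struct.unpack_from("<3i", header, 28)      # sampling along x, y, z
    cell = struct.unpack_from("<3f", header, 40)            # cell size in angstroms
    dmin, dmax, dmean = struct.unpack_from("<3f", header, 76)
    (nsymbt,) = struct.unpack_from("<i", header, 92)        # extended header bytes
    # Voxel size is cell dimension divided by the sampling along each axis.
    voxel = tuple(c / m if m else 0.0 for c, m in zip(cell, (mx, my, mz)))
    return {
        "shape": (nz, ny, nx),
        "mode": mode,
        "voxel_size": voxel,
        "density": (dmin, dmax, dmean),
        "extended_header_bytes": nsymbt,
        "has_map_id": header[208:212] == b"MAP ",
    }
```

A typical call would pass the first 1024 bytes read from the file opened in binary mode.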

.dm3 / .dm4 - Gatan Digital Micrograph

Description: Gatan TEM/STEM format

Typical Data: Transmission electron microscopy

Use Cases: TEM imaging and analysis

Python Libraries:
- hyperspy: hs.load('file.dm3')
- ncempy: ncempy.io.dm.dmReader('file.dm3')

EDA Approach:
- Microscope parameters
- Energy dispersive spectroscopy data
- Diffraction patterns
- Calibration information
- Tag structure analysis
- Image series handling

.eer - Electron Event Representation

Description: Direct electron detector format

Typical Data: Electron counting data from detectors

Use Cases: Cryo-EM data collection

Python Libraries:
- mrcfile: Some EER support
- Vendor-specific tools (Gatan, TFS)

EDA Approach:
- Event counting statistics
- Frame rate and dose
- Detector configuration
- Motion correction assessment
- Gain reference validation

.ser - TIA Series

Description: FEI/TFS TIA format

Typical Data: EM image series

Use Cases: FEI/Thermo Fisher EM data

Python Libraries:
- hyperspy: SER support
- ncempy: TIA reader

EDA Approach:
- Series structure
- Calibration data
- Acquisition metadata
- Time stamps
- Multi-dimensional data organization

Medical and Biological Imaging

.dcm - DICOM

Description: Digital Imaging and Communications in Medicine

Typical Data: Medical images with patient/study metadata

Use Cases: Clinical imaging, radiology, CT, MRI, PET

Python Libraries:
- pydicom: pydicom.dcmread('file.dcm')
- SimpleITK: sitk.ReadImage('file.dcm')
- nibabel: Limited DICOM support

EDA Approach:
- Patient metadata extraction (anonymization check)
- Modality-specific analysis
- Series and study organization
- Slice thickness and spacing
- Window/level settings
- Hounsfield units (CT)
- Image orientation and position
- Multi-frame analysis

.nii / .nii.gz - NIfTI

Description: Neuroimaging Informatics Technology Initiative

Typical Data: Brain imaging, fMRI, structural MRI

Use Cases: Neuroimaging research, brain analysis

Python Libraries:
- nibabel: nibabel.load('file.nii')
- nilearn: Neuroimaging with ML
- SimpleITK: NIfTI support

EDA Approach:
- Volume dimensions and voxel size
- Affine transformation matrix
- Time series analysis (fMRI)
- Intensity distribution
- Brain extraction quality
- Registration assessment
- Orientation validation
- Header information consistency
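Header consistency can be checked without loading the volume by decoding the fixed 348-byte NIfTI-1 header. The sketch below is a standard-library illustration, not a replacement for nibabel; `read_nifti1_header` is a hypothetical name, it covers NIfTI-1 only, and a `.nii.gz` file would need to be decompressed first (for example with `gzip.open`).

```python
import struct

def read_nifti1_header(header: bytes) -> dict:
    """Decode shape, voxel size, and data type from a 348-byte NIfTI-1 header."""
    if len(header) < 348:
        raise ValueError("NIfTI-1 header must be at least 348 bytes")
    # sizeof_hdr must equal 348; trying both byte orders reveals the endianness.
    for order in ("<", ">"):
        (sizeof_hdr,) = struct.unpack_from(order + "i", header, 0)
        if sizeof_hdr == 348:
            break
    else:
        raise ValueError("sizeof_hdr is not 348; not a NIfTI-1 header")
    dim = struct.unpack_from(order + "8h", header, 40)      # dim[0] is the rank
    datatype, bitpix = struct.unpack_from(order + "2h", header, 70)
    pixdim = struct.unpack_from(order + "8f", header, 76)
    (vox_offset,) = struct.unpack_from(order + "f", header, 108)
    ndim = dim[0]
    return {
        "byte_order": "little" if order == "<" else "big",
        "shape": dim[1:1 + ndim],
        "voxel_size": pixdim[1:1 + ndim],
        "datatype": datatype,
        "bitpix": bitpix,
        "vox_offset": vox_offset,
        "magic_ok": header[344:348] in (b"n+1\x00", b"ni1\x00"),
    }
```

A mismatch between `bitpix` and the expected size for `datatype`, or a failed magic check, is a cheap signal that the file needs closer inspection.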

.mnc - MINC Format

Description: Medical Imaging NetCDF (MINC1 is NetCDF-based; MINC2 is HDF5-based)

Typical Data: Medical imaging (the format predates NIfTI)

Use Cases: Legacy neuroimaging data

Python Libraries:
- pyminc: MINC-specific tools
- nibabel: MINC support

EDA Approach:
- Similar to NIfTI
- NetCDF (MINC1) or HDF5 (MINC2) structure exploration
- Dimension ordering
- Metadata extraction

.nrrd - Nearly Raw Raster Data

Description: Medical imaging format with a text header, either attached (.nrrd) or detached (.nhdr)

Typical Data: Medical images, research imaging

Use Cases: 3D Slicer, ITK-based applications

Python Libraries:
- pynrrd: nrrd.read('file.nrrd')
- SimpleITK: NRRD support

EDA Approach:
- Header field analysis
- Encoding format
- Dimension and spacing
- Orientation matrix
- Compression assessment
- Endianness handling
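Because the NRRD header is plain text terminated by a blank line, header fields such as encoding and endianness can be inspected with a few lines of standard-library code. This is a rough sketch assuming LF line endings; `parse_nrrd_header` is a hypothetical helper and pynrrd should be used for actual reading.

```python
def parse_nrrd_header(data: bytes) -> dict:
    """Parse the text header of NRRD bytes into a dict of fields."""
    # The header ends at the first blank line; anything after it is pixel data.
    head, _, _ = data.partition(b"\n\n")
    lines = head.decode("ascii", errors="replace").splitlines()
    if not lines or not lines[0].startswith("NRRD"):
        raise ValueError("missing NRRD magic line")
    fields = {"magic": lines[0]}
    for line in lines[1:]:
        if line.startswith("#"):
            continue  # comment line
        f, kv = line.find(": "), line.find(":=")
        if kv != -1 and (f == -1 or kv < f):
            key, value = line[:kv], line[kv + 2:]   # "key:=value" pair
        elif f != -1:
            key, value = line[:f], line[f + 2:]     # "field: value" line
        else:
            continue
        fields[key.strip()] = value.strip()
    # A "data file" field means the pixel data lives in a separate file.
    fields["detached"] = "data file" in fields or "datafile" in fields
    return fields
```

The returned `encoding`, `endian`, `sizes`, and `space directions` entries map directly onto the checks listed above.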

.mha / .mhd - MetaImage

Description: MetaImage format (ITK)

Typical Data: Medical/scientific 3D images

Use Cases: ITK/SimpleITK applications

Python Libraries:
- SimpleITK: Native MHA/MHD support
- itk: Direct ITK integration

EDA Approach:
- Header-data file pairing (MHD)
- Transform matrix
- Element spacing
- Compression format
- Data type and dimensions
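The header-data pairing check for `.mhd` files can be sketched with the standard library, since the header is a list of `Key = Value` lines. The helper name `check_mhd` is hypothetical, and the sketch skips `LOCAL` and `LIST` data references rather than resolving them.

```python
from pathlib import Path

def check_mhd(mhd_path) -> dict:
    """Parse a MetaImage .mhd text header and check that its data file exists."""
    mhd_path = Path(mhd_path)
    fields = {}
    for line in mhd_path.read_text(encoding="ascii", errors="replace").splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    data_file = fields.get("ElementDataFile")
    exists = None  # None means "not checked" (LOCAL, LIST, or missing entry)
    if data_file and data_file not in ("LOCAL", "LIST"):
        # Relative names are resolved against the header's own directory.
        exists = (mhd_path.parent / data_file).is_file()
    return {"fields": fields, "data_file": data_file, "data_file_exists": exists}
```

Fields such as `DimSize`, `ElementSpacing`, `ElementType`, and `CompressedData` are returned as raw strings for further inspection.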

.hdr / .img - Analyze Format

Description: Legacy medical imaging format

Typical Data: Brain imaging (pre-NIfTI)

Use Cases: Old neuroimaging datasets

Python Libraries:
- nibabel: Analyze support
- Conversion to NIfTI recommended

EDA Approach:
- Header-image pairing validation
- Byte order issues
- Conversion to modern formats
- Metadata limitations

Scientific Image Formats

.png - Portable Network Graphics

Description: Lossless compressed image format

Typical Data: 2D images, screenshots, processed data

Use Cases: Publication figures, lossless storage

Python Libraries:
- PIL/Pillow: Image.open('file.png')
- scikit-image: io.imread('file.png')
- imageio: imageio.imread('file.png')

EDA Approach:
- Bit depth analysis (8-bit, 16-bit)
- Color mode (grayscale, RGB, palette)
- Metadata (PNG chunks)
- Transparency handling
- Compression efficiency
- Histogram analysis
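Bit depth and color mode are stored in the IHDR chunk that immediately follows the PNG signature, so they can be read without decoding the image. A minimal sketch follows; `read_png_ihdr` is a hypothetical helper that takes the file contents as bytes and does not verify chunk CRCs.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "palette",
               4: "grayscale+alpha", 6: "RGBA"}

def read_png_ihdr(data: bytes) -> dict:
    """Read width, height, bit depth, and color mode from PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    width, height, bit_depth, color_type, _comp, _filt, interlace = struct.unpack(
        ">IIBBBBB", data[16:29])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,  # bits per sample (or per palette index)
        "color_mode": COLOR_TYPES.get(color_type, f"unknown ({color_type})"),
        "interlaced": bool(interlace),
    }
```

Only the first 29 bytes of the file are needed, which keeps the check cheap for large batches.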

.jpg / .jpeg - Joint Photographic Experts Group

Description: Lossy compressed image format

Typical Data: Natural images, photos

Use Cases: Visualization, web graphics (not raw data)

Python Libraries:
- PIL/Pillow: Standard JPEG support
- scikit-image: JPEG reading

EDA Approach:
- Compression artifacts detection
- Quality factor estimation
- Color space (RGB, grayscale)
- EXIF metadata
- Quantization table analysis
- Note: Not suitable for quantitative analysis

.bmp - Bitmap Image

Description: Uncompressed raster image

Typical Data: Simple images, screenshots

Use Cases: Compatibility, simple storage

Python Libraries:
- PIL/Pillow: BMP support
- scikit-image: BMP reading

EDA Approach:
- Color depth
- Palette analysis (if indexed)
- File size efficiency
- Pixel format validation

.gif - Graphics Interchange Format

Description: Image format with animation support

Typical Data: Animated images, simple graphics

Use Cases: Animations, time-lapse visualization

Python Libraries:
- PIL/Pillow: GIF support
- imageio: Better GIF animation support

EDA Approach:
- Frame count and timing
- Palette limitations (256 colors)
- Loop count
- Disposal method
- Transparency handling

.svg - Scalable Vector Graphics

Description: XML-based vector graphics

Typical Data: Vector drawings, plots, diagrams

Use Cases: Publication-quality figures, plots

Python Libraries:
- svgpathtools: Path manipulation
- cairosvg: Rasterization
- lxml: XML parsing

EDA Approach:
- Element structure analysis
- Style information
- Viewbox and dimensions
- Path complexity
- Text element extraction
- Layer organization
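Element structure, viewBox, and text extraction can be approximated with the standard library's XML parser before reaching for a dedicated SVG package. The function below is a sketch under that assumption; `summarize_svg` is a hypothetical name, and it parses trusted input only.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]

def summarize_svg(svg_text: str) -> dict:
    """Count elements and collect viewBox, size, and text from SVG markup."""
    root = ET.fromstring(svg_text)
    counts = Counter(_local(el.tag) for el in root.iter())
    texts = ["".join(el.itertext()).strip()
             for el in root.iter() if _local(el.tag) == "text"]
    return {
        "viewBox": root.get("viewBox"),
        "width": root.get("width"),
        "height": root.get("height"),
        "element_counts": dict(counts),
        "text": texts,
    }
```

The `path` count is a rough proxy for path complexity, and `g` elements often correspond to layers in files exported from drawing tools.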

.eps - Encapsulated PostScript

Description: Vector graphics format

Typical Data: Publication figures

Use Cases: Legacy publication graphics

Python Libraries:
- PIL/Pillow: Basic EPS rasterization
- ghostscript via subprocess

EDA Approach:
- Bounding box information
- Preview image validation
- Font embedding
- Conversion to modern formats

.pdf (Images)

Description: Portable Document Format with images

Typical Data: Publication figures, multi-page documents

Use Cases: Publication, data presentation

Python Libraries:
- PyMuPDF/fitz: fitz.open('file.pdf')
- pdf2image: Rasterization
- pdfplumber: Text and layout extraction

EDA Approach:
- Page count
- Image extraction
- Resolution and DPI
- Embedded fonts and metadata
- Compression methods
- Image vs vector content

.fig - MATLAB Figure

Description: MATLAB figure file

Typical Data: MATLAB plots and figures

Use Cases: MATLAB data visualization

Python Libraries:
- Custom parsers (MAT file structure)
- Conversion to other formats

EDA Approach:
- Figure structure
- Data extraction from plots
- Axes and label information
- Plot type identification

.hdf5 (Imaging Specific)

Description: HDF5 for large imaging datasets

Typical Data: High-content screening, large microscopy

Use Cases: BigDataViewer, large-scale imaging

Python Libraries:
- h5py: Universal HDF5 access
- Imaging-specific readers (BigDataViewer)

EDA Approach:
- Dataset hierarchy
- Chunk and compression strategy
- Multi-resolution pyramid
- Metadata organization
- Memory-mapped access
- Parallel I/O performance

.zarr - Chunked Array Storage

Description: Cloud-optimized array storage

Typical Data: Large imaging datasets, OME-ZARR

Use Cases: Cloud microscopy, large-scale analysis

Python Libraries:
- zarr: zarr.open('file.zarr')
- ome-zarr-py: OME-ZARR support

EDA Approach:
- Chunk size optimization
- Compression codec analysis
- Multi-scale representation
- Array dimensions and dtype
- Metadata structure (OME)
- Cloud access patterns
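Chunk layout and codec can be read from an array's JSON metadata without touching any chunk data. The sketch below assumes Zarr v2, where each array directory holds a `.zarray` file, and simple dtypes such as `<u2`; `summarize_zarray` is a hypothetical helper that takes that file's text.

```python
import json
from math import ceil, prod

def summarize_zarray(zarray_json: str) -> dict:
    """Summarize shape, chunking, and compressor from Zarr v2 .zarray text."""
    meta = json.loads(zarray_json)
    shape, chunks, dtype = meta["shape"], meta["chunks"], meta["dtype"]
    # Assumption: simple dtype string like "<u2" or "|u1" (not structured).
    itemsize = int(dtype[2:])
    # Edge chunks are stored even when partially filled, hence the ceil.
    n_chunks = prod(ceil(s / c) for s, c in zip(shape, chunks))
    compressor = meta.get("compressor") or {}
    return {
        "shape": tuple(shape),
        "chunks": tuple(chunks),
        "dtype": dtype,
        "n_chunks": n_chunks,
        "chunk_bytes": prod(chunks) * itemsize,  # uncompressed size per chunk
        "compressor": compressor.get("id"),
    }
```

Very small `chunk_bytes` values with a large `n_chunks` usually indicate a layout that will perform poorly on object storage.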

.raw - Raw Image Data

Description: Unformatted binary pixel data

Typical Data: Raw detector output

Use Cases: Custom imaging systems

Python Libraries:
- numpy: np.fromfile() with dtype
- imageio: Raw format plugins

EDA Approach:
- Dimensions determination (external info needed)
- Byte order and data type
- Header presence detection
- Pixel value range
- Noise characteristics
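When the dimensions of a raw file are unknown, the file size at least narrows the possibilities. The following sketch lists every 2D shape consistent with a given size; `candidate_shapes` and its parameters are hypothetical, and the bytes per pixel and any header length still have to come from external information.

```python
from math import isqrt

def candidate_shapes(file_size: int, bytes_per_pixel: int, header_bytes: int = 0):
    """List (height, width) pairs, height <= width, that fit the file size."""
    payload = file_size - header_bytes
    if payload <= 0 or payload % bytes_per_pixel:
        return []  # size is not a whole number of pixels
    n_pixels = payload // bytes_per_pixel
    shapes = []
    for height in range(1, isqrt(n_pixels) + 1):
        if n_pixels % height == 0:
            shapes.append((height, n_pixels // height))
    return shapes
```

An empty result for the expected pixel size is itself a useful hint that the file carries a header or uses a different data type.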

.bin - Binary Image Data

Description: Generic binary image format

Typical Data: Raw or custom-formatted images

Use Cases: Instrument-specific outputs

Python Libraries:
- numpy: Custom binary reading
- struct: For structured binary data

EDA Approach:
- Format specification required
- Header parsing (if present)
- Data type inference
- Dimension extraction
- Validation with known parameters

Image Analysis Formats

.roi - ImageJ ROI

Description: ImageJ region of interest format

Typical Data: Geometric ROIs, selections

Use Cases: ImageJ/Fiji analysis workflows

Python Libraries:
- read-roi: read_roi.read_roi_file('file.roi')
- roifile: ROI manipulation

EDA Approach:
- ROI type analysis (rectangle, polygon, etc.)
- Coordinate extraction
- ROI properties (area, perimeter)
- Group analysis (ROI sets)
- Z-position and time information
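The ROI type and bounding box sit in the first bytes of an ImageJ `.roi` file, which makes type analysis possible with the standard library alone. This is a minimal sketch of the fixed big-endian header; `read_roi_header` is a hypothetical helper and it ignores coordinates, sub-pixel data, and the secondary header that read-roi or roifile would decode.

```python
import struct

ROI_TYPES = {0: "polygon", 1: "rectangle", 2: "oval", 3: "line",
             4: "freeline", 5: "polyline", 6: "no ROI", 7: "freehand",
             8: "traced", 9: "angle", 10: "point"}

def read_roi_header(data: bytes) -> dict:
    """Decode ROI type and bounding box from the start of ImageJ .roi bytes."""
    if data[:4] != b"Iout":
        raise ValueError("not an ImageJ ROI file")
    # Bytes 4-5: version, byte 6: type (byte 7 is unused), all big-endian.
    version, roi_type = struct.unpack_from(">hB", data, 4)
    top, left, bottom, right, n_coords = struct.unpack_from(">4hH", data, 8)
    return {
        "version": version,
        "type": ROI_TYPES.get(roi_type, f"unknown ({roi_type})"),
        "left": left,
        "top": top,
        "width": right - left,
        "height": bottom - top,
        "n_coordinates": n_coords,
    }
```

Tallying the `type` field over many files gives the ROI type distribution for a dataset.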

.zip (ROI sets)

Description: ZIP archive of ImageJ ROIs

Typical Data: Multiple ROI files

Use Cases: Batch ROI analysis

Python Libraries:
- read-roi: read_roi.read_roi_zip('file.zip')
- Standard zipfile module

EDA Approach:
- ROI count in set
- ROI type distribution
- Spatial distribution
- Overlapping ROI detection
- Naming conventions

.ome.tif / .ome.tiff - OME-TIFF

Description: TIFF with OME-XML metadata

Typical Data: Standardized microscopy with rich metadata

Use Cases: Bio-Formats compatible storage

Python Libraries:
- tifffile: OME-TIFF support
- AICSImageIO: OME reading
- python-bioformats: Bio-Formats integration

EDA Approach:
- OME-XML validation
- Physical dimensions extraction
- Channel naming and wavelengths
- Plane positions (Z, C, T)
- Instrument metadata
- Bio-Formats compatibility
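Once the OME-XML string has been extracted from the file (tifffile exposes it as `TiffFile.ome_metadata`), dimensions, physical pixel sizes, and channel names can be pulled out with the standard XML parser. The sketch below assumes that string is already in hand; `summarize_ome_xml` is a hypothetical helper and it performs no schema validation.

```python
import xml.etree.ElementTree as ET

def _to_float(value):
    """Convert an optional attribute string to float, keeping None as None."""
    return float(value) if value is not None else None

def summarize_ome_xml(ome_xml: str) -> list:
    """Return per-image size, physical pixel size, and channel names."""
    root = ET.fromstring(ome_xml)
    # Reuse whatever namespace the root declares, so schema versions don't matter.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    images = []
    for image in root.iter(ns + "Image"):
        pixels = image.find(ns + "Pixels")
        if pixels is None:
            continue
        images.append({
            "name": image.get("Name"),
            "dimension_order": pixels.get("DimensionOrder"),
            "type": pixels.get("Type"),
            "size": {ax: int(pixels.get("Size" + ax, 1)) for ax in "XYZCT"},
            "physical_size": {ax: _to_float(pixels.get("PhysicalSize" + ax))
                              for ax in "XYZ"},
            "channels": [ch.get("Name") for ch in pixels.findall(ns + "Channel")],
        })
    return images
```

A `None` entry in `physical_size` flags images that lack spatial calibration along that axis.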

.ome.zarr - OME-ZARR

Description: OME-NGFF specification on ZARR

Typical Data: Next-generation file format for bioimaging

Use Cases: Cloud-native imaging, large datasets

Python Libraries:
- ome-zarr-py: Official implementation
- zarr: Underlying array storage

EDA Approach:
- Multiscale resolution levels
- Metadata compliance with OME-NGFF spec
- Coordinate transformations
- Label and ROI handling
- Cloud storage optimization
- Chunk access patterns
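Resolution levels and their scale transformations are declared in the group's JSON attributes. The sketch below assumes the OME-NGFF 0.4 layout, where a `.zattrs` file holds a top-level `multiscales` list; later spec versions store the metadata elsewhere. `summarize_multiscales` is a hypothetical helper that takes the attribute file's text.

```python
import json

def summarize_multiscales(zattrs_json: str) -> list:
    """List each resolution level's path, axes, and scale (OME-NGFF 0.4 layout)."""
    attrs = json.loads(zattrs_json)
    levels = []
    for multiscale in attrs.get("multiscales", []):
        # Axes are objects with a "name" in 0.4; older versions used plain strings.
        axes = [a["name"] if isinstance(a, dict) else a
                for a in multiscale.get("axes", [])]
        for dataset in multiscale["datasets"]:
            scale = None
            for transform in dataset.get("coordinateTransformations", []):
                if transform.get("type") == "scale":
                    scale = transform["scale"]
            levels.append({"path": dataset["path"], "axes": axes, "scale": scale})
    return levels
```

Dividing the scale of one level by that of the previous level gives the downsampling factor per axis.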

.klb - Keller Lab Block

Description: Fast microscopy format for large data

Typical Data: Lightsheet microscopy, time-lapse

Use Cases: High-throughput imaging

Python Libraries:
- pyklb: KLB reading and writing

EDA Approach:
- Compression efficiency
- Block structure
- Multi-resolution support
- Read performance benchmarking
- Metadata extraction

Whole Slide Imaging (generic, multi-vendor)

Description: Virtual slide format (multiple vendors)

Typical Data: Pathology slides, large mosaics

Use Cases: Digital pathology

Python Libraries:
- openslide-python: Multi-format WSI
- tiffslide: Pure Python alternative

EDA Approach:
- Pyramid level count
- Downsampling factors
- Associated images (macro, label)
- Tile size and overlap
- MPP (microns per pixel)
- Background detection
- Tissue segmentation

.ndpi - Hamamatsu NanoZoomer

Description: Hamamatsu slide scanner format

Typical Data: Whole slide pathology images

Use Cases: Digital pathology workflows

Python Libraries:
- openslide-python: NDPI support

EDA Approach:
- Multi-resolution pyramid
- Lens and objective information
- Scan area and magnification
- Focal plane information
- Tissue detection

.svs - Aperio ScanScope

Description: Aperio whole slide format

Typical Data: Digital pathology slides

Use Cases: Pathology image analysis

Python Libraries:
- openslide-python: SVS support

EDA Approach:
- Pyramid structure
- MPP calibration
- Label and macro images
- Compression quality
- Thumbnail generation

.scn - Leica SCN

Description: Leica slide scanner format

Typical Data: Whole slide imaging

Use Cases: Digital pathology

Python Libraries:
- openslide-python: SCN support

EDA Approach:
- Tile structure analysis
- Collection organization
- Metadata extraction
- Magnification levels
