================================================================================
SUPPORTING DOCUMENTATION
================================================================================

Working Title
-------------

Volatile organic compound fluxes and source apportionment factor fluxes measured 
by PTR-MS, Delhi, India, November 2018


Summary
-------

This dataset contains quality-controlled eddy covariance flux measurements of
470 volatile organic compound (VOC) species and source apportionment results
from Positive Matrix Factorization (PMF) analysis. The data were collected in
Old Delhi, India (5-23 November 2018) during a post-monsoon air quality
campaign conducted as part of the NERC DelhiFlux project under the Air
Pollution and Human Health in India (APHH-India) programme. The measurements
provide direct
emission flux data and source attribution for urban VOC emissions in South
Asia, supporting model evaluation and air quality policy development.


File format
-----------

All files are in CSV (comma-separated values) format, UTF-8 encoding.


File names and/or naming convention
------------------------------------

The dataset comprises four CSV files:

1. VOC_species_fluxes_QAQCd.csv
2. VOC_species_flux_LODs_QAQCd.csv
3. PMF_VOC_Factor_Profiles_9f.csv
4. PMF_VOC_Factor_Fluxes_9f.csv

Files 1 and 2 contain species-resolved measurements; files 3 and 4 contain
source-resolved results from PMF analysis.


Nature and units of recorded values
------------------------------------

File 1: VOC_species_fluxes_QAQCd.csv (640 rows × 471 columns)

  Variable       Description                            Units/Format
  --------------------------------------------------------------------------
  timestamp      Timestamp marking end of 30-minute     DD/MM/YYYY HH:MM
                 averaging period                       (IST, UTC+5:30)
                 Valid range: 05/11/2018 13:59 to 23/11/2018 23:30

  25.008142 to   Turbulent vertical flux of 470 VOC     nmol m⁻² s⁻¹
  166.09103      species, each identified by the exact  Valid range: -10 to
  (470 columns)  mass-to-charge ratio (m/z) of its      +500 (species-
                 protonated ion                         dependent)
                 Net surface-atmosphere exchange

                 Sign convention: Positive = emission (upward flux)
                                  Negative = deposition (downward flux)

                 Missing values: Blank cells indicate flux below detection
                 limit or failed quality control (not zero flux)

                 Column headers: exact m/z of the protonated ion (e.g.,
                 79.055038 = protonated benzene, C6H6·H+; 137.13319 =
                 protonated monoterpenes, C10H16·H+). Some ions can be
                 chemically identified; others remain as m/z only.
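
A minimal sketch for loading File 1 (File 2 has the same layout), assuming only Python's standard csv and datetime modules and the header and timestamp conventions described above. The function name `read_flux_table` is hypothetical. Blank cells are kept as `None` so that below-LOD or QC-rejected values are never mistaken for zero flux, and each timestamp is given the fixed IST offset (UTC+5:30).

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# IST is a fixed offset (UTC+5:30); India does not use daylight saving time.
IST = timezone(timedelta(hours=5, minutes=30))


def read_flux_table(fileobj):
    """Read File 1 or File 2 into (timestamps, columns).

    timestamps: timezone-aware datetimes marking the END of each 30-min period.
    columns: {m/z header string: list of float or None}, where None marks a
    blank cell (below LOD or failed QC), which is NOT the same as zero flux.
    """
    reader = csv.reader(fileobj)
    header = next(reader)
    mz_labels = header[1:]  # first column holds the timestamp
    timestamps = []
    columns = {label: [] for label in mz_labels}
    for row in reader:
        if not row:
            continue  # skip empty lines
        end_ist = datetime.strptime(row[0].strip(), "%d/%m/%Y %H:%M")
        timestamps.append(end_ist.replace(tzinfo=IST))
        for i, label in enumerate(mz_labels):
            # Pad short rows: a missing trailing field is treated as blank.
            cell = row[i + 1].strip() if i + 1 < len(row) else ""
            columns[label].append(float(cell) if cell else None)
    return timestamps, columns


if __name__ == "__main__":
    demo = io.StringIO(
        "timestamp,79.055038,137.13319\n"
        "05/11/2018 14:00,1.5,\n"
        "05/11/2018 14:30,,-0.2\n"
    )
    times, flux = read_flux_table(demo)
    print(times[0].astimezone(timezone.utc))  # period end in UTC
    print(flux["79.055038"])  # [1.5, None]
```

For the real files, open them with `open("VOC_species_fluxes_QAQCd.csv", newline="", encoding="utf-8-sig")`; the `utf-8-sig` codec also reads plain UTF-8 without a byte-order mark. Subtract 30 minutes from each timestamp to obtain the start of the averaging period.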


File 2: VOC_species_flux_LODs_QAQCd.csv (640 rows × 471 columns)

  Variable       Description                            Units/Format
  --------------------------------------------------------------------------
  timestamp      Timestamp (same as File 1)             DD/MM/YYYY HH:MM
                 Valid range: 05/11/2018 13:59 to 23/11/2018 23:30

  25.008142 to   Half-hourly limit of detection         nmol m⁻² s⁻¹
  166.09103      for each VOC flux measurement          Valid range: ≥0
  (470 columns)  Minimum detectable flux above noise

                 Calculated using LOD averaging method (Langford et al.,
                 2015). Flux values in File 1 whose absolute value is below
                 the corresponding LOD are set to blank (missing). Column
                 headers match File 1.
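
As an illustration only, the sketch below implements one flux-noise approach described by Langford et al. (2015): the noise is taken as the standard deviation of the w'/c' cross-covariance function evaluated at lags far from the true time lag, where no real correlation is expected, and the LOD is a multiple of that noise (here 3, matching the 3σ criterion in the Quality control section). The function names and the far-lag window are assumptions, not the exact implementation used for this dataset.

```python
import random
import statistics


def cross_covariance(w, c, lag):
    """Covariance of w(t) and c(t + lag); lag in samples (positive: c lags w)."""
    n = len(w)
    a, b = (w[: n - lag], c[lag:]) if lag >= 0 else (w[-lag:], c[: n + lag])
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return statistics.fmean((x - mean_a) * (y - mean_b) for x, y in zip(a, b))


def flux_lod(w, c, far_lags, k=3.0):
    """LOD = k * standard deviation of the covariance function at far lags."""
    noise = [cross_covariance(w, c, lag) for lag in far_lags]
    return k * statistics.stdev(noise)


if __name__ == "__main__":
    rng = random.Random(1)
    w = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    c = [0.5 * wi + rng.gauss(0.0, 1.0) for wi in w]  # true flux 0.5 at lag 0
    lod = flux_lod(w, c, far_lags=range(200, 300))
    print(cross_covariance(w, c, 0), lod)
```

With real 5 Hz data the far-lag window is expressed in samples; for instance, a window 150-180 s away from the true lag would correspond to lags of 750-900 samples.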


File 3: PMF_VOC_Factor_Profiles_9f.csv (1231 rows × 12 columns)

  Variable       Description                            Units/Format
  --------------------------------------------------------------------------
  HRFamily       Chemical family of mass peak           Text
                 (e.g., CH, CHO, CHN)                   May be "other"

  Species_label  Chemical formula or identifier         Text
                 (e.g., C2H2, C6H6)                     May be blank

  m/z            Mass-to-charge ratio                   Decimal number
                 Valid range: 25-166                    1230 mass peaks

  SFVOC          Solid Fuel VOC factor profile          Arbitrary units
  GEN-VOC        General VOC factor profile             0 to ~0.05
  IND-VOC        Industrial VOC factor profile          Normalized per
  TRAF1          Traffic factor 1 profile               factor
  TRAF2          Traffic factor 2 profile
  PVOC           Primary VOC factor profile
  EVOC           Evaporative VOC factor profile
  OVOC1          Oxygenated VOC factor 1 profile
  OVOC2          Oxygenated VOC factor 2 profile

                 Each factor column shows the contribution (loading) of each
                 mass peak to that emission source profile. Higher values
                 indicate the mass peak is characteristic of that source.


File 4: PMF_VOC_Factor_Fluxes_9f.csv (607 rows × 10 columns)

  Variable       Description                            Units/Format
  --------------------------------------------------------------------------
  Timestamp      Timestamp marking end of 30-minute     DD/MM/YYYY HH:MM
                 averaging period                       (IST, UTC+5:30)
                 Valid range: 05/11/2018 20:00 to 23/11/2018 23:30
                 Note: 607 periods (subset of File 1, 33 fewer due to
                 stricter PMF data completeness requirements)

  SFVOC          Solid Fuel VOC flux                    nmol m⁻² s⁻¹
  GEN-VOC        General VOC flux                       Valid range: -2 to
  IND-VOC        Industrial VOC flux                    +50 (factor-
  TRAF1          Traffic flux (factor 1)                dependent)
  TRAF2          Traffic flux (factor 2)
  PVOC           Primary VOC flux
  EVOC           Evaporative VOC flux
  OVOC1          Oxygenated VOC flux (factor 1)
  OVOC2          Oxygenated VOC flux (factor 2)

                 Time series of flux contributions from each PMF emission
                 source factor. Positive values indicate net emission,
                 negative values indicate net deposition. Factor names
                 reflect preliminary source interpretations based on profile
                 composition and temporal patterns.
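
PMF represents the flux spectrum as a sum of factor contributions (X ≈ G F, with G the factor flux time series in File 4 and F the profiles in File 3). The sketch below recombines one File 4 row with one File 3 row to give the modelled flux at a single m/z and each factor's share of it. It assumes that the factor columns in both files are fully populated and that each profile is normalized so that its loadings sum to one over all mass peaks; `profile_sums` lets you check that assumption, and if it does not hold only relative contributions are meaningful. All function names are hypothetical.

```python
import csv

# Factor column names as they appear in Files 3 and 4.
FACTORS = ["SFVOC", "GEN-VOC", "IND-VOC", "TRAF1", "TRAF2",
           "PVOC", "EVOC", "OVOC1", "OVOC2"]


def load_rows(fileobj):
    """Read File 3 or File 4 with csv.DictReader, converting factor columns to float."""
    rows = []
    for row in csv.DictReader(fileobj):
        for name in FACTORS:
            row[name] = float(row[name])  # raises ValueError on a blank cell
        rows.append(row)
    return rows


def profile_sums(profile_rows):
    """Sum of loadings per factor over all mass peaks (File 3)."""
    return {name: sum(row[name] for row in profile_rows) for name in FACTORS}


def reconstruct(factor_row, profile_row):
    """Modelled flux at one m/z for one period: sum over factors of g_k * f_k.

    factor_row: one File 4 row (factor fluxes, nmol m-2 s-1)
    profile_row: one File 3 row (loading of each factor at this m/z)
    Returns (total, {factor: contribution}).
    """
    parts = {name: factor_row[name] * profile_row[name] for name in FACTORS}
    return sum(parts.values()), parts
```

For example, `reconstruct(factor_rows[0], profile_rows[i])` gives the modelled flux at `profile_rows[i]["m/z"]` for the first PMF period; the other columns of File 3 (`HRFamily`, `Species_label`, `m/z`) are left as strings.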


Spatial coverage
----------------

Measurements were made at a single fixed location:

Site: Indira Gandhi Delhi Technical University for Women (IGDTUW) campus,
      Old Delhi, India
Coordinates: 28°39'51.8"N, 77°13'55.2"E (WGS84)
Measurement height: 25m on tower mounted on building rooftop (~35m above
                    ground level)
Flux footprint: Represents an area of mixed residential, commercial, and light
                industrial land use. The 80% flux footprint extent ranges from
                ~200m (unstable daytime conditions) to >1000m (stable nighttime
                conditions), calculated using the Flux Footprint Prediction
                (FFP) method (Kljun et al., 2015).
Elevation: Approximately 215m above sea level


Temporal coverage and resolution
---------------------------------

Time period: 5 November 2018 to 23 November 2018 (post-monsoon season)
Measurement frequency: Continuous measurements at 5 Hz (VOCs) and 20 Hz (wind)
Flux averaging period: 30 minutes (eddy covariance standard)
Number of flux periods: 640 half-hourly periods in VOC flux files; 607 periods
                        in PMF flux file
Data coverage: Variable by species (40-70% for abundant species, <10% for rare
               species) due to limit of detection filtering and quality control


Collection/Generation/Transformation methods
---------------------------------------------

Measurement techniques:

VOC concentrations were measured using a Proton Transfer Reaction Quadrupole
interface Time-of-Flight Mass Spectrometer (PTR-QiTOF-MS, Ionicon Analytik,
Austria) operated in H3O+ mode. Air was sampled through a 25m long, 1/2" OD
PFA Teflon inlet heated to 60°C at ~10 L/min flow rate. The PTR-MS recorded
mass spectra from m/z 0-500 at 5 Hz (0.2-second time resolution) for flux
calculations. The drift tube was operated at 2.30 mbar, 60°C, 600V
(E/N = 118.8 Td).
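
The quoted E/N can be reproduced from the drift tube settings using the ideal-gas number density, N = p/(k_B T), and assuming a uniform field along the drift tube. The drift tube length is not given in this document: the 0.101 m used below is a hypothetical value back-calculated so that the settings above give ~118.8 Td, and should be replaced by the instrument's actual drift length.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K
TOWNSEND = 1e-21  # 1 Td = 1e-21 V m^2


def reduced_field_td(drift_voltage_v, pressure_mbar, temperature_c, drift_length_m):
    """Reduced electric field E/N in townsend, assuming a uniform drift field."""
    field = drift_voltage_v / drift_length_m  # E, V/m
    number_density = (pressure_mbar * 100.0) / (K_B * (temperature_c + 273.15))  # N, m^-3
    return field / number_density / TOWNSEND


# Hypothetical drift length of 0.101 m (back-calculated, not an instrument spec).
print(round(reduced_field_td(600.0, 2.30, 60.0, 0.101), 1))  # 118.8
```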

Three-dimensional wind velocities and sonic temperature were measured at 20 Hz
using a Gill R3 ultrasonic anemometer (Gill Instruments, UK) mounted adjacent
to the PTR-MS inlet.

Supporting meteorological data (temperature, pressure, humidity, rainfall,
wind) were recorded using a Vaisala WXT530 weather station.

Standard Operating Procedures and references:

Eddy covariance fluxes were calculated using custom software developed for
this dataset, following standard micrometeorological protocols. Processing
included: spike removal (Mauder et al., 2013), block-averaging detrending,
double coordinate rotation, time lag optimization by covariance maximization,
and spectral corrections for high- and low-frequency flux losses.
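
The core of these steps can be sketched for a single 30-minute period as below. This is a simplified illustration, not the custom software used for this dataset: it assumes the wind components have already been resampled onto the 5 Hz VOC time base and despiked, applies double rotation and block averaging (removal of the period mean), searches a symmetric lag window for the maximum absolute covariance, and omits spectral corrections. The function names and the `max_lag` window are assumptions. `ppbv_to_nmol_m3` converts a mixing ratio to a molar concentration with the ideal gas law, so that the covariance with w (m/s) is in nmol m⁻² s⁻¹.

```python
import math
import random
import statistics

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def ppbv_to_nmol_m3(chi_ppbv, pressure_pa, temperature_k):
    """Convert a mixing ratio (ppbv) to a molar concentration (nmol m^-3)."""
    return chi_ppbv * pressure_pa / (R * temperature_k)


def double_rotation(u, v, w):
    """Rotate wind components so that mean v and mean w are both zero."""
    theta = math.atan2(statistics.fmean(v), statistics.fmean(u))  # yaw angle
    ct, st = math.cos(theta), math.sin(theta)
    u1 = [a * ct + b * st for a, b in zip(u, v)]
    v1 = [-a * st + b * ct for a, b in zip(u, v)]
    phi = math.atan2(statistics.fmean(w), statistics.fmean(u1))  # pitch angle
    cp, sp = math.cos(phi), math.sin(phi)
    u2 = [a * cp + b * sp for a, b in zip(u1, w)]
    w2 = [-a * sp + b * cp for a, b in zip(u1, w)]
    return u2, v1, w2


def lagged_cov(w, c, lag):
    """Covariance of w(t) and c(t + lag) with block-average (mean) removal."""
    n = len(w)
    a, b = (w[: n - lag], c[lag:]) if lag >= 0 else (w[-lag:], c[: n + lag])
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return statistics.fmean((x - mean_a) * (y - mean_b) for x, y in zip(a, b))


def ec_flux(u, v, w, c, max_lag):
    """Flux and lag (samples) maximizing |cov(w', c')| within +/- max_lag.

    With w in m/s and c in nmol m^-3 the flux is in nmol m^-2 s^-1.
    No spectral corrections are applied.
    """
    _, _, w_rot = double_rotation(u, v, w)
    lags = range(-max_lag, max_lag + 1)
    best = max(lags, key=lambda k: abs(lagged_cov(w_rot, c, k)))
    return lagged_cov(w_rot, c, best), best


if __name__ == "__main__":
    rng = random.Random(0)
    n = 3000
    u = [3.0 + rng.gauss(0.0, 0.5) for _ in range(n)]
    v = [1.0 + rng.gauss(0.0, 0.5) for _ in range(n)]
    w = [0.1 + rng.gauss(0.0, 0.3) for _ in range(n)]
    c = [0.0] * 5 + w[:-5]  # synthetic tracer lagging w by 5 samples (1 s at 5 Hz)
    print(ec_flux(u, v, w, c, max_lag=20))
```

Searching for the maximum of |cov| rather than cov allows the lag to be found for depositing as well as emitted species.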

PTR-MS operation followed established protocols (Yuan et al., 2017). The
instrument was calibrated using a certified gas standard (Apel-Riemer
Environmental Inc.) every 5-7 days. Zero air measurements (1 minute every
10 minutes) monitored instrumental baselines.

Source apportionment used the Virtual Eddy Accumulation PMF (VEA-PMF)
approach. High-resolution mass spectra were conditionally sampled based on
vertical wind velocity to create updraft and downdraft spectra. The PMF2
algorithm (Paatero, 1997) was applied to the difference spectra (607 periods ×
1230 mass peaks) to identify 9 emission source factors. Bootstrap analysis
(100 runs) assessed solution stability.
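
The link between the conditionally sampled spectra and the flux can be illustrated with a minimal sketch. It assumes, for illustration only, that each sample is assigned to the updraft or downdraft accumulator by the sign of w' and weighted by |w'|, as in true eddy accumulation. Under that assumption the updraft-minus-downdraft difference equals the eddy covariance flux at every m/z, so PMF applied to the difference spectra resolves fluxes rather than concentrations. The function names are hypothetical.

```python
import statistics


def vea_spectra(w, spectra):
    """Return |w'|-weighted updraft and downdraft spectra for one period.

    w: rotated, lag-corrected vertical wind on the spectrum time base (m/s)
    spectra: one list of concentrations per sample, one value per m/z
    """
    w_mean = statistics.fmean(w)
    n = len(w)
    up = [0.0] * len(spectra[0])
    down = [0.0] * len(spectra[0])
    for wi, spectrum in zip(w, spectra):
        w_prime = wi - w_mean
        target = up if w_prime > 0 else down  # sign of w' selects the accumulator
        for j, conc in enumerate(spectrum):
            target[j] += abs(w_prime) * conc / n
    return up, down


def difference_spectrum(w, spectra):
    """Updraft minus downdraft; equals cov(w', c) at each m/z under this weighting."""
    up, down = vea_spectra(w, spectra)
    return [u - d for u, d in zip(up, down)]
```

Because the mean of w' is zero, the difference reduces to the mean of w'c, which is the covariance of w and c.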

Data processing steps:

1. Raw 5 Hz PTR-MS and 20 Hz sonic anemometer data were recorded digitally and
   stored on local computers
2. Eddy covariance fluxes calculated post-campaign using custom
   flux-processing software
3. Quality control filtering applied (see Quality Control section)
4. Limit of detection calculated using LOD averaging method (Langford et al.,
   2015)
5. Flux values below LOD set to missing (blank cells)
6. PMF analysis performed on quality-controlled flux spectra
7. Results exported to CSV format for repository deposit

Date of analysis: Field measurements October-November 2018; flux calculations
and quality control November 2018-January 2019; PMF analysis January-February
2019.


Experimental design/sampling regime
------------------------------------

This was an observational field campaign measuring ambient VOC fluxes under
natural urban conditions. No experimental treatments were applied.

Flux measurements were made continuously at high frequency (5 Hz VOC, 20 Hz
wind), with fluxes calculated over consecutive 30-minute averaging periods
following standard eddy covariance protocols. Each 30-minute flux represents
9000 individual measurements (5 Hz × 1800 seconds).

The campaign duration (9 October - 23 November 2018) captured temporal
variability across multiple diurnal cycles and synoptic weather patterns during
the post-monsoon season. Quality-controlled data are available for 640
half-hourly periods (5-23 November 2018).

Replication: Continuous measurements with high-frequency sampling provide
statistical replication within each 30-minute period and across the campaign
duration.

Missing samples: Gaps in the time series occurred due to instrument
maintenance, calibration periods, power outages, and periods failing quality
control criteria (insufficient turbulence, instrument malfunction, values below
detection limit).


Fieldwork and/or laboratory instrumentation
--------------------------------------------

PTR-QiTOF-MS (Proton Transfer Reaction Quadrupole interface Time-of-Flight
Mass Spectrometer):
- Manufacturer: Ionicon Analytik, Innsbruck, Austria
- Model: PTR-QiTOF 8000
- Operating Mode: H3O+ reagent ion chemistry
- Mass Resolution: ~4000 m/Δm
- Sensitivity: ~3000 cps/ppbv (typical for protonated acetone)
- Time Resolution: 5 Hz (0.2-second spectra) for flux calculations
- Mass Range: m/z 0-500
- Drift Tube Conditions: 2.30 mbar, 60°C, 600V (E/N = 118.8 Td)

Gill R3 Ultrasonic Anemometer:
- Manufacturer: Gill Instruments, Lymington, UK
- Model: R3-50
- Measurement: 3D wind velocity (u, v, w) and sonic temperature
- Sampling Rate: 20 Hz
- Wind Speed Range: 0-45 m/s
- Wind Speed Resolution: 0.01 m/s
- Wind Speed Accuracy: <1% RMS at 12 m/s

Vaisala WXT530 Weather Station:
- Manufacturer: Vaisala, Finland
- Model: WXT530
- Measurements: Wind speed/direction, air temperature, barometric pressure,
  relative humidity, rainfall
- Sampling Rate: 1 minute averages

Inlet System:
- Material: 1/2" OD PFA Teflon tubing
- Length: 25 meters
- Heating: Heated to 60°C to prevent condensation
- Flow Rate: ~10 L/min total, ~200 mL/min to PTR-MS
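
A rough estimate of the inlet transit time indicates the order of the time lag that the covariance-maximization step has to find. The sketch assumes plug flow at the stated ~10 L/min and a 3/8" (9.525 mm) inner diameter, which is typical of 1/2" OD PFA tubing but is not stated in this document; pressure and temperature effects on the volumetric flow are ignored. The function name is hypothetical.

```python
import math


def inlet_transit(length_m, inner_diameter_m, flow_l_min):
    """Mean flow velocity (m/s) and transit time (s) for plug flow in a tube."""
    area = math.pi * (inner_diameter_m / 2.0) ** 2  # m^2
    flow_m3_s = flow_l_min / 1000.0 / 60.0
    velocity = flow_m3_s / area
    return velocity, length_m / velocity


velocity, transit = inlet_transit(25.0, 0.009525, 10.0)
# Expected lag in 5 Hz samples, before any instrument response time is added.
print(f"{velocity:.2f} m/s, {transit:.1f} s, ~{round(transit * 5)} samples at 5 Hz")
```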


Calibration steps and values
-----------------------------

PTR-MS Calibration:

The PTR-QiTOF-MS was calibrated using a certified gas standard (Apel-Riemer
Environmental Inc.) containing 15 VOC species at known mixing ratios in
nitrogen. The standard was dynamically diluted to produce a range of mixing
ratios (1-50 ppbv). Calibrations were performed every 5-7 days during the
campaign.

VOCs in calibration standard: methanol, acetonitrile, acetaldehyde, acetone,
isoprene, methyl vinyl ketone (MVK), methacrolein (MACR), methyl ethyl ketone
(MEK), benzene, toluene, C8-aromatics (xylenes, ethylbenzene), C9-aromatics
(trimethylbenzenes), and alpha-pinene.

Calibration factors (normalized counts per ppbv, ncps/ppbv) were calculated
for each species. For VOCs not in the standard, calibration factors were
calculated theoretically using known ion-molecule rate constants and instrument
transmission functions.
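
The calibration steps above can be sketched as follows, as an illustration rather than the processing code used here. Any standard mixing ratio and flows passed to these functions are placeholders, since the cylinder concentrations are not listed in this document. The theoretical scaling for VOCs not in the standard assumes sensitivity is proportional to the proton-transfer rate constant k and to the relative transmission at the product-ion m/z, ignoring fragmentation and humidity effects. The function names are hypothetical.

```python
import statistics


def diluted_ppbv(standard_ppbv, standard_flow, diluent_flow):
    """Mixing ratio after dynamic dilution of a gas standard with zero air."""
    return standard_ppbv * standard_flow / (standard_flow + diluent_flow)


def fit_sensitivity(ppbv, ncps):
    """Ordinary least-squares slope (ncps per ppbv) and intercept of one calibration."""
    mean_x, mean_y = statistics.fmean(ppbv), statistics.fmean(ncps)
    sxx = sum((x - mean_x) ** 2 for x in ppbv)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ppbv, ncps))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def scaled_sensitivity(s_ref, k_ref, k_x, transmission_ref, transmission_x):
    """Theoretical sensitivity of an uncalibrated VOC, scaled from a calibrated
    reference by its rate constant and relative transmission (no fragmentation)."""
    return s_ref * (k_x / k_ref) * (transmission_x / transmission_ref)
```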

Primary ion monitoring: The H3O+ primary ion signal was monitored continuously
via its ¹⁸O isotopologue (H3¹⁸O+, m/z 21.022; typical: 1-3 × 10⁶ cps).
Sensitivity normalization was performed by dividing raw signals by the primary
ion count to account for variations in ionization efficiency.

Zero air measurements: VOC-free air measured for 1 minute every 10 minutes by
passing ambient air through a platinum catalyst heated to 350°C. These provided
instrumental background signals that were subtracted from ambient measurements.
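
Normalization and background subtraction can be sketched as below. Because the H3¹⁶O+ signal is usually too intense to count directly, its count rate is commonly inferred from the H3¹⁸O+ isotopologue at m/z 21.022 multiplied by ~500 (the approximate ¹⁶O/¹⁸O abundance ratio). That factor, the normalization to 10⁶ primary ions, and the linear interpolation of the zero-air background between successive zero measurements are assumptions for illustration; the function names are hypothetical.

```python
PRIMARY_ION_ISOTOPE_FACTOR = 500.0  # approx. 16O/18O abundance ratio (assumption)


def normalized_counts(raw_cps, m21_cps, reference_primary=1e6):
    """Normalize a raw count rate to a fixed primary ion signal (ncps)."""
    primary_cps = m21_cps * PRIMARY_ION_ISOTOPE_FACTOR
    return raw_cps * reference_primary / primary_cps


def interpolated_background(t, t_before, zero_before, t_after, zero_after):
    """Linearly interpolate the zero-air signal between two zero periods."""
    fraction = (t - t_before) / (t_after - t_before)
    return zero_before + fraction * (zero_after - zero_before)


def mixing_ratio_ppbv(ambient_ncps, background_ncps, sensitivity_ncps_per_ppbv):
    """Background-subtracted signal divided by the calibration factor."""
    return (ambient_ncps - background_ncps) / sensitivity_ncps_per_ppbv
```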

Mass calibration: Performed daily using known peaks in ambient spectra to
ensure accurate mass-to-charge ratio assignment.

Sonic Anemometer Calibration:

The Gill R3 sonic anemometer was factory-calibrated by the manufacturer. No
field calibration was performed, as sonic anemometers measure wind speed based
on time-of-flight of acoustic pulses (absolute measurement). The instrument was
leveled using a spirit level to within ±1° of horizontal. Double coordinate
rotation was applied during flux processing to align the wind coordinate system
with mean streamlines.

Meteorological Sensor Calibration:

The Vaisala WXT530 was factory-calibrated. Pressure readings were compared with
local airport observations (agreement within 2 hPa). Temperature readings were
cross-checked with the sonic temperature from the anemometer and found to be
consistent (within 0.5°C).


Quality control
---------------

Multiple levels of quality control were applied to ensure data integrity:

Instrumental Quality Control:

1. PTR-MS performance monitoring:
   - Primary ion (H3O+) signal monitored continuously (typical: 1-3 × 10⁶ cps)
   - Mass resolution checked daily using known peaks (maintained at ~4000 m/Δm)
   - Zero air measurements every 10 minutes to monitor baselines and drift
   - Calibrations every 5-7 days to verify sensitivity stability
   - Concentration detection limits: sub-pptv for most VOCs

2. Sonic anemometer quality checks:
   - Sonic temperature compared with Vaisala thermometer (agreement within
     0.5°C)
   - Wind statistics (mean, variance, skewness) examined for each period
   - Diagnostic error codes monitored

Flux Quality Control:

Quality control followed standard eddy covariance protocols with additional
filtering specific to low signal-to-noise ratio trace gas fluxes.

1. Primary hard flags (Level 0): Periods rejected if they had insufficient data
   (>10% missing), high angle of attack (>30°), discontinuities in time series,
   or instrument diagnostic warnings.

2. Spike detection: Applied to 5 Hz raw data using Mauder et al. (2013) method.
   Spikes defined as values exceeding 3.5-7 standard deviations from median.
   Periods with >1% spikes were flagged.

3. Stationarity test: Each 30-minute flux compared to mean of six 5-minute
   sub-period fluxes. Periods with >30% difference were flagged as
   non-stationary, indicating violation of eddy covariance assumptions.

4. Integral turbulence characteristics: Normalized turbulence intensity
   (σw/u*) compared to expected values (0.5-2.0 range based on similarity
   theory). Periods outside this range were flagged.

5. Friction velocity threshold: Periods with u* <0.1 m/s were excluded due to
   insufficient turbulent mixing.

6. Physical range checks: Species-specific flux ranges established by examining
   data distributions. Values beyond ±10 standard deviations from median were
   removed.

7. Limit of detection (LOD) filtering: Applied using the LOD averaging method
   (Langford et al., 2015). Flux values with a signal-to-noise ratio below 3
   (i.e., absolute flux smaller than 3σ of the flux noise) were set to missing.
   LOD values were calculated for each species and each 30-minute period. This
   is the most stringent filter, removing ~60-90% of data for many species.

Quality flags (Mauder and Foken, 2006):
- Flag 0 (high quality): Passed all tests
- Flag 1 (moderate quality): Failed stationarity or spike test  
- Flag 2 (low quality): Failed turbulence characteristics test

Only fluxes with flags 0-1 AND passing the LOD threshold are included in the
final dataset.

PMF Quality Control:

1. Input data quality: Only flux periods passing the above QC were used. Mass
   peaks with <25% data coverage were excluded.

2. Solution validation: Q/Q_expected ratio examined (good solutions: 0.5-2).
   Residuals analyzed for patterns. Bootstrap analysis (100 runs) assessed
   factor stability. Physical interpretation of factors required for
   acceptance.

3. Factor interpretation: Profiles compared to known emission source
   signatures. Temporal patterns cross-checked with ancillary data (traffic
   counts, cooking times, etc.).
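
The Q/Q_expected diagnostic can be written out explicitly. The sketch below uses the common definition Q = Σ(e_ij/σ_ij)² over all residuals e and uncertainties σ, with Q_expected = nm − p(n + m) for an n × m input matrix and p factors. It ignores the robust down-weighting of outliers applied by PMF2, and the function names are hypothetical. For the 607 × 1230 input with 9 factors described above, Q_expected = 730,077.

```python
def q_value(observed, modelled, uncertainty):
    """Sum of squared, uncertainty-scaled residuals over the whole matrix."""
    return sum(
        ((x - m) / s) ** 2
        for row_x, row_m, row_s in zip(observed, modelled, uncertainty)
        for x, m, s in zip(row_x, row_m, row_s)
    )


def q_expected(n_periods, n_peaks, n_factors):
    """Degrees of freedom of the bilinear fit: n*m - p*(n + m)."""
    return n_periods * n_peaks - n_factors * (n_periods + n_peaks)


def q_ratio(observed, modelled, uncertainty, n_factors):
    """Q / Q_expected; values of roughly 0.5-2 were treated as acceptable here."""
    n, m = len(observed), len(observed[0])
    return q_value(observed, modelled, uncertainty) / q_expected(n, m, n_factors)


print(q_expected(607, 1230, 9))  # 730077
```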

Data Coverage:

After quality control, data coverage varies by species:
- High-concentration species (e.g., aromatics, acetone): 40-70% coverage
- Moderate-concentration species: 20-40% coverage
- Low-concentration species: <10% coverage

Overall: 640 time periods out of potential ~1100 (58%) have at least some valid
flux data.

Factors Affecting Data Quality:

1. Meteorological conditions: Low wind speeds (<1 m/s) and stable atmospheric
   conditions (nighttime) reduced data coverage due to insufficient turbulence.
   Rain events caused instrument downtime.

2. Instrumental factors: PTR-MS sensitivity variations addressed by
   normalization and calibration. Inlet wall effects minimized by heating but
   may affect sticky compounds. Mass spectral interferences mean some peaks may
   contain multiple isomers.

3. Footprint variability: Wind direction changes affect source areas. Unstable
   vs. stable conditions change footprint extent. No directional filtering
   applied; data represent all wind directions.

Missing data: Blank cells indicate data below detection limit or failed quality
control. The proportion of missing data varies greatly by species. Blank cells
do not represent zero flux; they represent unknown or unquantifiable values.


Miscellaneous
-------------

Data Usage Notes:

1. Flux sign convention: Positive flux values indicate net emission (upward
   flux). Negative values indicate net deposition (downward flux). Most VOCs
   show predominantly positive fluxes, but some oxygenated species may show
   occasional deposition.

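2. Species identification: Column headers in Files 1 and 2 are the exact m/z
   of the protonated ion rather than chemical names. Where possible, ions are
   assigned a chemical formula (e.g., C6H6 = benzene, C10H16 = sum of
   monoterpenes); assignments for the PMF mass peaks are listed in the
   Species_label column of File 3. Where exact chemical identity is uncertain
   or multiple isomers may contribute, only the m/z is available.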

3. Monoterpene caveat: The C10H16 measurement represents the sum of all
   monoterpene isomers (alpha-pinene, beta-pinene, limonene, etc.), which
   cannot be distinguished by PTR-MS. Other terpenoid compounds may have
   isobaric interferences.

4. Isobaric interferences: PTR-MS cannot distinguish isomers, which have
   identical exact masses. Isobaric compounds that share a nominal mass but
   differ in exact mass can be separated at a mass resolution of ~4000 m/Δm,
   although not all such interferences are resolved.

5. Time zone: All timestamps are in Indian Standard Time (IST = UTC+5:30). No
   daylight saving time is used in India.

6. Footprint considerations: Flux measurements integrate emissions over an
   upwind source area (footprint) that varies with atmospheric stability, wind
   speed, and measurement height. The 80% footprint extent ranges from ~200m
   (unstable conditions) to >1000m (stable conditions). No footprint weighting
   or filtering has been applied; all wind directions are included.

7. Diurnal patterns: VOC fluxes show strong diurnal cycles driven by source
   activity patterns (traffic, cooking) and atmospheric boundary layer
   dynamics. Daytime periods generally have better data coverage due to
   favorable turbulent conditions.

8. PMF factor interpretation: The 9 PMF factors represent distinct emission
   source types resolved from their mass spectral profiles and temporal
   patterns. Preliminary identifications include traffic-related
   emissions (multiple factors), biomass/biofuel burning, cooking emissions,
   evaporative fuel emissions, solvent/industrial emissions, and biogenic
   emissions. Refer to the associated peer-reviewed publication for detailed
   factor interpretation and validation.

9. PMF temporal coverage: PMF factor fluxes (607 periods) cover fewer time
   periods than the main flux dataset (640 periods) because PMF requires
   sufficient data across all included mass peaks, necessitating stricter data
   completeness criteria.

Data Limitations:

10. Spatial representation: Data represent a single measurement point. Fluxes
    integrate over an upwind area (~0.1-1 km²) but do not provide spatially
    resolved emission maps.

11. Flux uncertainty: Random uncertainty in individual flux measurements is
    estimated at 20-40% based on LOD calculations. Systematic uncertainties
    (calibration, spectral corrections) add ~20%. Total uncertainty varies by
    species and atmospheric conditions.

12. Detection limits: Low-concentration species have sparse data coverage due
    to LOD filtering. Absence of data does not prove absence of emissions;
    small fluxes may be below detection.

13. Campaign period: Data cover only the post-monsoon season (November 2018).
    Emissions may differ substantially in other seasons due to meteorology,
    source activities, and biogenic temperature dependence.

14. Urban heterogeneity: The measurement footprint includes mixed land use
    (residential, commercial, light industrial). Fluxes represent the
    integrated effect of multiple source types and cannot be attributed to
    specific buildings or activities without additional analysis.
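
If the random and systematic components are treated as independent and combined in quadrature (an assumption; this document does not state how they are combined), the total relative uncertainty for a single half-hourly flux ranges from about 28% to 45%. The random component of the mean of n independent fluxes shrinks as 1/√n, whereas systematic errors do not average out. The helper names are hypothetical.

```python
import math


def total_relative_uncertainty(random_rel, systematic_rel):
    """Quadrature sum of independent relative uncertainties."""
    return math.hypot(random_rel, systematic_rel)


def mean_random_uncertainty(abs_uncertainties):
    """Random uncertainty of the mean of independent fluxes: sqrt(sum s_i^2) / n."""
    n = len(abs_uncertainties)
    return math.sqrt(sum(s * s for s in abs_uncertainties)) / n


for rand in (0.20, 0.40):
    total = total_relative_uncertainty(rand, 0.20)
    print(f"random {rand:.0%} + systematic 20% -> {total:.0%}")
```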

Data Citation and Acknowledgments:

When using this dataset, please cite the dataset DOI and the associated
peer-reviewed publication. Proper acknowledgment of the funding agencies is
requested:

- UK Natural Environment Research Council (NERC) DelhiFlux project under the
  Newton Bhabha Fund programme, Air Pollution and Human Health in India
  (APHH-India): NE/P016502/1 and NE/P016472/1
- NERC E3 DTP studentship for James M. Cash: NE/L002558/1
- NERC National Capability award SUNRISE: NE/R000131/1
- Earth System Science Organization, Ministry of Earth Sciences, Government of
  India: MoES/16/19/2017-APHH (DelhiFlux)

Contact Information:

For questions about this dataset, please contact:

Dr. James M. Cash
UK Centre for Ecology & Hydrology
Bush Estate, Penicuik, EH26 0QB, United Kingdom
ORCID: [To be provided]

Alternative contact:
Professor Eiko Nemitz
UK Centre for Ecology & Hydrology


References
----------

Kljun, N., Calanca, P., Rotach, M.W., and Schmid, H.P. (2015). A simple
two-dimensional parameterisation for Flux Footprint Prediction (FFP).
Geoscientific Model Development, 8(11), 3695-3713.
DOI: 10.5194/gmd-8-3695-2015

Langford, B., Acton, W.J.F., Ammann, C., Valach, A., and Nemitz, E. (2015).
Eddy-covariance data with low signal-to-noise ratio: time-lag determination,
uncertainties and limit of detection. Atmospheric Measurement Techniques,
8(10), 4197-4213. DOI: 10.5194/amt-8-4197-2015

Mauder, M., Cuntz, M., Drüe, C., Graf, A., Rebmann, C., Schmid, H.P., Schmidt,
M., and Steinbrecher, R. (2013). A strategy for quality and uncertainty
assessment of long-term eddy-covariance measurements. Agricultural and Forest
Meteorology, 169, 122-135. DOI: 10.1016/j.agrformet.2012.09.006

Mauder, M., and Foken, T. (2006). Impact of post-field data processing on eddy
covariance flux estimates and energy balance closure. Meteorologische
Zeitschrift, 15(6), 597-609. DOI: 10.1127/0941-2948/2006/0167

Paatero, P. (1997). Least squares formulation of robust non-negative factor
analysis. Chemometrics and Intelligent Laboratory Systems, 37(1), 23-35.
DOI: 10.1016/S0169-7439(96)00044-5

Yuan, B., Koss, A.R., Warneke, C., Coggon, M., Sekimoto, K., and de Gouw, J.A.
(2017). Proton-Transfer-Reaction Mass Spectrometry: Applications in Atmospheric
Sciences. Chemical Reviews, 117(21), 13187-13229.
DOI: 10.1021/acs.chemrev.7b00325


================================================================================
Document Information
================================================================================

Document Version: 1.0
Date Prepared: 16 December 2024
Prepared by: James M. Cash, UK Centre for Ecology & Hydrology

This supporting documentation was prepared for submission to the NERC
Environmental Information Data Centre (EIDC) in accordance with EIDC data
submission guidelines.

================================================================================
END OF SUPPORTING DOCUMENTATION
================================================================================
