Predicting the distribution of organic carbon content in surface sediments of the eastern China seas using random forest algorithm
 
                
                 
                
                    
                                                            
Abstract
    Clarifying the distribution and controlling factors of sedimentary organic carbon in the eastern China marginal seas is crucial for establishing an organic carbon cycle model for the East Asian marginal seas and for understanding its “source-to-sink” pattern. Existing distribution maps of organic carbon in the eastern China marginal seas are constructed mainly by mathematical interpolation of measured data. This approach is strongly limited by the location and number of sampling stations. It also neglects the relationships between samples and environmental factors such as seawater physicochemical properties, seabed topography, and ocean currents, and thus oversimplifies a complex geological problem. Machine learning methods can extract key information from high-dimensional, complex data and establish mapping relationships between geological property features and the variables to be predicted. In this study, the widely used Random Forest (RF) algorithm was employed to predict the organic carbon content of surface sediments in the eastern China marginal seas by learning the mapping relationship between 405 measurements of marine sediment organic carbon and 50 geological property features. Compared with the organic carbon distribution map generated by Kriging interpolation from the same number of samples, the RF algorithm yielded smaller errors on the evaluation indicators, including mean absolute error, root mean square error, and maximum residual error. The ten-fold cross-validation R² reached 0.60, indicating high fitting accuracy. Notably, in regions with low sampling density or missing data due to sampling difficulties, the RF algorithm predicted surface sediment organic carbon content more accurately, reflecting its potential for more realistic predictions and its advantage in extrapolation.
The RF model established in this study provides valuable insights for predicting other geochemical indicators of marine sediments in the future and has significant practical implications for resource investigation and environmental protection in the eastern China marginal seas.
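
For readers unfamiliar with the evaluation indicators named above, the following is a minimal sketch of how mean absolute error, root mean square error, maximum residual error, and R² are commonly computed, together with a simple ten-fold split of 405 samples. It is not the study's code: the function names, the contiguous (unshuffled) fold layout, and the four example values are assumptions made purely for illustration, and a real workflow would normally shuffle the samples before splitting.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_square_error(observed, predicted):
    """Square root of the mean squared residual."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def max_residual_error(observed, predicted):
    """Largest absolute residual over all samples."""
    return max(abs(o - p) for o, p in zip(observed, predicted))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Made-up organic carbon contents (%), for illustration only.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.3, 0.4, 0.5, 1.0]

mae = mean_absolute_error(observed, predicted)
rmse = root_mean_square_error(observed, predicted)
max_err = max_residual_error(observed, predicted)
r2 = r_squared(observed, predicted)

# Ten folds over 405 samples, the sample count reported in the abstract.
folds = list(k_fold_indices(405, k=10))
```

In a ten-fold scheme like this, each sample is held out exactly once, so a cross-validated R² can be computed from predictions the model never saw during training.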
 