A comparison of cloud detection methods on Sentinel-2 data

The aim of this project is to give students an opportunity to experiment with cloud detection in Earth Observation data. Students should deal with issues related to installing the different packages, running the classification models, and evaluating classification accuracy.

The result of this project should be a report based on experiments with existing classification models applicable to cloud detection on Sentinel-2 data. The report should serve as a manual describing how to install, use, and train classifiers capable of detecting clouds in Sentinel-2 data. It does not need to include every detail, but it should at least reference sources that provide the information. The goal of the project is not to train the classifiers from scratch: using pre-trained models is acceptable and, from some perspectives, welcome, although training models from scratch is not viewed negatively. In either case, the final report should explain how to train any of the examined classifiers. The report should compare the classification accuracy of the methods on a dataset that the students create by downloading data from available Sentinel-2 archives. The main focus should be accuracy in cloud detection, but accuracy for other classes such as Land, Water, and Snow is welcome.
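
As an illustration of how the accuracy comparison could be computed, the following is a minimal sketch in plain Python. It assumes that the reference labels and the labels predicted by one method have already been flattened into two equal-length sequences of class names. The function names `confusion_counts` and `per_class_scores` are hypothetical helpers, not part of any of the examined packages, and the choice of precision, recall, and F1 as metrics is an assumption rather than a requirement of the project.

```python
from collections import Counter

# Class names taken from the project description.
CLASSES = ["Cloud", "Land", "Water", "Snow"]


def confusion_counts(reference, predicted):
    """Count (reference, predicted) label pairs over two equal-length sequences."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    return Counter(zip(reference, predicted))


def per_class_scores(counts, classes=CLASSES):
    """Return overall accuracy and per-class precision, recall and F1."""
    total = sum(counts.values())
    correct = sum(counts[(c, c)] for c in classes)
    scores = {}
    for c in classes:
        tp = counts[(c, c)]
        # Pixels of another class that were predicted as c.
        fp = sum(counts[(r, c)] for r in classes if r != c)
        # Pixels of class c that were predicted as another class.
        fn = sum(counts[(c, p)] for p in classes if p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = correct / total if total else 0.0
    return accuracy, scores


if __name__ == "__main__":
    # Tiny made-up example, for illustration only.
    reference = ["Cloud", "Cloud", "Cloud", "Land", "Land", "Water", "Snow", "Snow"]
    predicted = ["Cloud", "Cloud", "Land", "Land", "Cloud", "Water", "Snow", "Cloud"]
    accuracy, scores = per_class_scores(confusion_counts(reference, predicted))
    print(f"overall accuracy: {accuracy:.3f}")
    for name, s in scores.items():
        print(name, {k: round(v, 3) for k, v in s.items()})
```

Running the same two functions on the output of each examined method, against the same reference labels, gives directly comparable numbers. Reporting per-class scores alongside overall accuracy matters because cloud pixels are often a minority class in a scene.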

The following articles might serve as an inspiration for this work:

Improving Cloud Detection with Machine Learning by Anze Zupanc
There are also other articles related to the topic at the Sentinel-hub blog. Please review them.
Ready-to-Use Methods for the Detection of Clouds, Cirrus, Snow, Shadow, Water and Clear Sky Pixels in Sentinel-2 MSI Images by André Hollstein and others.
This article also discusses a dataset of labelled Earth Observation data. This could be useful when creating a dataset for this project.

The input data should consist of a dataset of labelled and unlabelled Earth observations. If there is an objective reason why students could not procure labelled data, the reason needs to be explained in the report.

Some possible steps of the work

Install and compare cloud detection/classification methods. The examined libraries/software packages should include at least: F-mask, Sen2cor, Maja, S2cloudless. Document the installation process and notable features.
Create a toy dataset for comparison of the classifiers. The dataset should contain a variety of observations – summer, winter, various climates, geographic locations. The creation of the dataset should be well documented and supported by scripts automating the download process.
Experiment with cloud detection methods and document as much as possible.
Apply the classification models to all data in the toy dataset and evaluate accuracy.
Prepare a presentation of the results of your review and experiments.
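
The dataset creation step above asks for documented, scripted downloads covering different seasons and locations. One possible starting point, sketched below under stated assumptions, is to generate a manifest of site and time-window combinations first and let the download script iterate over it. The site names, coordinates, month windows, and the helpers `build_manifest` and `manifest_to_csv` are all hypothetical and chosen only for illustration. The sketch deliberately does not call any Sentinel-2 archive API, since that depends on the archive the students choose.

```python
import csv
import io
from datetime import date
from itertools import product

# Hypothetical sampling sites (name, latitude, longitude); replace with your own.
SITES = [
    ("temperate_site", 48.72, 21.26),
    ("desert_site", 23.50, 10.00),
    ("tropical_site", -3.10, -60.00),
]

# Calendar windows as (start month, end month); the end month is exclusive.
# Note that seasons are reversed in the southern hemisphere.
WINDOWS = {
    "northern_winter": (1, 3),
    "northern_summer": (7, 9),
}


def build_manifest(sites, windows, year):
    """Return one row per (site, window) combination for the given year."""
    rows = []
    for (name, lat, lon), (window, (m0, m1)) in product(sites, windows.items()):
        rows.append({
            "site": name,
            "lat": lat,
            "lon": lon,
            "window": window,
            "start": date(year, m0, 1).isoformat(),
            "end": date(year, m1, 1).isoformat(),
        })
    return rows


def manifest_to_csv(rows):
    """Serialise the manifest rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(manifest_to_csv(build_manifest(SITES, WINDOWS, 2020)))
```

Keeping the manifest as a CSV file in the project repository documents exactly which observations were requested, and the same file can later be extended with product identifiers and cloud cover figures once the data has been downloaded.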

Computational resources

A remote computer for long-running code can be provided to students after discussion.

Semester 2020/2021

This project is assigned to Viktor Vince.

GitLab group for the project: https://git.kpi.fei.tuke.sk/groups/svd/comparison-of-cloud-detection-methods

Resources

Hollstein, A.; Segl, K.; Guanter, L.; Brell, M.; Enesco, M. Ready-to-Use Methods for the Detection of Clouds, Cirrus, Snow, Shadow, Water and Clear Sky Pixels in Sentinel-2 MSI Images. Remote Sens. 2016, 8, 666. https://doi.org/10.3390/rs8080666
- https://github.com/hollstein/cB4S2
F-mask
- https://github.com/ubarsc/python-fmask
Sen2cor
- Sen2Cor – STEP
Maja
- https://github.com/CNES/Start-MAJA
S2 cloudless
- https://developers.google.com/earth-engine/tutorials/community/sentinel-2-s2cloudless