Training
Before picking particles, EPicker always needs a model from a previous training process. We provide several models that can be used directly for picking or as the basis of further training.
Training a model needs some micrographs (typically ~10; the more the better) with annotations of the positive particles. The annotations can be sparse.
EPicker provides three training modes: joint training, fine-tuning, and continual training. Both joint and continual training can start from scratch or from a pre-trained model. Fine-tuning must be based on a pre-trained model.
Training command
The training script is epicker_train.sh in Epicker/Epicker/bin. Run the command with parameters
$ epicker_train.sh {Parameters}
Run the program without any parameters to get the help information.
$ epicker_train.sh
Essential parameters (training)
Set the paths to the dataset, annotations and output, and choose the data format and working mode using the following essential parameters. An example command is shown after the list.
--data Micrograph list(.thi) or a folder that contains micrographs used for training.
--label Folder containing the coordinate files.
--label_type (default=thi) Format of your label files, thi/star/box/coord.
--exp_id Output folder, the output model file is model_last.pth in this folder.
--mode (default=particle) Training mode: particle, vesicle or fiber.
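For example, a minimal training command on a folder of micrographs with .thi coordinate files might look like the following; the folder and output names are placeholders to be replaced with your own paths.
$ epicker_train.sh --data train_mics/ --label train_labels/ --label_type thi --exp_id particle_model --mode particle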
Optional parameters (training)
These parameters allow you to adjust the training procedure. If not specified, the default settings are used. An example command combining several of these options is shown after the list.
- --load_model The pre-trained model file.
If training a model from scratch, just ignore this parameter. For fine-tuning and continual training, use this option to import the old model.
- --lr (default=1e-4) Learning rate, the step size of gradient descent.
Empirically, you don't need to change it.
- --batch_size (default=4) Batch size, the number of images in a mini-batch.
The batch size should not exceed 4 × the number of GPUs. For example, if you are using one Tesla P100, set 4; for 4 GPUs, set 16.
- --num_epoch (default=140) Total epochs for training.
More epochs are not always better; too many epochs may lead to overfitting. In our tests, 140 is a good choice.
- --train_pct (default=80) Percentage of data used for training.
- --test_pct (default=20) Percentage of data used for testing.
EPicker has a testing module based on COCOAPI, with which you can quickly compute precision and recall and draw a PR curve.
- --gpus (default=0) Specify which GPUs you would like to use. GPU device IDs should be separated by commas, e.g. --gpus 0,1,2,3.
Training on multiple GPUs is faster than training on only one, but it causes a slight decay in performance for which we do not yet have a solution. We therefore recommend using a single GPU in the current version.
- --sparse_anno If the annotations are sparse, add --sparse_anno to the parameters.
Sparse annotation means that many positive samples are missing from your coordinate files. Training on such datasets is also known as positive-unlabeled learning. Adding this option activates a positive-unlabeled method in EPicker.
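As an illustration, the command below combines several of these options for a training run from scratch on sparsely annotated data with a single GPU; the folder names are placeholders.
$ epicker_train.sh --data train_mics/ --label train_labels/ --label_type thi \
    --exp_id particle_model --mode particle \
    --sparse_anno --batch_size 4 --gpus 0 --train_pct 80 --test_pct 20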
Continual training
Continual training enables new knowledge to be added to an old model incrementally. Constrained by the exemplar dataset and extra loss terms, the new model performs well on both the new and the old datasets. After training on a new dataset, EPicker extracts a small set of particles from it and updates the old exemplar dataset. The following parameters are essential for continual training; an example command is shown after the list.
- --load_model The pre-trained model file.
- --sampling_size (default=200) Number of sample particles selected from the training dataset to construct an exemplar.
- --load_exemplar Path to the old exemplar dataset.
- --output_exemplar Path to output the new exemplar dataset.
After each continual training process, EPicker outputs a new exemplar dataset updated on the current training dataset.
- --continual If you are training in a continual manner, add --continual to the parameters.
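A continual-training command might therefore look like the following; the model, exemplar, and data paths are placeholders.
$ epicker_train.sh --data new_mics/ --label new_labels/ --label_type thi \
    --exp_id continual_model --mode particle --continual \
    --load_model old_model/model_last.pth \
    --load_exemplar old_exemplar --output_exemplar new_exemplar \
    --sampling_size 200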
Joint training
Joint training takes similar parameters to continual training but excludes the --load_model, --load_exemplar and --continual options. The --sampling_size and --output_exemplar options are optional; if included, an exemplar dataset will be output, which can be used for further continual training. An example command is shown below.
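For instance, a joint-training run that also writes out an exemplar dataset for later continual training might look like this; the folder names are placeholders.
$ epicker_train.sh --data joint_mics/ --label joint_labels/ --label_type thi \
    --exp_id joint_model --mode particle \
    --sampling_size 200 --output_exemplar joint_exemplar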
Fine-tuning
Fine-tuning is performed on an old model specified by --load_model. The other options used only for continual training should be excluded. An example command is shown below.
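A fine-tuning command might look like the following; the model and data paths are placeholders.
$ epicker_train.sh --data new_mics/ --label new_labels/ --label_type thi \
    --exp_id finetune_model --mode particle \
    --load_model pretrained/model_last.pth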