Data drops // Short research & technical findings

Deep Learning Morphology in Integrative Phylogenetics: Insights from genus Hastula (Terebridae)

Published on: August 2025

Abstract

Shell morphology has long been used to classify gastropods, yet it is prone to convergence and often poorly reflects evolutionary history. Molecular phylogenies have provided more reliable frameworks, but integrating morphological and molecular evidence remains a central challenge in systematics. Here we use the sand-dwelling auger snails of the genus Hastula (family Terebridae) as a case study to evaluate the phylogenetic signal of morphology extracted with convolutional neural networks (CNNs). A CNN model trained on 3,204 shell images representing 23 species achieved high performance in species-level identification (93% validation accuracy). Feature vectors from the network were used to compute pairwise cosine distances, construct similarity matrices, and infer morphological trees by hierarchical clustering and neighbor-joining with bootstrapping. These analyses revealed coherent morphological clusters, such as Hastula matheroniana + H. strigilata and H. albula + H. solida + H. hectica, while taxa with fewer images (H. salleana, H. rufopunctata) showed reduced performance and isolated placement.

Comparison with a multilocus phylogeny (12S, 16S, COI, 28S) of 15 Hastula species revealed both overlaps and conflicts. While some associations were recovered in both datasets, including H. matheroniana + H. penicillata, other relationships differed, notably H. lanceata + H. strigilata (strongly supported molecularly but absent in CNN-based trees). Quantitative metrics confirmed limited congruence: the Robinson–Foulds distance was 16/18 (normalized RF = 0.889), and the Mantel test indicated only a weak, non-significant correlation between molecular and morphological distances (r = 0.228, p = 0.225).

These results demonstrate that CNN-derived features capture biologically meaningful morphological signal but only partially align with evolutionary history. As in other gastropods, shell convergence likely explains much of the discordance. We argue that CNN-based morphology is best deployed within integrative, total-evidence frameworks that combine DNA, morphology, and ecology to resolve evolutionary relationships in Terebridae.

Introduction

The use of shell morphology has historically been the primary basis for classifying gastropods, including the sand-dwelling auger snails (family Terebridae). While shells provide abundant and readily accessible characters, they are also prone to convergence and homoplasy. Similar forms often evolve independently under shared ecological pressures, such as a burrowing lifestyle or sediment type, leading to morphological groupings that do not always reflect evolutionary history [1, 2]. As a result, morphology-only classifications of gastropods frequently conflict with molecular evidence, and it is now widely accepted that integrative approaches combining DNA and morphology are essential for robust systematics [1, 2].

Molecular phylogenetics has revolutionized the classification of Conoidea, the diverse superfamily that includes cone snails (Conidae), auger snails (Terebridae), and their relatives. Early shell- and radula-based systems often grouped unrelated taxa, whereas DNA evidence has consistently revealed that many traditional genera were polyphyletic or paraphyletic [4]. Within Terebridae specifically, the first broad molecular phylogeny [3] confirmed the family’s monophyly and identified five major clades (A–E), each corresponding to a lineage with a distinct evolutionary history. Among these, Hastula (Clade D) emerged as a coherent and well-supported genus, validating its traditional recognition on the basis of its smooth, glossy, elongate shells. Holford et al. [3] also showed that anatomical traits such as the venom apparatus had been lost independently in multiple terebrid lineages, underscoring the need for molecular evidence to reconstruct evolutionary history reliably.

Subsequent work expanded and refined this framework. Macroevolutionary analyses across Terebridae [5] showed that diversification was more strongly associated with environmental factors than with the gain or loss of the venom gland. This highlighted the role of ecology in shaping terebrid diversity and suggested that shell morphology may often capture ecological adaptation rather than phylogenetic signal. Fedosov et al. (2020) [6] integrated multilocus DNA data with shell and anatomical traits in a comprehensive revision of the family. They proposed a three-subfamily classification (Pellifroniinae, Pervicaciinae, Terebrinae) and formally redefined genera to align taxonomy with molecular clades. Hastula was retained as a valid genus within Terebrinae, encompassing a distinct radiation of venomous, shallow-water, sand-dwelling auger snails. Although Hastula is morphologically cohesive, Fedosov et al. also noted substantial substructure within the genus and emphasized the prevalence of convergence across the family, advocating DNA-based diagnoses coupled with detailed morphological descriptions [6].

In parallel with these molecular advances, new computational tools have made it possible to revisit morphological data in novel ways. Deep learning, and particularly convolutional neural networks (CNNs), can extract complex quantitative features from shell images, capturing aspects of morphology that may be difficult to describe using traditional character coding. Recent proof-of-concept studies have shown that CNN-derived morphological embeddings can be used to construct distance matrices and even phylogenetic trees, providing a new source of automated, high-dimensional morphological data [7]. While such morphology-derived trees often differ from DNA-based topologies, they offer a complementary perspective and can be integrated into total-evidence frameworks that combine molecules, morphology, and fossils [2, 6, 7].

Here we present a case study of Hastula that applies CNN-based morphology alongside multilocus molecular phylogenetics. Building on earlier investigations of Oxymeris [10] and other terebrids, we evaluate the performance of CNNs for species-level identification, examine whether CNN-derived feature spaces capture morphological affinities among species, and compare these morphological hypotheses with DNA-based phylogenies. This approach allows us to assess both the strengths and limitations of CNN-based morphology in reflecting evolutionary history and to explore how such methods can be integrated into the ongoing effort to align auger snail taxonomy with phylogeny.

Methods

Data Acquisition

Shell images were collected from many online resources, ranging from specialized shell-collecting websites to institutes and universities. One of the largest collections of shell images is available on GBIF. Online marketplaces such as eBay also contain large numbers of images. Other large shell image collections are available at , Malacopics, Femorale and Thelsica. A shell dataset created for AI is also available [8].

Some online resources offer facilities for downloading images, but most websites require a dedicated web scraper. Scrapy, an open-source, collaborative framework for extracting data from websites, was used to build a custom web scraper that extracts images and their scientific names. All data were stored in a MySQL database before further processing.

The dataset for the Hastula CNN model comprises 3,204 shell images representing 23 Hastula species (see Table II). WoRMS/MolluscaBase list 104 species in the genus Hastula, but not enough images were found for the remaining 81 species: species with fewer than 25 images were removed (see Minimum number of images needed for each species).
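The minimum-image rule amounts to a grouped count over the stored image records. The sketch below assumes a hypothetical `images` table with `path` and `scientific_name` columns and uses Python's built-in sqlite3 module as a runnable stand-in for MySQL; the same `GROUP BY ... HAVING` query should carry over to MySQL, where drivers typically use `%s` placeholders instead of `?`.

```python
import sqlite3

MIN_IMAGES = 25  # species with fewer images were removed


def species_with_enough_images(conn, min_images=MIN_IMAGES):
    """Return (scientific_name, n_images) rows that meet the minimum, largest first."""
    query = """
        SELECT scientific_name, COUNT(*) AS n_images
        FROM images
        GROUP BY scientific_name
        HAVING COUNT(*) >= ?
        ORDER BY n_images DESC, scientific_name
    """
    return conn.execute(query, (min_images,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (path TEXT PRIMARY KEY, scientific_name TEXT NOT NULL)"
    )
    # Toy rows, not study data: 30 images of species A, 10 of species B.
    rows = [(f"a_{i}.jpg", "species A") for i in range(30)]
    rows += [(f"b_{i}.jpg", "species B") for i in range(10)]
    conn.executemany("INSERT INTO images VALUES (?, ?)", rows)
    print(species_with_enough_images(conn))  # [('species A', 30)]
```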

All sequences were retrieved from https://www.ncbi.nlm.nih.gov/gene/ using the search expression "Terebridae"[Organism] OR Terebridae[All Fields]. A total of 3398 entries were retrieved and stored locally in a BioSQL database. These data were cross-checked against MolluscaBase/WoRMS.

The sequences selected for this report are the ribosomal DNA markers (mitochondrial 12S and 16S, and nuclear 28S) and the mitochondrial COI (cox1) gene, restricted to records with a valid MolluscaBase name. Most of these sequences were published by Holford et al. (2009) [3] and Modica et al. (2020) [5].
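Selecting the four markers from the downloaded records can be approximated by keyword matching on each GenBank description line, combined with a check against the set of accepted names. The patterns, the `(species, description)` record layout and the helper names below are assumptions for illustration; real definition lines are heterogeneous, so matches would still need manual review.

```python
import re

# Hypothetical keyword patterns for the four markers used in this report.
MARKER_PATTERNS = {
    "12S": re.compile(r"\b12S\b", re.IGNORECASE),
    "16S": re.compile(r"\b16S\b", re.IGNORECASE),
    "28S": re.compile(r"\b28S\b", re.IGNORECASE),
    "COI": re.compile(
        r"\b(?:COI|COX1|cytochrome (?:c )?oxidase subunit (?:I|1))\b", re.IGNORECASE
    ),
}


def classify_marker(description):
    """Return the first marker whose pattern matches the description, else None."""
    for marker, pattern in MARKER_PATTERNS.items():
        if pattern.search(description):
            return marker
    return None


def select_sequences(records, accepted_names):
    """Keep (species, marker) for records with an accepted name and a known marker.

    `records` is assumed to be an iterable of (species, description) pairs.
    """
    selected = []
    for species, description in records:
        marker = classify_marker(description)
        if marker is not None and species in accepted_names:
            selected.append((species, marker))
    return selected
```

The trailing word boundary in the COI pattern keeps "subunit II" (COII) from being counted as COI.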

Image Pre-processing

All names were checked against WoRMS or MolluscaBase for validity, and names not found in WoRMS/MolluscaBase were excluded from further processing. Although most of this data-quality step was automated, a time-consuming manual verification step was also included.

In addition to text-based quality control, both automated and manual preprocessing steps were applied to the images. When an image contained multiple shells, we applied thresholding to separate the shells from the background and then used contour detection to locate each shell’s outline, cropping out each detected contour as an individual image. The background was replaced with a uniform black background, and each image was made square by padding with black. All images were resized to 400 × 400 px. A final visual selection was made before producing the final image dataset. Overall, 10–20% of the images were removed for various reasons, for example when other objects such as hands, habitat or text were visible in the picture.

Hardware

An HP Omen 30L GT13 was used to train the model. It has an Intel Core i9-10850K CPU (3.60 GHz), 64 GB of RAM and an NVIDIA GeForce RTX 3080 GPU with 10 GB of memory.

Model Training

For this study, Python (version 3.10.12) was used with pre-trained EfficientNetV2B2 models (see Identifying Shells using Convolutional Neural Networks: Data Collection and Model Selection). Table I lists the hyperparameters. The models were trained with a batch size of 64 for up to 100 epochs. Training started with a learning rate of 0.0005, and the Adam optimizer was used for weight updates. Two callbacks were used: one monitored the validation loss and reduced the learning rate when it stopped improving, and the other applied early stopping. Both callbacks were used to prevent the model from overfitting. Fine-tuning was performed as described previously, with the top three layers of the model unfrozen.

Table I. Hyperparameters

Hyperparameter	Value	Comments
Batch Size	64
Epochs	100	The number of epochs determines how many times the entire training dataset is passed through the model. Because early stopping was used, fewer than 100 epochs were usually needed; the current model ran for 30 epochs.
Optimizer	Adam	The optimizer determines the algorithm used to update model weights during training.
Learning Rate	0.0005	Reduced when the validation loss plateaued: reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, min_lr=1e-6)
Loss	Categorical Cross-entropy
Regularization	0.0001
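The setup in Table I can be expressed in Keras roughly as follows. This is a sketch, not the training script used here: the global-average-pooling head, the reading of "Regularization 0.0001" as an L2 penalty on the output layer, the EarlyStopping patience, the interpretation of "top 3 layers" as the last three backbone layers, and the helper names `build_model` and `make_callbacks` are all assumptions. The ReduceLROnPlateau arguments are the ones given in Table I.

```python
from tensorflow import keras

NUM_CLASSES = 23  # Hastula species in the dataset
IMG_SIZE = 400    # images were resized to 400 x 400 px


def build_model(weights="imagenet", n_unfrozen=3):
    """EfficientNetV2B2 backbone plus a softmax classifier (sketch)."""
    # include_preprocessing defaults to True, so inputs are raw 0-255 pixels.
    base = keras.applications.EfficientNetV2B2(
        include_top=False,
        weights=weights,
        input_shape=(IMG_SIZE, IMG_SIZE, 3),
        pooling="avg",  # yields a 1408-dimensional vector per image
    )
    # Fine-tuning: keep only the last n_unfrozen backbone layers trainable.
    base.trainable = True
    for layer in base.layers[:-n_unfrozen]:
        layer.trainable = False

    outputs = keras.layers.Dense(
        NUM_CLASSES,
        activation="softmax",
        # Assumption: "Regularization 0.0001" read as an L2 weight penalty.
        kernel_regularizer=keras.regularizers.l2(1e-4),
    )(base.output)
    model = keras.Model(base.input, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=5e-4),
        loss="categorical_crossentropy",  # expects one-hot labels
        metrics=["accuracy"],
    )
    return model


def make_callbacks():
    """Learning-rate reduction (values from Table I) and early stopping."""
    reduce_lr = keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.1, patience=3, min_lr=1e-6
    )
    # Hypothetical patience; the text does not state the early-stopping settings.
    early_stop = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=10, restore_best_weights=True
    )
    return [reduce_lr, early_stop]


# Usage, with hypothetical tf.data datasets already batched to 64 images:
# model = build_model()
# model.fit(train_ds, validation_data=val_ds, epochs=100,
#           callbacks=make_callbacks())
```

Passing `weights=None` builds the same architecture without downloading the ImageNet weights, which is convenient for quick checks of the model structure.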

Evaluation Metrics

The performance of the CNN models was evaluated with standard classification metrics: accuracy, precision, recall, and F1 score. Following [9], these are defined in terms of the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):

\text{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP}

\text{Precision} = \frac{TP}{TP + FP}

\text{Recall} = \frac{TP}{TP + FN}

F_{1}\text{-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

The scikit-learn module sklearn.metrics was used to compute these metrics.
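For a multi-class model these counts are taken per species, one-vs-rest: for species k, TP counts images of k predicted as k, FP counts images of other species predicted as k, and FN counts images of k predicted as another species. The sketch below, using toy labels rather than study data, computes per-class precision, recall and F1 from these counts and checks them against scikit-learn's `precision_recall_fscore_support`; `per_class_metrics` is a hypothetical helper. Note that scikit-learn's `accuracy_score` returns the fraction of all images classified correctly.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def per_class_metrics(y_true, y_pred, labels):
    """One-vs-rest (precision, recall, F1) per label, computed from TP/FP/FN."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rows = []
    for k in labels:
        tp = np.sum((y_pred == k) & (y_true == k))
        fp = np.sum((y_pred == k) & (y_true != k))
        fn = np.sum((y_pred != k) & (y_true == k))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append((precision, recall, f1))
    return np.array(rows, dtype=float)


# Toy labels (not study data): three classes, five images.
labels = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C"]
y_pred = ["A", "B", "B", "B", "C"]

manual = per_class_metrics(y_true, y_pred, labels)
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, zero_division=0
)
assert np.allclose(manual, np.column_stack([p, r, f1]))  # same numbers as sklearn
print(accuracy_score(y_true, y_pred))  # 0.8
```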

Feature vectors from the penultimate layer of the trained Hastula CNN model

To analyze the internal representations learned by the Hastula CNN model, we extracted high-dimensional feature vectors from the penultimate layer of the trained network. These embeddings capture rich semantic information about each image while abstracting away pixel-level details. The model was implemented and trained in TensorFlow, and feature vectors were obtained using Keras Model subclassing: a truncated version of the network outputs the activations of the layer immediately before the classification layer. Each image in the dataset was passed through the network in inference mode, and the resulting 1408-dimensional feature vector was stored for further analysis.
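One way to build such a truncated network with Keras subclassing is sketched below. It assumes the trained classifier is a single-input functional model whose second-to-last layer is the pooled embedding (for EfficientNetV2B2 with average pooling, that layer has 1408 channels); `FeatureExtractor` and `extract_features` are hypothetical names, not the study's code.

```python
import numpy as np
from tensorflow import keras


class FeatureExtractor(keras.Model):
    """Return the penultimate-layer activations of a trained classifier."""

    def __init__(self, classifier, **kwargs):
        super().__init__(**kwargs)
        # Truncated network that shares the trained weights but stops
        # before the final classification layer.
        self.body = keras.Model(classifier.input, classifier.layers[-2].output)

    def call(self, inputs, training=False):
        return self.body(inputs, training=training)


def extract_features(classifier, images, batch_size=64):
    """Embed preprocessed images; returns an (n_images, n_features) array."""
    extractor = FeatureExtractor(classifier)
    # predict() runs the model in inference mode (training=False).
    return np.asarray(extractor.predict(images, batch_size=batch_size, verbose=0))
```

Because the truncated model reuses the classifier's layers, no weights are copied or retrained; only the output point changes.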
To quantify similarity between images, we computed pairwise cosine similarity between feature vectors.

For each image $i$ with feature vector $\vec{f_i}$, we computed the cosine similarity to every other image $j$ in the same class:

\text{sim}(i,j) = \cos(\theta) = \frac{\vec{f_i} \cdot \vec{f_j}}{\|\vec{f_i}\| \|\vec{f_j}\|}
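In practice, the similarities for all pairs can be computed at once: after scaling each feature vector to unit length, multiplying the stacked vectors by their transpose gives every $\cos(\theta)$ in the equation above. The NumPy sketch below does this and then averages, for each image, its similarity to the other images of its class; the helper names and the toy vectors are illustrative only. The corresponding cosine distance is $1 - \text{sim}(i,j)$.

```python
import numpy as np


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of an (n_images, n_dims) array."""
    f = np.asarray(features, dtype=np.float64)
    norms = np.linalg.norm(f, axis=1, keepdims=True)
    unit = f / np.clip(norms, 1e-12, None)  # guard against all-zero vectors
    return unit @ unit.T  # entry (i, j) is cos(theta) between f_i and f_j


def within_class_similarity(features, labels):
    """Mean similarity of each image to the other images of its own class."""
    sim = cosine_similarity_matrix(features)
    labels = np.asarray(labels)
    means = np.full(len(labels), np.nan)  # stays NaN if a class has one image
    for i, label in enumerate(labels):
        same = labels == label
        same[i] = False  # exclude the trivial self-similarity of 1
        if same.any():
            means[i] = sim[i, same].mean()
    return means


# Toy 2-D vectors (the real embeddings have 1408 dimensions).
F = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(np.round(cosine_similarity_matrix(F), 3))
print(within_class_similarity(F, ["A", "A", "B"]))
```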

Table II. Per-species classification performance of the Hastula CNN model

Species	# images	Recall	Precision	F1
Hastula aciculina (Lamarck, 1822)	227	0.955	0.955	0.955
Hastula alboflava Bratcher, 1988	105	1.000	1.000	1.000
Hastula albula (Menke, 1843)	236	0.878	0.973	0.923
Hastula bacillus (Deshayes, 1859)	104	1.000	0.828	0.906
Hastula cinerea (Born, 1778)	323	0.855	0.908	0.881
Hastula cuspidata (Hinds, 1844)	81	0.889	0.762	0.821
Hastula escondida (Terryn, 2006)	74	0.944	0.944	0.944
Hastula exacuminata Sacco, 1891	53	0.867	0.813	0.839
Hastula hastata (Gmelin, 1791)	131	0.958	0.920	0.939
Hastula hectica (Linnaeus, 1758)	184	0.912	1.000	0.954
Hastula inconstans (Hinds, 1844)	78	1.000	0.900	0.947
Hastula lanceata (Linnaeus, 1767)	337	1.000	1.000	1.000
Hastula leloeuffi Bouchet, 1983	70	1.000	1.000	1.000
Hastula lepida (Hinds, 1844)	148	1.000	1.000	1.000
Hastula maryleeae R. D. Burch, 1965	105	0.737	0.875	0.800
Hastula matheroniana (Deshayes, 1859)	208	0.947	0.947	0.947
Hastula penicillata (Hinds, 1844)	109	0.923	0.923	0.923
Hastula raphanula (Lamarck, 1822)	146	1.000	0.957	0.978
Hastula rufopunctata (E. A. Smith, 1877)	38	0.818	0.750	0.783
Hastula salleana (Deshayes, 1859)	40	0.500	0.500	0.500
Hastula solida (Deshayes, 1859)	106	1.000	0.950	0.974
Hastula strigilata (Linnaeus, 1758)	189	0.905	0.927	0.916
Hastula stylata (Hinds, 1844)	112	0.867	0.867	0.867
This table breaks down the model's classification performance for each of the 23 Hastula species included in the study. # images is the total number of images used for each species. Recall (sensitivity) is the proportion of a species' images that the model correctly assigns to that species. Precision is the proportion of images assigned to a species that actually belong to it. The F1-score is the harmonic mean of precision and recall, summarizing per-species performance in a single value. Values approaching 1.0 indicate high performance.
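As a quick consistency check, each F1 value in the table should equal the harmonic mean of the listed precision and recall, up to rounding to three decimals. The snippet below recomputes it for four rows copied from the table; `harmonic_f1` is a hypothetical helper name.

```python
def harmonic_f1(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (species, recall, precision, reported F1), copied from the table above.
rows = [
    ("Hastula albula", 0.878, 0.973, 0.923),
    ("Hastula bacillus", 1.000, 0.828, 0.906),
    ("Hastula cuspidata", 0.889, 0.762, 0.821),
    ("Hastula rufopunctata", 0.818, 0.750, 0.783),
]

for species, recall, precision, reported in rows:
    recomputed = harmonic_f1(precision, recall)
    # Inputs are themselves rounded, so agreement is expected only to ~0.0005.
    print(f"{species:22s} recomputed={recomputed:.3f} reported={reported:.3f}")
```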

References

[1] Wagner, P. J. Gastropod phylogenetics: Progress, problems, and implications. Journal of Paleontology 75(6): 1128–1140 (2001)
[2] Dayrat, B. Towards integrative taxonomy. Biological Journal of the Linnean Society 85: 407–415 (2005)
[3] Holford, M. et al. Evolution of the Toxoglossa Venom Apparatus as Inferred by Molecular Phylogeny of the Terebridae. Molecular Biology and Evolution 26(1): 15–25 (2009)
[4] Puillandre, N. et al. Molecular phylogeny and evolution of the cone snails (Gastropoda, Conoidea). Molecular Phylogenetics and Evolution 78: 290–303 (2014)
[5] Modica, M. V., Gorson, J., Fedosov, A. E., et al. Macroevolutionary analyses suggest that environmental factors, not venom apparatus, play key role in Terebridae marine snail diversification. Systematic Biology 69(3): 413–430 (2020)
[6] Fedosov, A. E., Malcolm, G., Terryn, Y., et al. Phylogenetic classification of the family Terebridae (Neogastropoda: Conoidea). Journal of Molluscan Studies 86: 1–29 (2020)
[7] Hunt, R. et al. Integrating Deep Learning Derived Morphological Traits and Molecular Data for Total-Evidence Phylogenetics: Lessons from Digitized Collections. Systematic Biology 74(2) (2025)
[8] Zhang, Q., Zhou, J., He, J., et al. A shell dataset, for shell features extraction and recognition. Scientific Data 6: 226 (2019)
[9] Powers, D. M. W. Evaluation: From Precision, Recall and F-measure to ROC, Informedness, Markedness & Correlation. Journal of Machine Learning Technologies 2(1): 37–63 (2011)
[10] Kerremans, Ph. Deep Learning Meets Phylogeny: Evaluating CNN-derived Morphological Signal in Auger Snail Genus Oxymeris (Terebridae). Identifyshell.org (2025)
[11] Kerremans, Ph. Technical Reports for a Total-Evidence Phylogenetics of Terebridae. Identifyshell.org (2025)