Back to Article
Article Notebook
Download Source

Diaptera wings classification using Topological Data Analysis

Authors
Affiliation

Guilherme Vituri F. Pinto

Unesp

Sergio Ura

Northon

Published

April 15, 2025

Abstract

We studied etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc

Keywords

Topological Data Analysis, Persistent homology

In [1]:
using TDAfly, TDAfly.Preprocessing, TDAfly.TDA, TDAfly.Analysis
using Images: mosaicview
using Plots: plot, display, heatmap, scatter
using PersistenceDiagrams
Precompiling packages...
   3570.9 ms  ✓ TDAfly
  1 dependency successfully precompiled in 5 seconds. 397 already precompiled.

1 Introduction

Falar sobre o dataset, TDA, etc.

2 Methods

All images are in the images/processed directory. For each image, we load it, apply a gaussian blur, crop and make it have 150 pixels of height. The blurring step is necessary to “glue” small holes in the figure and keep it connected.

In [2]:
paths = readdir("images/processed", join = true)
species = basename.(paths) .|> (x -> replace(x, ".png" => ""))
individuals = map(species) do specie
  s = split(specie, " ")
  s[1][1] * "-" * s[2]
end
wings = load_wing.(paths, blur = 1.3)
Xs = map(image_to_r2, wings);
In [3]:
mosaicview(wings, ncol = 4, fillvalue=1)

2.1 Vietoris-Rips filtration

We select 500 points from each image using a farthest point sample method

In [4]:
samples = map(Xs) do X
  ids = farthest_points_sample(X, 500)
  X[ids]
end;
Progress:   0%|▏                                        |  ETA: 0:00:53Progress: 100%|█████████████████████████████████████████| Time: 0:00:00
Progress:  97%|███████████████████████████████████████▋ |  ETA: 0:00:00Progress: 100%|█████████████████████████████████████████| Time: 0:00:00
Progress:  88%|████████████████████████████████████     |  ETA: 0:00:00Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

We then calculate its persistence diagrams using the Vietoris-Rips filtration etc.

In [5]:
# get only the 1-dimensional PD
rips = rips_pd.(samples, cutoff = 5, threshold = 200) .|> last;

We create the 1-dimensional persistence image for each persistence diagram using 10x10 matrices

In [6]:
PI = PersistenceImage(rips, size = (10, 10))

images_rips = PI.(rips);

2.2 Examples

Below are some examples of 1-dimensional barcodes, its persistence image and the original wing that generated it. Note: we are plotting the barcode using the birth and persistence.

In [7]:
# plot some images to see the barcodes
map([1, 4, 8, 10, 15]) do i
  p = plot_wing_with_pd(rips[i], images_rips[i], samples[i], species[i])
  display(p)
end;

We now calculate the Euclidean distance between each persistence image (seen as a vector of \(\mathbb{R}^{10x10}\)) and plot its heatmap

In [8]:
D_rips = pairwise_distance(images_rips);
In [9]:
plot_heatmap(D_rips, individuals, "Distance matrix for Vietoris-Rips barcodes")

2.3 Persistence Homology Transform

Now we will create several filtrations based on points and lines, etc.

We start with the point (0, 0). Its filtration is the following

In [10]:
A = wings[1] |> image_to_array;
f = dist_to_point(0, 0)
Af = modify_array(A, f)
heatmap(Af)

with corresponding sublevel barcode as

In [11]:
point_pds = cubical_pd(Af, cutoff = 0.05)
plot_pd(point_pds)

or, with persistence in the y-axis:

In [12]:
plot_pd(point_pds, persistence = true)

Let’s see step-by-step of this filtration:

In [13]:
for tr  reverse([0:0.1:1;])
  X = findall_ids(>(tr), Af)
  title = "threshold: $tr"
  p = scatter(first.(X), last.(X), title = title)
  display(p)
end

Due to noise, some connected components are born in 0.2 and die only at 0. But the loops seems alright.

3 Draft…..

In [14]:
i = 11
X = Xs[i]
sample = samples[i]
scatter(last.(X), first.(X))
scatter(last.(sample), first.(sample))