Group Signal Alignment
Dave Deriso · 2026
When analyzing collections of time series—whether neural recordings, motion capture data, or physiological signals—we often encounter signals that share a common underlying shape but are misaligned in time. Standard averaging smears out these temporal differences, destroying the very structure we want to study. Group signal alignment solves this by jointly finding a common reference and the time warps that best align each signal to it.
In time-warped group alignment, the goal is to estimate a central tendency $\mu$ together with the warp functions $\phi_1, \ldots, \phi_N$ that align each signal to it. We solve this using GDTW with iterative refinement. Below is an illustration.
Optimization Formulation
We formulate group alignment as an optimization problem, where the warp functions and the target signal are the variables to be chosen. Our formulation follows the pattern common in machine learning: we minimize an objective that includes a loss function measuring the alignment error between warped signals and target, plus regularization terms that penalize excessive warping of the time axis.
Loss Functional
Let $L: \mathbf{R}^d \to \mathbf{R}$ be a vector penalty function. We define the loss associated with a time warp function $\phi$, on signal $x$ and target $\mu$, as

$$\mathcal{L}(x \circ \phi, \mu) = \int_0^1 L\big(x(\phi(t)) - \mu(t)\big)\, dt,$$

the average value of the penalty function of the difference between the time-warped signal $x \circ \phi$ and the target $\mu$. The smaller $\mathcal{L}(x \circ \phi, \mu)$ is, the better we consider $x \circ \phi$ to approximate $\mu$.
Group Alignment via Regularized Loss Minimization
We propose to choose the warps $\phi_1, \ldots, \phi_N$ and the target $\mu$ by solving the optimization problem

$$\text{minimize} \quad \sum_{i=1}^{N} \Big( \mathcal{L}(x_i \circ \phi_i, \mu) + \lambda\, \mathcal{R}^{\mathrm{cum}}(\phi_i) + \gamma\, \mathcal{R}^{\mathrm{inst}}(\phi_i) \Big),$$

where $\lambda$ and $\gamma$ are positive hyperparameters used to vary the relative weight of the three terms. Since the regularizers $\mathcal{R}^{\mathrm{cum}}$ and $\mathcal{R}^{\mathrm{inst}}$ depend only on $\phi_i$, not on $\mu$, they are identical to those used in pairwise time warping and remain unchanged in the group setting. The variables in this optimization problem are the warp functions $\phi_1, \ldots, \phi_N$ and the target signal $\mu$.
The Iterative Solver
This problem is hard to solve exactly, but a simple iterative procedure works well in practice. We observe that if we fix the target $\mu$, the problem splits into $N$ separate dynamic time warping problems that we can solve independently (and in parallel). Conversely, if we fix the warping functions $\phi_1, \ldots, \phi_N$, we can optimize over $\mu$ by minimizing

$$\sum_{i=1}^{N} \mathcal{L}(x_i \circ \phi_i, \mu).$$

This is typically easy to do; for example, with square loss, we choose $\mu(t)$ to be the mean of $x_1(\phi_1(t)), \ldots, x_N(\phi_N(t))$; with absolute value loss, we choose $\mu(t)$ to be their median.
We solve this problem using block coordinate descent, alternating between optimizing the warping functions and the target signal. This leads to a simple iterative procedure:
Group time-warped alignment via block coordinate descent.
Iterative alignment process. At iteration $k = 0$, the original misaligned signals are shown with the initial reference highlighted in orange. Each subsequent iteration shows the signals converging toward alignment.
This method of alternating between updating the target and updating the warp functions (in parallel) typically converges quickly. However, it need not converge to the global minimum. One simple initialization is to start with no warping, i.e., $\phi_i(t) = t$. Another is to choose one of the original signals as the initial value for $\mu$.
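The block coordinate descent structure can be sketched in a few lines of Python. To keep the example self-contained, the GDTW subproblem is replaced by a much simpler stand-in: the best integer time shift under squared error. The functions `best_shift` and `group_align` are illustrative assumptions, not the actual GDTW solver; a real implementation would solve a full regularized warping problem in step 1.

```python
import numpy as np

def best_shift(x, mu, max_shift=10):
    """Stand-in for the warp subproblem: the integer shift of x that
    best matches mu in squared error. A real solver would compute a
    full (regularized) dynamic time warp here."""
    shifts = list(range(-max_shift, max_shift + 1))
    errs = [np.sum((np.roll(x, -s) - mu) ** 2) for s in shifts]
    return shifts[int(np.argmin(errs))]

def group_align(X, n_iters=5, max_shift=10):
    """Block coordinate descent: alternate between (1) aligning each
    signal to the current target (independent per signal, hence
    parallelizable) and (2) updating the target as the pointwise mean
    of the aligned signals (the minimizer under square loss)."""
    mu = X[0].copy()  # simple initialization: pick one signal as target
    for _ in range(n_iters):
        shifts = [best_shift(x, mu, max_shift) for x in X]          # step 1
        aligned = np.array([np.roll(x, -s) for x, s in zip(X, shifts)])
        mu = aligned.mean(axis=0)                                   # step 2
    return mu, np.array(shifts)
```

Even with this crude alignment model, the alternating scheme recovers the relative timing offsets of shifted copies of a common shape.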
Shaping the Results
There are two main ways to influence the behavior of this algorithm. First, we can choose a good initialization for $\mu$—selecting a signal that is already “central” in some sense can speed convergence and improve final alignment quality. We explore several initialization strategies below. Second, we can add constraints to the optimization problem itself, restricting the space of admissible warping functions.
The choice of initialization in line 1 of the algorithm is our focus here. While the iterative procedure converges regardless of the starting point, a good initial reference can reduce the number of iterations needed and avoid local minima in this non-convex problem; a poor one may require more iterations or, in pathological cases, converge to a suboptimal alignment.
Initialization Method 1: Pointwise Median
The simplest approach is the pointwise median, which computes a synthetic reference by taking the median amplitude at each time point:

$$\mu^{(0)}(t) = \operatorname{median}\{\, x_1(t), \ldots, x_N(t) \,\}.$$
This is robust to outliers at each time point and captures the “typical” amplitude. However, unlike the other methods we'll discuss, it does not select an actual signal from the group—it creates a new synthetic signal that may not correspond to any physically realizable waveform.
Limitation: If the signals are significantly misaligned, the pointwise median can have discontinuities or unnatural shapes where signals cross frequently. It may not represent any realistic signal morphology.
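Assuming the signals are stacked as rows of an $(N, T)$ array, the pointwise median reference is a one-liner with NumPy:

```python
import numpy as np

def pointwise_median(X):
    """Synthetic initial reference: the median amplitude at each time
    point, across the N signals stacked as rows of X (shape (N, T))."""
    return np.median(X, axis=0)
```

Because the median is taken independently at each time point, a single outlying signal cannot drag the reference far at any instant, which is the robustness property mentioned above.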
Initialization Method 2: Exemplar
To address the limitation that the pointwise median is synthetic, the exemplar method first computes the pointwise median $m$, then selects the actual signal that is closest to it:

$$\mu^{(0)} = x_{i^\star}, \qquad i^\star = \operatorname*{argmin}_{i} \; \| x_i - m \|_2.$$
Improvement over pointwise median: The exemplar is always a real signal from the group, guaranteeing a physically realizable reference. It inherits the robustness of the pointwise median while producing a valid signal shape.
Limitation: The pointwise median that we're approximating may still be a poor target if signals are severely misaligned—the “closest real signal” to a bad reference is still influenced by that bad reference.
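A minimal NumPy sketch of exemplar selection (again assuming signals stacked as rows of an $(N, T)$ array):

```python
import numpy as np

def exemplar(X):
    """Select the actual signal closest (in L2 distance) to the
    pointwise median of the group. Returns the signal and its index."""
    m = np.median(X, axis=0)                    # synthetic median reference
    dists = np.linalg.norm(X - m, axis=1)       # distance of each signal to it
    i_star = int(np.argmin(dists))
    return X[i_star], i_star
```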
Initialization Method 3: Warping Functional Medoid
The warping functional medoid takes a fundamentally different approach: instead of measuring distances in amplitude space, it measures distances in the time-warped metric. It selects the signal that minimizes the total DTW cost when all other signals are aligned to it:

$$\mu^{(0)} = x_{j^\star}, \qquad j^\star = \operatorname*{argmin}_{j} \; \sum_{i \ne j} \mathcal{L}(x_i \circ \phi_{i \to j}, x_j),$$

where $\phi_{i \to j}$ is the optimal warping function aligning $x_i$ to candidate reference $x_j$.
Improvement over the Karcher median (Method 4 below): While the Karcher median finds the most central signal in amplitude ($L_2$) space, the warping functional medoid finds the most central signal under the DTW metric. This accounts for timing variations: a signal that is far from others in raw amplitude may actually be very similar after time alignment, making it a better reference for iterative warping.
Limitation: Computing the warping functional medoid requires $O(N^2)$ pairwise DTW computations, making it substantially more expensive than the other methods. However, this cost is often worthwhile when signals have significant timing variability.
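One way to sketch the medoid selection is to make the pairwise alignment cost a pluggable function. Here `align_cost` is a placeholder for any pairwise DTW/GDTW routine (it is an assumption of this sketch, not part of the method's definition); the test below plugs in a shift-invariant squared error as a cheap stand-in.

```python
import numpy as np

def warping_medoid(X, align_cost):
    """Select the index of the signal minimizing the total alignment
    cost when every other signal is warped onto it. `align_cost(x, y)`
    is assumed to return the optimal alignment cost of x onto y; any
    pairwise DTW routine can be plugged in. Requires O(N^2) calls."""
    N = len(X)
    totals = [sum(align_cost(X[i], X[j]) for i in range(N) if i != j)
              for j in range(N)]
    return int(np.argmin(totals))
```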
Initialization Method 4: Karcher Median
Rather than approximating a synthetic reference, the Karcher median (also called the geometric median or $L_1$-medoid) directly finds the signal that is most central in the group:

$$\mu^{(0)} = x_{j^\star}, \qquad j^\star = \operatorname*{argmin}_{j} \; \sum_{i \ne j} \| x_i - x_j \|_2.$$
Improvement over exemplar: Instead of finding the signal closest to an intermediate synthetic reference, the Karcher median directly optimizes for centrality by minimizing the sum of distances to all other signals. This makes it robust to outliers and ensures the selected signal is truly “central” in signal space.
Limitation: Euclidean distance doesn't account for temporal misalignment. A signal may be geometrically central but temporally shifted, requiring significant warping from all other signals to align.
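A short NumPy sketch of the $L_1$-medoid selection, using broadcasting to build the full pairwise Euclidean distance matrix (fine for moderate $N$; a memory-conscious implementation would loop instead):

```python
import numpy as np

def karcher_median(X):
    """L1-medoid: index of the signal minimizing the sum of Euclidean
    distances to all other signals. X has shape (N, T)."""
    # pairwise distance matrix D[i, j] = ||x_i - x_j||_2
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return int(np.argmin(D.sum(axis=1)))
```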
Summary
| Method | Best When... | Complexity |
|---|---|---|
| Pointwise median | Robustness to outliers is critical; synthetic reference is acceptable | $O(NT)$ |
| Exemplar | You want a real signal that approximates the “average” | $O(NT)$ |
| Karcher median | Signals are roughly aligned; you need a fast, robust baseline | $O(N^2 T)$ |
| Warping functional medoid | Timing variability is significant and the cost of pairwise DTW is acceptable | $O(N^2)$ DTW solves |
Time-Centered Group Alignment
Centering Constraint
In addition to regularization, we can impose an optional time-centering constraint that requires the warping functions to be evenly arranged about the identity $\phi(t) = t$, such that

$$\frac{1}{N} \sum_{i=1}^{N} \phi_i(t) = t \quad \text{for all } t.$$

We denote time-centered time-warp functions satisfying this constraint as $\tilde\phi_i$. The resulting centered warp functions produce a time-centered estimate of central tendency, $\tilde\mu$.
Time-Centered Group Alignment via Regularized Loss Minimization
To obtain a time-centered alignment, we choose $\tilde\phi_1, \ldots, \tilde\phi_N$ and $\tilde\mu$ by solving the optimization problem

$$\text{minimize} \quad \sum_{i=1}^{N} \Big( \mathcal{L}(x_i \circ \tilde\phi_i, \tilde\mu) + \lambda\, \mathcal{R}^{\mathrm{cum}}(\tilde\phi_i) + \gamma\, \mathcal{R}^{\mathrm{inst}}(\tilde\phi_i) \Big) \quad \text{subject to} \quad \frac{1}{N} \sum_{i=1}^{N} \tilde\phi_i(t) = t,$$

where the tilde notation denotes centered quantities: $\tilde\phi_i$ are the centered warp functions and $\tilde\mu$ is the centered mean.
The following comparison shows standard alignment (top) versus centered alignment (bottom). Notice how in centered alignment, the warping functions are evenly distributed around the identity—some signals are warped forward in time while others are warped backward, maintaining balance around the original time axis:
Enforcing Centering
The centering constraint can be enforced within the iterative algorithm by adding a simple projection step. After computing the unconstrained warps $\phi_i$ in each iteration, we compute their pointwise mean $\bar\phi(t) = \frac{1}{N} \sum_{i=1}^{N} \phi_i(t)$ and subtract its deviation from the identity to obtain centered warps $\tilde\phi_i$:

$$\tilde\phi_i(t) = \phi_i(t) - \bar\phi(t) + t.$$
Time-centered group alignment. Lines 5–6 project the warps onto the centering constraint.
Lines 5–6 are the centering projection. Subtracting the mean warp and adding back the identity ensures that the centered warps satisfy $\frac{1}{N} \sum_i \tilde\phi_i(t) = t$ at every time point. One can verify this directly:

$$\frac{1}{N} \sum_{i=1}^{N} \tilde\phi_i(t) = \frac{1}{N} \sum_{i=1}^{N} \big( \phi_i(t) - \bar\phi(t) + t \big) = \bar\phi(t) - \bar\phi(t) + t = t.$$
The projection is inexpensive—a single pass over the warps at each iteration—and does not change the structure of the solver. The only difference from the unconstrained algorithm is that the target in line 7 is computed from the centered warps rather than the raw warps, so the reference signal stays anchored to the original time axis as the iterations progress.
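In discretized form, with the warps sampled on a common time grid and stacked as rows of an $(N, T)$ array, the projection is a single vectorized expression:

```python
import numpy as np

def center_warps(Phi, t):
    """Centering projection: subtract the pointwise mean warp and add
    back the identity grid t, so the centered warps average to t at
    every time point. Phi has shape (N, T); t has shape (T,)."""
    return Phi - Phi.mean(axis=0) + t
```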
Matrix Notation
It is convenient to collect signals and warps into matrices. Let $X = [\, x_1 \; \cdots \; x_N \,]$ denote the collection of original signals, $\Phi = [\, \phi_1 \; \cdots \; \phi_N \,]$ the warping functions obtained from group alignment, and $\mu$ the estimated central tendency. After centering, we write

$$\tilde\Phi = [\, \tilde\phi_1 \; \cdots \; \tilde\phi_N \,]$$

for the centered warps satisfying $\frac{1}{N} \sum_i \tilde\phi_i(t) = t$, and

$$\tilde X = [\, x_1 \circ \tilde\phi_1 \; \cdots \; x_N \circ \tilde\phi_N \,]$$

for the centered aligned signals—the original signals composed with the centered warps. In both cases, the tilde indicates that the centering constraint has been applied.
These two representations provide complementary views of the data: $\tilde X$ captures what the signals look like after alignment (amplitude structure), while $\tilde\Phi$ captures how they were warped to get there (timing structure). Together they form a complete decomposition of the original signals into shape and timing components, which can be analyzed independently or jointly via SVD.
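On sampled data, forming $\tilde X$ amounts to resampling each signal at its centered warped times. The sketch below uses linear interpolation as a simple (assumed) resampling choice; splines or other interpolants work equally well.

```python
import numpy as np

def aligned_signals(X, Phi_c, t):
    """Form the aligned-signal matrix: row i is x_i evaluated at its
    centered warped times, i.e. x_i(phi_i(t)), via linear interpolation.
    X and Phi_c have shape (N, T); t is the common grid of shape (T,)."""
    return np.array([np.interp(phi, t, x) for x, phi in zip(X, Phi_c)])
```

With the identity warp this reproduces the original signals exactly, which is a useful sanity check before feeding $\tilde X$ and $\tilde\Phi$ into downstream analysis.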
Looking ahead
Group alignment gives us a principled way to go from a noisy collection of misaligned signals to a clean pair of structured representations—aligned shapes and the warps that produced them. The centering constraint anchors these representations to the original time axis, making the decomposition unique and directly interpretable. With $\tilde X$ and $\tilde\Phi$ in hand, we are ready to ask a deeper question: what are the principal modes of variation in amplitude and timing? We take this up in the next post, where we apply SVD to extract a low-dimensional basis for both.
See also: A General Optimization Framework for Dynamic Time Warping