[ACCEPTED]-Detecting patterns in waves-pattern-recognition

Accepted answer
Score: 55

The first thing that I would do is see what is already out there. Indeed, this 43 specific problem has already been heavily 42 researched. Here is a brief overview of 41 some really simple methods: link.

I must respond 40 to another answer, as well. I do research 39 in signal processing and music information 38 retrieval. On the surface, this problem 37 does appear similar to onset detection, but 36 the problem context is not the same. This 35 type of biological signal processing, i.e., detection 34 of the P, QRS, and T phases, can exploit 33 knowledge of specific time-domain characteristics of each of these waveforms. Onset 32 detection in MIR doesn't, really. (Not reliably, at 31 least.)

One approach that would work well 30 for QRS detection (but not necessarily for 29 note onset detection) is dynamic time warping. When 28 time-domain characteristics remain invariant, DTW 27 can work remarkably well. Here is a short 26 IEEE paper that uses DTW for this problem: link.

This 25 is a nice IEEE magazine article that compares 24 many methods: link. You'll see that many common 23 signal processing models have been tried. Skim 22 the paper, and try one that you understand 21 at a basic level.

EDIT: After browsing these 20 articles, a wavelet-based approach seems 19 most intuitive to me. DTW will work well, too, and 18 there exist DTW modules out there, but the 17 wavelet approach seems best to me. Someone 16 else answered by exploiting derivatives 15 of the signal. My first link examines methods 14 from before 1990 that do that, but I suspect 13 that they are not as robust as more modern 12 methods.

EDIT: I'll try to give a simple 11 solution when I get the chance, but the 10 reason why I think wavelets are suited here 9 are because they are useful at parameterizing 8 a wide variety of shapes regardless of time or amplitude scaling. In 7 other words, if you have a signal with the 6 same repeated temporal shape but at varying 5 time scales and amplitudes, wavelet analysis 4 can still recognize these shapes as being 3 similar (roughly speaking). Also note that 2 I am sort of lumping filter banks into this 1 category. Similar things.

Score: 17

A piece of this puzzle is "onset detection" and 51 a number of complex algorithms have been 50 written to solve this problem. Here is 49 more information on onsets.

The next piece is a 48 Hamming Distance. This algorithms allow you to make fuzzy 47 comparisons, the input is 2 arrays and 46 the output is an integer "distance" or 45 difference between the 2 data sets. The 44 smaller the number, the more alike the 43 2 are. This is very close to what you need, but 42 its not exact. I went ahead and made some 41 modifications to the Hamming Distance algorithm 40 to calculate a new distance, it probably 39 has a name but i don't know what it is. Basically 38 it adds up the absolute distance between 37 each element in the array and returns the 36 total. Here is the code for it in python.

import math

def absolute_distance(a1, a2, length):
       for x in range(0,length):
       return total_distance


This 35 script outputs 2, which is the distance 34 between these 2 arrays.

Now for putting 33 together these pieces. You could use Onset 32 detection to find the beginning of all waves 31 in the data set. You can then loop though 30 these location comparing each wave with 29 a sample P-Wave. If you hit a QRS Complex 28 the distance is going to be the largest. If 27 you hit another P-Wave the number isn't 26 going to be zero, but its going to be much 25 smaller. The distance between any P-Wave 24 and any T-Wave is going to be pretty small, HOWEVER 23 this isn't a problem if you make the following 22 assumption:

The distance between any p-wave and any other p-wave will be smaller than the distance between any p-wave and any t-wave.

The series looks something like 21 this: pQtpQtpQt... The p-wave and t-wave 20 is right next to each other, but because 19 this sequence is predictable it will be 18 easier to read.

On a side not, there is 17 probably a calculus based solution to this 16 problem. However in my mind curve fitting 15 and integrals make this problem more of 14 a mess. The distance function I wrote 13 will find the area difference which is very similar subtracting 12 the integral of both curves.

It maybe possible 11 to sacrifice the onset calculations in favor 10 of iterating by 1 point at a time and thus 9 performing O(n) distance calculations, where 8 n is the number of points in the graph. If 7 you had a list of all of these distance 6 calculations and knew there where 50 pQt 5 sequences then you would know the 50 shortest 4 distances that do not overlap where all locations of p-waves. Bingo! how 3 is that for simplicity? However the trade 2 off is loss of efficiency due to an increased 1 number of distance calculations.

Score: 8

You can use cross-correlation. Take a model sample of each 10 pattern and correlate them with the signal. You 9 will get peaks where the correlation is 8 high. I would expect good results with this 7 technique extracting qrs and t waves. After 6 that, you can extract p waves by looking 5 for peaks on the correlation signal that 4 are before qrs.

Cross-correlation is a pretty 3 easy to implement algorithm. Basically:

x is array with your signal of length Lx
y is an array containing a sample of the signal you want to recognize of length Ly
r is the resulting correlation

for (i=0; i<Lx - Ly; i++){
  r[i] = 0;
  for (j=0; j<Ly ; j++){
    r[i] += x[i+j]*y[j];

And 2 look for peaks in r (values over a threshold, for 1 instance)

Score: 7

The first thing I would do is simplify the data.

Instead of analyzing absolute data, analyze 19 the amount of change from one data point 18 to the next.

Here is a quick one liner that 17 will take ; separated data as input, and 16 output the delta of that data.

perl -0x3b -ple'( $last, $_ ) = ( $_, $_-$last )' < test.in > test.out

Running it 15 on the data you provided, this is the output:

0;0;20;0;0;-1;-1;-1;0;0;0;0;-1;0;0;0;0;0;0;1;0;1;1;1;1;1;1;0;0;2;0;-2;-1;-2;-1;-2;-1;0;-2;-1;1;-1;0;-1;0;0;0; 0;-1;0;-1;2;4;6;9;7;7;6;-4;-6;-8;-7;-5;-4;0;-1;0;-1;1;1;0;1;0;-1;1;0;0;0;0;0;0;-1;0;1;1;-1;0;1;0;0;0;1;0;-1;1; 2;2;0;1;1;1;1;1;1;1;0;0;1;0;0;-1;-2;-1;-2;-2;-2;-2;0;-1;-1;0;-1;0;-1;0;-1;0;1;-1;0;0;0;0;0;0;0;0;-1;1;1;0;0;0; 0;0;0;0;0;-1;1;-1;0;0;1;0;0;0;0;0;0;0;-1;1;0;0;0;0;-1;0;0;0;0;1;0;1;1;0;1;0;0;1;1;1;0;0;0;-1;-1;-2;-1;0;-2;0; -1;0;-1;0;1;-1;0;0;-1;0;0;0;1;5;5;7;8;9;4;-7;-5;-8;-7;-6;-2;-1;0;0;0;0;0;1;0;0;1;-1;0;1;0;-1;1;0;0;0;1;0;0;0; 1;0;1;0;0;0;1;1;0;2;1;1;1;1;1;1;1;1;1;-1;1;0;0;-1;-2;-2;-2;-2;-1;0;-1;-2;-1;0;-1;-1;0;1;-1;1;0;-1;1;-1;1;0;-1; 0;0;0;-1;1;0;0;1;0;-1;0;1;0;0;1;-1;0;-1;1;0;-1;0;0;0;0;1;-1;0;1;-1;0;0;0;0;0;0;1;-1;0;1;0;0;2;0;1;0;1;1;1;-1; 0;-2;0;-1;-2;0;-1;-1;-2;-1;0;0;0;0;0;0;0;0;-1;0;0;4;3;9;8;11;4;-5;-6;-8;-8;-4;-2;-2;0;0;0;-1;1;0;0;1;0;0;1;-1; 0;1;0;0;0;1;-1;0;1;1;0;0;0;0;1;0;1;0;1;2;1;1;2;0;1;1;1;1;0;0;1;1;0;0;-35;0;0;0;

There are new-lines inserted in the above text not originally present in the output.

After 14 you have done that it is trivial to find 13 the qrs complex.

perl -F';' -ane'@F = map { abs($_) > 2 and $_ } @F; print join ";", @F'< test.out


The 12 20 and -35 data points result from the original 11 data starting and ending with 0.

To find the 10 other data points you will have to rely 9 on pattern matching.

If you look at the first 8 p wave, you can clearly see a pattern.

#           \________ up _______/   \________ down _________/

It 7 isn't as easy to see the pattern on the 6 second p wave though. This is because the 5 second one is spread out further

#     \________ up _______/       \________________ down ________________/

The third 4 p wave is a little more erratic than the 3 other two.

#                \_______ up ______/  \__________ down __________/

You would find the t waves in 2 a similar manner to the p waves. The main 1 difference is when they occur.

This should be enough information to get you started.

The two one-liners are for example only, not recommended for daily use.

Score: 4

Are those other two sharp peaks and valleys 19 also qrs complexes?

Off the top of my head, I 18 think what you need to do is calculate the 17 slope of this graph at each point. Then 16 you also need to see how quickly the slope 15 changes (2nd derivative???). If you have 14 an abrupt change then you know you've hit 13 some kind of sharp peak. Of course, you 12 want to limit the detection of the change, so 11 you might want to do something like "if 10 the slope changes by X over time interval 9 T", so that you don't pick up the tiny bumps 8 in the graph.

It's been a while since I did 7 any math... and this seems like a math question 6 ;) Oh, and I haven't done any sort of signal 5 analysis either :).

Just adding another point. You 4 can also try signal-averaging I think. For 3 example, averaging the last 3 or 4 data 2 points. I think you can detect abrupt changes 1 that way too.

Score: 4

I'm no expert in this specific problem, but 7 just off the top of my head from more general 6 knowledge: Let's say you know the QRS complex 5 (or one of the other features, but I'll 4 use the QRS complex for this example) takes 3 place in roughly some fixed time period 2 of length L. I wonder if you could treat 1 this as a classification problem as follows:

  1. Split your signal into overlapping windows of length L. Each window either does or doesn't have the full QRS complex in it.
  2. Fourier transform each window. Your features are the signal strength at each frequency.
  3. Train a decision tree, support vector machine, etc. on some hand-annotated data.
Score: 3

One approach that will very likely yield 15 good results is curve fitting:

  • Divide the continuous wave into intervals (probably it's best to have the interval borders about half way between the sharp peaks of the qrs complexes). Only consider a single interval at a time.
  • Define a model 14 function that can be used to approximate 13 all possible variations of electrocardiographic 12 curves. This is not as difficult as it seems 11 first. The model function can be constructed 10 as a sum of three functions with parameters 9 for the origin (t_), amplitude (a_) and 8 width (w_) of each wave.

       f_model(t) = a_p   *  f_p  ((t-t_p  )/w_p) + 
                    a_qrs *  f_qrs((t-t_qrs)/w_qrs) +
                    a_t   *  f_t  ((t-t_t  )/w_t)

    The functions f_p(t), f_qrs(t), f_t(t) are 7 some simple function that can be uses to 6 model each of the three waves.

  • Use a fitting 5 algorithm (e.g. the Levenberg-Marquardt-Algorithm 4 http://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm) to determine the fitting parameters a_p, t_p, w_p, a_qrs, t_qrs, w_qrs, a_t, t_t, w_t 3 for the dataset of each intervall.

    The parameters 2 t_p, t_qrs and t_p are the ones you are 1 interested in.

Score: 3

This is a wonderful question! I have a 21 few thoughts:

Dynamic Time Warping could be an interesting tool 20 here. You would establish "templates" for 19 your three classes, and then using DTW could 18 see the correlation between your template 17 and "chunks" of the signal (break 16 the signal up into, say, .5 second bits, ie. 0-.5 15 .1-.6 .2-.7...). I've worked with something 14 similar for gait analysis with accelerometer 13 data, it worked reasonably well.

Another 12 option is a combined signal processing/ machine 11 learning algorithm. Break your signal into 10 "chunks" again. Make "templates" again 9 (you'll want a dozen or so for each class) take 8 the FFT of each chunk/template and then use 7 a Naïve Bayes Classifier (or another ML classifier, but NB should 6 cut it) to classify for each of your three 5 classes. I've also tried this on gait data, and 4 was able to get upwards of 98% precision 3 and recall with relatively complicated signals. Let 2 me know how this works, it's a very exciting 1 problem.

Score: 1

"Wavelet transform" may be a relevant keyword. I've 6 once attended a presentation by someone 5 who used this technique to detect different 4 heartbeat phases in a noisy ecg.

As far as 3 my limited understanding goes, it's somewhat 2 like a Fourier transform, but using (scaled) copies 1 of a, in your case heartbeat-shaped, pulse.

Score: 1

First, the various components of the standard 27 electrocardiogram wave can be missing from 26 any given plot. Such a plot is generally 25 abnormal and usually indicates some sort 24 of problem, but you can't be promised that 23 they're there.

Secondly, recognizing them 22 is as much art as science, especially in 21 the cases where something is going wrong.

My 20 approach might be to try to train a neural 19 network to identify the components. You 18 would give it the previous 30 seconds of 17 data, normalized so the lowest point was 16 at 0 and the highest point at 1.0 and it 15 would have 11 outputs. The outputs that 14 weren't abnormality ratings would be a weighting 13 for the last 10 seconds. A 0.0 would be 12 -10 seconds from present, and a 1.0 would 11 mean now. The outputs would be:

  1. Where the most recent P wave started
  2. Where the most recent P wave ended
  3. Abnormality rating of most recent P wave with one extreme being 'absent'.
  4. Where the most recent QRS complex started
  5. Where the Q portion of the most recent QRS complex turned into the R portion.
  6. Where the R portion of the most recent QRS complex turned into the S portion.
  7. Where the most recent QRS complex ended.
  8. Abnormality rating of most recent QRS complex with one extreme being 'absent'.
  9. Where the most recent T wave started.
  10. Where the most recent T wave ended.
  11. Abnormality rating of most recent T wave with one extreme being 'absent'.

I might 10 double check this with some of the other 9 kinds of analysis people suggested, or use 8 those other kinds of analysis along with 7 the output of the neural network to give 6 you your answer.

Of course, this detailed 5 description of the neural network shouldn't 4 be taken as prescriptive. I'm sure that 3 I didn't necessarily pick the most optimal 2 outputs for example, I just sort of tossed 1 some ideas about what they might be.

Score: 1

Wavelets have been shown to be the best 14 tool for locating peaks in this type of 13 data where the peaks are "different 12 sizes" - the scaling properties of 11 wavelets make it an ideal tool for this 10 type of multi-scale peak detection. This 9 looks like a non-stationary signal so using 8 a DFT would not be the right tool as some 7 have suggested, but if this is an exploratory 6 project you could look at using the spectrum 5 of the signal (estimated using essentially 4 the FFT of the autocorrelation of the signal.)

Here is 3 a great paper reviewing several peak detection 2 methods - this would be a good place to 1 start.


Score: 1

using BioSPPY

it is not presently possibly 9 to implement T-wave analysis, as at present 8 it only contains R-wave analysis. eg Tstart 7 Tpeak Tend

are not automatically implimented

one 6 would have to use your own analysis.

my suggestion 5 would be to try and implement the below 4 method


which is one that I have recently 3 discovered and found to be very interesting

The 2 other t-wave analysis method worth looking 1 at is that by ECGlib team


hope this helps

Score: 0

I haven't read each other answer thoroughly 11 but I have scanned them and I noticed that 10 no one recommended looking at the Fourier 9 Transform to segment these waves.

To me 8 it seems like a clear cut application of 7 Harmonic analysis in mathematics. There may be several subtle 6 points that I may be missing.

The Discrete Fourier Transform coefficients 5 give you the amplitude and phase of the 4 different sinusoidal components that make 3 up your discrete time signal, which is essentially 2 what your problem states you want to find.

I 1 may be missing something here though ...

More Related questions