I am developing software that depends on musical chords detection. I know some algorithms for pitch detection, with techniques based on cepstral analysis or autocorrelation, but they are mainly focused on monophonic material recognition. But I need to work with some polyphonic recognition, that is, multiple pitches at the same time, like in a chord; does anyone know some good studies or solutions on that matter?
I am currently developing some algorithms based on the FFT, but if anyone has an idea on some algorithms or techniques that I can use, it would be of great help.
This is quite a good Open Source Project: https://patterns.enm.bris.ac.uk/hpa-software-package
It detects chords based on a chromagram - a good solution, breaks down a window of the whole spectrum onto an array of pitch classes (size: 12) with float values. Then, chords can be detected by a Hidden Markov Model.
.. should provide you with everything you need. :)
I started researching different methods of automatic transcription in mid-2009, because I was curious about how far along this technology was, and if it could be integrated into a future version of Capo.
Each of these automatic transcription algorithms start out with some kind of intermediate represenation of the audio data, and then they transfer that into a symbolic form (i.e. note onsets, and durations).
This is where I encountered some computationally expensive spectral representations (The Continuous Wavelet Transform (CWT), Constant Q Transform (CQT), and others.) I implemented all of these spectral transforms so that I could also implement the algorithms presented by the papers I was reading. This would give me an idea of whether they would work in practice.
Capo has some impressive technology. The standout feature is that its main view is not a frequency spectrogram like most other audio programs. It presents the audio like a piano roll, with the notes visible to the naked eye.
(Note: The hard note bars were drawn by a user. The fuzzy spots underneath are what Capo displays.)
There's significant overlap between chord detection and key detection, and so you may find some of my previous answer to that question useful, as it has a few links to papers and theses. Getting a good polyphonic recogniser is incredibly difficult.
My own viewpoint on this is that applying polyphonic recognition to extract the notes and then trying to detect chords from the notes is the wrong way to go about it. The reason is that it's an ambiguous problem. If you have two complex tones exactly an octave apart then it's impossible to detect whether there are one or two notes playing (unless you have extra context such as knowing the harmonic profile). Every harmonic of C5 is also a harmonic of C4 (and of C3, C2, etc). So if you try a major chord in a polyphonic recogniser then you are likely to get out a whole sequence of notes that are harmonically related to your chord, but not necessarily the notes you played. If you use an autocorrelation-based pitch detection method then you'll see this effect quite clearly.
Instead, I think it's better to look for the patterns that are made by certain chord shapes (Major, Minor, 7th, etc).
See my answer to this question: How can I do real-time pitch detection in .Net?
The reference to this IEEE paper is mainly what you're looking for: http://ieeexplore.ieee.org/Xplore/login.jsp?reload=true&url=/iel5/89/18967/00876309.pdf?arnumber=876309
The harmonics are throwing you off. Plus, humans can find fundamentals in sound even when the fundamental isn't present! Think of reading, but by covering half of the letters. The brain fills in the gaps.
The context of other sounds in the mix, and what came before, is very important to how we perceive notes.
This is a very difficult pattern matching problem, probably suitable for an AI technique such as training neural nets or genetic algorithms.
Basically, at every point in time, you guess the number of notes being play, the notes, the instruments that played the notes, the amplitudes, and the duration of the note. Then you sum the magnitudes of all the harmonics and overtones that all those instruments would generate when played at that volume at that point in thier envelope (attack, decay, etc.). Subtract the sum of all those harmonics from the spectrum of you signal, then minimize the difference over all possibilities. Pattern recognition of the thump/squeak/pluck transient noise/etc. at the very onset of the note might also be important. Then do some decision analysis to make sure your choices make sense (e.g. a clarinet didn't suddenly change into a trumpet playing another note and back again 80 mS later), to minimize the error probability.
If you can constrain your choices (e.g. only 2 flutes playing only quarter notes, etc.), especially to instruments with very limited overtone energy, it makes the problem a lot easier.
This post is a bit old, but I thought I'd add the following paper to the discussion:
Klapuri,Anssi; Multipitch Analysis of Polyphonic Music and Speech Signals Using an Auditory Model; IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 2, FEBRUARY 2008 255
The paper acts somewhat like a literature review of multipitch analysis and discusses a method based on an auditory model:
(The image is from the paper. I don't know if I have to get permission to post it.)