Linear predictive coding (LPC) of speech
Speech Processing Project

Linear Predictive Coding using Voice Excited Vocoder

ECE 5525
Osama Saraireh
Fall 2005
Dr. Veton Kepuska

The basic form of the pitch-excited LPC vocoder is shown below.

The speech signal is low-pass filtered to no more than one half of the system sampling frequency, and then A/D conversion is performed. The speech is processed on a frame-by-frame basis, where the analysis frame length can be variable. For each frame a pitch period estimate is made along with a voicing decision. A linear predictive coefficient analysis is performed to obtain an inverse model of the speech spectrum, A(z). In addition a gain parameter G, representing some function of the speech energy, is computed. An encoding procedure is then applied to transform the analyzed parameters into an efficient set of transmission parameters, with the goal of minimizing the degradation in the synthesized speech for a specified number of bits. Knowing the transmission frame rate and the number of bits used for each transmission parameter, one can compute a noise-free channel transmission bit rate.

At the receiver, the transmitted parameters are decoded into quantized versions of the coefficient analysis and pitch estimation parameters. An excitation signal for synthesis is then constructed from the transmitted pitch and voicing parameters. The excitation signal then drives a synthesis filter 1/A(z) corresponding to the analysis model A(z). The digital samples s(n) are then passed through a D/A converter and low-pass filtered to generate the synthetic speech s(t). Either before or after synthesis, the gain is used to match the synthetic speech energy to the actual speech energy. The digital samples are then converted to an analog signal and passed through a filter similar to the one at the input of the system.

Linear predictive coding (LPC) of speech

[Figure: simple speech production model. A pitch-period impulse train (voiced, V) or white noise (unvoiced, UV) supplies the innovations u(n), which drive the LPC filter to produce the speech signal.]

The linear predictive coding (LPC) method for speech analysis and synthesis is based on modeling the vocal tract as a linear all-pole (IIR) filter having the system transfer function:

$$H(z) = \frac{G}{1 + \sum_{k=1}^{p} a_p(k)\, z^{-k}}$$
Here p is the number of poles, G is the filter gain, and $a_p(k)$ are the parameters that determine the poles.

There are two mutually exclusive excitation functions to model voiced and unvoiced speech sounds. For short-time analysis, voiced speech is considered periodic with a fundamental frequency $F_0$ and a pitch period $1/F_0$, which depends on the speaker. Hence, voiced speech is generated by exciting the all-pole filter model with a periodic impulse train. Unvoiced sounds, on the other hand, are generated by exciting the all-pole filter with the output of a random noise generator.

The fundamental difference between these two types of speech sounds comes from the way they are produced. The vibrations of the vocal cords produce voiced sounds, and the rate at which the vocal cords vibrate dictates the pitch of the sound. Unvoiced sounds do not rely on the vibration of the vocal cords. They are created by constriction of the vocal tract: the vocal cords remain open and the constrictions of the vocal tract force air out to produce the unvoiced sounds.

Given a short segment of a speech signal, say about 20 ms or 160 samples at a sampling rate of 8 kHz, the speech encoder at the transmitter must determine the proper excitation function, the pitch period for voiced speech, the gain, and the coefficients $a_p(k)$. The block diagram below describes the encoder/decoder for linear predictive coding. The parameters of the model are determined adaptively from the data, encoded into a binary sequence, and transmitted to the receiver. At the receiver, the speech signal is then synthesized from the model and the excitation signal.

The parameters of the all-pole filter model are determined from the speech samples by means of linear prediction. To be specific, the output of the linear prediction filter is

$$\hat{s}(n) = -\sum_{k=1}^{p} a_p(k)\, s(n-k)$$

and the corresponding error between the observed sample s(n) and the predicted value $\hat{s}(n)$ is

$$e(n) = s(n) - \hat{s}(n)$$

By minimizing the sum of the squared error we can determine the pole parameters $a_p(k)$ of the model. Differentiating this sum with respect to each of the parameters and equating the result to zero gives a set of p linear equations

$$\sum_{k=1}^{p} a_p(k)\, r_{ss}(m-k) = -r_{ss}(m), \qquad m = 1, 2, \ldots, p$$

where $r_{ss}(m)$ represents the autocorrelation of the sequence s(n), defined as

$$r_{ss}(m) = \sum_{n=0}^{N} s(n)\, s(n+m)$$
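The normal equations above form a p-by-p linear system in the coefficients $a_p(k)$. As a minimal sketch, assuming a hypothetical second-order test signal (the impulse response of a known all-pole filter, chosen only so that the answer can be checked), the system can be built with `toeplitz` and solved directly. The names `s`, `r`, `R`, and `a` are illustrative and are not taken from the project code.

```matlab
% Hypothetical test signal: impulse response of 1/A(z) with
% A(z) = 1 - 1.3 z^-1 + 0.4 z^-2, so the expected a_p is [-1.3; 0.4].
p = 2;                                   % prediction order (assumed)
N = 400;                                 % number of samples (assumed)
s = filter(1, [1 -1.3 0.4], [1; zeros(N-1, 1)]);

% Autocorrelation r_ss(m) for m = 0..p, stored at index m+1.
r = zeros(p+1, 1);
for m = 0:p
    r(m+1) = s(1:N-m)' * s(1+m:N);
end

% Normal equations: sum_k a_p(k) r_ss(m-k) = -r_ss(m), m = 1..p.
R = toeplitz(r(1:p));                    % p-by-p matrix with entries r_ss(|m-k|)
a = R \ (-r(2:p+1));                     % p-by-1 vector of model parameters

disp(a.');
```

For this test signal the solution should come out close to [-1.3; 0.4], which matches the denominator used to generate `s` under the sign convention of the transfer function. A direct solve is fine for small p; the recursion used in the project code exploits the Toeplitz structure instead.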
These equations can be expressed in matrix form as

$$R_{ss}\, a = -r_{ss}$$

where $R_{ss}$ is a p x p autocorrelation matrix, $r_{ss}$ is a p x 1 autocorrelation vector, and a is a p x 1 vector of model parameters.

    [row col] = size(data);
    if col==1 data=data'; end
    nframe = 0;
    msfr = round(sr/1000*fr);                 % Convert ms to samples
    msfs = round(sr/1000*fs);                 % Convert ms to samples
    duration = length(data);
    speech = filter([1 -preemp], 1, data)';   % Preemphasize speech
    msoverlap = msfs - msfr;
    ramp = [0:1/(msoverlap-1):1]';            % Compute part of window
    for frameIndex=1:msfr:duration-msfs+1     % frame rate=20ms
        frameData = speech(frameIndex:(frameIndex+msfs-1));  % frame size=30ms
        nframe = nframe+1;
        autoCor = xcorr(frameData);           % Autocorrelation of the frame
        autoCorVec = autoCor(msfs+[0:L]);     % Lags 0..L

These equations can be solved in MATLAB by using the Levinson-Durbin algorithm.

        % Levinson's method
        err(1) = autoCorVec(1);
        k(1) = 0;
        A = [];
        for index=1:L
            numerator = [1 A.']*autoCorVec(index+1:-1:2);
            denominator = -1*err(index);
            k(index) = numerator/denominator;          % PARCOR coeffs
            A = [A+k(index)*flipud(A); k(index)];
            err(index+1) = (1-k(index)^2)*err(index);
        end

The gain parameter of the filter can be obtained from the input-output relationship

$$s(n) = -\sum_{k=1}^{p} a_p(k)\, s(n-k) + G\,x(n)$$

where x(n) represents the input sequence. Rearranging this equation in terms of the error sequence, we have

$$G\,x(n) = s(n) + \sum_{k=1}^{p} a_p(k)\, s(n-k) = e(n)$$

then

$$G^2 \sum_{n=0}^{N-1} x^2(n) = \sum_{n=0}^{N-1} e^2(n)$$

If the input excitation is normalized to unit energy by design, then

$$G^2 = \sum_{n=0}^{N-1} e^2(n) = r_{ss}(0) + \sum_{k=1}^{p} a_p(k)\, r_{ss}(k)$$

where $G^2$ is set equal to the residual energy resulting from the least-squares optimization.

        % filter response
        if 0    % disabled diagnostic plot
            gain = 0;
            cft = 0:(1/255):1;
            for index=1:L
                gain = gain + aCoeff(index,nframe)*exp(-i*2*pi*cft).^index;
            end
            gain = abs(1./gain);
            spec(:,nframe) = 20*log10(gain(1:128))';
            plot(20*log10(gain));
            title(nframe);
            drawnow;
        end
        if 0    % disabled check against the impulse response
            impulseResponse = filter(1, aCoeff(:,nframe), [1 zeros(1,255)]);
            freqResponse = 20*log10(abs(fft(impulseResponse)));
            plot(freqResponse);
        end

Once the LPC coefficients are computed, we can determine whether the input speech frame is voiced and, if it is, what the pitch is. We can determine the pitch by computing the following sequence in MATLAB:

$$r_e(n) = \sum_{k=-p}^{p} r_a(k)\, r_{ss}(n-k)$$

where $r_a(k)$ is defined as

$$r_a(k) = \sum_{i=0}^{p-k} a_p(i)\, a_p(i+k), \qquad a_p(0) = 1, \quad r_a(-k) = r_a(k)$$

which is the autocorrelation sequence of the prediction coefficients.
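Both sequences defined above are plain convolutions, so they can be evaluated with `conv` alone. The following is only a sketch on a synthetic frame: the filter coefficients, the 5 ms impulse spacing, and all variable names are assumptions made for illustration. The decision rule it applies (largest peak of $r_e(n)/r_e(0)$ at lags of 3 to 15 ms, compared against 0.25) is the one this section describes.

```matlab
fs = 8000;                  % sampling rate in Hz (assumed)
N  = 160;                   % 20 ms frame at 8 kHz
a  = [1; -0.9];             % hypothetical A(z) = 1 - 0.9 z^-1, with a_p(0) = 1
p  = length(a) - 1;

% Synthetic "voiced" frame: impulse train (period 40 samples = 5 ms)
% passed through the all-pole filter 1/A(z).
x = zeros(N, 1);
x(1:40:end) = 1;
s = filter(1, a, x);

r_ss = conv(s, flipud(s));  % autocorrelation of s, zero lag at index N
r_a  = conv(a, flipud(a));  % autocorrelation of the coefficients, zero lag at p+1
r_e  = conv(r_a, r_ss);     % r_e(n) = sum_k r_a(k) r_ss(n-k), zero lag at N+p
zeroLag = N + p;

% Search lags corresponding to 3..15 ms.
minLag = round(0.003 * fs);
maxLag = round(0.015 * fs);
normSeq = r_e(zeroLag + (minLag:maxLag)) / r_e(zeroLag);
[peak, idx] = max(normSeq);

voiced = peak >= 0.25;
if voiced
    pitchPeriod = minLag + idx - 1;   % pitch period in samples
else
    pitchPeriod = 0;                  % unvoiced frame
end
fprintf('voiced = %d, pitch period = %d samples\n', voiced, pitchPeriod);
```

For this synthetic frame the peak should fall at a lag of 40 samples, the spacing of the impulse train. Real speech frames are far less clean, so the threshold and the lag range are the parameters most worth experimenting with.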
The pitch is detected by finding the peak of the normalized sequence

$$\frac{r_e(n)}{r_e(0)}$$

in the time interval that corresponds to 3 to 15 ms in the 20 ms sampling frame. If the value of this peak is at least 0.25, the frame of speech is considered voiced, with a pitch period equal to the value of $n = N_p$ at which $r_e(N_p)/r_e(0)$ is a maximum. If the peak value is less than 0.25, the frame of speech is considered unvoiced and the pitch is set to zero.

        errSig = filter([1 A'],1,frameData);  % find excitation noise
        G(nframe) = sqrt(err(L+1));           % gain
        autoCorErr = xcorr(errSig);           % calculate pitch & voicing information
        [B,I] = sort(autoCorErr);
        num = length(I);
        if B(num-1) > .01*B(num)
            pitch(nframe) = abs(I(num) - I(num-1));
        else
            pitch(nframe) = 0;
        end

The values of the LPC coefficients, the pitch period, and the type of excitation are then transmitted to the receiver. The decoder synthesizes the speech signal by passing the proper excitation through the all-pole filter model of the vocal tract. Typically the pitch period requires 6 bits, the gain parameter is represented in 5 bits after its dynamic range is compressed logarithmically, and the prediction coefficients normally require 8 to 10 bits for accuracy reasons. This accuracy is very important in LPC because any small change in the prediction coefficients results in a large change in the pole positions of the filter model, which can make the model unstable. This is overcome by using the PARCOR method.

Is the speech frame voiced or unvoiced?

Once the LPC coefficients are computed, we can determine whether the input speech frame is voiced and, if so, what the pitch is. If the speech frame is decided to be voiced, an impulse train is employed to represent it, with nonzero taps occurring every pitch period. A pitch-detection algorithm is used to determine the correct pitch period/frequency. The autocorrelation function is used to estimate the pitch period. However, if the frame is unvoiced, then white noise is used to represent it and a pitch period of T = 0 is transmitted. Therefore, either white noise or an impulse train becomes the excitation of the LPC synthesis filter.

Two types of LPC vocoders were implemented in MATLAB.

Plain LPC Vocoder

The plain LPC vocoder diagram is shown below:

    % LPC vocoder
    function [ outspeech ] = speechcoder1( inspeech )
    % Parameters:
    % inspeech : wave data with sampling rate Fs
    % (Fs can be changed underneath if necessary)
    % Returns:
    % outspeech : wave data with sampling rate Fs
    % (coded and resynthesized)
    if ( nargin ~= 1)
        error('argument check failed');
    end;
    Fs = 16000;   % sampling rate in Hertz (Hz)
    Order = 10;   % order of the model used by LPC
    % encode the speech using LPC
    [aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);
    % decode/synthesize speech using LPC and impulse-trains as excitation
    outspeech = synlpc(aCoeff, pitch, Fs, G);

Results:

Residual plot:

Voice-excited LPC vocoder (utilizing DCT for high compression rate / low bits)

The input speech signal in each frame is filtered with the estimated transfer function of the LPC analyzer. This filtered signal is called the residual. To achieve a high compression rate, the discrete cosine transform (DCT) of the residual signal can be employed. The DCT concentrates most of the energy of the signal in the first few coefficients. Thus one way to compress the signal is to transfer only the coefficients that contain most of the energy.

    function [ Outspeech ] = speechcoder2( inspeech )
    % Parameters:
    % inspeech : wave data with sampling rate Fs
    % (Fs can be changed underneath if necessary)
    % Returns:
    % Outspeech : wave data with sampling rate Fs
    % (coded and resynthesized)
    if ( nargin ~= 1)
        error('argument check failed');
    end;
    Fs = 16000;   % sampling rate in Hertz (Hz)
    Order = 10;   % order of the model used by LPC
    % encode the speech using LPC
    [aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);
    % perform a discrete cosine transform on the residual
    resid = dct(resid);
    [a,b] = size(resid);
    % only use the first 50 DCT coefficients; this can be done
    % because most of the energy of the signal is conserved in these coeffs
    resid = [ resid(1:50,:); zeros(430,b) ];
    % quantize the data
    resid = uencode(resid,4);
    resid = udecode(resid,4);
    % perform an inverse DCT
    resid = idct(resid);
    % add some noise to the signal to make it sound better
    noise = [ zeros(50,b); 0.01*randn(430,b) ];
    resid = resid + noise;
    % decode/synthesize speech using LPC and the compressed residual as excitation
    Outspeech = synlpc2(aCoeff, resid, Fs, G);

Results:

MATLAB files:

    clear all;
    % osama saraireh
    % speech processing
    % Dr. Veton Kepuska
    % FIT Fall 2005
    a = input('please load the speech signal as a .wav file ', 's');
    Inputsoundfile = a;
    % wavread was removed in MATLAB R2015b; newer releases provide audioread
    [inspeech, Fs, bits] = wavread(Inputsoundfile);   % read the wave file
    outspeech1 = speechcoder1(inspeech);   % plain LPC vocoder
    outspeech2 = speechcoder2(inspeech);   % voice-excited LPC vocoder
    % plot results
    figure(1);
    subplot(3,1,1);
    plot(inspeech);
    grid;
    subplot(3,1,2);
    plot(outspeech1);
    grid;
    subplot(3,1,3);
    plot(outspeech2);
    grid;
    disp('Press any key to play the original sound file');
    pause;
    soundsc(inspeech, Fs);
    disp('Press any key to play the LPC compressed file!');
    pause;
    soundsc(outspeech1, Fs);
    disp('Press a key to play the voice-excited LPC compressed sound!');
    pause;
    soundsc(outspeech2, Fs);

    function [aCoeff,resid,pitch,G,parcor,stream] = proclpc(data,sr,L,fr,fs,preemp)
    % L - The order of the analysis.
    % fr - Frame time increment, in ms. Defaults to 20 ms
    % fs - Frame size in ms.
    % aCoeff - The LPC analysis results,
    % resid - The LPC residual,
    % pitch - calculated by finding the peak in the residual's autocorrelation
    %         for each frame.
    % G - The LPC gain for each frame.
    % parcor - The parcor coefficients.
    % stream - The LPC analysis' residual or excitation signal as one long vector.
    if (nargin<3), L = 10; end
    if (nargin<4), fr = 20; end
    if (nargin<5), fs = 30; end
    if (nargin<6), preemp = .9378; end
    [row col] = size(data);
    if col==1 data=data'; end
    nframe = 0;
    msfr = round(sr/1000*fr);                 % Convert ms to samples
    msfs = round(sr/1000*fs);                 % Convert ms to samples
    duration = length(data);
    speech = filter([1 -preemp], 1, data)';   % Preemphasize speech
    msoverlap = msfs - msfr;
    ramp = [0:1/(msoverlap-1):1]';            % Compute part of window
    for frameIndex=1:msfr:duration-msfs+1     % frame rate=20ms
        frameData = speech(frameIndex:(frameIndex+msfs-1));  % frame size=30ms
        nframe = nframe+1;
        autoCor = xcorr(frameData);           % Autocorrelation of the frame
        autoCorVec = autoCor(msfs+[0:L]);     % Lags 0..L
        % Levinson's method
        err(1) = autoCorVec(1);
        k(1) = 0;
        A = [];
        for index=1:L
            numerator = [1 A.']*autoCorVec(index+1:-1:2);
            denominator = -1*err(index);
            k(index) = numerator/denominator;          % PARCOR coeffs
            A = [A+k(index)*flipud(A); k(index)];
            err(index+1) = (1-k(index)^2)*err(index);
        end
        aCoeff(:,nframe) = [1; A];
        parcor(:,nframe) = k';
        % filter response
        if 0    % disabled diagnostic plot
            gain = 0;
            cft = 0:(1/255):1;
            for index=1:L
                gain = gain + aCoeff(index,nframe)*exp(-i*2*pi*cft).^index;
            end
            gain = abs(1./gain);
            spec(:,nframe) = 20*log10(gain(1:128))';
            plot(20*log10(gain));
            title(nframe);
            drawnow;
        end
        % Calculate the filter response
        % from the filter's impulse
        % response (to check above).
        if 0    % disabled check against the impulse response
            impulseResponse = filter(1, aCoeff(:,nframe), [1 zeros(1,255)]);
            freqResponse = 20*log10(abs(fft(impulseResponse)));
            plot(freqResponse);
        end
        errSig = filter([1 A'],1,frameData);  % find excitation noise
        G(nframe) = sqrt(err(L+1));           % gain
        autoCorErr = xcorr(errSig);           % calculate pitch & voicing information
        [B,I] = sort(autoCorErr);
        num = length(I);
        if B(num-1) > .01*B(num)
            pitch(nframe) = abs(I(num) - I(num-1));
        else
            pitch(nframe) = 0;
        end
        % improve the compressed sound quality
        resid(:,nframe) = errSig/G(nframe);
        if (frameIndex==1)   % add residual frames using a trapezoidal window
            stream = resid(1:msfr,nframe);
        else
            stream = [stream;
                      overlap+resid(1:msoverlap,nframe).*ramp;
                      resid(msoverlap+1:msfr,nframe)];
        end
        if (frameIndex+msfr+msfs-1 > duration)
            stream = [stream; resid(msfr+1:msfs,nframe)];
        else
            overlap = resid(msfr+1:msfs,nframe).*flipud(ramp);
        end
    end
    stream = filter(1, [1 -preemp], stream)';

Speech Model one LPC Vocoder:

    function [ Outspeech ] = speechcoder1( inspeech )
    % Parameters:
    % inspeech : wave data with sampling rate Fs
    % Outputs:
    % Outspeech : wave data with sampling rate Fs
    % (coded and resynthesized)
    if ( nargin ~= 1)
        error('argument check failed');
    end;
    Fs = 8000;    % sampling rate in Hertz (Hz)
    Order = 10;   % order of the model used by LPC
    % encode the speech using LPC
    [aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);
    % decode/synthesize speech using LPC and impulse-trains as excitation
    Outspeech = synlpc(aCoeff, pitch, Fs, G);

    % Voice-excited LPC vocoder
    function [ Outspeech ] = speechcoder2( inspeech )
    % Parameters:
    % inspeech : wave data with sampling rate Fs
    % (Fs can be changed underneath if necessary)
    % Output:
    % Outspeech : wave data with sampling rate Fs
    % (coded and resynthesized)
    if ( nargin ~= 1)
        error('argument check failed');
    end;
    Fs = 16000;   % sampling rate in Hertz (Hz)
    Order = 10;   % order of the model used by LPC
    % encode the speech using LPC
    [aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);
    % perform a discrete cosine transform on the residual
    resid = dct(resid);
    [a,b] = size(resid);
    % only use the first 50 DCT coefficients; this can be done
    % because most of the energy of the signal is conserved in these coeffs
    resid = [ resid(1:50,:); zeros(430,b) ];
    % quantize the data
    resid = uencode(resid,4);
    resid = udecode(resid,4);
    % perform an inverse DCT
    resid = idct(resid);
    % add some noise to the signal to make it sound better
    noise = [ zeros(50,b); 0.01*randn(430,b) ];
    resid = resid + noise;
    % decode/synthesize speech using LPC and the compressed residual as excitation
    Outspeech = s
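The decoder described in this report drives the all-pole synthesis filter with either an impulse train or white noise. The listings call synlpc for that step but its source is not included here, so the following is not that function. It is a minimal single-frame sketch with assumed coefficient, gain, and pitch values, and with the excitation scaled to unit energy as in the gain derivation.

```matlab
% Hypothetical decoded parameters for one frame (illustrative values only).
a = [1; -0.9];          % coefficients of A(z), with a(1) = 1
G = 0.5;                % gain
pitchPeriod = 40;       % in samples; 0 marks an unvoiced frame
N = 160;                % frame length: 20 ms at 8 kHz

if pitchPeriod > 0
    excitation = zeros(N, 1);
    excitation(1:pitchPeriod:end) = 1;   % one impulse per pitch period
else
    excitation = randn(N, 1);            % white noise for unvoiced speech
end
excitation = excitation / sqrt(sum(excitation.^2));  % normalize to unit energy

% Synthesis filter G / A(z).
frame = filter(G, a, excitation);

% Passing the frame back through A(z) recovers the scaled excitation.
residual = filter(a, 1, frame);
```

Setting `pitchPeriod = 0` switches the sketch to the unvoiced branch. A complete synthesizer would also carry the filter state from one frame to the next and overlap adjacent frames, which this single-frame example leaves out.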