Try the FPGA DE0-Nano simulation environment (GHDL + Quartus)

Take the file BP_v3.vhd as a starting point and try to reduce the number of registers it uses (as was done with the file BP_v1.vhd). Use the GHDL testbench to verify that no sample is lost any more, i.e. that each Goertzel window uses 205 samples every 205 clk_en (instead of 205 samples every 206 clk_en). Then synthesize the design with Quartus and compare the compilation report with the previous one: 234/60/6 (combinational functions/logic registers/9-bit multipliers). Don't forget to connect at least one bit of X_dtf to an output; otherwise the synthesizer may remove the logic that drives it and the resource counts will be meaningless.
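
The difference between the two window timings can be illustrated with a small behavioral model. This is only a sketch in Python, not the VHDL design: the function `run_windows` and its `period` parameter are hypothetical, and the assumption that the extra clk_en tick is spent on end-of-window bookkeeping is a guess at one possible cause, not something taken from BP_v3.vhd. It counts how many samples are consumed and how many are skipped when a window repeats every 206 clk_en versus every 205 clk_en.

```python
N = 205  # samples per Goertzel window, as stated in the task


def run_windows(num_clk_en, period):
    """Count completed windows, used samples and lost samples.

    One sample is assumed to arrive on every clk_en tick.
    period == N + 1 models a (hypothetical) design that spends one extra
    clk_en tick per window, so the sample arriving on that tick is skipped.
    period == N models a design where windows follow each other back to back.
    """
    windows = used = lost = 0
    phase = 0  # position inside the current window period
    for _ in range(num_clk_en):
        if phase < N:
            used += 1   # sample is fed to the Goertzel recurrence
        else:
            lost += 1   # sample arrives during the extra tick and is dropped
        phase += 1
        if phase == period:
            phase = 0
            windows += 1
    return windows, used, lost


if __name__ == "__main__":
    for period in (N + 1, N):
        ticks = 10 * period  # exactly ten window periods
        windows, used, lost = run_windows(ticks, period)
        print(f"period={period}: windows={windows} used={used} lost={lost}")
```

In the testbench, the equivalent check would be that the number of clk_en ticks between two consecutive window results is 205 rather than 206.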