Writing to / Reading of AxiRam from the hardware DUT hang when adding while loops to the test bench #85

Open
JeffreyWong20 opened this issue Apr 8, 2024 · 0 comments


Cocotb version: 0.1.24
Hardware simulator: QuestaSim

Hello, I've run into an issue while working on my project. I've instantiated an AxiRam in Cocotb to serve as the model RAM for the DUT (Device Under Test), which lets me initialize its contents from the Cocotb test bench. Everything works well with smaller test cases, but a problem appears when I test with larger inputs.

To handle the larger inputs, I introduced a while loop in the test bench to wait for responses. Unfortunately, this seems to cause the AxiRam write and read operations to hang.

Due to the complexity of the DUT, I'm unable to provide the entire code. However, I can confirm that I haven't altered any inputs to the DUT itself. The only change was adding a while loop that keeps the test bench running while it waits. I can see the write and read operations hanging in the console output.

For reference, here's an example of the test bench, where the DUT completes a read operation from the AxiRam, performs a calculation on the data, and writes the result back at the end:

        dut.weight_prefetcher_req_valid.value = 1                               # enable the prefetcher
        dut.weight_prefetcher_req.req_opcode.value   = 0                        # 00 is for weight bank requests
        dut.weight_prefetcher_req.start_address.value = byte_per_weight_block * (i % weight_matrix_iteration)   # start address of the weight bank
        dut.weight_prefetcher_req.in_features.value  = weight_matrix_size[1]    # number of input features                     
        dut.weight_prefetcher_req.out_features.value = weight_matrix_size[0]    # number of output features
        dut.weight_prefetcher_req.nodeslot.value     = 0                        # not used for weight bank requests
        dut.weight_prefetcher_req.nodeslot_precision.value = 1                  # 01 is for fixed 8-bit precision
        dut.weight_prefetcher_req.neighbour_count.value = 0                     # not used for weight bank requests
        # --------------------------------------------------
        dut.feature_prefetcher_req_valid.value = 1                              # enable the prefetcher
        dut.feature_prefetcher_req.req_opcode.value   = 0                       # 00 is for weight bank requests
        dut.feature_prefetcher_req.start_address.value  = weigth_address_range + byte_per_input_block * (i // weight_matrix_iteration)   # start address of the feature bank
        dut.feature_prefetcher_req.in_features.value  = input_matrix_size[1]    # number of input features
        dut.feature_prefetcher_req.out_features.value = input_matrix_size[0]    # number of output features
        dut.feature_prefetcher_req.nodeslot.value     = 0                       # not used for weight bank requests
        dut.feature_prefetcher_req.nodeslot_precision.value = 1                 # 01 is for fixed 8-bit precision
        dut.feature_prefetcher_req.neighbour_count.value = 0                    # not used for weight bank requests
        # --------------------------------------------------
        dut.nsb_fte_req_valid.value = 1                                         # enable the fte
        dut.nsb_fte_req.precision.value = 1                                     # 01 is for fixed 8-bit precision
        dut.layer_config_out_channel_count.value = input_matrix_size[0]         # here we used the first dimension of the input matrix as output channel count
        dut.layer_config_out_features_count.value = weight_matrix_size[0]       # here we used the first dimension of the weight matrix as output features count       
        dut.layer_config_out_features_address_msb_value.value = (writeback_address >> 32) & 0b11        # 2 is for the msb of 34 bits address
        dut.layer_config_out_features_address_lsb_value.value = writeback_address & 0xFFFFFFFF          # 0 for the rest of the address
        dut.writeback_offset.value = offset                                     # 0 for the writeback offset
        #---------------------------------------------------
        print("Done instructing fte")
        i = 0
        while True:
            await RisingEdge(dut.clk)
            await Timer(10, units="ns")
            if dut.nsb_fte_resp_valid.value == 1:
                done = True
                break
            
            if i==1000000:
                done = False
                break
            i+=1
        reset_fte(dut)

This test bench passes, and all the read and write logs in the console look correct. However, the problem appears once I introduce a while loop, as shown below:

        dut.weight_prefetcher_req_valid.value = 1                               # enable the prefetcher
        dut.weight_prefetcher_req.req_opcode.value   = 0                        # 00 is for weight bank requests
        dut.weight_prefetcher_req.start_address.value = byte_per_weight_block * (i % weight_matrix_iteration)   # start address of the weight bank
        dut.weight_prefetcher_req.in_features.value  = weight_matrix_size[1]    # number of input features                     
        dut.weight_prefetcher_req.out_features.value = weight_matrix_size[0]    # number of output features
        dut.weight_prefetcher_req.nodeslot.value     = 0                        # not used for weight bank requests
        dut.weight_prefetcher_req.nodeslot_precision.value = 1                  # 01 is for fixed 8-bit precision
        dut.weight_prefetcher_req.neighbour_count.value = 0                     # not used for weight bank requests
        # --------------------------------------------------
        dut.feature_prefetcher_req_valid.value = 1                              # enable the prefetcher
        dut.feature_prefetcher_req.req_opcode.value   = 0                       # 00 is for weight bank requests
        dut.feature_prefetcher_req.start_address.value  = weigth_address_range + byte_per_input_block * (i // weight_matrix_iteration)   # start address of the feature bank
        dut.feature_prefetcher_req.in_features.value  = input_matrix_size[1]    # number of input features
        dut.feature_prefetcher_req.out_features.value = input_matrix_size[0]    # number of output features
        dut.feature_prefetcher_req.nodeslot.value     = 0                       # not used for weight bank requests
        dut.feature_prefetcher_req.nodeslot_precision.value = 1                 # 01 is for fixed 8-bit precision
        dut.feature_prefetcher_req.neighbour_count.value = 0                    # not used for weight bank requests
        # --------------------------------------------------
        await Timer(10, units="ns")
        p = 0
        fetched_weight, fetched_input = False, False
        while True:
            await RisingEdge(dut.clk)
            await Timer(10, units="ns")
            if dut.weight_prefetcher_resp_valid.value == 1:
                fetched_weight = True
            if dut.feature_prefetcher_resp_valid.value == 1:
                fetched_input = True
            if fetched_weight and fetched_input:
                break
            elif p==1000000:
                raise ValueError("Deadlock detected: weight_prefetcher_req_ready and feature_prefetcher_req_ready are not ready")
            p+=1
        reset_nsb_prefetcher(dut)
        # --------------------------------------------------
        dut.nsb_fte_req_valid.value = 1                                         # enable the fte
        dut.nsb_fte_req.precision.value = 1                                     # 01 is for fixed 8-bit precision
        dut.layer_config_out_channel_count.value = input_matrix_size[0]         # here we used the first dimension of the input matrix as output channel count
        dut.layer_config_out_features_count.value = weight_matrix_size[0]       # here we used the first dimension of the weight matrix as output features count       
        dut.layer_config_out_features_address_msb_value.value = (writeback_address >> 32) & 0b11        # 2 is for the msb of 34 bits address
        dut.layer_config_out_features_address_lsb_value.value = writeback_address & 0xFFFFFFFF          # 0 for the rest of the address
        dut.writeback_offset.value = offset                                     # 0 for the writeback offset
        #---------------------------------------------------
        print("Done instructing fte")
        i = 0
        while True:
            await RisingEdge(dut.clk)
            await Timer(10, units="ns")
            if dut.nsb_fte_resp_valid.value == 1:
                done = True
                break
            
            if i==1000000:
                done = False
                break
            i+=1
        reset_fte(dut)
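
One thing that may be worth checking in the polling loops above: each iteration waits for a rising edge and then a further 10 ns before reading the response signals, so the loop may not read on every clock cycle. The clock period is not stated in this issue, so the following is only a sketch under stated assumptions: rising edges at 0, `period_ns`, 2 × `period_ns`, ..., and `RisingEdge` resuming on the first edge strictly after the current time. The function names are hypothetical and the model uses no simulator.

```python
def sample_times(period_ns, delay_ns, n_samples):
    """Times (ns) at which a loop of `await RisingEdge(clk)` followed by
    `await Timer(delay_ns)` reads its signals.

    Assumption: rising edges fall on multiples of period_ns, and an edge
    that coincides exactly with the end of the Timer is not caught.
    """
    times = []
    edge = 0  # first rising edge the loop wakes on
    for _ in range(n_samples):
        read_at = edge + delay_ns  # the Timer moves the read point past the edge
        times.append(read_at)
        # Next rising edge strictly after the read point.
        edge = (read_at // period_ns + 1) * period_ns
    return times


def pulse_seen(pulse_start_ns, pulse_end_ns, times):
    """True if any read time falls inside [pulse_start_ns, pulse_end_ns)."""
    return any(pulse_start_ns <= t < pulse_end_ns for t in times)


# Hypothetical clock periods, chosen only to show the two regimes.
fast = sample_times(period_ns=4, delay_ns=10, n_samples=10)   # reads every 3rd cycle
slow = sample_times(period_ns=20, delay_ns=10, n_samples=10)  # reads every cycle

# A response that is high for a single 4 ns cycle can fall between reads.
missed = not pulse_seen(16, 20, fast)
```

Under these assumptions, a response flag that is only high for one cycle can be missed when the clock period is shorter than the added delay, which would make the loop run until its iteration limit. Whether the DUT's response signals are pulses or stay high is not known from this issue.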

Introducing the while loop appears to cause the read operation to hang, as shown in the first image:

[image]

By comparison, without the while loop the read and write operations complete correctly, as shown in the second image:

[image]
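
The two hand-written polling loops follow the same pattern: advance one cycle, latch any response flag that is high, stop when all flags have been seen or a cycle limit is reached. A minimal sketch of that pattern as a reusable coroutine is given below. `wait_all_seen`, `next_cycle` and `read_flags` are hypothetical names, not part of cocotb or cocotbext-axi, and the demo uses `asyncio` as a stand-in for the simulator so that it runs on its own.

```python
import asyncio


async def wait_all_seen(next_cycle, read_flags, max_cycles):
    """Wait until every flag has been observed high at least once.

    next_cycle: zero-argument callable returning an awaitable that completes
                after one clock cycle (hypothetical hook supplied by the caller).
    read_flags: zero-argument callable returning a dict of name -> bool.
    max_cycles: give up after this many cycles.

    Returns the number of cycles waited, or raises TimeoutError naming the
    flags that were never seen.
    """
    seen = set()
    pending = set()
    for cycle in range(1, max_cycles + 1):
        await next_cycle()
        flags = read_flags()
        # Latch flags, because a response may only be high for one cycle.
        seen.update(name for name, high in flags.items() if high)
        pending = set(flags) - seen
        if not pending:
            return cycle
    raise TimeoutError(
        f"no response after {max_cycles} cycles from: {sorted(pending)}"
    )


async def _demo():
    # Stand-in for the simulator: "weight" pulses on cycle 3, "feature" on cycle 5.
    state = {"cycle": 0}

    async def next_cycle():
        state["cycle"] += 1
        await asyncio.sleep(0)

    def read_flags():
        return {
            "weight": state["cycle"] == 3,
            "feature": state["cycle"] == 5,
        }

    return await wait_all_seen(next_cycle, read_flags, max_cycles=10)


cycles_waited = asyncio.run(_demo())
```

In the test bench, the two hooks would presumably be `lambda: RisingEdge(dut.clk)` and a lambda returning a dict built from `dut.weight_prefetcher_resp_valid.value == 1` and `dut.feature_prefetcher_resp_valid.value == 1`. Naming the missing flags in the exception shows which prefetcher never responded.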

For reference, this is how I connected AxiRam to my hardware:

**cocotb:** 
  self.axi_ram = AxiRam(AxiBus.from_prefix(dut, "axi"), dut.clk, dut.rst, size=2**34)
**system verilog:** 
axi_interface axi_ram (
    .clk                        (clk),
    .rst                        (rst),

    .axi_awid                   (c0_ddr4_s_axi_awid),
    .axi_awaddr                 (c0_ddr4_s_axi_awaddr),
    .axi_awlen                  (c0_ddr4_s_axi_awlen),
    .axi_awsize                 (c0_ddr4_s_axi_awsize),
    .axi_awburst                (c0_ddr4_s_axi_awburst),
    .axi_awlock                 (c0_ddr4_s_axi_awlock),
    .axi_awcache                (c0_ddr4_s_axi_awcache),
    .axi_awprot                 (c0_ddr4_s_axi_awprot),
    .axi_awqos                  (c0_ddr4_s_axi_awqos), // not used 
    .axi_awregion               (), // not used
    .axi_awvalid                (c0_ddr4_s_axi_awvalid),
    .axi_awready                (c0_ddr4_s_axi_awready),
    .axi_wdata                  (c0_ddr4_s_axi_wdata),
    .axi_wstrb                  (c0_ddr4_s_axi_wstrb),
    .axi_wlast                  (c0_ddr4_s_axi_wlast),
    .axi_wvalid                 (c0_ddr4_s_axi_wvalid),
    .axi_wready                 (c0_ddr4_s_axi_wready),
    .axi_bid                    (c0_ddr4_s_axi_bid),
    .axi_bresp                  (c0_ddr4_s_axi_bresp),
    .axi_bvalid                 (c0_ddr4_s_axi_bvalid),
    .axi_bready                 (c0_ddr4_s_axi_bready),
    .axi_arid                   (c0_ddr4_s_axi_arid),
    .axi_araddr                 (c0_ddr4_s_axi_araddr),
    .axi_arlen                  (c0_ddr4_s_axi_arlen),
    .axi_arsize                 (c0_ddr4_s_axi_arsize),
    .axi_arburst                (c0_ddr4_s_axi_arburst),
    .axi_arlock                 (c0_ddr4_s_axi_arlock),
    .axi_arcache                (c0_ddr4_s_axi_arcache),
    .axi_arprot                 (c0_ddr4_s_axi_arprot),
    .axi_arqos                  (c0_ddr4_s_axi_arqos), // not used prefetcher_weight_bank_rm_axi_interconnect_axi_arqos
    .axi_arregion               (), // not used
    .axi_arvalid                (c0_ddr4_s_axi_arvalid),
    .axi_arready                (c0_ddr4_s_axi_arready),
    .axi_rid                    (c0_ddr4_s_axi_rid),
    .axi_rdata                  (c0_ddr4_s_axi_rdata),
    .axi_rresp                  (c0_ddr4_s_axi_rresp),
    .axi_rlast                  (c0_ddr4_s_axi_rlast),
    .axi_rvalid                 (c0_ddr4_s_axi_rvalid),
    .axi_rready                 (c0_ddr4_s_axi_rready)
);
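
Since the AxiRam is created with `size=2**34`, every address the test bench programs should be below that size. The sketch below repeats the address arithmetic from the test bench as plain Python so it can be checked without a simulator. The function names and the example sizes are hypothetical, and the parameter `weight_address_range` stands for the test bench variable spelled `weigth_address_range`.

```python
RAM_SIZE = 2**34  # matches size=2**34 passed to AxiRam above


def split_address(addr):
    """Split a 34-bit byte address into (msb, lsb) the way the test bench does."""
    if not 0 <= addr < RAM_SIZE:
        raise ValueError(f"address {addr:#x} is outside the {RAM_SIZE:#x}-byte RAM")
    return (addr >> 32) & 0b11, addr & 0xFFFFFFFF


def join_address(msb, lsb):
    """Rebuild the full address from the two register values."""
    return (msb << 32) | lsb


def prefetch_addresses(i, byte_per_weight_block, byte_per_input_block,
                       weight_matrix_iteration, weight_address_range):
    """Start addresses programmed into the weight and feature prefetchers."""
    weight = byte_per_weight_block * (i % weight_matrix_iteration)
    feature = weight_address_range + byte_per_input_block * (i // weight_matrix_iteration)
    return weight, feature


# Hypothetical values, chosen only for illustration.
msb, lsb = split_address(0x2_0000_0000)
weight_addr, feature_addr = prefetch_addresses(
    i=5,
    byte_per_weight_block=64,
    byte_per_input_block=128,
    weight_matrix_iteration=4,
    weight_address_range=0x1000,
)
```

Running this over the full range of `i` used by a large test case is one way to confirm that no request falls outside the RAM or overlaps the write-back region.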

May I ask if there is any way to work around this? Thank you very much for your help.
