# Partitioning FPGA-Optimized Systolic Arrays

We provide a fast optimization algorithm and a step-by-step guide on how to generate the dataset for a specific board and set of topologies to be used by our optimization tool.

> *Long Chung Chan, Gurshaant Singh Malik and Nachiket Kapre*
>
> [**"Partitioning FPGA-Optimized Systolic Arrays for Fun and Profit"**](https://git.uwaterloo.ca/watcag-public/fpga-syspart/blob/master/optimization_algo/paper/PID6211513.pdf)
>
> 2019 International Conference on Field-Programmable Technology

## Demo

The following demos use pre-generated datasets and topologies that can be found in:

- [topologies](https://git.uwaterloo.ca/watcag-public/fpga-syspart/blob/master/optimization_algo/topologies/) contains all the topologies describing their respective CNN structures
- [data_source](https://git.uwaterloo.ca/watcag-public/fpga-syspart/blob/master/optimization_algo/data_source/) contains all the cycle-accurate data generated using [SCALE sim](https://github.com/ARM-software/SCALE-Sim)

The instructions below perform a sweep run over each of the following networks:

- FasterRCNN
- Mobilenet
- Yolo tiny
- Googlenet
- Alexnet
- AlphaGoZero
- NCF_rec
- Resnet_50_v1

To obtain the optimization result for a specific network and a specific number of partitions, please refer to section 4 of the step-by-step guide below.

To get optimization results with:

1. Covariance Matrix Adaptation Evolution Strategy (CMA-ES)

   ```bash
   cd optimization_algo/scripts
   ./sweep_nets_cma.sh
   ```

2. Genetic Algorithm (GA)

   ```bash
   cd optimization_algo/scripts
   ./sweep_nets_ga.sh
   ```

3. Hyperparameter Optimization

   ```bash
   cd optimization_algo/scripts
   ./sweep_nets_ho.sh
   ```

4. Brute Force

   ```bash
   cd optimization_algo/scripts
   ./sweep_nets_brute.sh
   ```

The optimization results are added to the corresponding csv file in this [folder](https://git.uwaterloo.ca/watcag-public/fpga-syspart/blob/master/optimization_algo/resulting_csv).

## Step-by-step guide

### 1. Custom topologies

SCALE-sim requires a `.csv` file containing the following attributes for each layer in the network:

1. Layer name
2. IFMAP Height
3. IFMAP Width
4. Filter Height
5. Filter Width
6. Channels
7. Number of Filters
8. Strides

Examples can be found under [topologies](https://git.uwaterloo.ca/watcag-public/fpga-syspart/blob/master/optimization_algo/topologies/). If you have trouble working out the correct topology file for a specific network, you can use the [Netscope CNN Analyzer](https://dgschwend.github.io/netscope/quickstart.html).

### 2. Custom hardware model

SCALE-sim also requires a config file that describes your hardware model. The config file used to generate all the data in the paper is [`US_sim.cfg`](https://git.uwaterloo.ca/watcag-public/fpga-syspart/blob/master/scaleSim/configs/US_sim.cfg). Please refer to [SCALE sim](https://github.com/ARM-software/SCALE-Sim) for more detail on how to create your own config file.

### 3. Generate data source using SCALE-sim

We made a small modification to SCALE-sim to:

1. Enable multi-processing for faster dataset generation
2. Sweep through every layer with incremental resources

Because of this, we provide a bash script so you don't have to run SCALE-sim yourself. For example, to obtain the dataset for `US_sim.cfg` with `Alexnet`:

```bash
cd scaleSim
./generate_data_set.sh configs/US_sim.cfg ../topologies/960_DNN/Alexnet.csv
```

The default number of parallel processes is `6`. You can change it on line 257 of [scale.py](https://git.uwaterloo.ca/watcag-public/fpga-syspart/blob/master/scaleSim/scale.py). However, SCALE-sim creates temporary csv files for caching. Be careful when increasing this number so that you do not hit a `DiskOutOfSpace` error.

```python
...
pool = Pool(processes=6)  # change the number of parallel processes here
for pro in pool.imap_unordered(self.run_mp_once, all_arr_dim_list):
    self.run_name = net_name + "_" + self.dataflow + "_" + str(pro[0]) + "x" + str(pro[1])
    self.cleanup(pro)
pool.close()
...
```

Once generation finishes, the data is spread across several files under the `outputs` directory. Run the following script to repack it into a single csv file:

```bash
cd scaleSim
./generate_final_csv.sh Alexnet ../optimization_algo/data_source/alexnet_mem_bound.csv 10
```

The tools assume the following file names:

1. The csv file containing cycle-accurate data generated by SCALE-sim: `{topology name}_mem_bound.csv`
2. The csv file containing the topology of the CNN: `{topology name}.csv`

### 4. Running the script for a specific approach

Run the following commands from the `optimization_algo/scripts` directory.

1. Covariance Matrix Adaptation Evolution Strategy (CMA-ES)

   Note: uncomment lines 349-364 in `cma_approach.py` to see the output.

   ```bash
   # python3 ../approaches/cma_approach.py ${network name} ${number of partitions} ${population size} ${resource unit available} ${strategy} ${optimization target}
   python3 ../approaches/cma_approach.py alexnet 3 100 960 allzeros DRAM_cycle
   ```

   Result screenshot:

2. Genetic Algorithm (GA)

   ```bash
   # python3 ../approaches/ga_approach.py ${network name} ${number of partitions} ${elite population size} ${population size} ${resource unit available} ${optimization target}
   python3 ../approaches/ga_approach.py alexnet 3 10 100 960 DRAM_cycle
   ```

3. Hyperparameter Optimization

   ```bash
   # python3 ../approaches/hyper_parameter_ga.py ${network name} ${number of partitions} ${resource unit available} ${target} ${max iteration}
   python3 ../approaches/hyper_parameter_ga.py alexnet 3 960 DRAM_cycle 2500
   ```

4. Brute Force

   ```bash
   # python3 ../approaches/brute_force_approach.py ${network name} ${number of partitions} ${resource unit available} ${target}
   python3 ../approaches/brute_force_approach.py alexnet 3 960 DRAM_cycle
   ```

The optimization results are also added to the corresponding csv file in this [folder](https://git.uwaterloo.ca/watcag-public/fpga-syspart/blob/master/optimization_algo/resulting_csv).

## Repo Breakdown

## License

This tool is distributed under the MIT license.

Copyright (c) 2019 Long Chung Chan, Gurshaant Singh Malik, Nachiket Kapre