This repository contains the source code of C++ and Java solutions mined from the Code Submission Evaluation System (CSES).
It contains folders related to 15 different CSES problems, where each folder contains around 100 different solutions (70 C++, 30 Java). The complete list of CSES problems is available in CSES_problems.
The main folder contains 18 subfolders:
- 15 subfolders (one for each CSES problem).
- A subfolder RAPL, which contains the code related to the time/energy measurement framework. The running time of a program is measured based on a wall-clock timer.
- A subfolder RAPL-time, which contains the code related to the time/energy measurement framework. The running time of a program is measured based on a CPU timer.
- A subfolder analysis-script, which contains Python scripts to analyze the data generated by the time/energy measuring tool.
The code in subfolders RAPL and RAPL-time is mostly based on the one provided by the Green Software Lab.
Each folder related to a CSES problem has a Makefile, a subfolder test with input files related to the problem, a c++ folder with the C++ solutions, and a java folder with the Java solutions.
Each c++ folder has the subfolders slow, fast, rand, rand30 and control, with the following configuration:
- slow: the 10 slowest C++ solutions.
- fast: the 10 fastest C++ solutions.
- rand: 10 C++ solutions chosen at random.
- rand30: 30 C++ solutions (different from rand) chosen at random.
- control: 10 C++ solutions (different from rand and rand30) chosen at random.
Each java folder has only the subfolder rand30, with 30 Java solutions chosen at random.
During our experiments, we considered the following datasets:
- SFR C++: the C++ solutions in subfolders slow, fast and rand.
- Rand30 C++: the C++ solutions in subfolder rand30.
- Control: the C++ solutions in subfolder control.
- Rand30 Java: the Java solutions in subfolder rand30.
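The dataset definitions above can be written down as a small lookup table. The following Python sketch maps each dataset name to its subfolders inside a problem folder. The names `DATASETS` and `dataset_dirs` are hypothetical helpers introduced here for illustration; they are not part of the repository.

```python
from pathlib import Path

# Dataset name -> (language folder, subfolder) pairs, as described above.
# DATASETS and dataset_dirs are illustrative names, not repository code.
DATASETS = {
    "SFR C++": [("c++", "slow"), ("c++", "fast"), ("c++", "rand")],
    "Rand30 C++": [("c++", "rand30")],
    "Control": [("c++", "control")],
    "Rand30 Java": [("java", "rand30")],
}


def dataset_dirs(problem_dir, dataset):
    """Return the solution folders of `dataset` inside one problem folder."""
    root = Path(problem_dir)
    return [root / language / subfolder for language, subfolder in DATASETS[dataset]]
```

For example, `dataset_dirs("cses-1084_Apartments", "SFR C++")` yields the slow, fast and rand folders under `cses-1084_Apartments/c++`.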
In the folder of each CSES problem there are .csv files with the energy measurements related to sections 4.1.1, 4.2 and 4.3. The names of these files have the following structure: PROBLEM_NUMBER-MACHINE-DATASET-TIME_MEASUREMENT.
Below, we present some examples of these files:
- 1621-HPELITE-control-time.csv: data related to CSES problem 1621, where the measurement was performed on machine HPELITE for the control dataset, using a CPU-based timer.
- 1082-HPTHINK-slow-clock.csv: data related to CSES problem 1082, where the measurement was performed on machine HPTHINK for the slow dataset, using a wall-clock timer.
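The naming scheme above can be decoded mechanically. The sketch below is a minimal Python example, assuming that no field contains a hyphen and that the suffixes time and clock are the only timer labels (these are the two that appear in the examples). The function name `parse_measurement_name` is hypothetical.

```python
from pathlib import Path

# Timer labels taken from the two examples above; other labels are not handled.
TIMERS = {"time": "cpu", "clock": "wall-clock"}


def parse_measurement_name(filename):
    """Split PROBLEM_NUMBER-MACHINE-DATASET-TIME_MEASUREMENT.csv into fields."""
    stem = Path(filename).stem  # drops the .csv extension
    problem, machine, dataset, timer = stem.split("-")
    return {
        "problem": problem,
        "machine": machine,
        "dataset": dataset,
        "timer": TIMERS[timer],
    }
```

A name with a different number of fields raises a `ValueError` from the unpacking, which makes unexpected file names easy to spot.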
The measurements related to section 4.1.2 are in folder results, where there is a subfolder for each machine (elite, think and xeon), and then a subfolder for each measurement framework (perf and rapl).
The names of the files in these folders have the following structure: PROBLEM_NUMBER-LANGUAGE-CORES-MACHINE-DATASET-TIME_MEASUREMENT. Below, we present some examples of these files:
- 1643-Maximum_Subarray_Sum-java-mult-rapl-elite-rand30-24-07-2024-18-11.csv
- 2185-Prime_Multiples-c++-sing-perf-xeon-rand30-26-07-2024-19-22.csv
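The two example names above carry more fields than the stated pattern lists. The Python sketch below splits them by position, under assumptions inferred only from these two examples: the fields are problem number, problem name, language, cores, measurement framework, machine and dataset, followed by a timestamp read as day-month-year-hour-minute. Both the field labels and the function name `parse_result_name` are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def parse_result_name(filename):
    """Split a file name from the results folder into positional fields.

    The field order and the timestamp format are assumptions inferred from
    the two example names, not a documented specification.
    """
    parts = Path(filename).stem.split("-")
    if len(parts) != 12:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    number, name, language, cores, framework, machine, dataset = parts[:7]
    # Assumed layout of the last five fields: DD-MM-YYYY-HH-MM.
    timestamp = datetime.strptime("-".join(parts[7:]), "%d-%m-%Y-%H-%M")
    return {
        "problem_number": number,
        "problem_name": name,
        "language": language,
        "cores": cores,
        "framework": framework,
        "machine": machine,
        "dataset": dataset,
        "timestamp": timestamp,
    }
```

The values of the cores field (mult, sing) are returned unchanged, since the text does not define them.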
The files in this repository were compiled in a Linux/Ubuntu environment using versions of the g++ compiler with support for the C++17 standard.
To start measuring the time/energy of the CSES solutions, the first step is to compile the measurement framework. You should enter the RAPL and RAPL-time folders and type make.
After this, you should enter a folder related to a CSES problem (e.g., cses-1084_Apartments) and edit the Makefile to select the subfolders with C++ solutions that will be measured.
Then, you should log in as root (this is necessary for increasing the amount of memory that a program can use, and for the energy measurements), and then type ./faztudo at the command line.
This will compile and run all solutions for the given problem in the selected subfolders against each test file of the corresponding test subfolder. By default, each solution is run ten times against the corresponding test set.
The energy measurements of the solutions for a given problem will be stored in a .csv file whose name can be configured by changing the value of the variable PROBLEM in the first line of the corresponding outermost Makefile. For example, if we associate the name "1084" with variable PROBLEM, our measurements will be stored in a file 1084.csv.
Each line of the .csv file contains six columns with the following information:
Name of the executable file , PKG (Joules) , CPU (J) , GPU (J) , DRAM (J) , Time (ms)
RAPL reports values for the columns PKG and CPU, but the measurements related to GPU and DRAM may not be available on some machines.
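As an illustration of the six-column layout, the following Python sketch reads such a file and averages one column per executable. It is not the repository's analysis script. It assumes comma-separated fields, treats an empty GPU or DRAM field as missing, and skips any line whose numeric fields cannot be parsed (for instance a header line, if the file has one). The names `COLUMNS`, `read_measurements` and `mean_by_executable` are hypothetical.

```python
import csv
from collections import defaultdict
from statistics import mean

# Illustrative column names for the six fields described above.
COLUMNS = ("executable", "pkg_j", "cpu_j", "gpu_j", "dram_j", "time_ms")


def _to_float(text):
    """Convert a field to float; an empty field is treated as missing."""
    return float(text) if text else None


def read_measurements(lines):
    """Parse an iterable of CSV lines (e.g. an open file) into row dicts."""
    rows = []
    for fields in csv.reader(lines):
        fields = [field.strip() for field in fields]
        if len(fields) != len(COLUMNS):
            continue  # not a six-column measurement line
        try:
            values = [_to_float(field) for field in fields[1:]]
        except ValueError:
            continue  # header or malformed line
        rows.append(dict(zip(COLUMNS, [fields[0], *values])))
    return rows


def mean_by_executable(rows, column="time_ms"):
    """Average `column` over all rows of each executable, ignoring gaps."""
    grouped = defaultdict(list)
    for row in rows:
        if row[column] is not None:
            grouped[row["executable"]].append(row[column])
    return {name: mean(values) for name, values in grouped.items()}
```

A typical call would be `with open("1084.csv") as f: rows = read_measurements(f)`, followed by `mean_by_executable(rows, "pkg_j")` to compare package energy across solutions.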
To use the analysis script, enter the folder of a problem and type the following command:
python ../analysis-script/analisacsvs.py <file>.csv
You can also provide multiple .csv files to the analysis script. In this case, you should provide an even number 2 * N of files. The script will consider that the first N files are related to a machine, while the others are related to a different machine, and it will compare the outliers in files 1, 2, ..., N with the outliers in files N+1, N+2, ..., N+N.
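The convention for multiple files can be sketched as follows. This is a hedged Python illustration of how 2 * N arguments split into two machine groups, not the code of the analysis script itself; the function name `split_by_machine` is hypothetical.

```python
def split_by_machine(paths):
    """Split 2 * N file paths into the first N and the last N.

    Following the convention described above, the first half is taken to
    belong to one machine and the second half to another.
    """
    if not paths or len(paths) % 2 != 0:
        raise ValueError("expected a non-empty, even number of .csv files")
    n = len(paths) // 2
    return paths[:n], paths[n:]
```

With four files, for example, the first two form the group for one machine and the last two the group for the other.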
The analysis script creates a subfolder analysis_results, where it stores several auxiliary files generated during the analysis.
The paper Analyzing the Time x Energy Relation in C++ Solutions Mined from a Programming Contest Site, by Sérgio Queiroz de Medeiros, Marcelo Borges Nogueira and Gustavo Quezado, from the 27th Brazilian Symposium on Programming Languages (SBLP'2023), discusses a previous experiment based on the data available in this repository.
You can contact @Sérgio Medeiros and @Marcelo Nogueira about this repository.