Replication package for the paper Code Cloning in Smart Contracts on the Ethereum Platform: An Extended Replication Study.
This paper is an extended replication of the paper Code cloning in smart contracts: a case study on verified contracts from the Ethereum blockchain platform by M. Kondo, G. Oliva, Z.M. Jiang, A. Hassan, and O. Mizuno. For the replication package of the original study, please visit https://github.com/SAILResearch/suppmaterial-18-masanari-smart_contract_cloning. To obtain the corpus of 33,034 smart contracts, please contact the authors of the original study.
/01_data
  /clonedata - Results of the clone analysis by the NiCad extension developed for this study.
    /raw - Raw results from the analysis.
    /duplicates - Cleaned data.
    openzeppelin.zip - OpenZeppelin data. Requires unzipping into folder openzeppelin.
  /metadata - Metadata about the authors, creation date and transactions of the contracts in the corpus.
  /prepared - Prepared pickle files for data analysis.
/02_prepare - Scripts for preparing the data in /01_data/prepared. Contains potentially long-running scripts. In such cases, the approximate execution times are reported in the source files.
/03_analysis - Analysis scripts for the automated analysis of data.
/04_results - Results of the analyses, including charts and numeric results. Some of these results are discussed in the paper in great detail. Every analysis result corresponds to a particular observation in the paper, clearly identified in the name of the generated observation file.
The following describes four reproduction scenarios. Any of the scenarios can be executed independently from the others.
- Reproduction of the analyses: reproduces the analysis results in /04_results, including charts and numeric results. The scripts use the prepared data contained in the /01_data/prepared folder.
- Reproduction of the prepared data: reproduces the prepared data in /01_data/prepared by (i) merging author, transaction and file length metadata into the clone data; and (ii) pre-processing data for analysis and persisting the pre-processed data into pickle files. Some of the pre-processing steps are potentially time-consuming. In such cases, the approximate execution times are reported in the source file.
- Reproduction of the cleaned data: reproduces the cleaned data in /01_data/clonedata/duplicates from the raw data in /01_data/clonedata/raw by bringing the contents of the .xml files into a consolidated form.
- Reproduction of the raw data: reproduces the raw data in /01_data/clonedata/raw by running the NiCad extension developed for this study.
NOTE: The following steps have been tested with python>=3.7 && python<3.10.
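The tested version range from the note above can be checked before installing dependencies. The following is a minimal sketch; the helper name `is_tested_python` is illustrative and not part of the replication package.

```python
import sys

# Version bounds taken from the note above: python>=3.7 and python<3.10.
MIN_VERSION = (3, 7)
MAX_VERSION_EXCLUSIVE = (3, 10)


def is_tested_python(version_info=None):
    """Return True if the (major, minor) version lies in the tested range.

    Hypothetical helper for illustration; not part of the package.
    """
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    if not is_tested_python():
        print("Warning: this Python version is outside the tested range.")
```

Versions outside the range may still work; the check only reports whether the interpreter matches what was tested.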
Follow the steps below to reproduce the analysis results in /04_results, including charts and numeric results. The scripts use the prepared data contained in the /01_data/prepared folder.
- Clone this repository.
- Install dependencies by running pip install -r requirements.txt in the root folder.
- Extract /01_data/clonedata/openzeppelin.zip into folder /01_data/clonedata/openzeppelin, or run python 01_unzip.py in the 02_prepare folder.
- Run python analysis.py in the /03_analysis folder.
  - Run python analysis.py -o [observationId] to run the analysis of a specific observation.
  - Use the -s flag to stash the folder of the previous analyses.
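When the analysis is launched from another script, the invocations above can be assembled programmatically. This is a sketch only: `build_analysis_command` and `run_analysis` are hypothetical helpers, and combining the -o and -s flags in one call is an assumption that the README does not confirm.

```python
import subprocess
import sys


def build_analysis_command(observation_id=None, stash=False):
    """Build the argument list for analysis.py (hypothetical helper)."""
    command = [sys.executable, "analysis.py"]
    if observation_id is not None:
        # -o restricts the run to a single observation.
        command += ["-o", str(observation_id)]
    if stash:
        # -s stashes the folder of the previous analyses.
        # Assumption: -s may be combined with -o.
        command.append("-s")
    return command


def run_analysis(analysis_dir, observation_id=None, stash=False):
    """Run analysis.py inside `analysis_dir`; raises if the script fails."""
    subprocess.run(
        build_analysis_command(observation_id, stash),
        cwd=analysis_dir,
        check=True,
    )
```

For example, `run_analysis("03_analysis")` would correspond to running python analysis.py in the /03_analysis folder.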
Follow the steps below to reproduce the prepared data in /01_data/prepared by (i) merging author, transaction and file length metadata into the clone data; and (ii) pre-processing data for analysis and persisting the pre-processed data into pickle files. Some of the pre-processing steps are potentially time-consuming. In such cases, the approximate execution times are reported in the source file.
- Run python 03_mergeMetadata.py in the /02_prepare folder.
- Run python 04_prepareAnalysisData.py in the /02_prepare folder.
  - Run python 04_prepareAnalysisData.py -p [RQ or observation ID] to prepare data for a specific RQ or observation.
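The two preparation stages, merging metadata into the clone data and persisting the result as pickle files, follow a common pattern that can be sketched with the standard library. This is an illustration only: the record layout and the field names (`contract`, `author`, `transactions`) are hypothetical and do not reflect the actual schema used by 03_mergeMetadata.py and 04_prepareAnalysisData.py.

```python
import pickle


def merge_metadata(clone_records, metadata_by_contract):
    """Attach metadata fields to each clone record, keyed by contract.

    Records without a metadata entry are kept unchanged. On a key clash,
    the metadata value overrides the value in the clone record.
    """
    merged = []
    for record in clone_records:
        extra = metadata_by_contract.get(record["contract"], {})
        merged.append({**record, **extra})
    return merged


def persist(data, path):
    """Write the prepared data to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(data, handle)


def load(path):
    """Read prepared data back from a pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Persisting the pre-processed data means the time-consuming steps only need to run once; the analysis scripts can then load the pickle files directly.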
Some preparation steps can take hours to complete. The benchmarked execution times are given as comments in the source code.
Follow the steps below to reproduce the cleaned data in /01_data/clonedata/duplicates from the raw data in /01_data/clonedata/raw by bringing the contents of the .xml files into a consolidated form.
The cleaned data is used in the data preparation scripts. The cleaned data is included in this replication package in folder /01_data/clonedata/duplicates, but it can be reproduced from the raw data by following the steps below.
- Run python 02_cleanup.py in the /02_prepare folder.
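To illustrate what bringing .xml files into a consolidated form can look like, the sketch below flattens clone reports into one list of rows. It assumes a NiCad-style layout in which each `<clone>` element holds `<source>` children with `file`, `startline` and `endline` attributes; the actual structure of the raw files and the logic of 02_cleanup.py may differ, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_clone_report(xml_text):
    """Flatten one report into rows, one row per code fragment.

    Assumed layout: <clone similarity="..."> elements containing
    <source file="..." startline="..." endline="..."> children.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for clone_index, clone in enumerate(root.iter("clone")):
        for source in clone.iter("source"):
            rows.append({
                "clone_index": clone_index,
                "similarity": clone.get("similarity"),
                "file": source.get("file"),
                "startline": int(source.get("startline")),
                "endline": int(source.get("endline")),
            })
    return rows


def consolidate(raw_dir):
    """Collect the rows of every .xml file under `raw_dir` into one list."""
    rows = []
    for path in sorted(Path(raw_dir).rglob("*.xml")):
        for row in parse_clone_report(path.read_text(encoding="utf-8")):
            rows.append({"report": path.name, **row})
    return rows
```

Keeping the name of the originating report in each row makes it possible to trace a consolidated entry back to its raw file.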
Follow the steps below to reproduce the raw clone data in /01_data/clonedata/raw by running the NiCad extension developed for this study.
To obtain the corpus of 33,034 smart contracts, please contact the authors of the original study.
A Docker image is maintained on Docker Hub and can be obtained by running: docker pull faizank/nicad6:TSE.
The following process assumes that Docker is installed and working correctly, and that the image has been pulled. You can verify the image by running docker images from the terminal and checking that an image named faizank/nicad6 appears in the list.
NOTE: The following steps have been tested with docker_engine==20.10.17(build==100c701)
- Create a new folder /systems/source-code and move the corpus to this folder.
- Create a new folder /output to store the result of clone analysis.
- Execute the analysis by issuing the following command: docker run --platform linux/x86_64 -v output:/nicad6/01_data -v systems:/nicad6/systems faizank/nicad6. This will generate the output artefacts inside the output folder.
- Move the contents of the /output folder to /01_data and use the Python scripts discussed above for the rest of the replication.
Should you prefer to build the image from scratch, please refer to the repository of the NiCad extension developed for this study.
To experiment with the tool, issue docker run --platform linux/x86_64 -v output:/nicad6/01_data -v systems:/nicad6/systems -it faizank/nicad6 bash.
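The two docker run commands above can also be assembled from a script. Docker treats a bare name such as output in a -v option as a named volume rather than a host folder, so this sketch passes absolute paths, assuming that bind mounts of the two folders are intended. The helper `build_docker_command` is hypothetical. The image reference is kept as in the commands above; without a tag Docker resolves it to the latest tag, so if only the TSE tag was pulled, passing faizank/nicad6:TSE may be necessary.

```python
from pathlib import Path

# Image reference as used in the docker run commands above.
DEFAULT_IMAGE = "faizank/nicad6"


def build_docker_command(output_dir, systems_dir, interactive=False,
                         image=DEFAULT_IMAGE):
    """Build the docker run argument list (hypothetical helper).

    Host folders are resolved to absolute paths so that Docker mounts
    them as bind mounts instead of creating named volumes.
    """
    command = [
        "docker", "run", "--platform", "linux/x86_64",
        "-v", f"{Path(output_dir).resolve()}:/nicad6/01_data",
        "-v", f"{Path(systems_dir).resolve()}:/nicad6/systems",
    ]
    if interactive:
        # Mirrors the experimentation command: open a shell in the container.
        command += ["-it", image, "bash"]
    else:
        command.append(image)
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)` to start the container.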