CSM
Paul An edited this page Aug 22, 2022 · 1 revision
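- Why it is needed
  - Tracing back from a merge commit to its parents in the DAG is difficult.
  - Converting the DAG into STEMs lowers the complexity but does not reduce the number of STEMs.
- Approach
  - For simplicity, a merge commit and the commits it merged are combined into a single node.
  - Their messages are carried over to preserve context.
  - ! When a commit is the base parent of multiple CSMs, the leftmost commit is chosen as the reference.
  - Author, commit type, log message, etc. are collected and appended to the end of the fields.
  - For merged PRs, the PR's additional information (pull number, message, content) is included.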
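The collapsing step described above can be sketched as follows. This is a minimal sketch, not the project's implementation: it assumes commits are given as a dict keyed by SHA with `parents`, `author` and `message` fields, and `build_csm` is a hypothetical name. It treats the first-parent chain from `head` as the main line and folds every commit a merge brought in into that merge's node. A commit reachable from more than one merge stays with the first node that claims it (oldest merge first, parents left to right); how closely this matches the "leftmost" rule above is an assumption.

```python
from collections import deque


def build_csm(commits, head):
    """Collapse every merge commit and the commits it merged into one CSM node.

    commits: dict of SHA -> {"parents": [SHA, ...], "author": str, "message": str}
    head: SHA of the newest commit on the main line.
    """
    # Main line = first-parent chain from head, processed oldest first.
    mainline = []
    sha = head
    while sha is not None:
        mainline.append(sha)
        parents = commits[sha]["parents"]
        sha = parents[0] if parents else None
    mainline.reverse()

    claimed = set(mainline)  # commits already assigned to a node
    nodes = []
    for base in mainline:
        squashed = []
        # Non-first parents lead into merged branches; start from them, left to right.
        queue = deque(commits[base]["parents"][1:])
        while queue:
            sha = queue.popleft()
            if sha in claimed:
                continue  # on the main line or already folded into an earlier node
            claimed.add(sha)
            squashed.append(sha)
            queue.extend(commits[sha]["parents"])
        group = [base] + squashed
        nodes.append({
            "base": base,
            "source": squashed,
            # Keep authors and messages so the collapsed node retains its context.
            "authors": sorted({commits[s]["author"] for s in group}),
            "messages": [commits[s]["message"] for s in group],
        })
    return nodes
```

For a history A ← C ← M ← D where M merges a branch commit B (whose parent is A), `build_csm(commits, "D")` yields four nodes, and the node for M has `source == ["B"]`. Walking oldest first guarantees that everything already reachable from a merge's first parent has been claimed, so each node ends up with exactly the commits that merge introduced.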
Python
```python
import json

# repo_name is defined earlier in the original script (not shown).
with open("../log/" + repo_name + ".pulls_raw.json", "r", encoding="utf-8") as pulls_json:
    raw_pulls = json.load(pulls_json)

pulls_compact_data = []
for item in raw_pulls:
    # Keep only the fields CSM needs from each raw GitHub pull-request object.
    pulls_compact_data.append({
        "number": item["number"],
        "state": item["state"],
        "title": item["title"],
        "body": item["body"],
        # title alone, or title + body when the PR has a description
        "message": item["title"] if item["body"] is None else item["title"] + " " + item["body"],
        "merge_commit_sha": item["merge_commit_sha"],
        "head": {"sha": item["head"]["sha"]},
        "base": {"sha": item["base"]["sha"]},
        "commitsLink": item["_links"]["commits"]["href"],
        "merged": item["merged"],
    })

with open("../log/" + repo_name + ".pulls_compress.json", "w", encoding="utf-8") as info_json:
    json.dump(pulls_compact_data, info_json, indent="\t")
```
JS (porting sample)
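```js
// pull_requests: array of raw GitHub pull-request objects (as in pulls_raw.json)
return pull_requests.map(
  ({ number, state, title, body, merge_commit_sha, head, base, _links: { commits: { href } }, merged }) => ({
    number,
    state,
    title,
    body,
    // title alone, or title + body when the PR has a description
    message: body ? `${title} ${body}` : title,
    merge_commit_sha,
    head: { sha: head.sha },
    base: { sha: base.sha },
    commitsLink: href,
    merged,
  }),
);
```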
Remark
During development, decide whether to keep the name commitsLink as-is (it can be renamed if a better word comes up).
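The compact records carry `merge_commit_sha`, which is one way to attach merged-PR details (pull number, message, content) to CSM nodes as the approach list describes. A minimal sketch, assuming each CSM node is a dict whose `base` field holds the merge commit's SHA (a hypothetical shape) and that the PR "content" corresponds to `body`; `attach_pull_info` is a hypothetical name. Merged PRs are indexed by `merge_commit_sha`, and each node whose base matches gets a `pull` entry.

```python
def attach_pull_info(csm_nodes, compact_pulls):
    """Attach merged-PR details to the CSM node whose base is the PR's merge commit.

    csm_nodes: list of dicts with a "base" commit SHA (hypothetical shape).
    compact_pulls: records in the pulls_compress.json shape.
    """
    # Only merged PRs have a merge commit in the history being summarized.
    pulls_by_merge_sha = {
        pull["merge_commit_sha"]: pull for pull in compact_pulls if pull["merged"]
    }
    for node in csm_nodes:
        pull = pulls_by_merge_sha.get(node["base"])
        if pull is not None:
            node["pull"] = {
                "number": pull["number"],
                "message": pull["message"],
                "body": pull["body"],
            }
    return csm_nodes
```

Building the dictionary once keeps the lookup linear in the number of nodes plus PRs instead of scanning every PR for every node.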
Python
```python
import json
import re
import time

import requests

# repo_name and retreive_rate() are defined elsewhere in the original script (not shown);
# retreive_rate() is assumed here to take the bare token.
with open("./token.txt", "r") as token_file:
    # strip() drops the trailing newline kept by readline()
    access_token = token_file.readline().strip()

# GitHub no longer accepts tokens as an ?access_token= query parameter; send it as a header.
headers = {"Authorization": "token " + access_token}

# Issue/PR references in commit messages look like "#35".
issue_reg = re.compile(r"#\d+")


def add_issue():
    origin_file_name = "../log/" + repo_name + ".nlp.json"
    with open(origin_file_name, encoding="utf-8") as origin_commit_file:
        origin_commits = json.load(origin_commit_file)
    for commit in origin_commits:
        # "#35" -> "35"
        commit["issues"] = [issue[1:] for issue in issue_reg.findall(commit["message"])]
    return origin_commits


def add_pull(origin_commits):
    origin_pull_file_name = "../log/" + repo_name + ".pulls_compress.json"
    final_file_name = "../log/" + repo_name + ".nlp.withissue.json"
    # commit SHA -> index in origin_commits
    sha2Index = {commit["id"]: idx for idx, commit in enumerate(origin_commits)}
    with open(origin_pull_file_name, encoding="utf-8") as pull_info_file:
        pulls_info = json.load(pull_info_file)
    print("Total pull #: " + str(len(pulls_info)))
    for idx, pull in enumerate(pulls_info):
        link = pull["commitsLink"]
        r = requests.get(link, headers=headers)
        if not r.ok:
            # Assume the failure is the API rate limit: poll every 3 minutes until
            # more than 2000 calls remain, then retry this pull once.
            while True:
                print("Wait until the api rate restores...[3 minutes]")
                time.sleep(180)
                remaining_rate = retreive_rate(access_token)
                print("Remaining API Rate: " + str(remaining_rate) + " times")
                if remaining_rate > 2000:
                    break
            r = requests.get(link, headers=headers)
        if r.ok:
            for commit_info in r.json():  # list of commits in this PR
                index = sha2Index.get(commit_info["sha"])
                if index is None:
                    continue  # commit is not in the local log
                origin_commits[index].setdefault("pulls", []).append(int(pull["number"]))
        if idx % 10 == 0:
            print("Pull #" + str(idx) + " handled")
    with open(final_file_name, "w", encoding="utf-8") as final_file:
        json.dump(origin_commits, final_file, indent=4, separators=(",", ": "))


add_pull(add_issue())
```
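The rate-limit loop calls `retreive_rate`, which this page does not show. A hypothetical version, assuming it receives the bare token string and should return the number of remaining core REST API calls, can read GitHub's `GET /rate_limit` endpoint; calling that endpoint does not itself count against the limit. This sketch uses only the standard library, and `remaining_core_rate` is a hypothetical helper split out so the parsing can be checked without network access.

```python
import json
import urllib.request

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def remaining_core_rate(payload):
    """Return the remaining core REST API calls from a /rate_limit response body."""
    return payload["resources"]["core"]["remaining"]


def retreive_rate(access_token):
    """Hypothetical stand-in for the script's helper: query GitHub's rate-limit status."""
    request = urllib.request.Request(
        RATE_LIMIT_URL,
        headers={
            "Authorization": "token " + access_token,
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        # json.load accepts the bytes stream returned by urlopen
        return remaining_core_rate(json.load(response))
```

The function keeps the original's spelling (`retreive_rate`) so it could drop into the script without renaming the call site.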
JS (porting sample)
```js
import axios from "axios";

// Issue/PR references in commit messages look like "#35".
const ISSUE_REGEX = /#\d+/g;

const add_issue = (origin_commits) => {
  for (const commit of origin_commits) {
    const { message } = commit; // assumed to be a string, as in the Python code
    // "#35" -> "35"; match() returns null when there is no reference
    commit.issues = (message.match(ISSUE_REGEX) ?? []).map((issue) => issue.slice(1));
  }
  return origin_commits;
};

// pulls_compression: records in the pulls_compress.json shape
const add_pull = async (origin_commits, pulls_compression) => {
  // commit SHA -> index in origin_commits
  const sha2Index = new Map(origin_commits.map((commit, index) => [commit.id, index]));
  for (const pullRequest of pulls_compression) {
    let response;
    try {
      // axios rejects on non-2xx responses; response.data is the parsed JSON body
      response = await axios.get(pullRequest.commitsLink); // TODO : request as octokit
    } catch (error) {
      // TODO : retry as octokit
      continue;
    }
    for (const info of response.data) {
      const index = sha2Index.get(info.sha);
      if (index === undefined) continue; // commit is not in the local log
      const commit = origin_commits[index];
      // create the list on first use, then append the PR number
      (commit.pulls ??= []).push(Number(pullRequest.number));
    }
  }
  return origin_commits;
};
```
Remark
The regex is used to find commit messages that reference a PR, such as #35.
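Points to verify during development:
- The type of message (likely Array | Object; note that the Python code runs a regex findall over it, which requires a string)
- The slice(1) call (it drops the leading #, mirroring issue[1:] in the Python code)
- The response type of the axios call
- Handling the cases the Python code currently covers with exception handling (the sha2Index lookup for commits missing from the local log)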
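On the slice(1) question: a regex with one capture group returns only the digits, so no slicing is needed. A minimal sketch in Python, where `re.findall` with exactly one group returns the group's text for each match; `extract_issue_numbers` is a hypothetical name. In the JS port, `String.prototype.matchAll` with the same capture group gives the equivalent result through each match's index 1.

```python
import re

# One capture group: re.findall returns just the digits, so no slice(1) is needed.
ISSUE_NUMBER = re.compile(r"#(\d+)")


def extract_issue_numbers(message):
    """Return the numbers referenced as "#<digits>" in a commit message, as strings."""
    return ISSUE_NUMBER.findall(message)
```

The numbers stay strings, matching what the existing code stores in `issues`; convert with `int()` if numeric comparison is needed later.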
Squash Merge