Skip to content

Commit

Permalink
vjtop 更好的失败原因输出 #78
Browse files Browse the repository at this point in the history
顺便包含quit时更好的输出
  • Loading branch information
calvin1978 committed Aug 10, 2018
1 parent 19e6b3b commit 9fb38df
Show file tree
Hide file tree
Showing 7 changed files with 91 additions and 93 deletions.
51 changes: 37 additions & 14 deletions vjtop/README.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,12 @@
# 1. 概述

若你习惯以Top观察“OS指标及繁忙的进程”,也推荐以VJTop观看 “JVM指标及CPU最繁忙,内存占用最多的线程”。
若你习惯以Top观察“OS指标及繁忙的进程”,也推荐以VJTop观看 “JVM进程指标 及 CPU最繁忙,内存占用最多的线程”。

另一用途是,监控系统发现指标超阈值时,钩子脚本调VJTop纪录事发地状况。
JVM进程信息,一次拉取了JVM在操作系统层面和JVM层面的所有重要指标。

繁忙线程信息, 对比于“先top -H 列出线程,再执行一次jstack拿到全部线程,再手工换算threadId”的繁琐过程,既方便,又可以连续跟踪,更不会因为jstack造成JVM停顿。

因此,VJTop也常用于监控系统发现指标超阈值时,用钩子脚本调VJTop来纪录事发地的状况。

[jvmtop](https://github.com/patric-r/jvmtop) 的基础上二次开发,结合 [SJK](https://github.com/aragozin/jvm-tools)的优点,从/proc , PerfData,JMX等处,以更高的性能,获取更多的信息。

Expand All @@ -16,11 +20,13 @@

[Maven Central 下载](http://repo1.maven.org/maven2/com/vip/vjtools/vjtop/1.0.2/vjtop-1.0.2.zip)

必须与目标JVM使用相同的JDK版本运行,必须与目标JVM使用相同用户运行,或root用户执行 (sudo -E vjmap.sh ...,)
必须与目标JVM使用相同的JDK版本运行,必须与目标JVM使用相同用户运行,或root权限用户sudo执行 (sudo -E vjtop.sh ...,)

如果仍有问题,请看最后的执行问题章节。

```
// 占用CPU最多的线程
./vjtop.sh <PID>
./vjtop.sh <PID>
```

## 2.2 原理:
Expand Down Expand Up @@ -65,16 +71,13 @@
### 2.3.2 输出示例:

```
VJTop 1.0.0 - 11:38:02, UPTIME: 3d01h
PID: 127197, JVM: 1.7.0_79, USER: even.liang
PID: 57789 - 15:37:07, JVM: 1.8.0_144, USER: calvin, UPTIME: 01h07m
PROCESS: 0.99% cpu ( 0.04% of 24 core), 2491m rss, 0m swap
IO: 24k rchar, 1k wchar, 0 read_bytes, 0 write_bytes
THREAD: 97 active, 89 daemon, 99 peak, 461 created, CLASS: 12243 loaded, 0 unloaded
HEAP: 160m/819m eden, 0m/102m sur, 43m/1024m old
NON-HEAP: 55m/256m cms perm gen, 8m/96m codeCache
OFF-HEAP: 0m/0m direct, 0m/0m map
NON-HEAP: 55m/256m cms perm gen, 8m/96m codeCache, 0m/0m direct, 0m/0m map
GC: 0/0ms ygc, 0/0ms fgc, SAFE-POINT: 6 count, 1ms time, 1ms syncTime
THREADS-CPU: 1.01% (user= 0.31%, sys= 0.70%)
TID NAME STATE CPU SYSCPU TOTAL TOLSYS
43 metrics-mercury-metric-logger-1-thread-1 TIMED_WAIT 0.38% 0.28% 25.48% 9.13%
Expand All @@ -88,7 +91,7 @@
70 Proxy-Worker-5-5 RUNNABLE 0.00% 0.00% 0.78% 0.26%
102 Proxy-Worker-5-20 RUNNABLE 0.00% 0.00% 0.80% 0.27%
Note: Only top 10 threads (according cpu load) are shown!
Total cpu: 1.01% (user= 0.31%, sys= 0.70%), top 10 threads are shown, order by CPU
Cost time: 46ms, CPU time: 60ms
```
进程区数据解释:
Expand Down Expand Up @@ -210,9 +213,29 @@
./vjtop.sh -n 20 <PID>
```

# 3. 改进点
# 3. 执行问题排查

首先,运行vjtop的JDK,与目标JDK的版本必须一致

其次,vjtop 使用JVM attach机制 连入PID 并获得JMX的本地连接地址,如果出现如下出错,可能的原因有

```
ERROR: Could not attach to process.
```


1. 执行vjtop的用户,对/tmp/.java_pid$PID 文件有读写权限,该文件权限为srw------- 1,所以需要相同用户或root权限用户 sudo执行。

2. /tmp/.java_pid$PID 文件再首次连接时会生成,但如果生成之后被/tmp 目录的清理程序错误删除,JVM将不再能连入,只能重启应用。

3. 目标JVM使用启动参数-Djava.io.tmpdir,重定向了tmp目录路径

4. 目标JVM使用启动参数-XX:+DisableAttachMechanism禁止了attach


# 4. 改进点

### 3.1 热点线程页:
### 4.1 热点线程页:

* 新功能:线程内存分配速度的展示与排序 (from SJK)
* 新功能:线程SYS CPU的展示与排序,应用启动以来线程的总CPU间的排序 (from SJK)
Expand All @@ -221,12 +244,12 @@
* 新配置项:打印间隔,展示线程数
* 性能优化:减少了几倍的耗时,通过批量获取线程CPU时间(from SJK)等方法

### 3.2 实时交互(since 1.0.1)
### 4.2 实时交互(since 1.0.1)

* 新功能: 选择打印某条线程的stack trace
* 新功能: 实时切换显示模式和排序,刷新频率和显示线程数

### 3.3 为在生产环境运行优化:
### 4.3 为在生产环境运行优化:

* 删除jvmtop会造成应用停顿的Profile页面
* 删除jvmtop获取所有Java进程信息,有着不确定性的Overview页面
Expand Down
12 changes: 8 additions & 4 deletions vjtop/src/main/java/com/vip/vjtools/vjtop/InteractiveTask.java
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,9 @@ public void run() {
try {
String command = readLine();
handleCommand(command);
tty.print(" Input command (h for help):");
if (!app.view.shouldExit()) {
tty.print(" Input command (h for help):");
}
} catch (Exception e) {
e.printStackTrace(tty);
}
Expand Down Expand Up @@ -100,11 +102,13 @@ private void changeDisplayMode() {
" Input number of Display Mode(1.cpu, 2.syscpu 3.total cpu 4.total syscpu 5.memory 6.total memory): ");
String mode = readLine();
DetailMode detailMode = DetailMode.parse(mode);
if (detailMode != null && detailMode != app.view.mode) {
if (detailMode == null) {
tty.println(" Wrong option for display mode(1-6)");
} else if (detailMode == app.view.mode) {
tty.println(" Nothing be changed");
} else {
app.view.mode = detailMode;
tty.println(" Display mode changed to " + app.view.mode + " for next flush");
} else {
tty.println(" Nothing be changed");
}
app.continueFlush();
}
Expand Down
28 changes: 18 additions & 10 deletions vjtop/src/main/java/com/vip/vjtools/vjtop/VJTop.java
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@
import java.io.PrintStream;
import java.util.Arrays;

import com.vip.vjtools.vjtop.VMInfo.VMInfoState;

import joptsimple.OptionParser;
import joptsimple.OptionSet;

Expand Down Expand Up @@ -86,6 +88,11 @@ public static void main(String[] args) {
}

VMInfo vminfo = VMInfo.processNewVM(pid);
if (vminfo.state != VMInfoState.ATTACHED) {
System.out.println("\nERROR: Could not attach to process, please find reason in README\n");
return;
}

VMDetailView view = new VMDetailView(vminfo, displayMode, width);

if (optionSet.hasArgument("limit")) {
Expand Down Expand Up @@ -120,13 +127,16 @@ public static void main(String[] args) {
// 5. run views
app.run(view);
} catch (Exception e) {
e.printStackTrace(System.err);
e.printStackTrace(System.out);
System.out.flush();
}
}

private void run(VMDetailView view) throws Exception {
try {
// System.out 设为Buffered,需要使用System.out.flush刷新
System.setOut(new PrintStream(new BufferedOutputStream(new FileOutputStream(FileDescriptor.out)), false));

int iterations = 0;
while (!view.shouldExit()) {

Expand All @@ -143,20 +153,19 @@ private void run(VMDetailView view) throws Exception {
break;
}

// 第一次只等待2秒
int sleepTime = iterations == 0 ? 2 : interval;
// 第一次只等待1秒
int sleepTime = iterations == 0 ? 1 : interval;

++iterations;

Utils.sleep(sleepTime * 1000);
}
} catch (NoClassDefFoundError e) {
e.printStackTrace(System.err);

System.err.println("");
System.err.println("ERROR: Some JDK classes cannot be found.");
System.err.println(" Please check if the JAVA_HOME environment variable has been set to a JDK path.");
System.err.println("");
e.printStackTrace(System.out);
System.out.println("ERROR: Some JDK classes cannot be found.");
System.out.println(" Please check if the JAVA_HOME environment variable has been set to a JDK path.");
System.out.println("");
System.out.flush();
}
}

Expand Down Expand Up @@ -218,7 +227,6 @@ private static void clearTerminal() {
public void exit() {
view.exit();
mainThread.interrupt();
System.err.println(" Quit.");
}

public void preventFlush() {
Expand Down
27 changes: 13 additions & 14 deletions vjtop/src/main/java/com/vip/vjtools/vjtop/VMDetailView.java
Original file line number Diff line number Diff line change
Expand Up @@ -71,44 +71,45 @@ public void printView() throws Exception {
/**
* 打印单条线程的stack strace
*/
public void printStack(long tid) throws NumberFormatException, IOException {
public void printStack(long tid) throws IOException {
ThreadInfo info = vmInfo.getThreadMXBean().getThreadInfo(tid, 20);
if (info == null) {
System.err.println(" TID not exist:" + tid);
return;
}
StackTraceElement[] trace = info.getStackTrace();
System.err.println(" " + info.getThreadId() + ":" + info.getThreadName());
System.out.println(" " + info.getThreadId() + ":" + info.getThreadName());
for (StackTraceElement traceElement : trace) {
System.err.println("\tat " + traceElement);
System.out.println("\tat " + traceElement);
}
System.out.flush();
}

public void printAllThreads() throws Exception {
public void printAllThreads() throws IOException {
long tids[] = vmInfo.getThreadMXBean().getAllThreadIds();
ThreadInfo[] threadInfos = vmInfo.getThreadMXBean().getThreadInfo(tids);
for (ThreadInfo info : threadInfos) {
System.err.println(" " + info.getThreadId() + "\t:" + info.getThreadName());
System.out.println(" " + info.getThreadId() + "\t:" + info.getThreadName());
}
System.out.flush();
}

private boolean checkState() {
if (vmInfo.state == VMInfoState.ATTACHED_UPDATE_ERROR) {
System.err.println("ERROR: Could not fetch telemetries - Process terminated?");
exit();
System.out.println("ERROR: Could not fetch data - Process terminated?");
return false;
}

if (vmInfo.state != VMInfoState.ATTACHED) {
System.err.println("ERROR: Could not attach to process.");
System.out.println("ERROR: Could not attach to process. ");
exit();
return false;
}

return true;
}

private void printJvmInfo() throws Exception {
private void printJvmInfo() {
System.out.printf(" PID: %s - %8tT, JVM: %s, USER: %s, UPTIME: %s%n", vmInfo.pid, new Date(), vmInfo.jvmVersion,
vmInfo.osUser, Utils.toTimeUnit(vmInfo.lastUpTimeMills));

Expand Down Expand Up @@ -153,7 +154,7 @@ private void printJvmInfo() throws Exception {
}
}

private void printTopCpuThreads(DetailMode mode) throws Exception {
private void printTopCpuThreads(DetailMode mode) throws IOException {
if (!vmInfo.threadCpuTimeSupported) {
System.out.printf("%n -Thread CPU telemetries are not available on the monitored jvm/platform-%n");
return;
Expand Down Expand Up @@ -272,10 +273,10 @@ private void printTopCpuThreads(DetailMode mode) throws Exception {
lastThreadSysCpuTotalTimes = threadSysCpuTotalTimes;
}

private void printTopMemoryThreads(DetailMode mode) throws Exception {
private void printTopMemoryThreads(DetailMode mode) throws IOException {

if (!vmInfo.threadMemoryAllocatedSupported) {
System.err.printf(
System.out.printf(
"%n -Thread Memory Allocated telemetries are not available on the monitored jvm/platform-%n");
return;
}
Expand Down Expand Up @@ -343,7 +344,6 @@ private void printTopMemoryThreads(DetailMode mode) throws Exception {
getThreadMemoryUtilization(threadMemoryTotalBytesMap.get(tid), totalBytes));
}


// 打印线程汇总信息,这里因为最后单位是精确到秒,所以bytes除以毫秒以后要乘以1000才是按秒统计
System.out.printf("%n Total memory allocate rate : %5s/s",
Utils.toSizeUnit((totalDeltaBytes * 1000) / vmInfo.deltaUptimeMills));
Expand Down Expand Up @@ -431,7 +431,6 @@ public static DetailMode parse(String mode) {
case "6":
return totalmemory;
default:
System.err.println(" Wrong option for display mode(1-6)");
return null;
}
}
Expand Down
34 changes: 6 additions & 28 deletions vjtop/src/main/java/com/vip/vjtools/vjtop/VMInfo.java
Original file line number Diff line number Diff line change
Expand Up @@ -3,14 +3,10 @@
import java.io.IOException;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.rmi.ConnectException;
import java.util.Locale;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

import com.sun.management.ThreadMXBean;
import com.sun.tools.attach.AttachNotSupportedException;
import com.vip.vjtools.vjtop.data.PerfData;
import com.vip.vjtools.vjtop.data.ProcFileData;
import com.vip.vjtools.vjtop.data.jmx.JmxClient;
Expand Down Expand Up @@ -128,11 +124,6 @@ public static VMInfo processNewVM(String pid) {
final JmxClient jmxClient = new JmxClient(pid);
jmxClient.connect();

if (!jmxClient.isConnected()) {
Logger.getLogger("vjtop").log(Level.SEVERE, "connection refused (PID=" + pid + ")");
return createDeadVM(pid, VMInfoState.ERROR_DURING_ATTACH);
}

// 注册JMXClient注销的钩子
Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
@Override
Expand All @@ -142,21 +133,8 @@ public void run() {
}));

return new VMInfo(jmxClient, pid);
} catch (ConnectException rmiE) {
if (rmiE.getMessage().contains("refused")) {
Logger.getLogger("vjtop").log(Level.SEVERE, "connection refused (PID=" + pid + ")", rmiE);
return createDeadVM(pid, VMInfoState.CONNECTION_REFUSED);
}
rmiE.printStackTrace(System.err);
} catch (IOException e) {
if ((e.getCause() != null && e.getCause() instanceof AttachNotSupportedException)
|| e.getMessage().contains("Permission denied")) {
Logger.getLogger("vjtop").log(Level.SEVERE, "could not attach (PID=" + pid + ")", e);
return createDeadVM(pid, VMInfoState.CONNECTION_REFUSED);
}
e.printStackTrace(System.err);
} catch (Exception e) {
Logger.getLogger("vjtop").log(Level.SEVERE, "could not attach (PID=" + pid + ")", e);
e.printStackTrace(System.out);
}

return createDeadVM(pid, VMInfoState.ERROR_DURING_ATTACH);
Expand Down Expand Up @@ -204,8 +182,7 @@ private void init() throws IOException {
* Updates all jvm metrics to the most recent remote values
*/
public void update() throws Exception {
if (state == VMInfoState.ERROR_DURING_ATTACH || state == VMInfoState.DETACHED
|| state == VMInfoState.CONNECTION_REFUSED) {
if (state == VMInfoState.ERROR_DURING_ATTACH || state == VMInfoState.DETACHED) {
return;
}

Expand Down Expand Up @@ -394,9 +371,10 @@ public ThreadMXBean getThreadMXBean() throws IOException {
}

private void handleFetchDataError(Throwable e) {
Logger.getLogger("vjtop").log(Level.INFO, "error during update", e);
e.printStackTrace(System.out);
updateErrorCount++;
if (updateErrorCount > 10) {

if (updateErrorCount > 3) {
state = VMInfoState.DETACHED;
} else {
state = VMInfoState.ATTACHED_UPDATE_ERROR;
Expand All @@ -423,6 +401,6 @@ private int getJavaMajorVersion() {
}

public enum VMInfoState {
INIT, ERROR_DURING_ATTACH, ATTACHED, ATTACHED_UPDATE_ERROR, DETACHED, CONNECTION_REFUSED
INIT, ERROR_DURING_ATTACH, ATTACHED, ATTACHED_UPDATE_ERROR, DETACHED
}
}
Loading

0 comments on commit 9fb38df

Please sign in to comment.