Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promtail fails to read journal silently #3779

Closed
jfolz opened this issue May 30, 2021 · 34 comments
Closed

Promtail fails to read journal silently #3779

jfolz opened this issue May 30, 2021 · 34 comments
Labels
keepalive An issue or PR that will be kept alive and never marked as stale.

Comments

@jfolz
Copy link
Contributor

jfolz commented May 30, 2021

Describe the bug
I found that one machine was sending logs from /var/logs/*.log, but nothing was coming in from its journal.
journalctl --since today spits lots of messages, but promtail is silent.
Meanwhile on another machine running the same OS, the same kernel, ... it works as expected.
I'm having a hard time figuring out what could possibly be different about these machine that it wouldn't work on one of them.

To Reproduce
Use the official promtail 2.2.1 binary downloaded from here and this config:

server:
  http_listen_port: 9081
  grpc_listen_port: 0

positions:
  filename: positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: journal
    journal:
      json: false
      max_age: 24h
      path: /var/log/journal
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'

Run promtail in dry-run mode. It starts without error but no logs are printed:

# ./promtail-linux-amd64 --dry-run --config.file=promtail-journal-config.yml
Clients configured:
----------------------
url: http://localhost:3100/loki/api/v1/push
batchwait: 1s
batchsize: 1048576
backoff_config:
  min_period: 500ms
  max_period: 5m0s
  max_retries: 10
timeout: 10s
tenant_id: ""

level=info ts=2021-05-30T19:16:00.188961827Z caller=server.go:229 http=[::]:9081 grpc=[::]:42151 msg="server listening on addresses"
level=info ts=2021-05-30T19:16:00.189351455Z caller=main.go:112 msg="Starting Promtail" version="(version=2.2.1, branch=HEAD, revision=babea82e)"

Expected behavior
Logs start spewing out like crazy ;)

Environment:

  • Infrastructure: bare-metal
  • Deployment tool: just testing in an ssh session

If it's of any help the affected machine is running Ubuntu 18.04 (NVIDIA DGX-1 Version 4.7.0) and all default for journald.conf.
Some more details:

# uname -a
Linux [unhappy machine] 5.4.0-65-generic #73~18.04.1-Ubuntu SMP Tue Jan 19 09:02:24 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
# systemctl --version
systemd 237
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid
@dannykopping
Copy link
Contributor

Hey @jfolz

Could you please try running promtail with -log.level=debug? There may be some useful diagnostic messages in there.
At the same time as running promtail, please run journalctl -f and ensure that new messages are being appended to the journal.

@jfolz
Copy link
Contributor Author

jfolz commented May 31, 2021

@dannykopping thanks for the hint. Unfortunately promtail's output seems identical to me. Meanwhile messages are indeed popping up in journalctl -f.

# ./promtail-linux-amd64 --dry-run -log.level=debug --config.file=promtail-journal-config.yml
Clients configured:
----------------------
url: http://localhost:3100/loki/api/v1/push
batchwait: 1s
batchsize: 1048576
backoff_config:
  min_period: 500ms
  max_period: 5m0s
  max_retries: 10
timeout: 10s
tenant_id: ""

level=info ts=2021-05-31T07:14:40.861114122Z caller=server.go:229 http=[::]:9081 grpc=[::]:37731 msg="server listening on addresses"
level=info ts=2021-05-31T07:14:40.861385257Z caller=main.go:112 msg="Starting Promtail" version="(version=2.2.1, branch=HEAD, revision=babea82e)"

@dannykopping
Copy link
Contributor

OK, that's a pity.

Next thing to try:
While promtail is running, please get its PID and then run:

sudo lsof -p <pid> |& grep /var/log/journal
# or sudo lsof -p <pid> 2>&1 | grep /var/log/journal if |& doesn't work in your shell

This will show if the promtail process has any file handles to the journal. If not, then something is not working as expected.

@jfolz
Copy link
Contributor Author

jfolz commented May 31, 2021

No open files :(

# ps aux | grep promtail
root     20073  0.3  0.0 2962604 55604 pts/0   Sl+  09:23   0:05 ./promtail-linux-amd64 -log.level=debug -dry-run -config.file=promtail-journal-config.yml
# lsof -p 20073 |& grep /var/log/journal
#

Just to be sure system.journal does indeed exist and is being written to ;)
And I am logged in as root obviously so promtail should be allowed to read.

# ls -lhtr /var/log/journal/[machine id]/system*
-rw-r-----+ 1 root systemd-journal 40M May 31 09:46 /var/log/journal/[machine id]/system.journal

@jfolz
Copy link
Contributor Author

jfolz commented May 31, 2021

Here's what the web interface says about the journal target:

Key Value
position

The other machine where it's working shows this:

Key Value
position s=f1314626db71485aa536fe4c994a679a;i=a16b7d;b=905c04389fee4ffe80ab4390164e7087;m=7c49e37d29d;t=5c39ca5804138;x=5419077c89f77291

@dannykopping
Copy link
Contributor

Does your positions file exist, and can promtail write to it?

@jfolz
Copy link
Contributor Author

jfolz commented May 31, 2021

As per config shown above it should put it in the current directory, but there is none. Changing to an absolute path has no effect, as does changing it to the example path /tmp/positions.yaml. The file is never created.

The promtail process running as a service does write to its configured positions.yaml, but there are only references to /var/log/*.log in it.

@dannykopping
Copy link
Contributor

dannykopping commented May 31, 2021

Please provide a little more info on how you're running promtail...
Are you running it as a systemd unit?

@jfolz
Copy link
Contributor Author

jfolz commented May 31, 2021

Apart from the setup given in the first post that I used for manual testing (starting the promtail binary from the command line), there's another promtail running as a systemd unit, which is happily pushing everything to Loki as configured - apart from journal logs. I just stopped this service in case it somehow interfered with my manual testing, but there was no difference.

Config:

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/lib/promtail/positions.yaml

clients:
  - url: http://[loki machine]:3100/loki/api/v1/push

scrape_configs:
  - job_name: slurm
    static_configs:
      - labels:
          job: slurmlogs
          __path__: /var/log/slurm/*log
    pipeline_stages:
      - match:
          selector: '{job="slurmlogs"}'
          stages:
            - regex:
                expression: '\[(?P<timestamp>[T\d-:.Z]+)\] (?P<message>.+)'
            - timestamp:
                source: 'timestamp'
                format: '2006-01-02T15:04:05.999'
                location: 'Europe/Berlin'
            - output:
                source: 'message'
  - job_name: system
    static_configs:
    - labels:
       job: varlogs
       __path__: /var/log/*.log
    pipeline_stages:
    - match:
        selector: '{job="varlogs"}'
        stages:
        - regex:
            expression: '(?P<prefix>[^A-Za-z]*)(?P<timestamp>[A-Z][a-z]+[ ]?\d{2} \d{2}:\d{2}:\d{2}|\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.?\d{0,3}) (?P<message>.+)'
        - timestamp:
            source: 'timestamp'
            format: 'Jan 2 15:04:05'
            fallback_formats: ['Jan2 15:04:05', '2006-01-02 15:04:05.999', '2006-01-02T15:04:05.999']
            location: 'Europe/Berlin'
        - output:
            source: 'message'
  - job_name: journal
    journal:
      json: false
      max_age: 24h
      path: /var/log/journal
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'

Unit file:

[Unit]
Description=Promtail log tailing service
After=network.target

[Service]
Type=simple
StartLimitBurst=5
StartLimitInterval=0
Restart=on-failure
RestartSec=1
User=root
Group=root
PIDFile=/run/promtail/promtail.pid
ExecStart=/usr/local/bin/promtail-linux-amd64 -config.file /etc/loki/promtail-config.yml --client.external-labels="host=%H"
ExecStop=/bin/kill -s SIGTERM $MAINPID

[Install]
WantedBy=multi-user.target

@dannykopping
Copy link
Contributor

OK, thanks for that detail.

I'm guessing there's some permission issue here that's being swallowed up.
Please prefix your promtail command with strace -f -y -o /tmp/debug and grep /tmp/debug for your positions file path and shoot those back to me.

@jfolz
Copy link
Contributor Author

jfolz commented May 31, 2021

I think this will be much easier if I just provide the trace ;)
Edit: trace removed

There is only one mention of positions.yaml around here:

34703 <... futex resumed> )             = 0
34707 openat(AT_FDCWD, "/tmp/positions.yaml", O_RDONLY|O_CLOEXEC <unfinished ...>
34703 futex(0xc000133d48, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
34707 <... openat resumed> )            = -1 ENOENT (No such file or directory)
34703 <... futex resumed> )             = 1
34711 <... futex resumed> )             = 0
34703 nanosleep({tv_sec=0, tv_nsec=3000},  <unfinished ...>
34707 futex(0x7f3b62c580c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
34711 futex(0xc000133d48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
34703 <... nanosleep resumed> NULL)     = 0

It also does appear to repeatedly open the journal files, but closes them immediately after.

@dannykopping
Copy link
Contributor

I thought you might want to keep the rest private as strace can leak some system details, but thanks for sending 👍

I assume it's trying to recreate the file each time? Seeing -1 ENOENT (No such file or directory) is not what I'd expect since that positions file should persist between runs. Try touch the file beforehand or use /var/run or something other than /tmp?

@dannykopping
Copy link
Contributor

Looking at the source, I've got another hunch:
Try adding -positions.ignore-invalid-yaml=true to your command and also keep -log.level=debug in place

@jfolz
Copy link
Contributor Author

jfolz commented May 31, 2021

Oh well, I did a quick search and couldn't find anything too spicy, but I may remove it later ;)

After touch /var/run/positions.yaml:

38199 futex(0xc000132d48, FUTEX_WAKE_PRIVATE, 1) = 1
38200 <... futex resumed> )             = 0
38199 openat(AT_FDCWD, "/var/run/positions.yaml", O_RDONLY|O_CLOEXEC <unfinished ...>
38200 futex(0x444ce68, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
38199 <... openat resumed> )            = 7</run/positions.yaml>
38200 <... futex resumed> )             = 1
38199 epoll_ctl(4<anon_inode:[eventpoll]>, EPOLL_CTL_ADD, 7</run/positions.yaml>, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1679767128, u64=140142167668312}} <unfinished ...>
38197 <... futex resumed> )             = 0
38200 futex(0xc000132d48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
38199 <... epoll_ctl resumed> )         = -1 EPERM (Operation not permitted)
38197 futex(0x444ce68, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
38199 epoll_ctl(4<anon_inode:[eventpoll]>, EPOLL_CTL_DEL, 7</run/positions.yaml>, 0xc000c9e3dc) = -1 EPERM (Operation not permitted)
38199 fstat(7</run/positions.yaml>, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
38199 read(7</run/positions.yaml>, "", 512) = 0
38199 close(7</run/positions.yaml>)     = 0
38199 futex(0x444ce68, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
38197 <... futex resumed> )             = 0
38199 <... futex resumed> )             = 1
38197 nanosleep({tv_sec=0, tv_nsec=3000},  <unfinished ...>
38199 futex(0x7f758c3340c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
38197 <... nanosleep resumed> NULL)     = 0

File is empty afterwards.

Note: /var/run is a symlink to /run

Edit: with -positions.ignore-invalid-yaml=true

38817 futex(0xc000180148, FUTEX_WAKE_PRIVATE, 1) = 1
38817 openat(AT_FDCWD, "/var/run/positions.yaml", O_RDONLY|O_CLOEXEC <unfinished ...>
38821 <... futex resumed> )             = 0
38817 <... openat resumed> )            = 7</run/positions.yaml>
38821 futex(0xc000180548, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
38817 epoll_ctl(3<anon_inode:[eventpoll]>, EPOLL_CTL_ADD, 7</run/positions.yaml>, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1345476184, u64=140292157232728}} <unfinished ...>
38821 <... futex resumed> )             = 1
38817 <... epoll_ctl resumed> )         = -1 EPERM (Operation not permitted)
38823 <... futex resumed> )             = 0
38817 epoll_ctl(3<anon_inode:[eventpoll]>, EPOLL_CTL_DEL, 7</run/positions.yaml>, 0xc000e543dc <unfinished ...>
38821 futex(0xc000180148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
38817 <... epoll_ctl resumed> )         = -1 EPERM (Operation not permitted)
38823 futex(0xc000180548, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
38817 fstat(7</run/positions.yaml>, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
38817 read(7</run/positions.yaml>, "", 512) = 0
38817 close(7</run/positions.yaml>)     = 0
38817 futex(0xc000180548, FUTEX_WAKE_PRIVATE, 1) = 1
38823 <... futex resumed> )             = 0
38823 futex(0xc000180148, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
38817 futex(0x7f98794680c8, FUTEX_WAKE_PRIVATE, 2147483647 <unfinished ...>
38823 <... futex resumed> )             = 1
38817 <... futex resumed> )             = 0
38821 <... futex resumed> )             = 0
38817 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 7</etc/ld.so.cache>
38823 futex(0xc000180548, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>

@dannykopping
Copy link
Contributor

Looks like the file was created now at least?

You're getting an EPERM from epoll_ctl which I haven't seen before:
https://man7.org/linux/man-pages/man2/epoll_ctl.2.html

Let's try use a non-tmpfs location for the promtail target (/tmp and /run are both tmpfs). Stick it somewhere else and see if things resolve?

If that doesn't work, I may have to defer to one of my colleagues who are more familiar with promtail than I am.

@jfolz
Copy link
Contributor Author

jfolz commented May 31, 2021

I did touch /var/run/positions.yaml beforehand. promtail didn't touch it afterwards ;)
/tmp is not a tmpfs in our case, but I made /stuff with read/write permissions for everyone and pointed promtail there:

41243 futex(0xc000132d48, FUTEX_WAKE_PRIVATE, 1) = 1
41244 <... futex resumed> )             = 0
41243 openat(AT_FDCWD, "/stuff/positions.yaml", O_RDONLY|O_CLOEXEC <unfinished ...>
41244 futex(0x444ce68, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
41243 <... openat resumed> )            = -1 ENOENT (No such file or directory)
41244 <... futex resumed> )             = 1
41241 <... futex resumed> )             = 0
41244 nanosleep({tv_sec=0, tv_nsec=3000},  <unfinished ...>
41241 nanosleep({tv_sec=0, tv_nsec=3000},  <unfinished ...>
41243 futex(0x7f6eb91240c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
41244 <... nanosleep resumed> NULL)     = 0
41241 <... nanosleep resumed> NULL)     = 0

Then with touch /stuff/positions.yaml:

42083 <... futex resumed> )             = 0
42081 openat(AT_FDCWD, "/stuff/positions.yaml", O_RDONLY|O_CLOEXEC <unfinished ...>
42083 futex(0x444ce68, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
42081 <... openat resumed> )            = 7</stuff/positions.yaml>
42083 <... futex resumed> )             = 1
42081 epoll_ctl(4<anon_inode:[eventpoll]>, EPOLL_CTL_ADD, 7</stuff/positions.yaml>, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=4057482840, u64=139715048654424}} <unfinished ...>
42079 <... futex resumed> )             = 0
42083 futex(0xc000180148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
42081 <... epoll_ctl resumed> )         = -1 EPERM (Operation not permitted)
42079 futex(0x444ce68, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
42081 epoll_ctl(4<anon_inode:[eventpoll]>, EPOLL_CTL_DEL, 7</stuff/positions.yaml>, 0xc000d343dc) = -1 EPERM (Operation not permitted)
42081 fstat(7</stuff/positions.yaml>, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
42081 read(7</stuff/positions.yaml>, "", 512) = 0
42081 close(7</stuff/positions.yaml>)   = 0
42081 futex(0x444ce68, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
42079 <... futex resumed> )             = 0
42081 <... futex resumed> )             = 1
42079 nanosleep({tv_sec=0, tv_nsec=3000},  <unfinished ...>
42081 futex(0x7f121aec70c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
42079 <... nanosleep resumed> NULL)     = 0

@dannykopping
Copy link
Contributor

Just to double-check:
Same behaviour from promtail as before? Is any content written to the positions file over time? tail -f would be useful here.

@jfolz
Copy link
Contributor Author

jfolz commented May 31, 2021

Yes, no output from tail -f /stuff/positions.yaml either.

Edit: I wasn't sure what the expected behavior here is, so I removed --dry-run from the command line thinking maybe that prevents it from modifying the file, and it now does create the positions.yaml, though all it writes is this:

positions: {}

@dannykopping
Copy link
Contributor

Ha! Did you have dry-run on this whole time? 😄

@jfolz
Copy link
Contributor Author

jfolz commented May 31, 2021

Yes... sorry that wasn't clear. I've been using the same command from the initial report (with things options added along the way). At least that explains the mystery of the untouched positions file. What remains is why promtail looks at the journal, but doesn't read anything from it :/

@dannykopping
Copy link
Contributor

No problem!

Please repeat all the debugging steps (lsof/strace) above and send both the command and its output - hopefully we'll spot something.

@jfolz
Copy link
Contributor Author

jfolz commented May 31, 2021

Nothing in lsof (files are probably closed by the time I can run the command), so I ran with strace and grep "positions\|/var/log/journal" /tmp/debug | grep -v "user-" (lots of user journals that don't matter) both on the machine that's not working and one where it is working.

When it's not working:

52130 openat(AT_FDCWD, "/var/log/journal/a3e44098f0c4a3c97af9137d5865ee17/system.journal", O_RDONLY|O_CLOEXEC) = 9</var/log/journal/a3e44098f0c4a3c97af9137d5865ee17/system.journal>
52130 fstat(9</var/log/journal/a3e44098f0c4a3c97af9137d5865ee17/system.journal>, {st_mode=S_IFREG|0640, st_size=41943040, ...}) = 0
52130 mmap(NULL, 4096, PROT_READ, MAP_SHARED, 9</var/log/journal/a3e44098f0c4a3c97af9137d5865ee17/system.journal>, 0) = 0x7f60c5894000
52130 close(9</var/log/journal/a3e44098f0c4a3c97af9137d5865ee17/system.journal>) = 0

VS when it's working:

9626  openat(AT_FDCWD, "/var/log/journal/df22e14b1f83428292fe17f518feaebb/system.journal", O_RDONLY|O_CLOEXEC) = 59</var/log/journal/df22e14b1f83428292fe17f518feaebb/system.journal>
9626  fstat(59</var/log/journal/df22e14b1f83428292fe17f518feaebb/system.journal>, {st_mode=S_IFREG|0640, st_size=134217728, ...}) = 0
9626  mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 59</var/log/journal/df22e14b1f83428292fe17f518feaebb/system.journal>, 0) = 0x7f352c000000
9626  fstatfs(59</var/log/journal/df22e14b1f83428292fe17f518feaebb/system.journal>, {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=115086610, f_bfree=110004376, f_bavail=104152536, f_files=29245440, f_ffree=28882647, f_fsid={val=[3404676125, 2987217199]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
9626  getdents(8</var/log/journal/df22e14b1f83428292fe17f518feaebb>, /* 0 entries */, 32768) = 0
9626  fstatfs(8</var/log/journal/df22e14b1f83428292fe17f518feaebb>, {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=115086610, f_bfree=110004365, f_bavail=104152525, f_files=29245440, f_ffree=28882647, f_fsid={val=[3404676125, 2987217199]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
9626  close(8</var/log/journal/df22e14b1f83428292fe17f518feaebb>) = 0
9626  getdents(7</var/log/journal>, /* 0 entries */, 32768) = 0
9626  fstatfs(7</var/log/journal>, {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=115086610, f_bfree=110004365, f_bavail=104152525, f_files=29245440, f_ffree=28882647, f_fsid={val=[3404676125, 2987217199]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
9626  close(7</var/log/journal>)        = 0
9626  mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 59</var/log/journal/df22e14b1f83428292fe17f518feaebb/system.journal>, 0x73b000 <unfinished ...>
9626  mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 59</var/log/journal/df22e14b1f83428292fe17f518feaebb/system.journal>, 0x1686000 <unfinished ...>
9626  mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 59</var/log/journal/df22e14b1f83428292fe17f518feaebb/system.journal>, 0x446d000 <unfinished ...>
9626  mmap(NULL, 4403200, PROT_READ, MAP_SHARED, 59</var/log/journal/df22e14b1f83428292fe17f518feaebb/system.journal>, 0x7bcd000 <unfinished ...>
...

So it's looking at system.journal, doesn't like what it sees and closes the file.

@dannykopping
Copy link
Contributor

Are you running promtail as a privileged user in the case that it isn't working?
Have you got debug logs enabled on promtail?

@jfolz
Copy link
Contributor Author

jfolz commented May 31, 2021

Yes, all commands are run as root:

# strace -f -y -o /tmp/debug9 ./promtail-linux-amd64 -log.level=debug -positions.ignore-invalid-yaml=true -config.file=promtail-journal-config.yml
level=info ts=2021-05-31T15:48:55.211874752Z caller=server.go:229 http=[::]:9081 grpc=[::]:45495 msg="server listening on addresses"
level=info ts=2021-05-31T15:48:55.212319591Z caller=main.go:112 msg="Starting Promtail" version="(version=2.2.1, branch=HEAD, revision=babea82e)"
^Clevel=info ts=2021-05-31T15:49:44.217038243Z caller=signals.go:55 msg="=== received SIGINT/SIGTERM ===\n*** exiting"
level=debug ts=2021-05-31T15:49:44.72102977Z caller=positions.go:147 msg="positions saved"

@dannykopping
Copy link
Contributor

Thanks @jfolz. It's looking fine to me.

I suspect this issue is down to a quirk of your local system since, from your statements above, you already have a working instance running as a systemd unit.

I'd recommend creating another systemd unit for this second instance and see if that works.
Please let me know either way, but I think we can close this issue. Thoughts?

@jfolz
Copy link
Contributor Author

jfolz commented May 31, 2021

Sorry for taking up so much of your time, but I have to clarify that compared to the original report nothing has changed really.
Promtail is working just fine on several machines, but on this one specific machine (that should be identical to others, but apparently isn't) it doesn't read from the journal.

@dannykopping
Copy link
Contributor

Just so I understand correctly (not being passive-aggressive here): you already have a working promtail running in a systemd unit on this same machine that you're having trouble with, correct?

@jfolz
Copy link
Contributor Author

jfolz commented May 31, 2021

No we did not, but we do now, finally.

After staring at the strace of the working machine for a long time I figured it out. Promtail is looking for libsystemd-journal.so.0.10.2 all over the place. When it doesn't find it it loads libsystemd.so.0 instead. On the broken machine it found libsystemd-journal.so.0.10.2 immediately and that was its downfall. dpkg says this file belongs to systemd version 204, while 237 is what is actually running. Probably a leftover from an earlier OS upgrade. My best guess is that reading the journal fails silently due to this version mismatch.

Anyway, thanks again for the big help. In the end the strace did the trick ;)

My only suggestion would be some more debug logging around the journal scraper. Right now it looks like it never outputs anything even when working as intended.

@jfolz jfolz closed this as completed May 31, 2021
@dannykopping
Copy link
Contributor

That's awesome, great find @jfolz! 🎉

I'll keep this issue open and modify it slightly to reflect your findings.
We appreciate your patience and willingness to get to the bottom of this!

@dannykopping dannykopping reopened this May 31, 2021
@dannykopping dannykopping added the keepalive An issue or PR that will be kept alive and never marked as stale. label May 31, 2021
@dannykopping dannykopping changed the title Promtail does not read from journal - but only on one specific machine Promtail fails to read journal silently using systemd v204 May 31, 2021
@dannykopping
Copy link
Contributor

@jfolz one more thing: can I ask you to please send the output of apt-cache showpkg libsystemd-dev?

@dannykopping dannykopping changed the title Promtail fails to read journal silently using systemd v204 Promtail fails to read journal silently May 31, 2021
@jfolz
Copy link
Contributor Author

jfolz commented May 31, 2021

Sure thing:

# apt-cache showpkg libsystemd-dev
Package: libsystemd-dev
Versions:
237-3ubuntu10.45 (/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-updates_main_binary-amd64_Packages)
 Description Language:
                 File: /var/lib/apt/lists/security.ubuntu.com_ubuntu_dists_bionic-security_main_binary-amd64_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language:
                 File: /var/lib/apt/lists/security.ubuntu.com_ubuntu_dists_bionic-security_main_binary-i386_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language: en
                 File: /var/lib/apt/lists/security.ubuntu.com_ubuntu_dists_bionic-security_main_i18n_Translation-en
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language:
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_main_binary-amd64_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language:
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_main_binary-i386_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language:
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-updates_main_binary-amd64_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language:
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-updates_main_binary-i386_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22

237-3ubuntu10.38 (/var/lib/apt/lists/security.ubuntu.com_ubuntu_dists_bionic-security_main_binary-amd64_Packages)
 Description Language:
                 File: /var/lib/apt/lists/security.ubuntu.com_ubuntu_dists_bionic-security_main_binary-amd64_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language:
                 File: /var/lib/apt/lists/security.ubuntu.com_ubuntu_dists_bionic-security_main_binary-i386_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language: en
                 File: /var/lib/apt/lists/security.ubuntu.com_ubuntu_dists_bionic-security_main_i18n_Translation-en
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language:
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_main_binary-amd64_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language:
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_main_binary-i386_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language:
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-updates_main_binary-amd64_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language:
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-updates_main_binary-i386_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22

237-3ubuntu10 (/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_main_binary-amd64_Packages)
 Description Language:
                 File: /var/lib/apt/lists/security.ubuntu.com_ubuntu_dists_bionic-security_main_binary-amd64_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language:
                 File: /var/lib/apt/lists/security.ubuntu.com_ubuntu_dists_bionic-security_main_binary-i386_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language: en
                 File: /var/lib/apt/lists/security.ubuntu.com_ubuntu_dists_bionic-security_main_i18n_Translation-en
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language:
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_main_binary-amd64_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language:
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_main_binary-i386_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language:
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-updates_main_binary-amd64_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22
 Description Language:
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic-updates_main_binary-i386_Packages
                  MD5: 7122bf9ab5ca8703093117a2cda0fe22


Reverse Depends:
  libsystemd-dev:i386,libsystemd-dev 237-3ubuntu10.45
  libsystemd-dev:i386,libsystemd-dev 237-3ubuntu10.45
  libsystemd-dev:i386,libsystemd-dev 237-3ubuntu10
  xrdp-pulseaudio-installer,libsystemd-dev
  golang-github-coreos-go-systemd-dev,libsystemd-dev
  libsystemd-dev:i386,libsystemd-dev 237-3ubuntu10
  libsystemd-dev:i386,libsystemd-dev 237-3ubuntu10.38
  libblockdev-dm-dev,libsystemd-dev
  libsystemd-dev:i386,libsystemd-dev 237-3ubuntu10.38
Dependencies:
237-3ubuntu10.45 - libsystemd0 (5 237-3ubuntu10.45) libsystemd-dev:i386 (35 237-3ubuntu10.45) libsystemd-dev:i386 (38 237-3ubuntu10.45)
237-3ubuntu10.38 - libsystemd0 (5 237-3ubuntu10.38) libsystemd-dev:i386 (35 237-3ubuntu10.38) libsystemd-dev:i386 (38 237-3ubuntu10.38)
237-3ubuntu10 - libsystemd0 (5 237-3ubuntu10) libsystemd-dev:i386 (35 237-3ubuntu10) libsystemd-dev:i386 (38 237-3ubuntu10)
Provides:
237-3ubuntu10.45 -
237-3ubuntu10.38 -
237-3ubuntu10 -
Reverse Provides:

@tonyplovich
Copy link

tonyplovich commented Jul 1, 2021

Hello,

I'm seeing this, or a similar issue, on RHEL8. In my case, Promtail's journal scraping silently fails when run it as a non-privileged user.

When I run it as root, it initially sends journald logs, but then no other logs are sent until I restart the process.

RHEL 8.3 (Ootpa)

 # ls -l /lib64/ | grep systemd
libnss_systemd.so.2
libsystemd.so.0 -> libsystemd.so.0.23.0
libsystemd.so.0.23.0

# rpm -qa | grep systemd
systemd-239-41.el8_3.2.x86_64
systemd-libs-239-41.el8_3.2.x86_64
rpm-plugin-systemd-inhibit-4.14.3-4.el8.x86_64
systemd-pam-239-41.el8_3.2.x86_64
systemd-udev-239-41.el8_3.2.x86_64

@owen-d
Copy link
Member

owen-d commented Jan 13, 2022

Wow, this was a read! I'm not sure what we can do about this issue, but we can keep an eye open for potential cases where we swallow related errors and try to make them more visible.

@fzyzcjy
Copy link

fzyzcjy commented Jul 12, 2023

Hi, I also see promtail not sending journal logs at all (at #9922). I am using promtail inside docker (indeed inside Kubernetes DaemonSet pod), so I guess there is no such version problem?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keepalive An issue or PR that will be kept alive and never marked as stale.
Projects
None yet
Development

No branches or pull requests

5 participants