
fluentbit not picking up file #6077

Closed
tracyliuzw opened this issue Sep 21, 2022 · 10 comments
Labels
question · Stale · waiting-for-user (Waiting for more information, tests or requested changes) · Windows (Bugs and requests about Windows platforms)

Comments

@tracyliuzw

1. When I collect data, I use a Fluent Bit + Fluentd architecture, and the data is finally written to BigQuery. Fluent Bit uses the tail plugin. The directory I collect from receives a large number of hourly log files every hour, and each hour, when the data is written to BigQuery, there is an anomaly that looks like field misalignment.


2. Is there a limit on the length of an event or record?
The records I collected with the tail plugin were only about 300 bytes long, but collection still failed.

[2022/09/21 02:42:05] [ info] [fluent bit] version=1.9.8, commit=97a5e9dcf3, pid=2052
[2022/09/21 02:42:05] [debug] [engine] coroutine stack size: 98302 bytes (96.0K)
[2022/09/21 02:42:05] [ info] [storage] version=1.2.0, type=memory+filesystem, sync=normal, checksum=disabled, max_chunks_up=128
[2022/09/21 02:42:05] [ info] [storage] backlog input plugin: storage_backlog.1
[2022/09/21 02:42:05] [ info] [cmetrics] version=0.3.6
[2022/09/21 02:42:05] [debug] [tail:tail.0] created event channels: read=440 write=584
[2022/09/21 02:42:05] [debug] [input:tail:tail.0] flb_tail_fs_stat_init() initializing stat tail input
[2022/09/21 02:42:05] [debug] [input:tail:tail.0] inode=1125899906846935 with offset=1203 appended as D:\log\log\battle_report.2022091300.log
[2022/09/21 02:42:05] [debug] [input:tail:tail.0] 1 new files found on path 'D:\log\log\battle_report.2022091300.log'
[2022/09/21 02:42:05] [debug] [storage_backlog:storage_backlog.1] created event channels: read=624 write=628
[2022/09/21 02:42:05] [ info] [input:storage_backlog:storage_backlog.1] queue memory limit: 15.3M
[2022/09/21 02:42:05] [debug] [emitter:re_emitted] created event channels: read=632 write=636
[2022/09/21 02:42:05] [debug] [stdout:stdout.0] created event channels: read=644 write=648
[2022/09/21 02:42:05] [ info] [sp] stream processor started
[2022/09/21 02:42:05] [ info] [output:stdout:stdout.0] worker #0 started
[2022/09/21 02:42:05] [debug] [input:tail:tail.0] inode=1125899906846935 file=D:\log\log\battle_report.2022091300.log promote to TAIL_EVENT
[2022/09/21 02:42:05] [debug] [input:tail:tail.0] [static files] processed 0b, done
[2022/09/21 02:42:15] [debug] [input:tail:tail.0] 0 new files found on path 'D:\log\log\battle_report.2022091300.log'
[2022/09/21 02:42:25] [debug] [input:tail:tail.0] 0 new files found on path 'D:\log\log\battle_report.2022091300.log'
[2022/09/21 02:42:35] [debug] [input:tail:tail.0] 0 new files found on path 'D:\log\log\battle_report.2022091300.log'
[2022/09/21 02:42:45] [debug] [input:tail:tail.0] 0 new files found on path 'D:\log\log\battle_report.2022091300.log'

@patrick-stephens
Contributor

What is the actual problem here for 1.? I don't really understand the issue from what is written, so please can you provide a reproducer of the issue or more details.

For 2, tail functions as per tail -f, so it will only pick up entries added to a file after it started tailing: you need to turn read_from_head on if you want to read data already present in a file when the file is opened.
If you have configured a db setting then this will persist what data it has already sent (you do not want duplicate log entries in most cases). Have you done this?
https://docs.fluentbit.io/manual/pipeline/inputs/tail
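For illustration, a minimal tail input combining those two options might look like the following sketch (the path and DB location are placeholders, not taken from any real setup):

[INPUT]
    # placeholder path and DB location
    Name           tail
    Path           D:\myapp\logs\*.log
    # also read data already present in the file at startup
    Read_from_Head On
    # persist per-file offsets so a restart resumes instead of re-sending
    DB             D:\myapp\fluent-bit-tail.db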
Please provide your full configuration (as per the issue template) so we can see what options you are using.
I'm guessing this is on Windows, but also check file permissions: the user Fluent Bit runs as needs to be able to read the file, otherwise it cannot see it.

@patrick-stephens patrick-stephens added question waiting-for-user Waiting for more information, tests or requested changes Windows Bugs and requests about Windows platforms and removed status: waiting-for-triage labels Sep 21, 2022
@patrick-stephens patrick-stephens changed the title fluentbit fluentbit not picking up file Sep 21, 2022
@tracyliuzw
Author

My business scenario: about 50 files are generated per hour in the tailed directory, and data is occasionally lost in the first minute of each hour.
Can you give me some advice on how to configure tail when there are many files in the directory?
My configuration is as follows:

[SERVICE]
    flush                     5
    daemon                    Off
    log_level                 warn
    Log_File                  /var/log/td-agent-bit/td-agent-bit.log
    parsers_file              parsers.conf
    Decode_Field_as           escaped_utf8 log
    storage.path              /var/log/td-agent-bit/storage/
    storage.sync              normal
    storage.checksum          Off
    storage.max_chunks_up     103
    storage.backlog.mem_limit 16M
    HTTP_Server               On
    HTTP_Listen               0.0.0.0
    HTTP_PORT                 2020
    Health_Check              On
    HC_Errors_Count           5
    HC_Retry_Failure_Count    5
    HC_Period                 5

[INPUT]
    Name                      tail
    Path                      /data/fjserver/log/gamelog/..log
    Tag                       tp
    Key                       tp
    Path_Key                  filename
    Buffer_Chunk_Size         24m
    Buffer_Max_Size           32m
    Refresh_Interval          10
    Rotate_Wait               5
    Ignore_Older              2h
    Read_from_Head            false
    storage.type              filesystem
    storage.pause_on_chunks_overlimit on
    Skip_Empty_Lines          On
    Skip_Long_Lines           On
    DB                        /var/log/td-agent-bit/db/tp.db
    DB.sync                   normal
    DB.journal_mode           WAL
    Mem_Buf_Limit             48m
    Exit_On_Eof               false
    Inotify_Watcher           true

[INPUT]
    Name                      tail
    Path                      /data/fjserver/log/serverlog/*.log
    Db                        /var/log/td-agent-bit/db/tpserver.db
    Tag                       tp.serverlog
    Key                       tp
    Mem_Buf_Limit             48m
    Buffer_Chunk_Size         24m
    Buffer_Max_Size           32m
    Refresh_Interval          10
    Rotate_Wait               5
    Ignore_Older              2h
    Read_from_Head            false
    storage.type              filesystem
    storage.pause_on_chunks_overlimit on
    Skip_Empty_Lines          On
    Skip_Long_Lines           On
    Exit_On_Eof               false
    Inotify_Watcher           true

[FILTER]
    Name                      rewrite_tag
    Match                     tp
    Rule                      $filename /(\S+)/([a-zA-Z]+.\d{6})(\d{4}).log $TAG.$2 false
    Emitter_Name              re_emitted
    Emitter_Mem_Buf_Limit     10M
    Emitter_Storage.type      filesystem

[FILTER]
    Name                      record_modifier
    Match                     *
    Record                    hostname ${HOSTNAME}

[OUTPUT]
    Name                      Forward
    Match                     *
    Upstream                  upstream.conf
    Require_ack_response      true
    Send_options              False
    Compress                  gzip
    Workers                   2
    storage.total_limit_size  48M
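For reference, a worked example of how the rewrite_tag rule above behaves, using a hypothetical filename shaped like the battle_report.2022091300.log seen in the earlier log output:

    # With Path_Key filename, a record read from
    #     /data/fjserver/log/gamelog/vip.2022091300.log
    # matches the rule: capture group 2 is "vip.202209" and group 3 is "1300",
    # so the record is re-emitted with tag tp.vip.202209, which the Fluentd
    # <filter tp.vip.*> section in the next comment then matches.
    # (Note that [a-zA-Z]+ does not match underscores, so a name like
    # battle_report.2022091300.log would not be re-tagged by this rule.)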

@tracyliuzw
Author

Fluentd configuration file:

<filter tp.vip.*>
  @type parser
  key_name tp
  reserve_data true
  remove_key_name_field true
  replace_invalid_sequence false
  emit_invalid_record_to_error true
  <parse>
    @type csv
    ##common##
    types eventtype:string,eventtime:string,iggid:integer,f4:string,f5:string,f6:string,f7:string,f8:string
    null_empty_string true
    estimate_current_event true
    ##csv##
    keys eventtype,eventtime,iggid,f4,f5,f6,f7,f8
    delimiter "\t"
    parser_type normal
  </parse>
</filter>

<match tp.vip.*>
  <buffer yyyymm>
    @type file
    path /var/log/td-agent/buffer/tp/vip
    timekey_use_utc true
    chunk_limit_size 256MB
    #chunk_limit_records 3000
    total_limit_size 512MB
    chunk_full_threshold 0.5
    queued_chunks_limit_size 20
    flush_at_shutdown true
    flush_mode interval
    flush_interval 5s
    flush_thread_interval 1
    flush_thread_count 1
    flush_thread_burst_interval 1
    delayed_commit_timeout 60
    overflow_action block
    retry_type exponential_backoff
    retry_timeout 24h
    retry_forever true
    retry_max_times 20
    retry_wait 2
  </buffer>
  @type bigquery_insert
  auth_method json_key
  json_key /etc/td-agent/style-saga-aa24aba4a06d.json
  project style-saga
  dataset gamedata
  table vip_${yyyymm}
  fetch_schema_table vip
  fetch_schema true
  auto_create_table true
  ignore_unknown_values true
  schema_cache_expire 600
  allow_retry_insert_errors true
  request_timeout_sec 120
  request_open_timeout_sec 120
  skip_invalid_rows true
</match>
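For reference, the parser above expects each tp value to be a single tab-separated record in the configured key order; a made-up example line (\t denotes a tab character):

    vip \t 2022-09-21 02:42:05 \t 100008921 \t a \t b \t c \t d \t e

would parse as eventtype="vip", eventtime="2022-09-21 02:42:05", iggid=100008921, and f4 through f8 as strings.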

@tracyliuzw
Author

When the data is finally written to BigQuery, this error is logged:
2022-09-20 22:00:24 -0500 [trace]: #5 enqueueing all chunks in buffer instance=2220
2022-09-20 22:00:24 -0500 [trace]: #13 enqueueing all chunks in buffer instance=4100
2022-09-20 22:00:24 -0500 [trace]: #13 enqueueing all chunks in buffer instance=5480
2022-09-20 22:00:24 -0500 [trace]: #3 enqueueing all chunks in buffer instance=5420
2022-09-20 22:00:24 -0500 [trace]: #12 enqueueing all chunks in buffer instance=5180
2022-09-20 22:00:24 -0500 [debug]: #12 insert rows project_id="style-saga" dataset="gamedata" table="dressup_202209" count=1
2022-09-20 22:00:24 -0500 [warn]: #12 insert errors project_id="style-saga" dataset="gamedata" table="dressup_202209" insert_errors="[#<Google::Apis::BigqueryV2::InsertAllTableDataResponse::InsertError:0x00007fdf13c905f0 @errors=[#<Google::Apis::BigqueryV2::ErrorProto:0x00007fdf13e8fc98 @debug_info="", @location="eventtime", @message="Invalid datetime string \"8921\"", @reason="invalid">], @index=0>]"
2022-09-20 22:00:24 -0500 [debug]: #12 taking back chunk for errors. chunk="5e92725249ac4b3c4d09a3c00c42891c"
2022-09-20 22:00:24 -0500 [trace]: #12 taking back a chunk instance=2720 chunk_id="5e92725249ac4b3c4d09a3c00c42891c"
2022-09-20 22:00:24 -0500 [trace]: #12 chunk taken back instance=2720 chunk_id="5e92725249ac4b3c4d09a3c00c42891c" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables={:yyyymm=>"202209"}, seq=0>
2022-09-20 22:00:24 -0500 [error]: #12 Hit limit for retries. dropping all chunks in the buffer queue. retry_times=0 records=1 error_class=Fluent::BigQuery::UnRetryableError error="failed to insert into bigquery(insert errors), and cannot retry"
2022-09-20 22:00:24 -0500 [error]: #12 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluent-plugin-bigquery-2.3.0/lib/fluent/plugin/bigquery/writer.rb:99:in `insert_rows'
2022-09-20 22:00:24 -0500 [error]: #12 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluent-plugin-bigquery-2.3.0/lib/fluent/plugin/out_bigquery_insert.rb:102:in `insert'
2022-09-20 22:00:24 -0500 [error]: #12 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluent-plugin-bigquery-2.3.0/lib/fluent/plugin/out_bigquery_insert.rb:98:in `write'
2022-09-20 22:00:24 -0500 [error]: #12 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.2/lib/fluent/plugin/output.rb:1180:in `try_flush'
2022-09-20 22:00:24 -0500 [error]: #12 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.2/lib/fluent/plugin/output.rb:1501:in `flush_thread_run'
2022-09-20 22:00:24 -0500 [error]: #12 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.2/lib/fluent/plugin/output.rb:501:in `block (2 levels) in start'
2022-09-20 22:00:24 -0500 [error]: #12 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2022-09-20 22:00:24 -0500 [trace]: #12 clearing queue instance=2720
2022-09-20 22:00:24 -0500 [debug]: #12 buffer queue cleared
2022-09-20 22:00:24 -0500 [trace]: #5 enqueueing all chunks in buffer instance=3020

@tracyliuzw
Author

Help me, please give me some advice!

@patrick-stephens
Contributor

It looks to me like the issue is with Fluentd sending to BigQuery so you probably want to drill down on that and raise in the Fluentd repository for that plugin where there will be expertise on that.

Is there some issue with the Fluent Bit side of things specifically? There is already an output plugin to send to BigQuery from Fluent Bit directly, so does that work? https://docs.fluentbit.io/manual/pipeline/outputs/bigquery
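If you do try the direct route, a rough, untested sketch reusing the project and dataset names from your Fluentd config might look like this (the credentials path is a placeholder, and it is worth checking in those docs whether the dynamic monthly table naming you use in Fluentd is supported):

[OUTPUT]
    # placeholder credentials path; project/dataset/table taken from the Fluentd config
    Name                       bigquery
    Match                      tp.*
    google_service_credentials /path/to/service-account.json
    project_id                 style-saga
    dataset_id                 gamedata
    table_id                   vip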

The tail inputs you have seem to be ok but I can't really comment as you know the specific log files you have.

I did note one seems to have a strange path; is that right, or did you mean a wildcard?

[INPUT]
    Name tail
    Path /data/fjserver/log/gamelog/..log
    Tag  tp

Also, only the server logs have a DB set, so the other will just read from the end (not the beginning) when Fluent Bit starts: only new data added after Fluent Bit is watching the file will be picked up. This is what I mean by tail -f; it functions as that does.

The server logs have a DB so will record which offset they got up to last and start from there:

[INPUT]
    Name tail
    Path /data/fjserver/log/serverlog/*.log
    Db   /var/log/td-agent-bit/db/tpserver.db
    Tag  tp.serverlog
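As an aside, that DB is a SQLite file, so if I recall the tail plugin's schema correctly you can inspect the offsets it has recorded with something like:

    sqlite3 /var/log/td-agent-bit/db/tpserver.db "SELECT name, offset, inode FROM in_tail_files;"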

Unrelated, but you do not have to provide all configuration options, only the ones that are required or differ from the defaults.
The Slack channel is likely better for discussion around best practice and the various options vs your requirements.

@tracyliuzw
Author

Thank you for your help!
Sorry, Path /data/fjserver/log/gamelog/..log was copied wrong; the correct path uses asterisks as wildcards, but the comment formatting strips them out.
In the current two-tier architecture, Fluent Bit runs alongside the production servers. To avoid taking too many production-server resources during collection, the data is only lightly processed and forwarded to Fluentd, and Fluentd then processes it and writes it to BigQuery.
My original data is CSV. Is it possible that fields go missing during parsing? In the BigQuery error log I can see that fields are missing and misaligned, and the write to BigQuery fails.
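To illustrate one way that could happen (the sample values below are invented): if a line is read before the writer has finished it, or Skip_Long_Lines truncates it, the fragment is still parsed as a CSV record and every value shifts into the wrong column:

    # intended line (tab-separated, per the Fluentd parser):
    #     vip \t 2022-09-21 02:42:05 \t 100008921 \t ...
    # if only a trailing fragment such as "8921 \t ..." arrives as its own
    # record, a stray numeric fragment can land in a column like eventtime,
    # which would match the BigQuery error above:
    #     Invalid datetime string "8921"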

@tracyliuzw
Author

The asterisk cannot be entered??

@github-actions
Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

@github-actions github-actions bot added the Stale label Dec 21, 2022
@github-actions
Contributor

This issue was closed because it has been stalled for 5 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 26, 2022