[file-format] add enable_text_validate_utf8 doc (#1960)
morningman authored Feb 5, 2025
1 parent f083014 commit c8abb3c
Showing 2 changed files with 35 additions and 0 deletions.
18 changes: 18 additions & 0 deletions docs/lakehouse/file-formats/text.md
@@ -65,3 +65,21 @@ This document introduces the support for reading and writing text file formats i

Import functionality supports JSON formats. See the import documentation for details.

## Character Set

Doris currently supports only the UTF-8 character encoding. Some data, such as data in Hive Text-format tables, may contain content in other encodings. Reading such content fails with the following error:

```text
Only support csv data in utf8 codec
```

In this case, set the following session variable:

```sql
SET enable_text_validate_utf8 = false
```

This skips the UTF-8 encoding check so that the content can be read. Note that the parameter only skips the check; it does not convert the data, so non-UTF-8 content is still displayed as garbled text.

This parameter has been supported since version 3.0.4.
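
The difference between failing a UTF-8 check and skipping it can be illustrated outside Doris. The sketch below is plain Python and is not Doris code; the helper names `is_valid_utf8` and `decode_without_validation` are hypothetical, and the use of GBK as the source encoding is an assumption chosen for the example. It shows that bytes in another encoding fail strict UTF-8 validation, and that decoding them leniently yields unreadable text rather than the original characters.

```python
# Illustration only: plain Python, not how Doris implements the check.

def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_without_validation(raw: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for invalid bytes instead of failing."""
    return raw.decode("utf-8", errors="replace")


# The same two characters in UTF-8 and in GBK (assumed legacy encoding).
utf8_bytes = "中文".encode("utf-8")
gbk_bytes = "中文".encode("gbk")

print(is_valid_utf8(utf8_bytes))  # True
print(is_valid_utf8(gbk_bytes))   # False: a strict check rejects these bytes

# Skipping the check lets the read proceed, but the text is not recovered.
print(decode_without_validation(gbk_bytes) == "中文")  # False
```

How Doris renders the unreadable bytes may differ from Python's replacement character; the point is only that skipping validation does not transcode the data.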

@@ -72,3 +72,20 @@ under the License.

For the JSON formats supported by the import functionality, see the import documentation.

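## Character Set

Doris currently supports only the UTF-8 character encoding. Some data, such as data in Hive Text-format tables, may contain content that is not UTF-8 encoded. Reading such content fails with the following error: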

```text
Only support csv data in utf8 codec
```

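In this case, set the following session variable:

```sql
SET enable_text_validate_utf8 = false
```

This skips the UTF-8 encoding check so that the content can be read. Note that the parameter only skips the check; non-UTF-8 content is still displayed as garbled text.

This parameter is supported since version 3.0.4.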
