Merge pull request #131 from dart-lang/merge-characters-package
Merge `package:characters`
mosuem authored Oct 16, 2024
2 parents 0237f43 + 94061ca commit 279afbc
Showing 46 changed files with 22,904 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .github/labeler.yml
@@ -8,6 +8,10 @@
  - changed-files:
    - any-glob-to-any-file: 'pkgs/async/**'

"package:characters":
  - changed-files:
    - any-glob-to-any-file: 'pkgs/characters/**'

"package:convert":
  - changed-files:
    - any-glob-to-any-file: 'pkgs/convert/**'
72 changes: 72 additions & 0 deletions .github/workflows/characters.yaml
@@ -0,0 +1,72 @@
name: package:characters

on:
  # Run CI on pushes to the main branch, and on PRs against main.
  push:
    branches: [ main ]
    paths:
      - '.github/workflows/characters.yaml'
      - 'pkgs/characters/**'
  pull_request:
    branches: [ main ]
    paths:
      - '.github/workflows/characters.yaml'
      - 'pkgs/characters/**'
  schedule:
    - cron: "0 0 * * 0"
env:
  PUB_ENVIRONMENT: bot.github

defaults:
  run:
    working-directory: pkgs/characters/

jobs:
  # Check code formatting and static analysis on a single OS (linux)
  # against dev, stable, and 3.4 (the package's lower bound).
  analyze:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        sdk: [dev, stable, 3.4]
    steps:
      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938
      - uses: dart-lang/setup-dart@0a8a0fc875eb934c15d08629302413c671d3f672
        with:
          sdk: ${{ matrix.sdk }}
      - id: install
        name: Install dependencies
        run: dart pub get
      - name: Check formatting
        run: dart format --output=none --set-exit-if-changed .
        if: matrix.sdk == 'dev' && steps.install.outcome == 'success'
      - name: Analyze code
        run: dart analyze --fatal-infos
        if: always() && steps.install.outcome == 'success'

  # Run tests on a matrix consisting of two dimensions:
  # 1. OS: ubuntu-latest
  # 2. Release channel: dev, stable, and 3.4 (the package's lower bound)
  test:
    needs: analyze
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest]
        sdk: [dev, stable, 3.4]
    steps:
      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938
      - uses: dart-lang/setup-dart@0a8a0fc875eb934c15d08629302413c671d3f672
        with:
          sdk: ${{ matrix.sdk }}
      - id: install
        name: Install dependencies
        run: dart pub get
      - name: Run VM tests
        run: dart test --platform vm
        if: always() && steps.install.outcome == 'success'
      - name: Run Chrome tests
        run: dart test --platform chrome
        if: always() && steps.install.outcome == 'success'
1 change: 1 addition & 0 deletions README.md
@@ -10,6 +10,7 @@ This repository is home to various Dart packages under the [dart.dev](https://pu
|---|---|---|
| [args](pkgs/args/) | Library for defining parsers for parsing raw command-line arguments into a set of options and values. | [![pub package](https://img.shields.io/pub/v/args.svg)](https://pub.dev/packages/args) |
| [async](pkgs/async/) | Utility functions and classes related to the 'dart:async' library.| [![pub package](https://img.shields.io/pub/v/async.svg)](https://pub.dev/packages/async) |
| [characters](pkgs/characters/) | Replacement for `String` with operations that are Unicode/grapheme cluster aware. | [![pub package](https://img.shields.io/pub/v/characters.svg)](https://pub.dev/packages/characters) |
| [convert](pkgs/convert/) | Utilities for converting between data representations. | [![pub package](https://img.shields.io/pub/v/convert.svg)](https://pub.dev/packages/convert) |
| [crypto](pkgs/crypto/) | Implementations of SHA, MD5, and HMAC cryptographic functions. | [![pub package](https://img.shields.io/pub/v/crypto.svg)](https://pub.dev/packages/crypto) |
| [fixnum](pkgs/fixnum/) | Library for 32- and 64-bit signed fixed-width integers. | [![pub package](https://img.shields.io/pub/v/fixnum.svg)](https://pub.dev/packages/fixnum) |
4 changes: 4 additions & 0 deletions pkgs/characters/.gitignore
@@ -0,0 +1,4 @@
.dart_tool/
.packages
pubspec.lock
doc/api/
6 changes: 6 additions & 0 deletions pkgs/characters/AUTHORS
@@ -0,0 +1,6 @@
# Below is a list of people and organizations that have contributed
# to the Dart project. Names should be added to the list like so:
#
# Name/Organization <email address>

Google LLC
67 changes: 67 additions & 0 deletions pkgs/characters/CHANGELOG.md
@@ -0,0 +1,67 @@
## 1.3.1

* Fixed README rendering on pub.dev and API docs.
* Require Dart `^3.4.0`.
* Move to `dart-lang/core` monorepo.

## 1.3.0

* Updated to use Unicode 15.0.0.

## 1.2.1

* Update the value of the pubspec `repository` field.

## 1.2.0

* Fix `Characters.where` which unnecessarily did the iteration and test twice.
* Adds `Characters.empty` constant and makes `Characters("")` return it.
* Changes the argument type of `Characters.contains` to (covariant) `String`.
The implementation still accepts `Object?`, so it can be cast to
`Iterable<Object?>`, but you get warned if you try to call directly with a
non-`String`.

## 1.1.0

* Stable release for null safety.
* Added `stringBeforeLength` and `stringAfterLength` to `CharacterRange`.
* Added `CharacterRange.at` constructor.
* Added `getRange(start, end)` and `characterAt(pos)` to `Characters`
as alternatives to `.take(end).skip(start)` and `getRange(pos, pos + 1)`.
* Change some positional parameter names from `other` to `characters`.

## 1.0.0

* Core APIs deemed stable; package version set to 1.0.0.
* Added `split` methods on `Characters` and `CharacterRange`.

## 0.5.0

* Change the `codeUnits` getter to `utf16CodeUnits`, which returns an iterable.
This avoids leaking that the underlying string has efficient UTF-16
code unit access in the API, and allows the same interface to be
just as efficiently implemented on top of UTF-8.

## 0.4.0

* Added an extension method on `String` to allow easy access to the `Characters`
of the string:

```dart
print('The first character is: ' + myString.characters.first);
```

* Updated Dart SDK dependency to Dart 2.6.0

## 0.3.1

* Added small example in `example/main.dart`
* Enabled pedantic lints and updated code to resolve issues.

## 0.3.0

* Updated API which does not expose the underlying string indices.

## 0.1.0

* Initial release
27 changes: 27 additions & 0 deletions pkgs/characters/LICENSE
@@ -0,0 +1,27 @@
Copyright 2019, the Dart project authors.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials provided
with the distribution.
* Neither the name of Google LLC nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
133 changes: 133 additions & 0 deletions pkgs/characters/README.md
@@ -0,0 +1,133 @@
[![Build Status](https://github.com/dart-lang/core/actions/workflows/characters.yaml/badge.svg)](https://github.com/dart-lang/core/actions/workflows/characters.yaml)
[![pub package](https://img.shields.io/pub/v/characters.svg)](https://pub.dev/packages/characters)
[![package publisher](https://img.shields.io/pub/publisher/characters.svg)](https://pub.dev/packages/characters/publisher)

[`Characters`][Characters] are strings viewed as
sequences of **user-perceived character**s,
also known as [Unicode (extended) grapheme clusters][Grapheme Clusters].

The [`Characters`][Characters] class provides access to
the individual characters of a string,
and a way to navigate back and forth between them
using a [`CharacterRange`][CharacterRange].

## Unicode characters and representations

There is no such thing as plain text.

Computers only know numbers,
so any "text" on a computer is represented by numbers,
which are in turn stored as bytes in memory.

The meaning of those bytes is provided by layers of interpretation,
building up to the *glyph*s that the computer displays on the screen.

| Abstraction | Dart Type | Usage | Example |
| --------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Bytes | [`ByteBuffer`][ByteBuffer],<br />[`Uint8List`][Uint8List] | Physical layout: Memory or network communication. | `file.readAsBytesSync()` |
| [Code units][] | [`Uint8List`][Uint8List] (UTF&#x2011;8)<br />[`Uint16List`][Uint16List], [`String`][String] (UTF&#x2011;16) | Standard formats for<br /> encoding code points in memory.<br />Each code unit is stored in one byte (UTF&#x2011;8) or two bytes (UTF&#x2011;16). One or more code units encode a code point. | `string.codeUnits`<br />`string.codeUnitAt(index)`<br />`utf8.encode(string)` |
| [Code points][] | [`Runes`][Runes] | The Unicode unit of meaning. | `string.runes` |
| [Grapheme Clusters][] | [`Characters`][Characters] | User-perceived character. One or more code points. | `string.characters` |
| [Glyphs][] | | Visual rendering of grapheme clusters. | `print(string)` |
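
To make the table concrete, the sketch below measures one string at each level of abstraction. It assumes `package:characters` is a dependency. The `unitCounts` helper is purely illustrative and is not part of any library.

```dart
import 'dart:convert';

import 'package:characters/characters.dart';

/// Hypothetical helper: counts the units of [text] at each abstraction level.
Map<String, int> unitCounts(String text) => {
      'utf8Bytes': utf8.encode(text).length, // UTF-8 code units (bytes)
      'utf16CodeUnits': text.codeUnits.length, // same as text.length
      'codePoints': text.runes.length, // Unicode code points
      'graphemeClusters': text.characters.length, // user-perceived characters
    };

void main() {
  // The British flag emoji is two regional-indicator code points,
  // each encoded as a UTF-16 surrogate pair and as four UTF-8 bytes.
  print(unitCounts('🇬🇧'));
  // {utf8Bytes: 8, utf16CodeUnits: 4, codePoints: 2, graphemeClusters: 1}
}
```

A single on-screen flag therefore counts as 8, 4, 2 or 1, depending on the level you ask.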

A Dart `String` is a sequence of UTF-16 code units,
just like strings in JavaScript and Java.
The runtime system decides on the underlying physical representation.

That makes plain strings inadequate
when you need to manipulate the text that a user is viewing or entering,
because string operations do not work at the grapheme cluster level.

For example, when abbreviating a text to, say, its first 15 characters or glyphs,
a string like "A 🇬🇧 text in English"
should abbreviate to "A 🇬🇧 text in Eng&mldr;" when counting characters,
but will become "A 🇬🇧 text in &mldr;"
if counting code units using [`String`][String] operations,
because the flag emoji alone occupies four UTF-16 code units.
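
The abbreviation example can be sketched as follows, assuming `package:characters` is a dependency. Both `abbreviate` and `abbreviateCodeUnits` are hypothetical helpers written for this illustration. The only difference between them is whether `take` works on grapheme clusters or `substring` works on code units.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: keeps the first [max] grapheme clusters of [text]
/// and appends an ellipsis if anything was cut off.
String abbreviate(String text, int max) {
  final chars = text.characters;
  if (chars.length <= max) return text;
  // Characters.take returns a Characters; interpolation uses its string.
  return '${chars.take(max)}…';
}

/// Hypothetical helper for comparison: the same idea using UTF-16
/// code-unit indices, as plain String operations do.
String abbreviateCodeUnits(String text, int max) {
  if (text.length <= max) return text;
  return '${text.substring(0, max)}…';
}

void main() {
  const text = 'A 🇬🇧 text in English';
  print(abbreviate(text, 15)); // A 🇬🇧 text in Eng…
  print(abbreviateCodeUnits(text, 15)); // A 🇬🇧 text in …
}
```

With other inputs, the code-unit version can even cut a surrogate pair in half and produce an invalid string. The `Characters` version always cuts between grapheme clusters.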

Whenever you need to manipulate strings at the character level,
you should be using the [`Characters`][Characters] type,
not the methods of the [`String`][String] class.

## The Characters class

The [`Characters`][Characters] class exposes a string
as a sequence of grapheme clusters.
All operations on [`Characters`][Characters] operate
on entire grapheme clusters,
so they avoid the risk of splitting combined characters or emojis
that is inherent in the code-unit-based [`String`][String] operations.

You can get a [`Characters`][Characters] object for a string using either
the constructor [`Characters(string)`][Characters constructor]
or the extension getter `string.characters`.

At its core, the class is an [`Iterable<String>`][Iterable]
where the element strings are single grapheme clusters.
This allows sequential access to the individual grapheme clusters
of the original string.
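
The following sketch shows both ways of obtaining a `Characters` object and uses it as an ordinary `Iterable<String>`. It assumes `package:characters` is a dependency. The `graphemes` helper is hypothetical and exists only for this illustration.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: splits [text] into its grapheme clusters.
List<String> graphemes(String text) => text.characters.toList();

void main() {
  // 'e' followed by U+0301 COMBINING ACUTE ACCENT is displayed as one "é".
  const word = 'cafe\u0301';

  // The two ways of obtaining a Characters view of a string.
  final viaConstructor = Characters(word);
  final viaExtension = word.characters;

  print(word.length); // 5 (UTF-16 code units)
  print(viaConstructor.length); // 4 (grapheme clusters)
  print(viaExtension.last == 'e\u0301'); // true: the accent stays attached

  // Characters is an Iterable<String>, so the usual Iterable API applies;
  // this prints each grapheme cluster separated by ' | '.
  print(graphemes(word).join(' | '));
}
```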

On top of that, there are operations mirroring the operations
of [`String`][String] that are not index-, code-unit-, or code-point-based,
like [`startsWith`][Characters.startsWith]
or [`replaceAll`][Characters.replaceAll].
There are some differences between these and the [`String`][String] operations.
For example, the replace methods only accept a [`Characters`][Characters] object as the pattern,
not an arbitrary `Pattern`.
Regular expressions are not grapheme cluster aware,
so they cannot be used safely on a sequence of characters.
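
The sketch below contrasts `String` and `Characters` versions of `startsWith` and `replaceAll` on text that contains a combining accent. It assumes `package:characters` is a dependency, and that its matching operations only accept matches that start and end on grapheme cluster boundaries. The `replaceCharacters` helper is hypothetical.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: replaces occurrences of [from] in [text] with [to],
/// matching only whole grapheme clusters.
String replaceCharacters(String text, String from, String to) =>
    text.characters.replaceAll(from.characters, to.characters).toString();

void main() {
  // 'e' + U+0301 COMBINING ACUTE ACCENT forms a single "é" cluster,
  // followed by a space and a standalone 'e'.
  const text = 'cafe\u0301 e';

  // String.replaceAll matches code units, so it also rewrites the 'e'
  // inside the "é" cluster, leaving the accent on an 'x'.
  print(text.replaceAll('e', 'x') == 'cafx\u0301 x'); // true

  // Characters.replaceAll only matches whole grapheme clusters,
  // so only the standalone 'e' is replaced.
  print(replaceCharacters(text, 'e', 'x') == 'cafe\u0301 x'); // true

  // startsWith differs in the same way.
  print('e\u0301clair'.startsWith('e')); // true
  print('e\u0301clair'.characters.startsWith('e'.characters)); // false
}
```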

Grapheme clusters have varying length in the underlying representation,
so operations on a [`Characters`][Characters] sequence cannot efficiently be index based.
Instead, the [`CharacterRange`][CharacterRange] *iterator*
provided by [`Characters.iterator`][Characters.iterator]
is considerably more capable than a plain iterator.
It can move both forwards and backwards,
and it can span a *range* of grapheme clusters.
Most operations that can be performed on a full [`Characters`][Characters]
can also be performed on the grapheme clusters
in the range of a [`CharacterRange`][CharacterRange].
The range can be contracted, expanded, or moved in various ways,
not only by using [`moveNext`][CharacterRange.moveNext]
to move to the next grapheme cluster.
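
As a sketch of moving and expanding a range, the code below walks a string backwards from `iteratorAtEnd` and grows a range forwards with `expandNext`. It assumes `package:characters` is a dependency. The helpers `charactersBackwards` and `firstN` are hypothetical names for this illustration.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: lists the grapheme clusters of [text] from last to
/// first by moving a CharacterRange backwards from the end of the string.
List<String> charactersBackwards(String text) {
  // iteratorAtEnd is an empty range positioned after the last character.
  final range = text.characters.iteratorAtEnd;
  final result = <String>[];
  // moveBack() makes the range cover the preceding character and returns
  // false once there is no character left before the range.
  while (range.moveBack()) {
    result.add(range.current);
  }
  return result;
}

/// Hypothetical helper: the first [count] grapheme clusters of [text],
/// obtained by expanding an initially empty range forwards.
String firstN(String text, int count) {
  final range = text.characters.iterator; // empty range at the start
  range.expandNext(count); // the range now spans up to [count] clusters
  return range.current; // the whole range as a String
}

void main() {
  print(charactersBackwards('a🇬🇧c').join('|')); // c|🇬🇧|a
  print(firstN('a🇬🇧c', 2)); // a🇬🇧
}
```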

Example:

```dart
import 'package:characters/characters.dart';

// Using String indices: find '<', then the next '>' after it.
String? firstTagString(String source) {
  var start = source.indexOf('<') + 1;
  if (start > 0) {
    var end = source.indexOf('>', start);
    if (end >= 0) {
      return source.substring(start, end);
    }
  }
  return null;
}

// Using CharacterRange operations: findFirst returns a range covering the
// first '<' (or null), and moveUntil moves the range to the characters
// between that '<' and the next '>'.
Characters? firstTagCharacters(Characters source) {
  var range = source.findFirst('<'.characters);
  if (range != null && range.moveUntil('>'.characters)) {
    return range.currentCharacters;
  }
  return null;
}
```

[ByteBuffer]: https://api.dart.dev/dart-typed_data/ByteBuffer-class.html "ByteBuffer class"
[CharacterRange.moveNext]: https://pub.dev/documentation/characters/latest/characters/CharacterRange/moveNext.html "CharacterRange.moveNext"
[CharacterRange]: https://pub.dev/documentation/characters/latest/characters/CharacterRange-class.html "CharacterRange class"
[Characters constructor]: https://pub.dev/documentation/characters/latest/characters/Characters/Characters.html "Characters constructor"
[Characters.iterator]: https://pub.dev/documentation/characters/latest/characters/Characters/iterator.html "CharacterRange get iterator"
[Characters.replaceAll]: https://pub.dev/documentation/characters/latest/characters/Characters/replaceAll.html "Characters.replaceAll"
[Characters.startsWith]: https://pub.dev/documentation/characters/latest/characters/Characters/startsWith.html "Characters.startsWith"
[Characters]: https://pub.dev/documentation/characters/latest/characters/Characters-class.html "Characters class"
[Code Points]: https://unicode.org/glossary/#code_point "Unicode Code Point"
[Code Units]: https://unicode.org/glossary/#code_unit "Unicode Code Units"
[Glyphs]: https://unicode.org/glossary/#glyph "Unicode Glyphs"
[Grapheme Clusters]: https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries "Unicode (Extended) Grapheme Cluster"
[Iterable]: https://api.dart.dev/dart-core/Iterable-class.html "Iterable class"
[Runes]: https://api.dart.dev/dart-core/Runes-class.html "Runes class"
[String]: https://api.dart.dev/dart-core/String-class.html "String class"
[Uint16List]: https://api.dart.dev/dart-typed_data/Uint16List-class.html "Uint16List class"
[Uint8List]: https://api.dart.dev/dart-typed_data/Uint8List-class.html "Uint8List class"
5 changes: 5 additions & 0 deletions pkgs/characters/analysis_options.yaml
@@ -0,0 +1,5 @@
include: package:dart_flutter_team_lints/analysis_options.yaml

analyzer:
  errors:
    prefer_single_quotes: ignore