shouya edited this page Aug 5, 2024 · 14 revisions

Config syntax

The config file is written in YAML. Here's an example:

auth:
  username: admin
  password: 'secr3tp@ssw0rd'

endpoints:
  - path: /tokio-blog.xml
    note: Full text of Tokio blog
    source: https://tokio.rs/_next/static/feed.xml
    filters:
      - full_text: {}
      - simplify_html: {}

  - path: /hackernews.xml
    note: Full text of Hacker News
    source: https://news.ycombinator.com/rss
    filters:
      - full_text:
          simplify: true
          append_mode: true

Endpoints

The most crucial part of the configuration is the definition of endpoints. Each endpoint corresponds to an RSS feed ready for consumption.

Properties:

  • path (required): The path of the endpoint. The path should start with a forward slash /.
  • note (optional): A note describing the endpoint. Used for display purposes only.
  • on_the_fly_filters (optional): Enable On‐the‐Fly filters. Defaults to false.
  • source (optional): The source URL of the RSS feed.
    • If not specified, the source is dynamic: to use this endpoint, you must include a ?source=<url> query parameter in the request. This allows applying the same filters to different feeds.
    • If the source points to an HTML page, rss-funnel will attempt to generate an RSS feed from the page with a single article. You can then use the split filter to divide the single article into multiple articles. See Cookbook: Hacker News Top Links for an example.
  • filters (required): A list of filters to apply to the feed.
    • The feed from the source goes through the filters in the specified order. Each filter corresponds to a transformation on the Feed object.
    • Each filter is specified as a YAML object with the filter's name as the key and its configuration as the value.
    • For example, in the filter definition: - keep_element: .p_mainnew
      • The filter's name is keep_element.
      • The configuration is the string value .p_mainnew. The configuration type depends on the filter.
    • The Feed object from the last filter is returned as the response.
  • client (optional): The configuration for the HTTP client used to fetch the source, such as the user agent. See Client config for details.
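As a sketch of a dynamic-source endpoint, the hypothetical config below omits source entirely, so the feed URL has to be supplied in each request. The path /clean.xml and the note are illustrative. The keep_element selector is the one from the filter example above, and simplify_html is taken from the example at the top of this page.

```yaml
endpoints:
  # No `source` field: the feed URL is supplied per request, e.g.
  # /clean.xml?source=https://example.com/feed.xml
  - path: /clean.xml
    note: Hypothetical endpoint applying the same filters to any feed
    filters:
      # Filters run in order; the output of the last one is the response.
      - keep_element: .p_mainnew
      - simplify_html: {}
```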

Client

Requests to remote servers are made in three places:

  • The initial fetch of the source.
  • In the full_text filter.
  • In the merge filter.

You may want to customize the HTTP settings for these requests, which you can do through the client field.

Available fields:

  • timeout (optional, Duration): The timeout for each individual request, given as a string such as 20s (see the supported duration formats). Defaults to 10 seconds.
  • user_agent (optional, String): The user agent used for fetching the URL. Defaults to rss-funnel/<version>.
  • accept (optional, String): The Accept header value.
  • referer (optional, String): The Referer header value.
  • cookie (optional, String): The value for the Cookie header.
  • assume_content_type (optional, String): Assume the server returns this Content-Type. Useful for feeds whose server returns an incorrect Content-Type that prevents proper parsing.
  • proxy (optional, String): The proxy server to use for requests, in the form http://<host>:<port> or socks5://<host>:<port> (see the Proxy documentation of the reqwest crate).
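The snippet below is a minimal sketch of an endpoint with a client block, using only the fields listed above. The URL, user agent string, and proxy address are placeholders, and it is an assumption here that these endpoint-level settings govern the fetch of the source.

```yaml
endpoints:
  - path: /slow-site.xml
    source: https://example.com/feed.xml
    client:
      # Allow slow servers more time than the 10-second default.
      timeout: 30s
      user_agent: 'Mozilla/5.0 (compatible; my-reader)'
      # Useful if the server mislabels its feed's Content-Type.
      assume_content_type: application/rss+xml
      # Placeholder address of a local SOCKS5 proxy.
      proxy: socks5://127.0.0.1:1080
    filters:
      - full_text: {}
```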

Source

A source is a configuration structure that describes where a feed will be obtained from. It is currently used in two places:

  • In the source field inside the endpoint's configuration.
  • In the merge filter, either implicitly as the filter's configuration value or via a source field. The merge filter also accepts multiple sources in an array, which are fetched in parallel.
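For illustration, the hypothetical endpoint below uses the merge filter twice: once with multiple sources given directly as an array, and once with a source field. The feed URLs are placeholders, and the exact shapes shown are a sketch based on the description above rather than a verified schema; see Filter config for the authoritative merge syntax.

```yaml
endpoints:
  - path: /merged.xml
    source: https://example.com/feed-a.xml
    filters:
      # Assumed array form: the sources are fetched in parallel.
      - merge:
          - https://example.com/feed-b.xml
          - https://example.com/feed-c.xml
      # Assumed explicit form using the `source` field.
      - merge:
          source: https://example.com/feed-d.xml
```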

The source can be written in four formats:

  1. An absolute URL to the RSS source, e.g., source: https://example.com/feed.xml.
  2. A relative URL, starting with a forward slash /, that refers to another endpoint on the instance, e.g., source: /another-endpoint.xml?source=https://example.com/feed.xml.
  3. A "from scratch" structure: an object with the following fields, representing a blank feed created from scratch:
    • format (required, enum: "rss" or "atom")
    • title (required, string)
    • link (optional, string)
    • description (optional, string)
  4. A templated URL: an object with the following fields. It allows the endpoint to accept extra query parameters for the placeholders defined below and fills in the template accordingly.
    • template (required, string): The URL template. It should contain one or more placeholders of the form ${NAME}, where NAME is the name of a placeholder defined in placeholders.
    • placeholders (object; key: placeholder name, value: placeholder config):
      • default (optional, string): The value used when the request does not specify the placeholder.
      • validation (optional, string): A regular expression that the placeholder value is validated against.
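As a sketch of the relative-URL and from-scratch formats together, the hypothetical endpoint below starts from a blank RSS feed and merges in the /tokio-blog.xml endpoint defined in the first example on this page. The path, title, and description are illustrative, and the example assumes the merge filter accepts a relative URL as its value, as described under the merge bullet above.

```yaml
endpoints:
  - path: /digest.xml
    note: Hypothetical digest built on a blank feed
    # "From scratch" source: a blank RSS feed with only metadata.
    source:
      format: rss
      title: My digest
      description: Posts collected from other endpoints
    filters:
      # Relative URL referring to another endpoint on this instance.
      - merge: /tokio-blog.xml
```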

The URL can point to a feed in RSS or Atom format. It can also point to an HTML page, in which case a feed is generated with the page's body as its only post, which you can then customize with filters.

Note that the source can be omitted from the config to make the endpoint's source dynamic. In that case, the source must be specified in each request, as in /endpoint.xml?source=https://website.com/rss.xml, for the endpoint to work.

With a templated source defined, you can request the endpoint as /endpoint.xml?NAME=value to fill the placeholders with specific values. See https://github.com/shouya/rss-funnel/pull/139 for more examples.
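A minimal sketch of a templated source follows. The endpoint path, the placeholder name query, the URL, and the regular expression are all hypothetical.

```yaml
endpoints:
  - path: /search.xml
    note: Hypothetical search feed built from a URL template
    source:
      template: https://example.com/search/${query}.xml
      placeholders:
        query:
          # Used when the request has no ?query=... parameter.
          default: rust
          # Values not matching this pattern presumably fail validation.
          validation: '^[a-z0-9-]+$'
    filters:
      - simplify_html: {}
```

With this config, a request to /search.xml?query=tokio would fetch https://example.com/search/tokio.xml, while a plain /search.xml would fall back to the default value rust.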

Filter

See Filter config.

Authentication

You can specify the authentication info in the configuration file to protect the inspector UI behind a login page. The configuration syntax is as follows:

auth:
  username: admin
  password: hunter2

endpoints: ...

If the auth config is not specified, the inspector UI will be publicly accessible.
