Merge pull request 'Implement YouTube back-end' (#12) from 5-add-youtube-backend into main

Add support for creating podcast feeds of YouTube channels and playlists.

* Add the YouTube back-end
* Update the documentation
* Use the MIME DB to determine the download URL file extensions

Reviewed-on: #12
This commit is contained in:
Paul van Tilburg 2022-12-24 13:19:51 +01:00
commit bec7fa850c
9 changed files with 900 additions and 211 deletions

656
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@ -12,6 +12,7 @@ async-trait = "0.1.57"
cached = { version = "0.39.0", features = ["async"] }
chrono = { version = "0.4.19", features = ["serde"] }
enum_dispatch = "0.3.8"
mime-db = "1.6.0"
reqwest = { version = "0.11.10", features = ["json"] }
rocket = { version = "0.5.0-rc.2", features = ["json"] }
rocket_dyn_templates = { version = "0.1.0-rc.2", features = ["tera"] }
@ -19,6 +20,7 @@ rss = "2.0.1"
thiserror = "1.0.31"
url = { version = "2.2.2", features = ["serde"] }
youtube_dl = { version = "0.7.0", features = ["tokio"] }
ytextract = "0.11.1"
[package.metadata.deb]
maintainer = "Paul van Tilburg <paul@luon.net>"
@ -29,7 +31,8 @@ Podbringer is a web service that provides podcasts for services that don't
offer them (anymore). It provides a way to get the RSS feed for your podcast
client and it facilites the downloads of the pods (enclosures).
It currently only supports [Mixcloud](https://mixcloud.com).
It currently only supports [Mixcloud](https://www.mixcloud.com) and
[YouTube](https://www.youtube.com).
Other back-ends might be added in the future.
"""
section = "net"

View File

@ -4,7 +4,8 @@ Podbringer is a web service that provides podcasts for services that don't
offer them (anymore). It provides a way to get the RSS feed for your podcast
client and it facilites the downloads of the pods (enclosures).
It currently only supports [Mixcloud](https://mixcloud.com).
It currently only supports [Mixcloud](https://www.mixcloud.com) and
[YouTube](https://www.youtube.com).
Other back-ends might be added in the future.
## Building & running
@ -25,8 +26,8 @@ builds when you don't add `--release`.)
### Configuration
For now, you will need to provide Rocket with configuration to tell it at which
public URL Podbringer is hosted. This needs to be done even if you are not using a
reverse proxy, in which case you need to provide it with the proxied URL. You
public URL Podbringer is hosted. This needs to be done even if you are not using
a reverse proxy, in which case you need to provide it with the proxied URL. You
can also use the configuration to configure a different address and/or port.
Just create a `Rocket.toml` file that contains (or copy `Rocket.toml.example`):
@ -44,17 +45,17 @@ configuration, see: <https://rocket.rs/v0.5-rc/guide/configuration/>.
Podbringer currently has no front-end or web interface yet that can help you
use it. Until then, you just have to enter the right service-specific RSS feed
URL in your favorite podcast client to start using it.
Given the Mixcloud URL <https://www.mixcloud.com/myfavouriteband/>, the URL you
need to use for Podbringer is comprised of the following parts:
URL in your favorite podcast client to start using it. For example:
```text
https://my.domain.tld/podbringer/feed/mixcloud/myfavouriteband
|------------------------------| |-------||--------------|
The Podbringer public URL Service User @ service
|------------------------------| |------| |-------------|
The Podbringer public URL Service Service ID
```
So, the URL consists of the location of Podbringer, the fact that you want the feed,
the name of the service and the ID that identifies something list on that service.
### Feed item limit
To prevent feeds with a very large number of items, any feed that is returned
@ -62,7 +63,43 @@ contains at most 50 items by default. If you want to have more (or less) items,
provide the limit in the URL by setting the `limit` parameter.
For example, to get up until 1000 items the URL becomes:
`https://my.domain.tld/podbringer/feed/mixcloud/myfavouriteband?limit=1000`
```text
https://my.domain.tld/podbringer/feed/mixcloud/myfavouriteband?limit=1000`
```
### Service: Mixcloud
For Mixcloud, a feed can be constructed of everything that a user posted.
Given the Mixcloud URL like <https://www.mixcloud.com/myfavouriteband/>, the
`myfavouriteband` part of the URL is the Mixcloud username and can be used as
the service ID.
```text
https://my.domain.tld/podbringer/feed/mixcloud/myfavouriteband
|------------------------------| |------| |-------------|
The Podbringer public URL Service Username
```
### Service: YouTube
For YouTube, a feed can either be constructed of a channel or a playlist.
Given the YouTube channel URL like <https://www.youtube.com/c/favouritechannel>,
the `favouritechannel` part of the URL is the YouTube channel ID.
Given the YouTube playlist URL
<https://www.youtube.com/playlist?list=PLsomeplaylistidentifier>, the
`PLsomeplaylistidentifier` part of the URL is the YouTube playlist ID.
Either the channel or playlist ID can be used as the service ID.
```text
https://my.domain.tld/podbringer/feed/youtube/favouritechannel
|------------------------------| |-----| |--------------|
The Podbringer public URL Service Channel ID
https://my.domain.tld/podbringer/feed/youtube/PLsomeplaylistidentifier
|------------------------------| |-----| |----------------------|
The Podbringer public URL Service Playlist ID
```
## License

View File

@ -15,19 +15,25 @@ use reqwest::Url;
use crate::{Error, Result};
pub(crate) mod mixcloud;
pub(crate) mod youtube;
/// Retrieves the back-end for the provided ID (if supported).
pub(crate) fn get(backend: &str) -> Result<Backends> {
match backend {
"mixcloud" => Ok(Backends::Mixcloud(mixcloud::backend())),
"youtube" => Ok(Backends::YouTube(youtube::backend())),
_ => Err(Error::UnsupportedBackend(backend.to_string())),
}
}
/// The support back-ends.
/// The supported back-ends.
#[enum_dispatch(Backend)]
pub(crate) enum Backends {
/// Mixcloud (<https://www.mixcloud.com>)
Mixcloud(mixcloud::Backend),
/// YouTube (<https://www.youtube.com>)
YouTube(youtube::Backend),
}
/// Functionality of a content back-end.

View File

@ -199,7 +199,8 @@ impl From<UserWithCloudcasts> for Channel {
impl From<Cloudcast> for Item {
fn from(cloudcast: Cloudcast) -> Self {
let mut file = PathBuf::from(cloudcast.key.trim_end_matches('/'));
file.set_extension("m4a"); // FIXME: Don't hardcoded the extension!
let extension = mime_db::extension(DEFAULT_FILE_TYPE).expect("MIME type has extension");
file.set_extension(extension);
// FIXME: Don't hardcode the description!
let description = Some(format!("Taken from Mixcloud: {0}", cloudcast.url));

342
src/backends/youtube.rs Normal file
View File

@ -0,0 +1,342 @@
//! The YouTube back-end.
//!
//! It uses the `ytextract` crate to retrieve the feed (channel or playlist) and items (videos).
use std::path::{Path, PathBuf};
use async_trait::async_trait;
use cached::proc_macro::cached;
use chrono::{DateTime, Utc};
use reqwest::Url;
use rocket::futures::StreamExt;
use ytextract::playlist::video::{Error as YouTubeVideoError, Video as YouTubePlaylistVideo};
use ytextract::{
Channel as YouTubeChannel, Client, Playlist as YouTubePlaylist, Stream as YouTubeStream,
Video as YouTubeVideo,
};
use super::{Channel, Enclosure, Item};
use crate::{Error, Result};
/// The base URL for YouTube channels.
const CHANNEL_BASE_URL: &str = "https://www.youtube.com/channel";
/// The default item limit.
const DEFAULT_ITEM_LIMIT: usize = 50;
/// The base URL for YouTube playlists.
const PLAYLIST_BASE_URL: &str = "https://www.youtube.com/channel";
/// The base URL for YouTube videos.
const VIDEO_BASE_URL: &str = "https://www.youtube.com/watch";
/// Creates a YouTube back-end.
pub(crate) fn backend() -> Backend {
Backend::new()
}
/// The YouTube back-end.
pub struct Backend {
/// The client capable of interacting with YouTube.
client: Client,
}
impl Backend {
/// Creates a new YouTube back-end.
fn new() -> Self {
let client = Client::new();
Self { client }
}
}
#[async_trait]
impl super::Backend for Backend {
fn name(&self) -> &'static str {
"YouTube"
}
async fn channel(&self, channel_id: &str, item_limit: Option<usize>) -> Result<Channel> {
// We assume it is a YouTube playlist ID if the channel ID starts with
// "PL"/"OLAK"/"RDCLAK"; it is considered to be a YouTube channel ID otherwise.
if channel_id.starts_with("PL")
|| channel_id.starts_with("OLAK")
|| channel_id.starts_with("RDCLAK")
{
let (yt_playlist, yt_videos_w_streams) =
fetch_playlist_videos(&self.client, channel_id, item_limit).await?;
Ok(Channel::from(YouTubePlaylistWithVideos(
yt_playlist,
yt_videos_w_streams,
)))
} else {
let (yt_channel, yt_videos_w_streams) =
fetch_channel_videos(&self.client, channel_id, item_limit).await?;
Ok(Channel::from(YouTubeChannelWithVideos(
yt_channel,
yt_videos_w_streams,
)))
}
}
async fn redirect_url(&self, file: &Path) -> Result<String> {
let id_part = file.with_extension("");
let video_id = id_part.to_string_lossy();
retrieve_redirect_url(&self.client, &video_id).await
}
}
/// A YouTube playlist with its videos.
#[derive(Clone, Debug)]
pub(crate) struct YouTubePlaylistWithVideos(YouTubePlaylist, Vec<YouTubeVideoWithStream>);
/// A YouTube channel with its videos.
#[derive(Clone, Debug)]
pub(crate) struct YouTubeChannelWithVideos(YouTubeChannel, Vec<YouTubeVideoWithStream>);
/// A YouTube video with its stream.
#[derive(Clone, Debug)]
struct YouTubeVideoWithStream {
/// The information of the YouTube video.
video: YouTubeVideo,
/// The metadata of the selected YouTube stream.
stream: YouTubeStream,
/// The content of the selected YouTube stream.
content_length: u64,
}
impl From<YouTubeChannelWithVideos> for Channel {
fn from(
YouTubeChannelWithVideos(yt_channel, yt_videos_w_streams): YouTubeChannelWithVideos,
) -> Self {
let mut link = Url::parse(CHANNEL_BASE_URL).expect("valid URL");
let title = format!("{0} (via YouTube)", yt_channel.name());
let description = yt_channel.description().to_string();
link.path_segments_mut()
.expect("valid URL")
.push(&yt_channel.id());
let author = Some(yt_channel.name().to_string());
// FIXME: Don't hardcode the category!
let categories = Vec::from([String::from("Channel")]);
let image = yt_channel
.avatar()
.max_by_key(|av| av.width * av.height)
.map(|av| av.url.clone());
let items = yt_videos_w_streams.into_iter().map(Item::from).collect();
Channel {
title,
link,
description,
author,
categories,
image,
items,
}
}
}
impl From<YouTubePlaylistWithVideos> for Channel {
fn from(
YouTubePlaylistWithVideos(yt_playlist, yt_videos_w_streams): YouTubePlaylistWithVideos,
) -> Self {
let title = format!("{0} (via YouTube)", yt_playlist.title());
let mut link = Url::parse(PLAYLIST_BASE_URL).expect("valid URL");
let description = yt_playlist.description().to_string();
link.query_pairs_mut()
.append_pair("list", &yt_playlist.id().to_string());
let author = yt_playlist.channel().map(|chan| chan.name().to_string());
// FIXME: Don't hardcode the category!
let categories = Vec::from([String::from("Playlist")]);
let image = yt_playlist
.thumbnails()
.iter()
.max_by_key(|tn| tn.width * tn.height)
.map(|tn| tn.url.clone());
let items = yt_videos_w_streams.into_iter().map(Item::from).collect();
Channel {
title,
link,
description,
author,
categories,
image,
items,
}
}
}
impl From<YouTubeVideoWithStream> for Item {
fn from(
YouTubeVideoWithStream {
video,
stream,
content_length: length,
}: YouTubeVideoWithStream,
) -> Self {
let id = video.id().to_string();
let mime_type = stream.mime_type().to_string();
// Ignore everything from MIME type parameter seperator on for extension look-up.
let mime_sep = mime_type.find(';').unwrap_or(mime_type.len());
let extension = mime_db::extension(&mime_type[..mime_sep]).unwrap_or_default();
let file = PathBuf::from(&id).with_extension(extension);
let enclosure = Enclosure {
file,
mime_type,
length,
};
let mut link = Url::parse(VIDEO_BASE_URL).expect("valid URL");
link.query_pairs_mut().append_pair("v", &id);
let video_description = video.description();
let description = Some(format!("{video_description}\n\nTaken from YouTube: {link}"));
let categories = video
.hashtags()
.filter(|hashtag| !hashtag.trim().is_empty())
.map(|hashtag| {
let url = Url::parse(&format!(
"https://www.youtube.com/hashtag/{}",
hashtag.trim_start_matches('#')
))
.expect("valid URL");
(hashtag.to_string(), url)
})
.collect();
let duration = Some(video.duration().as_secs() as u32);
let keywords = video.keywords().clone();
let image = video
.thumbnails()
.iter()
.max_by_key(|tn| tn.width * tn.height)
.map(|tn| tn.url.clone());
let timestamp = video
.date()
.and_hms_opt(12, 0, 0)
.expect("Invalid hour, minute and/or second");
let updated_at = DateTime::from_utc(timestamp, Utc);
Item {
title: video.title().to_string(),
link,
description,
categories,
enclosure,
duration,
guid: id,
keywords,
image,
updated_at,
}
}
}
/// Fetches the YouTube playlist videos for the given ID.
///
/// If the result is [`Ok`], the playlist will be cached for 24 hours for the given playlist ID.
#[cached(
key = "(String, Option<usize>)",
convert = r#"{ (playlist_id.to_owned(), item_limit) }"#,
time = 86400,
result = true
)]
async fn fetch_playlist_videos(
client: &Client,
playlist_id: &str,
item_limit: Option<usize>,
) -> Result<(YouTubePlaylist, Vec<YouTubeVideoWithStream>)> {
let id = playlist_id.parse()?;
let limit = item_limit.unwrap_or(DEFAULT_ITEM_LIMIT);
let yt_playlist = client.playlist(id).await?;
let yt_videos_w_streams = yt_playlist
.videos()
.filter_map(fetch_stream)
.take(limit)
.collect()
.await;
Ok((yt_playlist, yt_videos_w_streams))
}
/// Fetches the YouTube channel videos for the given ID.
#[cached(
key = "(String, Option<usize>)",
convert = r#"{ (channel_id.to_owned(), item_limit) }"#,
time = 86400,
result = true
)]
async fn fetch_channel_videos(
client: &Client,
channel_id: &str,
item_limit: Option<usize>,
) -> Result<(YouTubeChannel, Vec<YouTubeVideoWithStream>)> {
let id = channel_id.parse()?;
let limit = item_limit.unwrap_or(DEFAULT_ITEM_LIMIT);
let yt_channel = client.channel(id).await?;
let yt_videos_w_streams = yt_channel
.uploads()
.await?
.filter_map(fetch_stream)
.take(limit)
.collect()
.await;
Ok((yt_channel, yt_videos_w_streams))
}
/// Fetches the stream and relevant metadata for a YouTube video result.
///
/// If there is a error retrieving the metadata, the video is discarded/ignored.
/// If there are problems retrieving the streams or metadata, the video is also discarded.
async fn fetch_stream(
yt_video: Result<YouTubePlaylistVideo, YouTubeVideoError>,
) -> Option<YouTubeVideoWithStream> {
match yt_video {
Ok(video) => {
let video = video.upgrade().await.ok()?;
let stream = video
.streams()
.await
.ok()?
.filter(|v| v.is_audio())
.max_by_key(|v| v.bitrate())?;
let content_length = stream.content_length().await.ok()?;
Some(YouTubeVideoWithStream {
video,
stream,
content_length,
})
}
Err(_) => None,
}
}
/// Retrieves the redirect URL for the provided YouTube video ID.
///
/// If the result is [`Ok`], the redirect URL will be cached for 24 hours for the given video ID.
#[cached(
key = "String",
convert = r#"{ video_id.to_owned() }"#,
time = 86400,
result = true
)]
async fn retrieve_redirect_url(client: &Client, video_id: &str) -> Result<String> {
let video_id = video_id.parse()?;
let video = client.video(video_id).await?;
let stream = video
.streams()
.await?
.filter(|v| v.is_audio())
.max_by_key(|v| v.bitrate())
.ok_or(Error::NoRedirectUrlFound)?;
Ok(stream.url().to_string())
}

View File

@ -28,7 +28,9 @@ pub(crate) fn construct(backend_id: &str, config: &Config, channel: Channel) ->
.unwrap_or_default(),
)
.build();
let mut last_build = DateTime::<Utc>::from_utc(NaiveDateTime::from_timestamp(0, 0), Utc);
let unix_timestamp = NaiveDateTime::from_timestamp_opt(0, 0)
.expect("Out-of-range seconds or invalid nanoseconds");
let mut last_build = DateTime::from_utc(unix_timestamp, Utc);
let generator = String::from(concat!(
env!("CARGO_PKG_NAME"),
" ",

View File

@ -54,6 +54,26 @@ pub(crate) enum Error {
/// An error occurred in youtube-dl.
#[error("Youtube-dl failed: {0}")]
YoutubeDl(#[from] youtube_dl::Error),
/// An YouTube extract error occured.
#[error("YouTube extract error: {0}")]
YtExtract(#[from] ytextract::Error),
/// An YouTube extract ID parsing error occured.
#[error("YouTube extract ID parsing error: {0}")]
YtExtractId0(#[from] ytextract::error::Id<0>),
/// An YouTube extract ID parsing error occured.
#[error("YouTube extract ID parsing error: {0}")]
YtExtractId11(#[from] ytextract::error::Id<11>),
/// An YouTube extract ID parsing error occured.
#[error("YouTube extract ID parsing error: {0}")]
YtExtractId24(#[from] ytextract::error::Id<24>),
/// An YouTube extract playlist video error occured.
#[error("YouTube extract playlist video error: {0}")]
YtExtractPlaylistVideo(#[from] ytextract::playlist::video::Error),
}
impl<'r, 'o: 'r> rocket::response::Responder<'r, 'o> for Error {

View File

@ -5,15 +5,21 @@
URL in your favorite podcast client to start using it.
</p>
<p>
Given the Mixcloud URL <https://www.mixcloud.com/myfavouriteband/>, the URL you
need to use for Podbringer is comprised of the following parts:
The URL you need to use for Podbringer is comprised of the following parts:
<pre>
https://my.domain.tld/podbringer/feed/mixcloud/myfavouriteband
|------------------------------| |-------||--------------|
The Podbringer public URL Service User @ service
|------------------------------| |------| |-------------|
The Podbringer public URL Service Service ID
</pre>
</p>
<p>
The Podbringer location URL of this instance is: {{ url }}
Supported services are:
<ul>
<li>Mixcloud (service ID is Mixcloud username)</li>
<li>YouTube (service ID is YouTube channel or playlist ID)</li>
</ul>
</p>
<p>
The Podbringer location URL of this instance is: <a href="{{ url }}">{{ url }}</a>.
</p>