Merge pull request 'Implement YouTube back-end' (#12) from 5-add-youtube-backend into main

Add support for creating podcast feeds of YouTube channels and playlists.

* Add the YouTube back-end
* Update the documentation
* Use the MIME DB to determine the download URL file extensions

Reviewed-on: #12
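The MIME-based extension look-up mentioned above can be sketched without any dependencies. This is a minimal illustration, not the code in this change: `mime_essence`, `extension_for` and `enclosure_file` are hypothetical names, and the small `match` table is an assumed stand-in for the `mime-db` crate that covers only a few audio types.

```rust
use std::path::PathBuf;

/// Returns the MIME type without its parameters,
/// e.g. `audio/webm; codecs="opus"` becomes `audio/webm`.
fn mime_essence(mime_type: &str) -> &str {
    let end = mime_type.find(';').unwrap_or(mime_type.len());
    mime_type[..end].trim()
}

/// Hypothetical stand-in for a MIME DB look-up (assumed table, not the real database).
fn extension_for(mime_type: &str) -> Option<&'static str> {
    match mime_essence(mime_type) {
        "audio/mp4" => Some("m4a"),
        "audio/webm" => Some("weba"),
        "audio/mpeg" => Some("mp3"),
        _ => None,
    }
}

/// Builds the enclosure file name for a video ID.
/// Unknown MIME types result in a file name without an extension.
fn enclosure_file(video_id: &str, mime_type: &str) -> PathBuf {
    let extension = extension_for(mime_type).unwrap_or_default();
    PathBuf::from(video_id).with_extension(extension)
}
```

Under this assumed table, `enclosure_file("abc123", "audio/webm; codecs=\"opus\"")` yields `abc123.weba`, while an unknown type leaves the name as `abc123`.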
2022-12-24 13:19:51 +01:00 · 2022-12-24 13:19:51 +01:00 · bec7fa850c
parent 66452cc96d a6c9275d93
commit bec7fa850c
9 changed files with 900 additions and 211 deletions
--- a/Cargo.lock
+++ b/Cargo.lock
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -12,6 +12,7 @@ async-trait = "0.1.57"
 cached = { version = "0.39.0", features = ["async"] }
 chrono = { version = "0.4.19", features = ["serde"] }
 enum_dispatch = "0.3.8"
+mime-db = "1.6.0"
 reqwest = { version = "0.11.10", features = ["json"] }
 rocket = { version = "0.5.0-rc.2", features = ["json"] }
 rocket_dyn_templates = { version = "0.1.0-rc.2", features = ["tera"] }
@@ -19,6 +20,7 @@ rss = "2.0.1"
 thiserror = "1.0.31"
 url = { version = "2.2.2", features = ["serde"] }
 youtube_dl = { version = "0.7.0", features = ["tokio"] }
+ytextract = "0.11.1"

 [package.metadata.deb]
 maintainer = "Paul van Tilburg <paul@luon.net>"
@@ -29,7 +31,8 @@ Podbringer is a web service that provides podcasts for services that don't
 offer them (anymore). It provides a way to get the RSS feed for your podcast
 client and it facilites the downloads of the pods (enclosures).

-It currently only supports [Mixcloud](https://mixcloud.com).
+It currently only supports [Mixcloud](https://www.mixcloud.com) and
+[YouTube](https://www.youtube.com).
 Other back-ends might be added in the future.
 """
 section = "net"
--- a/README.md
+++ b/README.md
@@ -4,7 +4,8 @@ Podbringer is a web service that provides podcasts for services that don't
 offer them (anymore). It provides a way to get the RSS feed for your podcast
 client and it facilites the downloads of the pods (enclosures).

-It currently only supports [Mixcloud](https://mixcloud.com).
+It currently only supports [Mixcloud](https://www.mixcloud.com) and
+[YouTube](https://www.youtube.com).
 Other back-ends might be added in the future.

 ## Building & running
@@ -25,8 +26,8 @@ builds when you don't add `--release`.)
 ### Configuration

 For now, you will need to provide Rocket with configuration to tell it at which
-public URL Podbringer is hosted. This needs to be done even if you are not using a
-reverse proxy, in which case you need to provide it with the proxied URL. You
+public URL Podbringer is hosted. This needs to be done even if you are not using
+a reverse proxy, in which case you need to provide it with the proxied URL. You
 can also use the configuration to configure a different address and/or port.
 Just create a `Rocket.toml` file that contains (or copy `Rocket.toml.example`):

@@ -44,17 +45,17 @@ configuration, see: <https://rocket.rs/v0.5-rc/guide/configuration/>.

 Podbringer currently has no front-end or web interface yet that can help you
 use it. Until then, you just have to enter the right service-specific RSS feed
-URL in your favorite podcast client to start using it.
-
-Given the Mixcloud URL <https://www.mixcloud.com/myfavouriteband/>, the URL you
-need to use for Podbringer is comprised of the following parts:
+URL in your favorite podcast client to start using it. For example:

 ```text
  https://my.domain.tld/podbringer/feed/mixcloud/myfavouriteband
-  |------------------------------|     |-------||--------------|
-   The Podbringer public URL            Service  User @ service
+  |------------------------------|      |------| |-------------|
+   The Podbringer public URL            Service   Service ID
 ```

+So, the URL consists of the location of Podbringer, the fact that you want the feed,
+the name of the service and the ID that identifies something listed on that service.
+
 ### Feed item limit

 To prevent feeds with a very large number of items, any feed that is returned
@@ -62,7 +63,43 @@ contains at most 50 items by default. If you want to have more (or less) items,
 provide the limit in the URL by setting the `limit` parameter.

 For example, to get up until 1000 items the URL becomes:
-`https://my.domain.tld/podbringer/feed/mixcloud/myfavouriteband?limit=1000`
+
+```text
+  https://my.domain.tld/podbringer/feed/mixcloud/myfavouriteband?limit=1000
+```
+
+### Service: Mixcloud
+
+For Mixcloud, a feed can be constructed of everything that a user posted.
+Given a Mixcloud URL like <https://www.mixcloud.com/myfavouriteband/>, the
+`myfavouriteband` part of the URL is the Mixcloud username and can be used as
+the service ID.
+
+```text
+  https://my.domain.tld/podbringer/feed/mixcloud/myfavouriteband
+  |------------------------------|      |------| |-------------|
+   The Podbringer public URL            Service   Username
+```
+
+### Service: YouTube
+
+For YouTube, a feed can be constructed from either a channel or a playlist.
+Given a YouTube channel URL like <https://www.youtube.com/c/favouritechannel>,
+the `favouritechannel` part of the URL is the YouTube channel ID.
+Given a YouTube playlist URL like
+<https://www.youtube.com/playlist?list=PLsomeplaylistidentifier>, the
+`PLsomeplaylistidentifier` part of the URL is the YouTube playlist ID.
+Either the channel or playlist ID can be used as the service ID.
+
+```text
+  https://my.domain.tld/podbringer/feed/youtube/favouritechannel
+  |------------------------------|      |-----| |--------------|
+   The Podbringer public URL            Service  Channel ID
+
+  https://my.domain.tld/podbringer/feed/youtube/PLsomeplaylistidentifier
+  |------------------------------|      |-----| |----------------------|
+   The Podbringer public URL            Service  Playlist ID
+```

 ## License

--- a/src/backends.rs
+++ b/src/backends.rs
@@ -15,19 +15,25 @@ use reqwest::Url;
 use crate::{Error, Result};

 pub(crate) mod mixcloud;
+pub(crate) mod youtube;

 /// Retrieves the back-end for the provided ID (if supported).
 pub(crate) fn get(backend: &str) -> Result<Backends> {
    match backend {
        "mixcloud" => Ok(Backends::Mixcloud(mixcloud::backend())),
+        "youtube" => Ok(Backends::YouTube(youtube::backend())),
        _ => Err(Error::UnsupportedBackend(backend.to_string())),
    }
 }

-/// The support back-ends.
+/// The supported back-ends.
 #[enum_dispatch(Backend)]
 pub(crate) enum Backends {
+    /// Mixcloud (<https://www.mixcloud.com>)
    Mixcloud(mixcloud::Backend),
+
+    /// YouTube (<https://www.youtube.com>)
+    YouTube(youtube::Backend),
 }

 /// Functionality of a content back-end.
--- a/src/backends/mixcloud.rs
+++ b/src/backends/mixcloud.rs
@@ -199,7 +199,8 @@ impl From<UserWithCloudcasts> for Channel {
 impl From<Cloudcast> for Item {
    fn from(cloudcast: Cloudcast) -> Self {
        let mut file = PathBuf::from(cloudcast.key.trim_end_matches('/'));
-        file.set_extension("m4a"); // FIXME: Don't hardcoded the extension!
+        let extension = mime_db::extension(DEFAULT_FILE_TYPE).expect("MIME type has extension");
+        file.set_extension(extension);

        // FIXME: Don't hardcode the description!
        let description = Some(format!("Taken from Mixcloud: {0}", cloudcast.url));
--- a/src/backends/youtube.rs
+++ b/src/backends/youtube.rs
@@ -0,0 +1,342 @@
+//! The YouTube back-end.
+//!
+//! It uses the `ytextract` crate to retrieve the feed (channel or playlist) and items (videos).
+
+use std::path::{Path, PathBuf};
+
+use async_trait::async_trait;
+use cached::proc_macro::cached;
+use chrono::{DateTime, Utc};
+use reqwest::Url;
+use rocket::futures::StreamExt;
+use ytextract::playlist::video::{Error as YouTubeVideoError, Video as YouTubePlaylistVideo};
+use ytextract::{
+    Channel as YouTubeChannel, Client, Playlist as YouTubePlaylist, Stream as YouTubeStream,
+    Video as YouTubeVideo,
+};
+
+use super::{Channel, Enclosure, Item};
+use crate::{Error, Result};
+
+/// The base URL for YouTube channels.
+const CHANNEL_BASE_URL: &str = "https://www.youtube.com/channel";
+
+/// The default item limit.
+const DEFAULT_ITEM_LIMIT: usize = 50;
+
+/// The base URL for YouTube playlists.
+const PLAYLIST_BASE_URL: &str = "https://www.youtube.com/playlist";
+
+/// The base URL for YouTube videos.
+const VIDEO_BASE_URL: &str = "https://www.youtube.com/watch";
+
+/// Creates a YouTube back-end.
+pub(crate) fn backend() -> Backend {
+    Backend::new()
+}
+
+/// The YouTube back-end.
+pub struct Backend {
+    /// The client capable of interacting with YouTube.
+    client: Client,
+}
+
+impl Backend {
+    /// Creates a new YouTube back-end.
+    fn new() -> Self {
+        let client = Client::new();
+
+        Self { client }
+    }
+}
+
+#[async_trait]
+impl super::Backend for Backend {
+    fn name(&self) -> &'static str {
+        "YouTube"
+    }
+
+    async fn channel(&self, channel_id: &str, item_limit: Option<usize>) -> Result<Channel> {
+        // We assume it is a YouTube playlist ID if the channel ID starts with
+        // "PL"/"OLAK"/"RDCLAK"; it is considered to be a YouTube channel ID otherwise.
+        if channel_id.starts_with("PL")
+            || channel_id.starts_with("OLAK")
+            || channel_id.starts_with("RDCLAK")
+        {
+            let (yt_playlist, yt_videos_w_streams) =
+                fetch_playlist_videos(&self.client, channel_id, item_limit).await?;
+
+            Ok(Channel::from(YouTubePlaylistWithVideos(
+                yt_playlist,
+                yt_videos_w_streams,
+            )))
+        } else {
+            let (yt_channel, yt_videos_w_streams) =
+                fetch_channel_videos(&self.client, channel_id, item_limit).await?;
+
+            Ok(Channel::from(YouTubeChannelWithVideos(
+                yt_channel,
+                yt_videos_w_streams,
+            )))
+        }
+    }
+
+    async fn redirect_url(&self, file: &Path) -> Result<String> {
+        let id_part = file.with_extension("");
+        let video_id = id_part.to_string_lossy();
+
+        retrieve_redirect_url(&self.client, &video_id).await
+    }
+}
+
+/// A YouTube playlist with its videos.
+#[derive(Clone, Debug)]
+pub(crate) struct YouTubePlaylistWithVideos(YouTubePlaylist, Vec<YouTubeVideoWithStream>);
+
+/// A YouTube channel with its videos.
+#[derive(Clone, Debug)]
+pub(crate) struct YouTubeChannelWithVideos(YouTubeChannel, Vec<YouTubeVideoWithStream>);
+
+/// A YouTube video with its stream.
+#[derive(Clone, Debug)]
+struct YouTubeVideoWithStream {
+    /// The information of the YouTube video.
+    video: YouTubeVideo,
+
+    /// The metadata of the selected YouTube stream.
+    stream: YouTubeStream,
+
+    /// The content length of the selected YouTube stream.
+    content_length: u64,
+}
+
+impl From<YouTubeChannelWithVideos> for Channel {
+    fn from(
+        YouTubeChannelWithVideos(yt_channel, yt_videos_w_streams): YouTubeChannelWithVideos,
+    ) -> Self {
+        let mut link = Url::parse(CHANNEL_BASE_URL).expect("valid URL");
+        let title = format!("{0} (via YouTube)", yt_channel.name());
+        let description = yt_channel.description().to_string();
+        link.path_segments_mut()
+            .expect("valid URL")
+            .push(&yt_channel.id());
+        let author = Some(yt_channel.name().to_string());
+        // FIXME: Don't hardcode the category!
+        let categories = Vec::from([String::from("Channel")]);
+        let image = yt_channel
+            .avatar()
+            .max_by_key(|av| av.width * av.height)
+            .map(|av| av.url.clone());
+        let items = yt_videos_w_streams.into_iter().map(Item::from).collect();
+
+        Channel {
+            title,
+            link,
+            description,
+            author,
+            categories,
+            image,
+            items,
+        }
+    }
+}
+
+impl From<YouTubePlaylistWithVideos> for Channel {
+    fn from(
+        YouTubePlaylistWithVideos(yt_playlist, yt_videos_w_streams): YouTubePlaylistWithVideos,
+    ) -> Self {
+        let title = format!("{0} (via YouTube)", yt_playlist.title());
+        let mut link = Url::parse(PLAYLIST_BASE_URL).expect("valid URL");
+        let description = yt_playlist.description().to_string();
+        link.query_pairs_mut()
+            .append_pair("list", &yt_playlist.id().to_string());
+        let author = yt_playlist.channel().map(|chan| chan.name().to_string());
+        // FIXME: Don't hardcode the category!
+        let categories = Vec::from([String::from("Playlist")]);
+        let image = yt_playlist
+            .thumbnails()
+            .iter()
+            .max_by_key(|tn| tn.width * tn.height)
+            .map(|tn| tn.url.clone());
+        let items = yt_videos_w_streams.into_iter().map(Item::from).collect();
+
+        Channel {
+            title,
+            link,
+            description,
+            author,
+            categories,
+            image,
+            items,
+        }
+    }
+}
+
+impl From<YouTubeVideoWithStream> for Item {
+    fn from(
+        YouTubeVideoWithStream {
+            video,
+            stream,
+            content_length: length,
+        }: YouTubeVideoWithStream,
+    ) -> Self {
+        let id = video.id().to_string();
+
+        let mime_type = stream.mime_type().to_string();
+        // Ignore everything from MIME type parameter separator on for extension look-up.
+        let mime_sep = mime_type.find(';').unwrap_or(mime_type.len());
+        let extension = mime_db::extension(&mime_type[..mime_sep]).unwrap_or_default();
+        let file = PathBuf::from(&id).with_extension(extension);
+        let enclosure = Enclosure {
+            file,
+            mime_type,
+            length,
+        };
+
+        let mut link = Url::parse(VIDEO_BASE_URL).expect("valid URL");
+        link.query_pairs_mut().append_pair("v", &id);
+        let video_description = video.description();
+        let description = Some(format!("{video_description}\n\nTaken from YouTube: {link}"));
+        let categories = video
+            .hashtags()
+            .filter(|hashtag| !hashtag.trim().is_empty())
+            .map(|hashtag| {
+                let url = Url::parse(&format!(
+                    "https://www.youtube.com/hashtag/{}",
+                    hashtag.trim_start_matches('#')
+                ))
+                .expect("valid URL");
+
+                (hashtag.to_string(), url)
+            })
+            .collect();
+        let duration = Some(video.duration().as_secs() as u32);
+        let keywords = video.keywords().clone();
+        let image = video
+            .thumbnails()
+            .iter()
+            .max_by_key(|tn| tn.width * tn.height)
+            .map(|tn| tn.url.clone());
+        let timestamp = video
+            .date()
+            .and_hms_opt(12, 0, 0)
+            .expect("Invalid hour, minute and/or second");
+        let updated_at = DateTime::from_utc(timestamp, Utc);
+
+        Item {
+            title: video.title().to_string(),
+            link,
+            description,
+            categories,
+            enclosure,
+            duration,
+            guid: id,
+            keywords,
+            image,
+            updated_at,
+        }
+    }
+}
+
+/// Fetches the YouTube playlist videos for the given ID.
+///
+/// If the result is [`Ok`], the playlist will be cached for 24 hours for the given playlist ID.
+#[cached(
+    key = "(String, Option<usize>)",
+    convert = r#"{ (playlist_id.to_owned(), item_limit) }"#,
+    time = 86400,
+    result = true
+)]
+async fn fetch_playlist_videos(
+    client: &Client,
+    playlist_id: &str,
+    item_limit: Option<usize>,
+) -> Result<(YouTubePlaylist, Vec<YouTubeVideoWithStream>)> {
+    let id = playlist_id.parse()?;
+    let limit = item_limit.unwrap_or(DEFAULT_ITEM_LIMIT);
+    let yt_playlist = client.playlist(id).await?;
+    let yt_videos_w_streams = yt_playlist
+        .videos()
+        .filter_map(fetch_stream)
+        .take(limit)
+        .collect()
+        .await;
+
+    Ok((yt_playlist, yt_videos_w_streams))
+}
+
+/// Fetches the YouTube channel videos for the given ID.
+#[cached(
+    key = "(String, Option<usize>)",
+    convert = r#"{ (channel_id.to_owned(), item_limit) }"#,
+    time = 86400,
+    result = true
+)]
+async fn fetch_channel_videos(
+    client: &Client,
+    channel_id: &str,
+    item_limit: Option<usize>,
+) -> Result<(YouTubeChannel, Vec<YouTubeVideoWithStream>)> {
+    let id = channel_id.parse()?;
+    let limit = item_limit.unwrap_or(DEFAULT_ITEM_LIMIT);
+    let yt_channel = client.channel(id).await?;
+    let yt_videos_w_streams = yt_channel
+        .uploads()
+        .await?
+        .filter_map(fetch_stream)
+        .take(limit)
+        .collect()
+        .await;
+
+    Ok((yt_channel, yt_videos_w_streams))
+}
+
+/// Fetches the stream and relevant metadata for a YouTube video result.
+///
+/// If the playlist video result is an error, the video is discarded/ignored.
+/// If there are problems retrieving the streams or metadata, the video is also discarded.
+async fn fetch_stream(
+    yt_video: Result<YouTubePlaylistVideo, YouTubeVideoError>,
+) -> Option<YouTubeVideoWithStream> {
+    match yt_video {
+        Ok(video) => {
+            let video = video.upgrade().await.ok()?;
+            let stream = video
+                .streams()
+                .await
+                .ok()?
+                .filter(|v| v.is_audio())
+                .max_by_key(|v| v.bitrate())?;
+            let content_length = stream.content_length().await.ok()?;
+
+            Some(YouTubeVideoWithStream {
+                video,
+                stream,
+                content_length,
+            })
+        }
+        Err(_) => None,
+    }
+}
+
+/// Retrieves the redirect URL for the provided YouTube video ID.
+///
+/// If the result is [`Ok`], the redirect URL will be cached for 24 hours for the given video ID.
+#[cached(
+    key = "String",
+    convert = r#"{ video_id.to_owned() }"#,
+    time = 86400,
+    result = true
+)]
+async fn retrieve_redirect_url(client: &Client, video_id: &str) -> Result<String> {
+    let video_id = video_id.parse()?;
+    let video = client.video(video_id).await?;
+    let stream = video
+        .streams()
+        .await?
+        .filter(|v| v.is_audio())
+        .max_by_key(|v| v.bitrate())
+        .ok_or(Error::NoRedirectUrlFound)?;
+
+    Ok(stream.url().to_string())
+}
--- a/src/feed.rs
+++ b/src/feed.rs
@@ -28,7 +28,9 @@ pub(crate) fn construct(backend_id: &str, config: &Config, channel: Channel) ->
                .unwrap_or_default(),
        )
        .build();
-    let mut last_build = DateTime::<Utc>::from_utc(NaiveDateTime::from_timestamp(0, 0), Utc);
+    let unix_timestamp = NaiveDateTime::from_timestamp_opt(0, 0)
+        .expect("Out-of-range seconds or invalid nanoseconds");
+    let mut last_build = DateTime::from_utc(unix_timestamp, Utc);
    let generator = String::from(concat!(
        env!("CARGO_PKG_NAME"),
        " ",
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -54,6 +54,26 @@ pub(crate) enum Error {
    /// An error occurred in youtube-dl.
    #[error("Youtube-dl failed: {0}")]
    YoutubeDl(#[from] youtube_dl::Error),
+
+    /// A YouTube extract error occurred.
+    #[error("YouTube extract error: {0}")]
+    YtExtract(#[from] ytextract::Error),
+
+    /// A YouTube extract ID parsing error occurred.
+    #[error("YouTube extract ID parsing error: {0}")]
+    YtExtractId0(#[from] ytextract::error::Id<0>),
+
+    /// A YouTube extract ID parsing error occurred.
+    #[error("YouTube extract ID parsing error: {0}")]
+    YtExtractId11(#[from] ytextract::error::Id<11>),
+
+    /// A YouTube extract ID parsing error occurred.
+    #[error("YouTube extract ID parsing error: {0}")]
+    YtExtractId24(#[from] ytextract::error::Id<24>),
+
+    /// A YouTube extract playlist video error occurred.
+    #[error("YouTube extract playlist video error: {0}")]
+    YtExtractPlaylistVideo(#[from] ytextract::playlist::video::Error),
 }

 impl<'r, 'o: 'r> rocket::response::Responder<'r, 'o> for Error {
--- a/templates/index.html.tera
+++ b/templates/index.html.tera
@@ -5,15 +5,21 @@
  URL in your favorite podcast client to start using it.
 </p>
 <p>
-  Given the Mixcloud URL <https://www.mixcloud.com/myfavouriteband/>, the URL you
-  need to use for Podbringer is comprised of the following parts:
+  The URL you need to use for Podbringer is comprised of the following parts:

  <pre>
    https://my.domain.tld/podbringer/feed/mixcloud/myfavouriteband
-    |------------------------------|     |-------||--------------|
-     The Podbringer public URL            Service  User @ service
+    |------------------------------|      |------| |-------------|
+     The Podbringer public URL            Service   Service ID
  </pre>
 </p>
 <p>
-  The Podbringer location URL of this instance is: {{ url }}
+  Supported services are:
+  <ul>
+    <li>Mixcloud (service ID is Mixcloud username)</li>
+    <li>YouTube (service ID is YouTube channel or playlist ID)</li>
+  </ul>
+</p>
+<p>
+  The Podbringer location URL of this instance is: <a href="{{ url }}">{{ url }}</a>.
 </p>
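
The channel-versus-playlist decision made in `Backend::channel` can be exercised in isolation. The following is a sketch under assumptions: `FeedKind`, `PLAYLIST_PREFIXES` and `feed_kind` are hypothetical names that do not appear in this change; only the three prefixes are taken from it.

```rust
/// The kind of YouTube feed a service ID refers to (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
enum FeedKind {
    Channel,
    Playlist,
}

/// Prefixes that are treated as playlist IDs, mirroring the check in `Backend::channel`.
const PLAYLIST_PREFIXES: [&str; 3] = ["PL", "OLAK", "RDCLAK"];

/// Classifies a service ID purely by its prefix; anything else is assumed to be a channel.
fn feed_kind(service_id: &str) -> FeedKind {
    if PLAYLIST_PREFIXES
        .iter()
        .any(|prefix| service_id.starts_with(*prefix))
    {
        FeedKind::Playlist
    } else {
        FeedKind::Channel
    }
}
```

Since the check is purely prefix-based, a channel ID that happens to start with one of these prefixes would be treated as a playlist.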