Time Adjustment Request

2024-09-18 20:53 UTC o1-mini

Subtract 11 seconds from all times

ACT_03-WIL_02_LEFT

Id	Part Name	Start	End
01	Wandering	00:11	00:17
02	Sitting down	00:17	01:09
03	Interest	01:09	01:18
04	Contortion	01:18	01:30
05	Wandering Towards	01:30	01:38
06	Open Arms Vibration	01:38	01:58
07	Slow Forces	01:58	02:05
08	Hard Forces	02:05	02:22
09	Searching for Nothing	02:22	02:39
10	Enslaved	02:39	03:03
11	Balancing Act	03:03	03:25
12	Resurrection	03:25	03:49
13	Laying	03:49	03:54
14	Arrow	03:54	04:27
15	Hard Forces_02	04:27	04:41
16	Violence	04:41	04:59
17	Regret	04:59	05:14
18	Hiding	05:14	06:30
19	Faceless	06:30	06:43
20	Normalization	06:43	07:01
21	Calibration	07:01	07:44
22	Rotations	07:44	08:30
23	Wandering_02	08:30	08:51
24	Claps	08:51	09:04
25	Wandering Towards_02	09:04	09:07
26	Crawling Away	09:07	09:53
27	Wandering Towards_03	09:53	10:09
28	Mad Dance	10:09	10:35

Sure, here’s the updated table with 11 seconds subtracted from all Start and End times:

ACT_03-WIL_02_LEFT

Id	Part Name	Start	End
01	Wandering	00:00	00:06
02	Sitting down	00:06	00:58
03	Interest	00:58	01:07
04	Contortion	01:07	01:19
05	Wandering Towards	01:19	01:27
06	Open Arms Vibration	01:27	01:47
07	Slow Forces	01:47	01:54
08	Hard Forces	01:54	02:11
09	Searching for Nothing	02:11	02:28
10	Enslaved	02:28	02:52
11	Balancing Act	02:52	03:14
12	Resurrection	03:14	03:38
13	Laying	03:38	03:43
14	Arrow	03:43	04:16
15	Hard Forces_02	04:16	04:30
16	Violence	04:30	04:48
17	Regret	04:48	05:03
18	Hiding	05:03	06:19
19	Faceless	06:19	06:32
20	Normalization	06:32	06:50
21	Calibration	06:50	07:33
22	Rotations	07:33	08:19
23	Wandering_02	08:19	08:40
24	Claps	08:40	08:53
25	Wandering Towards_02	08:53	08:56
26	Crawling Away	08:56	09:42
27	Wandering Towards_03	09:42	09:58
28	Mad Dance	09:58	10:24

Notes:

No adjusted time fell below 00:00: the earliest start, 00:11, maps exactly to 00:00, so no clamping was needed.
Every Start and End time has been reduced by exactly 11 seconds.
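
For reference, the same adjustment can be sketched in Python. This is a minimal illustration, not part of the original request: the helper name shift_timestamp and the two sample rows are assumptions, and the clamp at 00:00 mirrors the note above.

```python
def shift_timestamp(time_str: str, offset_seconds: int) -> str:
    """Shift an MM:SS timestamp by offset_seconds, clamping at 00:00."""
    minutes, seconds = map(int, time_str.split(":"))
    total = max(0, minutes * 60 + seconds + offset_seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


# Two sample rows from the table: (Id, Part Name, Start, End)
rows = [
    ("01", "Wandering", "00:11", "00:17"),
    ("02", "Sitting down", "00:17", "01:09"),
]

# Subtract 11 seconds from every Start and End value
shifted = [
    (part_id, name, shift_timestamp(start, -11), shift_timestamp(end, -11))
    for part_id, name, start, end in rows
]

for row in shifted:
    print("\t".join(row))
```

Applying the helper to every row of the table yields the adjusted Start and End columns without manual arithmetic.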

Can this movie splitting script become much faster?

import os
import re
import moviepy.editor as mp
import markdown2
import pandas as pd

# Function to convert time format (MM:SS) to seconds
def time_to_seconds(time_str):
    minutes, seconds = map(int, time_str.split(':'))
    return minutes * 60 + seconds

# Parse the markdown file and extract the table associated with the video file
def parse_markdown_for_video_table(markdown_file, video_file_name):
    with open(markdown_file, 'r') as f:
        markdown_content = f.read()

    # Convert markdown to HTML using markdown2
    html = markdown2.markdown(markdown_content, extras=['tables', 'code-friendly'])

    headings = re.findall(r'<h[1-6]>(.*?)<\/h[1-6]>', html)

    # Print all found headings
    print("Headings found in the markdown file:")
    for heading in headings:
        print(heading)

    # Find the matching heading based on video file name
    video_base_name = os.path.splitext(os.path.basename(video_file_name))[0]
    heading_pattern = f"<h4>{video_base_name}</h4>"

    # Split the HTML by the matched heading
    split_html = html.split(heading_pattern)

    if len(split_html) < 2:
        raise ValueError(f"No heading matching '{video_base_name}' found in the markdown file.")

    # We want the part of the document after the heading, so we take split_html[1]
    # Convert it to pandas DataFrame and extract the first table
    tables = pd.read_html(split_html[1])
    if len(tables) == 0:
        raise ValueError(f"No tables found under heading '{video_base_name}' in the markdown file.")

    return tables[0]  # Return the first table under the matched heading

# Function to split the video based on the extracted table
def split_video(video_file, markdown_file, output_dir):
    # Parse the markdown file to extract the correct table based on the video file name
    table = parse_markdown_for_video_table(markdown_file, video_file)

    # Load the video file
    video = mp.VideoFileClip(video_file)

    # Create output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)

    # Iterate over the rows of the table and split the video
    for _, row in table.iterrows():
        part_no = str(row['Id']).zfill(2)  # Pad part number with leading zero
        part_name = re.sub(r'\W+', '_', row['Part Name'])  # Sanitize file name
        start_time = time_to_seconds(row['Start'])
        end_time = time_to_seconds(row['End'])

        # Cut the video clip
        clip = video.subclip(start_time, end_time)

        # Create the output filename
        original_filename = os.path.splitext(os.path.basename(video_file))[0]
        output_filename = f"{original_filename}_Part{part_no}_{part_name}.mp4"
        output_filepath = os.path.join(output_dir, output_filename)

        # Write the video clip to file while preserving codec, resolution, and fps
        clip.write_videofile(output_filepath, codec='libx264', preset='slow', fps=video.fps,
                             audio_codec='aac', temp_audiofile='temp-audio.m4a', remove_temp=True)
        print(f"Created {output_filepath}")

# Define paths
markdown_file = "video_database.md"  # Path to your markdown file
video_file = os.path.expanduser("//wsl.localhost/Ubuntu/home/ulc/monolith/act_03/completed/ACT_03-WIL_02_LEFT-EDIT.mp4")  # Path to your video file
output_dir = os.path.expanduser("//wsl.localhost/Ubuntu/home/ulc/monolith/act_03/ACT_03-WIL_02_LEFT")  # Directory to save the split videos

# Run the script
split_video(video_file, markdown_file, output_dir)

Yes, your movie splitting script can be significantly optimized for speed. The primary bottleneck in your current script is the use of MoviePy for splitting and re-encoding video clips, which is inherently slow, especially with the preset='slow' setting.

To achieve much faster performance, consider the following optimizations:

Use FFmpeg Directly for Splitting:
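- Stream Copying: Instead of re-encoding each clip, use FFmpeg’s -c copy option, which copies the compressed audio and video packets into the new file without decoding them. This is typically far faster than re-encoding. The trade-off is that a copied video stream can only be cut cleanly at keyframes, so clip boundaries may not land exactly on the requested times.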
- Batch Processing: Utilize FFmpeg’s ability to handle multiple split points in a single command, reducing overhead.
Optimize Markdown Parsing:
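- Simplify Table Extraction: Instead of matching headings in the generated HTML with regex and string splitting, convert the Markdown with a library such as markdown or mistune and walk the resulting HTML with a real parser to locate the table. Parsing is a small share of the total runtime, so this mainly improves robustness rather than speed.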
- Reduce I/O Operations: Read and process the markdown content more efficiently to minimize parsing time.
Parallelize Video Splitting:
- Multithreading or Multiprocessing: Split the video clips in parallel to leverage multiple CPU cores, thereby reducing overall processing time.
Minimize Disk I/O:
- Temporary Files: Avoid creating unnecessary temporary files during processing.
- Efficient File Handling: Ensure that file operations are optimized to prevent bottlenecks.

Below is an optimized version of your script incorporating these improvements:

Optimized Script Using FFmpeg and Concurrent Processing

import os
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from io import StringIO
import pandas as pd
import markdown
from bs4 import BeautifulSoup

# Function to convert time format (MM:SS) to seconds
def time_to_seconds(time_str):
    minutes, seconds = map(int, time_str.split(':'))
    return minutes * 60 + seconds

# Parse the markdown file and extract the table associated with the video file
def parse_markdown_for_video_table(markdown_file, video_file_name):
    with open(markdown_file, 'r') as f:
        markdown_content = f.read()

    # Convert the markdown to HTML, then parse the HTML with BeautifulSoup
    html = markdown.markdown(markdown_content, extensions=['tables'])
    soup = BeautifulSoup(html, 'html.parser')

    # Find all headings
    headings = soup.find_all(re.compile('^h[1-6]$'))

    print("Headings found in the markdown file:")
    for heading in headings:
        print(heading.get_text())

    # Find the heading that matches the video base name
    video_base_name = os.path.splitext(os.path.basename(video_file_name))[0]
    target_heading = None
    for heading in headings:
        if heading.get_text().strip() == video_base_name:
            target_heading = heading
            break

    if not target_heading:
        raise ValueError(f"No heading matching '{video_base_name}' found in the markdown file.")

    # Take the first table among the siblings that follow the heading
    table = target_heading.find_next_sibling('table')
    if not table:
        raise ValueError(f"No table found under heading '{video_base_name}' in the markdown file.")

    # Convert the HTML table to a pandas DataFrame.
    # Wrapping the markup in StringIO avoids the literal-HTML deprecation in pandas 2.1+.
    df = pd.read_html(StringIO(str(table)))[0]
    return df

# Function to sanitize file names
def sanitize_filename(name):
    return re.sub(r'\W+', '_', name)

# Function to split a single clip using FFmpeg
def split_clip_ffmpeg(video_file, start_time, end_time, output_filepath):
    cmd = [
        'ffmpeg',
        '-y',  # Overwrite output files without asking
        '-i', video_file,
        '-ss', str(start_time),
        '-to', str(end_time),
        '-c', 'copy',
        output_filepath
    ]
    try:
        subprocess.run(cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        print(f"Created {output_filepath}")
    except subprocess.CalledProcessError as e:
        print(f"Error creating {output_filepath}: {e.stderr.decode()}")

# Function to split the video based on the extracted table
def split_video(video_file, markdown_file, output_dir, max_workers=4):
    # Parse the markdown file to extract the correct table based on the video file name
    table = parse_markdown_for_video_table(markdown_file, video_file)

    # Create output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)

    # Prepare split tasks
    split_tasks = []
    original_filename = os.path.splitext(os.path.basename(video_file))[0]
    for _, row in table.iterrows():
        part_no = str(row['Id']).zfill(2)  # Pad part number with leading zero
        part_name = sanitize_filename(row['Part Name'])  # Sanitize file name
        start_time = time_to_seconds(row['Start'])
        end_time = time_to_seconds(row['End'])

        output_filename = f"{original_filename}_Part{part_no}_{part_name}.mp4"
        output_filepath = os.path.join(output_dir, output_filename)

        split_tasks.append((start_time, end_time, output_filepath))

    # Use ThreadPoolExecutor to split clips in parallel
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(split_clip_ffmpeg, video_file, start, end, filepath)
            for start, end, filepath in split_tasks
        ]
        for future in as_completed(futures):
            future.result()  # To catch exceptions if any

# Define paths
markdown_file = "video_database.md"  # Path to your markdown file
video_file = os.path.expanduser("//wsl.localhost/Ubuntu/home/ulc/monolith/act_03/completed/ACT_03-WIL_02_LEFT-EDIT.mp4")  # Path to your video file
output_dir = os.path.expanduser("//wsl.localhost/Ubuntu/home/ulc/monolith/act_03/ACT_03-WIL_02_LEFT")  # Directory to save the split videos

# Run the optimized script
if __name__ == "__main__":
    split_video(video_file, markdown_file, output_dir, max_workers=8)

Key Optimizations Explained

Using FFmpeg for Stream Copying:
- The split_clip_ffmpeg function utilizes FFmpeg with the -c copy option, which avoids re-encoding and performs much faster splitting.
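- The -ss and -to options set the start and end time of each clip. Because they follow -i here, FFmpeg treats them as output options. Combined with -c copy, the cut may not fall exactly on the requested second, since a copied video stream can only start cleanly at a keyframe.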
Efficient Markdown Parsing:
- Markdown Parsing with markdown and BeautifulSoup: Instead of converting to HTML with markdown2 and using regex, this approach uses the markdown library with the BeautifulSoup parser to directly navigate the HTML structure and extract tables.
- Reduced Complexity: This method is more robust and faster, especially for large markdown files.
Parallel Processing:
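- ThreadPoolExecutor: Each clip is cut by its own FFmpeg process, and Python’s concurrent.futures.ThreadPoolExecutor runs several of those processes at once. Threads are sufficient because the Python side only waits on the subprocess. With stream copying the work is mostly disk I/O, so the gain from parallelism depends more on storage throughput than on CPU cores.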
- Configurable Workers: You can adjust max_workers based on your CPU cores and system capabilities for optimal performance.
Sanitized Filenames:
- The sanitize_filename function ensures that all output filenames are safe and compatible with the filesystem by replacing non-word characters with underscores.
Error Handling:
- The split_clip_ffmpeg function catches subprocess.CalledProcessError and prints FFmpeg’s stderr, so a failed clip is reported without stopping the remaining clips.
Logging:
- The script prints the headings it finds and one line per finished clip. This output is negligible next to the video work, so it is kept for progress tracking.
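
As a general property of -c copy, a copied video stream can only be cut cleanly at keyframes, so the stream-copy approach above trades accuracy for speed. The sketch below shows one way to keep both options available. The function name build_split_cmd, the accurate flag, and the veryfast preset are assumptions for illustration, not part of the script above; the function only builds the argument list, so it can be inspected without running FFmpeg.

```python
def build_split_cmd(video_file, start, end, output_filepath, accurate=False):
    """Build an FFmpeg argument list for one clip.

    accurate=False: stream copy (fast, cuts land near keyframes).
    accurate=True:  re-encode with libx264/aac (slower, cuts on any frame).
    """
    cmd = ["ffmpeg", "-y", "-i", video_file, "-ss", str(start), "-to", str(end)]
    if accurate:
        # Re-encoding lets the cut fall on the exact requested time.
        # The 'veryfast' preset is an assumed speed/size compromise.
        cmd += ["-c:v", "libx264", "-preset", "veryfast", "-c:a", "aac"]
    else:
        # Copy the compressed streams as-is, without decoding.
        cmd += ["-c", "copy"]
    cmd.append(output_filepath)
    return cmd


fast_cmd = build_split_cmd("input.mp4", 0, 6, "part01.mp4")
exact_cmd = build_split_cmd("input.mp4", 0, 6, "part01.mp4", accurate=True)
print(" ".join(fast_cmd))
print(" ".join(exact_cmd))
```

Either list can be passed to subprocess.run in place of the cmd list in split_clip_ffmpeg if exact boundaries matter more than speed for a given clip.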

Additional Recommendations

Install FFmpeg:
- Ensure that FFmpeg is installed and accessible in your system’s PATH. You can download it from FFmpeg’s official website.
Adjust max_workers:
- Depending on your system’s CPU and I/O capabilities, you may adjust the max_workers parameter to optimize parallel processing performance.
Batch Processing with FFmpeg (Advanced):
- For even more advanced optimizations, you can create a single FFmpeg command that handles multiple split points. However, this requires more complex scripting and is beyond the scope of this optimized script.
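
The single-command idea can be sketched as follows, relying on FFmpeg's rule that output options apply to the next output file listed, so one invocation can read the input once and write several clips. The function name build_batch_cmd and the sample file names are assumptions for illustration; the keyframe caveat of stream copying still applies.

```python
def build_batch_cmd(video_file, clips):
    """Build one FFmpeg argument list that writes every clip in a single run.

    clips is an iterable of (start_seconds, end_seconds, output_filepath).
    Each -ss/-to/-c group applies only to the output path that follows it.
    """
    cmd = ["ffmpeg", "-y", "-i", video_file]
    for start, end, output_filepath in clips:
        cmd += ["-ss", str(start), "-to", str(end), "-c", "copy", output_filepath]
    return cmd


batch_cmd = build_batch_cmd(
    "input.mp4",
    [(0, 6, "part01.mp4"), (6, 58, "part02.mp4")],
)
print(" ".join(batch_cmd))
```

The resulting list can be handed to subprocess.run(batch_cmd, check=True). With many clips the command line grows long, so splitting the clip list into a few batches may be more practical than one very large command.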

Performance Comparison

Switching to FFmpeg’s stream copying removes the x264 encode entirely, and running clips in parallel overlaps the remaining work, so a speed-up of 10x or more over the original MoviePy-based script is plausible. That figure is an estimate rather than a measurement; the actual gain depends on your hardware, your storage, and the number of clips being split.
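
To replace the estimate with a measurement, both versions can be timed on the same input. The helper below is a minimal sketch; the name timed is an assumption, and it is demonstrated on a trivial function so that it runs without any video files.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Demonstration on a trivial call; for the real comparison, pass split_video
# and its arguments instead, once per version of the script.
result, elapsed = timed(sum, [1, 2, 3])
print(f"result={result}, elapsed={elapsed:.6f}s")
```

Running timed(split_video, video_file, markdown_file, output_dir) once with each script gives two durations whose ratio is the real speed-up on your machine.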

Final Notes

This optimized script should provide a substantial speed boost for your video splitting tasks. It leverages the power of FFmpeg for efficient processing and utilizes Python’s concurrent capabilities to maximize performance. Make sure to test the script with a few clips first to ensure it meets your requirements before scaling up to larger batches.

If you encounter any issues or need further customization, feel free to ask!