Python Developer Needed to Debug & Stabilize Slide Generator Pipeline (DOCX → JSON → HTML → PDF) - Contract to Hire
Overview

I’ve built a custom ABA Slide Generator in Python that converts structured content (DOCX / JSON) into branded slide PDFs using a multi-layer pipeline:

Inputs → Planning → Pagination → Templating → Rendering → Outputs

Right now, the system works conceptually, but I need an experienced Python developer to:

✅ Troubleshoot bugs
✅ Fix broken logic
✅ Improve reliability
✅ Make the pipeline production-ready

You will work directly with an existing codebase and system architecture (illustrated in provided diagrams).

System Architecture (Stack)

The project is structured into 5 layers:

Layer 1: Inputs (Files)
- data/ (modules + topics in DOCX or JSON)
- config.json (branding + metadata)
- welcome.json (optional)
- templates/*.html (Jinja2 templates)

Layer 2: Planning (Python logic)
Core functions include:
- discover_modules
- build_slides_from_data_dir
- parse_topic_from_docx
- topicjson_to_template_data
This layer converts content into structured slide data.

Layer 3: Pagination / Overflow Logic
- estimate_outline_height
- split_outline_into_pages
Handles long content that must overflow into continuation slides.

Layer 4: Templating
Uses Jinja2 to render HTML slides from templates + data.
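
For a concrete picture of the Layer 3 overflow step, here is a minimal sketch of how height estimation and page splitting of this kind can work. It is not the project's actual code: the function names are borrowed from the list above, but the signatures, the layout constants, and the greedy packing strategy are all assumptions made for illustration.

```python
from typing import List

# Assumed layout constants (hypothetical; real values depend on the templates).
LINE_HEIGHT_PX = 40
CHARS_PER_LINE = 60
PAGE_HEIGHT_PX = 400


def estimate_outline_height(items: List[str]) -> int:
    """Estimate rendered height by counting wrapped lines per bullet."""
    total_lines = 0
    for text in items:
        # Ceiling division; an empty bullet still occupies one line.
        total_lines += max(1, -(-len(text) // CHARS_PER_LINE))
    return total_lines * LINE_HEIGHT_PX


def split_outline_into_pages(
    items: List[str], max_height: int = PAGE_HEIGHT_PX
) -> List[List[str]]:
    """Greedily pack bullets into pages; pages after the first are continuation slides."""
    pages: List[List[str]] = []
    current: List[str] = []
    for item in items:
        candidate = current + [item]
        # Start a new page only if the current one already holds something,
        # so a single oversized bullet still gets a page of its own.
        if current and estimate_outline_height(candidate) > max_height:
            pages.append(current)
            current = [item]
        else:
            current = candidate
    if current:
        pages.append(current)
    return pages


outline = [f"Point {i}" for i in range(1, 13)]
pages = split_outline_into_pages(outline)
for number, page in enumerate(pages, start=1):
    print(f"slide {number}: {len(page)} bullets")
```

With these assumed constants, twelve short bullets split into a first slide of ten and a continuation slide of two. Edge cases worth checking in any real implementation include empty outlines, a single bullet taller than a page, and estimates that drift from what the browser actually renders.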

Layer 5: Rendering & Outputs
- Playwright Chromium renders HTML → PDF
- Optional PyPDF2 PdfMerger combines pages
- Optional JSON export (sample_slides.json)
Outputs:
- Individual slide PDFs
- Combined PDF
- JSON representation

Helper Script: build_sample_slides.py

This script orchestrates the pipeline:
- Resolves paths
- Extracts data.zip if needed
- Imports slide_generator
- Loads config.json (+ optional welcome.json)
- Calls build_slides_from_data_dir()
- Writes output to sample_slides.json
- Then runs slide_generator.py to render PDFs

Command example:
python slide_generator.py --slides-config sample_slides.json --output --combined

What I Need You To Do

Your job will be to:

✅ Debug existing Python code
✅ Fix crashes, logic bugs, and edge cases
✅ Ensure overflow logic works correctly
✅ Validate DOCX → JSON → HTML → PDF flow
✅ Improve structure and error handling
✅ Make the pipeline stable and repeatable
✅ Document fixes briefly

This is not a greenfield build; this is a troubleshooting and refactoring job on a real pipeline.

Deliverables
- Working pipeline from input files → final PDFs
- Fixed build_sample_slides.py
- Fixed slide_generator.py
- Stable pagination / overflow logic
- Clean console output (no crashes)
- Short explanation of changes

Required Skills

Must have:
- Strong Python (3.x)
- Experience with file pipelines
- Jinja2 templating
- JSON + DOCX parsing
- Playwright or Puppeteer for PDF generation
- Debugging complex scripts
- Git / code review comfort

Nice to have:
- PDF generation experience
- Python CLI tools
- Workflow automation
- Data-driven templating systems

How to Apply (Important)

Please include in your proposal:
- Your experience debugging Python pipelines
- Have you worked with:
  - Jinja2?
  - Playwright / Puppeteer?
  - DOCX or PDF generation?
- A short explanation of how you’d approach debugging this stack
- Any similar project you’ve done

Bonus points if you mention:
- “multi-layer pipeline debugging”
- “templating + rendering systems”
- “overflow / pagination logic”

Project Type
- Fixed price or hourly (open to suggestion)
- Expected scope: 1–3 days of focused work
- Potential for long-term work if successful

Why This Project Is Interesting

This is a real-world document automation system that:
- Turns curriculum into slide decks
- Handles pagination automatically
- Produces professional PDFs
- Uses a modern stack (Python + Jinja2 + Playwright)

If you like debugging intelligent pipelines and making systems rock-solid, this will be a fun challenge.