Add table handling

Josh Rosario 6 months ago
parent
commit
87d493075e
1 changed file with 175 additions and 3 deletions
example/ChatGPTMarkdownExport.md (+175 -3)

@@ -1,12 +1,12 @@
 # ChatGPT Markdown Export
 
-*ChatGPT conversation saved Sat May 10 09:33:40 2025 converted to Markdown*
+*ChatGPT conversation saved Fri Jun 20 07:50:14 2025 converted to Markdown*
 
 ---
 
 **You:**
 
-I would like to save individual ChatGPT discussions formatted in Markdown so I may view them in my git repositories. Would you be able to generate a Python script that visits a public link, such as https://chatgpt.com/share/681f8d88-bbf0-8006-b630-b6910fbfd26a , reformats with Markdown, and outputs a .md file?
+I would like to save individual ChatGPT discussions formatted in Markdown so I may view them in my git repositories. Would you be able to generate a Python script that visits a public link, such as https://chatgpt.com/share/681e3b7e-9638-8006-9a44-047912d6b7a1 , reformats with Markdown, and outputs a .md file?
 
 ---
 
@@ -1764,4 +1764,176 @@ Will now correctly become:
 This is **bold** and *italic* and a [link](https://example.com).
 ```
 
-Would you like me to give you the fully consolidated final version of the script with all this applied?
+Would you like me to give you the fully consolidated final version of the script with all this applied?
+
+---
+
+**You:**
+
+How do I add Tables to the formatting?
+
+---
+
+**ChatGPT**
+
+To support **HTML tables** and convert them into **Markdown table syntax**, you can extend your `extract_markdown_from_prose()` function with a new `elif` block for `<table>`.
+
+### ✅ Add This to `extract_markdown_from_prose()`:
+
+Insert this just above the `# Preformatted code blocks` section:
+
+```python
+        # Tables
+        elif tag == "table":
+            rows = element.find_all("tr")
+            if not rows:
+                continue
+
+            table_lines = []
+            for row_idx, tr in enumerate(rows):
+                cells = tr.find_all(["th", "td"])
+                row = [get_text_with_formatting(cell).strip() for cell in cells]
+                line = "| " + " | ".join(row) + " |"
+                table_lines.append(line)
+
+                # After the first row (usually headers), add separator line
+                if row_idx == 0:
+                    separator = "| " + " | ".join(["---"] * len(row)) + " |"
+                    table_lines.append(separator)
+
+            md_lines.extend(table_lines)
+```
+
+### Example
+
+Given this HTML:
+
+```html
+<table>
+  <tr><th>Name</th><th>Age</th></tr>
+  <tr><td>Alice</td><td>30</td></tr>
+  <tr><td>Bob</td><td>25</td></tr>
+</table>
+```
+
+The Markdown output will be:
+
+```markdown
+| Name | Age |
+| --- | --- |
+| Alice | 30 |
+| Bob | 25 |
+```
+
+### Tips
+
+This handles basic tables with headers (`<th>`) and body rows (`<td>`).
+
+
+If you need advanced handling like nested formatting, cell alignment, or multiline cells, I can help you extend it further.
+
+Let me know if you'd like those edge cases supported too!
+
+---
+
+**You:**
+
+Let’s handle all types of tables
+
+---
+
+**ChatGPT**
+
+Great — to support **all types of HTML tables** robustly in Markdown, we’ll handle:
+
+### Goals:
+
+- `<thead>`, `<tbody>`, `<tfoot>` sections
+
+
+- Mixed use of `<th>` and `<td>` in any row
+
+
+- Inline formatting inside cells (`<strong>`, `<a>`, etc.)
+
+
+- Variable column widths
+
+
+- (Optional) Empty or malformed rows
+
+### ✅ Updated Table Handler (drop-in)
+
+Add or replace this block inside `extract_markdown_from_prose()`:
+
+```python
+        # Tables
+        elif tag == "table":
+            def extract_rows(section):
+                return section.find_all("tr") if section else []
+
+            thead = element.find("thead")
+            tbody = element.find("tbody")
+            tfoot = element.find("tfoot")
+
+            rows = extract_rows(thead) + extract_rows(tbody) + extract_rows(element) + extract_rows(tfoot)
+            seen = set()  # track rows by identity: Tag equality compares markup, so identical rows would collide
+            filtered_rows = []
+            for tr in rows:
+                if id(tr) not in seen:
+                    seen.add(id(tr))
+                    filtered_rows.append(tr)
+
+            if not filtered_rows:
+                continue
+
+            table_lines = []
+            for row_idx, tr in enumerate(filtered_rows):
+                cells = tr.find_all(["th", "td"])
+                row = [get_text_with_formatting(cell).strip() for cell in cells]
+                line = "| " + " | ".join(row) + " |"
+                table_lines.append(line)
+
+                # After the first row, add separator (assuming it's the header)
+                if row_idx == 0:
+                    separator = "| " + " | ".join(["---"] * len(row)) + " |"
+                    table_lines.insert(1, separator)
+
+            md_lines.append("\n".join(table_lines))
+```
+
+### Markdown Output Examples
+
+**From complex HTML like:**
+
+```html
+<table>
+  <thead>
+    <tr><th>Name</th><th>Details</th></tr>
+  </thead>
+  <tbody>
+    <tr><td>Alice</td><td><strong>Lead</strong> Engineer</td></tr>
+    <tr><td>Bob</td><td><a href="#">Junior</a></td></tr>
+  </tbody>
+</table>
+```
+
+**You get:**
+
+```markdown
+| Name | Details |
+| --- | --- |
+| Alice | **Lead** Engineer |
+| Bob | [Junior](#) |
+```
+
+### Additional Edge Case Support
+
+If you ever want to add:
+
+- **Column alignment** (`:---`, `:---:`, `---:`): Markdown table syntax supports this, but the converter needs a hint from the HTML about which alignment to use (we can auto-center or left-align).
+
+
+- **Rowspan/colspan**: not directly supported in Markdown, but we can simulate with repeated cells or notes.
+
+Let me know if you want any of those added!
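
The updated handler in this diff lists variable column widths among its goals and mentions column alignment as a possible extension, but it joins each row's cells as they come. The following is a minimal sketch of how those cases might be covered. It is not part of the commit: `rows_to_markdown`, its `rows` argument (cell text already extracted from each `<tr>`), and the `align` hints are hypothetical names chosen for illustration, and the function works on plain strings so it runs without BeautifulSoup.

```python
def rows_to_markdown(rows, align=None):
    """Render rows (lists of cell strings) as a Markdown table.

    Hypothetical helper, not taken from the commit. `align` is an optional
    list with "left", "center", or "right" per column; anything else, or a
    missing entry, falls back to the plain `---` separator.
    """
    if not rows:
        return ""

    # Ragged tables: every row is padded to the widest row.
    width = max(len(row) for row in rows)
    if width == 0:
        return ""

    markers = {"left": ":---", "center": ":---:", "right": "---:"}
    hints = list(align or []) + [None] * width

    def clean(cell):
        # Collapse newlines and repeated whitespace, then escape pipes so
        # cell text cannot be read as a column boundary.
        return " ".join(cell.split()).replace("|", "\\|")

    def format_row(cells):
        padded = [clean(c) for c in cells] + [""] * (width - len(cells))
        return "| " + " | ".join(padded) + " |"

    separator = "| " + " | ".join(
        markers.get(hint, "---") for hint in hints[:width]
    ) + " |"

    # The first row is treated as the header, as in the handler above.
    lines = [format_row(rows[0]), separator]
    lines.extend(format_row(row) for row in rows[1:])
    return "\n".join(lines)
```

For example, `rows_to_markdown([["Name", "Age"], ["Alice"]], align=["left", "right"])` pads the short row with an empty cell and uses `| :--- | ---: |` as the separator line. Inside the handler, `rows` could be built from the existing `get_text_with_formatting(cell)` calls, with the returned string appended to `md_lines`.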