Overview
These functions convert Shamela HTML content to Markdown, making it easier to work with in Markdown-based systems and pattern-matching workflows.
htmlToMarkdown()
Converts Shamela HTML to Markdown format for easier pattern matching.
Signature
Parameters
HTML content from Shamela
Returns
Markdown-formatted content
Transformations
- Title spans to headers
  - <span data-type="title">text</span> → ## text
  - No extra newlines added (content already has proper line breaks)
- Narrator links stripped
  - <a href="inr://...">text</a> → text
  - Removes narrator reference links but preserves text
- All other HTML tags
  - Stripped using stripHtmlTags()
Example
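As a rough illustration of the three transformations listed above, the following self-contained sketch approximates them with regular expressions. The names `htmlToMarkdownSketch` and `stripTagsSketch` are hypothetical and this is not the library's implementation; the real `stripHtmlTags()` may handle cases this sketch does not.

```typescript
// Hypothetical stand-in for stripHtmlTags(); the real helper may behave differently.
const stripTagsSketch = (html: string): string => html.replace(/<[^>]*>/g, '');

// Approximates the three documented transformations, in order.
const htmlToMarkdownSketch = (html: string): string => {
    // 1. <span data-type="title">text</span> becomes "## text" (no newlines added).
    const withHeaders = html.replace(
        /<span\s[^>]*data-type=["']title["'][^>]*>([\s\S]*?)<\/span>/g,
        '## $1',
    );
    // 2. <a href="inr://...">text</a> keeps only its text.
    const withoutNarratorLinks = withHeaders.replace(
        /<a\s[^>]*href=["']inr:\/\/[^"']*["'][^>]*>([\s\S]*?)<\/a>/g,
        '$1',
    );
    // 3. Every remaining tag is removed.
    return stripTagsSketch(withoutNarratorLinks);
};

const sample =
    '<span data-type="title">Chapter One</span>\nNarrated by <a href="inr://123">Malik</a> in <b>Madinah</b>';
const markdown = htmlToMarkdownSketch(sample);
console.log(markdown);
```

The line break between the title and the body comes from the input itself, which matches the note that no extra newlines are added.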
Notes
- Line breaks are preserved from the original content
- Line ending normalization should be handled by calling functions
- Works in conjunction with normalizeTitleSpans() for consecutive titles
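Since `htmlToMarkdown()` leaves line-ending normalization to its caller, code that may receive Windows-style content might normalize it separately. A minimal sketch, assuming normalization means converting `\r\n` and lone `\r` to `\n`; `normalizeLineEndingsSketch` is a hypothetical stand-in, and the library's `normalizeLineEndings()` may behave differently.

```typescript
// Assumption: "normalizing" means every CRLF or lone CR becomes a single LF.
const normalizeLineEndingsSketch = (text: string): string => text.replace(/\r\n?/g, '\n');

// Stand-in for Markdown that still carries mixed line endings after conversion.
const converted = '## Title\r\nFirst line\rSecond line\n';
const normalized = normalizeLineEndingsSketch(converted);
console.log(JSON.stringify(normalized));
```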
convertContentToMarkdown()
Converts Shamela HTML content to Markdown format using a standardized pipeline.
Signature
Parameters
Raw HTML content from Shamela
Optional configuration for title span normalization. Defaults to { strategy: 'splitLines' }
Returns
Markdown-formatted content with normalized line endings
Processing Pipeline
This function applies the following transformations in order:
1. Normalize consecutive title spans - Using normalizeTitleSpans()
2. Move pre-title text into spans - Using moveContentAfterLineBreakIntoSpan()
3. Convert to Markdown format - Using htmlToMarkdown()
4. Normalize line endings - Using normalizeLineEndings()
Example
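The ordering of the pipeline can be sketched as a plain left-to-right function composition. Everything below is a hypothetical stand-in: the first two steps are identity placeholders because their exact behavior is not described in this section, and the last two are simplified approximations rather than the library's code.

```typescript
type Step = (content: string) => string;

// Identity placeholders for normalizeTitleSpans() and moveContentAfterLineBreakIntoSpan().
const normalizeTitleSpansSketch: Step = (content) => content;
const movePreTitleTextSketch: Step = (content) => content;

// Simplified approximation of htmlToMarkdown(): titles to headers, then strip tags.
const toMarkdownSketch: Step = (content) =>
    content
        .replace(/<span\s[^>]*data-type=["']title["'][^>]*>([\s\S]*?)<\/span>/g, '## $1')
        .replace(/<[^>]*>/g, '');

// Assumed meaning of line-ending normalization: CRLF and lone CR become LF.
const normalizeLineEndingsStep: Step = (content) => content.replace(/\r\n?/g, '\n');

// Steps are listed in the documented order and applied left to right.
const pipeline: Step[] = [
    normalizeTitleSpansSketch,
    movePreTitleTextSketch,
    toMarkdownSketch,
    normalizeLineEndingsStep,
];

const convertSketch: Step = (rawHtml) => pipeline.reduce((content, step) => step(content), rawHtml);

const result = convertSketch('<span data-type="title">Book of Faith</span>\r\n<p>Opening text</p>');
console.log(JSON.stringify(result));
```

Keeping the steps in an array makes the order explicit and easy to compare against the documented sequence.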
Strategy Options
Default (splitLines)
Merge Strategy
Hierarchy Strategy
Complete Example
Use Cases
- Export to Markdown files - Convert books for markdown-based systems
- Pattern matching - Easier to match patterns in markdown than HTML
- Documentation generation - Use with static site generators
- Search indexing - Index markdown content for better search
- LLM processing - Provide cleaner format for AI models
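To illustrate the pattern-matching use case, here is a small sketch that collects `## ` headers from already-converted Markdown with a single multiline regular expression. `extractHeadings` is a hypothetical helper, not part of the library.

```typescript
// Collects the text of every line that starts with "## ".
const extractHeadings = (markdown: string): string[] => {
    const headings: string[] = [];
    const pattern = /^## (.+)$/gm;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(markdown)) !== null) {
        headings.push(match[1].trim());
    }
    return headings;
};

const convertedBook = '## Chapter One\nSome text\n## Chapter Two\nMore text';
const headings = extractHeadings(convertedBook);
console.log(headings);
```

The equivalent match against raw HTML would need to account for attributes and nested tags, which is the kind of complexity the Markdown form avoids.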
Related Functions
- normalizeTitleSpans() - Normalize consecutive titles
- moveContentAfterLineBreakIntoSpan() - Fix pre-title text
- normalizeLineEndings() - Normalize line endings
- stripHtmlTags() - Remove HTML tags