ParsedDocument

The ParsedDocument class represents a parsed Markdown document, including its text, metadata, and convenience methods for inspecting it.

Definition

Source Code

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

using module ./DocumentLink.psm1

class ParsedDocument {
  [System.IO.FileInfo]$FileInfo
  [string]$RawContent
  [Markdig.Syntax.MarkdownDocument]$ParsedMarkdown
  [System.Collections.Specialized.OrderedDictionary]$FrontMatter
  [string]$Body
  [DocumentLink[]]$Links

  hidden [bool]$HasParsedLinks

  ParsedDocument() {}

  hidden ParseLinksFromBody() {
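    # Capture the FileInfo property in a local variable for use in the script block
    $fileInfo = $this.FileInfo

    $this.Links = [DocumentLink]::Parse($this.Body)
    | ForEach-Object -Process {
      # Add the file info to each link
      $_.Position.FileInfo = $fileInfo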
      # Emit the link for the list
      $_
    }

    $this.HasParsedLinks = $true
  }

  [DocumentLink[]] ParsedLinks() {
    if (!$this.HasParsedLinks) {
      $this.ParseLinksFromBody()
    }

    return $this.Links
  }

  [DocumentLink[]] ParsedLinks([bool]$Force) {
    if (!$this.HasParsedLinks -or $Force) {
      $this.ParseLinksFromBody()
    }

    return $this.Links
  }

  [DocumentLink[]] InlineLinks() {
    return [DocumentLink]::FilterForInlineLinks($this.Links)
  }

  [DocumentLink[]] ReferenceLinks() {
    return [DocumentLink]::FilterForReferenceLinks($this.Links)
  }

  [DocumentLink[]] ReferenceDefinitions() {
    return [DocumentLink]::FilterForReferenceDefinitions($this.Links)
  }

  [DocumentLink[]] ReferenceLinksAndDefinitions() {
    return [DocumentLink]::FilterForReferenceLinksAndDefinitions($this.Links)
  }

  [DocumentLink[]] UndefinedReferenceLinks() {
    return [DocumentLink]::FilterForUndefinedReferenceLinks($this.Links)
  }

  [DocumentLink[]] UnusedReferenceLinkDefinitions() {
    return [DocumentLink]::FilterForUnusedReferenceLinkDefinitions($this.Links)
  }

  [DocumentLink[]] ValidReferenceLinksAndDefinitions() {
    return [DocumentLink]::FilterForValidReferenceLinksAndDefinitions($this.Links)
  }

  [string] ToDecoratedString() {
    return $this.Body
    | ConvertFrom-Markdown -AsVT100EncodedString
    | Select-Object -ExpandProperty VT100EncodedString
  }
}

The ParsedDocument class is used throughout the Documentarian module as the model and interface representing a Markdown file. It includes the file’s metadata, its raw content, the Markdown abstract syntax tree (AST), the front matter, the body text, and the list of links in the document. It also includes several convenience methods for inspecting the document.
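ParsedLinks() parses the body only on the first call and caches the result behind the hidden HasParsedLinks flag, while the ParsedLinks([bool]$Force) overload bypasses the cache. The following self-contained sketch isolates that lazy-caching pattern. LazyLinkCache, ParseCount, and the regular expression standing in for [DocumentLink]::Parse() are hypothetical and only roughly approximate inline link parsing; they aren’t part of the Documentarian module.

```powershell
# Hypothetical class illustrating the lazy-parse-and-cache pattern used by
# ParsedDocument.ParsedLinks(). Not part of Documentarian.
class LazyLinkCache {
  [string]$Body
  [string[]]$Links
  [int]$ParseCount = 0          # how many times the body was actually parsed
  hidden [bool]$HasParsedLinks

  LazyLinkCache([string]$body) {
    $this.Body = $body
  }

  hidden ParseLinksFromBody() {
    # Stand-in for [DocumentLink]::Parse(): collect inline link targets ](...)
    $this.Links = [regex]::Matches($this.Body, '\]\(([^)]+)\)') |
      ForEach-Object { $_.Groups[1].Value }
    $this.ParseCount++
    $this.HasParsedLinks = $true
  }

  [string[]] ParsedLinks() {
    # Reuse the Force overload so the caching check lives in one place
    return $this.ParsedLinks($false)
  }

  [string[]] ParsedLinks([bool]$Force) {
    if (!$this.HasParsedLinks -or $Force) {
      $this.ParseLinksFromBody()
    }
    return $this.Links
  }
}

$cache = [LazyLinkCache]::new('See [a](https://a.example) and [b](https://b.example).')
$null = $cache.ParsedLinks()        # first call parses the body
$null = $cache.ParsedLinks()        # second call returns the cached links
$null = $cache.ParsedLinks($true)   # Force reparses the body
$cache.ParseCount                   # 2
```

Routing the parameterless overload through the Force overload is a small stylistic choice; the ParsedDocument source repeats the check in both overloads, which behaves the same way.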

Examples

1. Getting the parsed changelog

This example creates a ParsedDocument from the project’s changelog, which you can then inspect with its properties and methods.

Get-Document ./CHANGELOG.md
FileInfo       : C:\code\pwsh\Documentarian\Source\Modules\Documentarian\CHANGELOG.md
RawContent     : ---
                 title: Changelog
                 weight: 0
                 description: |
                   All notable changes to the **Documentarian** module
                 are documented in this file.

                   This changelog's format is based on [Keep a
                 Changelog][01] and this project adheres to
                   [Semantic Versioning][02].

                   For releases before `1.0.0`, this project uses the
                 following convention:

                   - While the major version is `0`, the code is
                 considered unstable.
                   - The minor version is incremented when a
                 backwards-incompatible change is introduced.
                   - The patch version is incremented when a
                 backwards-compatible change or bug fix is introduced.

                   [01]: https://keepachangelog.com/en/1.0.0/
                   [02]: https://semver.org/spec/v2.0.0.html
                 ---

                 ## Unreleased

                 - Scaffolded initial project.

ParsedMarkdown : {Markdig.Extensions.Yaml.YamlFrontMatterBlock,
                 Markdig.Syntax.HeadingBlock,
                 Markdig.Syntax.ListItemBlock,
                 Markdig.Extensions.AutoIdentifiers.HeadingLinkReferenceDefinition}
FrontMatter    : {[title, Changelog], [weight, 0], [description, All
                 notable changes to the **Documentarian** module are
                 documented in this file.

                 This changelog's format is based on [Keep a
                 Changelog][01] and this project adheres to
                 [Semantic Versioning][02].

                 For releases before `1.0.0`, this project uses the
                 following convention:

                 - While the major version is `0`, the code is
                 considered unstable.
                 - The minor version is incremented when a
                 backwards-incompatible change is introduced.
                 - The patch version is incremented when a
                 backwards-compatible change or bug fix is introduced.

                 [01]: https://keepachangelog.com/en/1.0.0/
                 [02]: https://semver.org/spec/v2.0.0.html
                 ]}
Body           : ## Unreleased

                 - Scaffolded initial project.

Links          :
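In the output above, Body holds RawContent without the YAML front-matter block and the blank line that follows it. The following sketch shows one way to make that split, assuming the front matter starts on the first line with --- and ends at the next line containing only ---. Split-FrontMatter is a hypothetical helper, not a Documentarian command; the module may derive Body differently, for example from the YamlFrontMatterBlock that Markdig reports in ParsedMarkdown.

```powershell
# Hypothetical helper: split Markdown text into its YAML front matter and body.
# Assumes the front matter, if present, opens with '---' on the first line and
# closes at the next line that contains only '---' (optionally trailing spaces).
function Split-FrontMatter {
  param([string]$RawContent)

  $pattern = '(?s)\A---\r?\n(?<yaml>.*?)\r?\n---[ \t]*(?:\r?\n|\z)(?<body>.*)\z'
  $match = [regex]::Match($RawContent, $pattern)

  if (-not $match.Success) {
    # No front matter: the whole text is the body
    return [pscustomobject]@{ FrontMatter = ''; Body = $RawContent }
  }

  [pscustomobject]@{
    FrontMatter = $match.Groups['yaml'].Value
    # Drop the blank line(s) that usually separate front matter from content
    Body        = $match.Groups['body'].Value.TrimStart([char[]]"`r`n")
  }
}

$raw = "---`ntitle: Changelog`nweight: 0`n---`n`n## Unreleased`n`n- Scaffolded initial project.`n"
$parts = Split-FrontMatter -RawContent $raw
$parts.FrontMatter   # 'title: Changelog' and 'weight: 0' on two lines
$parts.Body          # '## Unreleased' followed by the list item
```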

Constructors

ParsedDocument()
Initializes a new instance of the ParsedDocument class.

Methods

InlineLinks()
Returns every inline (non-reference) link from the document.
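ParsedLinks()
Returns the parsed links from the document, parsing the body first if it hasn’t been parsed yet.
ParsedLinks([bool]$Force)
Returns the parsed links from the document, reparsing the body when $Force is $true.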
ReferenceDefinitions()
Returns every reference link definition from the document.
ReferenceLinks()
Returns every reference link from the document.
ReferenceLinksAndDefinitions()
Returns every reference link and reference link definition from the document.
ToDecoratedString()
Returns the document body rendered from Markdown as a VT100-encoded string for display in a terminal.
UndefinedReferenceLinks()
Returns every reference link in the document that doesn’t have a matching reference link definition.
UnusedReferenceLinkDefinitions()
Returns every reference link definition in the document that isn’t used by at least one reference link.
ValidReferenceLinksAndDefinitions()
Returns every reference link in the document that has a matching definition, and every reference link definition that’s used by at least one reference link.
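The undefined, unused, and valid reference-link filters are essentially set comparisons between the labels that reference links use and the labels that definitions declare. The following sketch illustrates that logic with plain regular expressions. Get-ReferenceLinkReport and its output property names are hypothetical, and the sketch is a simplification rather than how DocumentLink parses Markdown: it ignores code blocks, escapes, and collapsed or shortcut references, and its case-insensitive comparison only approximates CommonMark label matching, which also normalizes whitespace.

```powershell
# Hypothetical sketch: classify full reference links ([text][label]) against
# reference definitions ([label]: url) in a Markdown string.
function Get-ReferenceLinkReport {
  param([string]$Markdown)

  # Labels used by full reference links, such as [Keep a Changelog][01]
  $used = @([regex]::Matches($Markdown, '\[[^\]]+\]\[(?<label>[^\]]+)\]') |
    ForEach-Object { $_.Groups['label'].Value })

  # Labels declared by definitions at the start of a line, such as [01]: https://...
  $defined = @([regex]::Matches($Markdown, '(?m)^[ ]{0,3}\[(?<label>[^\]]+)\]:') |
    ForEach-Object { $_.Groups['label'].Value })

  # -contains and -notcontains compare strings case-insensitively
  [pscustomobject]@{
    UndefinedLinks    = @($used | Where-Object { $defined -notcontains $_ })
    UnusedDefinitions = @($defined | Where-Object { $used -notcontains $_ })
    ValidLabels       = @($used | Where-Object { $defined -contains $_ } | Select-Object -Unique)
  }
}

$markdown = @'
Based on [Keep a Changelog][01] and [Semantic Versioning][02].
See also [the docs][03].

[01]: https://keepachangelog.com/en/1.0.0/
[02]: https://semver.org/spec/v2.0.0.html
[04]: https://example.com/unused
'@

$report = Get-ReferenceLinkReport -Markdown $markdown
$report.UndefinedLinks      # 03
$report.UnusedDefinitions   # 04
$report.ValidLabels         # 01 and 02
```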

Properties

Body
The Body property contains the Markdown content of the document as a single string with the front matter removed.
FileInfo
The FileInfo property contains the document’s metadata from the file system.
FrontMatter
The FrontMatter property contains the key-value data from the document’s front matter. The data is stored as an ordered dictionary so it can be modified and written back to the file without changing the order of the keys in the document.
Links
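The Links property contains the list of all discovered links from the document’s Markdown content. ParsedLinks() populates it; the link-filtering methods, such as InlineLinks(), read this property directly and don’t parse the body themselves.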
ParsedMarkdown
The ParsedMarkdown property contains the abstract syntax tree (AST) representation of the document’s Markdown returned by Markdig.
RawContent
The RawContent property contains the document’s content as a single string, including the front matter and Markdown exactly as they existed in the file when it was parsed.
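The FrontMatter property uses an OrderedDictionary because a plain hashtable doesn’t preserve key order. The following self-contained illustration uses keys from the changelog example plus an illustrative draft key. It doesn’t call any Documentarian code and doesn’t show how the module writes the dictionary back to the file.

```powershell
# An [ordered] literal creates a System.Collections.Specialized.OrderedDictionary,
# the same type as the FrontMatter property.
$frontMatter = [ordered]@{
  title  = 'Changelog'
  weight = 0
}

# Updating an existing key keeps its position; new keys are appended at the end.
$frontMatter['weight'] = 1
$frontMatter['draft'] = $false   # illustrative key, not from the changelog

$frontMatter.Keys -join ', '     # title, weight, draft
```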