WordGrain


A JSON format for vocabulary and lyrical structure data from musical lyrics

WordGrain defines a standardized schema for storing vocabulary data extracted from musical lyrics -- word frequencies, sentiment, usage contexts, and phrase-level mood analysis.

Document Structure

A WordGrain document is a JSON file with a simple, well-defined hierarchy.

Document (object)
  • $schema
  • schema_version
  • meta
  • grains[]?
  • bars[]?

meta (object)
  • source: string
  • artist: string
  • corpus_size: integer
  • total_words: integer
  • generated_at: date-time
  • generator: string
  • language: string
  • description: string

grains[] (array)
  • word: string *
  • normalized: string
  • pos: enum
  • frequency: integer
  • tfidf: number
  • sentiment: enum
  • categories: string[]
  • contexts: Context[]
  • collocations: Collocation[]

Context (object)
  • line: string *
  • track: string
  • album: string
  • year: integer

Collocation (object)
  • word: string *
  • score: number *
  • position: enum

bars[] (array)
  • text: string *
  • source: BarSource *
  • metrics: BarMetrics
  • semantics: BarSemantics
  • language: string

BarSource (object)
  • track: string *
  • album: string
  • year: integer

BarSemantics (object)
  • mood: enum
  • themes: string[]
  • techniques: string[]

* = required field
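
The required-field markers in the structure overview can be checked mechanically. The following is a minimal sketch in Python, assuming that only the fields marked * are mandatory. The names `REQUIRED` and `missing_required` are hypothetical and not part of WordGrain, and this check is no substitute for validating a file against the published JSON Schema.

```python
from typing import Any

# Fields marked * in the structure overview. This lookup table is illustrative,
# not something WordGrain itself ships.
REQUIRED = {
    "grain": ["word"],
    "context": ["line"],
    "collocation": ["word", "score"],
    "bar": ["text", "source"],
    "bar_source": ["track"],
}


def _missing(obj: dict[str, Any], kind: str, path: str) -> list[str]:
    """Return the paths of required keys that are absent from obj."""
    return [f"{path}.{key}" for key in REQUIRED[kind] if key not in obj]


def missing_required(doc: dict[str, Any]) -> list[str]:
    """List every required field (marked *) missing from a WordGrain-like dict."""
    problems: list[str] = []
    for i, grain in enumerate(doc.get("grains", [])):  # grains[] is optional
        base = f"grains[{i}]"
        problems += _missing(grain, "grain", base)
        for j, ctx in enumerate(grain.get("contexts", [])):
            problems += _missing(ctx, "context", f"{base}.contexts[{j}]")
        for j, col in enumerate(grain.get("collocations", [])):
            problems += _missing(col, "collocation", f"{base}.collocations[{j}]")
    for i, bar in enumerate(doc.get("bars", [])):  # bars[] is optional
        base = f"bars[{i}]"
        problems += _missing(bar, "bar", base)
        if isinstance(bar.get("source"), dict):
            problems += _missing(bar["source"], "bar_source", f"{base}.source")
    return problems


# Deliberately incomplete input: the collocation lacks "score",
# and the bar source lacks "track".
doc = {
    "grains": [{"word": "hustle", "collocations": [{"word": "flow"}]}],
    "bars": [{"text": "example line", "source": {"album": "DAMN."}}],
}
print(missing_required(doc))
```

The function reports paths rather than raising on the first problem, so a generator can surface every gap in a file at once.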

What It Looks Like

A WordGrain file for Kendrick Lamar's discography.

kendrick-lamar.wg.json
{
  "$schema": "https://raw.githubusercontent.com/shimpeiws/word-grain/main/schema/v0.2.0/wordgrain.schema.json",
  "schema_version": "0.2.0",
  "meta": {
    "source": "genius",
    "artist": "Kendrick Lamar",
    "generated_at": "2026-02-08T12:00:00Z",
    "language": "en"
  },
  "grains": [
    {
      "word": "hustle",
      "frequency": 47,
      "tfidf": 0.82,
      "sentiment": "positive",
      "categories": ["work", "struggle", "ambition"]
    }
  ],
  "bars": [
    {
      "text": "I got hustle though, ambition flow inside my DNA",
      "source": { "track": "DNA.", "album": "DAMN.", "year": 2017 },
      "semantics": { "mood": "aggressive", "themes": ["ambition"] }
    }
  ]
}
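
Because a WordGrain file is plain JSON, any JSON parser can read it. The sketch below uses only the Python standard library. The inline `SAMPLE` string is a trimmed copy of the kendrick-lamar.wg.json example so that the code runs as-is, and the helper names `top_grains` and `bars_with_mood` are hypothetical, not part of any WordGrain tooling.

```python
import json

# Trimmed copy of the kendrick-lamar.wg.json example, inlined so the sketch
# is self-contained. In practice this would be read from a file.
SAMPLE = """
{
  "schema_version": "0.2.0",
  "meta": {"source": "genius", "artist": "Kendrick Lamar", "language": "en"},
  "grains": [
    {"word": "hustle", "frequency": 47, "tfidf": 0.82, "sentiment": "positive"}
  ],
  "bars": [
    {
      "text": "I got hustle though, ambition flow inside my DNA",
      "source": {"track": "DNA.", "album": "DAMN.", "year": 2017},
      "semantics": {"mood": "aggressive", "themes": ["ambition"]}
    }
  ]
}
"""


def top_grains(doc, n=10):
    """Return up to n (word, frequency) pairs, most frequent first."""
    grains = doc.get("grains", [])  # grains[] is optional
    ranked = sorted(grains, key=lambda g: g.get("frequency", 0), reverse=True)
    return [(g["word"], g.get("frequency", 0)) for g in ranked[:n]]


def bars_with_mood(doc, mood):
    """Return the text of every bar whose semantics.mood equals mood."""
    return [
        bar["text"]
        for bar in doc.get("bars", [])  # bars[] is optional
        if bar.get("semantics", {}).get("mood") == mood
    ]


doc = json.loads(SAMPLE)
print(top_grains(doc))
print(bars_with_mood(doc, "aggressive"))
```

Both helpers fall back to empty lists and default values for absent keys, since grains[], bars[], frequency, and semantics are all optional in the structure overview.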