WordGrain
A JSON format for vocabulary and lyrical structure data from song lyrics
WordGrain defines a standardized schema for storing vocabulary data extracted from song lyrics: word frequencies, sentiment, usage contexts, and phrase-level mood analysis.
Document Structure
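A WordGrain document is a single JSON object with a fixed hierarchy: a root carrying the schema reference, the schema version, and metadata, plus optional grains (word-level entries) and bars (line-level entries) arrays.
Document (object)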
- $schema
- schema_version
- meta
- grains[]?
- bars[]?
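As a rough illustration of the root layout, the sketch below checks a parsed document for the root keys shown without a "?" marker and treats grains and bars as optional arrays. Reading unmarked keys as required and "?" as optional is an assumption drawn from the diagram, and check_root is a hypothetical helper, not part of any published WordGrain tooling.

```python
# Assumption: root keys without a "?" in the diagram are required.
REQUIRED_ROOT_KEYS = ("$schema", "schema_version", "meta")
# Assumption: "grains[]?" and "bars[]?" mark optional arrays.
OPTIONAL_ARRAYS = ("grains", "bars")


def check_root(doc: dict) -> list[str]:
    """Return problems found at the root of a WordGrain document (hypothetical helper)."""
    problems = []
    for key in REQUIRED_ROOT_KEYS:
        if key not in doc:
            problems.append(f"missing root key: {key}")
    if "meta" in doc and not isinstance(doc["meta"], dict):
        problems.append("meta must be an object")
    for key in OPTIONAL_ARRAYS:
        # grains and bars may be absent, but when present they must be arrays.
        if key in doc and not isinstance(doc[key], list):
            problems.append(f"{key} must be an array")
    return problems


print(check_root({"schema_version": "0.2.0", "meta": {}, "grains": {}}))
# ['missing root key: $schema', 'grains must be an array']
```

A full check would validate against the published JSON Schema instead; this only mirrors the top level of the diagram.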
meta (object)
- source: string
- artist: string
- corpus_size: integer
- total_words: integer
- generated_at: date-time
- generator: string
- language: string
- description: string
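A minimal sketch of assembling a meta object, assuming lyrics are held in memory as a mapping of track title to lyric text. The diagram gives only types, so counting tracks for corpus_size and whitespace-separated tokens for total_words are assumptions, and build_meta is a hypothetical helper. generated_at is written in the same UTC date-time form as the sample file below.

```python
from datetime import datetime, timezone


def build_meta(artist: str, tracks: dict[str, str], source: str = "genius",
               language: str = "en") -> dict:
    """Assemble a meta object from {track title: lyrics} (hypothetical helper)."""
    # Assumption: total_words counts whitespace-separated tokens.
    total_words = sum(len(lyrics.split()) for lyrics in tracks.values())
    # UTC timestamp such as "2026-02-08T12:00:00Z".
    generated_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "source": source,
        "artist": artist,
        # Assumption: corpus_size is the number of tracks; the diagram only says "integer".
        "corpus_size": len(tracks),
        "total_words": total_words,
        "generated_at": generated_at,
        "language": language,
    }


meta = build_meta("Example Artist", {"Track A": "one two three", "Track B": "four five"})
print(meta["corpus_size"], meta["total_words"])  # 2 5
```

generator and description are left out here; they are plain strings and can be added the same way.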
grains[] (array)
- word: string *
- normalized: string
- pos: enum
- frequency: integer
- tfidf: number
- sentiment: enum
- categories: string[]
- contexts: Context[]
- collocations: Collocation[]
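The frequency field can be filled by counting tokens across a corpus. The sketch below is one way to do it, assuming a token is a run of letters, digits, or apostrophes and that normalized means the lowercased token; the format itself does not define either rule, and build_grains is a hypothetical helper. pos, tfidf, sentiment, and categories need separate analysis and are not computed here.

```python
import re
from collections import Counter

# Assumption: a token is a run of ASCII letters, digits or apostrophes.
TOKEN = re.compile(r"[A-Za-z0-9']+")


def build_grains(lines: list[str], min_frequency: int = 1) -> list[dict]:
    """Count word frequencies across lyric lines and emit grain objects (hypothetical helper)."""
    counts = Counter()
    first_form = {}  # normalized form -> first surface form seen
    for line in lines:
        for token in TOKEN.findall(line):
            # Assumption: "normalized" is the lowercased token.
            norm = token.lower()
            counts[norm] += 1
            first_form.setdefault(norm, token)
    # most_common() sorts by descending count; ties keep first-seen order.
    return [
        {"word": first_form[norm], "normalized": norm, "frequency": n}
        for norm, n in counts.most_common()
        if n >= min_frequency
    ]


grains = build_grains(["Keep the hustle going", "Hustle hard, then hustle more"])
print(grains[0])  # {'word': 'hustle', 'normalized': 'hustle', 'frequency': 3}
```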
Context (object)
- line: string *
- track: string
- album: string
- year: integer
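Each Context records one lyric line in which a word appears, with only line required. A minimal sketch of collecting contexts for a word, assuming tracks are held as simple in-memory records; the Track class and contexts_for are hypothetical, and matching whole words case-insensitively is one choice among several.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Track:
    # Hypothetical in-memory track record holding the fields a Context can carry.
    title: str
    album: Optional[str]
    year: Optional[int]
    lines: list[str]


def contexts_for(word: str, tracks: list[Track], limit: int = 3) -> list[dict]:
    """Collect up to `limit` Context objects whose line contains `word` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    found = []
    for track in tracks:
        for line in track.lines:
            if pattern.search(line):
                ctx = {"line": line, "track": track.title}  # "line" is the only required key
                if track.album is not None:
                    ctx["album"] = track.album
                if track.year is not None:
                    ctx["year"] = track.year
                found.append(ctx)
                if len(found) == limit:
                    return found
    return found


tracks = [
    Track("Track A", "Album X", 2017, ["keep the hustle going", "nothing here"]),
    Track("Track B", None, None, ["Hustle hard"]),
]
print(contexts_for("hustle", tracks)[1])  # {'line': 'Hustle hard', 'track': 'Track B'}
```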
Collocation (object)
- word: string *
- score: number *
- position: enum
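The format does not say how a collocation score is computed or which values the position enum takes. As one common choice, the sketch below scores immediate neighbours of a target word with pointwise mutual information (PMI), estimated from token counts, and uses "before" and "after" as hypothetical position values; collocations is a hypothetical helper.

```python
import math
from collections import Counter


def collocations(tokens: list[str], target: str, min_count: int = 1) -> list[dict]:
    """Score words adjacent to `target` with PMI (hypothetical helper)."""
    n = len(tokens)
    unigrams = Counter(tokens)
    pairs = Counter()  # (neighbour, position) -> co-occurrence count
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        # Hypothetical enum values: where the collocate sits relative to the target.
        if i > 0:
            pairs[(tokens[i - 1], "before")] += 1
        if i + 1 < n:
            pairs[(tokens[i + 1], "after")] += 1
    result = []
    for (word, position), c in pairs.items():
        if c < min_count:
            continue
        # PMI = log2(P(target, word) / (P(target) * P(word))), with all
        # probabilities estimated as counts over the n tokens.
        score = math.log2(c * n / (unigrams[target] * unigrams[word]))
        result.append({"word": word, "score": round(score, 3), "position": position})
    return sorted(result, key=lambda d: d["score"], reverse=True)


print(collocations("work hard play hard work smart".split(), "work")[0])
# {'word': 'smart', 'score': 1.585, 'position': 'after'}
```

PMI favours rare neighbours, so a min_count above 1 is usually sensible on real corpora.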
bars[] (array)
- text: string *
- source: BarSource *
- metrics: BarMetrics
- semantics: BarSemantics
- language: string
BarSource (object)
- track: string *
- album: string
- year: integer
BarSemantics (object)
- mood: enum
- themes: string[]
- techniques: string[]
* = required field
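For bars, the diagram marks text and source (with source.track) as required. A minimal sketch that checks those fields and filters bars by mood, using the bar from the sample file; bar_problems and bars_with_mood are hypothetical helpers, and "aggressive" is the only mood value shown on this page, so the full enum is not assumed.

```python
def bar_problems(bar: dict) -> list[str]:
    """Check the fields marked * in the diagram for one bars[] entry (hypothetical helper)."""
    problems = []
    if not isinstance(bar.get("text"), str):
        problems.append("text (string) is required")
    source = bar.get("source")
    if not isinstance(source, dict):
        problems.append("source (BarSource) is required")
    elif not isinstance(source.get("track"), str):
        problems.append("source.track (string) is required")
    return problems


def bars_with_mood(bars: list[dict], mood: str) -> list[str]:
    """Return the text of well-formed bars whose semantics.mood equals `mood`."""
    return [
        b["text"]
        for b in bars
        # semantics is optional, so treat a missing or null value as empty.
        if not bar_problems(b) and (b.get("semantics") or {}).get("mood") == mood
    ]


bars = [
    {
        "text": "I got hustle though, ambition flow inside my DNA",
        "source": {"track": "DNA.", "album": "DAMN.", "year": 2017},
        "semantics": {"mood": "aggressive", "themes": ["ambition"]},
    },
    {"text": "a bar with no source"},
]
print(bar_problems(bars[1]))  # ['source (BarSource) is required']
print(bars_with_mood(bars, "aggressive"))
```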
What It Looks Like
A WordGrain file for Kendrick Lamar's discography.
kendrick-lamar.wg.json
{
  "$schema": "https://raw.githubusercontent.com/shimpeiws/word-grain/main/schema/v0.2.0/wordgrain.schema.json",
  "schema_version": "0.2.0",
  "meta": {
    "source": "genius",
    "artist": "Kendrick Lamar",
    "generated_at": "2026-02-08T12:00:00Z",
    "language": "en"
  },
  "grains": [
    {
      "word": "hustle",
      "frequency": 47,
      "tfidf": 0.82,
      "sentiment": "positive",
      "categories": ["work", "struggle", "ambition"]
    }
  ],
  "bars": [
    {
      "text": "I got hustle though, ambition flow inside my DNA",
      "source": { "track": "DNA.", "album": "DAMN.", "year": 2017 },
      "semantics": { "mood": "aggressive", "themes": ["ambition"] }
    }
  ]
}
Explore the Toolkit
Everything you need to work with WordGrain files.
Ecosystem
Community tools and reference implementations using WordGrain.