Model Card Review and Writing

A model card is the technical and ethical documentation of an AI model — what it was trained on, what it's good at, what it isn't, what risks it carries, and how it should and shouldn't be used. For PMs, model cards serve two purposes: evaluating a third-party model before building on it, and documenting an internally-built or fine-tuned model before sharing it. This skill covers both.

---

Context

Why model cards matter for PMs:

Third-party model cards tell you what a model was trained on and tested against — which tells you where it will fail for your specific use case. An LLM trained on English text will underperform on multilingual tasks even if the provider's benchmarks look strong. A model evaluated on academic benchmarks may not reflect your domain.

Internally, a model card is the accountability document that describes what you built, who it was built for, what it was tested on, and what the known limitations are. Without one, no one can make an informed decision about using or deploying the model.

The standard model card sections (from Mitchell et al., 2019 — the original paper):

Model details / Intended use / Factors / Metrics / Evaluation data / Training data / Quantitative analyses / Ethical considerations / Caveats and recommendations

---

Step 1 — Mode selection

This skill has two modes:

MODE A — REVIEW an existing model card (from a third-party provider or another team)

→ Go to Step 2

MODE B — WRITE a model card (for an internally-built or fine-tuned model)

→ Go to Step 3

---

Step 2 — Review an existing model card (MODE A)

Use this checklist to evaluate a model card for completeness and relevance to your use case:

```

MODEL CARD REVIEW CHECKLIST: [Model name]

Reviewer: [PM name]
Date: [date]
Use case: [Your specific use case]

SECTION 1 — MODEL IDENTITY:

[ ] Model name and version are explicitly stated

[ ] Model architecture type is described

[ ] Provider / creator is identified

[ ] Release date or version date is stated

[ ] License terms are clearly stated

SECTION 2 — INTENDED USE:

[ ] Primary intended use cases are listed

[ ] Out-of-scope uses are listed

[ ] User population is described

SECTION 3 — TRAINING DATA:

[ ] Training data sources are described

[ ] Training data cutoff date is stated

[ ] Known biases in training data are disclosed

SECTION 4 — EVALUATION:

[ ] Benchmark datasets are named

[ ] Performance metrics are reported

[ ] Disaggregated metrics by subgroup are reported

SECTION 5 — LIMITATIONS:

[ ] Known failure modes are disclosed

[ ] Known biases are disclosed

[ ] Context length limitations are stated

SECTION 6 — RISKS AND MITIGATIONS:

[ ] Foreseeable misuses are described

[ ] Recommended mitigations are described

PM ASSESSMENT:

Gaps identified: [What the model card doesn't tell you that you need to know]

Concerns for your use case: [Specific limitations or risks]

Decision: [Suitable / Suitable with mitigations / Not suitable]

```
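The completeness portion of the checklist above can be partially automated before a human review. A minimal sketch, assuming the model card is a markdown file with `#`-style headings (the section names and matching logic here are illustrative, not a standard):

```python
# Sections a complete model card should contain. Names are illustrative;
# adapt them to the headings your provider actually uses.
REQUIRED_SECTIONS = [
    "Model details",
    "Intended use",
    "Training data",
    "Evaluation",
    "Limitations",
    "Risks and mitigations",
]

def missing_sections(card_text: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return the required sections that never appear as a heading.

    A heading is any line starting with '#' (markdown). Matching is a
    loose case-insensitive substring check, since providers word their
    section titles differently.
    """
    headings = []
    for line in card_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            headings.append(stripped.lstrip("#").strip().lower())
    return [
        section for section in required
        if not any(section.lower() in h for h in headings)
    ]

card = """# Model details
Fine-tuned transformer, v1.2
# Intended use
Customer-support triage
# Evaluation
Held-out set, F1 = 0.87
"""
print(missing_sections(card))
# ['Training data', 'Limitations', 'Risks and mitigations']
```

This only flags absent headings; a section can exist and still be empty or evasive, so the PM assessment at the end of the checklist stays a human job.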

---

Step 3 — Write a model card (MODE B)

Use this template to document an internally built, fine-tuned, or adapted model. Cover: Model details, Intended use, Training and fine-tuning data, Evaluation, Limitations and risks, Recommended mitigations, Caveats, and Changelog.
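The section list above can be expanded into a fill-in template. A starting-point sketch (field labels are suggestions, not a standard; adapt them to your org's review process):

```

MODEL CARD: [Model name], version [X.Y]

MODEL DETAILS:
Base model / architecture: [what it was built from]
Owner: [team]
Built: [date]
License / usage terms: [terms]

INTENDED USE:
Primary use cases: [list]
Out-of-scope uses: [explicitly list what this model should NOT be used for]
User population: [who interacts with it]

TRAINING AND FINE-TUNING DATA:
Sources: [datasets, internal data]
Cutoff date: [date]
Known gaps: [languages, demographics, domains not covered]

EVALUATION:
Held-out evaluation set: [description; must not overlap training data]
Metrics: [metric: value]
Disaggregated results: [metric by subgroup, where available]

LIMITATIONS AND RISKS:
Known failure modes: [list]
Known biases: [list]
Foreseeable misuses: [list]

RECOMMENDED MITIGATIONS:
[Guardrails, human review, monitoring]

CAVEATS:
[Anything a future user must know before relying on this model]

CHANGELOG:
[version] | [date] | [what changed]

```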

Quality check before delivering

MODE A (review):

[ ] Every section of the card is evaluated

[ ] Gaps specific to your use case are identified

[ ] A clear decision is stated: suitable / suitable with mitigations / not suitable

MODE B (write):

[ ] Out-of-scope uses are explicitly listed

[ ] Training data limitations include language and demographic coverage

[ ] Evaluation is on a held-out set

[ ] Changelog is maintained

Suggested next step: Share the gap analysis with the model provider and ask specifically about your use case. Model cards are written for general audiences — the provider may have additional evaluation data for your domain.