This section is a work in progress.
This report details the research progress for the project titled “Teaching to Understand, Understanding to Teach: Retrieval Augmented Generation for Requirements Traceability.” The research was initially conducted for the Jet Propulsion Laboratory (“JPL”), California Institute of Technology (“Caltech”), under the sponsorship of the National Aeronautics and Space Administration (“NASA”).
| Term | Definition |
|---|---|
| RAG | Retrieval-Augmented Generation; a technique enhancing LLMs with external data. |
| Embeddings | Vector representations of text segments used for semantic search. |
| Information Supply | The existing, ingested corpus of data used by the system. |
| Traceability | The ability to link requirements to their corresponding code implementation. |
| Artifact(s) | Pieces of information pertaining to the system information supply. |
| Requirement | Formally agreed-upon shall statement defining a system condition, capability, or constraint. |
| Test Case | Detailed, step-by-step series of instructions performed on an end product to verify and validate a specific requirement. |
| Entity | A distinct object or thing that acts individually. |
| Relationship | A link between two entities. |
| Branch Chunk | Fixed-size, overlapping subset of tokens from an artifact. |
| Leaf Chunk | Variable-size, non-overlapping semantic subset of tokens from a branch chunk. |

| Symbol | Description |
|---|---|
| \( N \) | Sample size of the set under consideration |
| \( A \) | Artifact |
| \( R \) | Requirement |
| \( T \) | Test Case |
| \( E \) , \( e \) | Entity set; entity |
| \( R \) , \( r \) | Relation set; relation |
| \( B \) , \( b \) | Branch chunk; leaf chunk |
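
The branch chunk (\( B \)) and leaf chunk (\( b \)) definitions above can be illustrated with a minimal sketch. It assumes that an artifact is already tokenized into a list of strings and that a sentence-ending token stands in for a semantic boundary. The function names `branch_chunks` and `leaf_chunks`, the `size` and `overlap` parameters, and the boundary rule are hypothetical and are not taken from the system itself.

```python
from typing import List


def branch_chunks(tokens: List[str], size: int, overlap: int) -> List[List[str]]:
    """Split tokens into fixed-size windows that share `overlap` tokens.

    The final window may be shorter than `size` when the token count
    does not align with the step (an assumption of this sketch).
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the artifact
    return chunks


def leaf_chunks(branch: List[str], boundary: str = ".") -> List[List[str]]:
    """Split one branch chunk into non-overlapping, variable-size pieces.

    A boundary token closes the current leaf; this is a stand-in for a
    real semantic splitter.
    """
    leaves: List[List[str]] = []
    current: List[str] = []
    for token in branch:
        current.append(token)
        if token == boundary:
            leaves.append(current)
            current = []
    if current:
        leaves.append(current)  # trailing tokens with no closing boundary
    return leaves


tokens = "a b . c d e . f".split()
branches = branch_chunks(tokens, size=4, overlap=2)
leaves = [leaf_chunks(b) for b in branches]
```

Under these assumptions, adjacent branch chunks share `overlap` tokens, while the leaves within a single branch chunk partition it without overlap.
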
The entry point is the initial state of our tracing system, whether on the first or a repeated use of the tool. The goal of the system is to suggest traceability mappings from an individual requirement (\(R\)) or test case (\(T\)) to trace sets of \(R\) or \(T\), respectively. Two types of input govern the input space: traceable artifacts and contextual artifacts. In our work, artifacts are pieces of information pertaining to the information supply, that is, the corpus of data known to the system. A traceable artifact represents system intent or system verification and is eligible to participate as a source or target in a traceability relationship; in this system, requirements and test cases fill this role. Contextual artifacts are supporting artifacts, neither requirements nor test cases, that add auxiliary information to the information supply for context retrieval. Unlike traceable artifacts, whose purpose is to appear in a source or target set, contextual artifacts are non-traceable units.
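
The distinction between traceable and contextual artifacts can be sketched as a small data model. This is an illustrative sketch only: the names `ArtifactKind`, `Artifact`, and `TraceLink`, along with their fields, are hypothetical and do not describe the system's actual implementation. The only rule encoded is the one stated above, namely that requirements and test cases may act as a source or target while contextual artifacts may not.

```python
from dataclasses import dataclass
from enum import Enum


class ArtifactKind(Enum):
    REQUIREMENT = "requirement"  # traceable: encodes system intent
    TEST_CASE = "test_case"      # traceable: encodes system verification
    CONTEXTUAL = "contextual"    # non-traceable: retrieval support only


@dataclass(frozen=True)
class Artifact:
    artifact_id: str
    kind: ArtifactKind
    text: str

    @property
    def traceable(self) -> bool:
        """Only requirements and test cases may be a trace source or target."""
        return self.kind in (ArtifactKind.REQUIREMENT, ArtifactKind.TEST_CASE)


@dataclass(frozen=True)
class TraceLink:
    source: Artifact
    target: Artifact

    def __post_init__(self) -> None:
        # Reject links that involve a contextual artifact on either end.
        if not (self.source.traceable and self.target.traceable):
            raise ValueError("contextual artifacts cannot be a trace source or target")


req = Artifact("R-1", ArtifactKind.REQUIREMENT, "The system shall log all faults.")
test = Artifact("T-1", ArtifactKind.TEST_CASE, "Inject a fault and inspect the log.")
note = Artifact("C-1", ArtifactKind.CONTEXTUAL, "Design note on the logging subsystem.")

link = TraceLink(source=req, target=test)
```

Validating eligibility when the link is constructed keeps contextual artifacts available for retrieval while preventing them from ever entering a source or target set.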