Data description
The data is provided as a CSV table containing one row for each patient in the database. The table has a three-level header that describes the columns. Below, we explain each column in the form of a nested list with three levels. For example, list entry 1.1.7 refers to the column with the three-level header patient | # | hpv_status, under which the patients' HPV status is listed.
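The three-level header can be read with standard tooling. The following is a minimal sketch using only Python's standard library; the inline sample table, its values, the lowercase spelling of the boolean cells, and the helper name `read_three_level_csv` are assumptions made for illustration and are not taken from the actual file.

```python
import csv
import io

# Tiny made-up sample in the three-level layout described above
# (illustrative only, not real patient data).
SAMPLE = """\
patient,patient,patient,tumor
#,#,#,1
id,age,hpv_status,t_stage
1,62,true,2
2,55,false,3
"""


def read_three_level_csv(handle):
    """Return one dict per patient, keyed by (level 1, level 2, level 3)."""
    reader = csv.reader(handle)
    # The first three rows are the header levels; zip them column by column
    # so that every column gets a tuple such as ("patient", "#", "hpv_status").
    levels = [next(reader) for _ in range(3)]
    columns = list(zip(*levels))
    return [dict(zip(columns, row)) for row in reader]


patients = read_three_level_csv(io.StringIO(SAMPLE))

# Look up the column described as list entry 1.1.7.
# Cells are returned as raw strings by the csv module.
print(patients[0][("patient", "#", "hpv_status")])
```

If pandas is available, `pandas.read_csv(path, header=[0, 1, 2])` builds an equivalent three-level column index in a single call.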
Columns
1 patient
: General information about the patient’s condition can be found under this top-level header.
1.1 #
: The second level under patient has no meaning and exists solely as a filler.
1.1.1 id
: Enumeration of the patients
1.1.2 sex
: Sex of the patient
1.1.3 age
: Patient’s age at diagnosis
1.1.4 diagnose_date
: Date of diagnosis (format YYYY-mm-dd), defined as the date of first histological confirmation of HNSCC.
1.1.5 alcohol_abuse
: true for patients who stated that they consume alcohol regularly, false otherwise
1.1.6 nicotine_abuse
: true for patients who have been regular smokers (> 10 pack years)
1.1.7 hpv_status
: true for patients with human papilloma virus associated tumors (as defined by p16 immunohistochemistry)
1.1.8 neck_dissection
: Indicates whether the patient has received a neck dissection as part of the treatment.
1.1.9 tnm_edition
: The edition of the TNM classification used to classify the patient [1]
1.1.10 n_stage
: Degree of spread to regional lymph nodes
1.1.11 m_stage
: Presence of distant metastases
2 tumor
: Information about tumors is stored under this top-level header.
2.1 <number>
: The second level enumerates the synchronous tumors. In our database, no patient has had a second tumor, but this file structure allows us to include such patients in the future. The third-level headers are the same for each tumor.
2.1.1 location
: Anatomic location of the tumor
2.1.2 subsite
: ICD-O-3 code associated with a tumor at the particular location according to the World Health Organization [2], [3]
2.1.3 side
: Lateralization of the tumor. Can be “left” or “right” for tumors that have their center of mass clearly on the respective side of the mid-sagittal line, and “central” for patients with a tumor on the mid-sagittal line.
2.1.4 extension
: true if part of the tumor extends over the mid-sagittal line
2.1.5 volume
: Volume of the tumor in cm³
2.1.6 stage_prefix
: Prefix modifier of the T-category. Can be “c” (clinical) or “p” (pathological).
2.1.7 t_stage
: T-category of the tumor, according to TNM staging
3 <diagnostic modality>
: Each recorded diagnostic modality is indicated by its own top-level header. In this file, FNA, CT, MRI, PET, path (pathology) and pCT (planning CT) are provided.
3.1 info
:
3.1.1 date
: Day on which the diagnosis with the respective modality was performed
3.2 right
: All findings of involved lymph nodes on the right side of the patient’s neck
3.2.1 <LNL>
: One column is provided for each recorded lymph node level (LNL). For each level, true indicates at least one finding diagnosed as a malignant lymph node in the respective LNL, false means no malignant lymph node has been found, and an empty field indicates that no diagnosis is available for this LNL according to the respective diagnostic modality. <LNL> can be: I, Ia, Ib, II, IIa, IIb, III, IV, V, VI, VII, VIII, IX, X.
3.3 left
: Same as 3.2 but for the left side of the patient’s neck
3.3.1 <LNL>
: Same as under 3.2.1
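Because an empty field in an LNL column means "no diagnosis available" rather than "not involved", it helps to keep three distinct states when reading these cells. The sketch below shows one way to do this in Python; the helper name `parse_involvement`, the example row, and the assumption that the cells are spelled `true`/`false` (matched case-insensitively) are illustrative and not confirmed by the file itself.

```python
def parse_involvement(cell):
    """Map a raw CSV cell to True, False, or None (no diagnosis available)."""
    text = cell.strip().lower()
    if text == "":
        return None  # empty field: this modality made no statement on the LNL
    if text == "true":
        return True  # at least one malignant finding in this LNL
    if text == "false":
        return False  # no malignant lymph node found in this LNL
    raise ValueError(f"unexpected involvement value: {cell!r}")


# Hypothetical cells of one patient, keyed by (modality, side, LNL).
row = {
    ("CT", "right", "II"): "true",
    ("CT", "right", "III"): "false",
    ("CT", "right", "IV"): "",
}

involvement = {key: parse_involvement(value) for key, value in row.items()}

# Only levels with an actual diagnosis should enter any statistic.
diagnosed = {key: value for key, value in involvement.items() if value is not None}
print(len(diagnosed), "of", len(involvement), "levels have a diagnosis")
```

Treating the empty field as `None` instead of `False` avoids counting undiagnosed levels as healthy.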
Data repository
The data we extracted at our institution and uploaded to this interface is also available in one of our GitHub repositories: lyDATA
References
[1] J. D. Brierley, M. K. Gospodarowicz, and C. Wittekind, TNM Classification of Malignant Tumours. John Wiley & Sons, 2017.
[2] World Health Organization, Ed., International statistical classification of diseases and related health problems, 10th revision, 2nd edition. Geneva: World Health Organization, 2004.
[3] A. G. Fritz, Ed., International classification of diseases for oncology: ICD-O, 3rd ed. Geneva: World Health Organization, 2000.