PrevPrev Go to previous topic
NextNext Go to next topic
Last Post 14 Jan 2013 12:00 AM by  SuperUser Account
Ability to store QA check results
 10 Replies
Sort:
You are not authorized to post a reply.
Author Messages

SuperUser Account



Basic Member


Posts:289
Basic Member


--
22 Nov 2012 12:00 AM

    Hello

    On behalf of the D:A:D datamanagers I would like to suggest a format for Error checks and the result on running them. And also a possibility of capturing feedback such as 'verified' for outliers.

    A QA check table should include:
    'Error Code' - the QA-check Identifier (e.g. 'VW004' for the weight out of acceptable range)
    the 'unique identifier' from the table that triggered the error (e.g. PATIENT+VIS_D for tblVIS)
    And the 'Verified' field to allow for feedback / Explanations

    the verified field should be numeric with coding:
    1 = Verified (in source-doc),
    2 = Plausible,
    .. and any other values we could find relevant

    An example of using the "Plausible" category is: A patient weighs 139 kg at a visit. This is confirmed. At the next visit the record says the patient weighs 140 kg. This could then be marked as plausible based on the prior knowledge of the patient.

    Likewise very high/low lab values for a patient that have had previous verified outliers could then be marked as plausible.

    The idea of keeping this Error-Table would be to limit the error-lists created at subsequent mergers. We do not need to keep checking outliers that have already been confirmed.

    I hope you will consider making this part of the HICDEP protocol.

    Best Regards, Rikke S. Brandt Datamanger for D:A:D at the Copenhagen HIV Centre.


    SuperUser Account



    Basic Member


    Posts:289
    Basic Member


    --
    22 Nov 2012 12:00 AM


    Thank you Rikke - I fully support this proposal. Thanks, Bruno.

    SuperUser Account



    Basic Member


    Posts:289
    Basic Member


    --
    22 Nov 2012 12:00 AM

    I have constructed a tabular form of the proposal:


    Note: Fields marked bold form the unique identifier for a record of the table.

    Field name

    Format

    Description

    CHECK_ID

    character

    The QA check identifier

    ROWKEY

    character

    Encoding of multi-part keys needs further consideration

    Unique identifier of row that failed the check

    VERIFIED

    numeric:

    • 1 = Verified
    • 2 = Plausible
    • ...

    Outcome from review of this error

    COMMENT

    text

    Optional explanation of outcome


    As noted above, the encoding of the ROWKEY field needs clarification: The content of this field may be a concatenation of multiple fields, e.g. PATIENT and TEST_ID from tblLAB_RES. At the same time, it should be possible to unambiguously decode the ROWKEY value into the set of original field values so that the original observation may be determined based on the ROWKEY. To make use of data in the HICDEP format, a common encoding/decoding schema for the ROWKEY across all cohorts is necessary, but is made difficult by the presence of arbitrary characters in PATIENT and TEST_ID.

    I recommend we discuss this at the next TC.


    SuperUser Account



    Basic Member


    Posts:289
    Basic Member


    --
    23 Nov 2012 12:00 AM


    In DAD at the moment the ROWKEY variable is converted to multiple fields instead of just one. So in fact the in the example I used to describe the issue would have two fields - PATIENT and VIS_D

    But I do realise that could be quite challenging if even more fields would go into the unique definition.

    I would be happy to be part of the next TC so that I could hear what ideas are out there.

    Thank you very much,

    Rikke

    SuperUser Account



    Basic Member


    Posts:289
    Basic Member


    --
    27 Nov 2012 12:00 AM

    A solution with multiple fields sounds easier to handle than my suggestion. The current HICDEP draft would require 5 such fields, as the longest unique identifier, from tblLAB_RES_LVL_2, uses 5 fields. The fields would be filled from lowest (ROWKEY1) to highest (ROWKEY5), in the order as they appear in the HICDEP specification of the source table, leaving the trailing superfluous fields blank (NULL for databases and . for SAS, Stata etc).

    The table would be as follows:


    Note: Fields marked bold form the unique identifier for a record of the table.

    Field name

    Format

    Description

    CHECK_ID

    character

    The QA check identifier

    ROWKEY1

    character

    First field of unique identifier of row that failed the check

    ROWKEY2

    character

    Second field of unique identifier of row that failed the check

    ROWKEY3

    character

    Third field of unique identifier of row that failed the check

    ROWKEY4

    character

    Fourth field of unique identifier of row that failed the check

    ROWKEY5

    character

    Fifth field of unique identifier of row that failed the check

    VERIFIED

    numeric:

    • 1 = Verified
    • 2 = Plausible
    • ...

    Outcome from review of this error

    COMMENT

    text

    Optional explanation of outcome


    Should the table containing the row also be provided, or is that specified implicitly by the check code? Some checks compare values from multiple tables, but the checks always refer to a specific table and row as the check violator.

    Also, are there

    • Suggestions for other values of the VERIFIED field?
    • Suggestions for a table name?

    SuperUser Account



    Basic Member


    Posts:289
    Basic Member


    --
    28 Nov 2012 12:00 AM


    You could either have one discrepancy table for each table that you run error checks on e.g. tblBAS_DISCREP, tblVIS_DISCREP etc.
    This solution would allow for less fields for the tables with fewer ID-fields - and also allows you to name the ROWKEY1-ROWKEY5 as the actual fields they relate to e.g. Patient, VIS_D etc

    Or you need a field to show what table the error is related to. This would be neccesary to decode what ROWKEY1-ROWKEY5 translates to. And at the present relaese of HICDEP the error codes can be the same for multiple tables. (It is neccesary to know which table had the duplicate records.)
    If we choose this option, perhaps we can use the name tblDISCREPANCY/tblDISCREP


    / Rikke

    SuperUser Account



    Basic Member


    Posts:289
    Basic Member


    --
    17 Dec 2012 12:00 AM
    Dear,

    I do support the proposal to create a table for discrepancy table(s). However, I am not too fond of the code values for the VERIFIED field. I would say that the code "Verified" without any further details (verified and resolved, verified and confirmed as is) would not be very informative, and the code "Plausible" is too ambiguous. I would like to suggest to use the following coding values instead:

    1 = Resolved (with data update) 2 = Confirmed as is (i.e. not a problem) 3 = Cannot be resolved (but incorrect) 4 = Data management SEC

    Thanks Monique

    SuperUser Account



    Basic Member


    Posts:289
    Basic Member


    --
    18 Dec 2012 12:00 AM


    I agree with Monique that codes could be more informative.

    What is SEC??

    Bruno

    SuperUser Account



    Basic Member


    Posts:289
    Basic Member


    --
    08 Jan 2013 12:00 AM


    SEC = Self Evident Correction

    SuperUser Account



    Basic Member


    Posts:289
    Basic Member


    --
    11 Jan 2013 12:00 AM


    Coming back to the issue of one or many tables, either solution has up- and downsides:

    For a single table

    -All keys must be strings, making queries difficult
    -The PK-column names will be generic, making it more difficult to tell which column corresponds to which PK-column in the target tables.
    +All QA check outcomes are in one place, making it easier to summarize the status/quality of data checks

    For multiple tables

    -No precise specification of the table is possible as every discrepancy table will be different
    +Each discrepancy table will use the same PK-columns as its target table, resulting in easier joins and no guessing as to the key correspondancies

    Although my personal preferance would be for the second option, this is mostly a technical issue in the end, so if either option is easier for Rikke to implement in the DDM tool I would choose that.

    SuperUser Account



    Basic Member


    Posts:289
    Basic Member


    --
    14 Jan 2013 12:00 AM


    I will see what makes most sense within the tool.

    Regards, Rikke
    You are not authorized to post a reply.