21 CFR PART 11

The 21 CFR Part 11 Regulations from the FDA

Microsoft SharePoint as an EDC or supporting an EDC?

There has been some interest in using SharePoint as an EDC. We attempted to do so ourselves and found that its limitations for actual data collection are significant. Put another way, the amount of effort it would take to write code on top of SharePoint to make it useful as a data capture and management tool is substantial.

However, SharePoint excels as a document management tool and can serve that role very well. Most EDC systems do not support document management out of the box, which makes the combination of SharePoint and an EDC system a workable one.

This white paper describes the features and capabilities within SharePoint and its associated technologies that can help ensure 21 CFR Part 11 compliance: http://download.ehealthinformation.ca/cite/SharePoint-Guidance-21-CFR-Part-11.pdf

October 25, 2009

De-identification of clinical trials data

A paper by Shostak describes a set of SAS routines for coding clinical trial datasets (paper: http://www.lexjansen.com/pharmasug/2006/publichealthresearch/pr02.pdf; web archive: http://www.webcitation.org/5jxKAuL10). As our recent overview of de-identification techniques illustrates (available here: http://www.ehealthinformation.ca/documents/DeidTechniques.pdf), coding is one technique that is commonly used to de-identify health datasets.

The Shostak paper also provides SAS routines for anchoring dates. This is a good approach for dealing with dates in longitudinal datasets, and it removes much of the plausible risk of dates being used to re-identify individuals (at least plausible from a Canadian perspective).
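To make the idea of anchoring concrete, here is a minimal Python sketch. It is not the SAS code from the Shostak paper. It assumes that anchoring means replacing each calendar date with the number of days elapsed since a per-patient anchor date; the field names and the choice of enrolment as the anchor are hypothetical.

```python
from datetime import date


def anchor_dates(events, anchor_field="enrolment"):
    """Replace each calendar date with its offset in days from the anchor date.

    `events` maps a (hypothetical) event name to a date for one patient.
    """
    anchor = events[anchor_field]
    return {name: (when - anchor).days for name, when in events.items()}


# Illustrative patient record; the values are made up.
patient = {
    "enrolment": date(2009, 3, 14),
    "visit_1": date(2009, 3, 28),
    "visit_2": date(2009, 5, 2),
}

anchored = anchor_dates(patient)
print(anchored)
```

The intervals between events, which are what a longitudinal analysis usually needs, are preserved, while the calendar dates themselves no longer appear in the dataset.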

Of course, these are just two methods to consider when de-identifying clinical trials data. In most instances they would be only a starting point, since there are many other risks to consider with clinical trials data.

September 21, 2009

21 CFR Part 11 book

I just wanted to bring to people's attention this book, which discusses 21 CFR Part 11 and computer systems validation. Although it is a bit dated, much of the material is still relevant, and it is a good one to have on your Part 11 reading stack:

http://download.ehealthinformation.ca/cite/p11-guide.pdf

August 13, 2009

How many clinical trials are using EDC?

We recently published a study whose purpose was to estimate the number of Canadian clinical trials (i.e., those with sites in Canada) that were using EDC. We focused on trials that were running during the 2006 and 2007 calendar years. We only considered trials that were registered in one of the two main clinical trials registries: clinicaltrials.gov or controlled-trials.com.

Our main conclusion was that 41% of all phase II/III/IV trials were using EDC. As expected, larger trials were more likely to use EDC and those with a commercial sponsor were more likely to use EDC.

The full study report was published in the Journal of Medical Internet Research (ranked as the number 1 academic journal in Medical Informatics) and is available freely here: http://www.jmir.org/2009/1/e8

Therefore, it seems that the adoption of EDC in Canada is increasing and that we are entering a rapid adoption phase on the Rogers technology diffusion model.

August 11, 2009

Open Source EDC

I have received a number of questions about the existence of an open source EDC system that can be used in the context of academic research. The best known one is OpenClinica, and the people I know who have used it found it to work well. You can get more information about that tool from here: http://www.openclinica.org/

April 22, 2009

De-identifying protocol amendment test data for e-clinical trials

In the last posting I spoke about masking as a technology to help protect the privacy of patients when using real clinical trial data for testing protocol amendments. Masking is not enough, however. Let me illustrate through an example.

Let's assume that we are involved in a medical device trial. The data set contains the site information, as well as patient demographics (such as date of birth and gender), physical characteristics (e.g., weight and height), a list of medications being taken by the patient, and the results of sensitive medical tests. There are no names or patient addresses in the database. So ostensibly this is considered an anonymous database.

However, knowing the site gives us information about the community where the patient lives and a set of most likely Forward Sortation Areas (the first three characters of the postal code).

If the patient is quite old (older than 89 years, for example) then it would be quite easy to know who the patient is, because people at that age are quite rare in any community. If the patient is unusually heavy, or unusually tall or short for their age, they would also be easier to re-identify because they would stand out within the community. If a neighbour/spouse/ex-spouse/employer knew that the patient participated in the trial, they could re-identify the patient's record because individuals tend to be relatively unique on birthdate, gender, and geography. Uniqueness makes it easier to re-identify individuals using these background/demographic variables.

There are a number of de-identification techniques. These will often remove the highest risk records and perturb the data to make it less likely that an individual can be re-identified, for example by adding a bit of noise to the date of birth or by changing the day to the first of every month (effectively reducing it to a month and year).
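The two date-of-birth examples can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names, the 15-day noise window, and the sample date are all hypothetical, and a real de-identification tool would choose such parameters based on a risk assessment.

```python
import random
from datetime import date, timedelta


def generalize_to_month(dob):
    """Set the day to the first of the month, keeping only month and year."""
    return dob.replace(day=1)


def add_noise(dob, max_days, rng):
    """Shift the date by a random number of days in [-max_days, +max_days]."""
    return dob + timedelta(days=rng.randint(-max_days, max_days))


rng = random.Random(42)  # seeded only so the example is repeatable
dob = date(1950, 7, 23)  # made-up date of birth

print(generalize_to_month(dob))
print(add_noise(dob, 15, rng))
```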

Proper de-identification techniques will provide good protection to the clinical trials data, making it suitable for testing purposes. We have published an article on de-identification recently, which you can access from the JAMIA site. This describes some improvements to k-anonymity, a popular de-identification framework, but also gives you an extensive literature review to follow up on.

It should be noted that we are talking about generating data for testing here. De-identification often results in some records having to be suppressed. That is acceptable for test data, but in clinical trials the cost of collecting data for each patient is so high that it would be quite painful to suppress records in data intended for analysis.
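For readers unfamiliar with k-anonymity, here is a minimal Python sketch of the basic idea and of the suppression it can require. It shows plain k-anonymity only, not the improvements described in the JAMIA article. The records, the field names, and the choice of quasi-identifiers are made up for illustration.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """For each record, count how many records share its quasi-identifier values."""
    keys = [tuple(rec[q] for q in quasi_identifiers) for rec in records]
    counts = Counter(keys)
    return [counts[key] for key in keys]


def suppress_small_classes(records, quasi_identifiers, k):
    """Keep only records whose equivalence class has at least k members."""
    sizes = class_sizes(records, quasi_identifiers)
    return [rec for rec, size in zip(records, sizes) if size >= k]


# Made-up records; birth dates are already reduced to year and month.
records = [
    {"birth": "1950-07", "gender": "F", "fsa": "K1H", "result": "negative"},
    {"birth": "1950-07", "gender": "F", "fsa": "K1H", "result": "positive"},
    {"birth": "1918-02", "gender": "M", "fsa": "K1H", "result": "positive"},
]
quasi_identifiers = ["birth", "gender", "fsa"]

kept = suppress_small_classes(records, quasi_identifiers, k=2)
print(len(records) - len(kept), "record(s) suppressed")
```

The third record is unique on birth month, gender, and geography, so it is the one that gets dropped. Losing it is harmless in a test database, but would be costly in an analysis dataset.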

July 28, 2008

Protecting test data privacy when testing protocol amendments

When testing an electronic clinical trial system after a protocol amendment, you will probably need to mask real data so that you can use it for testing. Masking is the first step in protecting the privacy of the patient data. In this entry I will explain what masking means.

If the original clinical trial data has names and addresses, for example, you cannot send this data across to the testers to test with. But these variables cannot be removed either because the testing cannot be done properly otherwise. To take a simple example, if there is a function to allow browsing of patients by their names sorted in alphabetical order, then the best way to test this function is to have real patient names.

The masking approach is to replace the real names with random names. The Census Bureau has a list of common North American names that can be used as replacements, and you can do the replacement in a gender-correct way as well. There is a slight chance that the random replacement name is the same as the original name, but this chance is likely to be very small and the tester cannot know where it has happened. Also, if you randomize the first and last names independently, then the chance of a replacement full name matching the original is really remote.
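A minimal Python sketch of gender-correct name masking follows. The short name lists are hypothetical stand-ins for a much longer census-derived list, and the record layout is made up; this is not the code of any particular masking tool.

```python
import random

# Short hypothetical stand-ins for a much longer census-derived name list.
FIRST_NAMES = {
    "F": ["Mary", "Patricia", "Linda", "Barbara"],
    "M": ["James", "John", "Robert", "Michael"],
}
LAST_NAMES = ["Smith", "Johnson", "Williams", "Brown"]


def mask_name(gender, rng):
    """Draw a replacement first name (matching gender) and last name independently."""
    return rng.choice(FIRST_NAMES[gender]), rng.choice(LAST_NAMES)


rng = random.Random(2008)  # seeded only so the example is repeatable
patients = [{"id": 1, "gender": "F"}, {"id": 2, "gender": "M"}]  # made-up records
for patient in patients:
    patient["first_name"], patient["last_name"] = mask_name(patient["gender"], rng)
    print(patient)
```

Because the first and last names are drawn independently, the chance of reproducing a particular full name is the product of the two individual chances, which becomes very small once the lists contain thousands of names.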

You can also mask addresses. For example, a postal code can be replaced with another randomly selected postal code from the same Forward Sortation Area, city, or province. But if postal codes are masked, then all other geographic information also needs to be consistently masked. For example, if there is a phone number or a street address, it has to be distorted in the same way so that it falls within the randomly selected replacement postal code.
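As a rough Python sketch of the postal code case, the function below picks a replacement from the same Forward Sortation Area. It assumes you have a lookup table of postal codes grouped by FSA; the table shown here is hypothetical and its values are illustrative only. Masking the street address and phone number consistently is not shown.

```python
import random


def mask_postal_code(postal_code, codes_by_fsa, rng):
    """Pick a random replacement postal code from the same FSA as the original."""
    fsa = postal_code[:3]  # the FSA is the first three characters
    candidates = codes_by_fsa.get(fsa)
    if not candidates:
        raise ValueError(f"no replacement postal codes known for FSA {fsa}")
    return rng.choice(candidates)


# Hypothetical lookup table; the codes are illustrative, not verified.
codes_by_fsa = {
    "K1H": ["K1H 8L1", "K1H 7W9", "K1H 5B6"],
    "M5V": ["M5V 2T6", "M5V 3L9"],
}

rng = random.Random(7)  # seeded only so the example is repeatable
masked = mask_postal_code("K1H 6K9", codes_by_fsa, rng)
print(masked)
```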

Masking can also deal with health insurance card numbers. I have seen clinical studies that collect this information to facilitate linking to administrative databases at a later point in time. But this kind of information is additionally sensitive because it can facilitate medical identity theft.

Such masking must be done every time production data is used in testing.

There are some freely available masking tools for Canadian data sets that have been developed by our research group. If you are interested let me know. I can also point you to commercial tools.

The other type of protection that is needed is de-identification. This deals with residual privacy risk from the remaining data. That will be my next posting.

July 25, 2008

Getting good data for testing protocol amendments in your e-clinical trial

After making changes to an e-clinical trial, say due to a protocol amendment, it is necessary to re-validate the system. This necessitates running tests on the data collection forms, any logic embedded within them, real-time or batch rules for the validation of the data, alerts and notifications, randomization setup, and reports.

But you would not want to do this testing on the real "production" data from the clinical trial. It will be necessary to stage the new version of the system somewhere and run tests on that. Once the tests pass, you can propagate the changes to the real "production" system.

Note here that I am not talking about testing any generic EDC software functionality, but the functionality pertinent to the clinical trial itself. If the EDC software itself changes, then that introduces its own set of issues that need to be addressed, as I discuss here.

Of course, some EDC systems do not differentiate between e-clinical trial changes and EDC system changes because the EDC is custom developed for a particular trial. In such a case any change to the e-clinical trial, say new form validation logic, entails making changes to the EDC software. In those situations the sponsor needs to make sure that both sets of issues are dealt with appropriately.

Proper testing on a staged version of the e-clinical trial requires some data. For example, if you want to enter valid and invalid data to test form logic, or to test the calculations in a report, you need to either enter new data or have data already in your test database. Where do you get this data from?

There are three options. First, you can just copy data from the real clinical trial to the staging area and use that for testing. This is a very dangerous strategy and has serious privacy implications. The first question is whether the testers are authorized to access the personal health information of the patients on the trial. Testers are not necessarily screened as diligently as other staff, and in some cases this work is outsourced to faraway lands where labor is cheap. The second question is whether the information security infrastructure at the testing site is sufficient for handling real patient data. In most organizations the answer to both of these questions is no. This increases the risk of inadvertent privacy breaches. In many jurisdictions it is now a legal requirement to notify patients if their personal health information is leaked. If this happens it will not help recruitment and retention of subjects - patients will not be very pleased if their sensitive health information is lost. So you do not want to be giving real patient data to your testers.

Also, there will likely be many protocol amendments, so this will have to be done repeatedly as the e-clinical trial system evolves.

The second option is to create artificial data to use for testing. This can work, but it is time consuming to create artificial data that is realistic and that reflects the actual distributions of your real clinical trial data. If the data is not realistic then there is also the risk that the testing will not uncover important flaws in the new version of the e-clinical trial.

The third option is to anonymize real production data and use the anonymized data for testing. The advantages here are that this process can be fully automated (so it is easier to do repeatedly) and the data used in testing is as realistic as it gets.

There are two ways to anonymize production data: masking and de-identification. Ideally you would want to do both.

In subsequent posts I will discuss each of these two approaches to anonymizing real clinical trial data so that you can do effective and secure testing of protocol amendments.

July 23, 2008

Archiving data from e-clinical trials

Most national regulations require that the data from a clinical trial be archived and available for a number of years after the trial's completion. That number of years varies by country. This means that the clinical trial needs to have an archival strategy.

When using an EDC system, it is also important to archive the meta-data used during the trial. Meta-data includes information about the eCRFs, the questions, and the validation logic. The meta-data plays a critical role in understanding how data was collected and handled during the study.

An obvious choice of format for archiving is the CDISC Operational Data Model (ODM). This is a de facto standard for representing clinical trial data in XML. The advantage of using this is that one would expect that in five or ten years' time there will be a company out there with a viewer for ODM that will allow a regulator or auditor to look at the ODM trial data in a usable way. This is the advantage of a standard - if the standard is adopted, which ODM is, a market emerges to support it.

Of course, an ODM viewer is only the starting point. What is really needed is an ODM Navigator that allows the user to query the data to extract subsets, to understand trends, and to look at data on a site by site basis.

However, ODM has important weaknesses as well that you need to be aware of. Of course, these may be addressed in future iterations of the standard.

The first is that it is not good at representing logic. Therefore any complex validation logic, calculations, and notifications will be difficult to capture in ODM format. That information can be critical for understanding what happened in a clinical trial many years from now.

Status information is also not so straightforward to represent: for example, whether a patient is withdrawn, excluded, lost to follow-up, and so on.

A second, and perhaps more critical issue, is that to be able to truly recreate a trial at any point it is necessary to maintain site specific snapshots of its meta-data and its data. Conceptually, this means that each site would need to have an ODM export before any change was made to the data.

As I mentioned in another posting, not all sites will be ready to adopt a protocol amendment at the same time: each site may be using a different version of the eCRFs at any one point.

Let's imagine we have two sites, A and B. Because they can get ethics approvals at different times, site A is able to deploy a protocol amendment before site B. We will denote the eCRFs before the amendment as v1 and those after as v2. To archive this data we will have four snapshots: site A with v1 data, site A with v2 data, site B with v1 data, and site B with v2 data. The archive must distinguish among these four snapshots.
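One simple way to keep the four snapshots apart is to key each archived export by both site and eCRF version. The Python sketch below illustrates that idea only; the class name, the function, and the placeholder strings standing in for real ODM exports are all hypothetical, and a real archive would also need to record when each site switched versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotKey:
    """Identifies one archived snapshot: a site at a given eCRF version."""
    site: str
    ecrf_version: str


def store_snapshot(archive, site, ecrf_version, export):
    """Add an export to the archive, refusing to overwrite an existing snapshot."""
    key = SnapshotKey(site, ecrf_version)
    if key in archive:
        raise KeyError(f"snapshot already archived for {key}")
    archive[key] = export


archive = {}
for site in ("A", "B"):
    for version in ("v1", "v2"):
        # Placeholder text stands in for the real ODM export of data and meta-data.
        store_snapshot(archive, site, version, f"ODM export: site {site}, eCRFs {version}")

print(archive[SnapshotKey("B", "v1")])
```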

When examining an archiving solution, make sure that it can address this issue. Such a capability will save a lot of agony later on.

Although I did not discuss it here, audit trails also need to be part of the archive. They will also have the same versioning issue discussed above.

July 18, 2008

What's in an audit trail?

Maintaining an audit trail is a 21 CFR Part 11 compliance requirement. But what makes a good audit trail that is effective and meets the regulation's intentions?

I have seen audit trails that capture every single transaction that runs on a database. This is important to do because in some cases people will not come in through the front door, so to speak. Therefore, a detailed audit trail is needed to forensically analyze any intrusions.

In theory if you do that then you have met the letter of the regulations. But in practice this is not enough. And some auditors will not be satisfied with an audit trail that only a database expert who understands the exact data model behind the EDC system can interpret.

Audit trails must be viewable/accessible to end-users. For example, a site coordinator should be able to see all changes made to an eCRF, by whom, and when, without having to go through SQL. So a subset of the audit trail must be consumable by end-users. This subset includes:

  • All modifications to data and meta-data (e.g., someone changes an eCRF design)
  • All system logins and attempted logins
  • All randomizations

An audit trail must include a time stamp, as well as the account name and IP address of the user.

The above information should be viewable by an end-user. Of course there needs to be access control on the audit trails so that a user cannot view information about another user or site that they are not allowed to see.
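As a rough illustration, an end-user-facing audit entry and a site-based access check could look like the Python sketch below. The field names, the action labels, and the rule of filtering by site only are my own simplifying assumptions; a real EDC system will have its own data model and finer-grained permissions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    timestamp: datetime
    account: str
    ip_address: str
    site: str
    action: str  # hypothetical labels: "data_change", "login", "failed_login", ...
    detail: str


def visible_entries(entries, allowed_sites):
    """Return only the entries for sites the viewing user is allowed to see."""
    return [entry for entry in entries if entry.site in allowed_sites]


# Made-up entries; the IP addresses come from a range reserved for documentation.
entries = [
    AuditEntry(datetime(2008, 7, 16, 9, 30, tzinfo=timezone.utc),
               "coordinator_a", "192.0.2.10", "A", "data_change",
               "weight changed on visit 2 eCRF"),
    AuditEntry(datetime(2008, 7, 16, 9, 45, tzinfo=timezone.utc),
               "coordinator_b", "192.0.2.20", "B", "failed_login",
               "wrong password"),
]

for entry in visible_entries(entries, allowed_sites={"A"}):
    print(entry.timestamp.isoformat(), entry.account, entry.ip_address, entry.detail)
```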

The importance of having audit trails viewable by end-users is evident when you consider that users can check changes and see who made them. This can help catch errors or even malicious attempts to manipulate data quite quickly.

Readily accessible audit trails are very useful for investigating unexpected changes to eCRFs and data, and to determine whether a potential security or privacy breach has resulted in inappropriate disclosure of personal information.

There are issues with storing such a large volume of data, but there are also good architectural solutions to make this work. Therefore, storage should not be a reason for not having good audit trails.

July 16, 2008
