RePGP

Tutorials using Python

Working only from the command line can become cumbersome, so a higher-level language such as Python is often more convenient. Two Python modules that can be used to read VCF or BCF files are shown below.

Python Example of reading a VCF file using the ‘pysam’ module

Install with:

$ sudo pip install pysam

Use the following code to print the chromosome, position, ID and genotype of each record:

#!/usr/bin/env python
from pysam import VariantFile

filename = "repgp-data.vcf.gz"
bcf_in = VariantFile(filename)  # auto-detect input format

# The name of the first sample in this VCF file
sample = bcf_in.header.samples[0]

# Fetching a region requires an index (.tbi or .csi) next to the file
for rec in bcf_in.fetch('10', 10500000, 105500000):
    # Get the genotype (GT) value for the sample, or None if missing
    gt = rec.samples[sample].get('GT', None)

    print(rec.chrom, rec.pos, rec.id, gt)
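
The gt value printed above is normally a tuple of allele indices such as (0, 1), with None standing in for a missing allele. A minimal sketch of turning such a tuple back into the familiar VCF text form follows. Note that format_gt is a hypothetical helper name and not part of pysam, and the phased flag is an assumption that the caller already knows whether the call is phased.

```python
def format_gt(gt, phased=False):
    """Render a genotype tuple such as (0, 1) as VCF-style text such as '0/1'."""
    if gt is None:
        # No GT value at all for this sample
        return "."
    sep = "|" if phased else "/"
    # A None allele is written as '.' in VCF text
    return sep.join("." if allele is None else str(allele) for allele in gt)


print(format_gt((0, 1)))
print(format_gt((None, None)))
print(format_gt((1, 1), phased=True))
```

Inside the loop above this could be used as print(rec.chrom, rec.pos, rec.id, format_gt(gt)).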

Python Example of reading a VCF file using the ‘pyvcf’ module

Install with:

$ sudo pip install pyvcf

Use the following code to print the chromosome, position, ID, reference allele and genotype of each record:

#!/usr/bin/env python
import vcf

filename = "repgp-data.vcf.gz"

# Passing filename= lets the reader detect gzip compression from the .gz suffix
vcf_reader = vcf.Reader(filename=filename)

# The name of the first sample in this VCF file
sample = vcf_reader.samples[0]

# fetch() needs pysam installed and a tabix index for the file
for rec in vcf_reader.fetch('10', 10500000, 105500000):
    # Genotype string for the sample, e.g. '0/1'
    gt = rec.genotype(sample)['GT']
    print(rec.CHROM, rec.POS, rec.ID, rec.REF, gt)

Python Example of using Sqlite3

#!/usr/bin/env python
import sqlite3

conn = sqlite3.connect('repgp-data.sqlite3')
cursor = conn.cursor()
# Select the sample name, height and weight of every row in the users table
cursor.execute('''
    SELECT sample, height, weight
    FROM users
''')
for row in cursor:
    print(row)
conn.close()
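
If only a subset of rows is needed, a parameterized query avoids building SQL strings by hand. The sketch below is an illustration under stated assumptions: it creates an in-memory database with a users table that reuses the column names from the query above, and the sample names and measurements are invented placeholders, not RePGP data. The helper name fetch_taller_than is also hypothetical.

```python
import sqlite3


def fetch_taller_than(conn, min_height):
    """Return (sample, height, weight) rows whose height exceeds min_height."""
    cursor = conn.execute(
        "SELECT sample, height, weight FROM users "
        "WHERE height > ? ORDER BY sample",
        (min_height,),  # the value is bound to the '?' placeholder
    )
    return cursor.fetchall()


# Hypothetical stand-in for repgp-data.sqlite3 so the example is self-contained
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (sample TEXT, height REAL, weight REAL)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("sampleA", 170.0, 65.0), ("sampleB", 182.5, 80.0)],
)

rows = fetch_taller_than(conn, 175)
for row in rows:
    print(row)
conn.close()
```

To run the same helper against the real file, replace ":memory:" with 'repgp-data.sqlite3' and drop the CREATE and INSERT statements.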