# Requests, The Web, JSON, and APIs

# The Web
* Client/Server
* HTML (Hyper Text Markup Language)
* JavaScript
* JSON (JavaScript Object Notation)
* The Browser

# HTTP, Methods, Responses
* Hyper Text Transfer Protocol
* GET, POST, PUT...
* Response Codes: 1xx, 2xx, 3xx, 4xx, 5xx


# GET
* The most common method
* Used to request data from a specified resource

# POST
* Method for sending data
* Typically used to update or create resources
* The reason your browser warns about resubmitting a form when you hit back


# Other Methods
* PUT: Update resources
* DELETE: Delete resources
* Still other methods: PATCH, OPTIONS, HEAD (less commonly used)
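
The method is simply a field on the request. As a minimal sketch, the standard library's `urllib.request.Request` can build requests with different methods without sending anything. The URL below is a hypothetical placeholder, and `build_request` is a helper introduced only for this illustration.

```python
from urllib.request import Request

# Hypothetical endpoint, used only for illustration; nothing is sent over the network.
URL = 'http://example.com/genes/673'

def build_request(method, body=None):
    """Build (but do not send) a request that uses the given HTTP method."""
    data = body.encode('utf-8') if body is not None else None
    return Request(URL, data=data, method=method)

put_request = build_request('PUT', '{"symbol": "BRAF"}')
delete_request = build_request('DELETE')
head_request = build_request('HEAD')

for req in (put_request, delete_request, head_request):
    # get_method() reports the method that would be used when the request is sent
    print(req.get_method(), req.full_url)
```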

# Response Codes
* 1xx Informational, uncommon
* 2xx Success; 200 OK is the most common response
* 3xx Redirect, page has moved etc.
* 4xx Client error (your problem)
* 5xx Server error (their problem)
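
The first digit of the code determines its class. A small sketch of that mapping follows; `describe_status` is a hypothetical helper written for this example, while `http.HTTPStatus` is part of the Python standard library.

```python
from http import HTTPStatus

def describe_status(code):
    """Return the response class for an HTTP status code, based on its first digit."""
    classes = {
        1: 'Informational',
        2: 'Success',
        3: 'Redirect',
        4: 'Client error (your problem)',
        5: 'Server error (their problem)',
    }
    return classes.get(code // 100, 'Unknown')

for code in (200, 301, 404, 500):
    # HTTPStatus(code).phrase is the standard reason phrase, e.g. 'Not Found'
    print(code, HTTPStatus(code).phrase, '->', describe_status(code))
```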

# Requests
* HTTP for Humans
* Simple methods for getting, posting, and authenticating with websites

In [1]:
import requests

In [2]:
google_response = requests.get('http://google.com')
google_response.status_code

200

In [3]:
len(google_response.content)
# google_response.content

12459

https://www.dictionary.com/browse/biology

In [6]:
import os
os.environ['http_proxy'] = "http://proxy.swmed.edu:3128" 
os.environ['https_proxy'] = "https://proxy.swmed.edu:3128" 
dictionary_search = requests.get('http://www.dictionary.com/browse/biology')
dictionary_search.status_code

200

In [8]:
dictionary_search.content

b'<!DOCTYPE html>\n    <html lang="en" prefix="og: http://ogp.me/ns#">\n      <head>\n        \n  <meta charSet="utf-8">\n  <meta name="description" content="Biology definition, the science of life or living matter in all its forms and phenomena, especially with reference to origin, growth, reproduction, structure, and behavior. See more.">\n  <link rel="canonical" href="https://www.dictionary.com/browse/biology">\n  <link rel="preload" href="https://www.dictionary.com/assets/dictionary-font-c864a1ca3bc4a1c7aef16dbbf189ccc1.woff" as="font" type="font/woff" crossorigin="anonymous">\n  \n  <meta property="og:title" content="the definition of biology">\n  <meta property="og:image" content="https://www.dictionary.com/assets/dcom-social-icon-952ee83e140e5bfbd2d4b6c28daa1a8f.png">\n  <meta property="og:site_name" content="www.dictionary.com">\n  <meta property="fb:app_id" content="127444090629600"><meta property="fb:admins" content="100000304287730;109125464873">\n  <meta name="msvalidate.01

In [9]:
start = dictionary_search.text.find('Biology')
start

151

In [10]:
finish = dictionary_search.text.find(' See more')
finish

323

In [11]:
definition = dictionary_search.text[start:finish]
definition

'Biology definition, the science of life or living matter in all its forms and phenomena, especially with reference to origin, growth, reproduction, structure, and behavior.'

In [12]:
def dictionary_search(word):
    url = 'https://dictionary.com/browse/' + word
    response = requests.get(url)
    start = response.text.find(word)
    finish = response.text.find(' See more')
    definition = response.text[start:finish]
    return definition

In [13]:
print(dictionary_search('Biology'))

Biology definition, the science of life or living matter in all its forms and phenomena, especially with reference to origin, growth, reproduction, structure, and behavior.


In [14]:
print(dictionary_search('DNA'))

DNA is encoded in the sequence of the bases and is transcribed as the strands unwind and replicate.


In [15]:
dna_search = requests.get('https://dictionary.com/browse/DNA')
dna_search.content

b'<!DOCTYPE html>\n    <html lang="en" prefix="og: http://ogp.me/ns#">\n      <head>\n        \n  <meta charSet="utf-8">\n  <meta name="description" content="Dna definition, deoxyribonucleic acid: an extremely long macromolecule that is the main component of chromosomes and is the material that transfers genetic characteristics in all life forms, constructed of two nucleotide strands coiled around each other in a ladderlike arrangement with the sidepieces composed of alternating phosphate and deoxyribose units and the rungs composed of the purine and pyrimidine bases adenine, guanine, cytosine, and thymine: the genetic information of DNA is encoded in the sequence of the bases and is transcribed as the strands unwind and replicate. See more.">\n  <link rel="canonical" href="https://www.dictionary.com/browse/dna">\n  <link rel="preload" href="https://www.dictionary.com/assets/dictionary-font-c864a1ca3bc4a1c7aef16dbbf189ccc1.woff" as="font" type="font/woff" crossorigin="anonymous">\n  \n

In [16]:
dictionary_search('biology')

''

In [17]:
# Perhaps we can fix it (this version is unchanged from In [12], so it fails the same way)
def dictionary_search(word):
    url = 'https://dictionary.com/browse/' + word
    response = requests.get(url)
    start = response.text.find(word)
    finish = response.text.find(' See more')
    definition = response.text[start:finish]
    return definition

In [18]:
dictionary_search('DNA')
dictionary_search('biology')

''
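
The searches fail because `str.find` is case-sensitive and returns only the first match, so the slice depends on where the word happens to appear in the page. One possible alternative, sketched here under the assumption that the definition always sits in the `<meta name="description">` tag as in the page source shown earlier, is to parse the tag with the standard library's `html.parser`. `MetaDescriptionParser` and `extract_definition` are names made up for this example, and the sample HTML is an abridged copy of the output above rather than a live request.

```python
from html.parser import HTMLParser

class MetaDescriptionParser(HTMLParser):
    """Collect the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (tag == 'meta' and attributes.get('name') == 'description'
                and self.description is None):
            self.description = attributes.get('content')

def extract_definition(html):
    """Return the meta description without the trailing ' See more.'."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    if parser.description is None:
        return ''
    return parser.description.replace(' See more.', '').strip()

# Abridged copy of the page source shown earlier (not fetched live).
sample_html = (
    '<!DOCTYPE html><html lang="en"><head><meta charSet="utf-8">'
    '<meta name="description" content="Biology definition, the science of '
    'life or living matter in all its forms and phenomena, especially with '
    'reference to origin, growth, reproduction, structure, and behavior. '
    'See more.">'
    '<link rel="canonical" href="https://www.dictionary.com/browse/biology">'
    '</head></html>'
)
print(extract_definition(sample_html))
```

This no longer depends on the capitalization of the search word, but it still breaks if the site changes its markup, which is the general weakness of scraping.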

# API
* Application Programming Interface
* Web API
* XML/JSON
* Much easier to use than scraping

# JSON
* JavaScript Object Notation
* Similar to Python dictionaries
* Key/Value store
* Incredibly common data interchange format on the web and elsewhere

In [19]:
import json
data = {'gene_name': 'BRAF'}
json.dumps(data)

'{"gene_name": "BRAF"}'

In [20]:
data['patient_count'] = 3
json.dumps(data)

'{"gene_name": "BRAF", "patient_count": 3}'

In [21]:
data['is_alive'] = True
json.dumps(data)

'{"gene_name": "BRAF", "patient_count": 3, "is_alive": true}'

In [22]:
data['phone_number'] = None
json.dumps(data)

'{"gene_name": "BRAF", "patient_count": 3, "is_alive": true, "phone_number": null}'

In [23]:
data['more_data'] = {}
json.dumps(data)

'{"gene_name": "BRAF", "patient_count": 3, "is_alive": true, "phone_number": null, "more_data": {}}'

In [24]:
data_table = [data, 1]
json.dumps(data_table)

'[{"gene_name": "BRAF", "patient_count": 3, "is_alive": true, "phone_number": null, "more_data": {}}, 1]'

In [25]:
input_data = json.loads('{"gene_name": "BRAF"}')
print(input_data)
type(input_data)

{'gene_name': 'BRAF'}


dict
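
JSON is similar to a Python dictionary but not identical, so a round trip through `json.dumps` and `json.loads` can change some types. The sketch below uses a made-up record to show two common cases: tuples come back as lists, and non-string keys come back as strings.

```python
import json

# Made-up record: a tuple value and an integer key, neither of which exists in JSON.
record = {'gene_name': 'BRAF', 'positions': (1, 2), 1: 'integer key'}

round_trip = json.loads(json.dumps(record))
print(round_trip)
print(round_trip == record)
```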

https://mygene.info

In [26]:
query = {'q': 'BRAF'}
response = requests.get('http://mygene.info/v3/query', params=query)
braf_data = response.json()
braf_data

{'max_score': 435.33853,
 'took': 11,
 'total': 386,
 'hits': [{'_id': '673',
   '_score': 435.33853,
   'entrezgene': '673',
   'name': 'B-Raf proto-oncogene, serine/threonine kinase',
   'symbol': 'BRAF',
   'taxid': 9606},
  {'_id': '109880',
   '_score': 358.14212,
   'entrezgene': '109880',
   'name': 'Braf transforming gene',
   'symbol': 'Braf',
   'taxid': 10090},
  {'_id': '114486',
   '_score': 309.06714,
   'entrezgene': '114486',
   'name': 'B-Raf proto-oncogene, serine/threonine kinase',
   'symbol': 'Braf',
   'taxid': 10116},
  {'_id': '112945628',
   '_score': 285.43057,
   'entrezgene': '112945628',
   'name': 'B-Raf proto-oncogene, serine/threonine kinase',
   'symbol': 'BRAF',
   'taxid': 30464},
  {'_id': '109296160',
   '_score': 285.43057,
   'entrezgene': '109296160',
   'name': 'B-Raf proto-oncogene, serine/threonine kinase',
   'symbol': 'BRAF',
   'taxid': 94835},
  {'_id': '103769301',
   '_score': 285.43057,
   'entrezgene': '103769301',
   'name': 'B-Raf pr
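
The `params` dictionary ends up in the URL as a query string. As a rough sketch of that encoding step, done here with the standard library's `urllib.parse` instead of `requests` and without contacting the server:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base_url = 'http://mygene.info/v3/query'
query = {'q': 'BRAF'}

# Encode the dictionary as key=value pairs and append it after '?'
full_url = base_url + '?' + urlencode(query)
print(full_url)

# Going the other way: recover the parameters from the URL
print(parse_qs(urlsplit(full_url).query))
```

With `requests`, the URL that was actually requested is available afterwards as `response.url`.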

In [27]:
braf_data['hits']

[{'_id': '673',
  '_score': 435.33853,
  'entrezgene': '673',
  'name': 'B-Raf proto-oncogene, serine/threonine kinase',
  'symbol': 'BRAF',
  'taxid': 9606},
 {'_id': '109880',
  '_score': 358.14212,
  'entrezgene': '109880',
  'name': 'Braf transforming gene',
  'symbol': 'Braf',
  'taxid': 10090},
 {'_id': '114486',
  '_score': 309.06714,
  'entrezgene': '114486',
  'name': 'B-Raf proto-oncogene, serine/threonine kinase',
  'symbol': 'Braf',
  'taxid': 10116},
 {'_id': '112945628',
  '_score': 285.43057,
  'entrezgene': '112945628',
  'name': 'B-Raf proto-oncogene, serine/threonine kinase',
  'symbol': 'BRAF',
  'taxid': 30464},
 {'_id': '109296160',
  '_score': 285.43057,
  'entrezgene': '109296160',
  'name': 'B-Raf proto-oncogene, serine/threonine kinase',
  'symbol': 'BRAF',
  'taxid': 94835},
 {'_id': '103769301',
  '_score': 285.43057,
  'entrezgene': '103769301',
  'name': 'B-Raf proto-oncogene, serine/threonine kinase',
  'symbol': 'BRAF',
  'taxid': 57421},
 {'_id': '100219

In [28]:
braf_id = braf_data['hits'][0]['_id']
braf_id

'673'

In [29]:
braf_response = requests.get('http://mygene.info/v3/gene/' + braf_id)
braf_response.json()

{'HGNC': '1097',
 'MIM': '164757',
 'Vega': 'OTTHUMG00000157457',
 '_id': '673',
 '_score': 12.80121,
 'accession': {'genomic': ['AC006006.2',
   'AC006344.2',
   'AC006347.2',
   'AC079339.5',
   'CH236950.1',
   'CH471070.1',
   'EU600171.1',
   'HB432546.1',
   'HC464558.1',
   'HM459603.1',
   'HQ224878.1',
   'KF481581.1',
   'KT584890.1',
   'KY769663.1',
   'KY769664.1',
   'KY769665.1',
   'KY769666.1',
   'KY769667.1',
   'KY769668.1',
   'NC_000007.14',
   'NG_007873.3',
   'X65187.1'],
  'protein': ['AAA35609.2',
   'AAA96495.1',
   'AAD15551.1',
   'AAD43193.1',
   'AAI01758.1',
   'AAI12080.1',
   'AAS00359.1',
   'ACD11489.1',
   'ADN43065.1',
   'ADQ00186.1',
   'ADX94397.1',
   'AIE38317.1',
   'ARR27440.1',
   'ARR27441.1',
   'ARR27442.1',
   'ARR27443.1',
   'ARR27444.1',
   'ARR27445.1',
   'CAA46301.1',
   'CAB81553.1',
   'CAZ68014.1',
   'CBK51920.1',
   'EAL24023.1',
   'EAW83964.1',
   'EAW83965.1',
   'NP_001341538.1',
   'NP_004324.2',
   'P15056.4',
   'XP_0

In [30]:
braf_info = braf_response.json()
braf_info['summary']

'This gene encodes a protein belonging to the RAF family of serine/threonine protein kinases. This protein plays a role in regulating the MAP kinase/ERK signaling pathway, which affects cell division, differentiation, and secretion. Mutations in this gene, most commonly the V600E mutation, are the most frequently identified cancer-causing mutations in melanoma, and have been identified in various other cancers as well, including non-Hodgkin lymphoma, colorectal cancer, thyroid carcinoma, non-small cell lung carcinoma, hairy cell leukemia and adenocarcinoma of lung. Mutations in this gene are also associated with cardiofaciocutaneous, Noonan, and Costello syndromes, which exhibit overlapping phenotypes. A pseudogene of this gene has been identified on the X chromosome. [provided by RefSeq, Aug 2017].'

In [31]:
payload = {'ids': '673,1017', 'species': 'human'}
response = requests.post('http://mygene.info/v3/gene', data=payload)
response.json()

[{'query': '673',
  'HGNC': '1097',
  'MIM': '164757',
  'Vega': 'OTTHUMG00000157457',
  '_id': '673',
  '_score': 20.348886,
  'accession': {'genomic': ['AC006006.2',
    'AC006344.2',
    'AC006347.2',
    'AC079339.5',
    'CH236950.1',
    'CH471070.1',
    'EU600171.1',
    'HB432546.1',
    'HC464558.1',
    'HM459603.1',
    'HQ224878.1',
    'KF481581.1',
    'KT584890.1',
    'KY769663.1',
    'KY769664.1',
    'KY769665.1',
    'KY769666.1',
    'KY769667.1',
    'KY769668.1',
    'NC_000007.14',
    'NG_007873.3',
    'X65187.1'],
   'protein': ['AAA35609.2',
    'AAA96495.1',
    'AAD15551.1',
    'AAD43193.1',
    'AAI01758.1',
    'AAI12080.1',
    'AAS00359.1',
    'ACD11489.1',
    'ADN43065.1',
    'ADQ00186.1',
    'ADX94397.1',
    'AIE38317.1',
    'ARR27440.1',
    'ARR27441.1',
    'ARR27442.1',
    'ARR27443.1',
    'ARR27444.1',
    'ARR27445.1',
    'CAA46301.1',
    'CAB81553.1',
    'CAZ68014.1',
    'CBK51920.1',
    'EAL24023.1',
    'EAW83964.1',
    'EAW8

In [32]:
for gene in response.json():
    print(gene['symbol'] + ':', gene['summary'])

BRAF: This gene encodes a protein belonging to the RAF family of serine/threonine protein kinases. This protein plays a role in regulating the MAP kinase/ERK signaling pathway, which affects cell division, differentiation, and secretion. Mutations in this gene, most commonly the V600E mutation, are the most frequently identified cancer-causing mutations in melanoma, and have been identified in various other cancers as well, including non-Hodgkin lymphoma, colorectal cancer, thyroid carcinoma, non-small cell lung carcinoma, hairy cell leukemia and adenocarcinoma of lung. Mutations in this gene are also associated with cardiofaciocutaneous, Noonan, and Costello syndromes, which exhibit overlapping phenotypes. A pseudogene of this gene has been identified on the X chromosome. [provided by RefSeq, Aug 2017].
CDK2: This gene encodes a member of a family of serine/threonine protein kinases that participate in cell cycle regulation. The encoded protein is the catalytic subunit of the cyclin-d
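
The loop above raises a `KeyError` if a record has no `'summary'` key, which may happen for less well annotated genes. A minimal sketch of a more forgiving version uses `dict.get` with a default. The records below are hypothetical stand-ins shaped like the API response, and `summarize` is a helper introduced for this example.

```python
# Hypothetical records shaped like the API response; the second has no 'summary'.
genes = [
    {'symbol': 'BRAF', 'summary': 'This gene encodes a protein belonging to the RAF family.'},
    {'symbol': 'EXAMPLE1'},
]

def summarize(gene):
    """Format one gene record, tolerating missing keys."""
    symbol = gene.get('symbol', 'unknown')
    summary = gene.get('summary', 'no summary available')
    return symbol + ': ' + summary

for gene in genes:
    print(summarize(gene))
```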

In [33]:
braf_info['generif']

[{'pubmed': 8621729, 'text': 'MEK1 interacts with B-Raf.'},
 {'pubmed': 12068308,
  'text': 'somatic missense mutations in 66% of malignant melanomas and at lower frequency in a wide range of human cancers'},
 {'pubmed': 12198537,
  'text': 'BRAF mutations in colorectal cancers occur only in tumours that do not carry mutations in a RAS gene known as KRAS, and BRAF mutation is linked to the proficiency of these tumours in repairing mismatched bases in DNA'},
 {'pubmed': 12447372, 'text': 'High frequency of BRAF mutations in nevi'},
 {'pubmed': 12619120,
  'text': 'The V599E BRAF mutation appears to be a somatic mutation associated with melanoma development and/or progression in a proportion of affected individuals.'},
 {'pubmed': 12644542,
  'text': 'results demonstrate that the mutational status of BRAF and KRAS is distinctly different among histologic types of ovarian serous carcinoma, occurring most frequently in invasive micropapillary serous carcinomas and its precursors, serous bo

In [34]:
len(braf_info['generif'])

2000

In [35]:
print('https://www.ncbi.nlm.nih.gov/pubmed/12068308')

https://www.ncbi.nlm.nih.gov/pubmed/12068308


In [36]:
def gene_search(name):
    query = {'q': name}
    response = requests.get('http://mygene.info/v3/query', params=query)
    entrez_id = response.json()['hits'][0]['_id']
    gene_response = requests.get('http://mygene.info/v3/gene/' + entrez_id)
    return gene_response.json()
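
`gene_search` assumes the query returns at least one hit; an empty `'hits'` list would raise an `IndexError`. One way to guard against that, sketched on plain dictionaries shaped like the query response shown earlier (`first_hit_id` is a hypothetical helper, not part of any library):

```python
def first_hit_id(search_result):
    """Return the '_id' of the top hit, or None if the search found nothing."""
    hits = search_result.get('hits', [])
    if not hits:
        return None
    return hits[0].get('_id')

# Abridged sample data in the shape of the query response above.
sample = {'total': 386, 'hits': [{'_id': '673', 'symbol': 'BRAF'}]}
empty = {'total': 0, 'hits': []}

print(first_hit_id(sample))
print(first_hit_id(empty))
```

A caller can then check for `None` before requesting the gene record.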


In [37]:
cdk2_info = gene_search('CDK2')
cdk2_info

{'HGNC': '1771',
 'MIM': '116953',
 'Vega': 'OTTHUMG00000170575',
 '_id': '1017',
 '_score': 13.168399,
 'accession': {'genomic': ['AC025162.48',
   'AC034102.32',
   'AF512553.1',
   'AJ223951.1',
   'CH471054.1',
   'KT584459.1',
   'NC_000012.12',
   'NG_034014.1',
   'U50730.2'],
  'protein': ['AAA35667.1',
   'AAH03065.1',
   'AAM34794.1',
   'AAP35467.1',
   'BAA32794.1',
   'BAF84630.1',
   'BAG56780.1',
   'CAA43807.1',
   'CAA43985.1',
   'EAW96856.1',
   'EAW96857.1',
   'EAW96858.1',
   'EAW96859.1',
   'EAW96860.1',
   'NP_001277159.1',
   'NP_001789.2',
   'NP_439892.2',
   'P24941.2',
   'XP_011536034.1'],
  'rna': ['AA789250.1',
   'AA810989.1',
   'AB012305.1',
   'AK291941.1',
   'AK293246.1',
   'BC003065.2',
   'BJ991087.1',
   'BT006821.1',
   'DA814453.1',
   'M68520.1',
   'NM_001290230.1',
   'NM_001798.4',
   'NM_052827.3',
   'X61622.1',
   'X62071.1',
   'XM_011537732.2'],
  'translation': [{'protein': 'BAA32794.1', 'rna': 'AB012305.1'},
   {'protein': 'BAF846

In [38]:
gene_info = gene_search('THP2')
len(gene_info['generif'])
gene_info['generif'][0]

{'pubmed': 17960421,
 'text': 'Recombination and transcription analyses indicate that THO/TREX mutants share a moderate but significant effect on gene conversion and ectopic recombination, as well as transcription impairment of even short and low GC-content genes.'}

In [39]:
def print_pubmeds(gene_name):
    gene_info = gene_search(gene_name)
    for reference in gene_info['generif']:
        print('https://www.ncbi.nlm.nih.gov/pubmed/' + str(reference['pubmed']))
        print(reference['text'])


In [40]:
print_pubmeds('THP2')

https://www.ncbi.nlm.nih.gov/pubmed/19151352
peptides are 36 amino acids long including a highly conserved region with 6 invariant cysteines forming the 3 disulfide bonds characteristic of defensins


In [41]:
print_pubmeds('BRAF')

https://www.ncbi.nlm.nih.gov/pubmed/8621729
MEK1 interacts with B-Raf.
https://www.ncbi.nlm.nih.gov/pubmed/12068308
somatic missense mutations in 66% of malignant melanomas and at lower frequency in a wide range of human cancers
https://www.ncbi.nlm.nih.gov/pubmed/12198537
BRAF mutations in colorectal cancers occur only in tumours that do not carry mutations in a RAS gene known as KRAS, and BRAF mutation is linked to the proficiency of these tumours in repairing mismatched bases in DNA
https://www.ncbi.nlm.nih.gov/pubmed/12447372
High frequency of BRAF mutations in nevi
https://www.ncbi.nlm.nih.gov/pubmed/12619120
The V599E BRAF mutation appears to be a somatic mutation associated with melanoma development and/or progression in a proportion of affected individuals.
https://www.ncbi.nlm.nih.gov/pubmed/12644542
results demonstrate that the mutational status of BRAF and KRAS is distinctly different among histologic types of ovarian serous carcinoma, occurring most frequently in invasive m

BRAF V600E mutations occurs infrequently in endometrial cancer.
https://www.ncbi.nlm.nih.gov/pubmed/23883275
Shifted termination assay fragment analysis can detect BRAF V600 mutations in formalin-fixed paraffin-embedded papillary thyroid carcinoma samples.
https://www.ncbi.nlm.nih.gov/pubmed/23887306
Classification of serrated lesions using immunohistochemical evaluation of BRAF V600E mutation may identify lesions with higher potential to progression into sessile serrated adenoma/polyp, and further to BRAF V600E-mutated colorectal cancer.
https://www.ncbi.nlm.nih.gov/pubmed/23893334
TIMP-1 protein expression is a reliable surrogate marker for BRAF(V600E)-mutated status in papillary thyroid carcinoma
https://www.ncbi.nlm.nih.gov/pubmed/23893412
Ras-mutant cancer cells display B-Raf binding to Ras that activates extracellular signal-regulated kinase and is inhibited by protein kinase A phosphorylation
https://www.ncbi.nlm.nih.gov/pubmed/23903755
Suppression of TORC1 activity in response 

In [42]:
def print_pubmeds_recent(gene_name):
    gene_info = gene_search(gene_name)
    for reference in gene_info['generif'][::-1]:
        print('https://www.ncbi.nlm.nih.gov/pubmed/' + str(reference['pubmed']))
        print(reference['text'])

In [43]:
print_pubmeds_recent('BRAF')

https://www.ncbi.nlm.nih.gov/pubmed/30224486
MET inactivation in the context of the BRAF-activating mutation is driven through a negative feedback loop involving inactivation of PP2A phosphatase, which in turn leads to phosphorylation on MET inhibitory Ser985.
https://www.ncbi.nlm.nih.gov/pubmed/30220118
The occurrence of BRAF V600E mutations in ganglioglioma is common, and their detection may be valuable for the diagnosis and treatment in ganglioglioma.
https://www.ncbi.nlm.nih.gov/pubmed/30150413
Data show that glycogen synthase kinase 3 (GSK3) and proto-oncogene proteins B-raf (BRAF)/MAPK signaling converges to control microphthalmia-associated transcription factor MITF (MITF) nuclear export.
https://www.ncbi.nlm.nih.gov/pubmed/30010109
these results indicated that STAT3-mediated downexpression of miR-579-3p caused resistance to vemurafenib. Our findings suggest novel approaches to overcome resistance to vemurafenib by combining vemurafenib with STAT3 sliencing or miR-579-3p overexp

MSI status, KRAS and BRAF mutation rates varied remarkably among the colonic carcinoma subsites irrespective of right- and left-sided origin.
https://www.ncbi.nlm.nih.gov/pubmed/22892521
The results of this study supported an important role for BRAF duplication and MAPK pathway activation in gliomas of the optic nerve proper.
https://www.ncbi.nlm.nih.gov/pubmed/22887810
A K601E BRAF mutation is associated with papillary thyroid carcinoma.
https://www.ncbi.nlm.nih.gov/pubmed/22880048
This study reveals a novel molecular mechanism underlying the regulation of feedback loops between the MAPK and AKT pathways.
https://www.ncbi.nlm.nih.gov/pubmed/22879539
High prevalence of BRAF V600E mutations is associated with Erdheim-Chester disease but not in other non-Langerhans cell histiocytoses.
https://www.ncbi.nlm.nih.gov/pubmed/22876591
Cardio-facio-cutaneous syndrome is caused by heterogeneous mutations in BRAF gene.
https://www.ncbi.nlm.nih.gov/pubmed/22870241
High-throughput genotyping in met

In [44]:
def print_pubmeds_most_recent(gene_name):
    gene_info = gene_search(gene_name)
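    # [:-10:-1] walks backwards from the last entry and stops before the
    # 10th from the end, so it yields the last 9 entries, newest first
    for reference in gene_info['generif'][:-10:-1]: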
        print('https://www.ncbi.nlm.nih.gov/pubmed/' + str(reference['pubmed']))
        print(reference['text'])

In [45]:
print_pubmeds_most_recent('BRAF')

https://www.ncbi.nlm.nih.gov/pubmed/30224486
MET inactivation in the context of the BRAF-activating mutation is driven through a negative feedback loop involving inactivation of PP2A phosphatase, which in turn leads to phosphorylation on MET inhibitory Ser985.
https://www.ncbi.nlm.nih.gov/pubmed/30220118
The occurrence of BRAF V600E mutations in ganglioglioma is common, and their detection may be valuable for the diagnosis and treatment in ganglioglioma.
https://www.ncbi.nlm.nih.gov/pubmed/30150413
Data show that glycogen synthase kinase 3 (GSK3) and proto-oncogene proteins B-raf (BRAF)/MAPK signaling converges to control microphthalmia-associated transcription factor MITF (MITF) nuclear export.
https://www.ncbi.nlm.nih.gov/pubmed/30010109
these results indicated that STAT3-mediated downexpression of miR-579-3p caused resistance to vemurafenib. Our findings suggest novel approaches to overcome resistance to vemurafenib by combining vemurafenib with STAT3 sliencing or miR-579-3p overexp
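
The slice `[:-10:-1]` is easy to misread: the stop index is excluded, so it returns nine items, not ten. A small sketch on a stand-in list of numbers (not the real `generif` data) shows the difference and two ways to get exactly ten.

```python
# Stand-in for a list of references ordered oldest to newest.
entries = list(range(1, 21))

nine = entries[:-10:-1]         # stops before index -10, so 9 items
ten = entries[:-11:-1]          # stops before index -11, so 10 items
also_ten = entries[-10:][::-1]  # take the last 10, then reverse them

print(len(nine), nine)
print(len(ten), ten)
print(ten == also_ten)
```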

In [46]:
print_pubmeds_most_recent('THP2')

https://www.ncbi.nlm.nih.gov/pubmed/24084588
The THO complex component Thp2 counteracts telomeric R-loops and telomere shortening.
https://www.ncbi.nlm.nih.gov/pubmed/17960421
Recombination and transcription analyses indicate that THO/TREX mutants share a moderate but significant effect on gene conversion and ectopic recombination, as well as transcription impairment of even short and low GC-content genes.
