Run this notebook with the Python 3 (Ubuntu Linux) Jupyter kernel.
Installing Python packages in a CoCalc project should be done as a user install or in a virtual environment, e.g. with Anaconda or virtualenv. This example follows the user-install approach.
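As a minimal sketch of the virtual-environment alternative mentioned above (assuming Python 3's built-in venv module is available in the project; the environment name spacy-env is illustrative):

```shell
# Create and activate a virtual environment in the project home directory
python3 -m venv ~/spacy-env
source ~/spacy-env/bin/activate

# Packages now install into the environment instead of the system site-packages
pip install spacy
```

The rest of this notebook assumes the user-install approach instead.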
Install spaCy and its dependencies. This takes about a minute. Open a .term file in CoCalc and run the following:
~$ time pip3 install --user spacy
Installing collected packages: cymem, preshed, plac, pathlib, murmurhash, msgpack-numpy, cytoolz, thinc, regex, spacy
Successfully installed cymem-1.31.2 cytoolz-0.8.2 msgpack-numpy-0.4.1 murmurhash-0.28.0 pathlib-1.0.1 plac-0.9.6 preshed-1.0.0 regex-2017.4.5 spacy-2.
You are using pip version 9.0.1, however version 10.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Install one or more models.
The spaCy documentation recommends installing models with the spacy download command, which selects the model version matching the current installation. On CoCalc this command fails, because the CoCalc user does not have write permission to the system site-packages directory. However, running it anyway reveals the path of the selected model, which can then be installed manually.
~$ python3 -m spacy download en
error: could not create '/usr/lib/python3.5/site-packages/en_core_web_sm': Read-only file system
3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609]
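Assuming the failed command reported a model archive hosted on GitHub (the model version below is illustrative; use the exact path from the output above), the model can be installed into the user site-packages directly:

```shell
# Install the model archive reported by `spacy download` as a user install
pip3 install --user https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz
```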
import spacy

# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load('en_core_web_sm')

# Process whole documents
text = (u"When Sebastian Thrun started working on self-driving cars at "
        u"Google in 2007, few people outside of the company took him "
        u"seriously. “I can tell you very senior CEOs of major American "
        u"car companies would shake my hand and turn away because I wasn’t "
        u"worth talking to,” said Thrun, now the co-founder and CEO of "
        u"online higher education startup Udacity, in an interview with "
        u"Recode earlier this week.")
doc = nlp(text)

# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)
Sebastian Thrun PERSON
earlier this week DATE
# Determine semantic similarities
doc1 = nlp(u"my fries were super gross")
doc2 = nlp(u"such disgusting fries")
similarity = doc1.similarity(doc2)
print(doc1.text, doc2.text, similarity)
my fries were super gross such disgusting fries 0.7139700916321534
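Under the hood, doc.similarity compares the documents' averaged word vectors by cosine similarity. As a minimal sketch of that computation in plain Python (the vectors below are illustrative stand-ins, not spaCy's actual document vectors):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Illustrative averaged document vectors (not real spaCy vectors)
doc1_vec = [0.2, 0.8, 0.5]
doc2_vec = [0.1, 0.9, 0.4]
print(cosine_similarity(doc1_vec, doc2_vec))
```

Identical directions give 1.0, orthogonal vectors give 0.0, which is why the two fries sentences above score around 0.71 rather than exactly 1.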