Author: Hal Snyder
Description: Setup for using the spaCy library for natural language processing
Compute Environment: Ubuntu 18.04 (Deprecated)

Natural language processing with the spaCy library in CoCalc

Home page: spaCy (https://spacy.io)

Jupyter kernel

Run this notebook with the Python 3 (Ubuntu Linux) Jupyter kernel.

Setting up

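In a CoCalc project, Python packages should be installed either for the user only (pip's --user option) or into a virtual environment, for example one created with Anaconda or virtualenv. The system-wide site-packages directory is read-only. This example follows the user-install approach.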
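The following sketch uses only the standard library to show which of these two setups the current interpreter is in. It assumes a venv- or virtualenv-style environment. Conda environments are not detected this way. The function names are illustrative, not part of any CoCalc or spaCy API.

```python
import site
import sys


def in_virtualenv() -> bool:
    # venv sets sys.prefix to the environment directory while sys.base_prefix
    # keeps pointing at the base interpreter; older virtualenv releases set
    # sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def user_site_dir() -> str:
    # Directory where "pip install --user" puts packages for this interpreter.
    return site.getusersitepackages()


if __name__ == "__main__":
    print("virtual environment:", in_virtualenv())
    print("user site-packages:", user_site_dir())
```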

Install spaCy and its dependencies.

This takes about a minute. Open a .term (terminal) file in CoCalc for the following steps:

~$ time pip3 install --user spacy
...
Installing collected packages: cymem, preshed, plac, pathlib, murmurhash, msgpack-numpy, cytoolz, thinc, regex, spacy
Successfully installed cymem-1.31.2 cytoolz-0.8.2 msgpack-numpy-0.4.1 murmurhash-0.28.0 pathlib-1.0.1 plac-0.9.6 preshed-1.0.0 regex-2017.4.5 spacy-2.0.11 thinc-6.10.2
You are using pip version 9.0.1, however version 10.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
 
real    0m51.041s
user    0m37.709s
sys     0m7.039s
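After the install finishes, the standard-library sketch below offers one way to confirm that the package is visible to Python. is_installed is a hypothetical helper name. Run it in a fresh interpreter or after restarting the Jupyter kernel. If the user site-packages directory did not exist when Python started, it is not on sys.path until the next start.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    # find_spec returns None when the top-level module cannot be found on
    # sys.path (which includes the user site-packages directory, if enabled).
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("spacy importable:", is_installed("spacy"))
```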

Install one or more models.

The spaCy instructions for installing models recommend the spacy download command, which selects the model version that matches the installed spaCy. In CoCalc this command fails, because the project user has no write permission to the system site-packages directory. Running it is still useful, because its output shows the URL of the selected model.

~$ python3 -m spacy download en
Collecting https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz
...
    error: could not create '/usr/lib/python3.5/site-packages/en_core_web_sm': Read-only file system
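The URL in that output follows a visible pattern. The model name and version, joined by a hyphen, appear once as the release tag and once as the archive name. The hypothetical helper sketched below rebuilds that string. It only assembles the URL. It does not check which model version is compatible with the installed spaCy, which is what spacy download determines.

```python
BASE = "https://github.com/explosion/spacy-models/releases/download/"


def model_url(name: str, version: str) -> str:
    # Hypothetical helper mirroring the URL pattern printed by spacy download:
    # <BASE><name>-<version>/<name>-<version>.tar.gz
    release = "{}-{}".format(name, version)
    return "{}{}/{}.tar.gz".format(BASE, release, release)


if __name__ == "__main__":
    print(model_url("en_core_web_sm", "2.0.0"))
```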

Now use pip3 with the --user option to install the model from that URL.

$ pip3 install --user \
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz 
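The same user install can also be started from Python. The sketch below uses hypothetical function names and calls pip through sys.executable. That way the package goes into the user site-packages of the interpreter running the code, which avoids a mismatch between pip3 and the kernel's Python when several interpreters are installed.

```python
import subprocess
import sys


def pip_user_install_cmd(target: str) -> list:
    # "python -m pip" with the current interpreter, user-only install.
    return [sys.executable, "-m", "pip", "install", "--user", target]


def pip_user_install(target: str) -> int:
    # Runs pip in a subprocess and returns its exit status (0 means success).
    return subprocess.call(pip_user_install_cmd(target))


# Example (not run here):
# pip_user_install("https://github.com/explosion/spacy-models/releases/download/"
#                  "en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz")
```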

Ready to go.

Once these steps are complete, this notebook can be run. It follows the example on the spaCy website.

In [1]:
# check kernel - verify we're running Python 3.5 or later
import sys
print(sys.version)
3.5.2 (default, Nov 23 2017, 16:37:01) [GCC 5.4.0 20160609]
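Comparing sys.version_info, a tuple of integers, is a sturdier check than reading the printed string. The minimal sketch below stops early on an older interpreter. The 3.5 threshold comes from the cell comment above, and MIN_VERSION is an illustrative name.

```python
import sys

# Minimum interpreter version assumed by this notebook.
MIN_VERSION = (3, 5)

if sys.version_info < MIN_VERSION:
    raise RuntimeError(
        "Python {}.{} or later required, found {}".format(
            MIN_VERSION[0], MIN_VERSION[1], sys.version.split()[0]
        )
    )
print("Python version OK:", sys.version.split()[0])
```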
In [2]:
import spacy
In [3]:
# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load('en_core_web_sm')
In [4]:
# Process whole documents
text = (u"When Sebastian Thrun started working on self-driving cars at "
        u"Google in 2007, few people outside of the company took him "
        u"seriously. “I can tell you very senior CEOs of major American "
        u"car companies would shake my hand and turn away because I wasn’t "
        u"worth talking to,” said Thrun, now the co-founder and CEO of "
        u"online higher education startup Udacity, in an interview with "
        u"Recode earlier this week.")
doc = nlp(text)
In [5]:
# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)
Sebastian Thrun PERSON
Google ORG
2007 DATE
American NORP
Thrun PERSON
Recode ORG
earlier this week DATE
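The entity output mixes labels and mentions one person twice. As a small post-processing sketch, the (text, label) pairs can be grouped by label. The function name is hypothetical. It takes plain tuples, such as [(ent.text, ent.label_) for ent in doc.ents], so it can be tried without loading a model. Only exact repeats are dropped, so "Thrun" and "Sebastian Thrun" stay separate.

```python
from collections import defaultdict


def group_entities(pairs):
    # pairs: iterable of (text, label) tuples.
    grouped = defaultdict(list)
    for text, label in pairs:
        # Keep the first occurrence of each exact string per label.
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


if __name__ == "__main__":
    pairs = [("Sebastian Thrun", "PERSON"), ("Google", "ORG"), ("2007", "DATE"),
             ("American", "NORP"), ("Thrun", "PERSON"), ("Recode", "ORG"),
             ("earlier this week", "DATE")]
    for label, texts in sorted(group_entities(pairs).items()):
        print(label, texts)
```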
In [6]:
# Determine semantic similarities
doc1 = nlp(u"my fries were super gross")
doc2 = nlp(u"such disgusting fries")
similarity = doc1.similarity(doc2)
print(doc1.text, doc2.text, similarity)
my fries were super gross such disgusting fries 0.7139700916321534
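By default, spaCy's Doc.similarity compares the two documents' vectors using cosine similarity. A Doc's vector defaults to the average of its token vectors. Which vectors the small en_core_web_sm model actually supplies depends on the spaCy version. The pure-Python sketch below only illustrates the averaging and cosine steps on made-up vectors. It is not spaCy's implementation.

```python
import math


def average(vectors):
    # Element-wise mean of equal-length vectors (like averaging token vectors).
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (|u| * |v|); this sketch returns 0.0 for a zero vector.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Made-up 3-dimensional "token vectors" for two short texts.
    doc_a = average([[1.0, 0.0, 1.0], [0.5, 1.0, 0.0]])
    doc_b = average([[1.0, 0.5, 0.5], [0.0, 1.0, 1.0]])
    print(cosine_similarity(doc_a, doc_b))
```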