Image: ubuntu2004
Kernel: Python 3 (Anaconda 2020)
!pip install pandas_read_xml
Defaulting to user installation because normal site-packages is not writeable Requirement already satisfied: pandas_read_xml in /home/user/.local/lib/python3.7/site-packages (0.3.1) Requirement already satisfied: pandas in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas_read_xml) (1.1.5) Requirement already satisfied: pyarrow in /home/user/.local/lib/python3.7/site-packages (from pandas_read_xml) (3.0.0) Requirement already satisfied: requests in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas_read_xml) (2.24.0) Requirement already satisfied: xmltodict in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas_read_xml) (0.12.0) Requirement already satisfied: urllib3>=1.26.3 in /home/user/.local/lib/python3.7/site-packages (from pandas_read_xml) (1.26.4) Requirement already satisfied: zipfile36 in /home/user/.local/lib/python3.7/site-packages (from pandas_read_xml) (0.1.3) Requirement already satisfied: distlib in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas_read_xml) (0.3.1) Requirement already satisfied: python-dateutil>=2.7.3 in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas->pandas_read_xml) (2.8.0) Requirement already satisfied: pytz>=2017.2 in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas->pandas_read_xml) (2019.3) Requirement already satisfied: numpy>=1.15.4 in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas->pandas_read_xml) (1.18.5) Requirement already satisfied: idna<3,>=2.5 in /ext/anaconda2020.02/lib/python3.7/site-packages (from requests->pandas_read_xml) (2.8) Requirement already satisfied: certifi>=2017.4.17 in /ext/anaconda2020.02/lib/python3.7/site-packages (from requests->pandas_read_xml) (2020.12.5) Requirement already satisfied: chardet<4,>=3.0.2 in /ext/anaconda2020.02/lib/python3.7/site-packages (from requests->pandas_read_xml) (3.0.4) Requirement already satisfied: six>=1.5 in /ext/anaconda2020.02/lib/python3.7/site-packages (from 
python-dateutil>=2.7.3->pandas->pandas_read_xml) (1.14.0)
import pandas_read_xml as pdxi
#from pandas_read_xml import flatten, fully_flatten, auto_separate_tables
/ext/anaconda2020.02/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.4) or chardet (3.0.4) doesn't match a supported version! RequestsDependencyWarning)
from pandas_read_xml import flatten, fully_flatten, auto_separate_tables
from pathlib import Path
textfiles_directory = "/home/user/Final Project/Metadata"
files = Path(textfiles_directory)
for file in files.iterdir():
    print(file.name)
journal-article-10.1086_421828.xml
journal-article-10.1086_381213.xml
journal-article-10.1086_424411.xml
journal-article-10.1086_427313.xml
journal-article-10.1086_382331.xml
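The listing above prints every entry in the directory, in whatever order the file system returns it. The following is a minimal sketch, assuming only the `.xml` files matter and that a stable order is useful; the helper name `list_xml_files` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def list_xml_files(directory):
    """Return the names of the .xml files in `directory`, sorted alphabetically."""
    # Path.glob("*.xml") skips anything that is not an XML file;
    # sorted() makes the order reproducible between runs.
    return sorted(p.name for p in Path(directory).glob("*.xml"))
```

Calling `list_xml_files(textfiles_directory)` should then give the five file names shown above, in alphabetical order.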
for file in files.iterdir():
    with open(file) as f:
        content = f.readlines()
content
['<?xml version="1.0" encoding="UTF-8"?>\n', '\n', '<article xmlns:xlink="http://www.w3.org/1999/xlink"\n', ' xmlns:mml="http://www.w3.org/1998/Math/MathML"\n', ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n', ' article-type="book-review"\n', ' dtd-version="1.0"\n', ' xml:lang="en">\n', ' <front>\n', ' <journal-meta>\n', ' <journal-id journal-id-type="publisher-id">jreligion</journal-id>\n', ' <journal-title-group>\n', ' <journal-title>The Journal of Religion</journal-title>\n', ' </journal-title-group>\n', '\n', '\n', ' <publisher>\n', ' <publisher-name>The University of Chicago Press</publisher-name>\n', ' </publisher>\n', ' <issn pub-type="ppub">00224189</issn>\n', ' <issn pub-type="epub">15496538</issn>\n', ' <custom-meta-group/>\n', ' </journal-meta>\n', ' <article-meta>\n', ' <article-id pub-id-type="jstor-stable">3172402</article-id>\n', ' <article-id pub-id-type="doi">10.1086/382331</article-id>\n', ' <article-id pub-id-type="msid">JR84142</article-id>\n', ' <article-categories>\n', ' <subj-group subj-group-type="heading">\n', ' <subject>Book Review</subject>\n', ' </subj-group>\n', ' </article-categories>\n', ' <title-group>\n', ' <article-title/>\n', '\n', '\n', '\n', '\n', ' </title-group>\n', ' <contrib-group>\n', ' <contrib contrib-type="author" xlink:type="simple">\n', ' <string-name>\n', ' <given-names>Robert\xa0Ford</given-names>\n', ' <x xml:space="preserve">\xa0</x>\n', ' <surname>Campany</surname>\n', ' </string-name>\n', ' <x xml:space="preserve">, </x>\n', ' </contrib>\n', ' <aff id="aff_1">Indiana University.</aff>\n', ' </contrib-group>\n', '\n', '\n', '\n', ' <pub-date pub-type="ppub">\n', ' <month>01</month>\n', ' <year>2004</year>\n', ' <string-date>January 2004</string-date>\n', ' </pub-date>\n', ' <volume>84</volume>\n', ' <issue>1</issue>\n', ' <issue-id>jr.2004.84.issue-1</issue-id>\n', ' <fpage>153</fpage>\n', ' <lpage>154</lpage>\n', ' <product xlink:type="simple">\n', ' <string-name name-style="western">\n', ' 
<surname>Sharf,\xa0</surname>\n', ' <given-names>Robert\xa0H.</given-names>\n', ' </string-name> \n', ' <source>Coming to Terms with Chinese Buddhism: A Reading of the Treasure Store Treatise</source>. Kuroda Institute, Studies in East Asian Buddhism 14. Honolulu: University of Hawaii Press, 2002. xiii+499 pp. $47.00 (cloth).</product>\n', ' <permissions>\n', ' <copyright-statement>Permission to reprint a book review printed in this section may be obtained only from the author.</copyright-statement>\n', ' </permissions>\n', ' <self-uri xlink:href="https://www.jstor.org/stable/10.1086/382331"/>\n', '\n', ' <custom-meta-group>\n', ' <custom-meta>\n', ' <meta-name>lang</meta-name>\n', ' <meta-value>en</meta-value>\n', ' </custom-meta>\n', ' </custom-meta-group>\n', ' </article-meta>\n', ' </front>\n', '\n', '</article>\n', '\n']
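The `content` list above holds the raw lines of the last file read, so the metadata is still locked inside XML markup. As a rough sketch of one way to get at it, the lines can be joined back into a single string and parsed with the standard library's `xml.etree.ElementTree`. The function name `extract_metadata` and the choice of fields are assumptions made for illustration; the element and attribute names are taken from the output shown above.

```python
import xml.etree.ElementTree as ET


def extract_metadata(lines):
    """Parse the lines of one article file and pull out a few fields."""
    # readlines() keeps the trailing newlines, so a plain join rebuilds the file text.
    root = ET.fromstring("".join(lines))
    return {
        # Attribute on the root <article> element.
        "article_type": root.get("article-type"),
        # The <article-id> whose pub-id-type attribute is "doi".
        "doi": root.findtext(".//article-id[@pub-id-type='doi']"),
        # findtext returns "" for an empty element such as <article-title/>.
        "title": root.findtext(".//article-title"),
        # Surname of the first contributor (not the author of a reviewed book).
        "surname": root.findtext(".//contrib/string-name/surname"),
        "year": root.findtext(".//pub-date/year"),
    }
```

`extract_metadata(content)` returns a small dictionary for that one file; a field that is missing from the XML comes back as `None`.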
all_content = {}
for file in files.iterdir():
    with open(file) as f:
        content = f.readlines()
    all_content[file.name] = content
all_content
{'journal-article-10.1086_421828.xml': ['<?xml version="1.0" encoding="UTF-8"?>\n', '\n', '<article xmlns:xlink="http://www.w3.org/1999/xlink"\n', ' xmlns:mml="http://www.w3.org/1998/Math/MathML"\n', ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n', ' article-type="book-review"\n', ' dtd-version="1.0"\n', ' xml:lang="en">\n', ' <front>\n', ' <journal-meta>\n', ' <journal-id journal-id-type="publisher-id">jreligion</journal-id>\n', ' <journal-title-group>\n', ' <journal-title>The Journal of Religion</journal-title>\n', ' </journal-title-group>\n', '\n', '\n', ' <publisher>\n', ' <publisher-name>The University of Chicago Press</publisher-name>\n', ' </publisher>\n', ' <issn pub-type="ppub">00224189</issn>\n', ' <issn pub-type="epub">15496538</issn>\n', ' <custom-meta-group/>\n', ' </journal-meta>\n', ' <article-meta>\n', ' <article-id pub-id-type="jstor-stable">3172345</article-id>\n', ' <article-id pub-id-type="doi">10.1086/421828</article-id>\n', ' <article-id pub-id-type="msid">JR84239</article-id>\n', ' <article-categories>\n', ' <subj-group subj-group-type="heading">\n', ' <subject>Book Review</subject>\n', ' </subj-group>\n', ' </article-categories>\n', ' <title-group>\n', ' <article-title/>\n', '\n', '\n', '\n', '\n', ' </title-group>\n', ' <contrib-group>\n', ' <contrib contrib-type="author" xlink:type="simple">\n', ' <string-name>\n', ' <given-names>David\xa0W.</given-names>\n', ' <x xml:space="preserve">\xa0</x>\n', ' <surname>Chappell</surname>\n', ' </string-name>\n', ' <x xml:space="preserve">, </x>\n', ' </contrib>\n', ' <aff id="aff_1">Soka University of America.</aff>\n', ' </contrib-group>\n', '\n', '\n', '\n', ' <pub-date pub-type="ppub">\n', ' <month>04</month>\n', ' <year>2004</year>\n', ' <string-date>April 2004</string-date>\n', ' </pub-date>\n', ' <volume>84</volume>\n', ' <issue>2</issue>\n', ' <issue-id>jr.2004.84.issue-2</issue-id>\n', ' <fpage>331</fpage>\n', ' <lpage>332</lpage>\n', ' <product xlink:type="simple">\n', ' 
<string-name name-style="western">\n', ' <surname>Blum,\xa0</surname>\n', ' <given-names>Mark</given-names>\n', ' </string-name>. <source>The Origins and Development of Pure Land Buddhism: A Study and Translation of Gyōnen’s “Jōdo Hōmon Genrushō.”</source> New York: Oxford University Press, 2002. xxi+470 pp. $55.00 (cloth).</product>\n', ' <permissions>\n', ' <copyright-statement>Permission to reprint a book review printed in this section may be obtained only from the author.</copyright-statement>\n', ' </permissions>\n', ' <self-uri xlink:href="https://www.jstor.org/stable/10.1086/421828"/>\n', '\n', ' <custom-meta-group>\n', ' <custom-meta>\n', ' <meta-name>lang</meta-name>\n', ' <meta-value>en</meta-value>\n', ' </custom-meta>\n', ' </custom-meta-group>\n', ' </article-meta>\n', ' </front>\n', '\n', '</article>\n', '\n'], 'journal-article-10.1086_381213.xml': ['<?xml version="1.0" encoding="UTF-8"?>\n', '\n', '<article xmlns:xlink="http://www.w3.org/1999/xlink"\n', ' xmlns:mml="http://www.w3.org/1998/Math/MathML"\n', ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n', ' article-type="review-article"\n', ' dtd-version="1.0"\n', ' xml:lang="en">\n', ' <front>\n', ' <journal-meta>\n', ' <journal-id journal-id-type="publisher-id">jreligion</journal-id>\n', ' <journal-title-group>\n', ' <journal-title>The Journal of Religion</journal-title>\n', ' </journal-title-group>\n', '\n', '\n', ' <publisher>\n', ' <publisher-name>The University of Chicago Press</publisher-name>\n', ' </publisher>\n', ' <issn pub-type="ppub">00224189</issn>\n', ' <issn pub-type="epub">15496538</issn>\n', ' <custom-meta-group/>\n', ' </journal-meta>\n', ' <article-meta>\n', ' <article-id pub-id-type="jstor-stable">3172304</article-id>\n', ' <article-id pub-id-type="doi">10.1086/381213</article-id>\n', ' <article-id pub-id-type="msid">JR840204</article-id>\n', '\n', ' <article-categories>\n', ' <subj-group subj-group-type="heading">\n', ' <subject>Review Article</subject>\n', ' 
</subj-group>\n', ' </article-categories>\n', ' <title-group>\n', ' <article-title>Enlightened Genealogies of Religion: Edward Gibbon and His Contemporaries*</article-title>\n', '\n', '\n', '\n', '\n', ' </title-group>\n', ' <contrib-group>\n', ' <contrib contrib-type="author" xlink:type="simple">\n', ' <string-name>\n', ' <given-names>W.\xa0Clark</given-names>\n', ' <x xml:space="preserve">\xa0</x>\n', ' <surname>Gilpin</surname>\n', ' </string-name>\n', ' </contrib>\n', ' <aff id="aff_1">University of Chicago</aff>\n', ' </contrib-group>\n', '\n', '\n', '\n', ' <pub-date pub-type="ppub">\n', ' <month>04</month>\n', ' <year>2004</year>\n', ' <string-date>April 2004</string-date>\n', ' </pub-date>\n', ' <volume>84</volume>\n', ' <issue>2</issue>\n', ' <issue-id>jr.2004.84.issue-2</issue-id>\n', ' <fpage>256</fpage>\n', ' <lpage>263</lpage>\n', ' <product xlink:type="simple">*\u2009<string-name name-style="western">\n', ' <given-names>J.\xa0G.\xa0A.\xa0</given-names>\n', ' <surname>Pocock</surname>\n', ' </string-name>, <source>The Enlightenments of Edward Gibbon, 1737–1764</source>, vol. 1 of <source>Barbarism and Religion</source> (Cambridge: Cambridge University Press, 1999), xv+339 pp.; <string-name name-style="western">\n', ' <given-names>J.\xa0G.\xa0A.\xa0</given-names>\n', ' <surname>Pocock</surname>\n', ' </string-name>, <source>Narratives of Civil Government</source>, vol. 2 of <source>Barbarism and Religion</source> (Cambridge: Cambridge University Press, 1999), xiv+422 pp., $90.00 (cloth; vols. 1 and 2); <string-name name-style="western">\n', ' <given-names>J.\xa0G.\xa0A.\xa0</given-names>\n', ' <surname>Pocock</surname>\n', ' </string-name>, <source>The First Decline and Fall</source>, vol. 3 of <source>Barbarism and Religion</source> (Cambridge: Cambridge University Press, 2003), xiii+527 pp., $60.00 (cloth).</product>\n', ' <permissions>\n', ' <copyright-statement>© 2004 by The University of Chicago. 
All rights reserved.</copyright-statement>\n', ' <copyright-year>2004</copyright-year>\n', ' <copyright-holder>The University of Chicago</copyright-holder>\n', ' </permissions>\n', ' <self-uri xlink:href="https://www.jstor.org/stable/10.1086/381213"/>\n', '\n', ' <custom-meta-group>\n', ' <custom-meta>\n', ' <meta-name>lang</meta-name>\n', ' <meta-value>en</meta-value>\n', ' </custom-meta>\n', ' </custom-meta-group>\n', ' </article-meta>\n', ' </front>\n', '\n', '</article>\n', '\n'], 'journal-article-10.1086_424411.xml': ['<?xml version="1.0" encoding="UTF-8"?>\n', '\n', '<article xmlns:xlink="http://www.w3.org/1999/xlink"\n', ' xmlns:mml="http://www.w3.org/1998/Math/MathML"\n', ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n', ' article-type="book-review"\n', ' dtd-version="1.0"\n', ' xml:lang="en">\n', ' <front>\n', ' <journal-meta>\n', ' <journal-id journal-id-type="publisher-id">jreligion</journal-id>\n', ' <journal-title-group>\n', ' <journal-title>The Journal of Religion</journal-title>\n', ' </journal-title-group>\n', '\n', '\n', ' <publisher>\n', ' <publisher-name>The University of Chicago Press</publisher-name>\n', ' </publisher>\n', ' <issn pub-type="ppub">00224189</issn>\n', ' <issn pub-type="epub">15496538</issn>\n', ' <custom-meta-group/>\n', ' </journal-meta>\n', ' <article-meta>\n', ' <article-id pub-id-type="jstor-stable">3591416</article-id>\n', ' <article-id pub-id-type="doi">10.1086/424411</article-id>\n', ' <article-id pub-id-type="msid">JR84316</article-id>\n', ' <article-categories>\n', ' <subj-group subj-group-type="heading">\n', ' <subject>Book Review</subject>\n', ' </subj-group>\n', ' </article-categories>\n', ' <title-group>\n', ' <article-title/>\n', '\n', '\n', '\n', '\n', ' </title-group>\n', ' <contrib-group>\n', ' <contrib contrib-type="author" xlink:type="simple">\n', ' <string-name>\n', ' <given-names>Nicholas</given-names>\n', ' <x xml:space="preserve">\xa0</x>\n', ' <surname>Koss</surname>\n', ' </string-name>\n', ' <x 
xml:space="preserve">, </x>\n', ' </contrib>\n', ' <aff id="aff_1">Fu Jen Catholic University.</aff>\n', ' </contrib-group>\n', '\n', '\n', '\n', ' <pub-date pub-type="ppub">\n', ' <month>07</month>\n', ' <year>2004</year>\n', ' <string-date>July 2004</string-date>\n', ' </pub-date>\n', ' <volume>84</volume>\n', ' <issue>3</issue>\n', ' <issue-id>jr.2004.84.issue-3</issue-id>\n', ' <fpage>472</fpage>\n', ' <lpage>474</lpage>\n', ' <product xlink:type="simple">\n', ' <string-name name-style="western">\n', ' <surname>Lozada,\xa0</surname>\n', ' <given-names>Eriberto\xa0P.,\xa0Jr.</given-names>\n', ' </string-name> \n', ' <source>God Aboveground: Catholic Church, Postsocialist State, and Transnational Processes in a Chinese Village</source>. Stanford, Calif.: Stanford University Press, 2001. xii+250 pp. $45.00 (cloth).</product>\n', ' <permissions>\n', ' <copyright-statement>Permission to reprint a book review printed in this section may be obtained only from the author.</copyright-statement>\n', ' </permissions>\n', ' <self-uri xlink:href="https://www.jstor.org/stable/10.1086/424411"/>\n', '\n', ' <custom-meta-group>\n', ' <custom-meta>\n', ' <meta-name>lang</meta-name>\n', ' <meta-value>en</meta-value>\n', ' </custom-meta>\n', ' </custom-meta-group>\n', ' </article-meta>\n', ' </front>\n', '\n', '</article>\n', '\n'], 'journal-article-10.1086_427313.xml': ['<?xml version="1.0" encoding="UTF-8"?>\n', '\n', '<article xmlns:xlink="http://www.w3.org/1999/xlink"\n', ' xmlns:mml="http://www.w3.org/1998/Math/MathML"\n', ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n', ' article-type="research-article"\n', ' dtd-version="1.0"\n', ' xml:lang="en">\n', ' <front>\n', ' <journal-meta>\n', ' <journal-id journal-id-type="publisher-id">jreligion</journal-id>\n', ' <journal-title-group>\n', ' <journal-title>The Journal of Religion</journal-title>\n', ' </journal-title-group>\n', '\n', '\n', ' <publisher>\n', ' <publisher-name>The University of Chicago 
Press</publisher-name>\n', ' </publisher>\n', ' <issn pub-type="ppub">00224189</issn>\n', ' <issn pub-type="epub">15496538</issn>\n', ' <custom-meta-group/>\n', ' </journal-meta>\n', ' <article-meta>\n', ' <article-id pub-id-type="jstor-stable">3591443</article-id>\n', ' <article-id pub-id-type="doi">10.1086/427313</article-id>\n', ' <article-id pub-id-type="msid">JR850201</article-id>\n', '\n', ' <title-group>\n', ' <article-title>A Correlational Model of Comparative Theology</article-title>\n', '\n', '\n', '\n', '\n', ' </title-group>\n', ' <contrib-group>\n', ' <contrib contrib-type="author" xlink:type="simple">\n', ' <string-name>\n', ' <given-names>Hugh</given-names>\n', ' <x xml:space="preserve">\xa0</x>\n', ' <surname>Nicholson</surname>\n', ' </string-name>\n', ' <x xml:space="preserve"> </x>\n', ' </contrib>\n', ' <aff id="aff_1">Coe College</aff>\n', ' </contrib-group>\n', '\n', '\n', '\n', ' <pub-date pub-type="ppub">\n', ' <month>04</month>\n', ' <year>2005</year>\n', ' <string-date>April 2005</string-date>\n', ' </pub-date>\n', ' <volume>85</volume>\n', ' <issue>2</issue>\n', ' <issue-id>jr.2005.85.issue-2</issue-id>\n', ' <fpage>191</fpage>\n', ' <lpage>213</lpage>\n', ' <permissions>\n', ' <copyright-statement>© 2005 by The University of Chicago. 
All rights reserved.</copyright-statement>\n', ' <copyright-year>2005</copyright-year>\n', ' <copyright-holder>The University of Chicago</copyright-holder>\n', ' </permissions>\n', ' <self-uri xlink:href="https://www.jstor.org/stable/10.1086/427313"/>\n', '\n', ' <custom-meta-group>\n', ' <custom-meta>\n', ' <meta-name>lang</meta-name>\n', ' <meta-value>en</meta-value>\n', ' </custom-meta>\n', ' </custom-meta-group>\n', ' </article-meta>\n', ' </front>\n', '\n', ' <back>\n', '</back>\n', '</article>\n', '\n'], 'journal-article-10.1086_382331.xml': ['<?xml version="1.0" encoding="UTF-8"?>\n', '\n', '<article xmlns:xlink="http://www.w3.org/1999/xlink"\n', ' xmlns:mml="http://www.w3.org/1998/Math/MathML"\n', ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n', ' article-type="book-review"\n', ' dtd-version="1.0"\n', ' xml:lang="en">\n', ' <front>\n', ' <journal-meta>\n', ' <journal-id journal-id-type="publisher-id">jreligion</journal-id>\n', ' <journal-title-group>\n', ' <journal-title>The Journal of Religion</journal-title>\n', ' </journal-title-group>\n', '\n', '\n', ' <publisher>\n', ' <publisher-name>The University of Chicago Press</publisher-name>\n', ' </publisher>\n', ' <issn pub-type="ppub">00224189</issn>\n', ' <issn pub-type="epub">15496538</issn>\n', ' <custom-meta-group/>\n', ' </journal-meta>\n', ' <article-meta>\n', ' <article-id pub-id-type="jstor-stable">3172402</article-id>\n', ' <article-id pub-id-type="doi">10.1086/382331</article-id>\n', ' <article-id pub-id-type="msid">JR84142</article-id>\n', ' <article-categories>\n', ' <subj-group subj-group-type="heading">\n', ' <subject>Book Review</subject>\n', ' </subj-group>\n', ' </article-categories>\n', ' <title-group>\n', ' <article-title/>\n', '\n', '\n', '\n', '\n', ' </title-group>\n', ' <contrib-group>\n', ' <contrib contrib-type="author" xlink:type="simple">\n', ' <string-name>\n', ' <given-names>Robert\xa0Ford</given-names>\n', ' <x xml:space="preserve">\xa0</x>\n', ' 
<surname>Campany</surname>\n', ' </string-name>\n', ' <x xml:space="preserve">, </x>\n', ' </contrib>\n', ' <aff id="aff_1">Indiana University.</aff>\n', ' </contrib-group>\n', '\n', '\n', '\n', ' <pub-date pub-type="ppub">\n', ' <month>01</month>\n', ' <year>2004</year>\n', ' <string-date>January 2004</string-date>\n', ' </pub-date>\n', ' <volume>84</volume>\n', ' <issue>1</issue>\n', ' <issue-id>jr.2004.84.issue-1</issue-id>\n', ' <fpage>153</fpage>\n', ' <lpage>154</lpage>\n', ' <product xlink:type="simple">\n', ' <string-name name-style="western">\n', ' <surname>Sharf,\xa0</surname>\n', ' <given-names>Robert\xa0H.</given-names>\n', ' </string-name> \n', ' <source>Coming to Terms with Chinese Buddhism: A Reading of the Treasure Store Treatise</source>. Kuroda Institute, Studies in East Asian Buddhism 14. Honolulu: University of Hawaii Press, 2002. xiii+499 pp. $47.00 (cloth).</product>\n', ' <permissions>\n', ' <copyright-statement>Permission to reprint a book review printed in this section may be obtained only from the author.</copyright-statement>\n', ' </permissions>\n', ' <self-uri xlink:href="https://www.jstor.org/stable/10.1086/382331"/>\n', '\n', ' <custom-meta-group>\n', ' <custom-meta>\n', ' <meta-name>lang</meta-name>\n', ' <meta-value>en</meta-value>\n', ' </custom-meta>\n', ' </custom-meta-group>\n', ' </article-meta>\n', ' </front>\n', '\n', '</article>\n', '\n']}
all_content.keys()
dict_keys(['journal-article-10.1086_421828.xml', 'journal-article-10.1086_381213.xml', 'journal-article-10.1086_424411.xml', 'journal-article-10.1086_427313.xml', 'journal-article-10.1086_382331.xml'])
def load_directory(path):
    return(dictionary)
load_directory
<function __main__.load_directory(path)>
#content_directory = load_directory(textfiles_directory)
#df = pdxi.read_xml(all_content, ['article'])
#df
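The commented-out call above hands the `all_content` dictionary to `pdxi.read_xml`; nothing in this notebook establishes that the function accepts a dictionary of line lists. As an alternative sketch that avoids that question, the dictionary can be turned into one plain row per file with the standard library alone. The name `metadata_rows` and the selected columns are hypothetical; the tag names come from the XML printed earlier.

```python
import xml.etree.ElementTree as ET


def metadata_rows(contents):
    """Build one row (a plain dict) per file from a {file name: list of lines} mapping."""
    rows = []
    for name, lines in contents.items():
        # Rebuild the file text from its lines and parse it.
        root = ET.fromstring("".join(lines))
        rows.append({
            "file": name,
            "doi": root.findtext(".//article-id[@pub-id-type='doi']"),
            "volume": root.findtext(".//volume"),
            "issue": root.findtext(".//issue"),
            "first_page": root.findtext(".//fpage"),
            "last_page": root.findtext(".//lpage"),
        })
    return rows
```

Because the result is a list of dictionaries with identical keys, `pandas.DataFrame(metadata_rows(all_content))` should give a table with one row per article.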
def load_directory(path):
    return(all_content)
load_directory
<function __main__.load_directory(path)>
content_directory = load_directory(textfiles_directory)
content_directory # I'm pretty sure all I did was create content_directory = all_content, so this is probably a useless variable/function(?)
{'journal-article-10.1086_421828.xml': ['<?xml version="1.0" encoding="UTF-8"?>\n', '\n', '<article xmlns:xlink="http://www.w3.org/1999/xlink"\n', ' xmlns:mml="http://www.w3.org/1998/Math/MathML"\n', ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n', ' article-type="book-review"\n', ' dtd-version="1.0"\n', ' xml:lang="en">\n', ' <front>\n', ' <journal-meta>\n', ' <journal-id journal-id-type="publisher-id">jreligion</journal-id>\n', ' <journal-title-group>\n', ' <journal-title>The Journal of Religion</journal-title>\n', ' </journal-title-group>\n', '\n', '\n', ' <publisher>\n', ' <publisher-name>The University of Chicago Press</publisher-name>\n', ' </publisher>\n', ' <issn pub-type="ppub">00224189</issn>\n', ' <issn pub-type="epub">15496538</issn>\n', ' <custom-meta-group/>\n', ' </journal-meta>\n', ' <article-meta>\n', ' <article-id pub-id-type="jstor-stable">3172345</article-id>\n', ' <article-id pub-id-type="doi">10.1086/421828</article-id>\n', ' <article-id pub-id-type="msid">JR84239</article-id>\n', ' <article-categories>\n', ' <subj-group subj-group-type="heading">\n', ' <subject>Book Review</subject>\n', ' </subj-group>\n', ' </article-categories>\n', ' <title-group>\n', ' <article-title/>\n', '\n', '\n', '\n', '\n', ' </title-group>\n', ' <contrib-group>\n', ' <contrib contrib-type="author" xlink:type="simple">\n', ' <string-name>\n', ' <given-names>David\xa0W.</given-names>\n', ' <x xml:space="preserve">\xa0</x>\n', ' <surname>Chappell</surname>\n', ' </string-name>\n', ' <x xml:space="preserve">, </x>\n', ' </contrib>\n', ' <aff id="aff_1">Soka University of America.</aff>\n', ' </contrib-group>\n', '\n', '\n', '\n', ' <pub-date pub-type="ppub">\n', ' <month>04</month>\n', ' <year>2004</year>\n', ' <string-date>April 2004</string-date>\n', ' </pub-date>\n', ' <volume>84</volume>\n', ' <issue>2</issue>\n', ' <issue-id>jr.2004.84.issue-2</issue-id>\n', ' <fpage>331</fpage>\n', ' <lpage>332</lpage>\n', ' <product xlink:type="simple">\n', ' 
<string-name name-style="western">\n', ' <surname>Blum,\xa0</surname>\n', ' <given-names>Mark</given-names>\n', ' </string-name>. <source>The Origins and Development of Pure Land Buddhism: A Study and Translation of Gyōnen’s “Jōdo Hōmon Genrushō.”</source> New York: Oxford University Press, 2002. xxi+470 pp. $55.00 (cloth).</product>\n', ' <permissions>\n', ' <copyright-statement>Permission to reprint a book review printed in this section may be obtained only from the author.</copyright-statement>\n', ' </permissions>\n', ' <self-uri xlink:href="https://www.jstor.org/stable/10.1086/421828"/>\n', '\n', ' <custom-meta-group>\n', ' <custom-meta>\n', ' <meta-name>lang</meta-name>\n', ' <meta-value>en</meta-value>\n', ' </custom-meta>\n', ' </custom-meta-group>\n', ' </article-meta>\n', ' </front>\n', '\n', '</article>\n', '\n'], 'journal-article-10.1086_381213.xml': ['<?xml version="1.0" encoding="UTF-8"?>\n', '\n', '<article xmlns:xlink="http://www.w3.org/1999/xlink"\n', ' xmlns:mml="http://www.w3.org/1998/Math/MathML"\n', ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n', ' article-type="review-article"\n', ' dtd-version="1.0"\n', ' xml:lang="en">\n', ' <front>\n', ' <journal-meta>\n', ' <journal-id journal-id-type="publisher-id">jreligion</journal-id>\n', ' <journal-title-group>\n', ' <journal-title>The Journal of Religion</journal-title>\n', ' </journal-title-group>\n', '\n', '\n', ' <publisher>\n', ' <publisher-name>The University of Chicago Press</publisher-name>\n', ' </publisher>\n', ' <issn pub-type="ppub">00224189</issn>\n', ' <issn pub-type="epub">15496538</issn>\n', ' <custom-meta-group/>\n', ' </journal-meta>\n', ' <article-meta>\n', ' <article-id pub-id-type="jstor-stable">3172304</article-id>\n', ' <article-id pub-id-type="doi">10.1086/381213</article-id>\n', ' <article-id pub-id-type="msid">JR840204</article-id>\n', '\n', ' <article-categories>\n', ' <subj-group subj-group-type="heading">\n', ' <subject>Review Article</subject>\n', ' 
</subj-group>\n', ' </article-categories>\n', ' <title-group>\n', ' <article-title>Enlightened Genealogies of Religion: Edward Gibbon and His Contemporaries*</article-title>\n', '\n', '\n', '\n', '\n', ' </title-group>\n', ' <contrib-group>\n', ' <contrib contrib-type="author" xlink:type="simple">\n', ' <string-name>\n', ' <given-names>W.\xa0Clark</given-names>\n', ' <x xml:space="preserve">\xa0</x>\n', ' <surname>Gilpin</surname>\n', ' </string-name>\n', ' </contrib>\n', ' <aff id="aff_1">University of Chicago</aff>\n', ' </contrib-group>\n', '\n', '\n', '\n', ' <pub-date pub-type="ppub">\n', ' <month>04</month>\n', ' <year>2004</year>\n', ' <string-date>April 2004</string-date>\n', ' </pub-date>\n', ' <volume>84</volume>\n', ' <issue>2</issue>\n', ' <issue-id>jr.2004.84.issue-2</issue-id>\n', ' <fpage>256</fpage>\n', ' <lpage>263</lpage>\n', ' <product xlink:type="simple">*\u2009<string-name name-style="western">\n', ' <given-names>J.\xa0G.\xa0A.\xa0</given-names>\n', ' <surname>Pocock</surname>\n', ' </string-name>, <source>The Enlightenments of Edward Gibbon, 1737–1764</source>, vol. 1 of <source>Barbarism and Religion</source> (Cambridge: Cambridge University Press, 1999), xv+339 pp.; <string-name name-style="western">\n', ' <given-names>J.\xa0G.\xa0A.\xa0</given-names>\n', ' <surname>Pocock</surname>\n', ' </string-name>, <source>Narratives of Civil Government</source>, vol. 2 of <source>Barbarism and Religion</source> (Cambridge: Cambridge University Press, 1999), xiv+422 pp., $90.00 (cloth; vols. 1 and 2); <string-name name-style="western">\n', ' <given-names>J.\xa0G.\xa0A.\xa0</given-names>\n', ' <surname>Pocock</surname>\n', ' </string-name>, <source>The First Decline and Fall</source>, vol. 3 of <source>Barbarism and Religion</source> (Cambridge: Cambridge University Press, 2003), xiii+527 pp., $60.00 (cloth).</product>\n', ' <permissions>\n', ' <copyright-statement>© 2004 by The University of Chicago. 
All rights reserved.</copyright-statement>\n', ' <copyright-year>2004</copyright-year>\n', ' <copyright-holder>The University of Chicago</copyright-holder>\n', ' </permissions>\n', ' <self-uri xlink:href="https://www.jstor.org/stable/10.1086/381213"/>\n', '\n', ' <custom-meta-group>\n', ' <custom-meta>\n', ' <meta-name>lang</meta-name>\n', ' <meta-value>en</meta-value>\n', ' </custom-meta>\n', ' </custom-meta-group>\n', ' </article-meta>\n', ' </front>\n', '\n', '</article>\n', '\n'], 'journal-article-10.1086_424411.xml': ['<?xml version="1.0" encoding="UTF-8"?>\n', '\n', '<article xmlns:xlink="http://www.w3.org/1999/xlink"\n', ' xmlns:mml="http://www.w3.org/1998/Math/MathML"\n', ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n', ' article-type="book-review"\n', ' dtd-version="1.0"\n', ' xml:lang="en">\n', ' <front>\n', ' <journal-meta>\n', ' <journal-id journal-id-type="publisher-id">jreligion</journal-id>\n', ' <journal-title-group>\n', ' <journal-title>The Journal of Religion</journal-title>\n', ' </journal-title-group>\n', '\n', '\n', ' <publisher>\n', ' <publisher-name>The University of Chicago Press</publisher-name>\n', ' </publisher>\n', ' <issn pub-type="ppub">00224189</issn>\n', ' <issn pub-type="epub">15496538</issn>\n', ' <custom-meta-group/>\n', ' </journal-meta>\n', ' <article-meta>\n', ' <article-id pub-id-type="jstor-stable">3591416</article-id>\n', ' <article-id pub-id-type="doi">10.1086/424411</article-id>\n', ' <article-id pub-id-type="msid">JR84316</article-id>\n', ' <article-categories>\n', ' <subj-group subj-group-type="heading">\n', ' <subject>Book Review</subject>\n', ' </subj-group>\n', ' </article-categories>\n', ' <title-group>\n', ' <article-title/>\n', '\n', '\n', '\n', '\n', ' </title-group>\n', ' <contrib-group>\n', ' <contrib contrib-type="author" xlink:type="simple">\n', ' <string-name>\n', ' <given-names>Nicholas</given-names>\n', ' <x xml:space="preserve">\xa0</x>\n', ' <surname>Koss</surname>\n', ' </string-name>\n', ' <x 
xml:space="preserve">, </x>\n', ' </contrib>\n', ' <aff id="aff_1">Fu Jen Catholic University.</aff>\n', ' </contrib-group>\n', '\n', '\n', '\n', ' <pub-date pub-type="ppub">\n', ' <month>07</month>\n', ' <year>2004</year>\n', ' <string-date>July 2004</string-date>\n', ' </pub-date>\n', ' <volume>84</volume>\n', ' <issue>3</issue>\n', ' <issue-id>jr.2004.84.issue-3</issue-id>\n', ' <fpage>472</fpage>\n', ' <lpage>474</lpage>\n', ' <product xlink:type="simple">\n', ' <string-name name-style="western">\n', ' <surname>Lozada,\xa0</surname>\n', ' <given-names>Eriberto\xa0P.,\xa0Jr.</given-names>\n', ' </string-name> \n', ' <source>God Aboveground: Catholic Church, Postsocialist State, and Transnational Processes in a Chinese Village</source>. Stanford, Calif.: Stanford University Press, 2001. xii+250 pp. $45.00 (cloth).</product>\n', ' <permissions>\n', ' <copyright-statement>Permission to reprint a book review printed in this section may be obtained only from the author.</copyright-statement>\n', ' </permissions>\n', ' <self-uri xlink:href="https://www.jstor.org/stable/10.1086/424411"/>\n', '\n', ' <custom-meta-group>\n', ' <custom-meta>\n', ' <meta-name>lang</meta-name>\n', ' <meta-value>en</meta-value>\n', ' </custom-meta>\n', ' </custom-meta-group>\n', ' </article-meta>\n', ' </front>\n', '\n', '</article>\n', '\n'], 'journal-article-10.1086_427313.xml': ['<?xml version="1.0" encoding="UTF-8"?>\n', '\n', '<article xmlns:xlink="http://www.w3.org/1999/xlink"\n', ' xmlns:mml="http://www.w3.org/1998/Math/MathML"\n', ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n', ' article-type="research-article"\n', ' dtd-version="1.0"\n', ' xml:lang="en">\n', ' <front>\n', ' <journal-meta>\n', ' <journal-id journal-id-type="publisher-id">jreligion</journal-id>\n', ' <journal-title-group>\n', ' <journal-title>The Journal of Religion</journal-title>\n', ' </journal-title-group>\n', '\n', '\n', ' <publisher>\n', ' <publisher-name>The University of Chicago 
Press</publisher-name>\n', ' </publisher>\n', ' <issn pub-type="ppub">00224189</issn>\n', ' <issn pub-type="epub">15496538</issn>\n', ' <custom-meta-group/>\n', ' </journal-meta>\n', ' <article-meta>\n', ' <article-id pub-id-type="jstor-stable">3591443</article-id>\n', ' <article-id pub-id-type="doi">10.1086/427313</article-id>\n', ' <article-id pub-id-type="msid">JR850201</article-id>\n', '\n', ' <title-group>\n', ' <article-title>A Correlational Model of Comparative Theology</article-title>\n', '\n', '\n', '\n', '\n', ' </title-group>\n', ' <contrib-group>\n', ' <contrib contrib-type="author" xlink:type="simple">\n', ' <string-name>\n', ' <given-names>Hugh</given-names>\n', ' <x xml:space="preserve">\xa0</x>\n', ' <surname>Nicholson</surname>\n', ' </string-name>\n', ' <x xml:space="preserve"> </x>\n', ' </contrib>\n', ' <aff id="aff_1">Coe College</aff>\n', ' </contrib-group>\n', '\n', '\n', '\n', ' <pub-date pub-type="ppub">\n', ' <month>04</month>\n', ' <year>2005</year>\n', ' <string-date>April 2005</string-date>\n', ' </pub-date>\n', ' <volume>85</volume>\n', ' <issue>2</issue>\n', ' <issue-id>jr.2005.85.issue-2</issue-id>\n', ' <fpage>191</fpage>\n', ' <lpage>213</lpage>\n', ' <permissions>\n', ' <copyright-statement>© 2005 by The University of Chicago. 
All rights reserved.</copyright-statement>\n', ' <copyright-year>2005</copyright-year>\n', ' <copyright-holder>The University of Chicago</copyright-holder>\n', ' </permissions>\n', ' <self-uri xlink:href="https://www.jstor.org/stable/10.1086/427313"/>\n', '\n', ' <custom-meta-group>\n', ' <custom-meta>\n', ' <meta-name>lang</meta-name>\n', ' <meta-value>en</meta-value>\n', ' </custom-meta>\n', ' </custom-meta-group>\n', ' </article-meta>\n', ' </front>\n', '\n', ' <back>\n', '</back>\n', '</article>\n', '\n'], 'journal-article-10.1086_382331.xml': ['<?xml version="1.0" encoding="UTF-8"?>\n', '\n', '<article xmlns:xlink="http://www.w3.org/1999/xlink"\n', ' xmlns:mml="http://www.w3.org/1998/Math/MathML"\n', ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n', ' article-type="book-review"\n', ' dtd-version="1.0"\n', ' xml:lang="en">\n', ' <front>\n', ' <journal-meta>\n', ' <journal-id journal-id-type="publisher-id">jreligion</journal-id>\n', ' <journal-title-group>\n', ' <journal-title>The Journal of Religion</journal-title>\n', ' </journal-title-group>\n', '\n', '\n', ' <publisher>\n', ' <publisher-name>The University of Chicago Press</publisher-name>\n', ' </publisher>\n', ' <issn pub-type="ppub">00224189</issn>\n', ' <issn pub-type="epub">15496538</issn>\n', ' <custom-meta-group/>\n', ' </journal-meta>\n', ' <article-meta>\n', ' <article-id pub-id-type="jstor-stable">3172402</article-id>\n', ' <article-id pub-id-type="doi">10.1086/382331</article-id>\n', ' <article-id pub-id-type="msid">JR84142</article-id>\n', ' <article-categories>\n', ' <subj-group subj-group-type="heading">\n', ' <subject>Book Review</subject>\n', ' </subj-group>\n', ' </article-categories>\n', ' <title-group>\n', ' <article-title/>\n', '\n', '\n', '\n', '\n', ' </title-group>\n', ' <contrib-group>\n', ' <contrib contrib-type="author" xlink:type="simple">\n', ' <string-name>\n', ' <given-names>Robert\xa0Ford</given-names>\n', ' <x xml:space="preserve">\xa0</x>\n', ' 
<surname>Campany</surname>\n', ' </string-name>\n', ' <x xml:space="preserve">, </x>\n', ' </contrib>\n', ' <aff id="aff_1">Indiana University.</aff>\n', ' </contrib-group>\n', '\n', '\n', '\n', ' <pub-date pub-type="ppub">\n', ' <month>01</month>\n', ' <year>2004</year>\n', ' <string-date>January 2004</string-date>\n', ' </pub-date>\n', ' <volume>84</volume>\n', ' <issue>1</issue>\n', ' <issue-id>jr.2004.84.issue-1</issue-id>\n', ' <fpage>153</fpage>\n', ' <lpage>154</lpage>\n', ' <product xlink:type="simple">\n', ' <string-name name-style="western">\n', ' <surname>Sharf,\xa0</surname>\n', ' <given-names>Robert\xa0H.</given-names>\n', ' </string-name> \n', ' <source>Coming to Terms with Chinese Buddhism: A Reading of the Treasure Store Treatise</source>. Kuroda Institute, Studies in East Asian Buddhism 14. Honolulu: University of Hawaii Press, 2002. xiii+499 pp. $47.00 (cloth).</product>\n', ' <permissions>\n', ' <copyright-statement>Permission to reprint a book review printed in this section may be obtained only from the author.</copyright-statement>\n', ' </permissions>\n', ' <self-uri xlink:href="https://www.jstor.org/stable/10.1086/382331"/>\n', '\n', ' <custom-meta-group>\n', ' <custom-meta>\n', ' <meta-name>lang</meta-name>\n', ' <meta-value>en</meta-value>\n', ' </custom-meta>\n', ' </custom-meta-group>\n', ' </article-meta>\n', ' </front>\n', '\n', '</article>\n', '\n']}
df = pdxi.read_xml(content_directory, ['article', 'front', 'journal-meta'])

df
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-22-3a9c615b1a43> in <module>
----> 1 df = pdxi.read_xml(content_directory, ['article', 'front','journal-meta'])
      2 
      3 df

~/.local/lib/python3.7/site-packages/pandas_read_xml.py in read_xml(path_or_xml, root_key_list, root_is_rows, transpose, encoding)
     23 
     24 def read_xml(path_or_xml: str, root_key_list: Optional[List[str]]=None, root_is_rows: bool=True, transpose: bool=False, encoding: Optional[str]=None) -> pd.DataFrame:
---> 25     if urllib.parse.urlparse(path_or_xml).scheme in ['http', 'https']:
     26         if path_or_xml.endswith('.zip'):
     27             with get_zip_file_from_url(path_or_xml) as zf:

/ext/anaconda2020.02/lib/python3.7/urllib/parse.py in urlparse(url, scheme, allow_fragments)
    365     Note that we don't break the components up in smaller bits
    366     (e.g. netloc is a single string) and we don't expand % escapes."""
--> 367     url, scheme, _coerce_result = _coerce_args(url, scheme)
    368     splitresult = urlsplit(url, scheme, allow_fragments)
    369     scheme, netloc, url, query, fragment = splitresult

/ext/anaconda2020.02/lib/python3.7/urllib/parse.py in _coerce_args(*args)
    121     if str_input:
    122         return args + (_noop,)
--> 123     return _decode_args(args) + (_encode_result,)
    124 
    125     # Result objects are more helpful than simple tuples

/ext/anaconda2020.02/lib/python3.7/urllib/parse.py in _decode_args(args, encoding, errors)
    105 def _decode_args(args, encoding=_implicit_encoding,
    106                        errors=_implicit_errors):
--> 107     return tuple(x.decode(encoding, errors) if x else '' for x in args)
    108 
    109 def _coerce_args(*args):

/ext/anaconda2020.02/lib/python3.7/urllib/parse.py in <genexpr>(.0)
    105 def _decode_args(args, encoding=_implicit_encoding,
    106                        errors=_implicit_errors):
--> 107     return tuple(x.decode(encoding, errors) if x else '' for x in args)
    108 
    109 def _coerce_args(*args):

AttributeError: 'dict' object has no attribute 'decode'
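The traceback shows that read_xml is annotated as taking path_or_xml: str, while content_directory is a dict that maps each file name to a list of lines, so urlparse ends up calling .decode on a dict. A minimal sketch of one way around this, using only the standard library: join each file's lines back into a single XML string and parse the files one at a time. The small content_directory below is a shortened, hypothetical stand-in for the real one, and journal_meta_rows is an illustrative helper name, not part of pandas_read_xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the real content_directory:
# each key is a file name and each value is the file's list of lines.
content_directory = {
    "journal-article-10.1086_427313.xml": [
        '<?xml version="1.0" encoding="UTF-8"?>\n',
        '\n',
        '<article xmlns:xlink="http://www.w3.org/1999/xlink"\n',
        ' article-type="research-article">\n',
        ' <front>\n',
        ' <journal-meta>\n',
        ' <journal-id journal-id-type="publisher-id">jreligion</journal-id>\n',
        ' <journal-title-group>\n',
        ' <journal-title>The Journal of Religion</journal-title>\n',
        ' </journal-title-group>\n',
        ' <publisher>\n',
        ' <publisher-name>The University of Chicago Press</publisher-name>\n',
        ' </publisher>\n',
        ' </journal-meta>\n',
        ' </front>\n',
        '</article>\n',
    ],
}


def journal_meta_rows(directory):
    """Return one dict of journal-meta fields per file in the directory."""
    rows = []
    for filename, lines in directory.items():
        # Rebuild the single XML string that a parser expects.
        xml_text = "".join(lines)
        # Parse from bytes so the encoding declaration is honoured.
        root = ET.fromstring(xml_text.encode("utf-8"))
        meta = root.find("front/journal-meta")
        rows.append({
            "file": filename,
            "journal_id": meta.findtext("journal-id"),
            "journal_title": meta.findtext("journal-title-group/journal-title"),
            "publisher": meta.findtext("publisher/publisher-name"),
        })
    return rows


rows = journal_meta_rows(content_directory)
print(rows)
```

The resulting list of dicts can be handed to pandas.DataFrame(rows). If read_xml does accept a raw XML string, as the parameter name path_or_xml suggests (an assumption, not checked here), the same "".join(lines) string could be passed to it once per file instead of passing the whole dict.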
# I thought maybe I could get this to work by bringing the XML directly into
# the kernel as a string, but I kept getting an error. The XML itself contains
# double quotes, so the literal has to be wrapped in triple single quotes
# rather than plain double quotes.
test_xml = '''<?xml version="1.0" encoding="UTF-8"?>
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="review-article" dtd-version="1.0" xml:lang="en">
 <front>
  <journal-meta>
   <journal-id journal-id-type="publisher-id">jreligion</journal-id>
   <journal-title-group>
    <journal-title>The Journal of Religion</journal-title>
   </journal-title-group>
   <publisher>
    <publisher-name>The University of Chicago Press</publisher-name>
   </publisher>
   <issn pub-type="ppub">00224189</issn>
   <issn pub-type="epub">15496538</issn>
   <custom-meta-group/>
  </journal-meta>
  <article-meta>
   <article-id pub-id-type="jstor-stable">3172304</article-id>
   <article-id pub-id-type="doi">10.1086/381213</article-id>
   <article-id pub-id-type="msid">JR840204</article-id>
   <article-categories>
    <subj-group subj-group-type="heading">
     <subject>Review Article</subject>
    </subj-group>
   </article-categories>
   <title-group>
    <article-title>Enlightened Genealogies of Religion: Edward Gibbon and His Contemporaries*</article-title>
   </title-group>
   <contrib-group>
    <contrib contrib-type="author" xlink:type="simple">
     <string-name>
      <given-names>W. Clark</given-names>
      <x xml:space="preserve"> </x>
      <surname>Gilpin</surname>
     </string-name>
    </contrib>
    <aff id="aff_1">University of Chicago</aff>
   </contrib-group>
   <pub-date pub-type="ppub">
    <month>04</month>
    <year>2004</year>
    <string-date>April 2004</string-date>
   </pub-date>
   <volume>84</volume>
   <issue>2</issue>
   <issue-id>jr.2004.84.issue-2</issue-id>
   <fpage>256</fpage>
   <lpage>263</lpage>
   <product xlink:type="simple">*<string-name name-style="western"> <given-names>J. G. A. </given-names> <surname>Pocock</surname> </string-name>, <source>The Enlightenments of Edward Gibbon, 1737–1764</source>, vol. 1 of <source>Barbarism and Religion</source> (Cambridge: Cambridge University Press, 1999), xv+339 pp.; <string-name name-style="western"> <given-names>J. G. A. </given-names> <surname>Pocock</surname> </string-name>, <source>Narratives of Civil Government</source>, vol. 2 of <source>Barbarism and Religion</source> (Cambridge: Cambridge University Press, 1999), xiv+422 pp., $90.00 (cloth; vols. 1 and 2); <string-name name-style="western"> <given-names>J. G. A. </given-names> <surname>Pocock</surname> </string-name>, <source>The First Decline and Fall</source>, vol. 3 of <source>Barbarism and Religion</source> (Cambridge: Cambridge University Press, 2003), xiii+527 pp., $60.00 (cloth).</product>
   <permissions>
    <copyright-statement>© 2004 by The University of Chicago. All rights reserved.</copyright-statement>
    <copyright-year>2004</copyright-year>
    <copyright-holder>The University of Chicago</copyright-holder>
   </permissions>
   <self-uri xlink:href="https://www.jstor.org/stable/10.1086/381213"/>
   <custom-meta-group>
    <custom-meta>
     <meta-name>lang</meta-name>
     <meta-value>en</meta-value>
    </custom-meta>
   </custom-meta-group>
  </article-meta>
 </front>
</article>
'''
df = df.pipe(flatten)

df
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-20-ee60437d25eb> in <module>
----> 1 df = df.pipe(flatten)
      2 
      3 df

NameError: name 'df' is not defined