Tagged: ruby

  • mcphersonz 7:22 pm on March 23, 2008
    Tags: ActiveRecord, API, parse, PubMed, ruby   

    Parse PubMed database using their API + ruby + ActiveRecord 

    In case any of you need to parse the PubMed database, here’s what I came up with as a first try. It runs as a plain Ruby script. You will need Ruby, ActiveRecord (if you have Rails set up, you should be good), and a few libraries; use gem install [x] for any libraries you don’t have. If you have Rails installed, you probably only need to get XmlSimple:

    gem install xml-simple

    I used three tables in my MySQL database: articles, authors, and a join table, articles_authors.

    Here’s the schema:

    DROP TABLE IF EXISTS `pubmed`.`articles`;
    CREATE TABLE `pubmed`.`articles` (
    `id` int(11) NOT NULL auto_increment,
    `pubmed_id` int(11) NOT NULL,
    `source` varchar(50) default NULL,
    `title` varchar(255) default NULL,
    `full_journal_name` varchar(255) default NULL,
    `author_list` varchar(255) default NULL,
    `pub_date` date default NULL,
    PRIMARY KEY (`id`)
    ) DEFAULT CHARSET=utf8mb4;

    DROP TABLE IF EXISTS `pubmed`.`articles_authors`;
    CREATE TABLE `pubmed`.`articles_authors` (
    `article_id` int(11) NOT NULL,
    `author_id` int(11) NOT NULL
    ) DEFAULT CHARSET=utf8mb4;

    DROP TABLE IF EXISTS `pubmed`.`authors`;
    CREATE TABLE `pubmed`.`authors` (
    `id` int(11) NOT NULL auto_increment,
    `name` varchar(255) default NULL,
    `pubmed_id` int(11) default NULL,
    PRIMARY KEY (`id`)
    ) DEFAULT CHARSET=utf8mb4;

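    Here’s the Ruby script:

    #!/usr/local/bin/ruby -w
    # Require files & libs
    require 'net/http'
    require 'uri'
    require 'rubygems'
    require 'active_record'
    require 'xmlsimple'   # installed by the xml-simple gem
    # The search/import loop below is a reconstruction (the original lines were garbled),
    # written against ActiveRecord 4+ APIs. The search term is an assumption.

    # App configuration settings
    empty_tables = true          # wipe existing rows before importing
    # Search settings
    searchRelDate = 7            # only articles published in the last N days
    searchLimit   = 100000       # maximum number of PubMed IDs returned by esearch
    batch_size    = 100          # IDs per esummary request
    searchTerm    = "all[sb]"    # assumed query (all PubMed records); the original term was not preserved
    # NCBI now serves the E-utilities over HTTPS only.
    searchUrlBase  = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=#{searchLimit}&"
    summaryUrlBase = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&retmode=xml&id="
    # Database settings
    db_host   = "localhost"
    db_user   = "root"
    db_pass   = "password"
    db_schema = "pubmed"

    # Header output
    dash = "========================================================================="
    puts dash
    puts "# PubMed parsing tool."
    puts dash

    # Connect to the DB using ActiveRecord ("mysql" was the adapter name on Rails 2-era installs).
    ActiveRecord::Base.establish_connection(:adapter => "mysql2", :encoding => "utf8mb4", :database => db_schema,
                                            :username => db_user, :password => db_pass, :host => db_host)

    # Models: articles and authors follow ActiveRecord's table-name conventions, and
    # articles_authors is the default has_and_belongs_to_many join table for the pair.
    class Article < ActiveRecord::Base
      has_and_belongs_to_many :authors
    end
    class Author < ActiveRecord::Base
      has_and_belongs_to_many :articles
    end

    if empty_tables
      ActiveRecord::Base.connection.execute("DELETE FROM articles_authors")
      Article.delete_all
      Author.delete_all
    end

    # Step 1: esearch returns the list of matching PubMed IDs.
    search_url = "#{searchUrlBase}term=#{URI.encode_www_form_component(searchTerm)}&datetype=pdat&reldate=#{searchRelDate}"
    search_result = XmlSimple.xml_in(Net::HTTP.get(URI(search_url)), 'ForceArray' => ['Id'], 'KeyAttr' => [])
    ids = (search_result['IdList'] || {})['Id'] || []
    puts "Found #{ids.length} articles."

    # Step 2: fetch summaries in batches and store articles, authors, and links.
    batch_current = 0
    ids.each_slice(batch_size) do |batch|
      summary_xml = Net::HTTP.get(URI(summaryUrlBase + batch.join(',')))
      summary = XmlSimple.xml_in(summary_xml, 'ForceArray' => ['DocSum', 'Item'], 'KeyAttr' => [])
      (summary['DocSum'] || []).each do |article_summary|
        # Collect the named <Item> fields of this DocSum into a hash.
        article = { :authors => [] }
        Array(article_summary['Item']).each do |property|
          if property['Name'] == 'AuthorList'
            Array(property['Item']).each { |author| article[:authors] << author['content'] if author['content'] }
          else
            article[property['Name']] = property['content']
          end
        end
        new_article = Article.create(
          :pubmed_id         => article_summary['Id'],
          :source            => article['Source'],
          :title             => article['Title'].to_s[0, 255],       # fit the varchar(255) column
          :full_journal_name => article['FullJournalName'],
          :pub_date          => article['PubDate'],                  # partial dates like "2008 Mar" may be stored as NULL
          :author_list       => article[:authors].join(', ')[0, 255]
        )
        # Add authors to the database, reusing existing rows by name:
        article[:authors].uniq.each do |author|
          new_author = Author.find_or_create_by(:name => author) do |a|
            a.pubmed_id = article_summary['Id']   # first article this author was seen on
          end
          new_article.authors << new_author       # inserts a row into articles_authors
        end
      end
      batch_current += 1
      puts "Batch #{batch_current} done."
      sleep 0.4   # NCBI allows about 3 requests per second without an API key
    end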
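    The two API-facing steps in the script, splitting the ID list into esummary requests and flattening each parsed DocSum, can be sketched as pure functions that are easy to check without a network connection or a database. This is a minimal sketch under assumptions: the helper names summary_urls and docsum_to_article are hypothetical, and the sample hash only mimics the shape XmlSimple returns for a DocSum when ForceArray is limited to DocSum and Item; its values are made up.

```ruby
# Hypothetical helpers illustrating the batching and DocSum-flattening steps.
SUMMARY_URL_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&retmode=xml&id="

# Split PubMed IDs into groups of batch_size and build one esummary URL per group.
def summary_urls(ids, batch_size)
  ids.each_slice(batch_size).map { |batch| SUMMARY_URL_BASE + batch.join(',') }
end

# Turn one parsed <DocSum> hash into a flat hash of named fields plus an author array.
def docsum_to_article(doc)
  article = { :pubmed_id => doc['Id'], :authors => [] }
  Array(doc['Item']).each do |item|
    if item['Name'] == 'AuthorList'
      # Each <Item Name="Author"> child carries the author's name as its text content.
      Array(item['Item']).each { |author| article[:authors] << author['content'] if author['content'] }
    else
      article[item['Name']] = item['content']
    end
  end
  article
end

# Made-up input in the shape XmlSimple produces for one DocSum.
sample = {
  'Id' => '1000001',
  'Item' => [
    { 'Name' => 'PubDate', 'Type' => 'Date', 'content' => '2008 Mar 21' },
    { 'Name' => 'Source', 'Type' => 'String', 'content' => 'Example J' },
    { 'Name' => 'AuthorList', 'Type' => 'List', 'Item' => [
      { 'Name' => 'Author', 'Type' => 'String', 'content' => 'Li Y' },
      { 'Name' => 'Author', 'Type' => 'String', 'content' => 'Smith J' }
    ] },
    { 'Name' => 'Title', 'Type' => 'String', 'content' => 'An example title.' }
  ]
}

article = docsum_to_article(sample)
puts article[:authors].join(', ')              # prints "Li Y, Smith J"
puts summary_urls((1..250).to_a, 100).length   # prints 3
```

    Keeping ForceArray on DocSum and Item means a one-author list still arrives as an array, so the same each loop handles every case.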

    From there, you can run queries like these, which rank authors by their number of articles and list every article by a given author:

    SELECT authors.id AS author_id, authors.name, COUNT(*) AS article_count
    FROM articles_authors
    JOIN authors ON articles_authors.author_id = authors.id
    GROUP BY authors.id, authors.name
    ORDER BY article_count DESC;

    SELECT DISTINCT articles.pubmed_id, authors.name, articles.title, articles.source, articles.pub_date, articles.id
    FROM articles_authors
    JOIN authors ON articles_authors.author_id = authors.id
    JOIN articles ON articles_authors.article_id = articles.id
    WHERE authors.name = 'Li Y';
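    For a quick sanity check outside MySQL, the author ranking from the first query can be reproduced in plain Ruby over a list of join rows. This is only an illustration with made-up rows, assuming each row is an [article_id, author_id] pair and a hash maps author IDs to names; Enumerable#tally needs Ruby 2.7 or later.

```ruby
# Made-up articles_authors rows: [article_id, author_id]
join_rows = [[1, 10], [1, 11], [2, 10], [3, 10], [3, 12]]
author_names = { 10 => 'Li Y', 11 => 'Smith J', 12 => 'Wang X' }

# Equivalent of GROUP BY author ... ORDER BY COUNT(*) DESC.
counts = join_rows.map { |_article_id, author_id| author_id }.tally
# Ties are broken by author ID here; the SQL query leaves tie order unspecified.
ranking = counts.sort_by { |author_id, count| [-count, author_id] }
ranking.each { |author_id, count| puts "#{author_id} #{author_names[author_id]} #{count}" }
```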

    • Anonymous 3:25 pm on November 23, 2011

      How do I get to the VIP section? I would like to talk about the 7 pm news if possible

    • Phoebe 2:49 am on February 25, 2013

      Hi there, I read your blog occasionally and I own a similar one, and I was just wondering if you get a lot of spam responses? If so, how do you prevent it? Is there any plugin or anything you can recommend? I get so much lately it’s driving me crazy, so any assistance is very much appreciated.

    • http://www.fameb.ufba.br 7:40 pm on April 26, 2013

      I’m still learning from you, but I’m trying to reach my goals. I certainly liked reading everything that is written on your website. Keep the articles coming. I liked it!

    • isabella 6:29 am on May 6, 2014

      I’m planning to create my own blog, and a question comes up to my mind..

  • mcphersonz 4:37 am on March 18, 2008
    Tags: ActiveScaffold, deployment, rails, ruby, security   

    AjaxScaffold plugin & Deployment Problem (and solution!) 

    ActiveScaffold is a great plugin for Rails that I have found myself using over and over again. If you have not heard of it, check it out: it’s bomb.

    I discovered a problem when deploying an application to a QA or production environment: AjaxScaffold copies its files into the application’s /public directory whenever the server is restarted, so the server process needs read/write access to /public to do so. Not too secure…

    A friend of mine found a great article on devblog.famundo.com that outlines this problem and the solution.

    The fix skips copying the files when the app is running in production mode. The file to patch is the plugin’s init.rb, or install.rb for edge versions of ActiveScaffold.

    Here’s the mod:

    # Include hook code here
    require 'fileutils'
    require 'ajax_scaffold_plugin'
    ActionController::Base.send(:include, AjaxScaffold)
    ActionView::Base.send(:include, AjaxScaffold::Helper)
    # Copy all the files over to the main Rails app, globbing '*.*' to avoid .svn folders.
    # Do not copy in production mode!!! And catch errors and log them.
    # (`directory` is the plugin's path, supplied to init.rb by the Rails 2 plugin loader.)
    if ENV['RAILS_ENV'] != 'production'
      begin
        source = File.join(directory, '/app/views/ajax_scaffold')
        dest = File.join(RAILS_ROOT, '/app/views/ajax_scaffold')
        FileUtils.mkdir(dest) unless File.exist?(dest)
        FileUtils.cp_r(Dir.glob(source + '/*.*'), dest)
        # Copy the public assets one directory at a time.
        %w(/public /public/stylesheets /public/javascripts /public/images).each do |subdir|
          source = File.join(directory, subdir)
          dest = File.join(RAILS_ROOT, subdir)
          FileUtils.cp_r(Dir.glob(source + '/*.*'), dest)
        end
      rescue StandardError => ex
        RAILS_DEFAULT_LOGGER.error "AjaxScaffold error while copying the AjaxScaffold files to the application directory. (#{ex.message})"
      end
    end
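    The production guard in the mod can also be sketched outside Rails as a small, testable Ruby method. This is a minimal sketch under assumptions: the method name copy_plugin_assets and its arguments are hypothetical stand-ins for the plugin loader's directory and RAILS_ROOT, and it rescues SystemCallError (file-system errors) rather than every exception.

```ruby
require 'fileutils'
require 'logger'
require 'tmpdir'

# Hypothetical helper: copy a plugin's asset files into the app unless running in production.
# Returns true when the copy ran, false when it was skipped or failed.
def copy_plugin_assets(plugin_dir, app_root, env, logger = Logger.new($stderr))
  return false if env == 'production'   # production servers should not need write access to /public
  %w(public/stylesheets public/javascripts public/images).each do |subdir|
    dest = File.join(app_root, subdir)
    FileUtils.mkdir_p(dest)
    # Dir.glob skips names that start with a dot, so .svn folders are not copied.
    FileUtils.cp_r(Dir.glob(File.join(plugin_dir, subdir, '*.*')), dest)
  end
  true
rescue SystemCallError => ex
  # Log the problem instead of aborting application startup.
  logger.error "Plugin asset copy failed: #{ex.message}"
  false
end

# Usage with throwaway directories:
Dir.mktmpdir do |root|
  plugin = File.join(root, 'plugin')
  app = File.join(root, 'app')
  FileUtils.mkdir_p(File.join(plugin, 'public', 'stylesheets'))
  File.write(File.join(plugin, 'public', 'stylesheets', 'ajax_scaffold.css'), 'body {}')
  puts copy_plugin_assets(plugin, app, 'production')    # prints false
  puts copy_plugin_assets(plugin, app, 'development')   # prints true
end
```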

    Again, thanks to devblog.famundo.com for this solution!
