CoCalc -- 4850.sagews

All published worksheets from http://sagenb.org

Project: sagenb.org published worksheets

Views: ¹⁶⁸⁸²²
Image: ubuntu2004

"""we want to investigate how the algorithm that we developed in the first marksheet behaves on other networks

the following code create a network with 20 nodes"""

import networkx
G=DiGraph(networkx.scale_free_graph(20, alpha=0.8, beta=0.1, gamma=0.1))

#remember to press evaluate below each box (though, again, there is no output here)

"""let me say a little more about the network just created; note that the code contains four numbers --- '(20, alpha=0.8, beta=0.1, gamma=0.1)'

the number 20 is the number of nodes of the network created -- you can replace 20 by any other positive integer

the network is created randomly in a series of steps; at each step one of three things can happen:
    --first, a new node can be created and joined by a link to an existing node; the node to link to is chosen at random but a node that already has many other nodes linking to is more likely to be chosen,
    --second, a link between two existing nodes can be created,
    --third, a new node can be created and joined by a link to an existing node; the node to link to is chosen at random but a node that already has many outgoing links is more likely to be chosen,

which of the three things you decide to do is determined by the values alpha, beta and gamma -- alpha is the probability you do the first, beta the second, and gamma the third;

so in our example above we are most likely to do the first; you can change these values -- the only requirement is that they add up to 1

networks created in this way are thought to be a (naive) model for the web"""

"""now let's look at the network created, and meet some of the built-in functions we can use to refer to it; first let's list the nodes (or vertices)"""

G.vertices()

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

"""and the links (also called edges)"""

G.edges(labels=False)

[(0, 1), (1, 2), (2, 0), (3, 0), (4, 0), (5, 0), (5, 8), (6, 0), (7, 2), (9, 8), (10, 0), (11, 0), (11, 1), (12, 2), (13, 2), (14, 2), (15, 0), (16, 4), (17, 10), (18, 0), (19, 0)]

"""to find out which other nodes a node links to we use neighbors_out"""

for node in G.vertices():
    print node, ": ", G.neighbors_out(node)

:  [1]
:  [2]
:  [0]
:  [0]
:  [0]
:  [0, 8]
:  [0]
:  [2]
:  []
:  [8]
:  [0]
:  [0, 1]
:  [2]
:  [2]
:  [2]
:  [0]
:  [4]
:  [10]
:  [0]
:  [0]

"""and neighbors_in tells us which nodes link to it"""

for node in G.vertices():
    print node, ": ", G.neighbors_in(node)

:  [2, 3, 4, 5, 6, 10, 11, 15, 18, 19]
:  [0, 11]
:  [1, 12, 13, 14, 7]
:  []
:  [16]
:  []
:  []
:  []
:  [9, 5]
:  []
:  [17]
:  []
:  []
:  []
:  []
:  []
:  []
:  []
:  []
:  []

"""but of course the easiest way to see these things is to draw the network

this time we don't specify the positions for the nodes and a standard 'layout' algorithm is applied; the algorithm has a degree of randomness so if you press evaluate a second time you might get a different drawing"""

G.plot(edge_color="purple", edge_style="solid", vertex_labels=True, graph_border=True, vertex_size=300).show(figsize=[8,5])

"""now let's use the same procedure as before to give the nodes a score"""

def update_score(score):
    new_score={}
    for node in G.vertices():
        new_score[node] = 0.4
        for page in G.neighbors_in(node):
            new_score[node] = new_score[node] + 0.6 * score[page] / len(G.neighbors_out(page))
    return new_score
    

score={} 
for node in G.vertices(): 
    score[node] = 1

for i in range(50):
    score = update_score(score)

for node in G.vertices():
    print node, ": ", round(score[node], 2)

:  8.06
:  6.32
:  11.98
:  0.4
:  0.4
:  0.4
:  1.91
:  1.46
:  0.88
:  0.64
:  0.52
:  0.4
:  0.4
:  0.53
:  0.4
:  0.76
:  0.4
:  2.36
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  4.47
:  0.4
:  0.4
:  0.4
:  1.7
:  0.4
:  0.4
:  0.4
:  0.52
:  0.4
:  0.4
:  1.36
:  0.4
:  0.4
:  0.4
:  0.4
:  0.56
:  0.4
:  0.64
:  0.4
:  0.4
:  0.4
:  0.91
:  0.4
:  0.76
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.64
:  0.4
:  0.4
:  0.4
:  1.0
:  0.4
:  0.4
:  0.4
:  0.52
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.52
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.52
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.4
:  0.48
:  0.4
:  0.4
:  0.48
:  0.4
:  0.53
:  0.4

"""we would like to see the values of score in a drawing of the network, but rather than labelling the nodes we will colour them

nodes with a high score will be coloured red; the next most important will be orange, then yellow, then white

we create a dictionary of colours"""

colours={    "red": [],
             "orange": [],
             "yellow": [],
             "white": []}

#and then determine the colour of each node by looking at its score
for node in G.vertices():
    #if the score is more than 2 we colour it red
    if score[node] > 2: colours["red"] += [node]
    #if it is between 1 and 2 we make it orange
    elif score[node] > 1 and score[node] <= 2: colours["orange"] = colours["orange"] + [node]
    
    elif score[node] <= 1 and score[node] > 0.5: colours["yellow"] = colours["yellow"] + [node]
    else: colours["white"] += [node]

#let's see which nodes are which colour (the syntax here makes the printing a little neater)
for colour in colours:
    print "{0:7} {1} {2}".format(colour, ":", colours[colour])

orange  : [6, 7, 29, 36]
white   : [3, 4, 5, 11, 12, 14, 16, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 34, 35, 37, 38, 39, 40, 42, 44, 45, 46, 48, 50, 51, 52, 53, 54, 55, 56, 58, 59, 60, 62, 63, 64, 66, 67, 68, 69, 70, 71, 73, 74, 75, 76, 77, 78, 79, 80, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 99]
yellow  : [8, 9, 10, 13, 15, 33, 41, 43, 47, 49, 57, 61, 65, 72, 81, 98]
red     : [0, 1, 2, 17, 25]

"""now let's draw it"""

G.plot(edge_color="gray", edge_style="solid", vertex_labels=False, graph_border=True, vertex_colors=colours, vertex_size=40).show()

"""let's do everything at once, with more nodes"""

G=DiGraph(networkx.scale_free_graph(100, alpha=0.8, beta=0.1, gamma=0.1))

score={} 
for node in G.vertices(): 
    score[node] = 1
for i in range(50):
    score = update_score(score)
    
colours={    "red": [],
             "orange": [],
             "yellow": [],
             "white": []}
             
for node in G.vertices():
    if score[node] > 2: colours["red"] += [node]
    elif score[node] > 1 and score[node] <= 2: colours["orange"] = colours["orange"] + [node]
    elif score[node] <= 1 and score[node] > 0.5: colours["yellow"] = colours["yellow"] + [node]
    else: colours["white"] += [node]


G.plot(edge_color="gray", edge_style="solid", vertex_labels=False, graph_border=True, vertex_colors=colours, vertex_size=40).show()

"""to investigate further, go back to the previous box and alter the number of nodes or change the values of alpha, beta and gamma

refine the definition of the colours dictionary to improve the visualisation

or investigate whether it matters how many times update_score is called

a much more ambitious project (which would require considerable research) would be to replace the definition of the network with a more precise model for the web"""

Product

Resources

Company